About the webinar “Building Real-Time Data Pipelines with Spark Streaming, Kafka, and Cassandra”
I’ve just finished watching the recording of the webinar Building Real-Time Data Pipelines with Spark Streaming, Kafka, and Cassandra and, with great sadness, I wouldn’t recommend it. It runs for 60 minutes when 20 would have been enough.
The title and the slides were so catchy that I couldn’t believe I had seen so little during the 60-minute webinar. Perhaps it’s just me who needs much, much more Scala code showcasing the integration between the triple As: Apache Kafka, Apache Spark, and Apache Cassandra (I’ve sketched the kind of code I had in mind at the end of this post).
I really wish I had learnt more!
Below is a short overview of the webinar with timestamps for the more interesting parts. My advice is to skip the rest and spend that time on other activities.
9:54 is where the fun begins, but it finishes quite quickly at 16:19. Alas, it’s shallow and very light.
21:45 Nanda Vijaydev takes over the show. She talks about different use cases and how the triple As fit them.
27:42 You may start right here to see the example stack for a real-time pipeline with Kafka, Spark, and Cassandra.
29:13 The Kafka chapter starts.
35:19 At long last, the code!!! It lasts just 6 minutes, until 41:34, when Nanda stops presenting it :( Too bad it went by so quickly.
50:12 The Q&A session starts.
Apache Flink, as an alternative streaming solution, was mentioned merely two or three times, while Apache Flume got more exposure (not that they’re similar, but their potential use cases seem to place Flink in a better position).
You’d be better off spending your time elsewhere… On to other Spark-related materials.
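
P.S. Since the webinar left me hungry for code, here is a minimal sketch of the kind of Scala I had hoped to see: Spark Streaming consuming messages from Kafka and saving per-batch word counts to Cassandra via the spark-cassandra-connector. To be clear, this is not the webinar’s code; the broker address, topic, keyspace, and table names are my own placeholders, and it assumes the Spark 1.x streaming APIs with the Kafka direct stream.

```scala
// A minimal sketch (not the webinar's code): Spark Streaming reads from a
// Kafka topic, counts words per 5-second batch, and writes counts to Cassandra.
// Broker address, topic, keyspace, and table names are assumptions/placeholders.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object KafkaSparkCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-spark-cassandra-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(5))        // 5-second micro-batches

    // Direct (receiver-less) stream from the assumed Kafka topic "events"
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc,
      Map("metadata.broker.list" -> "localhost:9092"),       // assumed Kafka broker
      Set("events"))

    // Count words per batch and save each batch to the assumed table
    // demo.word_counts (word text PRIMARY KEY, count int)
    messages
      .map { case (_, value) => value }
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveToCassandra("demo", "word_counts", SomeColumns("word", "count"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Roughly 30 lines, and that’s the whole pipeline. Something along these lines, walked through slowly, is what I expected the 60 minutes to be built around.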