My very first Kafka Streams Workshop

Jacek Laskowski
Oct 2, 2018

I could not have dreamt of a better shout-out for my recent work with Kafka Streams than the one I’ve just received from the one and only Gwen Shapira of Confluent (the company behind Apache Kafka and Kafka Streams).

I’m an independent consultant specialising in Apache Spark, with some focus on the tools people use with it. Fairly often it is Apache Kafka that acts as the “shock absorber” for large volumes of events, or simply as the “storage”, and so over time Kafka has found a special place in my heart. That’s how I got interested in Kafka Streams (which is an integral part of the Apache Kafka project) alongside similar solutions like Spark Structured Streaming or even Apache Flink. Apache Spark has the Spark Structured Streaming module for much the same reason that Apache Kafka has Kafka Streams, i.e. stream processing.

I’ve been consulting on Kafka and Kafka Streams for quite some time, mostly on proof-of-concept (PoC) projects. What has always surprised me is that the three PoCs I was part of were all in Poland (Bydgoszcz, Lodz and Warsaw). I once had a conversation about a potential project abroad in which Kafka Streams was the stream processing library, but they wanted someone with experience teaching Kafka Streams, and in the end they decided to go the more official route (and spoke to Confluent).

At long last, the dream came true and I got an inquiry about a Kafka Streams workshop. I then put my exploration of Apache Spark on hold and focused exclusively on Kafka Streams, with the aim of getting a better understanding of the library.

Just before I started my 2-day Kafka Streams workshop, Bill Bejeck released his Kafka Streams book with Manning.

That helped me deliver a much better workshop, but I still felt uneasy about the internals. Stateful processing in particular (state stores and time-windowed aggregations) was the biggest gap, and it turned out to be fairly important for the participants.

With that gap discovered, I decided to focus entirely on time-windowed aggregations and state stores for two whole weeks after the workshop had finished. I still have some open questions, but I think I’m much better prepared to explain the gory details of stateful processing (though I have yet to experience it in real life).

As you may have heard, it’s Spark+AI Summit time. The conference is in London, UK this week, and I have two talks there about (the internals of) bucketing support and query execution in Spark SQL 2.3.1.

And just when I switched focus from Kafka Streams to Spark SQL, I found the tweet from Gwen. Try to imagine how I felt about it. These kind words from Gwen came right after this fairly intense time with Kafka Streams. I now feel much better prepared (empowered) for future Kafka Streams activities! Thanks, Gwen, and thanks to the Kafka community for Kafka and Kafka Streams.

On to Apache Spark, Spark SQL and London! It’s Spark time now! The whole month, it seems! I couldn’t be happier! I wish you just as much happiness.

Jacek Laskowski

Freelance Data Engineer | #ApacheSpark #DeltaLake #Databricks #ApacheKafka #KafkaStreams | Java Champion | @theASF | #DatabricksBeacons