1-day Spark Structured Streaming 2.2 for Developers Workshop at BeeScala Conference in Ljubljana, Slovenia

Jacek Laskowski
Nov 23, 2017

What a long Apache Spark day! A group of 3 Spark developers, with me as the instructor, started the 1-day Spark Structured Streaming 2.2 workshop right at 9am and finished at 8pm.

That makes 11 hours spent exclusively with Scala and Spark Structured Streaming, Apache Spark’s brand new stream processing engine. I didn’t expect we could spend so long and cover so much. It was as exhaustive as it was exhausting.

Thanks Gorazd, Dinko, Dario and Gordan for bearing with me for so long!

From left: Dinko, Gordan, Jacek, Dario

The whole agenda is available at Spark Structured Streaming in Apache Spark 2.2 Workshop (1 day) for Software Developers. Let me know how to make it better.

As this edition was my very first workshop focusing exclusively on Spark Structured Streaming, I found it very inspiring and learnt a lot: not only what topics to cover in the following editions but, perhaps more importantly, how long the workshop should really take in the future (iff I want to keep it as in-depth and advanced as possible).

The list of topics to cover in more depth next time includes:

  1. Watermark (aka allowed lateness)
  2. Exercises with output modes
  3. groupBy and window function
  4. Arbitrary Stateful Streaming Aggregation with flatMapGroupsWithState
  5. Developing Custom Streaming Sink
  6. Streaming joins
  7. Web UI for Streaming Queries
  8. Query Management API
  9. Explaining Streaming Query Plans

As you may have guessed, with these topics included the agenda would easily exceed 2 days (with 3 days for the most rewarding experience). Some are more important than others, but they do have to be included next time. I’m working on it…

Contact me at jacek@japila.pl if you want to start using Apache Spark (or Spark Structured Streaming in particular) in your project and use it in the most efficient and professional way.

Follow @jaceklaskowski on Twitter to learn more about the latest and greatest of Apache Spark. #SparkLikePro

Jacek Laskowski

Freelance Data Engineer | #ApacheSpark #DeltaLake #Databricks #ApacheKafka #KafkaStreams | Java Champion | @theASF | #DatabricksBeacons