In the third post of this series I will share operational and practical issues
that I have encountered when developing and hosting long-running Spark
streaming applications. This includes tips that will be useful to those who are
first starting out with Scala and Spark, as well as insights on performance and
reliability that will be useful to those who are more experienced.

In the second post of this series I explore the strengths and weaknesses of
several popular streaming frameworks. This analysis was performed a couple of
years ago with a particular application in mind. These frameworks have since
improved, but this post should provide some insight into the tradeoffs and
decisions involved when designing streaming applications, and lessons can be
learned from choices that did and did not pay off.

I recently spent two weeks tracking down a subtle bug in a Spark Structured
Streaming application which I have been maintaining for several years. Having
dealt with many such time-consuming bugs over the years, I’ve decided to
compile my experiences working with ordered, stateful streaming applications
into a series of posts. This series will serve as an introductory guide to the
design and operation of stateful streaming pipelines, and hopefully spur some
further development to simplify this process in the future.

Over the last few years I have iterated several times on continuous delivery
pipelines for Rust applications. Designing these pipelines involves balancing
a number of factors including cost, complexity, ergonomics, and rigor. In this
post I will describe several of these iterations, discuss the lessons learned,
and share my most recent solution in detail.

My recent guest post on Deis Labs' Blog offers a deep dive into my work on Krustlet, including the design and implementation of a flexible state machine API for specifying custom Kubelet behavior in a type-safe framework.