Introduction to Akka Streams
If you’re new to the world of Akka, I recommend reading the first part of this series, Introduction to Akka Actors, before continuing. The rest of this article assumes some familiarity with the content outlined in that post, as well as a high-level understanding of Akka.
Why Akka Streams?
If you’re new to the world of stream processing, I recommend reading the blog A Journey into Reactive Streams first.
First and foremost, why use streams? What kind of advantage do they give us over the standard ways (e.g. callbacks) of handling data?
The answer is simple: streams abstract away the imperative details of how data enters the application, giving us a declarative way of describing and handling it while hiding the details we don’t care about. Streaming helps you ingest, process, analyze, and store data in a quick and responsive manner.
Actors can be seen as dealing with streams as well: they send and receive series of messages in order to transfer knowledge (or data) from one place to another. However, it is tedious and error-prone to implement all the proper measures to achieve stable streaming between actors, since in addition to sending and receiving we also need to take care not to overflow any buffers or mailboxes in the process. Another problem is that actor messages can be lost and must be retransmitted in that case; failing to do so would lead to holes on the receiving side. Finally, when dealing with streams of elements of a fixed type, actors do not currently offer good static guarantees that no wiring errors are made: type safety could be improved in this case.
What is Akka Streams?
Akka Streams is a module built on top of Akka Actors to make the ingestion and processing of streams easy. It provides easy-to-use APIs to create streams that leverage the power of the Akka toolkit without explicitly defining actor behaviors and messages, which allows you to focus on your logic and forget about the boilerplate code required to manage actors. Akka Streams implements the Reactive Streams specification, which defines a standard for asynchronous stream processing. In other words, Akka Streams provides a higher-level abstraction over Akka’s existing actor model: the Actor model is an excellent primitive for writing concurrent, scalable software, but it is still a primitive, and it’s not hard to find critiques of working with it directly.
Akka Streams is a library to process and transfer a sequence of elements using bounded buffer space. This latter property is what we refer to as boundedness and it is the defining feature of Akka Streams. Translated to everyday terms it is possible to express a chain of processing entities, each executing independently (and possibly concurrently) from the others while only buffering a limited number of elements at any given time. This property of bounded buffers is one of the differences from the actor model, where each actor usually has an unbounded, or a bounded, but dropping mailbox. Akka Stream processing entities have bounded “mailboxes” that do not drop.
Akka Streams is non-blocking, which means that an operation does not hinder the progress of the calling thread, even if it takes a long time to finish the requested work.
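As a rough sketch of what boundedness looks like in code (this is illustrative, not a required pattern), an operator can be given an explicit, fixed-size buffer, and the back-pressure overflow strategy makes the upstream wait instead of letting the buffer grow:
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.Source

// At most 16 elements are buffered at this stage; when the buffer is full,
// upstream is back-pressured rather than dropping elements or growing memory
val bounded = Source(1 to 1000)
  .buffer(16, OverflowStrategy.backpressure)
  .map(_ * 2)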
The Akka Streams API is completely decoupled from the Reactive Streams interfaces. While Akka Streams focuses on the formulation of transformations on data streams, the scope of Reactive Streams is to define a common mechanism for moving data across an asynchronous boundary without losses, buffering, or resource exhaustion. The relationship between the two is that the Akka Streams API is geared towards end users, while the Akka Streams implementation uses the Reactive Streams interfaces internally to pass data between the different operators.
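As a small sketch of that relationship (assuming Akka 2.5.x and the org.reactivestreams interfaces on the classpath), an Akka Streams stream can be exposed as a Reactive Streams Publisher, or consume one:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import org.reactivestreams.Publisher

implicit val system = ActorSystem("interop")
implicit val materializer = ActorMaterializer()

// Expose an Akka Streams Source as a Reactive Streams Publisher
val publisher: Publisher[Int] = Source(1 to 5).runWith(Sink.asPublisher(fanout = false))

// Consume any Reactive Streams Publisher as an Akka Streams Source
Source.fromPublisher(publisher).runForeach(println)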
Stream Terminology
- Source: This is the entry point to your stream. There must be at least one in every stream. Source takes two type parameters: the first one represents the type of data it emits, and the second one is the type of the auxiliary value it can produce when run/materialized. If we don’t produce any, we use the NotUsed type provided by Akka. There are various ways of creating a Source:
val source = Source(1 to 10)
// source: akka.stream.scaladsl.Source[Int, akka.NotUsed] = ...
val s = Source.single("single element")
// s: akka.stream.scaladsl.Source[String, akka.NotUsed] = ...
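A few other common ways of creating a Source (a non-exhaustive sketch, assuming Akka 2.5.x):
import scala.concurrent.Future
import scala.concurrent.duration._

// A source that completes immediately without emitting anything
val empty = Source.empty[Int]
// A source that emits the same element over and over, as fast as downstream demands
val repeated = Source.repeat("tick")
// A source backed by a Future; emits a single element when the future completes
val fromFuture = Source.fromFuture(Future.successful(42))
// A source that emits an element at a fixed interval
val ticks = Source.tick(1.second, 1.second, "tick")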
- Sink: This is the exit point of your stream. There must be at least one in every stream. The Sink is the last element of our stream: it is basically a subscriber of the data sent/processed by a Source, and it usually writes its input to some system IO. It is the endpoint of a stream and therefore consumes data. A Sink has a single input channel and no output channel. Sinks are especially useful when we want to specify the behavior of the data collector in a reusable way, without evaluating the stream:
val sink = Sink.fold[Int, Int](0)(_ + _) // Creating a sink that sums its input
// sink: akka.stream.scaladsl.Sink[Int, scala.concurrent.Future[Int]] = ...
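To show how such a sink is used (a minimal, self-contained sketch; the system name is illustrative), running it against a source materializes a Future holding the folded result:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

implicit val system = ActorSystem("sink-example")
implicit val materializer = ActorMaterializer()
import system.dispatcher // execution context for the Future callback

val sink = Sink.fold[Int, Int](0)(_ + _)
val sum: Future[Int] = Source(1 to 10).runWith(sink) // materializes and runs the stream
sum.foreach(total => println(s"Sum: $total"))        // prints Sum: 55 once the stream completes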
- Flow: The flow is a processing step within the stream. It combines one incoming channel and one outgoing channel as well as some transformation of the messages passing through it. If a Flow is connected to a Source, a new Source is the result. Likewise, a Flow connected to a Sink creates a new Sink, and a Flow connected to both a Source and a Sink results in a RunnableGraph. Therefore, flows sit between the input and the output channel, but by themselves do not correspond to either of those flavors as long as they are not connected to a Source or a Sink.
val source = Source(1 to 3)
val sink = Sink.foreach[Int](println)
val doubler = Flow[Int].map(elem => elem * 2)
// doubler: akka.stream.scaladsl.Flow[Int, Int, akka.NotUsed] = ...
// (input type, output type, materialized value of the flow)
val runnable = source via doubler to sink
Via the via method we can connect a Source with a Flow. We need to specify the input type of the Flow because the compiler can’t infer it for us. As we can already see in this simple example, the doubler flow is completely independent from any data producers and consumers: it only transforms the data and forwards it to the output channel. This means that we can reuse a flow among multiple streams, as sketched below.
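A small illustration of that reusability (assuming an implicit ActorSystem and ActorMaterializer are in scope, as in the basic example later in this post):
// One Flow reused across two independent streams
val doubler = Flow[Int].map(_ * 2)
val first  = Source(1 to 3).via(doubler).to(Sink.foreach[Int](n => println(s"first: $n")))
val second = Source(10 to 12).via(doubler).to(Sink.foreach[Int](n => println(s"second: $n")))

first.run()  // prints first: 2, first: 4, first: 6
second.run() // prints second: 20, second: 22, second: 24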
- ActorMaterializer: This is required to run a stream. It is responsible for creating the underlying actors with the specific functionality you define in your stream. Since the ActorMaterializer creates actors, it also needs an ActorSystem. It basically allocates all the necessary resources to run a stream. It is important to remember that even after constructing the stream by connecting the source, the sink and the different operators, no data will flow through it until it is materialized (see the sketch below).
A Source can be considered a publisher and a Sink a subscriber.
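A minimal setup sketch (the system name is illustrative): the materializer is created from an ActorSystem and picked up implicitly when the stream is run:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

implicit val system = ActorSystem("streams-intro")
implicit val materializer = ActorMaterializer() // allocates the actors that will run the stream

// Nothing runs at this point; we have only described the stream
val described = Source(1 to 5).to(Sink.foreach[Int](println))

// Only now are the underlying actors created and data starts to flow
described.run()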
Basics and working with Flows
Back-pressure: A possible problematic scenario is when the Source produces values too fast for the Sink to handle and can possibly overwhelm it. As the Sink receives more data than it can process at the moment, it has to keep buffering it for processing in the future. To combat this, the Sink needs to communicate with the Source and tell it to slow down pushing new data until it finishes handling the current batch. This enables a constant-size buffer for the Sink, as it will tell the Source to stop sending new data when it is not ready. In the context of Akka Streams, back-pressure is always understood as non-blocking and asynchronous.
When we talk about asynchronous, non-blocking backpressure we mean that the operators available in Akka Streams will not use blocking calls but asynchronous message passing to exchange messages between each other, and they will use asynchronous means to slow down a fast producer, without blocking its thread.
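A small sketch of back-pressure in action (assuming an implicit ActorSystem and ActorMaterializer are in scope; the Thread.sleep is only there to simulate a slow consumer): the slow sink below automatically limits how fast the source may emit, so no unbounded buffering occurs.
// A fast producer...
val fastSource = Source(1 to 1000000)

// ...connected to a deliberately slow consumer
val slowSink = Sink.foreach[Int] { n =>
  Thread.sleep(100) // simulate slow processing
  println(n)
}

// The source only emits as fast as the sink signals demand
fastSource.runWith(slowSink)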
Graph: A description of a stream processing topology, defining the pathways through which elements shall flow when the stream is running.
RunnableGraph: A Flow that has both ends “attached” to a Source and a Sink respectively, and is ready to be run().
It is possible to attach a Flow to a Source, resulting in a composite source, and it is also possible to prepend a Flow to a Sink to get a new sink. After a stream is properly terminated by having both a source and a sink, it will be represented by the RunnableGraph type, indicating that it is ready to be executed.
It is important to remember that even after constructing the RunnableGraph by connecting the source, the sink and the different operators, no data will flow through it until it is materialized.
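A minimal sketch of that lifecycle (assuming an implicit ActorSystem and ActorMaterializer are in scope):
import akka.NotUsed
import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

val double  = Flow[Int].map(_ * 2)
val printer = Sink.foreach[Int](println)

// A fully connected blueprint; nothing is running yet
val graph: RunnableGraph[NotUsed] = Source(1 to 5).via(double).to(printer)

// Materialization: actors are created and data starts to flow
graph.run()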
Basic Code
import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import scala.concurrent.Future

object StreamExample {
  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem("Sys")
    implicit val materializer = ActorMaterializer()

    val numbers = 1 to 100
    // We create a Source that will iterate over the number sequence
    val numberSource: Source[Int, NotUsed] = Source.fromIterator(() => numbers.iterator)
    // Only let even numbers pass through the Flow
    val isEvenFlow: Flow[Int, Int, NotUsed] = Flow[Int].filter(num => num % 2 == 0)
    // Create a Source of even numbers by combining the number Source with the even-number filter Flow
    val evenNumbersSource: Source[Int, NotUsed] = numberSource.via(isEvenFlow)
    // A Sink that will write its input onto the console
    val consoleSink: Sink[Int, Future[Done]] = Sink.foreach[Int](println)
    // Connect the Source with the Sink and run it using the materializer
    evenNumbersSource.runWith(consoleSink)
  }
}
In the example above we created:
- the ActorSystem and ActorMaterializer instances that we use to run the stream; this is needed because Akka Streams is backed by Akka’s Actor model
- a Source based on the static number sequence’s iterator
- a Flow that only lets even numbers through
- a Sink that prints its input to the console using println
- the complete stream, by connecting evenNumbersSource with consoleSink and running it using runWith
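As a small follow-up sketch (reusing the values from the example above): Sink.foreach materializes a Future[Done], which we can use to shut the ActorSystem down once the stream has finished:
// Keep the materialized value instead of discarding it
val done: Future[Done] = evenNumbersSource.runWith(consoleSink)

// Terminate the actor system when the stream completes, whether it succeeds or fails
implicit val ec = system.dispatcher
done.onComplete(_ => system.terminate())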
Conclusion
Streaming is a game changer for data-intensive systems and is best suited for big-data applications. The main goal of Akka Streams is to build concurrent, memory-bounded computations. Akka Streams also handles much of the complexity of timeouts, failure, back-pressure, and so forth, freeing us to think about the bigger picture of how events flow through our systems. Here we learned about Source, Sink and Flow, and how to run a stream using a materializer. For more details you can follow the documentation: https://doc.akka.io/docs/akka/2.5/stream/index.html
In the next blog we will see more about graphs and the fan-in and fan-out functions.