Stream output

Spark provides so-called "Output Sinks" for streams. The sink defines how and where the stream is written; for example, as a Parquet file or as an in-memory table. For our application, however, we will simply print the stream output to the console:

// Write each micro-batch to the console and block until the query terminates
outputDataStream.writeStream.format("console").start().awaitTermination()

The preceding code directly starts the stream processing and waits until the application terminates. The application simply processes every new file in a given folder (in our case, the folder specified by the environment variable `APPDATADIR`). For example, given a file with five loan applications, the stream produces a table with five scored events.
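
For reference, the input side of such an application could be declared as follows. This is only a sketch: it assumes a `SparkSession` named `spark`, an explicit `inputSchema` describing the loan-application files, and a hypothetical `scoreLoans` function wrapping the trained model:

// Sketch only: watch the folder given by APPDATADIR for newly arriving CSV files.
// Streaming file sources require an explicit schema up front.
val inputDataStream = spark.readStream
  .schema(inputSchema)
  .csv(sys.env("APPDATADIR"))

// Hypothetical scoring step that appends prediction columns to each event.
val outputDataStream = scoreLoans(inputDataStream)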

The important part of each event is represented by the last columns, which contain the predicted values.
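
If only the predictions are of interest, they can be projected out of the stream before it is written. The column names below are illustrative, not taken from the book; the actual names depend on the model that produced the scores:

// Assumed column names: "prediction" for the predicted label and
// "p0"/"p1" for the class probabilities.
val predictionsOnly = outputDataStream.select("prediction", "p0", "p1")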

If we write another file with a single loan application into the folder, the application shows another scored batch.

In this way, we can deploy trained models along with their corresponding data-processing operations and let them score actual events. Of course, we have demonstrated only a simple use case; a real-life scenario would be much more complex, involving proper model validation, A/B testing against the currently deployed models, and the storing and versioning of models.
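
As a first step toward such a deployment, the console sink above could be swapped for a durable one. The following sketch writes the scored events as Parquet files; the output and checkpoint paths are placeholders:

// Sketch only: a fault-tolerant file sink instead of the console.
// File sinks require a checkpoint location to track streaming progress.
outputDataStream.writeStream
  .format("parquet")
  .option("path", "/data/scored-loans")
  .option("checkpointLocation", "/data/checkpoints/scored-loans")
  .start()
  .awaitTermination()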
