Storm simple patterns

When working with Storm, you will come to recognize a variety of recurring patterns. This section captures some of the patterns common to Storm topologies, without using Trident.

Joins

This is the most common pattern. As the name suggests, tuples from two or more streams are joined on a common field and emitted as a single tuple. In Storm, this is achieved using fields grouping, which ensures that all tuples with the same field value are sent to the same task. The following figure and code snippet capture its essence:

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("gender", genderSpout);
    builder.setSpout("age", ageSpout);
    // Group both input streams on "id" so matching tuples reach the same join task.
    builder.setBolt("join", new SingleJoinBolt(new Fields("gender", "age")))
        .fieldsGrouping("gender", new Fields("id"))
        .fieldsGrouping("age", new Fields("id"));

In this snippet, fields grouping on id ensures that tuples carrying the same id, whether they come from the gender stream or the age stream, are routed to the same task of the join bolt.

Figure: Joins

Here, we have two streams arriving from two spouts: the gender spout and the age spout. The common field is id, on which the join bolt joins the two streams and emits a new stream with the fields gender and age.
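To make the matching step concrete, here is a minimal sketch of the state a single join task might keep. It is plain Java with no Storm dependency, and it is not the implementation of SingleJoinBolt: the class JoinBuffer and its methods onGender, onAge, and pending are hypothetical names introduced only for illustration. It also assumes that each id appears at most once on each stream.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Storm: the per-task join state.
// Fields grouping on "id" is what makes this local map sufficient,
// because both halves of a given id always arrive at the same task.
final class JoinBuffer {
    // Half-joined records waiting for their counterpart, keyed by id.
    private final Map<String, String> pendingGender = new HashMap<>();
    private final Map<String, Integer> pendingAge = new HashMap<>();

    // Call for each tuple from the "gender" stream.
    Optional<String> onGender(String id, String gender) {
        Integer age = pendingAge.remove(id);
        if (age == null) {
            pendingGender.put(id, gender);   // counterpart not seen yet
            return Optional.empty();
        }
        return Optional.of(gender + "," + age);
    }

    // Call for each tuple from the "age" stream.
    Optional<String> onAge(String id, int age) {
        String gender = pendingGender.remove(id);
        if (gender == null) {
            pendingAge.put(id, age);         // counterpart not seen yet
            return Optional.empty();
        }
        return Optional.of(gender + "," + age);
    }

    // Number of records still waiting for a match.
    int pending() {
        return pendingGender.size() + pendingAge.size();
    }
}
```

Inside a bolt, the execute method would call onGender or onAge depending on the source component of the incoming tuple and emit whenever a joined value is returned. A production bolt would also need to expire entries whose counterpart never arrives, which this sketch leaves out.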

Batching

This is another very common pattern, which comes into play when we have to hold tuples and process them in batches. Let's take an example to understand it better: I have a Storm application that dumps tuples into a database. For efficient utilization of network bandwidth, I would want to write to the database in batches of 100. Unless we are using transactional Trident topologies, the simplest mechanism is to hold the records in a local instance data structure, track the count, and write to the database in batches. In general, there are two variants of batching:

  • Count-based batching: Create a batch based on the number of records. A plain counter that is initialized in the prepare method and incremented in the execute method on the arrival of each tuple can be used to track this.
  • Time-based batching: Create a batch based on time, say, a batch every five minutes. To keep the implementation simple, we would create a mechanism that emits a tick tuple to the topology every five minutes.
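The two variants can be combined in one small helper, sketched below in plain Java with no Storm dependency. The class RecordBatcher and its methods add and onTick are hypothetical names, and the sink stands in for whatever actually writes to the database. The assumption is that a bolt would call add from execute for every data tuple and onTick whenever a tick tuple arrives; in Storm, the tick interval is typically set per component through Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Storm: buffers records and hands
// them to a sink either when the batch is full or when a tick arrives.
final class RecordBatcher<T> {
    private final int maxSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    RecordBatcher(int maxSize, Consumer<List<T>> sink) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.sink = sink;
    }

    // Count-based batching: flush as soon as maxSize records are held.
    void add(T record) {
        buffer.add(record);
        if (buffer.size() >= maxSize) {
            flush();
        }
    }

    // Time-based batching: flush whatever has accumulated since the last flush.
    void onTick() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;                      // nothing to write
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();      // start a fresh batch
        sink.accept(batch);              // for example, one bulk database write
    }
}
```

With maxSize set to 100, this matches the example above: the sink receives full batches of 100 records, and a tick flushes any partial batch so that records do not wait indefinitely when traffic is low. Acknowledging the buffered tuples only after the sink succeeds is left out of this sketch.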