Lambda architecture

A Lambda architecture attempts to balance latency with throughput. Essentially, it mixes batch processing with stream processing. Similar to the general cloud topology of OpenStack or other cloud frameworks, Lambda ingests data and stores it in an immutable data repository. The topology has three layers:

  • Batch layer: The batch layer is usually based on Hadoop clusters. It processes data significantly more slowly than the speed layer. By sacrificing latency, it maximizes throughput and accuracy.
  • Speed layer: This is the real-time, in-memory data stream. The data can be erroneous, missing, or out of order. Apache Spark, as we have seen, provides a capable stream processing engine.
  • Service layer: The service layer is where the recombined batch and stream results are stored, analyzed, and visualized. Typical components of the service layer are Druid, which provides facilities for combining batch and speed layers; Apache Cassandra, for scalable database management; and Apache Hive, for data warehousing.

Figure: Complexities of a Lambda architecture. Here, the batch layer migrates data to HDFS storage, and the speed layer delivers data directly to a real-time analysis package via Spark.

Lambda architectures are, by nature, more complex than the other analytics engines. Because they are hybrid, they require additional resources and operational effort to run successfully.
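
To make the interplay of the three layers concrete, the following is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not how Hadoop, Spark, or Druid implement these layers: the class name LambdaSketch and its methods are hypothetical, a plain list stands in for the immutable data repository, and simple per-key event counts stand in for the analytics.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of a Lambda architecture (hypothetical names throughout)."""

    def __init__(self):
        self.master_log = []         # append-only stand-in for the immutable repository
        self.batch_view = Counter()  # batch layer output: recomputed from the full log
        self.speed_view = Counter()  # speed layer output: events since the last batch run

    def ingest(self, key):
        # Each event is appended to the log and reflected at once in the speed view.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch layer: slow but complete recomputation over the whole log.
        self.batch_view = Counter(self.master_log)
        # The batch view now covers every event, so the speed view can be reset.
        self.speed_view.clear()

    def query(self, key):
        # Service layer: combine the batch result with the recent speed result.
        return self.batch_view[key] + self.speed_view[key]


arch = LambdaSketch()
for sensor in ["a", "b", "a"]:
    arch.ingest(sensor)
arch.run_batch()   # batch view now holds the counts for all three events
arch.ingest("a")   # arrives after the batch run, so only the speed view sees it
print(arch.query("a"))
```

The query method shows where the extra complexity comes from: every answer depends on two separately maintained views, and the system must keep track of which events each view already covers so that nothing is counted twice or lost.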
