Other open source technologies for analytics

Airflow is a solid platform for building analytics, but it is not the only option. We can also implement our analytics on the most common Apache big data platforms. Here is a short list of them:

  • Apache Flink: Apache Flink is a data-stream processor with great support for streaming analytics. Its major advantage is stateful processing: an operator can keep state, such as counters or running aggregates, across the events it handles.
  • Apache Flume: Apache Flume is normally used for data ingestion and Extract, Transform, and Load (ETL) into the Hadoop Distributed File System (HDFS).
  • Apache Storm: Apache Storm is a distributed real-time computation system with good support for real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
  • Apache Beam: Apache Beam is an abstraction API: we define a pipeline once and run it on a supported execution engine, such as Apache Flink, Apache Apex, or Google Cloud Dataflow.
  • Apache Spark: Apache Spark is one of the most popular frameworks for big data and stream processing. Apache Spark can run on a Yet Another Resource Negotiator (YARN) cluster, its own standalone cluster, or Mesos. Several IoT cloud platforms have support for Spark, including AWS, Azure, Google, Predix, and IBM Bluemix.
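
To make the "stateful" point from the Flink entry more concrete, here is a minimal sketch in plain Python. It does not use Flink's API or any of the platforms listed above; it only illustrates the idea of keeping per-key state between events. The event shape (sensor ID, value) and the function name running_average are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (sensor_id, value).
Reading = Tuple[str, float]


def running_average(events: Iterable[Reading]) -> Iterator[Tuple[str, float]]:
    """Yield the updated per-sensor average after every incoming event."""
    # Keyed state: one count and one total per sensor, kept between events.
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)
    for sensor_id, value in events:
        count[sensor_id] += 1
        total[sensor_id] += value
        yield sensor_id, total[sensor_id] / count[sensor_id]


# A tiny in-memory "stream" standing in for a real unbounded source.
stream = [("s1", 10.0), ("s2", 4.0), ("s1", 20.0), ("s2", 6.0)]
results = list(running_average(stream))
for sensor_id, avg in results:
    print(sensor_id, avg)
```

In a real stream processor the state would also have to survive failures and be partitioned across machines, which is the part a platform such as Flink takes care of; the sketch above keeps everything in a single process.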