Airflow is a solid platform for building analytics, but it is not the only option. We can also implement our analytics on the most common Apache big data platforms. Here is a short list of them:
- Apache Flink: Apache Flink is a data-stream processor with great support for streaming analytics. A major advantage of Flink is its support for stateful computations: an operator can keep state, such as a running aggregate, from one event to the next.
- Apache Flume: Flume is normally used for data ingestion and for Extract, Transform, and Load (ETL) into the Hadoop Distributed File System (HDFS).
- Apache Storm: Apache Storm is a distributed real-time computation system with good support for real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
- Apache Beam: Apache Beam is an abstraction API: a pipeline is written once against the Beam model and then executed by a supported engine, such as Apache Flink, Apache Apex, or Google Cloud Dataflow.
- Apache Spark: Apache Spark is one of the most popular frameworks for big data and stream processing. Spark can run on a Hadoop YARN (Yet Another Resource Negotiator) cluster, on its own standalone cluster manager, or on Mesos. Several IoT cloud platforms support Spark, including AWS, Azure, Google, Predix, and IBM Bluemix.
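
To make the idea of stateful stream processing mentioned for Flink more concrete, here is a minimal sketch in plain Python. It does not use Flink's API or any of the platforms listed above; the function name `running_mean_per_key` and the sensor names are hypothetical and chosen only for illustration. It shows an operator that keeps a small piece of state per key (a count and a total) and updates it as each event arrives, which is the kind of bookkeeping a stateful stream processor manages for us in a distributed and fault-tolerant way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def running_mean_per_key(
    events: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, float]]:
    """Yield (key, running mean) after every event, keeping per-key state."""
    # State that survives from one event to the next: key -> [count, total].
    state: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for key, value in events:
        entry = state[key]
        entry[0] += 1        # number of events seen so far for this key
        entry[1] += value    # sum of the values seen so far for this key
        yield key, entry[1] / entry[0]


if __name__ == "__main__":
    # Hypothetical sensor readings arriving as a stream of (sensor, value).
    readings = [("pump-1", 10.0), ("pump-2", 4.0), ("pump-1", 20.0)]
    for sensor, mean in running_mean_per_key(readings):
        print(sensor, mean)
```

In this sketch the state lives in a local dictionary, so it is lost if the process stops. A real stream processor would additionally partition the state by key across workers and checkpoint it so that it can be restored after a failure.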