Multidestination pattern

In multisourcing, we saw raw data being ingested into HDFS. In most cases, however, an enterprise needs to ingest raw data not only into new HDFS systems but also into its existing traditional data stores, such as Informatica or other analytics platforms. The additional data streams this requires lead to many challenges, such as storage overflow, data errors (also known as data regret), and longer data transfer and processing times.

The multidestination pattern is considered a better approach for overcoming these challenges. It is very similar to multisourcing up to the point where the data is ready to be integrated with multiple destinations (refer to the following diagram). The router publishes the improved data and then broadcasts it to the subscriber destinations, which are already registered with a publishing agent on the router. Enrichers can act as publishers as well as subscribers.

Deploying routers in a clustered environment is also recommended when data volumes are high and there are many subscribers.
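
The publish and broadcast flow described above can be illustrated with a small in-memory sketch in Python. This is not the implementation of any particular product: the Router class, the make_enricher helper, the topic names, and the two list-backed "destinations" are all hypothetical names chosen for illustration. It assumes a single process and synchronous delivery, whereas a real deployment would use a clustered messaging layer.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, object]
Destination = Callable[[Record], None]


class Router:
    """Hypothetical in-memory router that keeps a registry of subscribers per topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Destination]] = defaultdict(list)

    def subscribe(self, topic: str, destination: Destination) -> None:
        """Register a destination with the publishing agent for a topic."""
        self._subscribers[topic].append(destination)

    def publish(self, topic: str, record: Record) -> int:
        """Broadcast the record to every registered destination; return the count."""
        destinations = list(self._subscribers.get(topic, []))
        for deliver in destinations:
            # Each destination receives its own copy of the record.
            deliver(dict(record))
        return len(destinations)


def make_enricher(router: Router, out_topic: str) -> Destination:
    """Build an enricher: it subscribes to one topic and publishes to another."""

    def enrich(record: Record) -> None:
        enriched = dict(record)
        enriched["enriched"] = True  # placeholder for real enrichment logic
        router.publish(out_topic, enriched)

    return enrich


router = Router()
hdfs_store: List[Record] = []   # stands in for an HDFS sink (assumption)
rdbms_store: List[Record] = []  # stands in for an existing RDBMS sink (assumption)

# The enricher is a subscriber of "raw" and a publisher to "enriched".
router.subscribe("raw", make_enricher(router, "enriched"))
router.subscribe("enriched", hdfs_store.append)
router.subscribe("enriched", rdbms_store.append)

delivered = router.publish("raw", {"id": 1, "source": "sensor-a"})
```

After the single publish call, both stand-in stores hold their own copy of the enriched record. Adding another destination only requires one more subscribe call, which is the property that lets this pattern fan out to new data stores without changing the ingestion side.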

The following are the benefits of the multidestination pattern:

  • Highly scalable, flexible, fast, resilient to data failure, and cost-effective
  • The organization can start to ingest data into multiple data stores, including its existing RDBMS as well as NoSQL data stores
  • Allows you to use simple query languages, such as HiveQL (Hive) and Pig Latin (Pig), along with traditional analytics
  • Provides the ability to partition the data for flexible access and decentralized processing
  • Possibility of decentralized computation in the data nodes
  • Because data is replicated across HDFS nodes, there are no data regrets
  • Because data nodes are self-reliant, more nodes can be added without any delay

The following are the impacts of the multidestination pattern:

  • Needs complex or additional infrastructure to manage distributed nodes
  • Needs to manage distributed data in secured networks to ensure data security
  • Needs enforcement, governance, and stringent practices to manage the integrity and consistency of data