Data lake architecture pattern

In established enterprises, the most common business case is to reuse existing data infrastructure alongside new big data implementations. The data lake architecture pattern offers an efficient way to reuse most of that infrastructure while still gaining the benefits of the big data paradigm shift.

Data lakes must address the following essential characteristics:

  • Manage abundant unprocessed data
  • Retain data for as long as possible
  • Manage data transformation
  • Support dynamic schemas (schema-on-read)
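
The last characteristic, a dynamic schema, means no structure is enforced when data lands; a schema is applied only when the data is read. A minimal sketch of this schema-on-read idea, using plain Python and an in-memory buffer standing in for lake storage (the names `raw_zone`, `ingest`, and `read_with_schema` are illustrative, not part of any real product):

```python
import io
import json

# Hypothetical raw zone: records are stored as JSON lines exactly as
# received, with no schema enforced at write time (schema-on-read).
raw_zone = io.StringIO()

def ingest(record: dict) -> None:
    """Append a raw record to the lake without validating its shape."""
    raw_zone.write(json.dumps(record) + "\n")

# Two records from the same source with different, evolving shapes
# land side by side without any migration step.
ingest({"user": "alice", "clicks": 3})
ingest({"user": "bob", "clicks": 5, "region": "eu"})  # new field appears later

def read_with_schema(fields: list) -> list:
    """Apply a schema only at read time, filling missing fields with None."""
    raw_zone.seek(0)
    rows = []
    for line in raw_zone:
        rec = json.loads(line)
        rows.append(tuple(rec.get(f) for f in fields))
    return rows

print(read_with_schema(["user", "clicks", "region"]))
# [('alice', 3, None), ('bob', 5, 'eu')]
```

Because the schema lives in the reader rather than the store, new fields can appear in the raw data at any time without breaking existing consumers.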

The following diagram depicts a data lake pattern implementation. Raw data flows into the data store from different data sources, and the received data is retained in the lake for as long as possible. Conditioning is performed only after a data source has been identified for immediate use in the mainline analytics:

Data lakes provide a mechanism for capturing and exploring potentially useful data without incurring additional storage costs in transactional systems, or the conditioning effort needed to bring new data sources into those systems.
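This capture-first, condition-later flow can be sketched in a few lines of Python. The two-zone layout (`raw` and `curated`) and the helper names `land_raw` and `condition` are illustrative assumptions, not a prescribed API; the point is that landing data is cheap and untouched, while parsing and normalization are deferred until a consumer is identified:

```python
import pathlib
import tempfile
import time

# Hypothetical lake layout: files from any source are landed as-is in a
# "raw" zone; conditioning into a "curated" zone happens only on demand.
lake = pathlib.Path(tempfile.mkdtemp())
(lake / "raw").mkdir()
(lake / "curated").mkdir()

def land_raw(source: str, payload: bytes) -> pathlib.Path:
    """Store the payload untouched; the only cost is cheap object storage."""
    path = lake / "raw" / f"{source}-{int(time.time() * 1000)}.bin"
    path.write_bytes(payload)
    return path

def condition(raw_path: pathlib.Path) -> pathlib.Path:
    """Deferred conditioning: decode and normalize only when needed."""
    text = raw_path.read_bytes().decode("utf-8").strip().upper()
    out = lake / "curated" / (raw_path.stem + ".txt")
    out.write_text(text)
    return out

p = land_raw("crm", b"  new customer signup \n")
print(condition(p).read_text())
# NEW CUSTOMER SIGNUP
```

Sources that are never picked up for analytics simply stay in the raw zone, costing only storage rather than transformation effort.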

Data lake implementations include HDFS, AWS S3, and other distributed file systems. Microsoft, Amazon, EMC, Teradata, and Hortonworks are prominent vendors that sell data lake implementations among their products. A data lake can also be hosted on cloud Infrastructure as a Service (IaaS).
