Data ingestion

Data ingestion is the process of getting data into the data system that we are building or using. This can be done via manual, semi-automatic, or automatic methods.

In a smaller system, users often prefer a web form or other visual interface for entering data into the system. In a larger system, such as a hospital management system, an airline management system, a government and public record management system, or a social media site, users usually prefer to automate the data ingestion process as much as possible. Data ingestion therefore raises a number of questions, such as the following:

  • How many data sources are there?
  • How many large data items are available?
  • Will the number of data sources grow over time?
  • What is the rate at which data will be consumed?

It is important to note that in such systems an individual record is typically small, while the total volume of data is enormous. Developers therefore usually define a set of rules, called ingestion policies, that govern how errors, incomplete data, and similar problems are handled during ingestion. Data ingestion, along with its policies, is an integral part of a big data system.
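
To make the idea of an ingestion policy more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names IngestionPolicy and ingest, the fields required_fields, defaults, and on_error, and the sample hospital records are all hypothetical and do not come from any particular library or system.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


@dataclass
class IngestionPolicy:
    """Hypothetical policy describing how incomplete records are handled."""
    required_fields: Tuple[str, ...]
    # Values used for fields that are absent from an incoming record.
    defaults: Dict[str, Any] = field(default_factory=dict)
    # "quarantine" sets bad records aside; "fail" stops the whole run.
    on_error: str = "quarantine"


def ingest(
    records: Iterable[Record], policy: IngestionPolicy
) -> Tuple[List[Record], List[Dict[str, Any]]]:
    """Split records into accepted ones and rejected ones, per the policy."""
    accepted: List[Record] = []
    rejected: List[Dict[str, Any]] = []
    for record in records:
        # Defaults only fill in keys the record does not provide.
        completed = {**policy.defaults, **record}
        # A required field counts as missing if it is absent or None.
        missing = [f for f in policy.required_fields if completed.get(f) is None]
        if not missing:
            accepted.append(completed)
        elif policy.on_error == "fail":
            raise ValueError(f"record is missing required fields: {missing}")
        else:
            rejected.append({"record": record, "missing": missing})
    return accepted, rejected


# Example usage with made-up records.
policy = IngestionPolicy(
    required_fields=("patient_id", "ward"),
    defaults={"ward": "unassigned"},
)
accepted, rejected = ingest([{"patient_id": 1}, {"ward": "A"}], policy)
```

In this sketch the first record is accepted because the policy supplies a default ward, while the second is set aside because no patient_id can be inferred. A real system would typically also log rejected records and decide when, or whether, to retry them.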
