Data lakes for big data and machine learning

S3 makes storing data at scale easy with its virtually unlimited capacity. It can serve big data and machine learning workloads alike, storing data on either a temporary or a permanent basis.

For example, suppose an insurance company needs to give its agents the ability to estimate the risk involved in a given insurance contract, based on historical data from similar contracts. All the data can easily be pooled into an S3 bucket. An agent can then use a visualization tool, such as Amazon QuickSight, to select the data residing in S3 and gain powerful visualizations and a better understanding of the risk of the contract they are dealing with. This helps the insurance company be more responsive and more competitive in serving its clients, and it can be architected as a very low-cost solution compared to traditional enterprise-grade on-site storage systems.
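As a minimal sketch of the first step, the following Python snippet uses the AWS SDK (boto3) to upload a local export of historical contract data into an S3 bucket; the bucket, key, and file names here are hypothetical, chosen only for illustration. Once the data is in the bucket, QuickSight can be pointed at it as an S3 data source.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical names for illustration only.
BUCKET = "insurance-contract-data"

# Upload a local export of historical contract data so that
# visualization tools such as QuickSight can read it from S3.
s3.upload_file(
    Filename="historical_contracts.csv",
    Bucket=BUCKET,
    Key="contracts/historical_contracts.csv",
)
```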

Another example is running training for a machine learning (ML) system. Say we are designing a neural network whose only task is to identify whether an image shows a person or an object, where the images it needs to recognize can be color photographs, black-and-white sketches, or even stylized images such as artwork. The ML system needs a large amount of training data before it can start recognizing images. We can load that training data into an S3 bucket and use it to train the model. Once the data is no longer needed, we can easily delete it with a simple CLI or API command, or a few clicks in the management console, so we stop paying for the space consumed.
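As a hedged illustration of that cleanup step, the following Python sketch uses boto3 to delete every object under a training-data prefix; the bucket and prefix names are hypothetical, and the equivalent one-line CLI command is noted in a comment.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "ml-training-data"   # hypothetical bucket name
PREFIX = "training-images/"   # hypothetical key prefix

# Equivalent CLI command:
#   aws s3 rm s3://ml-training-data/training-images/ --recursive

# List every object under the prefix and delete them in batches of
# up to 1,000 keys (the maximum accepted by delete_objects).
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if objects:
        s3.delete_objects(Bucket=BUCKET, Delete={"Objects": objects})
```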
