Introduction

Big data pushed the boundaries in 2016. It pushed the boundaries of tools, applications, and skill sets. And it did so because it’s bigger, faster, more prevalent, and more prized than ever.

According to O’Reilly’s 2016 Data Science Salary Survey, the top tools used for data science continue to be SQL, Excel, R, and Python. A common theme in recent tool-related blog posts on oreilly.com is the need for powerful storage and compute tools that can process high-volume, often streaming, data. For example, Federico Castanedo’s blog post “Scalable Data Science with R” describes how scaling R using distributed frameworks—such as RHadoop and SparkR—can help solve the problem of storing massive data sets in RAM.

Focusing on storage, more organizations are looking to migrate their data, and storage and compute operations, from warehouses on proprietary software to managed services in the cloud. There is, and will continue to be, a lot to talk about on this topic: building a data pipeline in the cloud, security and governance of data in the cloud, cluster-monitoring and tuning to optimize resources, and of course, the three providers that dominate this area—namely, Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.

In terms of techniques, machine learning and deep learning continue to generate buzz in the industry. The algorithms behind natural language processing and image recognition, for example, are incredibly complex, and their utility in the enterprise hasn’t been fully realized. Until recently, machine learning and deep learning have been largely confined to the realm of research and academia. We’re now seeing a surge of interest in organizations looking to apply these techniques to their business use cases to achieve automated, actionable insights. Evangelos Simoudis discusses this in his O’Reilly blog post “Insightful applications: The next inflection in big data.” Accelerating this trend are open source tools, such as TensorFlow from the Google Brain Team, which put machine learning into the hands of any person or entity who wishes to learn about it.

We continue to see smartphones, sensors, online banking sites, cars, and even toys generating more data, of varied structure. O’Reilly’s Big Data Market report found that a surprisingly high percentage of organizations’ big data budgets are spent on Internet-of-Things-related initiatives. More tools for fast, intelligent processing of real-time data are emerging (Apache Kudu and FiloDB, for example), and organizations across industries are looking to architect robust pipelines for real-time data processing. Which components will allow them to efficiently store and analyze the rapid-fire data? Who will build and manage this technology stack? And, once it is constructed, who will communicate the insights to upper management? These questions highlight another interesting trend we’re seeing—the need for cross-pollination of skills among technical and nontechnical folks. Engineers are seeking the analytical and communication skills so common in data scientists and business analysts, and data scientists and business analysts are seeking the hard-core technical skills possessed by engineers, programmers, and the like.

Data science continues to be a hot field and continues to attract a range of people—from IT specialists and programmers to business school graduates—looking to rebrand themselves as data science professionals. In this context, we’re seeing tools push the boundaries of accessibility, applications push the boundaries of industry, and professionals push the boundaries of their skill sets. In short, data science shows no sign of losing momentum.

In Big Data Now: 2016 Edition, we present a collection of some of the top blog posts written for oreilly.com in the past year, organized around six key themes:

  • Careers in data

  • Tools and architecture for big data

  • Intelligent real-time applications

  • Cloud infrastructure

  • Machine learning: models and training

  • Deep learning and AI

Let’s dive in!
