Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 12. Scalable Frameworks

The advent of social networking, interactive media, and deep analysis has caused the amount of data processed daily to skyrocket. For data scientists, it's no longer just a matter of finding the most appropriate and accurate algorithm to mine data; it is also about leveraging multi-core CPU architectures and distributed computing frameworks to solve problems in a timely fashion. After all, how valuable is a data mining application if the model does not scale?

There are many options available to Scala developers to build classification and regression applications for very large datasets. This chapter covers the Scala parallel collections, Actor model, Akka framework, and Apache Spark in-memory clusters. The following topics are covered in this chapter:

An introduction to Scala parallel collections
Evaluation of performance of a parallel collection on multi-core CPUs
The actor model and reactive systems
Clustered and reliable distributed computing using Akka
A design of the computational workflow using Akka routers
An introduction to Apache Spark clustering and its design principles
Using Spark MLlib for clustering
Relative performance tuning and evaluation of Spark
Benefits and limitations of the Apache Spark framework

An overview

The support for distributing and concurrent processing is provided by different stacked frameworks and libraries. Scala concurrent and parallel collections' classes leverage the threading capabilities of the Java virtual machine. Akka.io implements a reliable action model originally introduced as part of the Scala standard library. The Akka framework supports remote actors, routing, load balancing protocols, dispatchers, clusters, events, and configurable mailbox management. This framework also provides support for different transport modes, supervisory strategies, and typed actors. Apache Spark's resilient distributed datasets with advanced serialization, caching, and partitioning capabilities leverage Scala and Akka libraries.

The following stack representation illustrates the interdependencies between frameworks:

The Stack representation of scalable frameworks using Scala

Each layer adds a new functionality to the previous one to increase scalability. The Java virtual machine runs as a process within a single host. Scala concurrent classes support effective deployment of an application by leveraging multicore CPU capabilities without the need to write multithreaded applications. Akka extends the Actor paradigm to clusters with advanced messaging and routing options. Finally, Apache Spark leverages Scala higher-order collection methods and the Akka implementation of the Actor model to provide large-scale data processing systems with better performance and reliability, through its resilient distributed datasets and in-memory persistency.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 12. Scalable Frameworks

Create new playlist

Sign In

Sign Up

Chapter 12. Scalable Frameworks

An overview

Table of Contents for
12. Scalable Frameworks