Index
A
- accumulators, Accumulators
- AccumulatorV2 interface, Accumulators
- actions, Lazy Evaluation
- add function, Accumulators
- agg API, Aggregates and groupBy, Grouped Operations on Datasets
- aggregateByKey function, Speeding up joins by assigning a known partitioner
- aggregateColumnFrequencies function, Sort and count values on each partition
- aggregations
- aggregates and groupBy, Aggregates and groupBy
- choosing aggregation operation for key/value data, Choosing an Aggregation Operation-Multiple RDD Operations
- computing aggregates over a window, Windowing
- extending Spark SQL with user-defined aggregate functions, Extending with User-Defined Functions and Aggregate Functions (UDFs, UDAFs)
- on each partition in Goldilocks final example, Aggregate to ((cell value, column index), count) on each partition
- on grouped data in Datasets, Grouped Operations on Datasets
- optimizing, using array as aggregation object, Using Smaller Data Structures
- reducing number of records by key, Why GroupByKey fails
- reusing existing objects in, Reusing Existing Objects
- speeding up wide transformations in, Working with Key/Value Data
- Structured Streaming aggregates for Naive Bayes, Machine learning with Structured Streaming
- with bad implicit conversions (example), Using Smaller Data Structures
- alias operator, Simple DataFrame transformations and SQL expressions
- Anaconda, using to add packages on CDH clusters, PySpark dependency management
- Apache Bahir project, Sources and Sinks
- Apache Parquet (see Parquet files)
- Apache Toree, How Eclair JS Works
- Append (save mode), Save Modes
- applications (Spark), The Spark Application
- ArrayBuffer, using a map or flatMap instead of, An Example
- arrays
- as operator, Simple DataFrame transformations and SQL expressions, Interoperability with RDDs, DataFrames, and Local Collections
B
- Bahir project, Sources and Sinks
- batch intervals, Batch Intervals
- batch predictions, Predicting
- batch serialization, Python debugging
- big data ecosystem, Spark's place in, How Spark Fits into the Big Data Ecosystem
- Binarizer pipeline stage, explain params, Explain Params
- broadcast hash joins, Speeding up joins using a broadcast hash join
- broadcast variables, Broadcast Variables
- broadcasting training models, Preparing textual data
- builds, adding Spark SQL and Hive components to regular sbt build, Spark SQL Dependencies
C
- C#, using with Spark, Spark on the Common Language Runtime (CLR)—C# and Friends
- C/C++
- cache function, Persist and cache
- caching
- case classes, Basics of Schemas
- Catalyst query optimizer, Datasets, Query Optimizer-Debugging Spark SQL Queries
- CDH clusters, adding packages with Anaconda, PySpark dependency management
- checkpoint function, Checkpointing
- checkpointing
- ChiSqSelector, Feature Scaling and Selection
- class tags
- classification algorithms
- classification model, training in MLlib, MLlib Model Training
- Classifier class, Custom estimators
- Clojure, Beyond Scala within the JVM
- cloneComplement function, Sampling
- cluster managers, How Spark Fits into the Big Data Ecosystem
- clustering algorithms
- clusters
- co-located RDDs, Leveraging Co-Located and Co-Partitioned RDDs
- co-partitioned RDDs, Leveraging Co-Located and Co-Partitioned RDDs
- coalesce function, Wide Versus Narrow Dependencies, Partitioners and Key/Value Data
- code examples from this book
- code generation, by query optimizer, Code Generation
- cogroup function, Choosing a Join Type, Co-Grouping, Leveraging Co-Located and Co-Partitioned RDDs
- CoGroupedRDD, Co-Grouping, Leveraging Co-Located and Co-Partitioned RDDs
- collect action, Functions on RDDs: Transformations Versus Actions
- collectAsMap action, Functions on RDDs: Transformations Versus Actions, Goldilocks Version 1: groupByKey Solution
- collections
- colocated joins, Core Spark Joins
- column operators (Spark SQL), Simple DataFrame transformations and SQL expressions
- combineByKey function, Choosing a Join Type, Choosing an Aggregation Operation
- Common Language Runtime (CLR), Spark on the Common Language Runtime (CLR)—C# and Friends
- components and packages, Spark Components and Packages-Conclusion
- configuration settings, Spark Tuning and Cluster Sizing-Spark settings conclusion
- console sink (blocking), Stream status and debugging
- copartition joins, Core Spark Joins
- copy function, Accumulators
- count function, Functions on RDDs: Transformations Versus Actions
- countByKey function, Actions on Key/Value Pairs
- countByKeyApprox function, Partial manual broadcast hash join
- countByValue function, Actions on Key/Value Pairs
- counters, verifying performance with, Spark Counters for Verifying Performance
- CSV (comma-separated values), Using Pipe and Friends
- CUDA, Going Beyond Scala
D
- DAGs (directed acyclic graphs), Lazy Evaluation
- DAG Scheduler for Spark jobs, The DAG
- data cleaning (ML library), Data Cleaning
- data encoding (ML library), Data Encoding-Data Cleaning
- data formats (see formats for reading/writing data)
- data loading and saving operations, Persist and cache
- data property accumulators, Accumulators
- Data Source API, Data Loading and Saving Functions
- data sources
- data structures, smaller, using to enhance performance, Using Smaller Data Structures
- DataFrameReader, DataFrameWriter and DataFrameReader
- DataFrames, DataFrames, Datasets, and Spark SQL
- computing difference between, Computing RDD Difference
- converting to/from Datasets, Interoperability with RDDs, DataFrames, and Local Collections
- converting to/from RDDs, RDDs
- creating from JDBC data sources, JDBC
- creating from local collections, Local collections
- data representation in, Data Representation in DataFrames and Datasets-Tungsten
- DataFrame API, DataFrame API-Data Representation in DataFrames and Datasets
- Goldilocks data (example), Goldilocks Version 0: Iterative Solution
- inspecting the schema, Basics of Schemas
- joins, DataFrame Joins
- PySpark, PySpark DataFrames and Datasets
- RDD transformations, Effective Transformations
- RDDs versus, DataFrames, Datasets, and Spark SQL
- registering/saving as Hive tables to perform SQL queries against, Plain Old SQL Queries and Interacting with Hive Data
- sample and randomSplit functions, Sampling
- Structured Streaming based on, Stream Processing with Spark
- testing, Testing DataFrames
- working with as RDDs, loss of type information, What Type of RDD Does Your Transformation Return?
- working with C#, Spark on the Common Language Runtime (CLR)—C# and Friends
- working with R language, How SparkR Works
- DataFrameWriter, DataFrameWriter and DataFrameReader
- Dataset API, Datasets
- (see also Datasets)
- up-to-date documentation on, Datasets
- Datasets, DataFrames, Datasets, and Spark SQL, Datasets-Extending with User-Defined Functions and Aggregate Functions (UDFs, UDAFs)
- compile-time strong typing, Compile-Time Strong Typing
- converting to RDDs, RDDs
- data representation in, Data Representation in DataFrames and Datasets
- easier functional transformations, Easier Functional (RDD “like”) Transformations
- grouping operations on, Grouped Operations on Datasets
- interoperability with RDDs, DataFrames, and local collections, Interoperability with RDDs, DataFrames, and Local Collections
- joins, Dataset Joins
- multi-Dataset relational transformations, Multi-Dataset Relational Transformations
- PySpark, PySpark DataFrames and Datasets
- RDD transformations, Effective Transformations
- relational transformations, Relational Transformations
- streaming aggregations on, Data Checkpoint Intervals
- use in Structured Streaming, Considerations for Structured Streaming
- versus RDDs, DataFrames, Datasets, and Spark SQL
- DataStreamWriter, Output operations
- debugging
- defaultCopy function, Custom transformers
- dense vectors, creating, Working with Spark vectors
- dependencies
- dependencies function, Immutability and the RDD Interface
- describe function, Aggregates and groupBy
- deserialized (storage level), Persist and cache
- directed acyclic graphs (DAGs), Lazy Evaluation
- DAG Scheduler for Spark jobs, The DAG
- disk space errors, Shuffle files, Out of Disk Space Errors
- distinct function, Choosing a Join Type, Set Operations
- distinct, reducing to on each partition (Goldilocks example), Goldilocks Version 4: Reduce to Distinct on Each Partition
- Docker-based Spark integration environments, Docker-based
- driver, Spark Job Scheduling, The Spark Application
- dropDuplicates function, Beyond row-by-row transformations
- DryadLINQ, How Spark Works
- DStreams, Stream Processing with Spark
- dynamic resource allocation, Resource Allocation Across Applications, Basic Spark Core Settings: How Many Resources to Allocate to the Spark Application?
E
- Eclair JS, Going Beyond Scala
- enableHiveSupport function, Getting Started with the SparkSession (or HiveContext or SQLContext)
- equality tests, Simple DataFrame transformations and SQL expressions
- equals function, Custom Partitioning
- ErrorIfExists (save mode), Save Modes
- Estimator interface, Extending Spark ML Pipelines with Your Own Algorithms, Custom estimators
- estimators, Working with Spark ML, Pipeline Stages
- evaluation, machine learning models
- Evaluator class, Automated model selection (parameter search)
- executors, The Spark Application
- explain params (ML pipeline stages), Explain Params
- explicit conversions
- explode function, Simple DataFrame transformations and SQL expressions
F
- fair scheduler, Default Spark Scheduler, Noisy Cluster Considerations
- fake class tags, Beyond Scala within the JVM
- fault tolerance
- feature selection and scaling
- feature transformers (ML library), Spark ML Organization and Imports
- features
- FIFO scheduler, Default Spark Scheduler, Noisy Cluster Considerations
- file sources in Spark Streaming, Sources and Sinks
- filter function, To Be a Spark Expert You Have to Learn a Little Scala Anyway
- filter pushdown in Spark SQL, Debugging Spark SQL Queries
- filterByRange function, Dictionary of OrderedRDDOperations
- filtering
- fit function, Explain Params, Training a Pipeline
- Flambo, Beyond Scala within the JVM
- flatMap function, To Be a Spark Expert You Have to Learn a Little Scala Anyway, Goldilocks Version 0: Iterative Solution, Map to (cell value, column index) pairs
- flatMapValues function, Dictionary of Mapping and Partitioning Functions PairRDDFunctions
- Flume, Sources and Sinks
- fold function, To Be a Spark Expert You Have to Learn a Little Scala Anyway
- fold operations, object reuse with, Reusing Existing Objects
- foldByKey function
- foreach function, Functions on RDDs: Transformations Versus Actions
- foreachPartition function, Reducing Setup Overhead
- foreachRDD function, Output operations
- formats for reading/writing data, Formats-Additional formats
- FORTRAN, Underneath Everything Is FORTRAN
- interacting with from Spark, using JNI, JNI
- fromML function, Spark ML Organization and Imports
- full outer joins, Choosing a Join Type, DataFrame Joins
- functions
G
- garbage collection (GC)
- getOrCreate function, Getting Started with the SparkSession (or HiveContext or SQLContext)
- getPartition function, Custom Partitioning
- GLM (generalized linear model), persisting, General Serving Considerations
- Goldilocks example, The Goldilocks Example-Actions on Key/Value Pairs
- review of all solutions, Goldilocks postmortem
- using PairRDDFunctions and OrderedRDDFunctions, How to Use PairRDDFunctions and OrderedRDDFunctions
- Version 0, iterative solution, Goldilocks Version 0: Iterative Solution
- Version 1, groupByKey solution, Goldilocks Version 1: groupByKey Solution
- Version 2, using secondary sort, Goldilocks Version 2: Secondary Sort-Performance
- Version 3, A Different Approach to Goldilocks-Goldilocks Version 3: Sort on Cell Values
- Version 4, Back to Goldilocks (Again)-Sort and find rank statistics
- GPUEnabler package, Getting to the GPU
- GPUs (graphics processing units), Getting to the GPU
- GraphX, Spark Components, GraphX
- groupBy function
- groupByKey function, Wide Versus Narrow Dependencies
- groupByKeyAndSortValues function, Leveraging repartitionAndSortWithinPartitions for a Group by Key and Sort Values Function
- GroupedDataset object, Grouped Operations on Datasets
- GroupedRDDFunctions class, Types of RDDs
- grouping operations on Datasets, Grouped Operations on Datasets
- groupSorted function, Goldilocks Version 2: Secondary Sort
I
- if/else in Spark SQL, Simple DataFrame transformations and SQL expressions
- Ignore (save mode), Save Modes
- immutability of RDDs, Immutability and the RDD Interface
- implicit conversions
- in-memory persistence, In-Memory Persistence and Memory Management, Deciding if Recompute Is Inexpensive Enough
- IndexToString, Data Encoding
- inner joins, DataFrame Joins
- integration testing, Integration Testing-Verifying Performance
- intercepts, including in training a simple MLlib classification model, MLlib Model Training
- intermediate object creation, Using Smaller Data Structures
- intersection function, Set Operations
- IPython, How PySpark Works
- isZero function, Accumulators
- Iterable objects, Co-Grouping
- iterative algorithms, large query plans and, Large Query Plans and Iterative Algorithms
- iterative computations, reusing RDDs in, Iterative computations
- iterative solution (Goldilocks example), Goldilocks Version 0: Iterative Solution
- iterator function, Immutability and the RDD Interface
- iterator-to-iterator transformations with mapPartitions, Iterator-to-Iterator Transformations with mapPartitions-Set Operations, Reducing Setup Overhead, A Different Approach to Goldilocks
- iterators
J
- Janino, Code Generation
- JARs (Java Archives)
- Java, Spark Components
- accessing backing Java objects in PySpark, Accessing the backing Java objects and mixing Scala code
- Iterable versus Iterator objects, Space and Time Advantages
- iterator implementation, java.util.Iterator, What Is an Iterator-to-Iterator Transformation?
- object serialization, Tungsten versus, Tungsten
- RDDs composed of Java objects, converting to DataFrames, RDDs
- Scala API versus Java API, The Spark Scala API Is Easier to Use Than the Java API
- simple Java JNI, JNI
- System.loadLibrary function, JNI
- writing Spark code in, Beyond Scala within the JVM
- Java Native Access (JNA), Java Native Access (JNA)
- Java Native Interface (JNI), Going Beyond Scala, JNI-Java Native Access (JNA)
- java.util.Properties object, JDBC
- JavaBeans, RDDs composed of, converting to DataFrames, RDDs
- JavaConverters object, Beyond Scala within the JVM
- JavaDoubleRDD, Beyond Scala within the JVM
- javah command, JNI
- JavaPairRDD, Beyond Scala within the JVM
- JavaRDD class, Types of RDDs
- JavaScript, Eclair JS, Going Beyond Scala, How Eclair JS Works
- JBLAS library, JNI
- JDBC
- JdbcDialect, JDBC
- JDWP (Java Debug Wire Protocol), Attaching debuggers
- JNI (see Java Native Interface)
- jobs
- join function, Wide Versus Narrow Dependencies
- joins, Joins (SQL and Core)-Conclusion, Implications for Performance
- JPMML evaluator project, PMML, General Serving Considerations
- JSON, Using Pipe and Friends
- Julia (Spark.jl), Spark.jl (Julia Spark)
- Jupyter notebook, Debugging in notebooks
- JVMs (Java Virtual Machines), The Spark Application
- Jython, PySpark and, PySpark DataFrames and Datasets
K
- Kafka, Sources and Sinks
- key/value data, working with, Working with Key/Value Data-Conclusion
- actions on key/value pairs, Actions on Key/Value Pairs
- choosing an aggregation operation, Choosing an Aggregation Operation-Multiple RDD Operations
- dangers of groupByKey function, What’s So Dangerous About the groupByKey Function
- Goldilocks example, The Goldilocks Example-Actions on Key/Value Pairs
- Goldilocks example, Version 3, A Different Approach to Goldilocks-Goldilocks Version 3: Sort on Cell Values
- groupByKey solution to Goldilocks example, Goldilocks Version 1: groupByKey Solution
- multiple RDD operations (co-grouping), Multiple RDD Operations
- OrderedRDDFunctions class, dictionary of operations, Dictionary of OrderedRDDOperations
- partitioners, Partitioners and Key/Value Data-Dictionary of Mapping and Partitioning Functions PairRDDFunctions
- performance issues, Working with Key/Value Data
- repartitioning keyed data, Repartitioning
- secondary sort and repartitionAndSortWithinPartitions, Secondary Sort and repartitionAndSortWithinPartitions-Performance
- straggler detection and unbalanced data, Straggler Detection and Unbalanced Data-Goldilocks postmortem
- keys function, Dictionary of Mapping and Partitioning Functions PairRDDFunctions
- Kinesis, Sources and Sinks
- KMeans model, Model Evaluation
- kontextfrei library, Mocking RDDs
- Kryo serialization, Data Representation in DataFrames and Datasets, Kryo
L
- LabeledPoint class, Getting Started with MLlib (Organization and Imports)
- labels
- lambdas
- lazy evaluation, Lazy Evaluation
- left anti joins, DataFrame Joins
- left outer joins, Choosing a Join Type, DataFrame Joins
- left semi joins, DataFrame Joins
- libraries, Spark Components
- limiting results, using sorting in Spark SQL, Sorting
- linear algebra package, Spark ML Organization and Imports
- linear models, persisting, General Serving Considerations
- Local Checkpointing option, Checkpointing example
- local mode, How Spark Fits into the Big Data Ecosystem
- LocalRelation, Local collections
- log4j, Configuring logging
- logging, Out of Disk Space Errors-Accessing logs
- logical plan (query optimizer), Logical and Physical Plans
- LogisticRegressionModel, Model Evaluation
- lookup function, Actions on Key/Value Pairs
- LRU caching, In-Memory Persistence and Memory Management, LRU Caching
M
- machine learning, Spark MLlib and ML-Conclusion
- choosing between Spark MLlib and Spark ML, Choosing Between Spark MLlib and Spark ML
- ML and MLlib packages, Spark Components
- modifying an existing algorithm, Custom estimators
- serving considerations in MLlib and ML library, General Serving Considerations
- with Structured Streaming, Machine learning with Structured Streaming
- working with ML library, Working with Spark ML-General Serving Considerations
- accessing individual pipeline stages, Accessing Individual Stages
- building a pipeline, Putting It All Together in a Pipeline
- data cleaning, Data Cleaning
- data encoding, Data Encoding-Data Cleaning
- data persistence, Data Persistence and Spark ML-Extending Spark ML Pipelines with Your Own Algorithms
- extending ML pipelines with your own algorithms, Extending Spark ML Pipelines with Your Own Algorithms-Conclusion
- getting started, organization and imports, Spark ML Organization and Imports
- models, Spark ML Models
- pipeline stages, Pipeline Stages
- training a pipeline, Training a Pipeline
- working with MLlib, Working with MLlib-Model Evaluation
- map function, To Be a Spark Expert You Have to Learn a Little Scala Anyway, Wide Versus Narrow Dependencies
- map-side combinations, aggregation operations, Preventing out-of-memory errors with aggregation operations
- mapGroups function, Grouped Operations on Datasets
- mapPartitions function, Reducing Setup Overhead, Dictionary of Mapping and Partitioning Functions PairRDDFunctions, Leveraging repartitionAndSortWithinPartitions for a Group by Key and Sort Values Function
- mappedRDD, Implications for Fault Tolerance
- MapReduce
- mapValues function, Dictionary of Mapping and Partitioning Functions PairRDDFunctions
- Maven build manager, Spark Components, PySpark dependency management
- memory errors, Why GroupByKey fails
- memory management
- MemoryStreams, Sources and Sinks
- MEMORY_AND_DISK_2 storage option, Noisy Cluster Considerations
- MEMORY_ONLY storage level, Persist and cache
- MEMORY_ONLY_SER storage option, Persist and cache
- merge function, Accumulators
- meta-algorithms, ML library, Spark ML Organization and Imports
- metadata
- MinMaxScaler (Spark ML), Data Cleaning
- missing data, working with on DataFrames, Specialized DataFrame transformations for missing and noisy data
- ML library, Spark Components, Spark MLlib and ML
- (see also machine learning)
- MLeap project, support for PMML model export, Model and Pipeline Persistence and Serving with Spark ML
- MLlib, Spark Components, Spark MLlib and ML
- MLUtils object, kfold function, Model Evaluation
- Mobius, Spark on the Common Language Runtime (CLR)—C# and Friends
- model training
- models (machine learning)
- multi-DataFrame transformations, Multi-DataFrame Transformations
- multiple actions on the same RDD, Multiple actions on the same RDD
- MutableAggregationBuffer, Extending with User-Defined Functions and Aggregate Functions (UDFs, UDAFs)
- MySQL, including JDBC JAR in Spark Shell, JDBC
N
- na function, Specialized DataFrame transformations for missing and noisy data
- Naive Bayes algorithm
- NaN values, isNaN function on DataFrames, Specialized DataFrame transformations for missing and noisy data
- narrow dependencies, Immutability and the RDD Interface
- native loader decorator, JNI
- NewHadoopRDD class, Immutability and the RDD Interface
- noisy clusters, Noisy Cluster Considerations
- Normalizer, using in Spark ML, Data Cleaning, Putting It All Together in a Pipeline
- notebooks, debugging in, Debugging in notebooks
- null values, isNull function on DataFrames, Specialized DataFrame transformations for missing and noisy data
- numeric functions
- numPartitions function, Custom Partitioning
P
- packages, Spark Components and Packages
- PairRDDFunctions class, Types of RDDs, Working with Key/Value Data
- PairwiseRDD, PySpark RDDs
- parallel-ssh, installing packages via, PySpark dependency management
- parallelism value (SparkConf), Hash Partitioning
- parameters (pipeline stages in ML), Pipeline Stages
- Parquet files, Parquet, Data sources
- partitionBy function, Partitions (Discovery and Writing), Partitioners and Key/Value Data, Dictionary of Mapping and Partitioning Functions PairRDDFunctions
- partitioner function, Immutability and the RDD Interface
- partitioners
- partitions, Spark Model of Parallel Computing: RDDs
- partitions function, Immutability and the RDD Interface
- PartitionwiseSampledRDD, Sampling
- PCA (principal component analysis)
- performance
- considerations with aggregation operations, Dictionary of Aggregation Operations with Performance Considerations
- considerations with joins, Joins (SQL and Core)
- Goldilocks example, review of solutions, Goldilocks postmortem
- issues with key/value operations, Working with Key/Value Data
- narrow versus wide transformations, Implications for Performance
- PySpark DataFrames and Datasets versus RDDs, PySpark DataFrames and Datasets
- RDDs versus DataFrames, DataFrames, Datasets, and Spark SQL
- transformations, methods for improving, A Different Approach to Goldilocks
- user-defined functions and, Extending with User-Defined Functions and Aggregate Functions (UDFs, UDAFs)
- verifying, Verifying Performance-Projects for Verifying Performance
- Perl script, calling from pipe interface, Using Pipe and Friends
- persist function, In-Memory Persistence and Memory Management, Iterative computations
- persistence
- persistencePriority function, In-Memory Persistence and Memory Management
- physical plan (query optimizer), Logical and Physical Plans
- pip installations, PySpark dependency management
- pipe interface, calling other languages from Spark, Using Pipe and Friends
- PipedRDD interface, PySpark RDDs
- pipelines (Spark ML)
- accessing individual stages, Accessing Individual Stages
- building a pipeline, Putting It All Together in a Pipeline
- extending with your own algorithms, Extending Spark ML Pipelines with Your Own Algorithms-Conclusion
- parameters, setting for pipeline stages, Pipeline Stages
- persistence in pipeline stages, Data Persistence and Spark ML
- Pipeline object, Putting It All Together in a Pipeline
- pipeline stages, Pipeline Stages
- support for Structured Streaming, Machine learning with Structured Streaming
- training a pipeline, Training a Pipeline
- transformers and estimators, Pipeline Stages
- PipelineStage interface, Extending Spark ML Pipelines with Your Own Algorithms
- PMML (Predictive Model Markup Language) models, Serving and Persistence
- PMMLExportable trait, PMML
- predictions
- Predictor class, Custom estimators
- preferredLocations function, Immutability and the RDD Interface
- printSchema function, Basics of Schemas
- programming languages, options with Spark, Going Beyond Scala-Conclusion
- properties (RDDs), Immutability and the RDD Interface
- property checking, using ScalaCheck, Property Checking with ScalaCheck-Computing RDD Difference
- pseudorandom number generators, creating, Reducing Setup Overhead
- Py4J, How PySpark Works, Accessing the backing Java objects and mixing Scala code
- Python, Spark Components
- IPython, How PySpark Works
- PySpark, Beyond Scala, and Beyond the JVM-Installing PySpark
- round-tripping through RDDs to cut query plans, Large Query Plans and Iterative Algorithms
- Scala performance versus, Scala Is More Performant Than Python
- Spark ML parameter documentation, Pipeline Stages
- Spark packages, Using Community Packages and Libraries
- user-defined function performance penalty, avoiding, Extending with User-Defined Functions and Aggregate Functions (UDFs, UDAFs)
R
- R language, Spark Components
- RandomDataGenerator, Generating Large Datasets
- RandomRDDs, Generating Large Datasets
- RandomSampler trait, Sampling
- randomSplit function, Sampling
- range partitioning, Range Partitioning
- rank statistics, The Goldilocks Example
- (see also Goldilocks example)
- RDD class, Types of RDDs
- rdd function, Beyond Scala within the JVM
- RDDs (resilient distributed datasets), To Be a Spark Expert You Have to Learn a Little Scala Anyway, Spark Components, Spark Model of Parallel Computing: RDDs-Wide Versus Narrow Dependencies
- changing partitioning of, Partitioners and Key/Value Data
- computing difference between, Computing RDD Difference
- converting between Scala and Java, Beyond Scala within the JVM
- converting to data formats for use over pipe interface, Using Pipe and Friends
- converting to/from Datasets, Interoperability with RDDs, DataFrames, and Local Collections
- data storage space, DataFrame versus, Tungsten
- DataFrames and Datasets versus, DataFrames, Datasets, and Spark SQL
- DStreams, Stream Processing with Spark
- functions on, transformations versus actions, Functions on RDDs: Transformations Versus Actions
- immutability and the RDD interface, Immutability and the RDD Interface
- in-memory persistence and memory management, In-Memory Persistence and Memory Management
- joins, Core Spark Joins-Partial manual broadcast hash join
- lazy evaluation of, Lazy Evaluation
- mock RDDs for use in testing, Mocking RDDs
- operations with multiple RDDs and key/value data, Multiple RDD Operations
- performance, DataFrames versus, DataFrames, Datasets, and Spark SQL
- PySpark, PySpark RDDs
- reading and writing in Spark SQL, RDDs
- returned by transformations, types of, What Type of RDD Does Your Transformation Return?
- reusing, Reusing RDDs-Interaction with Accumulators
- round-tripping through to cut query plans, Large Query Plans and Iterative Algorithms
- sampling, Sampling
- testing transformations, Regular Spark jobs (testing with RDDs)
- transformations, Effective Transformations
- (see also transformations)
- types of, Types of RDDs
- wide versus narrow dependencies, Wide Versus Narrow Dependencies
- readStream function, Data sources
- receivers, Receivers
- recommendation algorithms, ML library, Spark ML Organization and Imports, Spark ML Models
- recomputing RDDs, deciding if it is inexpensive enough, Deciding if Recompute Is Inexpensive Enough
- record type information in RDDs, What Type of RDD Does Your Transformation Return?
- reduce function, To Be a Spark Expert You Have to Learn a Little Scala Anyway, Functions on RDDs: Transformations Versus Actions
- reduceByKey function, Wide Versus Narrow Dependencies, Speeding up joins by assigning a known partitioner, Partial manual broadcast hash join
- reduceByKeyAndWindow function, Considerations for DStreams
- registerJavaFunction, Accessing the backing Java objects and mixing Scala code
- regression algorithms
- relational transformations (Datasets), Relational Transformations
- repartition function, Wide Versus Narrow Dependencies, The Special Case of coalesce, Partitioners and Key/Value Data
- repartitionAndSortWithinPartitions function, Dictionary of OrderedRDDOperations
- replication (storage level), Persist and cache
- reset function, Accumulators
- resetAndCopy function, Accumulators
- resource allocation, Resource Allocation Across Applications, Basic Spark Core Settings: How Many Resources to Allocate to the Spark Application?-Number and Size of Partitions
- right outer joins, Choosing a Join Type, DataFrame Joins
- row equality, checking DataFrames for, Testing DataFrames
- Row objects
S
- sample function, Functions on RDDs: Transformations Versus Actions, Sampling
- sampleByKey function, Dictionary of Mapping and Partitioning Functions PairRDDFunctions, Sampling
- sampleByKeyExact function, Sampling
- sampling, Sampling
- save modes (Spark SQL), Save Modes
- Saveable trait, Saveable (internal format)
- saveAsObjectFile function, Functions on RDDs: Transformations Versus Actions
- saveAsSequenceFile function, Functions on RDDs: Transformations Versus Actions
- saveAsTextFile function, Functions on RDDs: Transformations Versus Actions
- SBT
- sbt-spark-package plug-in, Managing Spark Dependencies, Creating a Spark Package
- Scala, Why Scala?, Spark Components
- advantages for Spark development, To Be a Spark Expert You Have to Learn a Little Scala Anyway
- flatMap operation on iterators and collections, Goldilocks Version 0: Iterative Solution
- learning, resources for, Learning Scala
- quasiquotes, Code Generation
- RDDs in, converting to/from Java, Beyond Scala within the JVM
- reasons not to use for Spark development, Why Not Scala?
- simple Scala JNI, JNI
- Spark SQL Scala operators, Simple DataFrame transformations and SQL expressions
- type parameters, syntax of, What Type of RDD Does Your Transformation Return?
- ScalaCheck, property checking with, Property Checking with ScalaCheck-Computing RDD Difference
- scaling features in MLlib, Feature Scaling and Selection
- schema function, Basics of Schemas
- schemas
- adding schema information to data converted from RDDs to DataFrames, RDDs
- additional schema information in Datasets and DataFrames, DataFrames, Datasets, and Spark SQL
- DataFrames, working with as RDDs, What Type of RDD Does Your Transformation Return?
- inferring the schema from JSON data, Avoiding Hive JARs, JSON
- sampling schema inference, streaming and, Data sources
- Spark SQL, basics of, Basics of Schemas-DataFrame API
- specifying schema for local collection conversion to DataFrame, Local collections
- secondary sort and repartitionAndSortWithinPartitions, Secondary Sort and repartitionAndSortWithinPartitions-Performance
- select operator, Simple DataFrame transformations and SQL expressions
- selecting features in MLlib, Feature Scaling and Selection
- self joins, Self joins
- serialization, Persist and cache
- serialization/deserialization
- set-like operations
- setParameterName function, Explain Params
- settings (see configuration settings)
- setup overhead, reducing, Reducing Setup Overhead-Accumulators
- shared variables, Shared Variables
- shuffle files, Implications for Performance
- shuffle joins, Core Spark Joins
- shuffled hash joins, Choosing an Execution Plan
- ShuffleDependency object, Immutability and the RDD Interface, Wide Versus Narrow Dependencies
- ShuffledRDD class, Types of RDDs
- shuffles, Wide Versus Narrow Dependencies, Narrow Versus Wide Transformations
- sinks, custom, for Structured Streaming, Custom sinks
- slideDuration, Considerations for DStreams
- sort function, Wide Versus Narrow Dependencies
- sortBy function, Goldilocks Version 0: Iterative Solution
- sortByKey function, Dictionary of OrderedRDDOperations, Sort and count values on each partition
- sorting
- sources (see data sources)
- Spark
- about, What Is Spark and Why Performance Matters
- components, Spark Components
- design principles, How Spark Works
- in big data ecosystem, How Spark Fits into the Big Data Ecosystem
- job scheduling, Spark Job Scheduling-The Anatomy of a Spark Job
- jobs, anatomy of, The Anatomy of a Spark Job-Tasks
- libraries, Spark Components
- model of parallel computing, RDDs, Spark Model of Parallel Computing: RDDs-Wide Versus Narrow Dependencies
- performance, importance of, What Is Spark and Why Performance Matters
- Scala and, Why Scala?
- versions, Spark Versions
- Spark Core, Spark Components
- Spark Jobserver, Projects for Verifying Performance
- Spark Packages, PySpark dependency management
- Spark SQL, Spark Components
- components being built on top of, Spark Components and Packages
- data loading and saving functions, Data Loading and Saving Functions
- data representation in DataFrames and Datasets, Data Representation in DataFrames and Datasets
- DataFrame API, DataFrame API-Data Representation in DataFrames and Datasets
- DataFrames and Datasets, DataFrames, Datasets, and Spark SQL
- Datasets, Datasets-Extending with User-Defined Functions and Aggregate Functions (UDFs, UDAFs)
- debugging queries, Debugging Spark SQL Queries
- dependencies, Spark SQL Dependencies-Avoiding Hive JARs
- extending with user-defined functions and user-defined aggregate functions, Extending with User-Defined Functions and Aggregate Functions (UDFs, UDAFs)
- getting started with SparkSession, Getting Started with the SparkSession (or HiveContext or SQLContext)
- JDBC/ODBC server, JDBC/ODBC Server
- joins, Spark SQL Joins-Dataset Joins
- performance in Python, PySpark DataFrames and Datasets
- query optimizer, Query Optimizer-Debugging Spark SQL Queries
- Scala and Java interoperability, Beyond Scala within the JVM
- schemas, Basics of Schemas-DataFrame API
- SQLContext and HiveContext entry points, Getting Started with the SparkSession (or HiveContext or SQLContext)
- windowing, Windowing
- Spark Streaming, Spark Components
- spark-perf package, Projects for Verifying Performance
- spark-sql-perf project, Generating Large Datasets
- spark-validator project, Job Validation
- Spark.jl (Julia Spark), Spark.jl (Julia Spark)
- SparkConf object, The Spark Application
- SparkContext, Immutability and the RDD Interface
- sparkling, Beyond Scala within the JVM
- SparkListener, Spark Counters for Verifying Performance
- Sparklyr library, How SparkR Works
- SparkR, How SparkR Works-Spark.jl (Julia Spark)
- SparkSession, Immutability and the RDD Interface
- SparkVector, Getting Started with MLlib (Organization and Imports), Predicting
- sparse vectors, creating, Working with Spark vectors
- SQL
- SQLContext, Getting Started with the SparkSession (or HiveContext or SQLContext)
- stages, Stages
- static allocation of resources, Resource Allocation Across Applications
- status function, StreamingQuery, Stream status and debugging
- storage levels, Persist and cache
- straggler tasks, Working with Key/Value Data
- stratified sampling, Sampling
- stream processing with Spark, Stream Processing with Spark-GraphX
- streaming (see stream processing with Spark; Spark Streaming)
- StreamingActionBase class, Streaming
- StreamingSuiteBase class, Streaming
- StringIndexer, Data Encoding
- StringIndexerModel, Data Encoding
- strings
- StructField case class, Basics of Schemas
- StructType case class, Basics of Schemas
- Structured Streaming, Stream Processing with Spark
- subtract function, Set Operations
- supervised learning
- SWIG, writing wrappers with, JNI
T
- Tachyon, Persist and cache, Alluxio (nee Tachyon)
- take function, Functions on RDDs: Transformations Versus Actions
- tasks, The Spark Application
- TaskScheduler, The DAG, Tasks
- testing, Testing and Validation-Conclusion
- Testing Spark: Best Practices (speech), Projects for Verifying Performance
- text files, saving DStreams as, Output operations
- textual data, encoding for features
- this.type, Reusing Existing Objects
- toDebugString function, Types of RDDs
- tokenizer, using with HashingTF, Data Encoding
- toLocalIterator function, Regular Spark jobs (testing with RDDs)
- Toree, How Eclair JS Works
- training
- transform function, Pipeline Stages, Explain Params
- transformations, Effective Transformations-Conclusion
- actions versus, on RDDs, Functions on RDDs: Transformations Versus Actions
- DataFrame, Transformations-Sorting
- easier functional transformations with Datasets, Easier Functional (RDD “like”) Transformations
- in sinks, Custom sinks
- iterator-to-iterator, with mapPartitions, Iterator-to-Iterator Transformations with mapPartitions-Set Operations
- methods for improving performance of, A Different Approach to Goldilocks
- minimizing object creation, Minimizing Object Creation
- multi-Dataset relational transformations, Multi-Dataset Relational Transformations
- narrow versus wide dependencies, Wide Versus Narrow Dependencies, Narrow Versus Wide Transformations-The Special Case of coalesce
- preserving partitioning information across, Preserving Partitioning Information Across Transformations
- reducing setup overhead, Reducing Setup Overhead-Accumulators
- relational transformations with Datasets, Relational Transformations
- reusing RDDs, Reusing RDDs-Interaction with Accumulators
- stage boundaries, Tasks
- testing, Regular Spark jobs (testing with RDDs)
- types of RDD returned by, What Type of RDD Does Your Transformation Return?
- Transformer interface, Extending Spark ML Pipelines with Your Own Algorithms
- transformers, Pipeline Stages
- transformSchema function, Custom transformers
- tree algorithms (MLlib), Getting Started with MLlib (Organization and Imports)
- treeAggregate function, Preventing out-of-memory errors with aggregation operations
- Tungsten, Tungsten, The Future, Serialization Options
- tuning and cluster sizing (see configuration settings)
- tuples
- types