A schema is described using StructType, which is a collection of StructField objects.
StructType and StructField belong to the org.apache.spark.sql.types package.
Data types such as IntegerType and StringType also belong to this package.
By importing these classes, we can define an explicit schema.
First, import the necessary classes:
scala> import org.apache.spark.sql.types.{StructType, IntegerType, StringType}
import org.apache.spark.sql.types.{StructType, IntegerType, StringType}
Define a schema with two columns/fields, an integer followed by a string:
scala> val schema = new StructType().add("i", IntegerType).add("s", StringType)
schema: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(s,StringType,true))
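Note the true at the end of each StructField in the output: fields created with add are nullable by default. For explicit control over nullability, the same schema can be built directly from StructField objects. A minimal sketch (the name strictSchema is ours):
scala> import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StructField

scala> val strictSchema = StructType(Seq(
     |   StructField("i", IntegerType, nullable = false), // i must not be null
     |   StructField("s", StringType, nullable = true)))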
It's easy to print the newly created schema:
scala> schema.printTreeString
root
|-- i: integer (nullable = true)
|-- s: string (nullable = true)
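A schema like this is usually attached to data rather than just printed. As a sketch, assuming the spark session provided by the shell, it can be applied to a handful of rows (the sample values are ours):
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala> val rows = spark.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))

scala> val df = spark.createDataFrame(rows, schema)

scala> df.printSchema // prints the same tree as printTreeString above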
The schema can also be printed as JSON, using the prettyJson method:
scala> schema.prettyJson
res85: String =
{
"type" : "struct",
"fields" : [ {
"name" : "i",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "s",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}
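The JSON form is round-trippable: DataType.fromJson parses it back into an equivalent StructType. A minimal sketch:
scala> import org.apache.spark.sql.types.DataType
import org.apache.spark.sql.types.DataType

scala> val restored = DataType.fromJson(schema.json).asInstanceOf[StructType]

scala> restored == schema // true: the parsed schema equals the original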
All of the Spark SQL data types are located in the org.apache.spark.sql.types package. You can bring them all into scope with:
import org.apache.spark.sql.types._
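With everything in scope, richer schemas can be declared the same way. A hypothetical sketch of a nested schema (all field names are ours):
// A non-nullable long id, an array of string tags, and a nested address struct.
val nested = new StructType()
  .add("id", LongType, nullable = false)
  .add("tags", ArrayType(StringType))
  .add("address", new StructType().add("city", StringType).add("zip", StringType))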