As mentioned earlier, we can also create a DataFrame using the external data source APIs. In the following example, we use the com.databricks.spark.csv API to read the same CSV file:
flightDF <- read.df(dataPath,
                    source = "com.databricks.spark.csv",
                    header = "true",
                    inferSchema = "true")
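For comparison, here is a minimal sketch of the same kind of read using the CSV source that ships with Spark itself. It assumes Spark 2.0 or later, where "csv" is available as a built-in source name, and a local SparkR session. The names csvPath and sampleDF, as well as the tiny sample file and its column names, are hypothetical stand-ins used only for illustration; they are not the NYC flight dataset.

```r
library(SparkR)

# Start (or reuse) a SparkR session; assumes Spark 2.0+ is installed locally
sparkR.session()

# Hypothetical stand-in data: a tiny CSV written to a temporary file
csvPath <- tempfile(fileext = ".csv")
writeLines(c("year,month,carrier,dep_delay",
             "2013,1,UA,2",
             "2013,1,AA,4"), csvPath)

# Read with the built-in "csv" source; options are passed as named arguments
sampleDF <- read.df(csvPath,
                    source = "csv",
                    header = "true",
                    inferSchema = "true")

# Inspect the inferred schema and the rows, as done for flightDF
printSchema(sampleDF)
showDF(sampleDF, numRows = 2)
```

Apart from the source name, the call has the same shape as the one above, so switching between the external package and the built-in source should require little more than changing the source argument.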
Let's see the structure by exploring the schema of the DataFrame:
printSchema(flightDF)
The output is as follows:
Figure 23: The same schema of the NYC flight dataset using the external data source API
Now let's see the first 10 rows of the DataFrame:
showDF(flightDF, numRows = 10)
The output is as follows:
Figure 24: The same sample data from the NYC flight dataset using the external data source API
As you can see, the DataFrame has the same structure and data as before. Well done! Now it's time to explore something more, such as data manipulation using SparkR.