.textFile(...) method

To read the file, we are using SparkContext's textFile() method via this command:

(
    sc
    .textFile(
        '~/data/flights/airport-codes-na.txt',
        minPartitions=4,
        use_unicode=True
    )
)

Only the first parameter is required: the location of the text file (here, ~/data/flights/airport-codes-na.txt). There are also two optional parameters:

  • minPartitions: Indicates the minimum number of partitions that make up the RDD. The Spark engine can often determine a good number of partitions based on the file size, but for performance reasons you may want to override it, hence the ability to specify a minimum.
  • use_unicode: Set this parameter to True (the default) if you are processing Unicode data; set it to False to read the rows as raw bytes.
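To make the minPartitions idea concrete, here is a rough, Spark-free sketch of rows being spread across a requested number of partitions. This is only an illustration in plain Python; Spark's actual partitioning logic is more sophisticated (it also considers file block sizes), and the sample rows are taken from the output shown later in this section:

```python
# Sample rows, as they would appear after reading the airport codes file.
lines = [
    'City\tState\tCountry\tIATA',
    'Abbotsford\tBC\tCanada\tYXX',
    'Aberdeen\tSD\tUSA\tABR',
    'Abilene\tTX\tUSA\tABI',
    'Akron\tOH\tUSA\tCAK',
]

min_partitions = 4

# Round-robin the rows into min_partitions buckets, a crude stand-in for
# how a minimum partition count splits work across the cluster.
partitions = [lines[i::min_partitions] for i in range(min_partitions)]

# Every row lands in exactly one partition.
assert sum(len(p) for p in partitions) == len(lines)
```

More partitions mean more parallel tasks, which is why you might raise the minimum for a small file that Spark would otherwise read as one or two partitions.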

Note that if you were to execute this statement without a subsequent map() function, the resulting RDD would not parse the tab delimiter; it would simply be a list of strings, one per line:

myRDD = sc.textFile('~/data/flights/airport-codes-na.txt')
myRDD.take(5)

# Out[35]: [u'City State Country IATA', u'Abbotsford BC Canada YXX', u'Aberdeen SD USA ABR', u'Abilene TX USA ABI', u'Akron OH USA CAK']