.repartition(...) transformation

The repartition(n) transformation redistributes the RDD's data into n partitions by performing a full shuffle, which moves records across the network and spreads them roughly evenly over the new partitions. As noted in the preceding recipes, increasing the number of partitions can improve performance by allowing more tasks to run in parallel. Here's a code snippet that does precisely that:

# The flights RDD originally generated has 2 partitions 
flights.getNumPartitions()

# Output
2

# Let's repartition the RDD so that it has 8 partitions
flights2 = flights.repartition(8)

# Checking the number of partitions for the flights2 RDD
flights2.getNumPartitions()

# Output
8
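
To build some intuition for what "reshuffling and uniformly distributing" means, here is a minimal pure-Python sketch. It is not Spark's implementation: the function name repartition_sketch, the seed parameter, and the sample data are all hypothetical, and the model assumes each input partition picks a random starting target and then deals its records out round-robin. It runs without a Spark installation.

```python
import random


def repartition_sketch(partitions, n, seed=0):
    """Redistribute records from the input partitions into n new partitions.

    Simplified model (an assumption, not Spark's actual code): every input
    partition starts at a random target index and then hands out its
    records one by one, round-robin, to the n output partitions.
    """
    rng = random.Random(seed)
    new_partitions = [[] for _ in range(n)]
    for part in partitions:
        # Random starting target for this input partition
        position = rng.randrange(n)
        for record in part:
            position = (position + 1) % n
            new_partitions[position].append(record)
    return new_partitions


# Two deliberately skewed input partitions (90 and 10 records),
# standing in for an RDD that starts out with 2 partitions
old = [list(range(0, 90)), list(range(90, 100))]
new = repartition_sketch(old, 8)
sizes = [len(p) for p in new]

# Number of output partitions and how many records each one holds
print(len(new), sizes)
```

Even though the input is heavily skewed, the output partition sizes in this model differ by at most two records, which is the practical benefit the recipe describes: every parallel task receives a similar amount of work.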