You can parallelize a collection by calling parallelize() on it inside the driver program. The driver splits the collection into partitions and distributes those partitions across the cluster.
The following example creates an RDD from a sequence of numbers using the SparkContext and the parallelize() function. The parallelize() function splits the sequence of numbers into a distributed collection, otherwise known as an RDD.
scala> val rdd_one = sc.parallelize(Seq(1,2,3))
rdd_one: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> rdd_one.take(10)
res0: Array[Int] = Array(1, 2, 3)
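You can also control how many partitions the collection is split into by passing an explicit partition count as the second argument to parallelize(), and confirm the split with getNumPartitions. A minimal sketch from the same Spark shell session (the name rdd_two is chosen here for illustration):

scala> val rdd_two = sc.parallelize(Seq(1, 2, 3, 4, 5, 6), 3)
rdd_two: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:24

scala> rdd_two.getNumPartitions
res1: Int = 3

If you omit the second argument, Spark chooses a default partition count based on the cluster configuration (spark.default.parallelism).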