The .withColumn(...) transformation

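The .withColumn(...) transformation applies a function to one or more existing columns and/or literals (wrapped with the .lit(...) function) and stores the result as a new column. In SQL, this corresponds to any expression in a SELECT list that transforms one or more columns and uses AS to assign the new column name. Because DataFrames are immutable, the transformation does not modify the original DataFrame; it returns a new DataFrame that contains the original columns plus the new one. If a column with the given name already exists, it is replaced.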

Look at the following code snippet:

# split the HDD into size and type
import pyspark.sql.functions as f

(
    sample_data_schema
    .withColumn('HDDSplit', f.split(f.col('HDD'), ' '))
    .show()
)

It produces the following output:

You could achieve the same result with the .select(...) transformation. The following code produces the same array column, although it names it HDD_Array rather than HDDSplit:

# do the same as withColumn

(
    sample_data_schema
    .select(
        f.col('*')
        , f.split(f.col('HDD'), ' ').alias('HDD_Array')
    ).show()
)

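The SQL equivalent, written here in Spark SQL (T-SQL has no array type, and its STRING_SPLIT is a table-valued function that returns rows rather than an array), would be:

SELECT *
    , split(HDD, ' ') AS HDD_Array
FROM sample_data_schema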