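Here's the snippet that creates the regression RDD of labeled points we will use to predict the number of hours people work. The label is the hours column (index 3), converted back to its original scale with the mean and standard deviation held in sModel; the remaining columns become the features: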
# mean and standard deviation of the hours column (index 3)
mu, std = sModel.mean[3], sModel.std[3]

final_data_hours = (
    final_data
    .map(lambda row: reg.LabeledPoint(
        row[1][3] * std + mu,  # label: hours on the original scale
        ln.Vectors.dense([row[0]] + list(row[1][0:3]) + list(row[1][4:]))
    ))
)
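To make the row manipulation easier to follow, here is a minimal plain-Python sketch of what the lambda does to a single record, with no Spark required. It assumes each row is a pair of a leading value and a sequence of standardized values, with the target at index 3. The helper name `split_row`, its `target_idx` parameter, and the sample numbers are hypothetical and used only for illustration.

```python
def split_row(row, mu, std, target_idx=3):
    """Return (label, features) for one (head, scaled_values) row."""
    head, scaled = row
    # Undo standardization: z = (x - mu) / std  =>  x = z * std + mu
    label = scaled[target_idx] * std + mu
    # Keep every column except the target, with the leading value first
    features = [head] + list(scaled[:target_idx]) + list(scaled[target_idx + 1:])
    return label, features


# Hypothetical record: leading value 1.0 plus five standardized columns
row = (1.0, [0.5, -1.0, 2.0, 1.5, 0.0])
label, features = split_row(row, mu=40.0, std=10.0)
print(label, features)
```

In the Spark version, the same two pieces are passed to reg.LabeledPoint as the label and the dense feature vector.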