There's more...

Another approach to finding the best-performing model is the train-validation split. This method splits the training data into two smaller subsets: one that is used to train the model, and another that is used to check whether the model is overfitting. Because the split is performed only once, it is less expensive than cross-validation:

train_v = tune.TrainValidationSplit(
    estimator=logReg_obj
    , estimatorParamMaps=logReg_grid
    , evaluator=logReg_ev
    , parallelism=4
)

logReg_modelTrainV = (
    train_v
    .fit(data_trans.transform(forest_train))
)

results = logReg_modelTrainV.transform(data_trans_test)

print(logReg_ev.evaluate(results, {logReg_ev.metricName: 'weightedPrecision'}))
print(logReg_ev.evaluate(results, {logReg_ev.metricName: 'weightedRecall'}))
print(logReg_ev.evaluate(results, {logReg_ev.metricName: 'accuracy'}))

The preceding code is not that dissimilar from what we saw with the .CrossValidator(...) approach. The only additional parameter we specify for .TrainValidationSplit(...) is parallelism, which controls how many threads are spun up when selecting the best model.
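You can also control the proportion of the split itself. The following is a minimal sketch, assuming the same objects as above, that uses the trainRatio parameter (it defaults to 0.75):

train_v = tune.TrainValidationSplit(
    estimator=logReg_obj
    , estimatorParamMaps=logReg_grid
    , evaluator=logReg_ev
    , trainRatio=0.8    # use 80% of the data to train, 20% to validate
    , parallelism=4
)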

Using the .TrainValidationSplit(...) method produces the same results as the .CrossValidator(...) approach.
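If you want to confirm which hyperparameter combination was selected, the fitted TrainValidationSplitModel exposes the validation metric recorded for each combination in the grid, as well as the winning model itself. Here is a minimal sketch, assuming the logReg_modelTrainV object fitted above:

# validation metric obtained for each combination in logReg_grid;
# the combination with the best metric is kept as .bestModel
for params, metric in zip(
    logReg_modelTrainV.getEstimatorParamMaps()
    , logReg_modelTrainV.validationMetrics
):
    print({p.name: v for p, v in params.items()}, metric)

# the model selected as the best performer
print(logReg_modelTrainV.bestModel)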
