Best practices for algorithms

The choice of which algorithm to deploy to answer a business question depends on several factors, and there is no single right answer. It generally depends on the nature of the predictor and output variables, and on the overarching nature of the business problem at hand: whether it is a numerical prediction, a classification, or an aggregation problem. Based on these preliminary criteria, one can shortlist a few existing methods to apply to the dataset.

Each method has its own pros and cons, and the final decision should be made with the business context in mind. The best-suited algorithm is usually chosen based on one of the following two requirements:

  • Sometimes, the user of the result is interested only in the accuracy of the results. In such cases, the algorithm is chosen on accuracy alone: all the qualifying models are run, and the one with the highest accuracy is selected.
  • At other times, the user also wants to understand how the algorithm works. In such cases, the complexity of the algorithm becomes a concern as well. The selected algorithm shouldn't be too complex to explain to the user, yet it should still be reasonably accurate.
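The two requirements above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the accuracy scores and complexity ranks are made-up placeholder values, and `Candidate`, `pick_model`, and `max_complexity` are hypothetical names introduced for this example, not part of any library.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    accuracy: float   # held-out accuracy between 0 and 1 (placeholder values below)
    complexity: int   # hypothetical rank: 1 = easy to explain, 3 = hard to explain


def pick_model(candidates: List[Candidate],
               max_complexity: Optional[int] = None) -> Candidate:
    """Return the most accurate candidate, optionally capped by complexity."""
    if max_complexity is not None:
        # Second requirement: drop models that are too hard to explain.
        candidates = [c for c in candidates if c.complexity <= max_complexity]
    if not candidates:
        raise ValueError("no candidate satisfies the complexity limit")
    # First requirement: among what is left, keep the highest accuracy.
    return max(candidates, key=lambda c: c.accuracy)


# Illustrative numbers only; in practice they come from evaluating each model.
candidates = [
    Candidate("logistic regression", 0.86, 1),
    Candidate("decision tree", 0.88, 2),
    Candidate("support vector machine", 0.91, 3),
]

best_by_accuracy = pick_model(candidates)
best_explainable = pick_model(candidates, max_complexity=2)
print(best_by_accuracy.name)
print(best_explainable.name)
```

With no complexity limit, the choice reduces to the first requirement (accuracy only); passing `max_complexity` models the second requirement, where a slightly less accurate but more explainable model may be preferred.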

The following table summarizes which algorithms to consider, depending on the type of predictor and outcome variables and on the question to be answered in the business context:

Type of variables | Business contexts/questions | Algorithm/Model
----------------- | --------------------------- | ---------------
A continuous numerical variable as the output variable; a mix of categorical and numerical variables as predictor variables. | Quantifiable questions such as how many, how much, and so on. | Linear regression, polynomial regression, and regression tree.
A binary or categorical variable as the output variable; a mix of categorical and numerical variables as predictor variables. | Classification problems. Questions with yes/no, fail/success, and 0/1 answers. | Logistic regression.
No output variable; a mix of categorical and numerical variables as predictor variables. | Grouping/aggregation and targeted marketing. Which data points are similar to each other? How many such groups can be created? These groups do not exist beforehand. | Clustering and segmentation.
A categorical or numerical variable as the output variable; a mix of categorical and numerical variables as predictor variables. | Classification problems. Classifying data points into already existing groups. | Decision trees, k-nearest neighbor, Bayes' classifier, support vector machines, and so on.
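The table above can also be read as a simple lookup from the business question to a shortlist of methods. The following sketch encodes it that way; the question keys and the `shortlist` function are hypothetical names chosen for this illustration, and the lists only restate the table rather than recommend a final model.

```python
# Shortlists taken row by row from the table; the dictionary keys are
# hypothetical labels for the "Business contexts/questions" column.
SHORTLIST = {
    "how_many_or_how_much": [
        "linear regression", "polynomial regression", "regression tree",
    ],
    "yes_or_no": ["logistic regression"],
    "find_new_groups": ["clustering", "segmentation"],
    "assign_to_existing_groups": [
        "decision tree", "k-nearest neighbor",
        "Bayes' classifier", "support vector machine",
    ],
}


def shortlist(question: str) -> list:
    """Return the candidate methods for a given type of business question."""
    try:
        return list(SHORTLIST[question])  # copy so callers cannot edit the table
    except KeyError:
        known = ", ".join(sorted(SHORTLIST))
        raise ValueError(f"unknown question {question!r}; expected one of: {known}") from None


print(shortlist("how_many_or_how_much"))
print(shortlist("find_new_groups"))
```

The shortlist is only a starting point: the final choice among the candidates still depends on accuracy and, where the user needs an explanation, on how complex the model is.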
