When instantiating polynomial features, there are three parameters to keep in mind:
- degree
- interaction_only
- include_bias
degree sets the maximum degree of the polynomial features; the default is 2.
interaction_only is a boolean. When True, only interaction features are produced, that is, products of at most degree distinct input features, so powers of a single feature such as x0^2 are excluded. The default is False.
include_bias is also a boolean. When True (the default), the output includes a bias column: the feature in which all polynomial powers are zero, which is a column of all ones.
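To make the effect of the three parameters concrete, here is a minimal pure-Python sketch of which terms get generated. It is an illustration of the idea only, not scikit-learn's implementation; the helper names poly_terms and expand are hypothetical and the input values are made up.

```python
from itertools import chain, combinations, combinations_with_replacement
from math import prod


def poly_terms(n_features, degree=2, interaction_only=False, include_bias=True):
    """Return one tuple of feature indices per output column.

    The empty tuple () stands for the bias term (all powers zero).
    """
    # interaction_only=True forbids repeating a feature inside a term,
    # so pure powers such as x0^2 never appear.
    pick = combinations if interaction_only else combinations_with_replacement
    # Degree-0 terms (the bias) are skipped when include_bias=False.
    first_degree = 0 if include_bias else 1
    return list(chain.from_iterable(
        pick(range(n_features), d) for d in range(first_degree, degree + 1)
    ))


def expand(row, **kwargs):
    """Multiply out each term for a single sample (hypothetical helper)."""
    return [prod(row[i] for i in term) for term in poly_terms(len(row), **kwargs)]


sample = [2.0, 3.0]  # illustrative values: a = 2, b = 3

full = expand(sample)                          # 1, a, b, a^2, a*b, b^2
inter = expand(sample, interaction_only=True)  # 1, a, b, a*b
no_bias = expand(sample, include_bias=False)   # a, b, a^2, a*b, b^2

print(full)
print(inter)
print(no_bias)
```

With interaction_only=True the squared terms disappear, and with include_bias=False the leading 1 disappears; everything else stays in the same order.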
Let's set up a PolynomialFeatures instance by importing the class and instantiating it with our parameters. First, let's look at the features we get when interaction_only is set to False:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
Now we can call fit_transform on our dataset X and look at the shape of the expanded dataset:
X_poly = poly.fit_transform(X)
X_poly.shape
(162501, 9)
Our dataset has now expanded to 162501 rows and 9 columns.
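The column count can be checked by hand. With interaction_only=False, the number of polynomial terms of degree at most d in n input features is the binomial coefficient C(n + d, d), minus one if the bias column is dropped. The sketch below assumes the original dataset has three input features, which is what nine output columns at degree 2 without a bias implies; the function name n_poly_features is hypothetical.

```python
from math import comb


def n_poly_features(n_features, degree=2, include_bias=True):
    """Count output columns when interaction_only=False."""
    total = comb(n_features + degree, degree)  # all terms of degree 0..degree
    return total if include_bias else total - 1  # drop the column of ones


# Assumed: 3 input features, degree=2, include_bias=False
print(n_poly_features(3, degree=2, include_bias=False))  # 9
```

The nine columns break down as three original features, three squares, and three pairwise products.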
Let's place our data into a DataFrame, set the column headers to the generated feature names, and take a look at the first few rows:
import pandas as pd
pd.DataFrame(X_poly, columns=poly.get_feature_names_out()).head()  # get_feature_names() before scikit-learn 1.0
This shows us:
|   | x0 | x1 | x2 | x0^2 | x0 x1 | x0 x2 | x1^2 | x1 x2 | x2^2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1502.0 | 2215.0 | 2153.0 | 2256004.0 | 3326930.0 | 3233806.0 | 4906225.0 | 4768895.0 | 4635409.0 |
| 1 | 1667.0 | 2072.0 | 2047.0 | 2778889.0 | 3454024.0 | 3412349.0 | 4293184.0 | 4241384.0 | 4190209.0 |
| 2 | 1611.0 | 1957.0 | 1906.0 | 2595321.0 | 3152727.0 | 3070566.0 | 3829849.0 | 3730042.0 | 3632836.0 |
| 3 | 1601.0 | 1939.0 | 1831.0 | 2563201.0 | 3104339.0 | 2931431.0 | 3759721.0 | 3550309.0 | 3352561.0 |
| 4 | 1643.0 | 1965.0 | 1879.0 | 2699449.0 | 3228495.0 | 3087197.0 | 3861225.0 | 3692235.0 | 3530641.0 |