Models in Pipelines

Pipelines are useful for model building

Data transformation is a key step in any ML problem. Done by hand, it quickly becomes cumbersome and error-prone. Scikit-learn provides the Pipeline object to solve this problem: it chains the transformation steps and the final model into a single estimator.

Let’s say in a given problem, we would like to do the following:

  • Impute missing values using the mean
  • Expand the features into degree-2 (quadratic) polynomial terms
  • Fit a linear regression

Example

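from sklearn.impute import SimpleImputer  # replaces Imputer, removed in scikit-learn 0.22
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Steps run in order: impute, expand to degree-2 features, then fit the regression
model = make_pipeline(SimpleImputer(strategy='mean'),
                      PolynomialFeatures(degree=2),
                      LinearRegression())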

The pipeline object behaves like any other scikit-learn estimator and exposes the same fit and predict methods.

model.fit(X, y)
pred = model.predict(X)
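
To make the fit and predict steps concrete, here is a minimal end-to-end sketch. The data is synthetic and purely an assumption for illustration (one feature, a quadratic target, and a few values blanked out to NaN); it uses SimpleImputer, the current scikit-learn class for mean imputation.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): y is quadratic in one feature
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.1, size=100)

# Blank out every tenth value so the imputer has something to fill in
X[::10, 0] = np.nan

model = make_pipeline(SimpleImputer(strategy='mean'),
                      PolynomialFeatures(degree=2),
                      LinearRegression())

model.fit(X, y)           # fits each transformer in turn, then the regression
pred = model.predict(X)   # applies the fitted transformers, then predicts

print(pred.shape)
print(model.score(X, y))  # R^2 on the training data
```

Note that X still contains NaN when it is passed to fit and predict; the imputation happens inside the pipeline, so the same preprocessing is applied at training and prediction time.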

We could apply a grid search (GridSearchCV) to this pipeline object for hyperparameter selection, to get the best-performing model.
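
A grid search over the pipeline might look like the sketch below. The synthetic data, the candidate values in the grid, and cv=5 are all assumptions chosen for illustration. The parameter names rely on the fact that make_pipeline names each step after its lowercased class name, and that a step's parameter is addressed as the step name, two underscores, then the parameter name.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration), with a few missing values
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.1, size=120)
X[::10, 0] = np.nan

pipe = make_pipeline(SimpleImputer(), PolynomialFeatures(), LinearRegression())

# Keys follow the pattern <step name>__<parameter name>
param_grid = {
    'simpleimputer__strategy': ['mean', 'median'],
    'polynomialfeatures__degree': [1, 2, 3],
}

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)

# best_estimator_ is the pipeline refit on all of X, y with the best settings
best_model = search.best_estimator_
pred = best_model.predict(X)
```

Because the imputer and the polynomial expansion sit inside the pipeline, they are refit on the training portion of each fold, so the held-out fold does not influence the preprocessing.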