Automated Machine Learning (MLBox)

Automated Machine Learning

Automated Machine Learning (AutoML) provides tools that automatically discover good machine learning pipelines for a dataset with very little user intervention. There are many AutoML libraries, such as Auto-Sklearn, Tree-based Pipeline Optimization Tool (TPOT), MLBox, and more.

MLBox

The main MLBox package contains three sub-packages: preprocessing, optimisation, and prediction. They are aimed, respectively, at reading and preprocessing data, testing or optimising a wide range of learners, and predicting the target on a test dataset.

Installation

Run the following command to install MLBox:

pip install mlbox

If you run into any issues during installation, see the guide linked below.

Installation guide — MLBox Documentation
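
Once the install command has finished, a quick way to confirm that the package is visible to your Python environment is to ask for its installed version. The snippet below is a small sketch that uses only the standard library (Python 3.8 or newer); the helper name installed_version is made up for this example and is not part of MLBox.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("mlbox")
    if found is None:
        print("mlbox is not installed in this environment")
    else:
        print("mlbox version:", found)
```

If the script reports that mlbox is missing, the pip command probably ran against a different Python environment than the one you are using now.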

Code MLBox

We will predict house prices. You can download the dataset from the link given below.

House Price Dataset | Kaggle

from mlbox.preprocessing import Reader, Drift_thresholder
from mlbox.optimisation import Optimiser
from mlbox.prediction import Predictor

paths = ["train.csv", "test.csv"]
target_name = "SalePrice"

data = Reader(sep=",").train_test_split(paths, target_name)  # read, clean and split the data

data = Drift_thresholder().fit_transform(data)  # drop ids and drifting variables

opt = Optimiser()
params = {
        'ne__numerical_strategy' : {"space" : [0, 'mean']},
        'ce__strategy' : {"space" : ["label_encoding", "random_projection", "entity_embedding"]},
        'fs__strategy' : {"space" : ["variance", "rf_feature_importance"]},
        'fs__threshold': {"search" : "choice", "space" : [0.1, 0.2, 0.3]},
        'est__strategy' : {"space" : ["LightGBM"]},
        'est__max_depth' : {"search" : "choice", "space" : [5,6]},
        'est__subsample' : {"search" : "uniform", "space" : [0.6,0.9]}
        }
best_params = opt.optimise(params, data, max_evals = 5)  # try 5 pipeline configurations

Predictor().fit_predict(best_params, data)  # train on the best parameters and predict
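
To make the params dictionary above easier to read: the prefix of each key names a pipeline step (ne for the missing-value encoder, ce for the categorical encoder, fs for the feature selector, est for the estimator), "space" lists the candidate values, and "search" says whether to pick one of them ("choice") or draw a number between two bounds ("uniform"). The sketch below only illustrates that meaning with plain random sampling. MLBox itself relies on hyperopt for the search, so this is not how the library picks candidates; the function sample_params is hypothetical, and treating entries without a "search" key as a choice is an assumption.

```python
import random


def sample_params(space, rng):
    """Draw one candidate configuration from an MLBox-style search space."""
    candidate = {}
    for name, spec in space.items():
        values = spec["space"]
        # Assumption: a missing "search" key means "choice".
        if spec.get("search", "choice") == "uniform":
            # Continuous range: any float between the two bounds.
            low, high = min(values), max(values)
            candidate[name] = rng.uniform(low, high)
        else:
            # Discrete list: pick one of the listed options.
            candidate[name] = rng.choice(values)
    return candidate


space = {
    "fs__threshold": {"search": "choice", "space": [0.1, 0.2, 0.3]},
    "est__strategy": {"space": ["LightGBM"]},
    "est__max_depth": {"search": "choice", "space": [5, 6]},
    "est__subsample": {"search": "uniform", "space": [0.6, 0.9]},
}

rng = random.Random(0)
for _ in range(5):  # mirrors max_evals = 5 in the code above
    print(sample_params(space, rng))
```

Each printed dictionary has the same shape as the best_params returned by the optimiser: one concrete value per key.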

Code Explanation

First, we import what we need from the mlbox sub-packages (lines 1-3). Next, we give the paths of the train and test CSV files and the name of the target column (lines 5-6). The Reader then reads and cleans the data, splits it into train and test sets, and reports how many categorical and numerical features it found (line 8). The Drift_thresholder automatically drops ids and variables that drift between the train and test datasets (line 10). After that, we define the search space for the pipeline and run the optimiser to get the best parameters (lines 12-22). Finally, we train the model with those parameters and predict the house prices (line 24). MLBox automatically works out from the target variable whether the task is classification or regression. After training, it saves the model in the save folder.
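
Why would an id column count as a "drifting" variable? A variable drifts when its values alone are enough to tell train rows from test rows. The sketch below illustrates that idea with a rank-based AUC computed per column. It is a simplified stand-in and not MLBox's implementation; the function names, the toy values, and the 0.6 cut-off are all assumptions made for this example.

```python
def auc_train_vs_test(train_values, test_values):
    """Chance that a random test value exceeds a random train value (ties count 0.5)."""
    wins = 0.0
    for t in test_values:
        for r in train_values:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


def drift_score(train_values, test_values):
    """Rescale the AUC to [0, 1]: 0 = indistinguishable, 1 = perfectly separable."""
    return 2.0 * abs(auc_train_vs_test(train_values, test_values) - 0.5)


# Toy values made up for illustration, not taken from the real dataset.
train = {"Id": [1, 2, 3, 4, 5], "LotArea": [8450, 9600, 11250, 9550, 14260]}
test = {"Id": [6, 7, 8, 9, 10], "LotArea": [11622, 14267, 13830, 9978, 5000]}

threshold = 0.6  # hypothetical cut-off for this sketch
for column in train:
    score = drift_score(train[column], test[column])
    verdict = "drop" if score > threshold else "keep"
    print(column, round(score, 2), verdict)
```

Because every test Id is larger than every train Id, the Id column separates the two files perfectly and gets dropped, while LotArea overlaps between them and is kept.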

Others

More Info : Preprocessing — MLBox Documentation