Automated Machine Learning
Automated machine learning (AutoML) provides tools that automatically discover good machine learning pipelines for a dataset with very little user intervention. There are many AutoML libraries, such as Auto-Sklearn, Tree-based Pipeline Optimization Tool (TPOT), MLBox and more.
MLBox
The main MLBox package contains three sub-packages: preprocessing, optimisation and prediction. They are aimed, respectively, at reading and preprocessing data, testing or optimising a wide range of learners, and predicting the target on a test dataset.
Installation
Run the following command to install MLBox:
- pip install mlbox
If you face any issues during installation, see the installation guide linked below.
Installation guide — MLBox Documentation
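After installing, it can be handy to confirm that the package is visible to your Python interpreter before running the longer example. The snippet below is a minimal sketch that uses only the standard library; the helper name is_installed is made up for this post and is not part of MLBox.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no such top-level package exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "mlbox" is the import name used in the code later in this post.
    print("mlbox installed:", is_installed("mlbox"))
```

If this prints False, the pip command above most likely installed into a different environment than the one running the script.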
Code MLBox
We will predict house prices. You can download the dataset from the link given below.
from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

paths = ["train.csv", "test.csv"]
target_name = "SalePrice"

data = Reader(sep=",").train_test_split(paths, target_name)

data = Drift_thresholder().fit_transform(data)

opt = Optimiser()
params = {
    'ne__numerical_strategy' : {"space" : [0, 'mean']},
    'ce__strategy' : {"space" : ["label_encoding", "random_projection", "entity_embedding"]},
    'fs__strategy' : {"space" : ["variance", "rf_feature_importance"]},
    'fs__threshold': {"search" : "choice", "space" : [0.1, 0.2, 0.3]},
    'est__strategy' : {"space" : ["LightGBM"]},
    'est__max_depth' : {"search" : "choice", "space" : [5,6]},
    'est__subsample' : {"search" : "uniform", "space" : [0.6,0.9]}
}

best_params = opt.optimise(params, data, max_evals = 5)

Predictor().fit_predict(best_params, data)
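To get a feel for what the params dictionary describes, here is a simplified sketch of drawing candidate configurations from a search space written in the same shape. It uses plain random sampling as a stand-in and is not how MLBox itself searches. The function sample_params is hypothetical, and treating a missing "search" key as "choice" is an assumption made for this illustration.

```python
import random


def sample_params(space, rng):
    """Draw one candidate configuration from a search space dictionary."""
    candidate = {}
    for name, spec in space.items():
        # Assumption for this sketch: entries without "search" behave like "choice".
        search = spec.get("search", "choice")
        values = spec["space"]
        if search == "choice":
            # Pick one of the listed options.
            candidate[name] = rng.choice(values)
        elif search == "uniform":
            # Draw a float between the two bounds.
            low, high = values
            candidate[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unsupported search type: {search}")
    return candidate


# A subset of the search space from the post.
space = {
    "fs__strategy": {"space": ["variance", "rf_feature_importance"]},
    "fs__threshold": {"search": "choice", "space": [0.1, 0.2, 0.3]},
    "est__max_depth": {"search": "choice", "space": [5, 6]},
    "est__subsample": {"search": "uniform", "space": [0.6, 0.9]},
}

rng = random.Random(0)
# Five draws, mirroring max_evals = 5 in the post.
candidates = [sample_params(space, rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Each printed dictionary is one pipeline configuration that an optimiser could train and score. Raising max_evals means more of these candidates get evaluated, at the cost of a longer run.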
Code Explanation
First, we import the three MLBox sub-packages (lines 1 to 3). Next, we give the paths of the train and test CSV files and the name of the target column (lines 5 and 6). Reader then reads, cleans and splits the data, and reports how many categorical and numerical features it contains (line 8). Drift_thresholder automatically drops ids and variables that drift between the train and test datasets (line 10). After that, we create the Optimiser, define the search space for the pipeline and call optimise to get the best parameters (lines 12 to 23). Finally, Predictor trains the model and predicts the house prices (line 25). MLBox automatically determines whether to apply classification or regression based on the target variable. After training, it saves the model in the save folder.
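The idea of dropping ids and drifting variables can be illustrated with a small sketch. A variable drifts when its values alone make it easy to tell train rows from test rows. The code below scores that with a rank-based AUC on a single column. It is a simplified stand-in, not MLBox's implementation: the function names, the sample values and the 0.6 threshold are all assumptions chosen for this illustration.

```python
def auc_train_vs_test(train_values, test_values):
    """Probability that a random test value exceeds a random train value.

    Ties count as one half. A result of 0.5 means the two samples
    cannot be told apart by this variable.
    """
    wins = 0.0
    for t in test_values:
        for r in train_values:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(test_values) * len(train_values))


def drift_score(train_values, test_values):
    """Rescale the AUC to [0, 1]: 0 means no drift, 1 means fully separable."""
    auc = auc_train_vs_test(train_values, test_values)
    return 2.0 * abs(auc - 0.5)


def keep_stable_columns(train, test, threshold=0.6):
    """Return the column names whose drift score is at or below the threshold."""
    return [c for c in train if drift_score(train[c], test[c]) <= threshold]


# Made-up sample values: "Id" keeps increasing from train to test,
# while "LotArea" has a similar spread in both.
train = {"Id": [1, 2, 3, 4], "LotArea": [8450, 9600, 11250, 9550]}
test = {"Id": [5, 6, 7, 8], "LotArea": [9000, 11622, 8400, 9978]}

kept = keep_stable_columns(train, test)
print(kept)  # prints ['LotArea']
```

The Id column is dropped because every test id is larger than every train id, so it separates the two sets perfectly and carries no information that would generalise.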
Others
More Info : Preprocessing — MLBox Documentation