EDA using pandas-profiling

EDA (Exploratory Data Analysis)

EDA (Exploratory Data Analysis) is the process of analyzing a dataset, often with visual methods, to find patterns and gain insights before modeling. Several libraries are available for EDA. The pandas-profiling library for Python provides a ProfileReport class that generates a basic report on an input DataFrame.

source: https://github.com/pandas-profiling/pandas-profiling

Install pandas-profiling

pip install pandas-profiling[notebook]

or, to install the latest version directly from GitHub:

pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Load Data

Here we use the breast cancer dataset that ships with scikit-learn. Once the target column is added, the DataFrame has shape (569, 31): 569 rows and 31 columns.

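from sklearn.datasets import load_breast_cancer
import pandas as pd
# Load the dataset and build a DataFrame from the 30 feature columns
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
# Add the label column and map the numeric codes to class names
df['target'] = cancer.target
df['target'] = df['target'].replace({0: 'malignant', 1: 'benign'})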
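Before profiling, it can help to confirm that the DataFrame looks as expected. The following is a minimal sketch of such a sanity check; it rebuilds the same DataFrame so that it runs on its own, and uses `map` instead of `replace` purely as a stylistic choice.

```python
from sklearn.datasets import load_breast_cancer
import pandas as pd

# Rebuild the DataFrame so this snippet is self-contained
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
df["target"] = pd.Series(cancer.target).map({0: "malignant", 1: "benign"})

# 30 feature columns plus the target column
print(df.shape)  # (569, 31)

# Number of rows per class
print(df["target"].value_counts())

# Total number of missing values across all columns
print(df.isna().sum().sum())
```

If the shape or the class counts look wrong at this point, it is easier to fix the loading step first than to spot the problem later inside the report.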

To generate the report:

from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="Pandas Profiling Report")
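To get a feel for the kind of per-column summary such a report contains, the pandas-only sketch below computes a few comparable statistics by hand. It is only an illustration and not how pandas-profiling works internally; the helper name `quick_overview` is hypothetical, and the choice of statistics is an assumption made for this example.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def quick_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with a few basic statistics."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),  # data type of each column
            "missing": frame.isna().sum(),      # number of missing values
            "distinct": frame.nunique(),        # number of distinct values
        }
    )


# Rebuild the DataFrame so this snippet is self-contained
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
df["target"] = pd.Series(cancer.target).map({0: "malignant", 1: "benign"})

overview = quick_overview(df)
print(overview.head())

# Descriptive statistics for the numeric columns only
print(df.describe().T[["mean", "std", "min", "max"]].head())
```

A profiling report goes well beyond this, but a quick table like this one is often enough to decide whether a full report is worth generating.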

The explorative configuration enables additional analyses for text, file, and image data:

from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)

To save the report as an HTML file:

profile.to_file("cancer_data_report.html")


Feel free to let me know if you find any issues. 👍
