EDA (Exploratory Data Analysis)
EDA (Exploratory Data Analysis) is the process of analyzing a dataset, largely through visual methods, to summarize its main characteristics. It helps uncover patterns, spot anomalies, and gain visual insights before any modeling. Several libraries can be used for EDA. The pandas-profiling library in Python provides a class named ProfileReport that generates a basic report on an input DataFrame.
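To give a feel for what a "basic report" covers, here is a minimal plain-pandas sketch of a few per-column statistics that such a report shows for each variable (data type, missing values, distinct values). It is only an illustration and not how ProfileReport is implemented; `basic_summary` is a hypothetical helper.

```python
import pandas as pd


def basic_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with a few basic profiling statistics."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),            # inferred data type
            "missing": df.isna().sum(),                # count of missing values
            "missing_pct": df.isna().mean() * 100,     # share of missing values, in %
            "distinct": df.nunique(),                  # distinct non-missing values
        }
    )


example = pd.DataFrame({"a": [1, 2, None], "b": ["x", "x", "y"]})
print(basic_summary(example))
```

For numeric summaries (mean, standard deviation, quartiles), `df.describe()` is the usual plain-pandas starting point; ProfileReport bundles these and much more into a single report.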
Install pandas-profiling
pip install pandas-profiling[notebook]
or, to install the latest development version directly from GitHub:
pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
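Note that the pip package is called pandas-profiling while the module you import is pandas_profiling. A small standard-library sketch for confirming the install worked (the helper names are hypothetical):

```python
import importlib.util
from importlib import metadata


def is_installed(module_name: str) -> bool:
    """True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name: str):
    """Version string of an installed pip distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


print(is_installed("pandas_profiling"))        # import name uses an underscore
print(installed_version("pandas-profiling"))   # pip name uses a hyphen
```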
Load Data
Here we will use the breast cancer dataset that ships with scikit-learn. After adding the target column, its shape is (569, 31): 569 rows and 31 columns (30 features plus the target).
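from sklearn.datasets import load_breast_cancer
import pandas as pd

# Load the dataset bundled with scikit-learn (569 samples, 30 features)
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)

# Add the label column and map 0/1 to readable class names
df['target'] = cancer.target
df['target'] = df['target'].replace({0: 'malignant', 1: 'benign'})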
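Before profiling, it is worth checking that the DataFrame looks as expected. The sketch below is an alternative way to build the same table, assuming scikit-learn 0.23 or later, where `as_frame=True` returns features and target together in `cancer.frame`; it then checks the shape and the class balance.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True (scikit-learn >= 0.23) puts features and a 'target' column in .frame
cancer = load_breast_cancer(as_frame=True)
df = cancer.frame.copy()

# Same label mapping as above: 0 -> malignant, 1 -> benign
df["target"] = df["target"].map({0: "malignant", 1: "benign"})

print(df.shape)
print(df["target"].value_counts())
```

The class counts are a useful sanity check: the dataset is mildly imbalanced, with more benign than malignant samples, which the profile report will also show for the target column.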
To generate a report:
from pandas_profiling import ProfileReport

profile = ProfileReport(df, title="Pandas Profiling Report")
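If you work in Jupyter, the report generated above can also be rendered inline instead of being saved to disk. A minimal sketch, assuming the library may be installed either under the name used in this post (pandas_profiling) or under ydata_profiling, the name the project was later renamed to; the helper names are hypothetical.

```python
def load_profile_report_class():
    """Return ProfileReport from whichever package name is installed."""
    try:
        # Newer releases are published as ydata-profiling
        from ydata_profiling import ProfileReport
    except ImportError:
        # Older releases, as installed earlier in this post
        from pandas_profiling import ProfileReport
    return ProfileReport


def show_in_notebook(df, title="Pandas Profiling Report"):
    """Build a report for df and render it in the current notebook cell."""
    ProfileReport = load_profile_report_class()
    profile = ProfileReport(df, title=title)
    profile.to_notebook_iframe()  # embeds the HTML report in the cell output
    return profile
```

`profile.to_widgets()` is an alternative that renders the report as interactive ipywidgets, which is what the `[notebook]` install extra is for.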
The explorative configuration enables additional analyses, including for text, file, and image data:
from pandas_profiling import ProfileReport

profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)
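At the other end of the scale, ProfileReport also accepts `minimal=True`, which turns off the most expensive computations and is meant for large DataFrames. A small sketch that keeps the presets in one place; `profile_kwargs` and its mode names are hypothetical, only the `title`, `minimal`, and `explorative` keywords come from the library.

```python
def profile_kwargs(mode="default", title="Pandas Profiling Report"):
    """Keyword arguments for ProfileReport for a named preset.

    "minimal"     -> minimal=True (skips expensive computations, for large data)
    "default"     -> library defaults
    "explorative" -> explorative=True (extra text/file/image analysis)
    """
    kwargs = {"title": title}
    if mode == "minimal":
        kwargs["minimal"] = True
    elif mode == "explorative":
        kwargs["explorative"] = True
    elif mode != "default":
        raise ValueError(f"unknown mode: {mode!r}")
    return kwargs


print(profile_kwargs("explorative"))
```

Usage with the DataFrame loaded earlier: `profile = ProfileReport(df, **profile_kwargs("minimal"))`.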
To save the report to an HTML file:
profile.to_file("cancer_data_report.html")
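`to_file` picks the output format from the file extension, so the same report can also be exported as JSON for programmatic use. A minimal sketch that writes both; `save_report` is a hypothetical helper, and `profile` is assumed to be a ProfileReport like the one created above.

```python
from pathlib import Path


def save_report(profile, stem):
    """Write profile to <stem>.html and <stem>.json and return both paths."""
    paths = [Path(f"{stem}.html"), Path(f"{stem}.json")]
    for path in paths:
        # The extension decides whether HTML or JSON is written
        profile.to_file(str(path))
    return paths
```

Usage: `save_report(profile, "cancer_data_report")` produces `cancer_data_report.html` and `cancer_data_report.json`.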
Resources
- https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/getting_started.html
- https://github.com/pandas-profiling/pandas-profiling
Feel free to let me know if you find any issues. 👍