What is EDA (Exploratory Data Analysis)?
EDA (Exploratory Data Analysis) is a process in which we analyze datasets, mostly using visual methods. It is performed to find patterns, visual insights, and so on. There are several libraries for EDA; today we will look at one of them, Sweetviz.
Sweetviz
It is a Python library that generates visualizations to kick-start your EDA with a single line of code. Let's explore Sweetviz in detail.
Install Sweetviz
Like any other library, we can install Sweetviz using the pip command:
    pip install sweetviz
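To confirm the install worked, we can ask Python which version it sees. This is a minimal sketch using only the standard library; the helper name `installed_version` is our own, not part of Sweetviz.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print("sweetviz version:", installed_version("sweetviz"))
```

If this prints `None`, the package was probably installed into a different Python environment than the one running the script.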
Load Data
Today we are going to work with the breast cancer dataset. Once the target column is added, its shape is (569, 31): 569 rows and 31 columns.
    import pandas as pd
    from sklearn.datasets import load_breast_cancer

    cancer = load_breast_cancer()
    df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
    df['target'] = cancer.target
    df['target'] = df['target'].replace({0: 'malignant', 1: 'benign'})
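Before generating any report, it can help to check that the dataframe looks the way we expect. The sketch below uses a tiny made-up frame in place of the real `df` so it runs without downloading anything; `quick_check` is a hypothetical helper name, and the values in `toy` are invented for illustration.

```python
import pandas as pd


def quick_check(frame, label_col):
    """Summarise shape, label balance and missing values of a DataFrame."""
    return {
        "shape": frame.shape,
        "class_counts": frame[label_col].value_counts().to_dict(),
        "missing_values": int(frame.isna().sum().sum()),
    }


# Tiny made-up frame standing in for the breast cancer `df` (values are invented).
toy = pd.DataFrame(
    {
        "mean radius": [17.9, 20.5, 11.4, 12.4, None],
        "target": ["malignant", "malignant", "benign", "benign", "benign"],
    }
)
summary = quick_check(toy, "target")
print(summary)
```

Calling `quick_check(df, 'target')` on the real dataframe should report the (569, 31) shape mentioned above.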
Sweetviz has 3 main functions for creating reports:
- analyze
- compare
- compare_intra
Pass your dataframe to the analyze function, then call show_html() to save the report as an HTML file and view it in the browser.
    import sweetviz as sv

    my_report = sv.analyze(df)
    my_report.show_html('filename.html')
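analyze can also relate every feature to a target column through its `target_feat` argument. Sweetviz's documentation states that only Boolean and numerical features can be used as a target, so the text labels in our `target` column need converting first. The following is a rough sketch of that step; the helper names and the `is_malignant` column are our own choices, and the toy frame is made up so the conversion can be tried without Sweetviz installed.

```python
import pandas as pd


def add_boolean_target(frame, label_col="target", positive="malignant"):
    """Return a copy where the text label column is replaced by a Boolean one."""
    out = frame.copy()
    out["is_" + positive] = out[label_col] == positive
    # Drop the text label so the report does not show the same information twice.
    return out.drop(columns=[label_col])


def build_target_report(frame, target_col):
    """Build a Sweetviz report that relates each feature to `target_col`."""
    import sweetviz as sv  # imported lazily so the helper above works without it

    return sv.analyze(frame, target_feat=target_col)


# Tiny made-up frame standing in for the breast cancer `df` (values are invented).
toy = pd.DataFrame(
    {
        "mean radius": [17.9, 20.5, 11.4, 12.4],
        "target": ["malignant", "malignant", "benign", "benign"],
    }
)
prepared = add_boolean_target(toy)
print(prepared["is_malignant"].tolist())

# With Sweetviz installed and the real `df` loaded:
# report = build_target_report(add_boolean_target(df), "is_malignant")
# report.show_html("target_report.html")
```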
To compare two dataframes, use the compare function. It takes two dataframes as input.
    import sweetviz as sv

    train_data = df[:400]
    test_data = df[400:]
    my_report = sv.compare(train_data, test_data)
    my_report.show_html('filename.html')
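Slicing by position works, but if the rows happen to be ordered, the two parts may differ for reasons unrelated to the data itself. One alternative is a shuffled split, sketched below. compare also accepts a `[dataframe, name]` pair to label each side in the report. The function names, the 70/30 ratio and the seed are our own assumptions, and the toy frame is made up.

```python
import pandas as pd


def shuffled_split(frame, train_fraction=0.7, seed=0):
    """Split a DataFrame into two random, non-overlapping parts.

    Assumes the index is unique (true for the default RangeIndex).
    """
    train = frame.sample(frac=train_fraction, random_state=seed)
    test = frame.drop(train.index)
    return train, test


def build_compare_report(train, test):
    """Build a Sweetviz comparison report with readable dataset names."""
    import sweetviz as sv  # imported lazily so the split can be tried without it

    # A [dataframe, name] pair sets the label shown for that dataset.
    return sv.compare([train, "Train"], [test, "Test"])


# Tiny made-up frame standing in for the breast cancer `df`.
toy = pd.DataFrame(
    {
        "mean radius": list(range(10)),
        "target": ["malignant", "benign"] * 5,
    }
)
train_data, test_data = shuffled_split(toy)
print(len(train_data), len(test_data))

# With Sweetviz installed and the real `df` loaded:
# build_compare_report(*shuffled_split(df)).show_html("compare_report.html")
```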
compare_intra compares two subsets of the same dataframe, split by a Boolean condition (e.g. Male vs. Female). The last argument names the two subsets.
    import sweetviz as sv

    my_report = sv.compare_intra(df, df['target'] == 'malignant', ['malignant', 'benign'])
    my_report.show_html('filename.html')
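The condition does not have to be the target column; any Boolean series built from the dataframe can define the two groups. As a sketch, the example below splits on a feature threshold and checks how many rows land on each side, since an empty side would leave nothing to compare. The threshold of 15, the group names and the helper names are arbitrary choices for illustration, and the toy frame is made up.

```python
import pandas as pd


def split_sizes(frame, condition):
    """Count the rows where the Boolean condition is True and where it is False."""
    n_true = int(condition.sum())
    return n_true, len(frame) - n_true


def build_intra_report(frame, condition, names):
    """Build a Sweetviz report comparing the True rows against the False rows."""
    import sweetviz as sv  # imported lazily so the size check works without it

    return sv.compare_intra(frame, condition, names)


# Tiny made-up frame standing in for the breast cancer `df` (values are invented).
toy = pd.DataFrame(
    {
        "mean radius": [17.9, 20.5, 11.4, 12.4, 15.8],
        "target": ["malignant", "malignant", "benign", "benign", "malignant"],
    }
)
is_large = toy["mean radius"] > 15  # arbitrary threshold, for illustration only
sizes = split_sizes(toy, is_large)
print(sizes)

# With Sweetviz installed and the real `df` loaded:
# cond = df["mean radius"] > 15
# build_intra_report(df, cond, ["Large radius", "Small radius"]).show_html("intra_report.html")
```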
If you find any issues, please let us know.