eda_mds.cat_var_stats
Module Contents
Functions
cat_var_stats(df, binning_threshold=2): Generate summary statistics for categorical variables in a DataFrame.
- eda_mds.cat_var_stats.cat_var_stats(df, binning_threshold=2)
Generate summary statistics for categorical variables in a DataFrame.
For each categorical column in the provided DataFrame, this function prints the number of unique values and the percentage frequency of each value, and recommends binning any low-frequency categories whose frequency falls below a specified threshold.
- Parameters:
df (pandas.DataFrame) – The DataFrame for which categorical variable stats are calculated.
binning_threshold (int, optional) – The frequency threshold, in percent, below which a category is recommended for binning. Default is 2.
- Returns:
The function prints the statistics and returns None.
- Return type:
None
Examples
>>> import pandas as pd
>>> from eda_mds.cat_var_stats import cat_var_stats
>>> df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
>>> cat_var_stats(df)
Column: sex
Number of unique values: 2
Frequency of values:
male: 64.76%
female: 35.24%
------------------------------------
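The statistics described above (unique-value counts, percentage frequencies, and binning candidates below a threshold) can be approximated with plain pandas. The following is only a minimal sketch under stated assumptions, not the eda_mds implementation: the helper name sketch_cat_stats is hypothetical, it treats every non-numeric column as categorical, and it returns a dictionary instead of printing, so that the result can be inspected.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def sketch_cat_stats(df, binning_threshold=2):
    """Hypothetical helper illustrating the idea; not the eda_mds code."""
    results = {}
    for col in df.columns:
        # Assumption: any non-numeric column counts as categorical.
        if is_numeric_dtype(df[col]):
            continue
        # Percentage frequency of each value (missing values are excluded).
        freq = df[col].value_counts(normalize=True) * 100
        # Categories whose share falls below the threshold (in percent).
        rare = freq[freq < binning_threshold].index.tolist()
        results[col] = {
            "n_unique": int(df[col].nunique()),
            "frequencies": freq.round(2).to_dict(),
            "bin_candidates": rare,
        }
    return results


# Toy data invented for illustration: 60 red, 39 blue, 1 green.
toy = pd.DataFrame({
    "colour": ["red"] * 60 + ["blue"] * 39 + ["green"],
    "size": [1] * 100,
})
stats = sketch_cat_stats(toy, binning_threshold=2)
print(stats["colour"])
```

With the default threshold of 2, only "green" (1% of rows) is flagged as a binning candidate in this toy data, and the numeric "size" column is skipped. How cat_var_stats itself selects categorical columns and formats its recommendations is not shown on this page, so those details may differ.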