eda_mds.cat_var_stats

Module Contents

Functions

cat_var_stats(df[, binning_threshold])

Generate summary statistics for categorical variables in a DataFrame.

eda_mds.cat_var_stats.cat_var_stats(df, binning_threshold=2)

Generate summary statistics for categorical variables in a DataFrame.

This function analyzes the categorical columns of the provided DataFrame. For each column it prints the number of unique values and the percentage frequency of each value, and it recommends binning any category whose frequency falls below the specified threshold.

Parameters:
  • df (pandas.DataFrame) – The DataFrame for which categorical variable stats are calculated.

  • binning_threshold (int, optional) – The frequency threshold, expressed as a percentage (2 means 2%). Categories whose frequency falls below it are recommended for binning. Default is 2.

Returns:

The function prints the statistics and returns None.

Return type:

None

Examples

>>> import pandas as pd
>>> from eda_mds.cat_var_stats import cat_var_stats
>>> df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
>>> cat_var_stats(df)
Column: sex
Number of unique values: 2
Frequency of values:
male: 64.76%
female: 35.24%
------------------------------------
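
To make the reported quantities concrete, the following is a minimal sketch of how per-column percentage frequencies and low-frequency categories could be computed with plain pandas. It is not the package's implementation: the helper name sketch_cat_stats, the rule that every non-numeric column counts as categorical, the dictionary it returns, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def sketch_cat_stats(df, binning_threshold=2):
    """Hypothetical helper, not part of eda_mds.

    Collects, for each non-numeric column, the number of unique values,
    the percentage frequency of each value, and the values whose
    frequency falls below binning_threshold (in percent).
    """
    results = {}
    for col in df.columns:
        # Assumption: any non-numeric column is treated as categorical.
        if pd.api.types.is_numeric_dtype(df[col]):
            continue
        # Share of each value among non-missing entries, as a percentage.
        freq = (df[col].value_counts(normalize=True) * 100).round(2)
        # Values below the threshold are candidates for binning.
        rare = freq[freq < binning_threshold].index.tolist()
        results[col] = {
            "n_unique": int(df[col].nunique()),
            "freq": freq,
            "rare": rare,
        }
    return results


# Toy data invented for this sketch: 70 "S", 29 "C", 1 "Q".
toy = pd.DataFrame({"port": ["S"] * 70 + ["C"] * 29 + ["Q"], "fare": range(100)})
stats = sketch_cat_stats(toy, binning_threshold=2)

for col, info in stats.items():
    print(f"Column: {col}")
    print(f"Number of unique values: {info['n_unique']}")
    for value, pct in info["freq"].items():
        print(f"{value}: {pct:.2f}%")
    print(f"Below threshold: {info['rare']}")
```

In this toy data only "Q" makes up less than 2% of the rows, so it is the only value flagged. Raising binning_threshold flags more categories; lowering it flags fewer.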