eda_mds.cor_eda
Module Contents
Functions
| cor_eda(dataset, na_handling='drop') | Calculate the correlation between numerical variables in a DataFrame. |
- eda_mds.cor_eda.cor_eda(dataset, na_handling='drop')
Calculate the correlation between numerical variables in a DataFrame.
This function selects the numerical columns of the given DataFrame, handles missing values using the method given by na_handling, computes the correlation coefficient for each pair of numerical columns, and returns the results as a new DataFrame.
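The steps described above can be approximated with plain pandas. The sketch below is an assumption about how such a function might work, not the package's actual implementation: it assumes Pearson correlation (the default of pandas DataFrame.corr), and the name cor_eda_sketch and the sample data are hypothetical.

```python
import pandas as pd


def cor_eda_sketch(dataset: pd.DataFrame, na_handling: str = "drop") -> pd.DataFrame:
    """Hypothetical re-implementation of the documented steps (assumes Pearson)."""
    # Step 1: keep only the numerical columns (ints, floats); text columns are ignored.
    numeric = dataset.select_dtypes(include="number")

    # Step 2: handle missing values according to the requested strategy.
    if na_handling == "drop":
        numeric = numeric.dropna()
    elif na_handling == "mean":
        numeric = numeric.fillna(numeric.mean())
    elif na_handling == "median":
        numeric = numeric.fillna(numeric.median())
    else:
        raise ValueError("na_handling must be 'drop', 'mean' or 'median'")

    # Step 3: pairwise correlation between every pair of numerical columns.
    return numeric.corr()


# Hypothetical mixed-type data: 'city' is non-numerical and is skipped.
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51],
    "salary": [40000, 52000, 81000, 60000, 90000],
    "city": ["A", "B", "A", "C", "B"],
})
print(cor_eda_sketch(df, na_handling="drop"))
```

The result is a square, symmetric matrix indexed by the numerical column names, with 1.0 on the diagonal.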
- Parameters:
dataset (DataFrame) – The DataFrame to be analyzed. It may contain columns of any type; only the numerical columns are used in the correlation.
na_handling (str, optional) – Method for handling missing values (NAs). The following options are available:
  - 'drop': Drop rows with any NAs (default).
  - 'mean': Replace NAs with the mean value of the column.
  - 'median': Replace NAs with the median value of the column.
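To make the three options concrete, the sketch below applies the pandas equivalents (dropna, and fillna with column means or medians) to a toy frame. It assumes cor_eda treats missing values the same way as these pandas calls, which this page does not confirm; the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing age in row 2.
df = pd.DataFrame({"age": [20.0, 30.0, np.nan, 70.0],
                   "salary": [1.0, 2.0, 3.0, 4.0]})

dropped = df.dropna()                   # 'drop': row 2 is removed entirely
mean_filled = df.fillna(df.mean())      # 'mean': NaN -> mean of 20, 30, 70 = 40.0
median_filled = df.fillna(df.median())  # 'median': NaN -> median of 20, 30, 70 = 30.0

print(len(dropped), mean_filled.loc[2, "age"], median_filled.loc[2, "age"])
# 3 40.0 30.0
```

'drop' shortens the data used for every pair of columns, while 'mean' and 'median' keep all rows at the cost of inserting values that were never observed.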
- Returns:
A DataFrame containing the correlation coefficients between each pair of numerical variables.
- Return type:
DataFrame
Examples
>>> cor_eda(data, na_handling='mean')
           age  salary
age     1.0000  0.9769
salary  0.9769  1.0000