eda_mds.cor_eda

Module Contents

Functions

cor_eda(dataset[, na_handling])

Calculate the correlation between numerical variables in a DataFrame.

eda_mds.cor_eda.cor_eda(dataset, na_handling='drop')

Calculate the correlation between numerical variables in a DataFrame.

This function processes a given DataFrame to isolate numerical variables, handles missing values according to the specified method, calculates the correlation between each pair of numerical variables, and returns the results in a new DataFrame.
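The steps described above can be sketched with plain pandas calls. This is a minimal illustration, not the eda_mds source: the name cor_eda_sketch, the sample data, the ValueError on an unknown option, and the use of the Pearson coefficient (the pandas default for DataFrame.corr) are all assumptions.

```python
import pandas as pd


def cor_eda_sketch(dataset, na_handling="drop"):
    """Hypothetical stand-in for cor_eda; not the library's actual code."""
    # Step 1: keep only the numerical columns.
    numeric = dataset.select_dtypes(include="number")

    # Step 2: handle missing values according to na_handling.
    if na_handling == "drop":
        numeric = numeric.dropna()
    elif na_handling == "mean":
        numeric = numeric.fillna(numeric.mean())
    elif na_handling == "median":
        numeric = numeric.fillna(numeric.median())
    else:
        # Assumption: the real function's behavior here is not documented.
        raise ValueError(f"unknown na_handling: {na_handling!r}")

    # Step 3: pairwise correlation (Pearson by default in pandas).
    return numeric.corr()


# Made-up sample data with one non-numerical column and two missing values.
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Di"],
    "age": [25, 32, None, 51],
    "salary": [40000.0, 52000.0, 61000.0, None],
})
result = cor_eda_sketch(data, na_handling="mean")
print(result)
```

The result is a square, symmetric DataFrame indexed by the numerical column names, with 1.0 on the diagonal; the non-numerical name column does not appear in it.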

Parameters:
  • dataset (DataFrame) – The DataFrame to be analyzed. It may contain columns of any type; only the numerical columns are used in the correlation.

  • na_handling (str, optional) – Method for handling missing values (NAs). The following options are available:

      - 'drop': Drop rows with any NAs (default).
      - 'mean': Replace NAs with the mean value of the column.
      - 'median': Replace NAs with the median value of the column.
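To make the three options concrete, the following sketch applies the equivalent pandas operations to a single made-up column. The DataFrame and variable names are hypothetical, and the pandas calls are assumed to mirror what each option does rather than taken from the library's code.

```python
import pandas as pd

# Made-up column with one missing value (row index 2).
ages = pd.DataFrame({"age": [20.0, 30.0, None, 100.0]})

# 'drop': the row containing the NA is removed, leaving three rows.
dropped = ages.dropna()

# 'mean': the NA becomes the column mean, (20 + 30 + 100) / 3 = 50.0.
mean_filled = ages.fillna(ages.mean())

# 'median': the NA becomes the column median, 30.0.
median_filled = ages.fillna(ages.median())

print(len(dropped), mean_filled.loc[2, "age"], median_filled.loc[2, "age"])
```

Because the outlier 100 pulls the mean well above the median, the choice of option can noticeably change the resulting correlations when a column is skewed.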

Returns:

A DataFrame containing the correlation coefficients between each pair of numerical variables.

Return type:

DataFrame

Examples

>>> cor_eda(data, na_handling='mean')
           age  salary
age     1.0000  0.9769
salary  0.9769  1.0000