The corrr package in R: A powerful tool for exploring correlations

corrr is an R package for exploring correlations. It simplifies the routine tasks that come up when working with correlation matrices.

Author

Franklin Santos

Affiliation

AgriTech Bolivia

Published

September 20, 2023

Introduction

Correlations are a fundamental tool for data analysis: they measure the strength and direction of the relationship between two variables. The corrr package in R simplifies routine tasks when exploring correlation matrices, such as:

  • Ignoring the diagonal
  • Focusing on the correlations of certain variables against others
  • Rearranging and visualizing the matrix in terms of the strength of the correlations

The corrr package also provides a number of functions for working with correlation data frames, including:

  • Filtering and sorting
  • Computing descriptive statistics
  • Creating plots

Body

This blog post will provide an overview of how to use the corrr package in R to explore correlations. We will cover the following topics:

  • Installing and loading the corrr package
  • Creating a correlation data frame
  • Filtering and sorting the correlation data frame
  • Computing descriptive statistics for the correlation data frame
  • Creating plots of the correlation data frame

Conclusion

The corrr package is a powerful tool for exploring correlations in R. It is highly recommended for anyone who works with correlation data.

Example:

Here is an example of how to use the corrr package to compute, format, and plot the correlations in the airquality dataset, followed by a network plot of the mtcars dataset:

# Install the corrr package (only needed once)
install.packages("corrr")
# Load the tidyverse (provides the %>% pipe) and the corrr package
library(tidyverse)
library(corrr)

Data

data("airquality")
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Compute correlation matrix

res.cor <- correlate(airquality)
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'
res.cor
# A tibble: 6 × 7
  term      Ozone Solar.R    Wind   Temp    Month      Day
  <chr>     <dbl>   <dbl>   <dbl>  <dbl>    <dbl>    <dbl>
1 Ozone   NA       0.348  -0.602   0.698  0.165   -0.0132 
2 Solar.R  0.348  NA      -0.0568  0.276 -0.0753  -0.150  
3 Wind    -0.602  -0.0568 NA      -0.458 -0.178    0.0272 
4 Temp     0.698   0.276  -0.458  NA      0.421   -0.131  
5 Month    0.165  -0.0753 -0.178   0.421 NA       -0.00796
6 Day     -0.0132 -0.150   0.0272 -0.131 -0.00796 NA      
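Because correlate() returns a tibble, the result can be filtered and sorted with ordinary dplyr verbs. The following is a minimal sketch rather than a canonical workflow: it uses corrr's focus() to keep only the correlations with Ozone, then sorts them by absolute strength. The object name ozone_cor is chosen here for illustration only.

```r
library(dplyr)
library(corrr)

# quiet = TRUE suppresses the message about the method and missing-value handling
res.cor <- correlate(airquality, quiet = TRUE)

# focus() keeps the Ozone column and drops the Ozone row, so every
# remaining row is another variable's correlation with Ozone
ozone_cor <- res.cor %>%
  focus(Ozone) %>%
  arrange(desc(abs(Ozone)))   # strongest first, regardless of sign

ozone_cor

# Plain dplyr filtering also works: variables whose correlation
# with Temp is stronger than 0.4 in absolute value
res.cor %>%
  filter(abs(Temp) > 0.4) %>%
  select(term, Temp)
```

Sorting on the absolute value keeps strong negative correlations, such as the one between Ozone and Wind, near the top instead of at the bottom.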

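The fashion() function formats the correlations for easier reading: it rounds the values to two decimals, drops the leading zeros, and leaves missing values blank: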

res.cor %>%
  fashion()
     term Ozone Solar.R Wind Temp Month  Day
1   Ozone           .35 -.60  .70   .16 -.01
2 Solar.R   .35         -.06  .28  -.08 -.15
3    Wind  -.60    -.06      -.46  -.18  .03
4    Temp   .70     .28 -.46        .42 -.13
5   Month   .16    -.08 -.18  .42       -.01
6     Day  -.01    -.15  .03 -.13  -.01     
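The same display can be tidied further. The sketch below combines rearrange(), which reorders the variables so that strongly correlated ones sit next to each other, with shave(), which sets the upper triangle to missing so that each pair appears only once. Both functions are used with their default settings, and the name tidy_view is illustrative.

```r
library(dplyr)
library(corrr)

tidy_view <- airquality %>%
  correlate(quiet = TRUE) %>%
  rearrange() %>%   # group highly correlated variables together
  shave()           # set the upper triangle to NA (the default)

# fashion() prints the NA cells as blanks
fashion(tidy_view)
```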

Create a correlation network

The corrr function network_plot() draws the correlations as a network, which makes them easier to explore visually.

airquality %>%
  correlate() %>%
  network_plot(min_cor = 0.3)
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'

The min_cor argument is a number between 0 and 1 that sets the minimum absolute correlation a pair of variables must reach for its path to be plotted.
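To see which pairs a given threshold would keep, the correlations can be converted to long format with stretch() and filtered by hand. This is a sketch of the idea, not a reproduction of what network_plot() does internally. The name strong_pairs is illustrative, and whether a value exactly equal to the threshold is plotted is an assumption here.

```r
library(dplyr)
library(corrr)

strong_pairs <- airquality %>%
  correlate(quiet = TRUE) %>%
  shave() %>%                    # keep each pair only once
  stretch(na.rm = TRUE) %>%      # long format with columns x, y and r
  filter(abs(r) >= 0.3) %>%      # same threshold as min_cor = 0.3
  arrange(desc(abs(r)))

strong_pairs
```

Raising the threshold in filter() shows in advance how many paths a larger min_cor would leave in the plot.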

Each point represents a variable. Variables that are highly correlated are clustered together. The positioning of the variables is handled by multidimensional scaling of the absolute values of the correlations.
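The positioning can be approximated in base R. The sketch below assumes that the distance between two variables is one minus the absolute correlation and applies classical multidimensional scaling with cmdscale(). It illustrates the principle only; corrr's internal computation may differ in its details, and the names cor_mat, dist_mat and coords are illustrative.

```r
# Pairwise-complete Pearson correlations, matching the defaults of correlate()
cor_mat <- cor(airquality, use = "pairwise.complete.obs")

# Convert strength to distance: |r| = 1 gives 0, |r| = 0 gives 1
dist_mat <- 1 - abs(cor_mat)

# Classical multidimensional scaling into two dimensions
coords <- cmdscale(as.dist(dist_mat), k = 2)
colnames(coords) <- c("dim1", "dim2")

# One row of coordinates per variable
coords
```

Because the sign is discarded before scaling, two variables with a strong negative correlation end up as close together as two with an equally strong positive one.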

For example, it can be seen from the above plot that the variables Ozone, Wind and Temp are clustering together (which makes sense).

Each path represents a correlation between the two variables that it joins. A blue path represents a positive correlation, and a red path a negative correlation.

The width and transparency of the path represent the strength of the correlation (wider and less transparent = stronger correlation).

For example, it can be seen that the positive correlation between Ozone and Temp is stronger than the positive correlation between Ozone and Solar.R.
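This visual comparison can be checked against the numbers. The short base R sketch below recomputes the two correlations using only the observations where both variables are present. The names r_temp and r_solar are illustrative.

```r
# Correlation of Ozone with Temp and with Solar.R, skipping incomplete pairs
r_temp  <- cor(airquality$Ozone, airquality$Temp,    use = "pairwise.complete.obs")
r_solar <- cor(airquality$Ozone, airquality$Solar.R, use = "pairwise.complete.obs")

# Compare the two coefficients side by side
round(c(Temp = r_temp, Solar.R = r_solar), 3)
```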

Cleaning up the correlation network

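A network becomes crowded when a dataset has many correlated variables, as with the 11 variables of the mtcars dataset. We can clean it up by increasing min_cor, thus plotting fewer correlation paths: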

mtcars %>%
  correlate() %>%
  network_plot(min_cor = .7)
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'
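Descriptive statistics for the correlation data frame can be computed after converting it to long format. The sketch below is one possible summary, not a built-in corrr feature: for each mtcars variable it reports the mean and maximum absolute correlation and the number of partners with an absolute correlation of at least .7. The column names mean_abs_r, max_abs_r and n_strong are chosen for illustration.

```r
library(dplyr)
library(corrr)

cor_summary <- mtcars %>%
  correlate(quiet = TRUE) %>%
  stretch(na.rm = TRUE) %>%   # both directions of each pair are kept,
                              # so every variable appears in column x
  group_by(x) %>%
  summarise(
    mean_abs_r = mean(abs(r)),
    max_abs_r  = max(abs(r)),
    n_strong   = sum(abs(r) >= 0.7)
  ) %>%
  arrange(desc(mean_abs_r))

cor_summary
```

Variables with a high n_strong are the ones that remain connected in the network after min_cor is raised to .7.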

Citation

BibTeX citation:
@online{santos2023,
  author = {Santos, Franklin},
  title = {The Corrr Package in {R:} {A} Powerful Tool for Exploring
    Correlations},
  date = {2023-09-20},
  url = {https://franklinsantosm.com/posts/corrr},
  langid = {en}
}
For attribution, please cite this work as:
Santos, Franklin. 2023. “The Corrr Package in R: A Powerful Tool for Exploring Correlations.” September 20, 2023. https://franklinsantosm.com/posts/corrr.