# Install the corrr package
install.packages("corrr")
Introduction
Correlation measures the strength and direction of the relationship between two variables, which makes it a fundamental tool for data analysis. The corrr package for R simplifies the routine tasks involved in exploring correlation matrices, such as:
- Ignoring the diagonal
- Focusing on the correlations of certain variables against others
- Rearranging and visualizing the matrix in terms of the strength of the correlations
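As a rough illustration of these tasks, the sketch below uses `correlate()`, `focus()`, and `rearrange()` from corrr on the built-in mtcars data. The choice of dataset and of the `mpg` and `hp` columns is only for illustration, and dplyr is loaded here just to make sure the pipe is available.

```r
library(dplyr)
library(corrr)

# Correlation data frame for mtcars; the diagonal is set to NA by default
cors <- correlate(mtcars, quiet = TRUE)

# Keep only the mpg and hp columns, with the remaining variables as rows
focused <- cors %>% focus(mpg, hp)
focused

# Reorder the variables so that strongly correlated ones sit near each other
ordered <- cors %>% rearrange()
ordered
```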
The corrr package also provides a number of functions for working with correlation data frames, including:
- Filtering and sorting
- Computing descriptive statistics
- Creating plots
Body
This blog post provides an overview of how to use the corrr package in R to explore correlations. We will cover the following topics:
- Installing and loading the corrr package
- Creating a correlation data frame
- Filtering and sorting the correlation data frame
- Computing descriptive statistics for the correlation data frame
- Creating plots of the correlation data frame
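The filtering, sorting, and summarising steps listed above could look something like the following sketch. It assumes the long format returned by `stretch()` (columns `x`, `y`, and `r`) and uses dplyr verbs for the rest; the 0.3 cutoff is an arbitrary value chosen for illustration.

```r
library(dplyr)
library(corrr)

pairs <- correlate(airquality, quiet = TRUE) %>%
  shave() %>%                  # keep the lower triangle so each pair appears once
  stretch(na.rm = TRUE) %>%    # long format with columns x, y, r
  filter(abs(r) >= 0.3) %>%    # keep moderately strong correlations (illustrative cutoff)
  arrange(desc(abs(r)))        # strongest first, regardless of sign

pairs

# Simple descriptive statistics over the retained correlations
summarise(pairs, n_pairs = n(), mean_abs_r = mean(abs(r)))
```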
Conclusion
The corrr package is a powerful tool for exploring correlations in R. It is highly recommended for anyone who works with correlation data.
Example:
Here is an example of how to use the corrr package to compute, format, and plot the correlations in the airquality dataset:
# Load the tidyverse and corrr packages
library(tidyverse)
library(corrr)
Data
data("airquality")
head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
Compute correlation matrix
res.cor <- correlate(airquality)
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'
res.cor
# A tibble: 6 × 7
term Ozone Solar.R Wind Temp Month Day
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Ozone NA 0.348 -0.602 0.698 0.165 -0.0132
2 Solar.R 0.348 NA -0.0568 0.276 -0.0753 -0.150
3 Wind -0.602 -0.0568 NA -0.458 -0.178 0.0272
4 Temp 0.698 0.276 -0.458 NA 0.421 -0.131
5 Month 0.165 -0.0753 -0.178 0.421 NA -0.00796
6 Day -0.0132 -0.150 0.0272 -0.131 -0.00796 NA
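The message above reports the defaults (Pearson, pairwise-complete observations). As a small sketch, the `method` argument of `correlate()` can be switched to rank correlations, and one cell can be cross-checked against base R's `cor()`. The names `res.spearman`, `from_corrr`, and `from_base` are introduced here only for illustration.

```r
library(corrr)

# Spearman rank correlations, still using pairwise-complete observations
res.spearman <- correlate(airquality, method = "spearman", quiet = TRUE)

# Pull the Ozone/Temp cell out of the correlation data frame
from_corrr <- res.spearman$Temp[res.spearman$term == "Ozone"]

# Same quantity computed directly with base R
from_base <- cor(airquality$Ozone, airquality$Temp,
                 use = "pairwise.complete.obs", method = "spearman")

all.equal(from_corrr, from_base)
```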
The fashion() function formats the correlations for more pleasant viewing:
res.cor %>% fashion()
     term Ozone Solar.R Wind Temp Month  Day
1   Ozone           .35 -.60  .70   .16 -.01
2 Solar.R   .35         -.06  .28  -.08 -.15
3    Wind  -.60    -.06      -.46  -.18  .03
4    Temp   .70     .28 -.46        .42 -.13
5   Month   .16    -.08 -.18  .42       -.01
6     Day  -.01    -.15  .03 -.13  -.01
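Because the matrix is symmetric, it can be easier to read with the upper triangle removed. The sketch below combines `shave()` with `fashion()` and its `decimals` argument; the choice of one decimal place is only for illustration.

```r
library(dplyr)
library(corrr)

lower <- correlate(airquality, quiet = TRUE) %>%
  shave() %>%              # set the upper triangle to NA
  fashion(decimals = 1)    # print with one decimal place and blanks for NA

lower
```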
Create a correlation network
The R function network_plot() can be used to visualize and explore correlations.
airquality %>% correlate() %>%
  network_plot(min_cor = 0.3)
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'
The option min_cor sets the minimum absolute correlation value required for a correlation to be plotted.
Each point represents a variable. Variables that are highly correlated are clustered together. The positioning of variables is handled by multidimensional scaling of the absolute values of the correlations.
For example, it can be seen from the above plot that the variables Ozone, Wind and Temp cluster together (which makes sense).
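The positioning step can be approximated in base R. This is only a sketch of the idea, not necessarily what network_plot() does internally: it assumes the dissimilarity between two variables is one minus their absolute correlation and applies classical multidimensional scaling with `cmdscale()`.

```r
# Pearson correlations with pairwise-complete observations (base R)
r <- cor(airquality, use = "pairwise.complete.obs")

# Assumed dissimilarity: strongly correlated variables (of either sign) are close
d <- 1 - abs(r)

# Classical multidimensional scaling down to two plotting coordinates
coords <- cmdscale(as.dist(d), k = 2)
colnames(coords) <- c("x", "y")
coords
```

Variables that end up with nearby coordinates are the ones that would appear clustered in such a layout.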
Each path represents a correlation between the two variables that it joins. Blue represents a positive correlation, and red represents a negative correlation.
The width and transparency of the path represent the strength of the correlation (wider and less transparent = stronger correlation).
For example, it can be seen that the positive correlation between Ozone and Temp is stronger than the positive correlation between Ozone and Solar.R.
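The same comparison can be checked numerically. As a sketch, `focus()` extracts the Ozone column and dplyr's `arrange()` sorts the other variables by the absolute size of their correlation with it; `ozone_cors` is a name introduced here for illustration.

```r
library(dplyr)
library(corrr)

ozone_cors <- correlate(airquality, quiet = TRUE) %>%
  focus(Ozone) %>%               # correlations of every other variable with Ozone
  arrange(desc(abs(Ozone)))      # strongest relationship first

ozone_cors
```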
Cleaning up the correlation network
We can clean this up by increasing min_cor, thus plotting fewer correlation paths:
mtcars %>% correlate() %>%
  network_plot(min_cor = .7)
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'
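To get a feel for how much a higher threshold thins out the network, one option is to count the variable pairs whose absolute correlation reaches each cutoff. The sketch below assumes that a path is drawn for each such pair; the names `all_pairs` and `n_paths` are hypothetical.

```r
library(dplyr)
library(corrr)

# One row per unique pair of mtcars variables, with its correlation in column r
all_pairs <- correlate(mtcars, quiet = TRUE) %>%
  shave() %>%
  stretch(na.rm = TRUE)

# Number of pairs that would pass each min_cor threshold
n_paths <- sapply(c(0.3, 0.7), function(cutoff) sum(abs(all_pairs$r) >= cutoff))
names(n_paths) <- c("min_cor = 0.3", "min_cor = 0.7")
n_paths
```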
Citation
@online{santos2023,
author = {Santos, Franklin},
title = {The Corrr Package in {R:} {A} Powerful Tool for Exploring
Correlations},
date = {2023-09-20},
url = {https://franklinsantosm.com/posts/corrr},
langid = {en}
}