IC_clean_data | R Documentation |
This function must be used if missing values are present in the dataset.
It ensures that all correlations and partial correlations can be calculated.
The columns of the dataframe are removed one by one until all correlations and partial correlations can be calculated without error.
One or more columns can be declared as retained (see variable.retain) because they are of particular importance in the analysis.
The use and method parameters are passed to the cor() function. By default, the function uses parallel computing on Unix and macOS systems.
If progress is TRUE and the package pbmcapply is installed, a progress bar is displayed. If debug is TRUE, some information is shown during the process.
https://fr.wikipedia.org/wiki/Iconographie_des_corrélations
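The column-removal idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by IC_clean_data: the helper names failure_counts() and drop_until_computable() are hypothetical, only first-order partial correlations are checked, and the rule "drop the column involved in the most failures" is an assumed heuristic.

```r
# Hypothetical helper (not part of HelpersMG): for each column, count the
# correlations and first-order partial correlations that are not finite.
failure_counts <- function(df, use = "pairwise.complete.obs", method = "pearson") {
  r <- suppressWarnings(cor(df, use = use, method = method))
  p <- ncol(r)
  counts <- setNames(integer(p), colnames(r))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      if (!is.finite(r[i, j])) {
        counts[c(i, j)] <- counts[c(i, j)] + 1L
        next
      }
      for (k in setdiff(seq_len(p), c(i, j))) {
        # Partial correlation between columns i and j, controlling for k
        pc <- (r[i, j] - r[i, k] * r[j, k]) /
          sqrt((1 - r[i, k]^2) * (1 - r[j, k]^2))
        if (!is.finite(pc)) counts[c(i, j, k)] <- counts[c(i, j, k)] + 1L
      }
    }
  }
  counts
}

# Hypothetical helper: drop non-retained columns one by one until nothing fails.
drop_until_computable <- function(df, retain = NULL) {
  # Assumption of this sketch: only numeric columns are considered
  df <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  while (ncol(df) >= 2) {
    counts <- failure_counts(df)
    if (all(counts == 0L)) break
    candidates <- counts[setdiff(names(counts), retain)]
    if (length(candidates) == 0L || max(candidates) == 0L) {
      stop("Remaining failures involve only retained columns")
    }
    worst <- names(candidates)[which.max(candidates)]
    df <- df[, names(df) != worst, drop = FALSE]
  }
  df
}

toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(NA, NA, NA, NA, NA, 1),  # a single value: no correlation can be computed
  d = c(5, 3, 6, 2, 1, 4)
)
cleaned <- drop_until_computable(toy)
names(cleaned)
```

In this toy dataset, column c takes part in every failing computation, so it is the one removed.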
IC_clean_data(
data = stop("A dataframe object is required"),
use = c("pairwise.complete.obs", "everything", "all.obs", "complete.obs",
"na.or.complete"),
method = c("pearson", "kendall", "spearman"),
variable.retain = NULL,
test.partial.correlation = TRUE,
progress = TRUE,
debug = FALSE
)
data |
The data.frame to be cleaned |
use |
an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". |
method |
a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated. |
variable.retain |
a character vector with the names of the columns to keep |
test.partial.correlation |
should the partial correlations be tested? |
progress |
Show a progress bar |
debug |
if TRUE, information about the progress of the cleaning is shown |
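Since use is passed to cor(), its effect can be checked directly with base R. The following minimal sketch uses a small made-up data frame x (an assumption for illustration only) to show how three of the accepted values treat missing values, and that "all.obs" raises an error when any value is missing.

```r
# Made-up data for illustration: u and w each contain one missing value
x <- data.frame(
  u = c(1, 2, 3, 4, NA),
  v = c(2, 4, 5, 4, 5),
  w = c(5, NA, 3, 2, 1)
)

# "everything": any pair involving a column with NA yields NA
r_everything <- cor(x, use = "everything")

# "complete.obs": only rows without any NA are used (rows 1, 3 and 4 here)
r_complete <- cor(x, use = "complete.obs")

# "pairwise.complete.obs": each pair uses the rows complete for that pair
r_pairwise <- cor(x, use = "pairwise.complete.obs")

# "all.obs": missing values produce an error
all_obs_fails <- inherits(try(cor(x, use = "all.obs"), silent = TRUE), "try-error")

r_everything["u", "v"]
r_complete["u", "v"]
r_pairwise["u", "v"]
all_obs_fails
```

With "pairwise.complete.obs", the default here, different entries of the correlation matrix can be based on different subsets of rows.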
IC_clean_data checks and corrects the dataframe to be used with IC_threshold_matrix
A dataframe
Marc Girondot marc.girondot@gmail.com
Lesty, M., 1999. Une nouvelle approche dans le choix des régresseurs de la régression multiple en présence d’interactions et de colinéarités. Revue de Modulad 22, 41-77.
Other Iconography of correlations: IC_correlation_simplify(), IC_threshold_matrix(), plot.IconoCorel()
## Not run:
library("HelpersMG")
# based on https://fr.wikipedia.org/wiki/Iconographie_des_corrélations
es <- structure(list(Student = c("e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"),
                     Mass = c(52, 59, 55, 58, 66, 62, 63, 69),
                     Age = c(12, 12.5, 13, 14.5, 15.5, 16, 17, 18),
                     Assiduity = c(12, 9, 15, 5, 11, 15, 12, 9),
                     Note = c(5, 5, 9, 5, 13.5, 18, 18, 18)),
                row.names = c(NA, -8L), class = "data.frame")
es
df_clean <- IC_clean_data(es, debug = TRUE)
cor_matrix <- IC_threshold_matrix(data=df_clean, threshold = NULL, progress=FALSE)
cor_threshold <- IC_threshold_matrix(data=df_clean, threshold = 0.3)
plot(cor_threshold, show.legend.strength=FALSE, show.legend.direction = FALSE)
cor_threshold_Note <- IC_correlation_simplify(matrix=cor_threshold, variable="Note")
plot(cor_threshold_Note, show.legend.strength=FALSE, show.legend.direction = FALSE)
cor_threshold <- IC_threshold_matrix(data=df_clean, threshold = 0.6)
plot(cor_threshold,
     layout = matrix(data = c(53, 53, 55, 55,
                              55, 53, 55, 53), ncol = 2, byrow = FALSE),
     show.legend.direction = FALSE,
     show.legend.strength = FALSE, xlim = c(-2, 2), ylim = c(-2, 2))
## End(Not run)