plot_corr: Plots to explore the correlations between features


View source: R/corrExplorationPlots.R

Description

Plot the correlations of one variable x against all the other variables. The workflow is:

1. Remove small groups, if a minimum group size is defined (min.group.size.x, min.group.size.y).
2. Calculate the p values for all pairs of variables. The tests are non-parametric by default.
3. Select the pairs that pass the p value threshold for plotting. Optionally, p value adjustment can be applied before thresholding.
4. Calculate the adjusted p values (padj) and save the plots and p value tables.
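The workflow above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it uses the built-in iris data with Species as a categorical x, the helper drop_small_groups is hypothetical, and it produces a p value table without any plots.

```r
# Minimal base-R sketch of the workflow; not the cjbmisc implementation.
df <- iris
x.coln <- "Species"
y.coln <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
min.group.size <- 3
signif.cutoff <- 0.05

# Step 1: drop groups of x with fewer than min.group.size samples.
# drop_small_groups is a hypothetical helper, not part of the package.
drop_small_groups <- function(d, col, min.size) {
  n <- table(d[[col]])
  keep <- names(n)[n >= min.size]
  d <- d[d[[col]] %in% keep, , drop = FALSE]
  d[[col]] <- droplevels(factor(d[[col]]))
  d
}
df <- drop_small_groups(df, x.coln, min.group.size)

# Step 2: non-parametric p value for x against each y.
pvalues <- vapply(
  y.coln,
  function(y) kruskal.test(df[[y]], df[[x.coln]])$p.value,
  numeric(1)
)

# Step 3: keep only the pairs passing the cutoff (candidates for plotting).
signif.y <- names(pvalues)[pvalues < signif.cutoff]

# Step 4: adjust for multiple testing and collect a table of results.
pv.table <- data.frame(
  y = names(pvalues),
  pvalue = unname(pvalues),
  padj = unname(p.adjust(pvalues, method = "bonferroni"))
)
print(pv.table)
```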

Usage

plot_corr_one(
  plotdf,
  x.coln,
  y.coln,
  cat.num.test = "kruskal.test",
  num.num.test = "spearman",
  plot.it = F,
  plot.nrow = NULL,
  plot.ncol = NULL,
  signif.cutoff = 0.05,
  plot.stattest = "np",
  p.adj.method = NULL,
  plot.signif.only = F,
  plot.max = 40,
  min.group.size.x = 3,
  min.group.size.y = 3,
  seed = 999,
  outpdir = NULL,
  plot.w = 7,
  plot.h = 7.5,
  fn.suffix = "",
  ...
)

plot_corr(
  plotdf,
  x.coln,
  y.coln = NULL,
  cat.num.test = "kruskal.test",
  cat.cat.test = "both",
  num.num.test = "spearman",
  plot.it = F,
  plot.nrow = NULL,
  plot.ncol = NULL,
  signif.cutoff = 0.05,
  plot.stattest = "np",
  plot.signif.only = F,
  p.adj.method.each = NULL,
  p.adj.method.all = "bonferroni",
  plot.max = 50,
  seed = 999,
  outpdir = NULL,
  plot.w = 7,
  plot.h = 7.5,
  min.group.size.x = 3,
  min.group.size.y = 3,
  fn.suffix = "",
  ...
)

Arguments

plotdf

dataframe with rows of samples and columns of features.

x.coln

column name of the x axis of the plot

y.coln

character vector of the column names of features to be plotted as y axis

cat.num.test

the significance test to be used for categorical vs numerical variables. Give the name of a base R test function (default "kruskal.test").
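One way a test given by name could be resolved and applied is with match.fun. This is a sketch that assumes the named function accepts a formula and returns an object with a p.value element, as kruskal.test does; how the package dispatches the test internally is not shown on this page.

```r
# Sketch: resolve a base-R test by its name and extract the p value.
cat.num.test <- "kruskal.test"
test.fun <- match.fun(cat.num.test)

# Numeric variable (mpg) against a categorical one (cyl) in mtcars.
d <- data.frame(mpg = mtcars$mpg, cyl = factor(mtcars$cyl))
res <- test.fun(mpg ~ cyl, data = d)
res$p.value
```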

num.num.test

the significance test to be used for numerical vs numerical variables. Should be "spearman" (default), "pearson", "kendall", or "lm" (which uses the p value of the independent variable in lm).
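The four options can be illustrated with base R. The helper num_num_pvalue below is hypothetical and only shows what each option might compute: cor.test for the three correlation methods, and the slope p value from lm for "lm".

```r
# Sketch: p value for two numeric variables under each documented option.
# num_num_pvalue is a hypothetical helper, not a cjbmisc function.
num_num_pvalue <- function(x, y, test = "spearman") {
  if (test == "lm") {
    # p value of the slope, i.e. the independent variable in y ~ x
    coefs <- summary(lm(y ~ x))$coefficients
    return(coefs["x", "Pr(>|t|)"])
  }
  # exact = FALSE avoids the ties warning for the rank-based methods
  cor.test(x, y, method = test, exact = FALSE)$p.value
}

tests <- c("spearman", "pearson", "kendall", "lm")
pv <- vapply(
  tests,
  function(m) num_num_pvalue(mtcars$wt, mtcars$mpg, m),
  numeric(1)
)
pv
```

For a single predictor, the "lm" slope test and the Pearson correlation test are the same t test, so those two p values agree.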

plot.it

Whether to draw the plots (T/F)

plot.nrow, plot.ncol

The number of rows and columns in the combined plot

plot.stattest

Passed to the "type" parameter of ggbarstats, ggbetweenstats and ggscatterstats, defining the statistical test to be used. The default "np" is non-parametric.

plot.signif.only

whether to plot only the significant items

plot.max

maximum number of plots to draw. If set to NULL, all plots are drawn.

min.group.size.x

for categorical x, remove groups that are smaller than this number

min.group.size.y

for categorical y, remove groups that are smaller than this number

outpdir

If not NULL, save all plots and pvalues (as table) to the outpdir

plot.w, plot.h

Width and height of each individual plot

fn.suffix

filename suffix

...

passed to ggbarstats, ggbetweenstats and ggscatterstats

cat.cat.test

the significance test to be used for categorical vs categorical variables. Should be "fisher", "chi" or "both" (default)
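As an illustration of the two tests named here, the sketch below computes both p values with base R on a contingency table from mtcars. It assumes "fisher" corresponds to fisher.test and "chi" to chisq.test; how the package combines or reports them under "both" is not specified on this page.

```r
# Sketch: Fisher's exact and chi-squared p values for two categorical variables.
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

pv <- c(
  fisher = fisher.test(tab)$p.value,
  # chisq.test warns when expected counts are small; suppressed for this sketch
  chi = suppressWarnings(chisq.test(tab)$p.value)
)
pv
```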

p.adj.method, p.adj.method.each, p.adj.method.all

p value adjustment method. Should follow ggbetweenstats. In plot_corr_one, if p.adj.method is specified, the p values are adjusted and only the pairs that pass the padj threshold are plotted. In plot_corr, p.adj.method.each is passed to plot_corr_one, while p.adj.method.all is used to generate the final padj table, adjusting across all p values.
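The difference between adjusting within each call and adjusting across all p values can be seen with stats::p.adjust. The p values below are made up for illustration, and grouping them into two sets (xA, xB) is an assumption about how results from separate plot_corr_one calls might be collected.

```r
# Sketch: per-call adjustment versus adjustment over all p values.
# The p values and the grouping into xA / xB are made up for illustration.
pv.by.call <- list(
  xA = c(y1 = 0.001, y2 = 0.020, y3 = 0.300),
  xB = c(y1 = 0.004, y2 = 0.049)
)

# "each": adjust only within the tests of a single call
padj.each <- lapply(pv.by.call, p.adjust, method = "BH")

# "all": pool every p value, then adjust once
padj.all <- p.adjust(unlist(pv.by.call), method = "bonferroni")

padj.each
padj.all
```

Pooling makes the correction stricter, since every p value is adjusted for all five tests rather than only for the tests in its own group.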

Value

For plot_corr: a list of the values returned by plot_corr_one. For plot_corr_one: a list of two elements, "plot", a ggarrange object that arranges all plots into one, and "pvalues", a named vector.


brightchan/cjbmisc documentation built on Nov. 5, 2021, 4:12 p.m.