report.PCA: Report PCA


View source: R/PCA_report.R

Description

Runs a PCA and, optionally, plots the results and computes the Davies-Bouldin index.

Usage

report.PCA(data, what = c("cast.and.pca", "plot.pca", "plot.variance",
  "plot.extremes", "plot.loading", "DBindex"), measure.var = NULL,
  condition.var = NULL, time.var = NULL, center.pca = T,
  scale.pca = F, PC.db = c(1, 2), var.db = NULL, PC.biplot = c(1,
  2), PC.extremes = 1, PC.loading = PC.biplot, log.loading = F,
  na.fill = NULL, label.color.pca = NULL, group.db = label.color.pca,
  n.extremes = NULL, ...)

Arguments

data

A data table which contains time series in long format or a numeric matrix with time series per row.

what

A character vector describing how the data should be handled before running PCA, as well as which representations should be plotted. Valid instructions are: c("cast.and.pca", "nocast.and.pca", "pca.only", "plot.pca", "plot.variance", "plot.extremes", "plot.loading", "DBindex"). If data is a data.table in long format, use "cast.and.pca"; if data is a matrix, use "nocast.and.pca". The "plot.xxx" settings call biplot.PCA and visualize.extremes.PCA.

measure.var

Character. Column name of the measurement that defines the time series in time.

condition.var

Character vector. Column names used for casting from long to wide. Should also contain the name of the column used for coloring the PCA plot, if requested. Together, these variables must unambiguously identify a single trajectory in the long data table.
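The identification requirement can be checked before casting. The sketch below uses base R only and toy data; the column names and the casting approach are illustrative assumptions, not the package's internal code (which likely relies on data.table).

```r
# Toy long-format data: 2 conditions x 2 labels x 3 time points (hypothetical names)
long <- expand.grid(Time = 0:2, Label = 1:2, Condition = c("A", "B"),
                    stringsAsFactors = FALSE)
long$Measure <- seq_len(nrow(long))

condition.var <- c("Condition", "Label")
time.var <- "Time"

# Each (condition.var, time.var) combination must occur at most once;
# otherwise a trajectory is not uniquely identified
stopifnot(!anyDuplicated(long[, c(condition.var, time.var)]))

# One id per trajectory, then one row per trajectory and one column per time point
id <- interaction(long[, condition.var], drop = TRUE)
wide <- tapply(long$Measure, list(id, long[[time.var]]), FUN = function(v) v[1])
dim(wide)  # 4 trajectories x 3 time points
```

If "Label" were omitted from condition.var here, the duplicate check would fail, because two trajectories would share the same (Condition, Time) pairs.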

time.var

Character. Column name of the time measure.

center.pca

Logical. Should variables be centered before running PCA? Default is TRUE.

scale.pca

Logical. Should variables be scaled before running PCA? Default is FALSE according to the usage above; this default may change in future versions.
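To illustrate why scaling matters, here is a minimal sketch with stats::prcomp on toy data. It assumes that center.pca and scale.pca behave like the center and scale. arguments of prcomp; the source does not confirm which PCA routine report.PCA calls.

```r
set.seed(1)
# Two toy variables on very different scales
x <- cbind(small = rnorm(50, sd = 1), large = rnorm(50, sd = 100))

pca.unscaled <- prcomp(x, center = TRUE, scale. = FALSE)
pca.scaled   <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by PC1 in each case
prop.unscaled <- pca.unscaled$sdev[1]^2 / sum(pca.unscaled$sdev^2)
prop.scaled   <- pca.scaled$sdev[1]^2 / sum(pca.scaled$sdev^2)

# Without scaling, PC1 is dominated by the high-variance variable
c(unscaled = prop.unscaled, scaled = prop.scaled)
```

For time series measured in the same unit at every time point, leaving the data unscaled keeps the relative amplitude of each time point, which may be why scaling is off in the usage signature.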

PC.db

Numeric vector. PCs on which to compute the Davies-Bouldin index.

var.db

A numeric between 0 and 1. If provided, the Davies-Bouldin index is computed on as many PCs as necessary to reach this value.
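A minimal sketch of how such a threshold could select PCs, assuming that the value refers to the cumulative proportion of explained variance (an interpretation, not confirmed by the source) and using stats::prcomp on toy data:

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)  # toy data: 20 trajectories x 10 time points
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Cumulative proportion of variance explained by the first k PCs
cum.var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

var.db <- 0.8
# Smallest number of PCs whose cumulative variance reaches var.db
n.pc <- which(cum.var >= var.db)[1]
selected.pcs <- seq_len(n.pc)
```

Under this reading, selected.pcs plays the role that PC.db plays when the PCs are given explicitly.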

PC.biplot

Numeric vector of length 2. PCs to use for biplot.

PC.extremes

Numeric, PC from which to plot the extreme individuals.

PC.loading

Numeric, plot loading ('composition') of these PCs.

log.loading

Logical. Should loadings be log-transformed?

na.fill

Value used to replace NA after casting the data table from long to wide.

label.color.pca

Character or vector used for coloring the PCA plot. If data is a long data.table (i.e. 'what' contains "cast.and.pca"), this should be the name of the column used for coloring; note that this column must also be listed in 'condition.var'. If data is a matrix (i.e. 'what' contains "nocast.and.pca"), this should be a vector whose length equals the number of rows in data.

group.db

A character. Variable to be used as grouping factor when computing the Davies-Bouldin index.
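For reference, the standard Davies-Bouldin index can be sketched in base R as follows. This is an illustration of the textbook definition with Euclidean distances; davies.bouldin is a hypothetical helper, and the package's own implementation may differ in its details.

```r
# Hypothetical helper: Davies-Bouldin index of `scores` (rows = individuals,
# columns = PCs) for the grouping given in `groups`. Lower = better separated.
davies.bouldin <- function(scores, groups) {
  scores <- as.matrix(scores)
  groups <- factor(groups)
  lev <- levels(groups)
  # Centroid of each group (one row per group)
  centroids <- do.call(rbind, lapply(lev, function(g) {
    colMeans(scores[groups == g, , drop = FALSE])
  }))
  # Scatter: mean Euclidean distance of group members to their centroid
  scatter <- sapply(seq_along(lev), function(i) {
    members <- scores[groups == lev[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(members, 2, centroids[i, ])^2)))
  })
  # Pairwise Euclidean distances between centroids
  sep <- as.matrix(dist(centroids))
  # Similarity ratio for every pair of groups; ignore self-comparisons
  ratio <- outer(scatter, scatter, "+") / sep
  diag(ratio) <- -Inf
  # Average, over groups, of the worst-case ratio
  mean(apply(ratio, 1, max))
}

# Two tight groups, 10 units apart, each with scatter 1
scores <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
groups <- c("A", "A", "B", "B")
db <- davies.bouldin(scores, groups)  # (1 + 1) / 10 = 0.2
```

In the context of report.PCA, scores would correspond to the coordinates of the trajectories on the PCs selected with PC.db or var.db, and groups to the variable given in group.db.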

n.extremes

Numeric. How many extreme trajectories to plot.
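A minimal sketch of what selecting extreme individuals on a PC could look like, assuming that "extreme" means the lowest and highest scores on the chosen PC. This is an interpretation based on the tails option mentioned for the additional parameters, not the package's confirmed logic.

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)  # toy data: 20 trajectories x 10 time points
pca <- prcomp(x, center = TRUE)

PC.extremes <- 1
n.extremes <- 2

# Row indices ordered by increasing score on the chosen PC
ord <- order(pca$x[, PC.extremes])
negative.tail <- head(ord, n.extremes)  # lowest scores
positive.tail <- tail(ord, n.extremes)  # highest scores

# The corresponding rows of x are the trajectories one would plot
extreme.trajectories <- x[c(negative.tail, positive.tail), , drop = FALSE]
```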

...

Additional parameters for biplot.PCA and visualize.extremes. For example, var.axes = F removes the variable arrows, and tails = "positive" plots only the extreme trajectories on the positive tail of the PCs.

Value

If 'what' is set to "pca.only", returns the PCA object. Otherwise, plots the PCA results.

Examples

library(data.table)
library(ggplot2)
# Create some dummy data, imagine 20 time series under four conditions A, B, C or D in a long data.table
number.measure <- 101
mydata <- data.table(Condition = rep(LETTERS[1:4], each = 20*number.measure),
 Label = rep(1:20, each = number.measure),
 Time=rep(seq(0,100), 80))

# A: oscillate around 1 for 10 time units, then shift to oscillation around 1.3
# B: oscillate around 1 for 10 time units, then shift to oscillation around 1.25
# C: oscillate around 1 for 10 time units, then peak to 1.3 and gets back to 1
# D: oscillate around 1 all along trajectory

mydata[Condition=="A", Measure := c(rnorm(10, 1, 0.05), rnorm(91, 1.3, 0.05)), by = "Label"]
mydata[Condition=="B", Measure := c(rnorm(10, 1, 0.05), rnorm(91, 1.25, 0.05)), by = "Label"]
mydata[Condition=="C", Measure := c(rnorm(10, 1, 0.05), rnorm(91, 1.3, 0.05) - seq(0, 0.3, length.out = 91)), by = "Label"]
mydata[Condition=="D", Measure := rnorm(101, 1, 0.05), by = "Label"]
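# Plot all trajectories per condition, with the mean trajectory in red
# (`fun` and `linewidth` replace the deprecated `fun.y` and `size`; requires ggplot2 >= 3.4.0)
ggplot(mydata, aes(x = Time, y = Measure)) + geom_line(aes(group = Label), alpha = 0.3) + facet_wrap("Condition") +
  stat_summary(fun = mean, geom = "line", col = "red", linewidth = 1.25)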

# Cast to wide, run the PCA, draw all plots and compute the Davies-Bouldin index
# on as many PCs as var.db requires
report.PCA(mydata, what = c("cast.and.pca", "plot.variance", "plot.pca", "plot.extremes", "plot.loading", "DBindex"),
  measure.var = "Measure", condition.var = c("Condition", "Label"), time.var = "Time",
  center.pca = TRUE, scale.pca = FALSE,
  PC.biplot = c(1, 2), label.color.pca = "Condition", var.axes = FALSE, n.extremes = 2,
  PC.db = NULL, var.db = 0.8)

majpark21/TSexploreR documentation built on Oct. 16, 2019, 2:46 p.m.