Runs a PCA and, optionally, plots the results and computes the Davies-Bouldin index.
report.PCA(data, what = c("cast.and.pca", "plot.pca", "plot.variance",
  "plot.extremes", "plot.loading", "DBindex"), measure.var = NULL,
  condition.var = NULL, time.var = NULL, center.pca = T,
  scale.pca = F, PC.db = c(1, 2), var.db = NULL, PC.biplot = c(1, 2),
  PC.extremes = 1, PC.loading = PC.biplot, log.loading = F,
  na.fill = NULL, label.color.pca = NULL, group.db = label.color.pca,
  n.extremes = NULL, ...)
data |
A data.table containing time series in long format, or a numeric matrix with one time series per row. |
what |
A character vector describing how the data should be handled before running PCA, as well as which representations should be plotted. Valid instructions are: c("cast.and.pca", "nocast.and.pca", "pca.only", "plot.pca", "plot.variance", "plot.extremes", "plot.loading", "DBindex"). If data is a data.table in long format, use "cast.and.pca"; if data is a matrix, use "nocast.and.pca". The "plot.xxx" settings call biplot.PCA and visualize.extremes.PCA. |
measure.var |
Character. Name of the column holding the measured values that make up the time series. |
condition.var |
Character vector. Column names used for casting from long to wide. Should also contain the name of the column used for coloring the PCA plot, if requested. The combination of these variables must be sufficient to unambiguously identify a single trajectory in the long data.table. |
time.var |
Character. Column name of the time measure. |
center.pca |
Logical. Should variables be centered before running PCA? Default is TRUE. |
scale.pca |
Logical. Should variables be scaled before running PCA? Default is FALSE (as in the usage signature), but this may change. |
PC.db |
Numeric vector. PCs on which to compute the Davies-Bouldin index. |
var.db |
A numeric between 0 and 1. If provided, the Davies-Bouldin index is computed on as many PCs as necessary to reach this value (presumably a cumulative proportion of explained variance). |
PC.biplot |
Numeric vector of length 2. PCs to use for biplot. |
PC.extremes |
Numeric, PC from which to plot the extreme individuals. |
PC.loading |
Numeric, plot loading ('composition') of these PCs. |
log.loading |
Logical. Should the loadings be log-transformed? |
na.fill |
Value used to replace NA after casting the data.table from long to wide. |
label.color.pca |
Character or vector used for coloring the PCA plot. If data is a long data.table (i.e. 'what' contains "cast.and.pca"), this should be the name of the column used for coloring; note that this column must also be listed in 'condition.var'. If data is a matrix (i.e. 'what' contains "nocast.and.pca"), this should be a vector whose length equals the number of rows in data. |
group.db |
Character. Variable to be used as the grouping factor when computing the Davies-Bouldin index. |
n.extremes |
Numeric. How many extreme trajectories to plot. |
... |
Additional parameters passed to the plotting functions, such as biplot.PCA and visualize.extremes. For example, var.axes = F removes the variable arrows, and tails = "positive" plots only the extreme trajectories on the positive tail of the PCs. |
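The arguments above describe casting a long data.table to a wide matrix (one time series per row) before running PCA. The internals of report.PCA are not shown on this page, so the following is only a minimal sketch of that idea using `data.table::dcast` and `stats::prcomp`; the toy column names, the fill value 0 and the mapping of `na.fill`, `center.pca` and `scale.pca` onto these calls are assumptions made for illustration.

```r
library(data.table)

# Toy long-format table (hypothetical column names): 3 trajectories, 4 time points each
set.seed(1)
long <- data.table(
  Condition = rep(c("A", "A", "B"), each = 4),
  Label     = rep(1:3, each = 4),
  Time      = rep(0:3, times = 3),
  Measure   = rnorm(12, mean = 1, sd = 0.05)
)
# Drop one observation so that the cast has a gap to fill
long <- long[!(Label == 2 & Time == 3)]

# Cast long -> wide: one row per (Condition, Label) trajectory, one column per time point.
# `fill` plays the role assumed here for 'na.fill'.
wide <- dcast(long, Condition + Label ~ Time, value.var = "Measure", fill = 0)

# Keep only the time columns to obtain a numeric matrix with one time series per row
time.cols <- setdiff(names(wide), c("Condition", "Label"))
mat <- as.matrix(wide[, time.cols, with = FALSE])

# Centered, unscaled PCA (assumed counterpart of center.pca = TRUE, scale.pca = FALSE)
pca <- prcomp(mat, center = TRUE, scale. = FALSE)
summary(pca)
```

A matrix such as `mat` is the kind of input that "nocast.and.pca" expects, with a vector of one label per row available for coloring.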
If 'what' is set to "pca.only", returns the PCA object. Otherwise, plots the PCA results.
library(data.table)
library(ggplot2)

# Fix the random seed so that the dummy data are reproducible
set.seed(1)

# Create some dummy data: 20 time series for each of four conditions
# (A, B, C and D), stored in a long data.table
number.measure <- 101
mydata <- data.table(Condition = rep(LETTERS[1:4], each = 20 * number.measure),
                     Label = rep(1:20, each = number.measure),
                     Time = rep(seq(0, 100), 80))

# A: oscillates around 1 for 10 time units, then shifts to oscillating around 1.3
# B: oscillates around 1 for 10 time units, then shifts to oscillating around 1.25
# C: oscillates around 1 for 10 time units, then peaks at 1.3 and drifts back to 1
# D: oscillates around 1 along the whole trajectory
mydata[Condition == "A", Measure := c(rnorm(10, 1, 0.05), rnorm(91, 1.3, 0.05)), by = "Label"]
mydata[Condition == "B", Measure := c(rnorm(10, 1, 0.05), rnorm(91, 1.25, 0.05)), by = "Label"]
mydata[Condition == "C", Measure := c(rnorm(10, 1, 0.05), rnorm(91, 1.3, 0.05) - seq(0, 0.3, length.out = 91)), by = "Label"]
mydata[Condition == "D", Measure := rnorm(101, 1, 0.05), by = "Label"]

# Plot individual trajectories per condition, with the mean trajectory in red
# (`fun` and `linewidth` replace the deprecated `fun.y` and `size`; needs ggplot2 >= 3.4.0)
ggplot(mydata, aes(x = Time, y = Measure)) +
  geom_line(aes(group = Label), alpha = 0.3) +
  facet_wrap("Condition") +
  stat_summary(fun = mean, geom = "line", col = "red", linewidth = 1.25)

# Cast to wide, run PCA, draw all plots and compute the Davies-Bouldin index
report.PCA(mydata, what = c("cast.and.pca", "plot.variance", "plot.pca", "plot.extremes", "plot.loading", "DBindex"),
           measure.var = "Measure", condition.var = c("Condition", "Label"), time.var = "Time",
           center.pca = TRUE, scale.pca = FALSE,
           PC.biplot = c(1, 2), label.color.pca = "Condition", var.axes = FALSE, n.extremes = 2,
           PC.db = NULL, var.db = 0.8)
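To make the "DBindex" and 'var.db' options more concrete, here is a hedged base-R sketch of the textbook Davies-Bouldin index (Euclidean distances, mean within-group distance to the centroid as scatter) computed on PCA scores. It is not the package's implementation, which may differ. The helper names `davies_bouldin` and `n_pcs_for_variance` are hypothetical, and reading 'var.db' as a cumulative proportion of explained variance is an assumption.

```r
# Davies-Bouldin index for observations in the rows of `x`, grouped by `groups`.
# Lower values indicate more compact, better separated groups.
davies_bouldin <- function(x, groups) {
  x <- as.matrix(x)
  groups <- as.factor(groups)
  lev <- levels(groups)
  k <- length(lev)
  # One centroid per group (k x p matrix)
  centroids <- do.call(rbind, lapply(lev, function(g) {
    colMeans(x[groups == g, , drop = FALSE])
  }))
  # Scatter: mean Euclidean distance of the members to their group centroid
  scatter <- sapply(seq_len(k), function(i) {
    members <- x[groups == lev[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(members, 2, centroids[i, ])^2)))
  })
  # Pairwise similarity (S_i + S_j) / d(c_i, c_j); the diagonal is excluded
  ratio <- outer(scatter, scatter, "+") / as.matrix(dist(centroids))
  diag(ratio) <- -Inf
  # Average, over groups, of the worst-case similarity to any other group
  mean(apply(ratio, 1, max))
}

# Smallest number of leading PCs whose cumulative explained variance reaches `threshold`
n_pcs_for_variance <- function(pca, threshold) {
  explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  which(explained >= threshold)[1]
}

# Usage on a built-in dataset: PCA on the four iris measurements, grouped by species
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)
n.pc <- n_pcs_for_variance(pca, 0.8)
db <- davies_bouldin(pca$x[, seq_len(n.pc), drop = FALSE], iris$Species)
print(c(n.pc = n.pc, DB = db))
```

Under this reading, passing `var.db = 0.8` would correspond to choosing `n.pc` as above, whereas `PC.db = c(1, 2)` would fix the columns of the score matrix directly.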