pvcaBatchAssess: Principal Variance Component Analysis (PVCA)


Description

This function assesses the sources of batch effects by fitting all "sources" as random effects, including two-way interaction terms, in a mixed model (fitted with the lme4 package) on selected principal components, which are obtained from the correlation matrix of the original data. The package accompanies chapter 12 of the book "Batch Effects and Noise in Microarray Experiments".

Usage

pvcaBatchAssess(abatch, batch.factors, threshold)

Arguments

abatch

An instance of ExpressionSet, a class provided by the Biobase package

batch.factors

A vector naming the factors on which the mixed linear model will be fit (for example, c("ALL.AML", "BM.PB", "Source"))

threshold

The minimum proportion of the total variability that the selected principal components need to explain, given as a fraction (for example, 0.6)
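To make the role of threshold concrete, the following is a minimal base R sketch of how a cumulative-variance cutoff can pick the number of leading principal components. It is an illustration only, not the package's internal code: the simulated matrix and the names expr, cum_prop and n_pc are assumptions, and pvcaBatchAssess may apply additional rules when selecting components.

```r
# Toy data (simulated): 50 features in rows, 8 samples in columns.
set.seed(1)
expr <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)

# Center each feature, then compute the sample-by-sample correlation matrix.
centered <- t(scale(t(expr), center = TRUE, scale = FALSE))
corr <- cor(centered)

# The eigenvalues give the share of variability carried by each component.
eig <- eigen(corr, symmetric = TRUE)
prop <- eig$values / sum(eig$values)
cum_prop <- cumsum(prop)

# Smallest number of leading components whose cumulative share
# reaches the threshold.
threshold <- 0.6
n_pc <- which(cum_prop >= threshold)[1]
n_pc
```

A higher threshold keeps more components, so more mixed models are fitted and more of the original variability is partitioned.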

Details

"Batch effects" are often present in microarray data for any number of reasons, for example a poor experimental design or the combination of gene expression data from different studies with limited standardization. To estimate the variability of experimental effects, including batch, a novel hybrid approach known as principal variance component analysis (PVCA) has been developed. The approach leverages the strengths of two very popular data analysis methods: first, principal component analysis (PCA) efficiently reduces the data dimension while retaining the majority of the variability in the data; second, variance components analysis (VCA) fits a mixed linear model, using the factors of interest as random effects, to estimate and partition the total variability.

The PVCA approach can be used as a screening tool to determine which sources of variability (biological, technical or other) are most prominent in a given microarray data set. Using the eigenvalues associated with the corresponding eigenvectors as weights, the variation associated with each factor is standardized, and the magnitude of each source of variability (including each batch effect) is presented as a proportion of the total variance. Although PVCA is a generic approach for quantifying the proportion of variation due to each effect, it is also a handy way to assess batch effects before and after batch normalization.
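The eigenvalue weighting described above can be sketched in a few lines of base R. This is a hedged illustration rather than the package's implementation: the eigenvalues, the effect names (batch, treatment, resid) and the per-component variance estimates are made-up inputs standing in for the output of the mixed-model fits.

```r
# Hypothetical inputs: eigenvalues of three selected components and, for
# each component (row), the variance estimated for each effect (column).
eigenvalues <- c(4, 2, 1)
var_comp <- rbind(
  c(batch = 0.6, treatment = 0.3, resid = 0.1),
  c(batch = 0.2, treatment = 0.2, resid = 0.6),
  c(batch = 0.1, treatment = 0.5, resid = 0.4)
)

# Standardize within each component so that every row sums to 1.
row_prop <- var_comp / rowSums(var_comp)

# Weight each component by its share of the retained eigenvalues.
# The weights vector recycles down the columns, so row i is scaled
# by weights[i].
weights <- eigenvalues / sum(eigenvalues)
weighted <- colSums(row_prop * weights)

# Express each effect as a proportion of the total.
weighted_prop <- weighted / sum(weighted)
weighted_prop
```

Components with larger eigenvalues contribute more to the final proportions, which is why an effect that dominates the first component tends to dominate the summary.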

Value

dat

A numeric vector containing the proportion of the total variance attributed to each term

label

A character vector containing the name of each term, for use as plot labels
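Because dat and label are parallel, it can be convenient to combine them into one named vector for ranking or tabulation. The sketch below uses a stand-in list with invented numbers and labels rather than real output of pvcaBatchAssess; as.numeric() is applied as a precaution in case dat is returned as a one-row matrix.

```r
# Stand-in for the returned object; the values and labels are made up.
pvcaObj <- list(
  dat = c(0.10, 0.25, 0.65),
  label = c("Source", "ALL.AML", "resid")
)

# Pair each proportion with its term name.
effects <- setNames(as.numeric(pvcaObj$dat), pvcaObj$label)

# Rank the terms from largest to smallest contribution.
ranked <- sort(effects, decreasing = TRUE)
ranked
```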

Note

Modified and maintained by Jianying Li

Author(s)

Pierre Bushel

Examples

library(pvca)        # provides pvcaBatchAssess()
library(golubEsets)  # provides the Golub_Merge example data set
data(Golub_Merge)
pct_threshold <- 0.6
batch.factors <- c("ALL.AML", "BM.PB", "Source")

# Estimate the weighted average proportion of variance for each effect
pvcaObj <- pvcaBatchAssess(Golub_Merge, batch.factors, pct_threshold)

# Draw the bars, then add the effect names along the x axis
bp <- barplot(pvcaObj$dat, xlab = "Effects",
       ylab = "Weighted average proportion variance", ylim = c(0, 1.1),
       col = "blue", las = 2, main = "PVCA estimation bar chart")
axis(1, at = bp, labels = pvcaObj$label, cex.axis = 0.5, las = 2)

# Print each proportion, rounded to three decimals, above its bar
new_values <- round(pvcaObj$dat, 3)
text(bp, pvcaObj$dat, labels = new_values, pos = 3, cex = 0.8)
print(sessionInfo())

Example output

Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

singular fit (message repeated 20 times)
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] golubEsets_1.18.0   Biobase_2.36.2      BiocGenerics_0.22.0
[4] pvca_1.16.0        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19          nloptr_1.2.1          pillar_1.3.1         
 [4] compiler_3.4.4        BiocInstaller_1.26.1  plyr_1.8.4           
 [7] bindr_0.1.1           tools_3.4.4           zlibbioc_1.22.0      
[10] lme4_1.1-19           nlme_3.1-137          tibble_2.0.0         
[13] preprocessCore_1.38.1 gtable_0.2.0          lattice_0.20-38      
[16] pkgconfig_2.0.2       rlang_0.3.1           Matrix_1.2-15        
[19] bindrcpp_0.2.2        dplyr_0.7.8           grid_3.4.4           
[22] tidyselect_0.2.5      glue_1.3.0            R6_2.3.0             
[25] minqa_1.2.4           limma_3.32.7          ggplot2_3.1.0        
[28] purrr_0.2.5           magrittr_1.5          scales_1.0.0         
[31] splines_3.4.4         MASS_7.3-51.1         assertthat_0.2.0     
[34] colorspace_1.3-2      affy_1.54.0           lazyeval_0.2.1       
[37] munsell_0.5.0         vsn_3.44.0            crayon_1.3.4         
[40] affyio_1.46.0        

pvca documentation built on Nov. 8, 2020, 5:49 p.m.