
HDTD: Analyzing High-Dimensional Transposable Data


Installation

You can install the release version of HDTD from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("HDTD")

The source code for the release version of HDTD is available on Bioconductor at:

Or you can install the development version of HDTD from GitHub:

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("AnestisTouloumis/HDTD")

To use HDTD, you should load the package as follows:

library("HDTD")

Usage

This package offers functions to estimate and test the matrix parameters of transposable data in high-dimensional settings. The term transposable data refers to datasets that are structured in matrix form such that both the rows and the columns correspond to variables of interest, and dependencies are expected to occur among rows, among columns and between rows and columns. For example, consider microarray studies in genetics where multiple RNA samples from different tissues are available per subject. In this case, one can form a data matrix for each subject in which the rows correspond to genes, the columns correspond to tissues and the entries are the corresponding expression levels. We expect dependencies to occur among genes, among tissues and between genes and tissues. For more examples of transposable data, see the references in @Touloumis2013, @Touloumis2015 and @Touloumis2016.

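There are four core functions:

1. meanmat.hat: estimates the mean matrix.
2. meanmat.ts: tests hypotheses about the mean matrix.
3. covmat.hat: estimates the row and column covariance matrices.
4. covmat.ts: tests hypotheses about the row or column covariance matrix.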

There are also three utility functions:

Example

We replicate the analysis from the package vignette, which uses the VEGFmouse dataset:

data(VEGFmouse)

This dataset contains expression levels for $40$ mice. For each mouse, the expression levels of $46$ genes (rows) belonging to the vascular endothelial growth factor (VEGF) signalling pathway were measured across $9$ tissues (columns): adrenal gland, cerebrum, hippocampus, kidney, lung, muscle, spinal cord, spleen and thymus.
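Since HDTD works with a sample of $N$ matrices of the same size, it may help to see how such a sample can be held in a single matrix. The base-R sketch below simulates $40$ random $46 \times 9$ matrices (random numbers, not the VEGFmouse measurements) and binds them side by side into one $46 \times 360$ matrix. We assume, based on our reading of the help page for the datamat argument (see ?meanmat.hat), that this column-bound layout is the one HDTD expects; please check the documentation before relying on it. The helper stack_samples is hypothetical and not part of HDTD.

```r
# Hypothetical helper (not part of HDTD): bind a list of r x c matrices
# side by side into a single r x (c * N) matrix.
stack_samples <- function(mat_list) {
  do.call(cbind, mat_list)
}

set.seed(1)
n_row <- 46   # row variables (genes)
n_col <- 9    # column variables (tissues)
n_subj <- 40  # subjects (mice)

# Simulated values only: one n_row x n_col matrix per subject
samples <- lapply(seq_len(n_subj), function(i) {
  matrix(rnorm(n_row * n_col), nrow = n_row, ncol = n_col)
})

datamat <- stack_samples(samples)
dim(datamat)
#> [1]  46 360
```

With this layout, subject $i$ occupies columns $9(i-1)+1$ to $9i$ of the combined matrix.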

One can estimate the mean matrix of the gene expression levels across the $9$ tissues:

# Estimate the mean matrix (46 genes by 9 tissues) from the 40 mice
sample_mean <- meanmat.hat(datamat = VEGFmouse, N = 40)
sample_mean

and test whether the overall gene expression is constant across the $9$ tissues:

# Test whether the overall gene expression is constant across the 9 tissues
tissue_mean_test <- meanmat.ts(datamat = VEGFmouse, N = 40, group.sizes = 9)
tissue_mean_test

In this case, the test rejects the hypothesis, so the overall gene expression is not conserved across the $9$ tissues.
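The simplest estimator of a mean matrix is the entrywise average of the $N$ data matrices, $\hat{M} = N^{-1}\sum_{i=1}^{N} X_i$. The base-R sketch below computes that average from $N$ matrices bound side by side into one matrix, assuming sample $i$ occupies columns $(i-1)c+1$ to $ic$. It only illustrates the idea and is not taken from the meanmat.hat source code; the helper sample_mean_matrix is hypothetical.

```r
# Hypothetical helper (not part of HDTD): entrywise average of N r x c
# matrices stored side by side in an r x (c * N) matrix.
sample_mean_matrix <- function(datamat, N) {
  n_col <- ncol(datamat) / N
  stopifnot(n_col == round(n_col))
  # Reshape to an r x c x N array; column-major order keeps each sample's
  # block of n_col consecutive columns together in one slice.
  arr <- array(datamat, dim = c(nrow(datamat), n_col, N))
  # Average over the third (sample) dimension
  apply(arr, c(1, 2), mean)
}

X1 <- matrix(1:6, nrow = 2)
X2 <- matrix(7:12, nrow = 2)
sample_mean_matrix(cbind(X1, X2), N = 2)
#>      [,1] [,2] [,3]
#> [1,]    4    6    8
#> [2,]    5    7    9
```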

To analyze the gene-wise and tissue-wise dependence structure, one needs to estimate the two covariance matrices:

# Estimate the gene-wise (row) and tissue-wise (column) covariance matrices
est_cov_mat <- covmat.hat(datamat = VEGFmouse, N = 40)
est_cov_mat
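To see why two covariance matrices describe the dependence structure, assume (as an illustration, not a statement about HDTD's internals) a Kronecker product structure $\mathrm{Cov}(\mathrm{vec}(X_i)) = \Psi \otimes \Sigma$, where $\Sigma$ is the $r \times r$ row covariance and $\Psi$ the $c \times c$ column covariance. Then $E[(X_i - M)(X_i - M)^\top] = \mathrm{tr}(\Psi)\,\Sigma$ and $E[(X_i - M)^\top(X_i - M)] = \mathrm{tr}(\Sigma)\,\Psi$, so each matrix is identified only up to a scale factor. The sketch below turns these identities into naive moment estimators. It is not the shrinkage estimator implemented in covmat.hat, which targets high-dimensional settings where such naive estimators perform poorly; the helper naive_row_col_cov is hypothetical.

```r
# Hypothetical helper (not HDTD's covmat.hat): naive moment estimators of
# tr(Psi) * Sigma (rows) and tr(Sigma) * Psi (columns).
naive_row_col_cov <- function(arr) {
  # arr: r x c x N array holding the N data matrices
  n_r <- dim(arr)[1]
  n_c <- dim(arr)[2]
  N <- dim(arr)[3]
  m_hat <- apply(arr, c(1, 2), mean)
  row_cov <- matrix(0, n_r, n_r)
  col_cov <- matrix(0, n_c, n_c)
  for (i in seq_len(N)) {
    centered <- matrix(arr[, , i], nrow = n_r) - m_hat
    row_cov <- row_cov + tcrossprod(centered)  # (X_i - M)(X_i - M)^T
    col_cov <- col_cov + crossprod(centered)   # (X_i - M)^T (X_i - M)
  }
  # Dividing by N - 1 gives unbiased estimates when M is estimated
  list(rows = row_cov / (N - 1), columns = col_cov / (N - 1))
}

# Simulated example: AR(1)-type row covariance, independent columns (Psi = I)
set.seed(3)
n_row <- 5; n_col <- 4; N <- 2000
sigma <- 0.5^abs(outer(1:n_row, 1:n_row, "-"))
a <- t(chol(sigma))  # lower-triangular factor, sigma = a %*% t(a)
arr <- array(0, dim = c(n_row, n_col, N))
for (i in seq_len(N)) {
  arr[, , i] <- a %*% matrix(rnorm(n_row * n_col), n_row, n_col)
}
est <- naive_row_col_cov(arr)
round(est$rows / n_col, 2)  # should be close to sigma, since tr(Psi) = n_col
```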

Finally, the package allows users to test hypotheses about the covariance matrix of the genes

# Test the covariance matrix of the genes (rows)
genes_cov_test <- covmat.ts(datamat = VEGFmouse, N = 40)
genes_cov_test

and of the tissues:

# Test the covariance matrix of the tissues (columns)
tissues_cov_test <- covmat.ts(datamat = VEGFmouse, N = 40, voi = "columns")
tissues_cov_test

At a $5\%$ significance level, it appears that the genes are correlated but we do not have enough evidence to reject the hypothesis that the tissues are uncorrelated.

Getting help

The statistical methods implemented in HDTD are described in @Touloumis2013, @Touloumis2015 and @Touloumis2016. Detailed examples of how to use HDTD can be found in @Touloumis2016 or in the package vignette:

browseVignettes("HDTD")

How to cite

print(citation("HDTD"), bibtex = TRUE)

References



AnestisTouloumis/HDTD documentation built on Aug. 2, 2021, 2:15 p.m.