Introduction to omicwas

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Installation

install.packages("devtools")
devtools::install_github("fumi-github/omicwas")

If you encounter dependency error for sva package, install it from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
   install.packages("BiocManager")
BiocManager::install("sva")

Usage

omicwas is a package for cell-type-specific disease association testing, using bulk tissue data as input. The package accepts DNA methylation data for epigenome-wide association studies (EWAS), as well as gene expression data for differential gene expression analyses. The main function is ctassoc

library(omicwas)

See description.

?ctassoc

Analyzing DNA methylation

Let's load a sample data.

data(GSE42861small)
X = GSE42861small$X
W = GSE42861small$W
Y = GSE42861small$Y
Y = Y[seq(1, 20), ] # for brevity
C = GSE42861small$C

See description.

?GSE42861small

The conventional way is to use ordinary linear regression.

result = ctassoc(X, W, Y, C = C,
                 test = "full")
result$coefficients

We recommend nonlinear regression with ridge regularization. For DNA methylation, we use the logit function for normalization, and the test option is nls.logit

result = ctassoc(X, W, Y, C = C,
                 test = "nls.logit",
                 regularize = TRUE)
print(result$coefficients, n = 20)

The first 19 lines show the result for CpG site cg10543797. Line 1 shows that the basal methylation level in CD4. (actually CD4+ T cells) is plogis(1.498) = 0.817, so this cell type is 81% methylated. Line 8 shows that the CD4.-specific effect of the disease is 7.10e-04 (in logit scale). Since the p.value is 0.64, this is not significant. Line 15 shows that the effect of sexF (female compared to male) is -4.14e-02 with a small p.value 9.15e-05. Since sexF is a covariate that has uniform effect across cell types, the celltype column is NA.

Analyzing gene expression

Let's load a sample data.

data(GTExsmall)
X = GTExsmall$X
W = GTExsmall$W
Y = GTExsmall$Y + 1
Y = Y[seq(1, 20), ] # for brevity
C = GTExsmall$C

See description.

?GTExsmall

The conventional way is to use ordinary linear regression.

result = ctassoc(X, W, Y, C = C,
                 test = "full")
result$coefficients

We recommend nonlinear regression with ridge regularization. For DNA methylation, we use the log function for normalization, and the test option is nls.log

result = ctassoc(X, W, Y, C = C,
                 test = "nls.log",
                 regularize = TRUE)
print(result$coefficients, n = 15)

The first 13 lines show the result for transcript ENSG00000059804. Line 1 shows that the basal expression level in Granulocytes is exp(8.847) = 6955. Line 7 shows that the Granulocytes-specific effect of age is 4.62e-04 (in log scale). Since the p.value is 0.82, this is not significant. Line 13 shows that the effect of sexF (female compared to male) is -2.57e-03 with p.value 0.97 Since sexF is a covariate that has uniform effect across cell types, the celltype column is NA.

Analyzing mQTL

For QTL analyses, we use ctcisQTL function instead of ctassoc. To speed up computation, we perform linear ridge regression, thus the statistical test is almost identical to ctassoc(test = "nls.identity", regularize = TRUE). We analyze only in the linear scale. Association analysis is performed between each row of Y and each row of X. See description.

?ctcisQTL

Let's load a sample data.

data(GSE79262small)
X    = GSE79262small$X
Xpos = GSE79262small$Xpos
W    = GSE79262small$W
Y    = GSE79262small$Y
Ypos = GSE79262small$Ypos
C    = GSE79262small$C
X    = X[seq(1, 3001, 100), ] # for brevity
Xpos = Xpos[seq(1, 3001, 100)]
Y    = Y[seq(1, 501, 100), ]
Ypos = Ypos[seq(1, 501, 100)]

See description.

?GSE79262small

Analyze mQTL.

ctcisQTL(X, Xpos, W, Y, Ypos, C = C)

The result is stored in a file.

head(
  read.table(file.path(tempdir(), "ctcisQTL.out.txt"),
             header = TRUE,
             sep ="\t"))

The first 3 lines show the result for the association of SNP rs6678176 with CpG site cg19251656. Line 1 shows that the CD4T-specific effect of the SNP is -0.003. Since the p.value is 0.75, this is not significant.



Try the omicwas package in your browser

Any scripts or data that you put into this service are public.

omicwas documentation built on Jan. 13, 2021, 8:48 a.m.