permutationMultipleLm: permutationMultipleLm
In zhushijia/GIGSEA: Genotype Imputed Gene Set Enrichment Analysis


permutationMultipleLm performs a permutation test to calculate empirical p-values for a weighted multiple linear regression.
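To make the idea concrete, here is a minimal base-R sketch of a permutation test around a weighted multiple linear regression. It is not the package's implementation: the helper name `perm_lm_sketch`, the argument `n_perm`, the choice to shuffle the response together with its weights, and the one-sided comparison of t-statistics are all assumptions made for illustration.

```r
# Illustrative sketch only; perm_lm_sketch and n_perm are hypothetical names,
# not part of GIGSEA.
perm_lm_sketch <- function(fc, net, weights = rep(1, nrow(net)), n_perm = 100) {
  # t-statistics of the gene-set coefficients (intercept row dropped)
  t_stats <- function(y, w) {
    fit <- lm(y ~ net, weights = w)
    summary(fit)$coefficients[-1, "t value"]
  }
  observed <- t_stats(fc, weights)
  # Assumption: shuffle the response together with its weights, keep net fixed
  permuted <- vapply(seq_len(n_perm), function(i) {
    ord <- sample(length(fc))
    t_stats(fc[ord], weights[ord])
  }, numeric(length(observed)))
  permuted <- matrix(permuted, nrow = length(observed))
  # Assumption: one-sided empirical p-value, i.e. the fraction of permutations
  # whose t-statistic is at least as large as the observed one
  p <- rowMeans(permuted >= observed)
  names(p) <- colnames(net)
  p
}

# Toy data: 200 genes, 3 gene sets; genes in setA are shifted upward
set.seed(1)
net <- matrix(rbinom(200 * 3, 1, 0.3), nrow = 200,
              dimnames = list(NULL, c("setA", "setB", "setC")))
fc <- rnorm(200) + 1.5 * net[, "setA"]
pvals <- perm_lm_sketch(fc, net, weights = runif(200), n_perm = 200)
```

In this toy setting only setA carries a real signal, so its empirical p-value should be small while the other two depend on chance.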

permutationMultipleLm(fc, net, weights = rep(1, nrow(net)), num = 100, verbose = TRUE)

`fc`	a vector of numeric values representing gene expression fold change
`net`	a matrix of numeric values in the size of gene number x gene set number, representing the connectivity between genes and gene sets
`weights`	a vector of numeric values representing the weights of the permuted genes, one per row of `net` (default: equal weights)
`num`	an integer value representing the number of permutations (default: 100)
`verbose`	a boolean value indicating whether to print output to the screen
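As a rough illustration of the expected input shapes, the sketch below builds a gene x gene set matrix from a named list of gene sets, with matching `fc` and `weights` vectors. The gene names, set names, and the 0/1 membership encoding are made up for this example; real `net` matrices may carry other connectivity values.

```r
# Hypothetical gene sets; gene and set names are invented for illustration
gene_sets <- list(
  setA = c("g1", "g2", "g3"),
  setB = c("g3", "g4")
)
genes <- sort(unique(unlist(gene_sets)))

# One row per gene, one column per gene set: 1 if the gene is in the set
net <- vapply(gene_sets,
              function(s) as.numeric(genes %in% s),
              numeric(length(genes)))
rownames(net) <- genes

# fc and weights must have one entry per row of net, in the same gene order
set.seed(1)
fc <- rnorm(nrow(net))
weights <- rep(1, nrow(net))
```

Keeping the rows of `net` in the same gene order as `fc` and `weights` is what lets the regression pair each fold change with the right gene set memberships.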

a data frame comprising the following columns:

term a vector of character strings indicating the names of gene sets.
usedGenes a vector of numeric values indicating the number of genes used in the model.
Estimate a vector of numeric values indicating the regression coefficients.
Std..Error a vector of numeric values indicating the standard errors of regression coefficients.
t.value a vector of numeric values indicating the t-statistics of regression coefficients.
observedPval a vector of numeric values in [0,1] indicating the p values from the multiple weighted regression model.
empiricalPval a vector of numeric values in [0,1] indicating the empirical p values from the permutation test.
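A result with these columns can be ranked and filtered with base R. The sketch below uses a mock data frame whose values are invented purely to show the column layout; the 0.05 cutoff is likewise an arbitrary choice for illustration.

```r
# Mock result with the documented column names; every value is invented
res <- data.frame(
  term          = c("setA", "setB", "setC"),
  usedGenes     = c(40, 25, 60),
  Estimate      = c(0.8, -0.1, 0.3),
  Std..Error    = c(0.2, 0.25, 0.2),
  t.value       = c(4.0, -0.4, 1.5),
  observedPval  = c(0.0001, 0.69, 0.14),
  empiricalPval = c(0.00, 0.66, 0.07)
)

# Rank by empirical p-value, breaking ties with the model-based p-value
ranked <- res[order(res$empiricalPval, res$observedPval), ]

# Keep gene sets below an illustrative threshold
hits <- subset(ranked, empiricalPval <= 0.05)
```

If the empirical p-value is computed as a fraction of `num` permutations, its resolution is limited to roughly 1/`num`, so a small `num` cannot distinguish between very small p-values.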

Shijia Zhu, shijia.zhu@mssm.edu

orderedIntersect, permutationMultipleLmMatrix

# load the package and the example MetaXcan data
library(GIGSEA)
data(heart.metaXcan)
gene <- heart.metaXcan$gene_name

# extract the imputed Z-score of differential gene expression, which follows 
# the normal distribution
fc <- heart.metaXcan$zscore

# use as weights the prediction R^2 and the fraction of imputation-used SNPs 
usedFrac <- heart.metaXcan$n_snps_used / heart.metaXcan$n_snps_in_cov
r2 <- heart.metaXcan$pred_perf_r2
weights <- usedFrac*r2

# build a new data frame for the following weighted linear regression-based 
# enrichment analysis
data <- data.frame(gene,fc,weights)
head(data)

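# load the KEGG gene sets; net is the gene x gene set connectivity matrix
data(MSigDB.KEGG.Pathway)
net <- MSigDB.KEGG.Pathway$net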

# intersect the imputed genes with the gene sets of interest
data2 <- orderedIntersect(x = data[, c("fc", "weights")], by.x = data$gene,
                          by.y = rownames(net))
net2 <- orderedIntersect(x = net, by.x = rownames(net),
                         by.y = data$gene)
# check that the genes are in the same order in both objects
all(rownames(net2) == rownames(data2))

# MGSEA.res1 runs the permutation test with the weighted multiple linear
# regression, while MGSEA.res2 uses the weighted matrix operation solution.
# The latter takes substantially less time.
# system.time( MGSEA.res1 <- permutationMultipleLm(fc=data2$fc, net=net2,
#     weights=data2$weights, num=1000) )
# system.time( MGSEA.res2 <- permutationMultipleLmMatrix(fc=data2$fc,
#     net=net2, weights=data2$weights, num=1000) )
# head(MGSEA.res2)