README.md

LRT-q

LRT-q is used for identifying regulatory effects of rare variants on genes with likelihood ratio test.

To install the R package:

library(devtools)
install_github("avallonking/LRTq")

To see the manual:

library(LRTq)
?LRTq

Note that LRT-q performs the rare variant association test for one quantitative phenotype/trait (the expression levels for one gene). So users need to run LRT-q function for multiple times if there are different phenotypes or genes.

Input: - Phenotypes E: a vector of quantitative traits of N individuals, such as gene expression levels. It should be standardized and normalized. - Genotypes G: an N by k matrix of individual genotypes, where N represents the population size and k stands for the number of rare variants. Rows are individuals, and columns are variants. - Weights W: a vector of the weights of k rare variants. - Permutations perm: the number of permutations to perform to calculate the p-value.

Output: - The p-value for this rare variant association test. If testing the association between gene expression and rare variants, then a significant p-value, such as a p-value smaller than 0.05, means that the gene expression is regulated by rare variants. Otherwise, there is no regulatory effect of rare variants on this gene.

Example:

library(LRTq)

## use sample data provided by SKAT
require(SKAT)
data("SKAT.example")
attach(SKAT.example)

## use the quantitative trait (y.c), and extract the genotypes of rare variants (Z)
E = y.c # Phenotypes
maf = colMeans(Z) / 2
Z = Z[, maf > 0 & maf < 0.05]
G = Z # Genotypes
## weight all variants equally with 0.30
W = rep(0.30, ncol(G)) # Weights
## and use 1000 permutations
perm = 1000 # Permutations

## run LRT-q with the simulated inputs
LRTq(expr = E, geno = G, causal_ratio = rep(0.30, ncol(G)), perm = 1000)
## the results could be 0.000999001, 0.001998002, or 0.002997003, 
## due to the randomness in the permutation test

Organization of this repository

# download the scripts
git clone https://github.com/avallonking/LRTq
# go to the directory with the scripts to generate figures
cd LRTq/analysis_scripts/figures
# download the required data from Google Drive 
# https://drive.google.com/drive/folders/13HVdPpyOxCQCHxjfsGk3_gf4nrZfrTc1?usp=sharing
# rename the downloaded folder as "data"
mv LRTq-Data data
# decompose the pvals.tar.gz in the data/ folder
cd data
tar xvzf pvals.tar.gz
cd ..
# make the directory for storing the generated plots
mkdir ../materials
# run the scripts to generate figures and tables

To re-run the simulation study, users can use the R scripts in LRTq/analysis_scripts/simulation/: - power.simulate.new.R: power simulation. It runs LRT-q and other methods on the simulated data assuming there are at least one rare variants regulating gene expression. Usage: Rscript power.simulate.new.R [simulated haplotypes] [repeats] [causal ratio] [a (constant)] [output file name]The simulated haplotypes could be data/simulation/haplotype/len5k_110var/processed.sim.hap1.100var.tsv - typeIerror.simulation.new.R: type I error simulation. It runs LRT-q and other methods on the simulated data assuming there are no rare variants affecting gene expression. Usage: Rscript typeIerror.simulation.new.R [simulated haplotyes] [permutations] [repeats] [output file name]The simulated haplotypes could be data/simulation/haplotype/len5k_110var/processed.sim.hap1.100var.tsv

To re-run the analysis of GTEx, users can use the R scripts in LRTq/analysis_scripts/gtex/association_tests/: - acat.R: run ACAT on the GTEx dataset - acat.regress_out_common_eqtls.R: run ACAT on the GTEx dataset and regress out the effects of common eQTLs - faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.R: run LRT-q, SKAT-O, and VT on the GTEx dataset - faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.maf01.R: run LRT-q, SKAT-O, and VT on the GTEx dataset, only considering rare variants with MAF < 0.01 - faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.regress_out_common_eqtls.R: run LRT-q, SKAT-O, and VT on the GTEx dataset and regress out the effects of common eQTLs

Usage (output: p-values of each gene for different weights):

# The general analysis of GTEx
Rscript $rscript $tissue_gene_expression $genotype_matrix $gene_snp_set $covariates $gene_list $start $end $result_file_prefix $weight_file_prefix
# Regress out the effects of common eQTLs
Rscript $rscript $tissue_gene_expression $genotype_matrix $gene_snp_set $covariates $gene_list $start $end $result_file_prefix $weight_file_prefix $common_eqtl_file $common_geno_matrix_file


avallonking/LRTq documentation built on April 30, 2021, 1:48 a.m.