eQTL | R Documentation |
This function performs an eQTL analysis.
eQTL(gex=NULL, xAnnot = NULL, xSamples = NULL, geno=NULL, genoSamples = NULL,
windowSize = 0.5, method = "directional", mc = 1, sig = NULL, which = NULL,
testType = "asymptotic", nper = 2000, verbose = TRUE, MAF=0.05,
IHaveSpace = FALSE)
gex |
Matrix or Vector with expression values. |
xAnnot |
Location annotations for the expression values. |
xSamples |
Sample names for the expression values (optional). |
geno |
Genotype data. |
genoSamples |
Sample names for the genotype values (optional). |
windowSize |
Size of the window around the center gene (in Mb). |
method |
Method of choice for the eQTL. |
mc |
Amount of cores for parallel computing. |
sig |
Significance level for the eQTL testing. |
which |
Names of genes for that the eQTL should be performed. |
testType |
Type of test, either |
nper |
Sets the amount of permutations, if permuation tests are used. |
verbose |
Logical, shall the method give extensive feedback. |
MAF |
Filter for Minor Allele Frequency. |
IHaveSpace |
Logical, security switch for operations that require lots of memory. |
This function performs an eQTL analysis and offers different types of tests. The type of test
can be specified with the method
option and possible options are "LM"
and "directional"
.
The option "LM"
fits for each SNP within a predefined window of size windowSize
(in MB) around a gene
a linear model for the genotype information and the corresponding gene expression. The null hypothesis
for each test is then that the slope is equal to zero and the alternative is that it is not zero.
The option "directional"
applies a new directional test based on probabilistic indices for triples as described
in Fischer, Oja, et al. (2014). Being \mathbf{x}_0=(x_{01},x_{02},\dots,x_{0N_0})', \mathbf{x}_1=(x_{11},x_{12},\dots,x_{1N_1})'
and \mathbf{x}_2=(x_{21},x_{22},\dots,x_{2N_2})'
the expression values that are linked to the three genotype
groups 0,1
and 2
with underlying distributions F_0, F_1
and F_2
. We first calculate the probabilistic
indices P_{0,1,2} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{0i} < x_{1j} < x_{2k})
and P_{2,1,0} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{2i} < x_{1j} < x_{0k})
. These are the probabilities that the expression
values of the three groups follow a certain order, that is what we would expect for possible eQTLs. The null hypothesis that we have then
in mind is that the expression values from these three groups have the same distribution H_0: F_0 = F_1 = F_2
and the
two alternatives are that the distributions have a certain stochastical order H_1: F_0 < F_1 < F_2
and H_2: F_2 < F_1 < F_0
.
The test is applied for the two probabilistic indices P_{0,1,2}
and
P_{2,1,0}
and combines the two resulting p-values p_{012}=p_1
and p_{210}=p_2
from previous tests then as overall p-value
\min(2 \min(p_1 , p_2 ), 1)
. In the two-group case (this means only two different genotypes are present for a certain SNP) a two-sided
Wilcoxon rank-sum test is applied.
The gene expressions used in the eQTL are specified in gex
. If several genes should be tested, then gex
is a matrix and each
column refers to a gene and each row to an individuum. The column names of this matrix must match then with the
names used in the annotation object xAnnot
. Sample names can either be given as row names in the matrix or as separate
vector in xSamples
. If only gene expressions of one gene should be tested then gex
can be a vector.
The genotype information is provided in the geno
object. Here one can i.a. specify the ped-file
name of a ped/map file pair. The function then imports the genotype information using the
function importPED
. In that case, the map file has to have the same filename as the ped file (despite the file extension...).
In case the genotype information has been imported already earlier using importPED
the resulting PedMap
object can
also be given as a parameter for geno
.
The xAnnot
object carries the annotation information for the gene expressions. Usually, the annotations are downloaded e.g. from the
Ensembl page and then imported to R with the importGTF
function. The resulting object can then be used as input for xAnnot
.
Casting options to feed in also other formats/objects are currently under development.
The option genoSamples
is used in case that the sample names in the ped/map file (or geno
object)
do not match with rownames(gex)
given in the expression matrix. The vector genoSamples
is as
long as the geno
object has samples, but gives then for each row in geno
the corresponding
name in the gex
object. The function finds then also the smallest union between the two data objects.
If there are repeated measurements per individual for the genotypes we take by default only the first
appearance in the data and neglect all successive values. Currently this cannot be changed. In case
this behavior is not desired, the user has to remove the corresponding rows from geno
before starting the
calculation.
If the code is executed on a Linux OS the user can specify with the mc
option the amount of CPU cores used for the
calculation.
If the sig
option is set to a certain significance level, then the method only reports those SNPs that are tested to be
significant. This can reduce the required memory drastically, especially in the case of trans-eQTL.
The method tests for trans-eQTLs (all combinations of SNPs and genes) if the windowSize
is set to NULL
.
Be aware that this might lead to long lasting calculations. For trans-eQTL calculations it is advisable to define a significance level in
sig
so that only significant results are stored. If all results are required, the option sig=NULL
has to be set and in
addition IHaveSpace=TRUE
. This additional parameter is necessary as a trans-eQTL with full output creates a huge output that often
exceeds the system properties of a normal desktop PC.
Note: The directional test currently supports only exact p-values based on permutation tests, but asymptotic implementations are developed and will be soon available also.
A list of class eqtl
containing the values
gex |
The |
geno |
The |
xAnnot |
The |
genoSamples |
The |
windowSize |
The |
and an incapsulated list eqtl
where each list item is a tested gene location and contains the items
ProbeLoc |
Used position of that gene. (Only different from 1 if multiple locations are considered.) |
TestedSNP |
Details about the considered SNPs. |
p.values |
P values of the test. |
GeneInfo |
Details about the center gene. |
Daniel Fischer
Fischer, D., Oja, H., Sen, P.K., Schleutker, J., Wahlfors, T. (2014): Generalized Mann-Whitney Type Tests for Microarray Experiments, Scandinavian Journal of Statistics, 3, pages 672-692, doi: 10.1111/sjos.12055.
Fischer, D., Oja, H. (2013): Mann-Whitney Type Tests for Microarray Experiments: The R Package gMWT, submitted article.
## Not run:
# Make the example data available
data("annotTrack") # Standard gtf file, imported with importGTF
data("geneEXP") # Matrix with gene expression
data("genotData") # A imported Ped/Map filepair, using importPED
# Transform gtf to bed format (not necessarily required)
annot.bed <- gtfToBed(annotTrack)
# cis-eQTL
###############################################
# Most basic cis-eQTL runs:
EQTL1 <- eQTL(gex=geneEXP[,1:10] , xAnnot = annotTrack, geno= genotData)
# Same run, if gtf has been transformed to bed previously
EQTL1.1 <- eQTL(gex=geneEXP[,1:10] , xAnnot = annot.bed, geno= genotData)
# Same run, when the genotype data wasn't loaded and should be loaded
# here instead
EQTL1.2 <- eQTL(gex=geneEXP[,1:10] , xAnnot = annotTrack,
+ geno= file.path("Datasets","genotypes.ped"))
# Full set of genes, this time filtered with column names
EQTL2 <- eQTL(gex=geneEXP, xAnnot = annot.bed, geno= genotData,
+ which = colnames(geneEXP)[1:20])
# Single vector of gene expression values, underlying gene is specified
# in the which option
EQTL3 <- eQTL(gex=as.vector(geneEXP[,1]) , xAnnot = annot.bed,
+ geno= genotData, which="ENSG00000223972")
# Same call, but this time is the corresponding column not casted
EQTL3.1 <- eQTL(gex=geneEXP[,1] , xAnnot = annot.bed, geno= genotData,
+ which="ENSG00000223972")
# Same call, but instead of the name the row number in the gtf/bed
# file is provided
EQTL3.2 <- eQTL(gex=geneEXP[,1] , xAnnot = annot.bed, geno= genotData,
+ which=1)
# The same expression values are now assigned to three different genes
EQTL4 <- eQTL(gex=as.vector(geneEXP[,1]) , xAnnot = annot.bed,
+ geno= genotData, which=1:3)
# Trans-eQTL
######################################
# Trans eQTL for the first and the last gene in our expression matrix
EQTL5 <- eQTL(gex=geneEXP[,c(1,1000)] , xAnnot = annot.bed,
+ geno= genotData, windowSize = NULL)
# Same call, this time distributed to 8 cores (ony available on
# Linux computers)
EQTL5 <- eQTL(gex=geneEXP[,c(1,1000)] , xAnnot = annot.bed,
+ geno= genotData, windowSize = NULL, mc=8)
# Expression values from the first gene are used to test the 100st
# gene for trans-eQTL
EQTL6 <- eQTL(gex=as.vector(geneEXP[,1]) , xAnnot = annot.bed,
+ geno= genotData, windowSize = NULL, which=100)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.