This function performs an eQTL analysis.
gex: Matrix or vector with expression values.
geno: Genotype data.
xAnnot: Location annotations for the expression values.
xSamples: Sample names for the expression values, see details (optional).
genoSamples: Sample names for the genotype values, see details (optional).
windowSize: Size of the window around the center gene, see details.
method: Method of choice for the eQTL analysis, see details.
mc: Number of cores for parallel computing.
sig: Significance level for the eQTL testing, see details.
which: Names of the genes for which the eQTL analysis should be performed.
nper: Number of permutations, if permutation tests are used.
verbose: Logical; whether the method should report intermediate results.
This function performs an eQTL analysis and offers different types of tests. The type of test is specified with the method option; possible values are "LM" and "directional".
With the option "LM", a linear model of the gene expression on the genotype is fitted for each SNP within a predefined window of size windowSize (in Mb) around a gene. The null hypothesis of each test is that the slope equals zero; the alternative is that it does not.
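A minimal sketch of the per-SNP slope test in base R, assuming the genotypes are coded additively as 0, 1, 2 (allele counts). It illustrates the hypothesis test described above and is not the package's internal code; the names `lm_snp_pvalue`, `snp` and `expr` are hypothetical.

```r
# Hypothetical helper: p-value for H0: slope = 0 in the model expr ~ snp,
# where snp is assumed to be coded 0/1/2 (additive coding).
lm_snp_pvalue <- function(expr, snp) {
  fit <- lm(expr ~ snp)
  coefs <- summary(fit)$coefficients
  # Row "snp", column "Pr(>|t|)": two-sided t-test p-value for the slope
  coefs["snp", "Pr(>|t|)"]
}

set.seed(1)
snp  <- sample(0:2, 60, replace = TRUE)  # simulated genotypes of 60 individuals
expr <- 0.8 * snp + rnorm(60)            # simulated expression with a genotype effect
p_lm <- lm_snp_pvalue(expr, snp)
p_lm
```

The helper assumes that at least two genotypes occur for the SNP; otherwise the slope is not estimable.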
The "directional" option applies a new directional test based on probabilistic indices for triples, as described in Fischer, Oja, et al. (2013). Let $\mathbf{x}_0=(x_{01},x_{02},\ldots,x_{0N_0})'$, $\mathbf{x}_1=(x_{11},x_{12},\ldots,x_{1N_1})'$ and $\mathbf{x}_2=(x_{21},x_{22},\ldots,x_{2N_2})'$ be the expression values linked to the three genotype groups 0, 1 and 2, with underlying distributions $F_0$, $F_1$ and $F_2$. We first calculate the probabilistic indices

$$P_{0,1,2} = \frac{1}{N_0 N_1 N_2} \sum_{i=1}^{N_0} \sum_{j=1}^{N_1} \sum_{k=1}^{N_2} I(x_{0i} < x_{1j} < x_{2k})$$

and

$$P_{2,1,0} = \frac{1}{N_0 N_1 N_2} \sum_{i=1}^{N_2} \sum_{j=1}^{N_1} \sum_{k=1}^{N_0} I(x_{2i} < x_{1j} < x_{0k}).$$

These estimate the probabilities that the expression values of the three groups follow a monotone order, as would be expected for an eQTL. The null hypothesis is that the expression values of the three groups have the same distribution, $H_0: F_0 = F_1 = F_2$, and the two alternatives are that the distributions are stochastically ordered, $H_1: F_0 < F_1 < F_2$ or $H_2: F_2 < F_1 < F_0$.
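The triple sums can be evaluated without looping over all $N_0 N_1 N_2$ triples: for each middle value $x_{1j}$, the number of ordered triples equals the count of first-group values below it times the count of third-group values above it. A minimal sketch under that observation; `prob_index` is a hypothetical helper, not a package function, and ties are treated as not ordered, matching the strict inequalities in the indicator.

```r
# Hypothetical helper: estimate of P(xa < xb < xc) over all triples.
prob_index <- function(xa, xb, xc) {
  # For each middle value b, count pairs (a, c) with a < b < c
  per_b <- vapply(xb, function(b) sum(xa < b) * sum(xc > b), numeric(1))
  sum(per_b) / (length(xa) * length(xb) * length(xc))
}

x0 <- c(1.0, 2.0)   # genotype group 0
x1 <- c(1.5, 3.0)   # genotype group 1
x2 <- c(2.5, 4.0)   # genotype group 2
P012 <- prob_index(x0, x1, x2)  # 0.5
P210 <- prob_index(x2, x1, x0)  # 0
```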
A test is carried out for each of the two probabilistic indices $P_{0,1,2}$ and $P_{2,1,0}$, and the two resulting p-values $p_1 = p_{012}$ and $p_2 = p_{210}$ are combined into the overall p-value $\min(2 \min(p_1, p_2), 1)$. In the two-group case (that is, only two different genotypes are present for a given SNP), a two-sided Wilcoxon rank-sum test is applied instead.
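A small sketch of the two final steps: doubling the smaller directional p-value and capping it at 1 (a Bonferroni adjustment for the two directions), and the two-sided Wilcoxon rank-sum test from base R's `wilcox.test` for the two-group case. `combine_directional` and the simulated groups are hypothetical.

```r
# Hypothetical helper: overall p-value min(2 * min(p1, p2), 1)
combine_directional <- function(p1, p2) {
  min(2 * min(p1, p2), 1)
}
combine_directional(0.01, 0.40)  # 0.02
combine_directional(0.70, 0.60)  # 1

# Two-group case: only two genotypes observed for the SNP (simulated data)
set.seed(2)
expr_g0 <- rnorm(15)
expr_g1 <- rnorm(15, mean = 1)
p_two <- wilcox.test(expr_g0, expr_g1, alternative = "two.sided")$p.value
```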
The gene expression values are given in gex. If several genes are tested, gex is a matrix in which each column refers to a gene and each row to an individual. The column names of this matrix must match the names used in the annotation object xAnnot. Sample names can be given either as row names of the matrix or as a separate vector in xSamples. If only one gene is tested, gex can be a vector.
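A sketch of the expected layout with hypothetical toy data: individuals in rows, genes in columns, sample names as row names (or, alternatively, in a separate vector), and a single gene as a plain vector. The gene and sample names are made up for illustration.

```r
# Hypothetical toy data: 4 individuals (rows) x 3 genes (columns)
set.seed(3)
gex <- matrix(rnorm(12), nrow = 4, ncol = 3,
              dimnames = list(c("ind1", "ind2", "ind3", "ind4"),  # sample names
                              c("geneA", "geneB", "geneC")))      # gene names, matched against xAnnot
# Alternative: sample names supplied separately, as for xSamples
xSamples <- rownames(gex)
# A single gene can be given as a plain vector
gex_single <- gex[, "geneA"]
```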
The genotype information is provided in the geno object. It can be the file name of a ped/map file pair, in which case the function imports the genotype information using the snpStats package. If the genotype information has already been imported with snpStats::read.pedfile(), the resulting SnpMatrix can be passed as geno instead.
The xAnnot object carries the annotation information for the gene expressions. In case of multiple locations per gene it is a list, and each list item stores the information for one gene as a data.frame in bed format. This data.frame has the three columns Chr, Start and End, and each row refers to one matching chromosomal position of the gene. Especially when probes of ssRNAs are considered, the chromosomal positions of a probe are not necessarily unique. The names of the list xAnnot are the gene names, and they have to match the column names of gex. However, the order does not have to be the same, and xAnnot can include annotations of more genes than given in gex. The function then uses the intersection of the column names of gex and the list entries of xAnnot.
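A sketch of the list layout with hypothetical genes and coordinates: one bed-like data.frame per gene, with one row per location. The genes that can actually be tested are those present in both gex and xAnnot, which `intersect()` finds regardless of order.

```r
# Hypothetical annotation: geneB has two locations, geneD has no expression data
xAnnot <- list(
  geneA = data.frame(Chr = 1, Start = 1000000, End = 1005000),
  geneB = data.frame(Chr = c(2, 5), Start = c(2000000, 7300000),
                     End = c(2003000, 7302500)),
  geneD = data.frame(Chr = 3, Start = 500000, End = 510000)
)
gene_names <- c("geneC", "geneA", "geneB")  # e.g. colnames(gex); order may differ
# Genes present in both objects
tested <- intersect(gene_names, names(xAnnot))
tested  # "geneA" "geneB"
```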
Alternatively, if unique locations are considered, xAnnot can be a data frame with the four columns Gene, Chr, Start and End.
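Both layouts carry the same information when every gene has a single location. The sketch below, with hypothetical values, converts the four-column data frame into the per-gene list layout using `split()`; it only illustrates the correspondence and is not a description of how eQTL() handles the input internally.

```r
# Hypothetical four-column annotation with one unique location per gene
annot_df <- data.frame(Gene = c("geneA", "geneB"), Chr = c(1, 2),
                       Start = c(1000000, 2000000), End = c(1005000, 2003000))
# One data.frame (Chr, Start, End) per gene, named by gene
annot_list <- lapply(split(annot_df[, c("Chr", "Start", "End")], annot_df$Gene),
                     function(d) { rownames(d) <- NULL; d })
```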
The option genoSamples is used when the sample names in the ped/map file (or SnpMatrix) do not match rownames(gex) of the expression matrix. The vector genoSamples has one entry per sample in geno and gives, for each row of geno, the corresponding sample name in gex. The function then uses the intersection of the samples available in both data objects.
If there are repeated genotype measurements for an individual, only the first occurrence in the data is used by default and all later ones are ignored. This currently cannot be changed. If this behavior is not desired, the user has to remove the corresponding rows from geno before starting the calculation.
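The "first occurrence" rule corresponds to dropping rows flagged by `duplicated()`. A sketch on a hypothetical genotype table; to keep the last measurement instead, the rows can be filtered beforehand with `duplicated(..., fromLast = TRUE)`.

```r
# Hypothetical genotype table with a repeated measurement for "ind2"
geno_df <- data.frame(sample = c("ind1", "ind2", "ind2", "ind3"),
                      snp1   = c(0, 1, 2, 2))
# Keep only the first appearance of each individual
geno_first <- geno_df[!duplicated(geno_df$sample), ]
# Alternative: keep the last appearance instead
geno_last <- geno_df[!duplicated(geno_df$sample, fromLast = TRUE), ]
```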
If the code is executed on Linux, the mc option specifies the number of CPU cores used for the calculation.
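The Linux-only restriction suggests fork-based parallelism; that is an assumption, not something this page states. The sketch shows the general pattern with `parallel::mclapply`, distributing a hypothetical per-gene task over cores and falling back to one core where forking is unavailable.

```r
library(parallel)

# Hypothetical per-gene task; here simply the mean expression of one gene
per_gene <- function(g, gex) mean(gex[, g])

set.seed(4)
gex <- matrix(rnorm(40), nrow = 10,
              dimnames = list(NULL, paste0("gene", 1:4)))
# Forking via mclapply works only on Unix-alikes; use 1 core elsewhere
mc <- if (.Platform$OS.type == "unix") 2L else 1L
res <- mclapply(colnames(gex), per_gene, gex = gex, mc.cores = mc)
```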
If the sig option is set to a significance level, the method reports only those SNPs that are tested to be significant at that level. This can reduce the required memory drastically, especially for trans-eQTL analyses.
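A sketch of the filtering idea on a hypothetical result table. Whether eQTL() uses a strict `<` or `<=` comparison against sig is not stated here; the strict comparison below is an assumption.

```r
# Hypothetical result table: one row per tested SNP-gene pair
res <- data.frame(snp = c("rs1", "rs2", "rs3", "rs4"),
                  gene = "geneA",
                  p.value = c(0.20, 0.003, 0.049, 0.0001))
sig <- 0.01
# Keep only significant results so non-significant rows need not be stored
res_sig <- res[res$p.value < sig, ]
```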
If windowSize is set to 0 or NULL, the method tests for trans-eQTLs (all combinations of SNPs and genes). Be aware that this can lead to very long calculations.
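A sketch of how the set of tested SNPs could depend on windowSize. The page does not say whether the window is measured from the gene boundaries or from its center, or whether windowSize is the half-width or the total width; the helper below assumes windowSize Mb on each side of Start/End. `snps_in_window` is hypothetical.

```r
# Hypothetical helper: indices of SNPs to test for one gene.
# Assumption: the window extends windowSize Mb on both sides of Start/End.
snps_in_window <- function(snp_chr, snp_pos, gene_chr, gene_start, gene_end, windowSize) {
  if (is.null(windowSize) || windowSize == 0) {
    return(seq_along(snp_pos))  # trans-eQTL: every SNP is tested
  }
  w <- windowSize * 1e6         # Mb -> base pairs
  which(snp_chr == gene_chr & snp_pos >= gene_start - w & snp_pos <= gene_end + w)
}

snp_chr <- c(1, 1, 1, 2)
snp_pos <- c(200000, 1500000, 3000000, 1500000)
snps_in_window(snp_chr, snp_pos, 1, 1000000, 1005000, windowSize = 1)     # 1 2
snps_in_window(snp_chr, snp_pos, 1, 1000000, 1005000, windowSize = NULL)  # 1 2 3 4
```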
Note: The directional test currently supports only exact p-values based on permutation tests; asymptotic implementations are under development.
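A minimal sketch of a permutation p-value for $P_{0,1,2}$ against $H_1$. Under $H_0$ the genotype labels are exchangeable, so they are shuffled `nper` times. The add-one correction and the helper names are choices made for this sketch, not necessarily how the package computes its p-values. The $P_{2,1,0}$ counterpart follows by swapping groups 0 and 2.

```r
# Hypothetical helper: estimate of P(xa < xb < xc) over all triples
prob_index <- function(xa, xb, xc) {
  per_b <- vapply(xb, function(b) sum(xa < b) * sum(xc > b), numeric(1))
  sum(per_b) / (length(xa) * length(xb) * length(xc))
}

# Hypothetical permutation test for H0: F0 = F1 = F2 vs H1: F0 < F1 < F2
perm_pvalue_012 <- function(expr, genotype, nper = 1000) {
  idx <- function(e, g) prob_index(e[g == 0], e[g == 1], e[g == 2])
  observed <- idx(expr, genotype)
  perm <- replicate(nper, idx(expr, sample(genotype)))  # shuffled labels
  # Add-one correction keeps the p-value strictly positive
  (1 + sum(perm >= observed)) / (nper + 1)
}

set.seed(5)
genotype <- rep(0:2, each = 10)
expr <- genotype + rnorm(30, sd = 0.5)  # simulated ordered effect
p012 <- perm_pvalue_012(expr, genotype, nper = 500)
```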
A list of class eqtl containing the elements gex, geno, xAnnot, genoSamples and windowSize,
and a nested list eqtl, in which each list item is a tested gene location and contains the items
ProbeLoc: Index of the gene location used (differs from 1 only if multiple locations are considered).
TestedSNP: Details about the considered SNPs.
p.values: P-values of the test.
GeneInfo: Details about the center gene.
Daniel Fischer
Fischer, D., Oja, H., Sen, P.K., Schleutker, J., Wahlfors, T. (2013): Generalized Mann-Whitney Type Tests for Microarray Experiments, Scandinavian Journal of Statistics, to appear.
Fischer, D., Oja, H. (2013): Mann-Whitney Type Tests for Microarray Experiments: The R Package gMWT, submitted article.
# Please see also the package vignette for a more detailed example section.
# Make the example data available
data(Xgene)
data(genotData)
data(annotTrack)
# The gene annotation is needed in bed format. (Please note that this deviates from the
# official bed convention; changing this is a high priority on the package's ToDo list.)
## Not run:
annotBed <- gtfToBed(annotTrack)
# Perform a basic cis-eQTL analysis with the minimum required input, using the linear model:
lm.myEQTL <- eQTL(gex = Xgene, geno = genotData, xAnnot = annotBed, method = "LM", windowSize = 1)
## End(Not run)