eQTL: Perform an eQTL Analysis
In GeneticTools: Collection of Genetic Data Analysis Tools


This function performs an eQTL analysis.


  eQTL(gex, geno, xAnnot = NULL, xSamples = NULL, genoSamples = NULL,
       windowSize = 0.5, method = "LM", mc = 1, sig = NULL, which = NULL,
       nper = 2000, verbose = TRUE)

`gex`	Matrix or vector with expression values.
`geno`	Genotype data.
`xAnnot`	Location annotations for the expression values.
`xSamples`	Sample names for the expression values, see details (optional).
`genoSamples`	Sample names for the genotype values, see details (optional).
`windowSize`	Size of the window (in MB) around the center gene, see details.
`method`	Method of choice for the eQTL analysis, see details.
`mc`	Number of cores for parallel computing.
`sig`	Significance level for the eQTL testing, see details.
`which`	Names of the genes for which the eQTL analysis should be performed.
`nper`	Number of permutations, if permutation tests are used.
`verbose`	Logical; whether the method should report intermediate results.

This function performs an eQTL analysis and offers different types of tests. The test is selected with the method option; the possible values are "LM" and "directional". With "LM", the function fits, for each SNP within a predefined window of size windowSize (in MB) around a gene, a linear model of the corresponding gene expression on the genotype information. The null hypothesis of each test is that the slope is equal to zero; the alternative is that it is not zero.
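The "LM" idea can be illustrated with base R alone. The following is a minimal sketch on simulated data, not the package's implementation: the SNP names, positions and gene coordinates are made up, genotypes are assumed to be coded 0/1/2, and the window is assumed to extend windowSize MB beyond the gene's start and end.

```r
set.seed(1)
n <- 60                                    # number of individuals

# Hypothetical SNP positions (bp), all on the same chromosome as the gene
snpPos <- c(snpA = 200000, snpB = 900000, snpC = 1400000, snpD = 3000000)

# Simulated genotypes coded 0/1/2: one column per SNP, one row per individual
geno <- sapply(snpPos, function(p) rbinom(n, 2, 0.4))

# Simulated expression of one gene, influenced by snpB only
expr <- 0.8 * geno[, "snpB"] + rnorm(n)

# Hypothetical gene location and window size (in MB, as in windowSize)
geneStart  <- 1000000
geneEnd    <- 1050000
windowSize <- 0.5
w <- windowSize * 1e6

# Assumption of this sketch: the window is measured from the gene boundaries
inWindow <- snpPos >= geneStart - w & snpPos <= geneEnd + w

# One linear model per SNP in the window; keep the p-value of the slope,
# i.e. the test of H0: slope = 0 against the two-sided alternative
pvals <- sapply(names(snpPos)[inWindow], function(s) {
  fit <- lm(expr ~ geno[, s])
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})
pvals
```

Only snpB and snpC fall inside the window here, so only two models are fitted; the SNPs outside the window are never tested.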

The "directional" option applies a new directional test based on probabilistic indices for triples, as described in Fischer, Oja, et al. (2013). Let \mathbf{x}_0=(x_{01},x_{02},\ldots,x_{0N_0})', \mathbf{x}_1=(x_{11},x_{12},\ldots,x_{1N_1})' and \mathbf{x}_2=(x_{21},x_{22},\ldots,x_{2N_2})' be the expression values linked to the three genotype groups 0, 1 and 2, with underlying distributions F_0, F_1 and F_2. We first calculate the probabilistic indices P_{0,1,2} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{0i} < x_{1j} < x_{2k}) and P_{2,1,0} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{2i} < x_{1j} < x_{0k}). These estimate the probabilities that the expression values of the three groups follow a certain order, which is what we would expect for possible eQTLs. The null hypothesis is that the expression values of the three groups have the same distribution, H_0: F_0 = F_1 = F_2, and the two alternatives are that the distributions are stochastically ordered, H_1: F_0 < F_1 < F_2 and H_2: F_2 < F_1 < F_0.
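The two probabilistic indices are plain proportions of ordered triples, so they can be computed directly. The helper probIndex() below is a hypothetical name introduced for illustration; it is a base R sketch of the formulas above and not the code used by the package.

```r
# Fraction of all triples (i, j, k) with xa[i] < xb[j] < xc[k]
probIndex <- function(xa, xb, xc) {
  lowerAB <- outer(xa, xb, "<")   # N_a x N_b matrix of xa[i] < xb[j]
  lowerBC <- outer(xb, xc, "<")   # N_b x N_c matrix of xb[j] < xc[k]
  # For each j: (number of i with xa[i] < xb[j]) * (number of k with xb[j] < xc[k])
  sum(colSums(lowerAB) * rowSums(lowerBC)) /
    (length(xa) * length(xb) * length(xc))
}

# Perfectly ordered groups: every triple is increasing
x0 <- c(1, 2); x1 <- c(3, 4); x2 <- c(5, 6)
P012 <- probIndex(x0, x1, x2)   # 1
P210 <- probIndex(x2, x1, x0)   # 0

# Overlapping groups: only 4 of the 8 triples are increasing
Pmix <- probIndex(c(1, 4), c(2, 5), c(3, 6))   # 0.5
```

Counting through the middle group avoids an explicit triple loop while giving the same value as the triple sum.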

The test is applied to the two probabilistic indices P_{0,1,2} and P_{2,1,0}, and the two resulting p-values p_{012}=p_1 and p_{210}=p_2 are then combined into the overall p-value \min(2 \min(p_1 , p_2 ), 1). In the two-group case (that is, when only two different genotypes are present for a certain SNP), a two-sided Wilcoxon rank-sum test is applied.
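The combination rule and the two-group fallback can be sketched as follows. This is an illustration under assumptions: probIndex(), permP() and combineP() are hypothetical helpers, the permutation p-value is a simple Monte Carlo approximation with the common "+1" correction, and the package itself may compute its permutation p-values differently.

```r
# Fraction of triples with xa[i] < xb[j] < xc[k] (the probabilistic index)
probIndex <- function(xa, xb, xc) {
  sum(colSums(outer(xa, xb, "<")) * rowSums(outer(xb, xc, "<"))) /
    (length(xa) * length(xb) * length(xc))
}

# Monte Carlo permutation p-value for one ordering of the genotype groups
permP <- function(expr, g, ord, nper = 2000) {
  stat <- function(lab) {
    probIndex(expr[lab == ord[1]], expr[lab == ord[2]], expr[lab == ord[3]])
  }
  obs  <- stat(g)
  perm <- replicate(nper, stat(sample(g)))   # shuffle the genotype labels
  (sum(perm >= obs) + 1) / (nper + 1)
}

# Overall p-value: min(2 * min(p1, p2), 1)
combineP <- function(p1, p2) min(2 * min(p1, p2), 1)

set.seed(42)
g    <- rep(0:2, each = 10)            # simulated genotypes 0/1/2
expr <- g + rnorm(30)                  # expression increasing with genotype
p1 <- permP(expr, g, c(0, 1, 2), nper = 500)
p2 <- permP(expr, g, c(2, 1, 0), nper = 500)
pOverall <- combineP(p1, p2)

# Two-group case: only two genotypes present, two-sided Wilcoxon rank-sum test
g2   <- rep(0:1, each = 10)
e2   <- rnorm(20)
pTwo <- wilcox.test(e2[g2 == 0], e2[g2 == 1], alternative = "two.sided")$p.value
```

Doubling the smaller of the two one-sided p-values and capping at 1 turns the pair of directional tests into a single two-sided decision.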

The gene expressions are specified in gex. If several genes should be tested, gex is a matrix in which each column refers to a gene and each row to an individual. The column names of this matrix should match the names used in the annotation object xAnnot. Sample names can be given either as row names of the matrix or as a separate vector in xSamples. If the expression of only one gene should be tested, gex can be a vector.
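As an illustration of the expected layout, the sketch below builds a small expression matrix by hand. All sample names, gene names and values are made up.

```r
# Hypothetical expression values: 4 individuals (rows) x 2 genes (columns)
gex <- matrix(c(5.1, 4.8, 6.0, 5.5,
                2.2, 2.9, 2.4, 3.1),
              nrow = 4,
              dimnames = list(c("ind1", "ind2", "ind3", "ind4"),
                              c("geneA", "geneB")))

# Alternative: no row names on the matrix, sample names in a separate vector
xSamples <- rownames(gex)
gexPlain <- gex
rownames(gexPlain) <- NULL

# Single gene: a plain vector of expression values is sufficient
gexSingle <- gex[, "geneA"]
```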

The genotype information is provided in the geno object. One can either specify the file name of a ped/map file pair, in which case the function imports the genotype information using the snpStats package, or, if the genotype information has already been imported using snpStats::read.pedfile(), pass the resulting SnpMatrix as geno.

The xAnnot object carries the annotation information for the gene expressions. In case of multiple locations per gene it is of type list, and each list item stores the information for one gene in the form of a data.frame in bed format. This data.frame has the three columns Chr, Start, End, and each row refers to one matching chromosomal position of the underlying gene. Especially when probes of ssRNAs are considered, the chromosomal positions of a probe are not necessarily unique. The names of the list xAnnot are the names of the genes, and they have to match the column names of gex. However, the order does not have to be the same, and xAnnot can include annotations for more genes than are given in gex. The function then finds and uses the intersection of the column names of gex and the list entries of xAnnot, that is, the genes present in both. Alternatively, xAnnot can be a data frame if unique locations are considered. In that case xAnnot has to be a data frame with the four columns Gene, Chr, Start, End.
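Both annotation layouts can be written down by hand for a toy example. The gene names and coordinates below are invented, and the matching step is shown with base R's intersect() as an assumption about how the common genes are determined.

```r
# List form: one bed-style data.frame (Chr, Start, End) per gene;
# geneB maps to two chromosomal positions
xAnnotList <- list(
  geneA = data.frame(Chr = "1", Start = 1000000, End = 1050000),
  geneB = data.frame(Chr = c("2", "7"),
                     Start = c(500000, 120000),
                     End   = c(501000, 121000)),
  geneC = data.frame(Chr = "3", Start = 42000, End = 43000)
)

# Column names of a hypothetical gex matrix, in a different order than xAnnotList
gexGenes <- c("geneB", "geneA")

# Genes that are both measured and annotated
usedGenes <- intersect(gexGenes, names(xAnnotList))

# Data frame form for unique locations: columns Gene, Chr, Start, End
xAnnotDF <- data.frame(Gene  = c("geneA", "geneC"),
                       Chr   = c("1", "3"),
                       Start = c(1000000, 42000),
                       End   = c(1050000, 43000))
```

Here geneC is annotated but not measured, so it is simply not used.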

The option genoSamples is used when the sample names in the ped/map file (or SnpMatrix) do not match rownames(gex) of the expression matrix. The vector genoSamples has one entry per sample in the geno object and gives, for each row in geno, the corresponding name in the gex object. The function then restricts the analysis to the samples that are common to both data objects. If there are repeated genotype measurements for an individual, only the first appearance in the data is used and all subsequent ones are ignored. Currently this cannot be changed; if this behavior is not desired, the user has to remove the corresponding rows from geno before starting the calculation.
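The sample matching described here can be mimicked with base R. The following sketch uses made-up sample names and only illustrates the logic (common samples, first appearance of repeated genotype rows); it is not taken from the package source.

```r
# Sample names of the expression data (rows of gex)
gexSamples  <- c("ind1", "ind2", "ind3", "ind4")

# One entry per row of geno, translated to the names used in gex;
# ind3 was genotyped twice, ind5 has no expression data
genoSamples <- c("ind2", "ind3", "ind3", "ind5", "ind1")

# Keep only the first appearance of each individual in the genotype data
firstOnly <- !duplicated(genoSamples)

# Samples present in both data sets
common <- intersect(gexSamples, genoSamples[firstOnly])

# Row indices to use in each object; match() returns the first match,
# so the second ind3 row of geno (row 3) is ignored
gexRows  <- match(common, gexSamples)
genoRows <- match(common, genoSamples)
```

To use a different replicate than the first one, the unwanted rows would have to be dropped from geno (and from genoSamples) beforehand.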

If the code is executed on a Linux OS, the mc option specifies the number of CPU cores used for the calculation.

If the sig option is set to a certain significance level, the method reports only those SNPs whose tests are significant at that level. This can reduce the required memory drastically, especially in the case of trans-eQTL.

The method tests for trans-eQTLs (all combinations of SNPs and genes) if windowSize is set to 0 or NULL. Be aware that this can lead to long computation times.

Note: The directional test currently supports only exact p-values based on permutation tests; asymptotic implementations are under development and will be available soon.

A list of class eqtl containing the values

`gex`	The `gex` object from the function call.
`geno`	The `geno` object from the function call.
`xAnnot`	The `xAnnot` object from the function call.
`genoSamples`	The `genoSamples` object from the function call.
`windowSize`	The `windowSize` object from the function call.

and a nested list eqtl in which each list item is a tested gene location and contains the items

`ProbeLoc`	Used position of that gene. (Only different from 1 if multiple locations are considered.)
`TestedSNP`	Details about the considered SNPs.
`p.values`	P values of the test.
`GeneInfo`	Details about the center gene.

Daniel Fischer

Fischer, D., Oja, H., Sen, P.K., Schleutker, J., Wahlfors, T. (2013): Generalized Mann-Whitney Type Tests for Microarray Experiments, Scandinavian Journal of Statistics, to appear.

Fischer, D., Oja, H. (2013): Mann-Whitney Type Tests for Microarray Experiments: The R Package gMWT, submitted article.

# Please see also the package vignette for a more descriptive example section.

# Load the package and make the example data available
  library(GeneticTools)
  data(Xgene)
  data(genotData)
  data(annotTrack)

# We need to have the gene annotation in bed format. (Please note the deviation from the
# official convention; changing this is high on the package's to-do list.)

## Not run: 
  annotBed <- gtfToBed(annotTrack)

# Perform a basic cis-eQTL analysis with the minimum required input, using the linear model:
  lm.myEQTL <- eQTL(gex = Xgene, geno = genotData, xAnnot = annotBed,
                    method = "LM", windowSize = 1)

## End(Not run)