Description Usage Arguments Details Value Author(s) References See Also Examples
View source: R/Matrix_eQTL_engine.R
Matrix_eQTL_engine
function tests association of
every row of the snps
dataset with every row of the
gene
dataset using a linear regression model
defined by the useModel
parameter (see below).
The testing procedure accounts for extra
covariates specified by the cvrt
parameter.
The errorCovariance
parameter can be set to the
error variance-covariance matrix to account
for heteroskedastic and/or correlated errors.
Associations significant at pvOutputThreshold
(pvOutputThreshold.cis
) levels are saved to
output_file_name
(output_file_name.cis
),
with corresponding estimates of effect size (slope coefficient),
test statistics, p-values, and q-values (false discovery rate).
Matrix eQTL can perform separate analysis for
local (cis) and distant (trans) eQTLs.
For such analysis one has to set the cis-analysis specific
parameters pvOutputThreshold.cis > 0
,
cisDist
, snpspos
and
genepos in the call of Matrix_eQTL_main
function.
A gene-SNP pair is considered local if the
distance between them is less or equal to cisDist
.
The genomic location of genes and SNPs is defined by
the data frames snpspos
and genepos.
Depending on p-value thresholds pvOutputThreshold
and
pvOutputThreshold.cis
Matrix eQTL runs in
one of three different modes:
Set pvOutputThreshold > 0
and
pvOutputThreshold.cis = 0
(or use Matrix_eQTL_engine
)
to perform eQTL analysis without using gene/SNP locations.
Associations significant at the pvOutputThreshold
level
are be recorded in output_file_name
and in the returned object.
Set pvOutputThreshold = 0
and
pvOutputThreshold.cis > 0
to perform eQTL analysis for
local gene-SNP pairs only. Local associations significant at
pvOutputThreshold.cis
level will be recorded in
output_file_name.cis
and in the returned object.
Set pvOutputThreshold > 0
and
pvOutputThreshold.cis > 0
to perform eQTL analysis
with separate p-value thresholds for local and distant eQTLs.
Distant and local associations significant at corresponding
thresholds are recorded in output_file_name
and
output_file_name.cis
respectively and in the returned object.
In this case the false discovery rate is calculated
separately for these two sets of eQTLs.
Matrix_eQTL_engine
is a wrapper for Matrix_eQTL_main
for eQTL analysis without regard to gene/SNP location and provided
for compatibility with the previous versions of the package.
The parameter pvalue.hist
allows to record information sufficient
to create a histogram or QQ-plot of all the p-values
(see plot
).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | Matrix_eQTL_main(
snps,
gene,
cvrt = SlicedData$new(),
output_file_name = "",
pvOutputThreshold = 1e-5,
useModel = modelLINEAR,
errorCovariance = numeric(),
verbose = TRUE,
output_file_name.cis = "",
pvOutputThreshold.cis = 0,
snpspos = NULL,
genepos = NULL,
cisDist = 1e6,
pvalue.hist = FALSE,
min.pv.by.genesnp = FALSE,
noFDRsaveMemory = FALSE)
Matrix_eQTL_engine(
snps,
gene,
cvrt = SlicedData$new(),
output_file_name,
pvOutputThreshold = 1e-5,
useModel = modelLINEAR,
errorCovariance = numeric(),
verbose = TRUE,
pvalue.hist = FALSE,
min.pv.by.genesnp = FALSE,
noFDRsaveMemory = FALSE)
|
snps |
| |||||||||||||||||
gene |
| |||||||||||||||||
cvrt |
| |||||||||||||||||
output_file_name |
| |||||||||||||||||
output_file_name.cis |
| |||||||||||||||||
pvOutputThreshold |
| |||||||||||||||||
pvOutputThreshold.cis |
| |||||||||||||||||
useModel |
| |||||||||||||||||
errorCovariance |
| |||||||||||||||||
verbose |
| |||||||||||||||||
snpspos |
| |||||||||||||||||
genepos |
| |||||||||||||||||
cisDist |
| |||||||||||||||||
pvalue.hist |
| |||||||||||||||||
min.pv.by.genesnp |
| |||||||||||||||||
noFDRsaveMemory |
|
Note that the columns of
gene
, snps
, and cvrt
must match.
If they do not match in the input files,
use ColumnSubsample
method to subset and/or reorder them.
The detected eQTLs are saved in output_file_name
and/or output_file_name.cis
if they are not NULL
.
The method also returns a list with a summary of the performed analysis.
param |
Keeps all input parameters and also records the number of degrees of freedom for the full model. |
time.in.sec |
Time difference between the start and the end of the analysis (in seconds). |
all |
Information about all detected eQTLs. |
cis |
Information about detected local eQTLs. |
trans |
Information about detected distant eQTLs. |
The elements all
, cis
, and trans
may contain the following components
ntests
Total number of tests performed. This is used for FDR calculation.
eqtls
Data frame with recorded significant associations.
Not available if noFDRsaveMemory=FALSE
neqtls
Number of significant associations recorded.
hist.bins
Histogram bins used for recording p-value distribution.
See pvalue.hist
parameter.
hist.counts
Number of p-value that fell in each histogram bin.
See pvalue.hist
parameter.
min.pv.snps
Vector with the best p-value for each SNP.
See min.pv.by.genesnp
parameter.
min.pv.gene
Vector with the best p-value for each gene.
See min.pv.by.genesnp
parameter.
Andrey A Shabalin andrey.shabalin@gmail.com
The package website: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/
The code below is the sample code for eQTL analysis NOT using gene/SNP locations.
See MatrixEQTL_cis_code
for sample code for
eQTL analysis that separates local and distant tests.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | # Matrix eQTL by Andrey A. Shabalin
# http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/
#
# Be sure to use an up to date version of R and Matrix eQTL.
# source("Matrix_eQTL_R/Matrix_eQTL_engine.r");
library(MatrixEQTL)
## Location of the package with the data files.
base.dir = find.package('MatrixEQTL');
## Settings
# Linear model to use, modelANOVA, modelLINEAR, or modelLINEAR_CROSS
useModel = modelLINEAR; # modelANOVA, modelLINEAR, or modelLINEAR_CROSS
# Genotype file name
SNP_file_name = paste0(base.dir, "/data/SNP.txt");
# Gene expression file name
expression_file_name = paste0(base.dir, "/data/GE.txt");
# Covariates file name
# Set to character() for no covariates
covariates_file_name = paste0(base.dir, "/data/Covariates.txt");
# Output file name
output_file_name = tempfile();
# Only associations significant at this level will be saved
pvOutputThreshold = 1e-2;
# Error covariance matrix
# Set to numeric() for identity.
errorCovariance = numeric();
# errorCovariance = read.table("Sample_Data/errorCovariance.txt");
## Load genotype data
snps = SlicedData$new();
snps$fileDelimiter = "\t"; # the TAB character
snps$fileOmitCharacters = "NA"; # denote missing values;
snps$fileSkipRows = 1; # one row of column labels
snps$fileSkipColumns = 1; # one column of row labels
snps$fileSliceSize = 2000; # read file in slices of 2,000 rows
snps$LoadFile(SNP_file_name);
## Load gene expression data
gene = SlicedData$new();
gene$fileDelimiter = "\t"; # the TAB character
gene$fileOmitCharacters = "NA"; # denote missing values;
gene$fileSkipRows = 1; # one row of column labels
gene$fileSkipColumns = 1; # one column of row labels
gene$fileSliceSize = 2000; # read file in slices of 2,000 rows
gene$LoadFile(expression_file_name);
## Load covariates
cvrt = SlicedData$new();
cvrt$fileDelimiter = "\t"; # the TAB character
cvrt$fileOmitCharacters = "NA"; # denote missing values;
cvrt$fileSkipRows = 1; # one row of column labels
cvrt$fileSkipColumns = 1; # one column of row labels
if(length(covariates_file_name)>0){
cvrt$LoadFile(covariates_file_name);
}
## Run the analysis
me = Matrix_eQTL_engine(
snps = snps,
gene = gene,
cvrt = cvrt,
output_file_name = output_file_name,
pvOutputThreshold = pvOutputThreshold,
useModel = useModel,
errorCovariance = errorCovariance,
verbose = TRUE,
pvalue.hist = TRUE,
min.pv.by.genesnp = FALSE,
noFDRsaveMemory = FALSE);
unlink(output_file_name);
## Results:
cat('Analysis done in: ', me$time.in.sec, ' seconds', '\n');
cat('Detected eQTLs:', '\n');
show(me$all$eqtls)
## Plot the histogram of all p-values
plot(me)
|
Rows read: 15 done.
Rows read: 10 done.
Rows read: 2 done.
Processing covariates
Task finished in 0.00700000000000001 seconds
Processing gene expression data (imputation, residualization)
Task finished in 0.00600000000000001 seconds
Creating output file(s)
Task finished in 0.00699999999999995 seconds
Performing eQTL analysis
100.00% done, 5 eQTLs
Task finished in 0.01 seconds
Analysis done in: 0.024 seconds
Detected eQTLs:
snps gene statistic pvalue FDR beta
1 Snp_05 Gene_03 38.812160 5.515519e-14 8.273279e-12 0.4101317
2 Snp_13 Gene_09 -3.914403 2.055817e-03 1.541863e-01 -0.2978847
3 Snp_11 Gene_06 -3.221962 7.327756e-03 2.853368e-01 -0.2332470
4 Snp_04 Gene_10 3.201666 7.608981e-03 2.853368e-01 0.2321123
5 Snp_14 Gene_01 3.070005 9.716705e-03 2.915011e-01 0.2147077
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.