multithreadACME: Fast Parallelized Estimation of ACME Model for All Local...


View source: R/parallel_process.r

Description

This function estimates the ACME model (see the vignette for model details) for all gene-SNP pairs within a pre-defined distance (cisdist). The input data must be stored in filematrices (see the filematrix package), and the results are also saved in a filematrix. This allows the function to run the estimation on multiple CPU cores in parallel without duplicating the data across jobs.
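
As a rough illustration of the storage format, the sketch below writes an in-memory matrix to a filematrix and reads part of it back. It assumes the filematrix package is installed; the file name gene_demo and the random data are placeholders, not something the package requires.

```r
# Sketch: store an in-memory matrix as a filematrix and read it back.
# The file name and the random values are placeholders for illustration.
if (requireNamespace("filematrix", quietly = TRUE)) {
    library(filematrix)

    # One row per sample, one column per gene (the layout genefm expects).
    expr <- matrix(rnorm(50 * 11), nrow = 50, ncol = 11)
    base <- file.path(tempdir(), "gene_demo")

    # Create the filematrix on disk and release the handle.
    fm <- fm.create.from.matrix(filenamebase = base, mat = expr)
    close(fm)

    # Reopen read-only and pull the first three gene columns.
    fm <- fm.open(base, readonly = TRUE)
    back <- fm[, 1:3]
    close(fm)

    roundtrip_ok <- isTRUE(all.equal(as.vector(back), as.vector(expr[, 1:3])))
}
```

Because the data live on disk, each parallel job can open the same files rather than receive its own copy.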

Usage

multithreadACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    cisdist = 1e+06,
    threads = -1,
    workdir = ".",
    verbose = TRUE)

Arguments

genefm

Name of the filematrix with gene expression data. One column per gene and one row per sample.

snpsfm

Name of the filematrix with SNP data. One column per SNP and one row per sample.

glocfm

Name of the filematrix with gene location information. Must contain two columns: the first with the gene start location and the second with the gene end location. The locations must be stored as numbers, and locations on different chromosomes must differ greatly. We suggest the encoding location = 1e9 * chromosome + position_on_chromosome. The rows must match the columns of the genefm filematrix.

slocfm

Name of the filematrix with SNP locations. Must have one column and rows matching columns of snpsfm filematrix. See the instructions for glocfm above.

cvrtfm

Name of the filematrix with covariates. Must not include a constant (it is added automatically). One column per covariate and one row per sample.

acmefm

Name of the filematrix to store the estimates. The filematrix will be created. If the filematrix exists, it will be overwritten.

cisdist

The maximum allowed distance between genes and SNPs. Gene-SNP pairs further than cisdist apart will not be tested.

threads

The number of local jobs (CPU cores) used for calculation. If negative, threads is set to the number of cores of the host machine.

workdir

Directory where the input filematrices are located.

verbose

Set to TRUE to print progress messages.
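
The suggested location encoding and the cisdist filter can be sketched in base R as follows. The helper encode_location and all coordinates are hypothetical, and the distance rule used here (a SNP is local if it lies within cisdist of the interval from gene start to gene end) is an assumption about how the distance is measured, not a statement of the package's internals.

```r
# Hypothetical helper: combine chromosome and position into one number,
# following the suggested encoding 1e9 * chromosome + position.
encode_location <- function(chromosome, position) {
    1e9 * chromosome + position
}

# Made-up coordinates for three genes and four SNPs (illustration only).
gene_start <- encode_location(c(1, 1, 2), c(1.0e6, 5.0e6, 1.2e6))
gene_end   <- encode_location(c(1, 1, 2), c(1.1e6, 5.2e6, 1.3e6))
snp_loc    <- encode_location(c(1, 1, 1, 2), c(0.5e6, 1.9e6, 4.5e6, 1.0e6))

# Assumed rule: a SNP is local to a gene if it falls inside
# [gene_start - cisdist, gene_end + cisdist].
cisdist  <- 1e6
lower_ok <- outer(gene_start - cisdist, snp_loc, "<=")
upper_ok <- outer(gene_end + cisdist, snp_loc, ">=")
is_local <- lower_ok & upper_ok   # genes in rows, SNPs in columns

# List the local pairs as (gene index, SNP index).
pairs <- which(is_local, arr.ind = TRUE)
colnames(pairs) <- c("geneid", "snp_id")
```

With this encoding, positions on different chromosomes are about 1e9 apart, so no realistic cisdist pairs a gene with a SNP on another chromosome.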

Value

The function creates a filematrix named acmefm with 10 rows and one column for each tested gene-SNP pair. The rows contain the gene and SNP ids and the estimates produced by effectSizeEstimationC:

geneid

The gene id - the column number for the gene in the genefm filematrix.

snp_id

The SNP id - the column number for the SNP in the snpsfm filematrix.

beta0

The constant parameter in the non-linear model.

beta1

The effect size parameter in the non-linear model.

nits

Number of iterations until convergence of the estimation algorithm.

SSE

Sum of squared residuals of the fitted model.

SST

Sum of squared residuals of the model with zero effect.

F

The F test for the significance of the genotype effect.

eta

The effect size parameter for the simplified model (beta1/beta0).

SE_eta

Standard error of the eta estimate.
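
One way to work with this layout is to transpose the 10-row matrix so that each gene-SNP pair becomes a row. The sketch below uses a mock matrix with invented placeholder numbers (they are not output of the package) and assumes the row order listed above.

```r
# Mock results for two gene-SNP pairs; all numbers are invented placeholders.
row_names <- c("geneid", "snp_id", "beta0", "beta1", "nits",
               "SSE", "SST", "F", "eta", "SE_eta")
res <- matrix(
    c(1, 3, 2.0, 0.5, 4, 40, 50, 12.0, 0.25, 0.07,
      2, 7, 4.0, 0.4, 5, 45, 46,  1.1, 0.10, 0.09),
    nrow = 10, dimnames = list(row_names, NULL))

# Transpose so each gene-SNP pair becomes a row of a data frame.
tab <- as.data.frame(t(res))

# eta is documented as beta1 / beta0; recompute it as a sanity check.
eta_check <- tab$beta1 / tab$beta0

# Rank pairs by the F statistic, largest first.
tab_sorted <- tab[order(tab$F, decreasing = TRUE), ]
```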

Note

The rows of genefm, snpsfm, and cvrtfm filematrices must match. The SNPs must have increasing locations.
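
These requirements can be checked before a long run. The sketch below is a hypothetical helper, check_inputs, that operates on in-memory copies of the inputs. Equal row counts are only a necessary condition: the helper cannot confirm that the samples appear in the same order.

```r
# Hypothetical pre-flight check on in-memory copies of the inputs.
check_inputs <- function(gene, snps, cvrt, gene_loc, snps_loc) {
    stopifnot(
        nrow(gene) == nrow(snps),        # same number of samples in gene and SNP data
        nrow(gene) == nrow(cvrt),        # same number of samples in covariates
        nrow(gene_loc) == ncol(gene),    # one location row per gene
        ncol(gene_loc) == 2,             # gene start and gene end
        length(snps_loc) == ncol(snps),  # one location per SNP
        !is.unsorted(snps_loc)           # SNP locations in increasing order (ties allowed)
    )
    invisible(TRUE)
}

# Toy inputs with placeholder values: 5 samples, 2 genes, 3 SNPs, 1 covariate.
gene     <- matrix(0, nrow = 5, ncol = 2)
snps     <- matrix(0, nrow = 5, ncol = 3)
cvrt     <- matrix(0, nrow = 5, ncol = 1)
gene_loc <- cbind(c(1e9 + 100, 1e9 + 900), c(1e9 + 200, 1e9 + 950))
snps_loc <- c(1e9 + 50, 1e9 + 150, 1e9 + 920)

inputs_ok <- check_inputs(gene, snps, cvrt, gene_loc, snps_loc)
```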

Author(s)

Andrey A Shabalin andrey.shabalin@gmail.com, John Palowitch

References

The manuscript is available at: http://onlinelibrary.wiley.com/doi/10.1111/biom.12810/full

See Also

For package overview and code examples see the package vignette via:
browseVignettes("ACMEeqtl")
or
RShowDoc("doc/ACMEeqtl.html", "html", "ACMEeqtl")

Examples

# First we generate an eQTL dataset in filematrix format
tempdirectory = tempdir()
z = create_artificial_data(
    nsample = 50,
    ngene = 11,
    nsnp = 51,
    ncvrt = 1,
    minMAF = 0.2,
    saveDir = tempdirectory,
    returnData = FALSE,
    savefmat = TRUE,
    savetxt = FALSE,
    verbose = FALSE)

# In this example, we use 1 CPU core (thread)
# for testing of all gene-SNP pairs within 10,000,000 bp.
multithreadACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    cisdist = 10e+06,
    threads = 1, # Use more for faster run
    workdir = file.path(tempdirectory, "filematrices"),
    verbose = FALSE)

# Now the filematrix `ACME` holds the estimates for all local gene-SNP pairs.

fm = fm.open(file.path(tempdirectory, "filematrices", "ACME"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)

show(t(TenResults))

ACMEeqtl documentation built on May 2, 2019, 4:03 p.m.