ramwasParameters: Function for Convenient Filling of the RaMWAS Parameter List.


View source: R/param.r

Description

The RaMWAS parameter list, which is used by the major functions of the pipeline, is a regular R list, and creating it does not require a special function. However, this function makes the task much simpler in RStudio, because the IDE shows the name and role of every parameter.
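Because the parameter set is an ordinary named list, it can also be built with base R's list(). The sketch below is a minimal illustration. The values are arbitrary, and any parameter that is not listed is simply absent from the list.

```r
# Build a RaMWAS-style parameter list by hand with base R.
# Values are illustrative; parameters not listed are simply absent.
param <- list(
    dirproject = ".",
    cputhreads = 4
)

# It is a plain named list, so it can be extended or modified as usual.
param$diskthreads <- 2
str(param)
```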

Usage

ramwasParameters(
    dirproject,
    dirfilter,
    dirrbam,
    dirrqc,
    dirqc,
    dircoveragenorm,
    dirtemp,
    dirpca,
    dirmwas,
    dircv,
    dirbam,
    filebamlist,
    bamnames,
    filebam2sample,
    bam2sample,
    filecpgset,
    filenoncpgset,
    filecovariates,
    covariates,
    cputhreads,
    diskthreads,
    usefilelock,
    scoretag,
    minscore,
    maxrepeats,
    minavgcpgcoverage,
    minnonzerosamples,
    buffersize,
    doublesize,
    modelcovariates,
    modeloutcome,
    modelPCs,
    modelhasconstant,
    qqplottitle,
    toppvthreshold,
    mmncpgs,
    mmalpha,
    cvnfolds,
    bihost,
    bimart,
    bidataset,
    biattributes,
    bifilters,
    biflank,
    fileSNPs,
    dirSNPs,
    ...)

Arguments

dirproject

The project directory. Default is the current directory.
Files specified by "file*" parameters are looked for here, unless they are given with a full path.

dirfilter

By default, the same as "dirproject".
All files created by RaMWAS are created within this directory.
To test different read filtering rules, the user can set "dirfilter" to TRUE. This sets it to a name like "Filter_MAPQ_4", where "MAPQ" is the BAM field used for filtering and "4" is the threshold.
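A short illustration of the option above. The helper filterDirName is hypothetical. It only reproduces the "Filter_<tag>_<threshold>" naming pattern given in the text and is not the RaMWAS source code.

```r
# Hypothetical helper reproducing the documented naming pattern
# "Filter_<scoretag>_<minscore>"; not the RaMWAS implementation.
filterDirName <- function(scoretag, minscore) {
    sprintf("Filter_%s_%s", scoretag, minscore)
}

# Parameters requesting a filter-specific output directory.
param <- list(dirproject = ".", dirfilter = TRUE,
              scoretag = "MAPQ", minscore = 4)

filterDirName(param$scoretag, param$minscore)  # "Filter_MAPQ_4"
```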

dirrbam

Directory where RaMWAS saves RaMWAS raw data files (read start locations) after scanning BAMs.
It is "rds_rbam" by default and located in "dirfilter".

dirrqc

Directory where RaMWAS saves QC files in R format after scanning BAMs.
It is "rds_qc" by default and located in "dirfilter".

dirqc

Directory where RaMWAS saves QC plots and text files (BAM QC info) after scanning BAMs.
It is "qc" by default and located in "dirfilter".

dircoveragenorm

Directory where RaMWAS saves the coverage matrix at Step 3 of the pipeline.
It is "coverage_norm_123" by default (123 is the number of samples) and located in "dirfilter".

dirtemp

Directory where RaMWAS stores temporary files during construction of the coverage matrix at Step 3 of the pipeline.
It is "temp" by default and located in "dircoveragenorm".
For better performance it can be set to a location on a different hard drive than "dircoveragenorm".

dirpca

Directory where RaMWAS saves results of PCA analysis at Step 4 of the pipeline.
It is "PCA_12_cvrts_0b0a0c" by default and located in "dircoveragenorm", where 12 is the number of covariates regressed out and "0b0a0c" is a unique code to differentiate between different sets of 12 covariates.

dirmwas

Directory where RaMWAS saves results of MWAS analysis at Step 5 of the pipeline.
It is "Testing_age_7_PCs" by default and located in "dirpca", where "age" is the phenotype being tested and "7" is the number of top PCs included in the model.

dircv

Directory where RaMWAS saves results of Methylation Risk Score analysis at Step 7 of the pipeline.
It is "CV_10_folds" by default and located in "dirmwas", where 10 is the number of folds in N-fold cross-validation.
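The default directories described above are nested inside one another. The sketch below spells out that nesting with file.path(). The numbers and codes (123 samples, 12 covariates, "0b0a0c", "age", 7 PCs, 10 folds) are the example values from the descriptions, not fixed names.

```r
# Illustrative reconstruction of the default directory nesting.
# Numbers and codes are the example values from the text above.
dirproject      <- "."
dirfilter       <- dirproject                      # same as dirproject by default
dirrbam         <- file.path(dirfilter, "rds_rbam")
dirrqc          <- file.path(dirfilter, "rds_qc")
dirqc           <- file.path(dirfilter, "qc")
dircoveragenorm <- file.path(dirfilter, "coverage_norm_123")
dirtemp         <- file.path(dircoveragenorm, "temp")
dirpca          <- file.path(dircoveragenorm, "PCA_12_cvrts_0b0a0c")
dirmwas         <- file.path(dirpca, "Testing_age_7_PCs")
dircv           <- file.path(dirmwas, "CV_10_folds")

dircv
```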

dirbam

Location of BAM files.
If not absolute, it is considered to be relative to "dirproject".
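The rule for relative paths can be sketched as follows. The helper resolvePath is hypothetical, and its absolute-path test is a simplification that only recognizes paths starting with "/", "~", or a Windows drive letter.

```r
# Hypothetical helper: interpret a path relative to a base directory
# (e.g. "dirbam" relative to "dirproject"); absolute paths are kept as-is.
# The absolute-path check is a simplification, not RaMWAS code.
resolvePath <- function(path, base) {
    isAbsolute <- grepl("^(/|~|[A-Za-z]:)", path)
    ifelse(isAbsolute, path, file.path(base, path))
}

resolvePath("bams", "/data/project")       # "/data/project/bams"
resolvePath("/mnt/bams", "/data/project")  # "/mnt/bams"
```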

filebamlist

If defined, must point to a text file with one BAM file name per line.
BAM file names may include path, relative to "dirbam" or absolute.

bamnames

A character vector with BAM file names.
Not required if "filebamlist" is specified.
BAM file names may include path, relative to "dirbam" or absolute.
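The two ways of listing BAM files can be illustrated with plain lists. The BAM names and the "bams" directory below are hypothetical, and the list file is written to a temporary location only for the example.

```r
# Hypothetical BAM names; "subdir/bam3" is relative to dirbam.
bams <- c("bam1", "bam2", "subdir/bam3")

# Option 1: pass the names directly via "bamnames".
param1 <- list(dirproject = ".", dirbam = "bams", bamnames = bams)

# Option 2: write one BAM name per line and point "filebamlist" at the file.
bamlist <- tempfile(fileext = ".txt")
writeLines(bams, bamlist)
param2 <- list(dirproject = ".", dirbam = "bams", filebamlist = bamlist)

readLines(param2$filebamlist)
```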

filebam2sample

Allows multiple BAM files to contain data for the same sample.
Must point to a file with lines like "sample1=bam1,bam2,bam3".

bam2sample

Allows multiple BAM files to contain data for the same sample.
Not required if "filebam2sample" is specified.
Must be a list like list(sample1 = c("bam1","bam2","bam3"), sample2 = "bam2")
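The list form and the file form carry the same information. As a sketch, a "bam2sample" list can be converted to the "sample=bam1,bam2,bam3" line format of "filebam2sample" with base R. The sample and BAM names here are hypothetical.

```r
# Hypothetical sample-to-BAM mapping.
bam2sample <- list(sample1 = c("bam1", "bam2", "bam3"), sample2 = "bam4")

# One line per sample: "sample=bamA,bamB,...".
lines <- paste0(names(bam2sample), "=",
                vapply(bam2sample, paste, character(1), collapse = ","))

fileb2s <- tempfile(fileext = ".txt")
writeLines(lines, fileb2s)
readLines(fileb2s)
```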

filecpgset

Name of the file storing a set of CpGs.

filenoncpgset

If defined, must point to a file storing vetted locations away from any CpGs.

filecovariates

Name of the file containing phenotype and covariates for the available samples.
If the file has the extension ".csv", it is assumed to be comma-separated; otherwise, tab-separated.

covariates

Data frame with phenotype and covariates for the available samples.
Not required if "filecovariates" is specified.
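The separator rule for "filecovariates" can be mimicked with read.table(). The reader readCovariates below is hypothetical and is not the RaMWAS implementation; the covariate names and values are made up. It shows that a ".csv" file and a tab-separated file with the same content are read into the same data frame.

```r
# Hypothetical reader following the documented rule:
# ".csv" means comma-separated, anything else tab-separated.
readCovariates <- function(filename) {
    sep <- if (grepl("\\.csv$", filename)) "," else "\t"
    read.table(filename, header = TRUE, sep = sep, stringsAsFactors = FALSE)
}

# Made-up phenotype and covariates for three samples.
covariates <- data.frame(sample = c("s1", "s2", "s3"),
                         age    = c(34, 51, 47),
                         sex    = c("F", "M", "F"))

csvfile <- tempfile(fileext = ".csv")
write.csv(covariates, csvfile, row.names = FALSE)

tsvfile <- tempfile(fileext = ".txt")
write.table(covariates, tsvfile, sep = "\t", quote = FALSE, row.names = FALSE)

cvrtCsv <- readCovariates(csvfile)
cvrtTsv <- readCovariates(tsvfile)
```

Alternatively, the data frame itself can be passed as list(covariates = covariates).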

cputhreads

Maximum number of CPU intensive tasks running in parallel.
Set to the number of CPU cores by default.

diskthreads

Maximum number of disk intensive tasks running in parallel.
Set to 2 by default.

usefilelock

If TRUE, parallel jobs are prevented from accessing file matrices simultaneously.
This can improve performance on some systems.
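The thread settings can be written out explicitly. The sketch below mirrors the documented defaults using parallel::detectCores(). Because detectCores() can return NA on some platforms, it falls back to 1 in that case.

```r
# Explicit thread settings mirroring the documented defaults.
ncores <- parallel::detectCores()
if (is.na(ncores)) ncores <- 1L   # detectCores() may return NA

param <- list(cputhreads = ncores, diskthreads = 2)
str(param)
```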

scoretag

Reads from BAM files are filtered by this tag.
The "minscore" parameter defines the minimum admissible score.

minscore

Reads whose "scoretag" score is below this value are excluded.

maxrepeats

Duplicate reads (reads with the same start position and direction) in excess of this limit are removed.
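The duplicate rule can be sketched on toy read coordinates. The helper removeExcessRepeats is hypothetical and not RaMWAS code. For each read, it computes the rank within its (start, strand) group and keeps the read only if that rank does not exceed "maxrepeats".

```r
# Hypothetical sketch of the "maxrepeats" rule (not RaMWAS code).
removeExcessRepeats <- function(start, strand, maxrepeats) {
    key <- paste(start, strand)
    # Rank of each read within its (start, strand) group: 1, 2, 3, ...
    rank <- ave(seq_along(key), key, FUN = seq_along)
    rank <= maxrepeats
}

start  <- c(100, 100, 100, 100, 250, 250)
strand <- c("+", "+", "+", "-", "+", "+")
keep <- removeExcessRepeats(start, strand, maxrepeats = 2)
keep  # TRUE TRUE FALSE TRUE TRUE TRUE
```

The third read is dropped because it is the third copy of position 100 on the "+" strand.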

minavgcpgcoverage

CpGs with average coverage below this threshold are removed.

minnonzerosamples

CpGs for which the fraction of samples with non-zero coverage is below this threshold are removed.
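The two coverage filters can be illustrated on a toy matrix with CpGs in rows and samples in columns. This is a hypothetical sketch, not RaMWAS code, and the threshold values are chosen for the example. A CpG is kept only if it passes both thresholds.

```r
# Toy coverage matrix: rows are CpGs, columns are samples.
cov <- rbind(
    cpg1 = c(0,   0, 0,   2),  # mean 0.5,  non-zero in 1/4 samples
    cpg2 = c(1,   1, 0,   1),  # mean 0.75, non-zero in 3/4 samples
    cpg3 = c(0.1, 0, 0.1, 0)   # mean 0.05, non-zero in 2/4 samples
)
minavgcpgcoverage <- 0.3   # illustrative threshold
minnonzerosamples <- 0.3   # illustrative threshold (a fraction)

keep <- rowMeans(cov) >= minavgcpgcoverage &
        rowMeans(cov > 0) >= minnonzerosamples
rownames(cov)[keep]  # "cpg2"
```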

buffersize

Coverage matrix transposition is performed using buffers of this size.
A larger "buffersize" improves the speed of Step 3 of the pipeline, but requires more memory.
Default is 1e9, i.e. 1 GB.

doublesize

The coverage matrix is stored with this number of bytes per value.
Set to 8 for full (double) precision.
Set to 4 to use single precision and create a coverage filematrix half the size.
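A back-of-the-envelope estimate shows what "doublesize" and "buffersize" mean in bytes. The matrix dimensions (100 samples, 20 million CpGs) are hypothetical, and the estimate ignores any file-format overhead.

```r
# Approximate coverage filematrix size: samples x CpGs x bytes per value.
coverageBytes <- function(nsamples, ncpgs, doublesize) {
    nsamples * ncpgs * doublesize
}

coverageBytes(100, 2e7, 8) / 1e9  # 16 GB at double precision
coverageBytes(100, 2e7, 4) / 1e9  # 8 GB at single precision

# A 1e9-byte buffer holds this many 8-byte values:
1e9 / 8  # 1.25e8
```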

modelcovariates

Names of covariates included in PCA and MWAS.

modeloutcome

Name of the outcome variable for MWAS.

modelPCs

Number of principal components accounted for in MWAS.

modelhasconstant

By default, the tested linear model includes a constant.
To exclude it, set "modelhasconstant" parameter to FALSE.
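An example set of model parameters is shown below. The covariate names "sex" and "batch" are hypothetical. The last line only reproduces the "Testing_age_7_PCs" naming pattern from the "dirmwas" description for these values.

```r
# Example MWAS model settings (covariate names are hypothetical).
param <- list(
    modeloutcome     = "age",
    modelcovariates  = c("sex", "batch"),
    modelPCs         = 7,
    modelhasconstant = TRUE   # the default; FALSE drops the constant
)

# Matches the documented default MWAS directory name pattern.
dirmwasName <- sprintf("Testing_%s_%s_PCs", param$modeloutcome, param$modelPCs)
dirmwasName  # "Testing_age_7_PCs"
```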

qqplottitle

The title of the QQ-plot produced by MWAS (Step 5 of the pipeline).

toppvthreshold

Determines the number of top MWAS results saved in a text file.
If it is 1 or smaller, it defines the p-value threshold.
If larger than 1, it defines the exact number of top results.
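The two meanings of "toppvthreshold" can be sketched with a hypothetical helper, selectTop. It is not the RaMWAS implementation. It returns the indices of the selected results, ordered from most to least significant.

```r
# Hypothetical helper illustrating "toppvthreshold" (not RaMWAS code).
selectTop <- function(pvalues, toppvthreshold) {
    ord <- order(pvalues)
    if (toppvthreshold <= 1) {
        ord[pvalues[ord] <= toppvthreshold]  # all results under a p-value cutoff
    } else {
        head(ord, toppvthreshold)            # a fixed number of top results
    }
}

p <- c(0.2, 1e-6, 0.03, 1e-9, 0.5)
selectTop(p, 0.05)  # 4 2 3
selectTop(p, 2)     # 4 2
```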

mmncpgs

Parameter for multi-marker elastic net cross validation (MRS).
Defines the number of top CpGs on which to train the elastic net.
Can be set to a vector of multiple values; each is tested separately.

mmalpha

Parameter for multi-marker elastic net cross validation (MRS).
Elastic net mixing parameter alpha.
Set to 0 by default.

cvnfolds

Parameter for multi-marker elastic net cross validation (MRS).
The number of folds in the N-fold cross validation.
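A sketch of the cross-validation settings follows. The "mmncpgs" values are illustrative. The fold assignment shows a standard way to split n samples into "cvnfolds" folds of near-equal size; it is a generic technique, not necessarily the procedure RaMWAS uses internally.

```r
# Illustrative MRS settings; mmalpha = 0 is the documented default.
param <- list(mmncpgs = c(100, 1000, 10000), mmalpha = 0, cvnfolds = 10)

# Generic fold assignment (not necessarily what RaMWAS does):
# recycle fold labels 1..cvnfolds over n samples, then shuffle.
set.seed(1)
n <- 123
folds <- sample(rep(seq_len(param$cvnfolds), length.out = n))
table(folds)  # three folds of 13 samples, seven of 12
```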

bihost

Parameter for biomaRt annotation (Step 6 of the pipeline).
BioMart host site.
Set to "grch37.ensembl.org" by default.

bimart

Parameter for biomaRt annotation (Step 6 of the pipeline).
BioMart database name, see listMarts.
Set to "ENSEMBL_MART_ENSEMBL" by default.

bidataset

Parameter for biomaRt annotation (Step 6 of the pipeline).
BioMart data set, see listDatasets.
Set to "hsapiens_gene_ensembl" by default.

biattributes

Parameter for biomaRt annotation (Step 6 of the pipeline).
BioMart attributes of interest, see listAttributes.
Set to c("hgnc_symbol","entrezgene","strand") by default.

bifilters

Parameter for biomaRt annotation (Step 6 of the pipeline).
BioMart filters (if any), see listFilters.
Set to list(with_hgnc_transcript_name=TRUE) by default, to ignore genes without names.

biflank

Parameter for biomaRt annotation (Step 6 of the pipeline).
Allowed distance between CpGs and genes or other annotation track elements.
Set to 0 by default, requiring direct overlap.
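The annotation parameters can be written out explicitly. The list below repeats the documented defaults, except for "biflank", which is set to an illustrative 5 kb. The helper nearGene is hypothetical and only illustrates the flank rule described above; it is not RaMWAS code.

```r
# Documented biomaRt defaults, with an illustrative 5 kb flank.
param <- list(
    bihost       = "grch37.ensembl.org",
    bimart       = "ENSEMBL_MART_ENSEMBL",
    bidataset    = "hsapiens_gene_ensembl",
    biattributes = c("hgnc_symbol", "entrezgene", "strand"),
    bifilters    = list(with_hgnc_transcript_name = TRUE),
    biflank      = 5000
)

# Hypothetical flank rule: a CpG at "pos" matches an element spanning
# [start, end] if it lies within "biflank" bases; 0 means direct overlap.
nearGene <- function(pos, start, end, biflank) {
    pos >= start - biflank & pos <= end + biflank
}

nearGene(pos = 9000, start = 10000, end = 20000, biflank = 0)     # FALSE
nearGene(pos = 9000, start = 10000, end = 20000, biflank = 5000)  # TRUE
```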

fileSNPs

Name of the filematrix with genotype (SNP) data.
The filematrix dimensions must match the coverage matrix.

dirSNPs

Directory where RaMWAS saves the results of joint methylation-genotype analysis.

...

Any other named parameters can be added here.

Details

The function simply collects all the parameters in a list.
The main benefit of the function is that the user does not need to memorize the names of RaMWAS parameters.
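For illustration, here is one way a function can return only the arguments it was actually given, evaluated in the caller's environment. collectParams is a hypothetical sketch built on match.call(). It is not the RaMWAS source code.

```r
# Hypothetical sketch (not RaMWAS source): collect supplied arguments only.
collectParams <- function(dirproject, cputhreads, ...) {
    caller <- parent.frame()
    args <- as.list(match.call())[-1]    # supplied arguments, unevaluated
    lapply(args, eval, envir = caller)   # evaluate each in the caller's frame
}

n <- 8
res <- collectParams(dirproject = ".", cputhreads = n, mycustom = "x")
str(res)
```

Arguments that are not supplied do not appear in the result, and extra named arguments passed through "..." are kept.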

Here is how it helps in RStudio: [Figure: hint.png]

Value

List with provided parameters.

Author(s)

Andrey A Shabalin andrey.shabalin@gmail.com

See Also

See vignettes: browseVignettes("ramwas").

Examples

ramwasParameters(dirproject = ".", cputhreads = 4)

andreyshabalin/ramwas documentation built on Sept. 27, 2021, 7:25 p.m.