options: options


Description

A list of options used by sARTP and rARTP. By default, it is set by the function options.default.

Format

The format is a list.

out.dir

output directory for temporary and output files. The default is the working directory returned by getwd().

id.str

character string that is appended to temporary file names. The default is "PID".

seed

integer for random number generation. The default is 1.

Options for testing an association:

method

1 = AdaJoint, 2 = AdaJoint2, 3 = ARTP. The default is 3. It can also be 'AdaJoint', 'AdaJoint2', or 'ARTP'. The package converts the value to upper case, so, for example, 'Adajoint' is also accepted. The ARTP method was proposed in Yu et al. (2009) Genet Epi, while the AdaJoint and AdaJoint2 methods were proposed in Zhang et al. (2014) EJHG. Note that AdaJoint2 could be more powerful if (1) two functional SNPs are negatively correlated and have effects in the same direction; or (2) two functional SNPs are positively correlated and have effects in opposite directions.
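The accepted forms of method can be illustrated with a small base-R sketch. The helper normalize_method is hypothetical and is not part of the package; it only mirrors the behaviour described above (integer codes 1 to 3, or case-insensitive names).

```r
# Hypothetical helper (not a package function): map a user-supplied
# method value to one of the integer codes 1, 2, 3 described above.
normalize_method <- function(method) {
  codes <- c(ADAJOINT = 1L, ADAJOINT2 = 2L, ARTP = 3L)
  if (is.numeric(method)) {
    stopifnot(method %in% codes)
    return(as.integer(method))
  }
  key <- toupper(method)            # names are matched case-insensitively
  stopifnot(key %in% names(codes))
  unname(codes[key])
}

normalize_method("Adajoint")  # same code as "AdaJoint"
normalize_method(3)           # integer codes pass through
```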

nperm

the number of permutations. The default is 1E5.

nthread

the number of threads for multi-threaded processors in Unix/Linux OS. The default is detectCores() to use all available processors.

Options for controlling data cleaning:

snp.miss.rate

any SNP with missing rate greater than snp.miss.rate will be removed from the analysis. The default is 0.05.

maf

any SNP with minor allele frequency less than maf will be removed from the analysis. The default is 0.05.
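As a rough illustration of the snp.miss.rate and maf filters, the sketch below applies both thresholds to a toy 0/1/2 genotype matrix in base R. The matrix and variable names are made up for this example, and the package's internal implementation may differ.

```r
# Toy genotype matrix: 20 subjects (rows), 3 SNPs (columns), coded 0/1/2.
geno <- cbind(
  snp1 = rep(c(0, 1, 2, 1), 5),            # common, no missing values
  snp2 = c(rep(0, 19), 1),                 # allele frequency 1/40 = 0.025
  snp3 = c(NA, NA, rep(c(0, 1, 2), 6))     # 2 of 20 genotypes missing
)
snp.miss.rate <- 0.05
maf <- 0.05

miss <- colMeans(is.na(geno))              # per-SNP missing rate
af <- colMeans(geno, na.rm = TRUE) / 2     # frequency of the counted allele
snp.maf <- pmin(af, 1 - af)                # minor allele frequency

# Keep SNPs that pass both filters.
keep <- miss <= snp.miss.rate & snp.maf >= maf
colnames(geno)[keep]
```

Here snp2 fails the MAF filter and snp3 fails the missing-rate filter, so only snp1 is retained.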

HWE.p

any SNP with HWE exact p-value less than HWE.p will be removed from the analysis. The test is applied to the genotype data or reference data. The test is skipped if the imputed genotypes are not encoded as 0/1/2. The default is 1E-5.

gene.R2

a number between 0 and 1 to filter out SNPs that are highly correlated within each gene. The cor function is called to compute the R^2 value between each pair of SNPs; in each pair with R^2 greater than gene.R2, the SNP with the lower MAF is removed. The default is 0.95.

chr.R2

a number between 0 and 1 to filter out SNPs that are highly correlated within each chromosome. The cor function is called to compute the R^2 value between each pair of SNPs; in each pair with R^2 greater than chr.R2, the SNP with the lower MAF is removed. The default is 0.95.
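The R^2 filters (gene.R2 and chr.R2) can be sketched as a simple pairwise pruning loop. The function prune_by_r2 and the toy data are hypothetical; the order in which the package visits SNP pairs is not documented here, so this is only one plausible reading of the rule.

```r
# Hypothetical helper: drop the lower-MAF SNP from each pair whose
# squared correlation exceeds r2.cutoff.
prune_by_r2 <- function(geno, r2.cutoff) {
  af <- colMeans(geno, na.rm = TRUE) / 2
  maf <- pmin(af, 1 - af)
  r2 <- cor(geno, use = "pairwise.complete.obs")^2
  keep <- rep(TRUE, ncol(geno))
  for (i in seq_len(ncol(geno) - 1)) {
    for (j in (i + 1):ncol(geno)) {
      if (keep[i] && keep[j] && r2[i, j] > r2.cutoff) {
        drop <- if (maf[i] < maf[j]) i else j
        keep[drop] <- FALSE
      }
    }
  }
  colnames(geno)[keep]
}

g1 <- rep(c(0, 1, 2), 20)
g2 <- g1
g2[1] <- 1                                  # nearly identical to g1
g3 <- rep(c(0, 0, 0, 1, 1, 1, 2, 2, 2, 1, 1, 1), 5)  # uncorrelated with g1
geno <- cbind(g1 = g1, g2 = g2, g3 = g3)

prune_by_r2(geno, r2.cutoff = 0.95)
```

In this toy data g1 and g2 have R^2 of about 0.975, and g2 has the slightly lower MAF, so g2 is the SNP removed.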

gene.miss.rate

threshold to remove genes based on their missing rate. Genes with missing rate greater than gene.miss.rate will be removed from the analysis. The missing rate is calculated as the number of subjects with at least one missing genotype among all SNPs in the gene divided by the total number of subjects. The default is 1.0.
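The gene-level missing rate defined above can be computed as follows. This is a minimal base-R sketch; the genotype matrix and the gene membership vector gene.snps are invented for illustration.

```r
# Toy genotype matrix: 5 subjects (rows), 3 SNPs (columns).
geno <- cbind(
  snpA = c(0, 1, NA, 2, 1),
  snpB = c(1, NA, NA, 0, 0),
  snpC = c(2, 1, 0, 0, 1)
)
gene.snps <- c("snpA", "snpB")   # SNPs assigned to a hypothetical gene

# A subject counts as missing if any SNP in the gene is missing.
has.missing <- rowSums(is.na(geno[, gene.snps, drop = FALSE])) > 0
gene.miss <- mean(has.missing)   # subjects 2 and 3 out of 5
gene.miss
```

With the default gene.miss.rate = 1.0, no gene can exceed the threshold, so this filter is effectively off unless a smaller value is supplied.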

rm.gene.subset

TRUE to remove genes which are subsets of other genes. The default is TRUE.

turn.off.filters

a shortcut to turn off all SNP filters. If TRUE, it is equivalent to setting snp.miss.rate = 1, maf = 0, trim.huge.chr = FALSE, gene.R2 = 1, chr.R2 = 1, huge.gene.R2 = 1, huge.chr.R2 = 1, and HWE.p = 0. The default is FALSE.
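Written out as an R list, the settings implied by turn.off.filters = TRUE would look roughly like the sketch below. The value trim.huge.chr = FALSE is an assumption inferred from the purpose of the shortcut, and merging with utils::modifyList is only for illustration; how the package applies the shortcut internally is not shown here.

```r
# Settings described as equivalent to turn.off.filters = TRUE.
# trim.huge.chr = FALSE is assumed, since the shortcut disables filters.
filters.off <- list(
  snp.miss.rate = 1, maf = 0, trim.huge.chr = FALSE,
  gene.R2 = 1, chr.R2 = 1, huge.gene.R2 = 1, huge.chr.R2 = 1,
  HWE.p = 0
)

# Illustrative merge into a user-defined options list.
my.options <- list(nperm = 1e4, maf = 0.01)
my.options <- utils::modifyList(my.options, filters.off)
str(my.options)
```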

impute

TRUE to impute missing genotypes with the mean of a SNP. FALSE to handle missing data without imputation when constructing the score statistics, which is considered to be more powerful but also more time-consuming. The default is FALSE. If the pathway is large and the missing rates are expected to be low, consider setting it to TRUE to reduce the computational burden. Setting impute to FALSE could be beneficial in terms of power if the missing rate is high, e.g., when the data are combined from multiple studies and a SNP has missing genotypes because it is not measured or successfully imputed in some of the participating studies.
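Mean imputation, as used when impute = TRUE, can be sketched in base R as below. The toy matrix is hypothetical, and the sketch does not cover the alternative handling used when impute = FALSE.

```r
# Toy genotype matrix with missing values.
geno <- cbind(snp1 = c(0, 1, NA, 2), snp2 = c(NA, 2, 2, 0))

# Replace each missing genotype with the mean of the observed
# genotypes of the same SNP.
imputed <- apply(geno, 2, function(g) {
  g[is.na(g)] <- mean(g, na.rm = TRUE)
  g
})
imputed
```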

min.marg.p

if an index SNP has a marginal p-value (meta-analyzed if multiple summary files are provided) <= min.marg.p, then all SNPs within +/- window of the index SNP will be discarded from the analysis. This is important because a gene or pathway containing such SNPs (e.g., p < 1E-8) may have a very small gene- or pathway-level p-value even if no other region contributes additional association, which is not the kind of gene- or pathway-level association of interest. The default is 1E-7.

window

an integer to specify window (in bp). The default is 500000 (500kb). See min.marg.p.
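The interaction between min.marg.p and window can be sketched as follows. The data frame and its column names (SNP, Chr, Pos, P) are made up for this example, and whether the window boundary is inclusive in the package is an assumption.

```r
# Toy marginal results for four SNPs.
snps <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  Chr = c(1, 1, 1, 2),
  Pos = c(1000000, 1400000, 1600000, 1000000),
  P   = c(1e-9, 0.2, 0.03, 0.5),
  stringsAsFactors = FALSE
)
min.marg.p <- 1e-7
window <- 500000

# Index SNPs are those with marginal p-value <= min.marg.p.
index <- snps[snps$P <= min.marg.p, ]

# Discard every SNP on the same chromosome within +/- window of an index SNP.
discard <- rep(FALSE, nrow(snps))
for (k in seq_len(nrow(index))) {
  discard <- discard |
    (snps$Chr == index$Chr[k] & abs(snps$Pos - index$Pos[k]) <= window)
}
snps$SNP[!discard]
```

Here rs1 is the index SNP, rs2 lies 400 kb away and is discarded with it, while rs3 (600 kb away) and rs4 (another chromosome) are kept.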

group.gap

an integer to regroup SNPs in a chromosome into independent groups. The unit is base-pair (bp). The position information will be collected from the fourth column of bim files. The default is NULL, i.e., regrouping is not performed.
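One plausible reading of group.gap is that a new group starts whenever two consecutive SNPs on a chromosome are more than group.gap bp apart. The sketch below illustrates that reading only; it is an assumption, not a description of the package's exact rule.

```r
# Sorted base-pair positions of SNPs on one chromosome (toy values).
pos <- c(100, 200, 5000, 5100, 20000)
group.gap <- 1000

# Start a new group after every gap larger than group.gap.
group <- cumsum(c(TRUE, diff(pos) > group.gap))
split(pos, group)
```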

delete

TRUE to delete temporary files containing the test statistics for each gene. The default is TRUE.

print

TRUE to print information to the console. The default is TRUE.

tidy

the data frame deleted.snps in the object returned by sARTP contains information on the SNPs excluded from the analysis and the reasons for their exclusion. Possible reason codes include RM_BY_SNP_NAMES, RM_BY_REGIONS, NO_SUM_STAT, NO_RAW_GENO, NO_REF, SNP_MISS_RATE, SNP_LOW_MAF, SNP_CONST, SNP_HWE, GENE_R2, HUGE_GENE_R2, CHR_R2, HUGE_CHR, HUGE_CHR2, HUGE_CHR3, GENE_MISS_RATE, GENE_SUBSET, CONF_ALLELE_INFO, LACK_OF_ACCU_BETA. Set tidy to TRUE to hide the SNPs with codes NO_SUM_STAT and NO_REF. The default is TRUE.

save.setup

TRUE to save necessary data, e.g., working options, observed scores and covariance matrix, to a local file so that the analysis can be repeated more quickly (skipping data loading and filtering). It will be set to TRUE if only.setup is TRUE. The default is FALSE.

path.setup

character string of file name to save the setup for warm.start if save.setup is TRUE. The default is NULL so that it is set as paste(out.dir, "/setup.", id.str, ".rda", sep = "").
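For reference, the default path.setup expression evaluates as shown below when out.dir and id.str keep their documented defaults.

```r
out.dir <- getwd()   # documented default for out.dir
id.str <- "PID"      # documented default for id.str

# Default value of path.setup when it is left as NULL.
path.setup <- paste(out.dir, "/setup.", id.str, ".rda", sep = "")
basename(path.setup)
```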

only.setup

TRUE if only the setup is needed and the testing procedure is not. The R code that creates the setup is single-threaded, but the testing procedure can be multi-threaded. The best practice for using ARTP2 on a multi-threaded cluster is to first create the setup in single-thread mode, and then call warm.start to compute the p-values in multi-thread mode, using the setup saved at path.setup as input. save.setup will be set to TRUE if only.setup is TRUE. The default is FALSE.

keep.geno

TRUE to return the reference genotypes of SNPs in the pathway. The default is FALSE.

excluded.snps

character vector of SNPs to be excluded in the analysis. NULL if no SNP is excluded. The default is NULL.

selected.snps

character vector of SNPs to be selected in the analysis. NULL if all SNPs are selected, although other filters may still be applied. The default is NULL.

excluded.regions

data frame with three columns Chr, Start, End, or three columns Chr, Pos, Radius. The unit is base-pair (bp). SNPs within [Start, End] or [Pos - Radius, Pos + Radius] will be excluded. See Examples in sARTP. This option is only available for sARTP. The default is NULL.
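The two accepted layouts of excluded.regions describe the same intervals. The sketch below builds both and converts the (Chr, Pos, Radius) form to (Chr, Start, End); the helper as_intervals and the coordinates are hypothetical.

```r
# Layout 1: explicit intervals [Start, End], in bp.
regions1 <- data.frame(Chr = c(1, 2), Start = c(1000, 5000), End = c(2000, 6000))

# Layout 2: centre position and radius, in bp.
regions2 <- data.frame(Chr = c(1, 2), Pos = c(1500, 5500), Radius = c(500, 500))

# Hypothetical helper: express layout 2 as [Pos - Radius, Pos + Radius].
as_intervals <- function(r) {
  if (all(c("Pos", "Radius") %in% names(r))) {
    r <- data.frame(Chr = r$Chr, Start = r$Pos - r$Radius, End = r$Pos + r$Radius)
  }
  r
}

as_intervals(regions2)
```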

excluded.subs

character vector of subject IDs to be excluded in the analysis. These IDs must match with those in the second column (Individual ID) of the fam files in reference. The default is NULL.

selected.subs

character vector of subject IDs to be selected in the analysis. These IDs must match with those in the second column (Individual ID) of the fam files in reference. The default is NULL.

excluded.genes

character vector of genes to be excluded in the analysis. NULL if no gene is excluded. The default is NULL.

meta

TRUE to return meta-analysis summary data from sARTP. The default is FALSE.

ambig.by.AF

TRUE or FALSE to align SNPs with ambiguous alleles by allele frequency (see Details). The default is FALSE.

Options for handling huge pathways:

trim.huge.chr

oversized chromosomes could be further trimmed to accelerate the testing procedure. If TRUE the additional options below are in effect. The default is TRUE.

huge.gene.size

a gene with number of SNPs larger than huge.gene.size will be further trimmed with huge.gene.R2 if trim.huge.chr is TRUE. The default is 1000.

huge.chr.size

a chromosome with number of SNPs larger than huge.chr.size will be further trimmed with huge.chr.R2 if trim.huge.chr is TRUE. The default is 2000.

huge.gene.R2

more stringent R^2 threshold to filter out SNPs in a gene. Similar to gene.R2. The default is gene.R2 - 0.05.

huge.chr.R2

more stringent R^2 threshold to filter out SNPs in a chromosome. Similar to chr.R2. The default is chr.R2 - 0.05.

Options for gene-based test:

inspect.snp.n

the number of candidate truncation points to inspect the top SNPs in a gene. The default is 5. (See Details)

inspect.snp.percent

a value x between 0 and 1 such that a truncation point will be defined at every x percent of the top SNPs. The default is 0 so that the truncation points will be 1:inspect.snp.n. (See Details)

Options for pathway-based test:

inspect.gene.n

the number of candidate truncation points to inspect the top genes in the pathway. The default is 10.

inspect.gene.percent

a value x between 0 and 1 such that a truncation point will be defined at every x percent of the top genes. If 0 then the truncation points will be 1:inspect.gene.n. The default is 0.05.

Details

Order of removing SNPs, genes and subjects:
1. Apply the options excluded.snps and selected.snps if non-NULL. Code: RM_BY_SNP_NAMES.
2. Apply the option excluded.regions if non-NULL and if sARTP is used. Code: RM_BY_REGIONS.
3. Remove SNPs without summary statistics in summary.files. Code: NO_SUM_STAT; or remove SNPs without raw genotype data in data or geno.files. Code: NO_RAW_GENO.
4. Remove SNPs not in bim files in reference if sARTP is used. Code: NO_REF.
5. Remove SNPs with conflicting allele information in summary and reference data if sARTP is used. Code: CONF_ALLELE_INFO.
6. Remove SNPs with missing RAF or EAF if sARTP and options$ambig.by.AF are used. Code: NO_VALID_EAF_RAF.
7. Remove SNPs with high missing rate. Code: SNP_MISS_RATE.
8. Remove SNPs with low MAF. Code: SNP_LOW_MAF.
9. Remove constant SNPs. Code: SNP_CONST.
10. Remove SNPs that fail the HWE test. Code: SNP_HWE.
11. Remove highly correlated SNPs within each gene. Code: GENE_R2 or HUGE_GENE_R2.
12. Remove highly correlated SNPs within each chromosome. Code: CHR_R2, HUGE_CHR, HUGE_CHR2 or HUGE_CHR3.
13. Remove genes with high missing rate. Code: GENE_MISS_RATE.
14. Remove genes which are subsets of other genes. Code: GENE_SUBSET.

Example truncation points defined by inspect.snp.n and inspect.snp.percent: Assume the number of SNPs in a gene is 100. Below are examples of the truncation points for different values of inspect.snp.n and inspect.snp.percent. The same rules apply to inspect.gene.n and inspect.gene.percent.

inspect.snp.n   inspect.snp.percent   truncation points
1               0                     1
1               0.05                  5
1               0.25                  25
1               1                     100
2               0                     1, 2
2               0.05                  5, 10
2               0.25                  25, 50
2               1                     100
3               0.2                   20, 40, 60
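The table above is consistent with the following rule, sketched in base R. The function truncation_points is hypothetical, and rounding up when the percentage does not yield a whole number of SNPs is an assumption, since the table only covers cases that divide evenly.

```r
# Hypothetical helper reproducing the table above.
# n: inspect.snp.n, percent: inspect.snp.percent, n.items: SNPs in the gene.
truncation_points <- function(n, percent, n.items) {
  if (percent == 0) {
    pts <- seq_len(n)
  } else {
    # round() guards against floating-point noise before rounding up
    pts <- ceiling(round(seq_len(n) * percent * n.items, 8))
  }
  unique(pmin(pts, n.items))   # cap at the number of SNPs, drop repeats
}

truncation_points(2, 0.05, 100)
truncation_points(2, 1, 100)
truncation_points(3, 0.2, 100)
```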

SNPs with ambiguous alleles:
A SNP with alleles A and T (or C and G) is ambiguous because the strand cannot be determined. Without strand information, it is sometimes better to match SNPs with ambiguous alleles by allele frequency instead of by matching the alleles. By default, this package matches all SNPs by alleles. If matching by allele frequency for the SNPs with ambiguous alleles is desired, then summary files must contain a variable called "RAF" (reference allele frequency) or a variable "EAF" (effect allele frequency).
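Ambiguous SNPs, as defined above, can be flagged with a short base-R check. The helper is_ambiguous is hypothetical and not part of the package.

```r
# Hypothetical helper: TRUE for A/T and C/G SNPs, whose alleles read
# the same on either strand.
is_ambiguous <- function(a1, a2) {
  pair <- paste0(toupper(a1), toupper(a2))
  pair %in% c("AT", "TA", "CG", "GC")
}

is_ambiguous(c("A", "C", "A"), c("T", "G", "G"))
```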

See Also

options.default

Examples
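A minimal sketch of building an options list from the fields documented above. The particular values are arbitrary choices for illustration. The call that would consume the list is omitted because the arguments of sARTP and rARTP are not described on this page.

```r
# Override a few documented options; fields left out are expected to
# fall back to the values in options.default.
my.options <- list(
  out.dir = tempdir(),     # write temporary and output files here
  id.str = "demo",         # appended to temporary file names
  seed = 1,
  method = "ARTP",         # or 3
  nperm = 1e4,             # fewer permutations than the default 1E5
  nthread = 1,
  maf = 0.01,
  gene.R2 = 0.9,
  inspect.snp.n = 5,
  inspect.gene.n = 10,
  save.setup = FALSE
)
str(my.options)
```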


zhangh12/ARTP3 documentation built on Aug. 16, 2019, 7:39 p.m.