run_pathwayanal_part: High-level function to control pathway analysis.


View source: R/run_pathwayanal_part.R

Description

High-level function to control pathway analysis.

Usage

run_pathwayanal_part(part = NULL, pathways_tab = NULL,
  pathways_tab_fname = NULL, gene_tab = NULL, gene_tab_fname = NULL,
  SS_file = NULL, SS_fname_root = NULL, evecs_tab = NULL,
  evecs_tab_fname = NULL, num_PCs_use = 5, pathways_per_job = 10,
  gene_buffer = 5000, threshold_1000G = 0.03, prune_factor = 0.5,
  prune_limit = 0.0625, snp_limit = 1000, hard_snp_limit = 1500,
  prune_to_start = TRUE, run_GHC = FALSE, out_name_root = "pathway_anal",
  refsnp_dir = NULL, input_dir = NULL, output_dir = NULL, Snum = 1,
  aID = 1, checkpoint = TRUE)

Arguments

part

This argument is the difference between run_pathwayanal.R and run_pathwayanal_partial.R. It should be a single integer between 1 and pathways_per_job; the function then runs only that element of the pathways list AFTER the list has already been cut by pathways_per_job. This is useful when you specified too large a pathways_per_job and need smaller jobs, but do not want to disturb the naming conventions of already finished jobs. The output will be named [out_name_root]_S[Snum]_[aID]_part[part].txt.
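As an illustration of the naming pattern above, here is a minimal sketch that builds the output file name. The helper make_out_name() is hypothetical and not part of the package; it only reproduces the documented pattern, with and without the part suffix.

```r
# Hypothetical helper (not part of the package): reproduces the documented
# output naming pattern [out_name_root]_S[Snum]_[aID](_part[part]).txt
make_out_name <- function(out_name_root = "pathway_anal", Snum = 1,
                          aID = 1, part = NULL) {
  base <- paste0(out_name_root, "_S", Snum, "_", aID)
  # Append the part suffix only when a part is requested
  if (!is.null(part)) {
    base <- paste0(base, "_part", part)
  }
  paste0(base, ".txt")
}

full_job_name <- make_out_name(Snum = 2, aID = 7)
part_job_name <- make_out_name(Snum = 2, aID = 7, part = 3)
```

Because the part suffix is added after the [Snum]_[aID] stem, output from a partial rerun should not collide with files from jobs that already finished.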

pathways_tab

A data.frame of pathways defined by genes in the pathway. First column name should be 'Pathway_name', second should be 'Pathway_description', all others should be 'Gene1', 'Gene2', etc. Use NA to fill blanks.
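A small sketch of a table in this layout may help. The pathway and gene names below are made up for illustration, and the way the genes are extracted is only one possible approach, not necessarily what the package does internally.

```r
# Made-up example data in the documented layout; NA pads shorter pathways
pathways_tab <- data.frame(
  Pathway_name        = c("PATHWAY_A", "PATHWAY_B"),
  Pathway_description = c("Made-up pathway A", "Made-up pathway B"),
  Gene1 = c("GENE1", "GENE4"),
  Gene2 = c("GENE2", "GENE5"),
  Gene3 = c("GENE3", NA),
  stringsAsFactors = FALSE
)

# Pull the genes of the second pathway, dropping the NA padding
gene_cols <- grep("^Gene[0-9]+$", names(pathways_tab))
genes_b <- unlist(pathways_tab[2, gene_cols], use.names = FALSE)
genes_b <- genes_b[!is.na(genes_b)]
```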

pathways_tab_fname

The name of a file formatted in the manner described by pathways_tab. You only need to specify either pathways_tab or pathways_tab_fname.

gene_tab

A data.frame which defines the location of each gene in the genome. Should have column headings including at least 'Gene', 'CHR', 'Start', 'End'.

gene_tab_fname

The name of a file formatted in the manner described by gene_tab. You only need to specify either gene_tab or gene_tab_fname.

SS_file

A data.frame holding all the summary statistics. Should have column headings including at least 'RS' and 'P-value'.

SS_fname_root

The root name of a file formatted in the manner described by SS_file. If you use this option, it is assumed that you have separated your summary statistics by chromosome into files named [SS_fname_root][1].txt, [SS_fname_root][2].txt, etc. You only need to specify either SS_file or SS_fname_root.
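The following sketch shows both forms of input under stated assumptions: the root name "my_sumstats_chr", the use of chromosomes 1 to 22, and the SNP IDs and p-values are all made up. Note that 'P-value' is not a syntactic R name, so check.names = FALSE is needed to keep the heading intact.

```r
# Made-up summary statistics with the two required column headings.
# check.names = FALSE stops R from renaming 'P-value' to 'P.value'.
SS_file <- data.frame(
  RS        = c("rsA", "rsB", "rsC"),
  `P-value` = c(0.20, 0.004, 0.71),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Per-chromosome file names implied by the documented pattern.
# The root and the range 1:22 are assumptions for illustration.
SS_fname_root <- "my_sumstats_chr"
ss_files <- paste0(SS_fname_root, 1:22, ".txt")
```

When reading such files back with read.table(), passing check.names = FALSE likewise preserves the 'P-value' heading.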

evecs_tab

Data.frame of eigenvectors for correlation estimation. Should have column headings 'Subject', 'EV1', 'EV2', and so on.

evecs_tab_fname

The name of a file formatted in the manner described by evecs_tab. You only need to specify either evecs_tab or evecs_tab_fname.

num_PCs_use

Number of PCs to use.
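As a rough sketch of how evecs_tab and num_PCs_use relate, the code below builds a made-up eigenvector table and selects columns EV1 through EV[num_PCs_use]. Taking the first num_PCs_use columns is an assumption here; the package may select them differently.

```r
set.seed(1)

# Made-up eigenvector table: 4 subjects, 6 eigenvectors
ev_mat <- matrix(rnorm(4 * 6), nrow = 4,
                 dimnames = list(NULL, paste0("EV", 1:6)))
evecs_tab <- data.frame(Subject = paste0("S", 1:4), ev_mat,
                        stringsAsFactors = FALSE)

# Assumption: the first num_PCs_use eigenvector columns are the ones used
num_PCs_use <- 5
pc_cols <- paste0("EV", seq_len(num_PCs_use))
pcs <- as.matrix(evecs_tab[, pc_cols])
```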

pathways_per_job

How many pathways to test in one call of the run_pathwayanal() function. Will only be used if you also specify aID to determine which part of pathways_tab to use.
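To make the interaction of pathways_per_job, aID and part concrete, here is a minimal sketch. The helper job_rows() is hypothetical, and the mapping it uses (job aID covers consecutive rows (aID - 1) * pathways_per_job + 1 through aID * pathways_per_job) is an assumption, not a confirmed description of the package internals.

```r
# Hypothetical helper: which rows of pathways_tab a job would cover,
# assuming consecutive blocks of pathways_per_job rows per aID.
job_rows <- function(n_pathways, pathways_per_job = 10, aID = 1,
                     part = NULL) {
  start <- (aID - 1) * pathways_per_job + 1
  if (start > n_pathways) {
    return(integer(0))  # this job index lies past the end of the table
  }
  end <- min(aID * pathways_per_job, n_pathways)
  rows <- seq(start, end)
  if (!is.null(part)) {
    # part indexes into the block AFTER the cut by pathways_per_job
    stopifnot(part >= 1, part <= length(rows))
    rows <- rows[part]
  }
  rows
}

rows_job2      <- job_rows(35, pathways_per_job = 10, aID = 2)
row_job2_part3 <- job_rows(35, pathways_per_job = 10, aID = 2, part = 3)
rows_last_job  <- job_rows(35, pathways_per_job = 10, aID = 4)
```

Under this assumed mapping, 35 pathways split into four jobs, the last of which holds only five pathways.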

gene_buffer

A buffer region added to the Start and End of each gene region to capture, for example, possible cis-eQTL effects.
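The buffer can be pictured with a short sketch. The gene and SNP data are made up, the buffer is assumed to be in base pairs, and clamping the start at 0 is a choice made for this example only.

```r
# Made-up gene table with the documented minimum columns
gene_tab <- data.frame(
  Gene  = c("GENE1", "GENE2"),
  CHR   = c(1, 2),
  Start = c(10000, 3000),
  End   = c(20000, 9000),
  stringsAsFactors = FALSE
)

gene_buffer <- 5000

# Widen each gene region on both sides (start clamped at 0 by assumption)
buffered <- gene_tab
buffered$Start <- pmax(gene_tab$Start - gene_buffer, 0)
buffered$End   <- gene_tab$End + gene_buffer

# Made-up SNP positions; keep those inside the buffered GENE1 region
snps <- data.frame(RS = c("rsA", "rsB", "rsC"), CHR = c(1, 1, 2),
                   BP = c(6000, 30000, 12000), stringsAsFactors = FALSE)
in_gene1 <- snps$CHR == buffered$CHR[1] &
  snps$BP >= buffered$Start[1] & snps$BP <= buffered$End[1]
gene1_snps <- snps$RS[in_gene1]
```

Here the SNP at position 6000 lies outside the original GENE1 region (10000 to 20000) but inside the buffered one (5000 to 25000), which is the kind of nearby variant the buffer is meant to capture.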

threshold_1000G

The minimum MAF needed for a reference panel SNP before we trust it to be used in covariance estimation.
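A minimal sketch of such a filter follows. The SNP IDs and MAF values are made up, and treating the threshold as inclusive (MAF >= threshold_1000G) is an assumption.

```r
# Made-up reference panel allele frequencies
ref_maf <- data.frame(RS  = c("rsA", "rsB", "rsC"),
                      MAF = c(0.01, 0.03, 0.25),
                      stringsAsFactors = FALSE)

threshold_1000G <- 0.03

# Keep only SNPs common enough for covariance estimation
# (inclusive comparison is an assumption)
trusted <- ref_maf[ref_maf$MAF >= threshold_1000G, ]
```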

prune_factor

If the pathway has more than snp_limit SNPs, then multiply the current pruning level by this factor and rerun.

prune_limit

If the pruning factor is less than this amount, stop pruning and move on.

snp_limit

If the pathway has more than this many SNPs, rerun the function to prune more aggressively before testing. The recommended value is 1000; do not set it above 2000 or numerical stability will suffer greatly.

hard_snp_limit

If the pathway still has more than snp_limit SNPs after pruning has reached prune_limit, the function can slightly raise the threshold and see if the calculation is stable enough.

prune_to_start

If true, begin pruning at prune_factor; otherwise, do not prune on the first run.
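The pruning arguments can be read together as a schedule of pruning levels. The sketch below lists the levels that would be tried, under two assumptions: a level of 1 stands for "no pruning", and pruning stops once the next level would fall below prune_limit. The helper pruning_levels() is hypothetical and does not perform any pruning itself.

```r
# Hypothetical helper: lists the pruning levels that would be tried in turn
pruning_levels <- function(prune_factor = 0.5, prune_limit = 0.0625,
                           prune_to_start = TRUE) {
  stopifnot(prune_factor > 0, prune_factor < 1, prune_limit > 0)
  # Assumption: level 1 means no pruning on the first run
  level <- if (prune_to_start) prune_factor else 1
  levels <- numeric(0)
  while (level >= prune_limit) {
    levels <- c(levels, level)
    level <- level * prune_factor  # prune more aggressively on the rerun
  }
  levels
}

default_levels  <- pruning_levels()
no_start_levels <- pruning_levels(prune_to_start = FALSE)
```

With the default arguments this gives four levels (0.5, 0.25, 0.125, 0.0625). In the function itself, a rerun at the next level happens only while the pathway still has more than snp_limit SNPs.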

run_GHC

Boolean. If true, test with both GBJ and GHC; if false, test with GBJ only.

out_name_root

Root of output filename. If Snum and aID are specified then output name will be [out_name_root]_S[Snum]_[aID].txt.

refsnp_dir

Directory holding reference panel genotypes.

input_dir

Directory holding summary statistics, pathway table, gene table, eigenvectors, PLINK binary.

output_dir

Directory to save output file.

Snum

Used in cluster job submission scripts to organize jobs.

aID

Used in cluster job submission scripts to organize jobs.

checkpoint

Boolean, if true, print out diagnostic messages.


ryanrsun/LungCancerAssoc documentation built on May 24, 2019, 7:26 p.m.