run_pathwayanal_part: High-level function to control pathway analysis.
In ryanrsun/LungCancerAssoc: Pathway Analysis Using Summary Statistics

High-level function to control pathway analysis.

run_pathwayanal_part(part = NULL, pathways_tab = NULL,
  pathways_tab_fname = NULL, gene_tab = NULL, gene_tab_fname = NULL,
  SS_file = NULL, SS_fname_root = NULL, evecs_tab = NULL,
  evecs_tab_fname = NULL, num_PCs_use = 5, pathways_per_job = 10,
  gene_buffer = 5000, threshold_1000G = 0.03, prune_factor = 0.5,
  prune_limit = 0.0625, snp_limit = 1000, hard_snp_limit = 1500,
  prune_to_start = TRUE, run_GHC = FALSE, out_name_root = "pathway_anal",
  refsnp_dir = NULL, input_dir = NULL, output_dir = NULL, Snum = 1,
  aID = 1, checkpoint = TRUE)
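
As a minimal sketch of the inputs, the tables described under the arguments can be built in memory as shown here. The gene names, rs IDs, p-values, and eigenvector values are made-up placeholders. The final call is left commented out because it needs the LungCancerAssoc package, reference panel genotypes, and real data; the directory paths in it are hypothetical.

```r
# Pathway definitions: one row per pathway, unused gene slots filled with NA.
pathways_tab <- data.frame(
  Pathway_name = c("PATHWAY_A", "PATHWAY_B"),
  Pathway_description = c("Placeholder pathway A", "Placeholder pathway B"),
  Gene1 = c("GENE1", "GENE2"),
  Gene2 = c("GENE3", NA),
  stringsAsFactors = FALSE
)

# Gene locations: needs at least 'Gene', 'CHR', 'Start', 'End'.
gene_tab <- data.frame(
  Gene = c("GENE1", "GENE2", "GENE3"),
  CHR = c(1, 1, 2),
  Start = c(100000, 500000, 250000),
  End = c(120000, 560000, 300000),
  stringsAsFactors = FALSE
)

# Summary statistics: needs at least 'RS' and 'P-value'.
# check.names = FALSE keeps the hyphen in the 'P-value' column name.
SS_file <- data.frame(
  RS = c("rs0000001", "rs0000002", "rs0000003"),
  `P-value` = c(0.01, 0.20, 0.75),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Eigenvectors: 'Subject', then 'EV1', 'EV2', and so on.
evecs_tab <- data.frame(
  Subject = c("S1", "S2", "S3"),
  EV1 = c(0.01, -0.02, 0.03),
  EV2 = c(-0.01, 0.00, 0.02),
  stringsAsFactors = FALSE
)

# Hypothetical call, not run here (paths are placeholders):
# run_pathwayanal_part(part = 1, pathways_tab = pathways_tab,
#   gene_tab = gene_tab, SS_file = SS_file, evecs_tab = evecs_tab,
#   num_PCs_use = 2, refsnp_dir = "path/to/refsnp",
#   output_dir = "path/to/output", Snum = 1, aID = 1)
```

Each table can instead be supplied as a file through the matching `*_fname` argument (or `SS_fname_root` for per-chromosome summary statistics).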

`part`	The argument that distinguishes run_pathwayanal_partial.R from run_pathwayanal.R. It should be a single integer between 1 and pathways_per_job; only that element of the pathways list is run, AFTER the list has already been cut by pathways_per_job. This is useful when pathways_per_job was set too large and you need smaller jobs but do not want to disturb the naming conventions of jobs that have already finished. The output will be named [out_name_root]_S[Snum]_[aID]_part[part].txt.
`pathways_tab`	A data.frame of pathways defined by genes in the pathway. First column name should be 'Pathway_name', second should be 'Pathway_description', all others should be 'Gene1', 'Gene2', etc. Use NA to fill blanks.
`pathways_tab_fname`	The name of a file formatted in the manner described by pathways_tab. You only need to specify either pathways_tab or pathways_tab_fname.
`gene_tab`	A data.frame which defines the location of each gene in the genome. Should have column headings including at least 'Gene', 'CHR', 'Start', 'End'.
`gene_tab_fname`	The name of a file formatted in the manner described by gene_tab. You only need to specify either gene_tab or gene_tab_fname.
`SS_file`	A data.frame holding all the summary statistics. Should have column headings including at least 'RS' and 'P-value'.
`SS_fname_root`	The root name of a file formatted in the manner described by SS_file. If you use this option, it is assumed that you have separated your summary statistics by chromosome into files named [SS_fname_root][1].txt, [SS_fname_root][2].txt, etc. You only need to specify either SS_file or SS_fname_root.
`evecs_tab`	Data.frame of eigenvectors for correlation estimation. Should have column headings 'Subject', 'EV1', 'EV2', and so on.
`evecs_tab_fname`	The name of a file formatted in the manner described by evecs_tab. You only need to specify either evecs_tab or evecs_tab_fname.
`num_PCs_use`	Number of PCs to use.
`pathways_per_job`	How many pathways to test in one call of the run_pathwayanal() function. Only used if you also specify aID, which determines which part of pathways_tab to use.
`gene_buffer`	A buffer region added to the Start and End of each gene region to capture, for example, possible cis-eQTL effects.
`threshold_1000G`	The minimum MAF needed for a reference panel SNP before we trust it to be used in covariance estimation.
`prune_factor`	If the pathway has more than snp_limit SNPs, then multiply the current pruning level by this factor and rerun.
`prune_limit`	If the current pruning level falls below this amount, stop pruning and move on.
`snp_limit`	If the pathway has more than this many SNPs, rerun the function to prune more aggressively before testing. The recommended value is 1000; do not set it above 2000, or numerical stability will suffer greatly.
`hard_snp_limit`	If the pathway still has more than snp_limit SNPs once pruning has reached prune_limit, the SNP threshold can be raised slightly to see whether the calculation is stable enough.
`prune_to_start`	If true, begin pruning at prune_factor, otherwise don't prune on first run.
`run_GHC`	Boolean. If true, test with both GBJ and GHC; if false, test with GBJ only.
`out_name_root`	Root of output filename. If Snum and aID are specified then output name will be [out_name_root]_S[Snum]_[aID].txt.
`refsnp_dir`	Directory holding reference panel genotypes.
`input_dir`	Directory holding summary statistics, pathway table, gene table, eigenvectors, PLINK binary.
`output_dir`	Directory to save output file.
`Snum`	Used in cluster job submission scripts to organize jobs.
`aID`	Used in cluster job submission scripts to organize jobs.
`checkpoint`	Boolean, if true, print out diagnostic messages.
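
The interplay of `aID`, `part`, and `pathways_per_job`, the output naming rule, and the pruning arguments can be sketched with a few small helpers. These are hypothetical illustrations, not functions from the package. They assume that job `aID` covers rows `(aID - 1) * pathways_per_job + 1` through `aID * pathways_per_job` of the pathway table, and that a pruning level of 1 stands for "no pruning" on the first run; neither detail is stated in the argument descriptions.

```r
# Hypothetical helper: which row of the pathway table a (aID, part) pair maps
# to, ASSUMING jobs take consecutive blocks of pathways_per_job rows.
select_pathway_row <- function(n_pathways, aID = 1, part = 1,
                               pathways_per_job = 10) {
  stopifnot(part >= 1, part <= pathways_per_job, aID >= 1)
  first <- (aID - 1) * pathways_per_job + 1
  last <- min(aID * pathways_per_job, n_pathways)
  rows <- if (first <= last) seq(first, last) else integer(0)
  # The last job may hold fewer than pathways_per_job pathways.
  if (part > length(rows)) return(NA_integer_)
  rows[part]
}

# Output name as documented: [out_name_root]_S[Snum]_[aID]_part[part].txt
part_output_name <- function(out_name_root = "pathway_anal", Snum = 1,
                             aID = 1, part = 1) {
  paste0(out_name_root, "_S", Snum, "_", aID, "_part", part, ".txt")
}

# Hypothetical helper: the sequence of pruning levels that would be tried if
# every run still exceeded snp_limit. ASSUMES level 1 means "no pruning".
prune_schedule <- function(prune_factor = 0.5, prune_limit = 0.0625,
                           prune_to_start = TRUE) {
  stopifnot(prune_factor > 0, prune_factor < 1, prune_limit > 0)
  level <- if (prune_to_start) prune_factor else 1
  levels <- numeric(0)
  while (level >= prune_limit) {  # stop once the level drops below the limit
    levels <- c(levels, level)
    level <- level * prune_factor
  }
  levels
}

select_pathway_row(25, aID = 2, part = 3)  # 13
part_output_name(part = 3)                 # "pathway_anal_S1_1_part3.txt"
prune_schedule()                           # 0.5 0.25 0.125 0.0625
```

With the default settings, this sketch allows at most four pruning passes before `hard_snp_limit` comes into play.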