MTMCSKAT_workflow_pre_allocate_only: Run an automated MTMC-SKAT job

View source: R/MTMCSKAT_workflow_pre_allocate_only.R


Run an automated MTMC-SKAT job

Description

This function provides a one-liner to run MTMC-SKAT over a scaffold.

Usage

MTMCSKAT_workflow_pre_allocate_only(
  phenodata,
  covariates,
  raw_file_path,
  window_size,
  window_shift,
  output_dir,
  pre_allocated_dir,
  job_id,
  desired_sig_figs = 2,
  min_accuracy = 2,
  max_accuracy = 5,
  switch_point = 4,
  plot = TRUE,
  RAM = "AllRAM",
  n_thread = "AllCores",
  missing_cutoff = 0.15,
  top_N = Inf
)

Arguments

phenodata

string, file path to the phenotype file, with labeled columns for FID, IID, and the trait.

covariates

string, file path to the covariate file, which has no header and contains ordered covariate values in each column

raw_file_path

string, file path to the .traw file (see [PLINK file format reference](https://www.cog-genomics.org/plink/2.0/formats))

window_size

An integer, indicating the size of each SNP window (in base pairs)

window_shift

An integer, indicating the number of base pairs by which each rolling window slides; in other words, the distance between the start (or end) positions of adjacent overlapping windows

output_dir

string, directory where results will be saved

pre_allocated_dir

a directory where pre-allocated SNP window lists are kept

job_id

string, identifier used to label the job; this identifier is included in output filenames

desired_sig_figs

Integer, minimum number of significant figures desired for empirical p-values

min_accuracy

numeric, threshold x to begin resampling for SNP windows with p-values below 10^-x; default value is 2

max_accuracy

numeric, limit x at which resampling stops for SNP windows with p-values below 10^-x; usually set to bound computational cost

switch_point

numeric, limit x at which SNP windows with p-values below 10^-x must be tested by parallelizing over null models rather than over SNP windows (may be set by the user when limited RAM prevents production of large null models)

plot

if TRUE, produce Manhattan plot of results

RAM

Integer, the total amount of RAM (in bytes) available to all threads combined, or "AllRAM" (the default)

n_thread

numeric, maximum number of cores over which parallelization is allowed, or "AllCores" (the default)

missing_cutoff

A numeric threshold representing the maximum tolerated missing rate, where the missing rate of each SNP is the proportion of genotypes with missing data at that SNP; SNPs above this threshold are excluded. Imputation to the mean is performed, either by 'pre_allocate' or by 'SKAT' itself, for all remaining missing values.

top_N

Integer, the number of top associations on which the user wishes to perform resampling. For example, if this value is set to 5, SNP windows whose p-values are not among the 5 lowest will not be included in the outputs of this function.
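For the phenodata, covariates, and raw_file_path arguments, the sketch below shows one way to read and sanity-check the three input files in base R. It assumes the phenotype and covariate files are whitespace-delimited with covariate rows in the same sample order as the phenotype rows, and that the .traw file follows the PLINK layout of six annotation columns followed by one allele-count column per sample. The helper names are hypothetical and not part of the package.

```r
# Hypothetical helpers (not part of the package) for inspecting the inputs.

# Phenotype file: header row with FID, IID and a trait column.
# Covariate file: no header, one covariate per column.
read_pheno_and_covariates <- function(pheno_path, covariate_path) {
  pheno <- read.table(pheno_path, header = TRUE, stringsAsFactors = FALSE)
  covars <- as.matrix(read.table(covariate_path, header = FALSE))
  stopifnot(all(c("FID", "IID") %in% colnames(pheno)))
  # Assumption: one covariate row per phenotype row, in the same sample order
  if (nrow(covars) != nrow(pheno)) {
    stop("covariate and phenotype files have different numbers of rows")
  }
  list(pheno = pheno, covariates = covars)
}

# .traw file: six annotation columns (CHR, SNP, (C)M, POS, COUNTED, ALT)
# followed by one column of 0/1/2 allele counts per sample; missing = NA.
read_traw_sketch <- function(path) {
  traw <- read.table(path, header = TRUE, sep = "\t", check.names = FALSE,
                     stringsAsFactors = FALSE)
  # Drop the annotation columns, keeping a SNP-by-sample dosage matrix
  genotypes <- as.matrix(traw[, -(1:6), drop = FALSE])
  rownames(genotypes) <- traw$SNP
  list(pos = traw$POS, genotypes = genotypes)
}
```

check.names = FALSE keeps the "(C)M" and FID_IID column names exactly as written by PLINK.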
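To illustrate how window_size and window_shift interact, the following sketch enumerates overlapping windows over a set of SNP positions and counts the SNPs in each. It assumes windows start at the first SNP position and include both endpoints; the package may anchor windows differently, and make_windows is a hypothetical name.

```r
# Hypothetical illustration (not the package's internal code): windows of
# width `window_size` whose start positions are `window_shift` bp apart.
make_windows <- function(snp_pos, window_size, window_shift) {
  starts <- seq(from = min(snp_pos), to = max(snp_pos), by = window_shift)
  windows <- data.frame(start = starts, end = starts + window_size - 1)
  # Count the SNPs falling inside each (inclusive) window
  windows$n_snps <- vapply(seq_len(nrow(windows)), function(i) {
    sum(snp_pos >= windows$start[i] & snp_pos <= windows$end[i])
  }, integer(1))
  # Keep only windows that contain at least one SNP
  windows[windows$n_snps > 0, ]
}
```

With window_size = 3000 and window_shift = 1000, as in the Examples section, a SNP away from the scaffold ends falls into three overlapping windows.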
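The accuracy arguments (desired_sig_figs, min_accuracy, max_accuracy, switch_point) can be read as a resampling plan per SNP window. The sketch below encodes one plausible interpretation, assuming roughly 10^(x + desired_sig_figs) null resamples are needed to resolve a p-value near 10^-x to the requested number of significant figures; this rule and the helper plan_resampling are assumptions for illustration, not the package's documented logic.

```r
# Hypothetical helper (not part of the package). Assumes ~10^(x + sig figs)
# resamples resolve a p-value of order 10^-x to `desired_sig_figs` digits.
plan_resampling <- function(p, desired_sig_figs = 2, min_accuracy = 2,
                            max_accuracy = 5, switch_point = 4) {
  # Windows with p >= 10^-min_accuracy are not resampled
  if (p >= 10^-min_accuracy) {
    return(list(resample = FALSE, n_resamples = 0, strategy = NA_character_))
  }
  # Order of magnitude of p, capped at max_accuracy to bound compute cost
  x <- min(floor(-log10(p)), max_accuracy)
  # At or beyond switch_point, parallelize over null models instead
  strategy <- if (x >= switch_point) "over null models" else "over SNP windows"
  list(resample = TRUE, n_resamples = 10^(x + desired_sig_figs),
       strategy = strategy)
}
```

Under these defaults, a window with p = 0.004 would get about 10^4 resamples parallelized over SNP windows, while one with p = 3e-5 would switch to parallelizing over null models.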
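For the n_thread and RAM arguments, the sketch below resolves "AllCores" to a concrete core count with parallel::detectCores() and splits a total RAM budget evenly across threads. The even split is an assumption based on RAM being described as a total for all threads; resolve_n_thread and ram_per_thread are hypothetical names, and detecting "AllRAM" is not attempted because base R has no portable free-memory query.

```r
# Hypothetical helpers (not part of the package) illustrating n_thread and RAM.
resolve_n_thread <- function(n_thread) {
  if (identical(n_thread, "AllCores")) {
    n <- parallel::detectCores()
    # detectCores() may return NA on some platforms; fall back to one core
    if (is.na(n)) 1L else as.integer(n)
  } else {
    as.integer(n_thread)
  }
}

# Assumption: the total RAM (bytes) is shared equally among threads
ram_per_thread <- function(RAM, n_thread) {
  RAM / resolve_n_thread(n_thread)
}
```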
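The missing_cutoff filter followed by mean imputation can be sketched on a SNP-by-sample matrix as below. It assumes SNPs whose missing rate equals the cutoff are kept; filter_and_impute is a hypothetical helper, not the code used by 'pre_allocate' or 'SKAT'.

```r
# Minimal sketch (not the package's code): drop SNPs whose missing rate
# exceeds `missing_cutoff`, then replace remaining NAs with the SNP's mean.
# Rows are SNPs and columns are samples, as in a .traw file.
filter_and_impute <- function(genotypes, missing_cutoff = 0.15) {
  missing_rate <- rowMeans(is.na(genotypes))
  kept <- genotypes[missing_rate <= missing_cutoff, , drop = FALSE]
  for (i in seq_len(nrow(kept))) {
    na_idx <- is.na(kept[i, ])
    # Mean dosage over the non-missing samples for this SNP
    kept[i, na_idx] <- mean(kept[i, ], na.rm = TRUE)
  }
  kept
}
```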
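The effect of top_N on a results table can be pictured with the following sketch, which keeps the top_N windows with the smallest p-values and keeps everything under the default top_N = Inf. The column name p_value and the helper select_top_n are hypothetical; the package's output columns may differ.

```r
# Hypothetical illustration: keep the top_N rows with the smallest p-values.
select_top_n <- function(results, top_N = Inf) {
  ordered <- results[order(results$p_value), , drop = FALSE]
  # min() turns the default Inf into "all rows"
  ordered[seq_len(min(top_N, nrow(ordered))), , drop = FALSE]
}
```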

Details

For whole-genome analysis, this function (accessible from the command line) is intended to be parallelized over jobs submitted to a batch queueing system on a high-performance cluster, or looped over scaffolds when running on a single machine.
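On a single machine, the loop over scaffolds might look like the sketch below. The .traw file names, the jobs table, and run_all_scaffolds are hypothetical placeholders; the remaining arguments (phenodata, covariates, window_size, and so on) are forwarded through `...`. Defining the function does not run any analysis.

```r
# Hypothetical sketch: one job per scaffold-level .traw file (placeholder names)
traw_files <- c("scaffold_1.traw", "scaffold_2.traw")
jobs <- data.frame(raw_file_path = traw_files,
                   # Derive a job_id from each file name, e.g. "scaffold_1"
                   job_id = sub("\\.traw$", "", basename(traw_files)),
                   stringsAsFactors = FALSE)

# Run the scaffolds sequentially; all other arguments are passed via ...
run_all_scaffolds <- function(jobs, ...) {
  for (i in seq_len(nrow(jobs))) {
    MTMCSKAT_workflow_pre_allocate_only(raw_file_path = jobs$raw_file_path[i],
                                        job_id = jobs$job_id[i], ...)
  }
}
```

For example, run_all_scaffolds(jobs, phenodata = phenodata, covariates = covariates, window_size = 3000, window_shift = 1000, output_dir = "Results/", pre_allocated_dir = "pre_allocated_dir/") would process each scaffold in turn. On a cluster, each scheduler job would instead make a single call with its own raw_file_path and job_id.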

Value

None; outputs are saved to the user-specified output directory (output_dir), with filenames labeled by the user-specified job_id.

Examples

## Not run: 
phenodata <- system.file("extdata",
                         "TDZ_shoot_area.plink.pheno",
                         package = "SKATMCMT")

covariates <- system.file("extdata",
                          "poplar_PCs_covariates.tbt",
                          package = "SKATMCMT")

raw_file_path <- system.file("extdata",
                             "poplar_SNPs_Chr10_14460to14550kb.traw",
                             package = "SKATMCMT")

MTMCSKAT_workflow_pre_allocate_only(phenodata = phenodata,
                                    covariates = covariates,
                                    raw_file_path = raw_file_path,
                                    window_size = 3000,
                                    window_shift = 1000,
                                    output_dir = "Results/",
                                    pre_allocated_dir = "pre_allocated_dir/",
                                    n_thread = "AllCores",
                                    job_id = "my_sample_analysis",
                                    desired_sig_figs = 2,
                                    min_accuracy = 2,
                                    max_accuracy = 5,
                                    plot = TRUE)

## End(Not run)

naglemi/mtmcskat documentation built on Aug. 23, 2023, 5:35 p.m.