MTMCSKAT_workflow: Run an automated MTMC-SKAT job
In naglemi/mtmcskat: Multi-Threaded Monte Carlo Sequence Kernel Association Test (MTMC-SKAT)

MTMCSKAT_workflow

R Documentation

Run an automated MTMC-SKAT job

Description

This function provides a one-liner to run MTMC-SKAT over a scaffold.

Usage

MTMCSKAT_workflow(
  phenodata,
  covariates,
  raw_file_path,
  window_size,
  window_shift,
  output_dir,
  pre_allocated_dir,
  job_id,
  desired_sig_figs = 2,
  min_accuracy = 2,
  max_accuracy = 5,
  switch_point = 4,
  plot = TRUE,
  RAM = "AllRAM",
  n_thread = "AllCores",
  missing_cutoff = 0.15,
  top_N = Inf
)

Arguments

`phenodata`	string for filepath for phenotype file, with labeled columns for FID, IID, and trait.
`covariates`	string for filepath for covariate file, with ordered covariate values in each column and no header
`raw_file_path`	string for filepath to .traw file (see [PLINK file format reference](https://www.cog-genomics.org/plink/2.0/formats))
`window_size`	An integer, indicating the size of each SNP window (in base pairs)
`window_shift`	An integer, indicating the number of base pairs over which each rolling window will slide; in other terms, the distance between the start (or end) positions of adjacent overlapping windows
`output_dir`	string for directory where results will be saved
`pre_allocated_dir`	a directory where pre-allocated SNP window lists are kept
`job_id`	string, identifier to label the job; this identifier will go into output filename
`desired_sig_figs`	Integer, minimum number of significant figures desired for empirical p-values
`min_accuracy`	numeric, threshold x to begin resampling for SNP windows with p-values below 10^-x; default value is 2
`max_accuracy`	numeric, limit x to end resampling for SNP windows with p-values below 10^-x, usually due to computational cost
`switch_point`	numeric, limit x at which SNP windows with p-values below 10^-x must be tested by parallelizing over null models rather than over SNP windows (may be set by the user when limited RAM prevents production of large null models)
`plot`	if TRUE, produce Manhattan plot of results
`RAM`	Integer, the total amount of RAM (in bytes) available to be used across all threads; the default is "AllRAM"
`n_thread`	numeric, maximum number of cores over which parallelization is allowed; the default is "AllCores"
`missing_cutoff`	A numeric threshold representing the maximum tolerated missing rate; missing rate is defined for each SNP as the proportion of genotypes missing data for the given SNP. Imputation to the mean is performed, either by 'pre_allocate' or by 'SKAT' itself, for all remaining missing values
`top_N`	Integer representing the number of top associations on which the user wishes to perform resampling. For example, if this value is set to 5, any SNPs that do not produce p-values among the lowest 5 will not be included in outputs from this function.
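To make the relationship between `window_size` and `window_shift` concrete, here is a minimal sketch of how rolling window boundaries could be laid out. The helper `rolling_windows` and the example coordinates are hypothetical and not part of the package; the package's own pre-allocation step may treat edge cases, such as the final partial window, differently.

```r
# Hypothetical helper (not part of mtmcskat): lay out rolling windows
# over a region, given a window size and shift in base pairs.
rolling_windows <- function(region_start, region_end, window_size, window_shift) {
  stopifnot(window_size > 0, window_shift > 0, region_end >= region_start)
  # Adjacent windows start window_shift base pairs apart.
  starts <- seq(from = region_start, to = region_end, by = window_shift)
  # Each window spans window_size base pairs, clipped at the region end.
  ends <- pmin(starts + window_size - 1, region_end)
  data.frame(start = starts, end = ends)
}

windows <- rolling_windows(region_start = 14460000,
                           region_end = 14466000,
                           window_size = 3000,
                           window_shift = 1000)
print(head(windows, 3))
```

With a shift smaller than the window size, as here, adjacent windows overlap, so a given SNP can be tested in more than one window.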

Details

For whole-genome analysis, this function (accessible from the command line) is intended to be parallelized over jobs submitted to a batch queuing system on a high-performance cluster, or looped over scaffolds when running on a single machine.
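The single-machine case can be sketched as a simple loop. This sketch assumes one .traw file per scaffold and derives each `job_id` from the file name; the driver `run_over_scaffolds`, the stand-in `dry_run`, and the file paths are all hypothetical and not part of the package.

```r
# Hypothetical driver (not part of mtmcskat): call a workflow function once
# per scaffold-level .traw file, deriving a job_id from each file name.
run_over_scaffolds <- function(traw_files, workflow, ...) {
  job_ids <- tools::file_path_sans_ext(basename(traw_files))
  for (i in seq_along(traw_files)) {
    # Arguments shared by every scaffold are passed through `...`.
    workflow(raw_file_path = traw_files[i], job_id = job_ids[i], ...)
  }
  invisible(job_ids)
}

# Stand-in workflow so the sketch runs without the package installed;
# in practice MTMCSKAT_workflow would be passed here instead.
dry_run <- function(raw_file_path, job_id, ...) {
  message("Would run job '", job_id, "' on ", raw_file_path)
}

job_ids <- run_over_scaffolds(
  traw_files = c("genotypes/Chr01.traw", "genotypes/Chr02.traw"),  # assumed paths
  workflow = dry_run,
  window_size = 3000,
  window_shift = 1000
)
```

Passing the workflow as an argument keeps the loop testable with a dry run before committing to a long computation.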

Value

None; outputs are saved to the user-specified output directory, labeled with the user-specified `job_id`.

Examples

## Not run: 
phenodata <- system.file("extdata",
  "TDZ_shoot_area.plink.pheno",
  package = "SKATMCMT")

covariates <- system.file("extdata",
                          "poplar_PCs_covariates.tbt",
                          package = "SKATMCMT")

raw_file_path <- system.file("extdata",
                             "poplar_SNPs_Chr10_14460to14550kb.traw",
                             package = "SKATMCMT")

MTMCSKAT_workflow(phenodata = phenodata,
                  covariates = covariates,
                  raw_file_path = raw_file_path,
                  window_size = 3000,
                  window_shift = 1000,
                  output_dir = "Results/",
                  pre_allocated_dir = "pre_allocated_dir/",
                  n_thread = "AllCores",
                  job_id = "my_sample_analysis",
                  desired_sig_figs = 2,
                  min_accuracy = 2,
                  max_accuracy = 5,
                  plot = TRUE)

## End(Not run)
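Because `system.file()` returns an empty string when a file or package is not found, it may help to verify the input paths before starting a long job. The following is a minimal sketch; the helper `check_input_files` and the package name `notARealPackage` are hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of mtmcskat): stop early if any input path
# is empty (as system.file() returns when nothing matches) or does not exist.
check_input_files <- function(...) {
  paths <- c(...)
  not_found <- !nzchar(paths) | !file.exists(paths)
  if (any(not_found)) {
    # Report argument names when available, otherwise the paths themselves.
    labels <- if (is.null(names(paths))) paths else names(paths)
    stop("Input file(s) not found: ", paste(labels[not_found], collapse = ", "))
  }
  invisible(TRUE)
}

# A lookup in a package that is not installed yields "", which the check catches.
missing_path <- system.file("extdata", "no_such_file.txt",
                            package = "notARealPackage")
result <- tryCatch(
  check_input_files(phenodata = missing_path),
  error = function(e) conditionMessage(e)
)
print(result)
```

In the example above, the same check could be applied to `phenodata`, `covariates`, and `raw_file_path` before calling `MTMCSKAT_workflow`.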

naglemi/mtmcskat documentation built on Aug. 23, 2023, 5:35 p.m.
