View source: R/MTMCSKAT_workflow.R
MTMCSKAT_workflow | R Documentation |
This function provides a one-liner to run MTMC-SKAT over a scaffold.
MTMCSKAT_workflow(
  phenodata,
  covariates,
  raw_file_path,
  window_size,
  window_shift,
  output_dir,
  pre_allocated_dir,
  job_id,
  desired_sig_figs = 2,
  min_accuracy = 2,
  max_accuracy = 5,
  switch_point = 4,
  plot = TRUE,
  RAM = "AllRAM",
  n_thread = "AllCores",
  missing_cutoff = 0.15,
  top_N = Inf
)
phenodata |
string for filepath for phenotype file, with labeled columns for FID, IID, and trait. |
covariates |
string for filepath for covariate file, with ordered covariate values in each column and no header |
raw_file_path |
string for filepath to .traw file (see [PLINK file format reference](https://www.cog-genomics.org/plink/2.0/formats)) |
window_size |
An integer, indicating the size of each SNP window (in base pairs) |
window_shift |
An integer, indicating the number of base pairs over which each rolling window will slide; in other terms, the distance between the start (or end) positions of adjacent overlapping windows |
output_dir |
string for directory where results will be saved |
pre_allocated_dir |
a directory where pre-allocated SNP window lists are kept |
job_id |
string, identifier to label the job; this identifier will go into output filename |
desired_sig_figs |
Integer, minimum number of significant figures desired for empirical p-values |
min_accuracy |
numeric, threshold x to begin resampling for SNP windows with p-values below 10^-x; default value is 2 |
max_accuracy |
numeric, limit x to end resampling for SNP windows with p-values below 10^-x, usually due to computational cost |
switch_point |
numeric, limit x at which SNP windows with p-values below 10^-x must be tested by parallelizing over null models rather than over SNP windows (may be set by the user when limited RAM prevents production of large null models) |
plot |
if TRUE, produce Manhattan plot of results |
RAM |
Integer, the total amount of RAM available to be used across all threads (in bytes); the default is the string "AllRAM" |
n_thread |
numeric, maximum number of cores over which parallelization is allowed; the default is the string "AllCores" |
missing_cutoff |
A numeric threshold representing the maximum acceptable missing rate; missing rate is defined for each SNP as the proportion of genotypes missing data for the given SNP. Imputation to the mean is performed, either by 'pre_allocate' or by 'SKAT' itself, for all remaining missing values |
top_N |
Integer representing the number of top associations on which the user wishes to perform resampling. For example, if this value is set to 5, any SNPs that do not produce p-values among the lowest 5 will not be included in outputs from this function. |
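The relationship between window_size and window_shift described above can be sketched in a few lines of R. The helper rolling_windows() is hypothetical and not part of the package; it assumes windows begin at the first position of the region and that the end coordinate is inclusive, which may differ from the package's own convention.

```r
# Hypothetical helper (not part of the package): list the rolling windows
# implied by window_size and window_shift over a range of base-pair positions.
# Assumption: windows start at first_pos and end coordinates are inclusive.
rolling_windows <- function(first_pos, last_pos, window_size, window_shift) {
  starts <- seq(from = first_pos, to = last_pos, by = window_shift)
  data.frame(start = starts, end = starts + window_size - 1)
}

# Region taken from the example file name (Chr10, 14460 kb to 14550 kb)
windows <- rolling_windows(14460000, 14550000,
                           window_size = 3000, window_shift = 1000)
head(windows, 3)

# Adjacent windows overlap by window_size - window_shift base pairs
overlap <- 3000 - 1000
```

With a shift smaller than the window size, each SNP falls into several windows, so a smaller window_shift increases the number of tests for a fixed region.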
For whole-genome analysis, this function (accessible from the command line) is meant to be parallelized over jobs submitted to a batch queue system on a high-performance cluster, or looped over scaffolds if running on a single machine.
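A minimal sketch of the single-machine case, looping over scaffolds, might look like the following. All file paths and job identifiers here are hypothetical placeholders, and the sketch assumes one .traw file per scaffold and that MTMCSKAT_workflow is exported by the SKATMCMT package named in the examples; the call is only attempted when the package and the input files are actually present.

```r
# Hypothetical per-scaffold .traw files; replace with your own paths.
traw_files <- c(Chr01 = "geno/Chr01.traw",
                Chr02 = "geno/Chr02.traw")

# Build one argument list per scaffold, giving each job a distinct job_id
# so that output files do not collide.
job_args <- lapply(names(traw_files), function(scaffold) {
  list(phenodata = "pheno/trait.pheno",        # hypothetical path
       covariates = "covariates/PCs.txt",      # hypothetical path
       raw_file_path = traw_files[[scaffold]],
       window_size = 3000,
       window_shift = 1000,
       output_dir = "Results/",
       pre_allocated_dir = "pre_allocated_dir/",
       job_id = paste0("my_analysis_", scaffold))
})

# Run the scaffolds one after another, only if the package is installed
# and the (hypothetical) input files exist.
if (requireNamespace("SKATMCMT", quietly = TRUE) &&
    all(file.exists(traw_files))) {
  for (job in job_args) {
    do.call(SKATMCMT::MTMCSKAT_workflow, job)
  }
}
```

On a cluster, the same argument lists could instead be written out one per job and submitted through the scheduler, with each job calling the function once.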
None; outputs are saved to the user-specified output directory, labeled with the user-specified 'job_id'.
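Because min_accuracy and max_accuracy are given as exponents x in 10^-x, it can help to see why smaller p-values demand more resampling. The sketch below uses a generic Monte Carlo p-value estimator for illustration only; it is not taken from the package, and the functions empirical_p() and smallest_p() and the chi-squared stand-in null distribution are assumptions made for this example.

```r
# Generic Monte Carlo p-value, shown for illustration only
# (not the package's implementation).
empirical_p <- function(observed, null_stats) {
  (1 + sum(null_stats >= observed)) / (1 + length(null_stats))
}

set.seed(1)
null_stats <- rchisq(10^4, df = 1)   # stand-in for resampled test statistics
p <- empirical_p(observed = 12, null_stats)
p

# The smallest p-value attainable with B resamples is 1 / (B + 1), so
# resolving p-values near 10^-x needs on the order of 10^x resamples.
smallest_p <- function(B) 1 / (B + 1)
smallest_p(10^c(2, 5))
```

Under this generic scheme, each additional unit of max_accuracy multiplies the required number of resamples by roughly ten, which is consistent with the computational cost noted in the argument description.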
## Not run:
phenodata <- system.file("extdata",
                         "TDZ_shoot_area.plink.pheno",
                         package = "SKATMCMT")
covariates <- system.file("extdata",
                          "poplar_PCs_covariates.tbt",
                          package = "SKATMCMT")
raw_file_path <- system.file("extdata",
                             "poplar_SNPs_Chr10_14460to14550kb.traw",
                             package = "SKATMCMT")
MTMCSKAT_workflow(phenodata = phenodata,
                  covariates = covariates,
                  raw_file_path = raw_file_path,
                  window_size = 3000,
                  window_shift = 1000,
                  output_dir = "Results/",
                  pre_allocated_dir = "pre_allocated_dir/",
                  n_thread = "AllCores",
                  job_id = "my_sample_analysis",
                  desired_sig_figs = 2,
                  min_accuracy = 2,
                  max_accuracy = 5,
                  plot = TRUE)
## End(Not run)