mtmcskat_NullModels: Run multi-threaded Monte Carlo Sequence Kernel Association...

mtmcskat_NullModels R Documentation

Run multi-threaded Monte Carlo Sequence Kernel Association Test over a pre-allocated set of SNP windows

Description

These functions provide a means of running SKAT over a set of pre-allocated SNP windows (see pre_allocate) while performing resampling with a user-specified number of permutations. Multithreading is implemented over either SNP windows or null models, depending on the function called.

Usage

mtmcskat_NullModels(
  n_thread,
  n_permutations,
  max_permutations_per_job,
  this_phenotype,
  covariates,
  pre_allocated_SNP_windows,
  scaffold_ID,
  missing_cutoff = 0.15
)

mtmcskat_SNPs(
  pre_allocated_SNP_windows,
  n_permutations,
  this_phenotype,
  covariates,
  scaffold_ID,
  n_thread,
  missing_cutoff = 0.15,
  ...
)

Arguments

n_thread

An integer indicating the number of threads to be used for multithreading

n_permutations

Integer indicating the number of permutations used to calculate empirical p-values

max_permutations_per_job

Integer indicating the maximum number of permutations that may fit into a 'SKAT_NULL_Model' object generated for any given thread

this_phenotype

Phenotype data that has been read into R from the standard PLINK phenotype file format (PLINK documentation)

covariates

Covariate data that has been read into R from a comma-delimited text file with a header and one row for each genotype, in the same order as the phenotype file

pre_allocated_SNP_windows

A list of SNP windows along with their positions and scaffolds, prepared using pre_allocate

scaffold_ID

Integer indicating the chromosome or scaffold of interest

missing_cutoff

A numeric threshold representing the maximum tolerated missing rate; the missing rate is defined for each SNP as the proportion of genotypes missing data for the given SNP. Imputation to the mean is performed, either by 'pre_allocate' or 'SKAT' itself, for all remaining missing values

...

Additional parameters passed on to SKAT

Details

The choice of the appropriate function (either 'mtmcskat_SNPs' or 'mtmcskat_NullModels') depends on whether the user wishes to multithread over SNP windows or over null models, which in turn depends on several factors. First, when multithreading over SNP windows, the number of SNP windows should not be less than the number of threads available, because efficient computation requires that each thread receive at least one SNP window. Second, available RAM limits the number of permutations that can be accommodated simultaneously in null models.

Multithreading over null models is recommended when the number of permutations that must be tested cannot fit into the RAM available to each thread (available RAM divided by available threads). Multithreading over SNP windows is only recommended for situations involving relatively few permutations and a large number of SNP windows. In the standard MTMCSKAT workflow, for example, this applies when calculating empirical p-values for large groups of SNP windows whose initial mtskat p-values are not very low.
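
The guidance above can be condensed into a rough decision rule. The sketch below is only illustrative: `choose_mtmcskat_function` is a hypothetical helper, not part of the package, and it assumes that multithreading over SNP windows requires every permutation to fit into a single null model per thread (that is, `n_permutations <= max_permutations_per_job`).

```r
# Hypothetical helper (NOT part of mtmcskat): suggest which function to call.
# Assumption: when multithreading over SNP windows, each thread holds one
# null model containing all permutations, so they must fit in one job.
choose_mtmcskat_function <- function(n_windows,
                                     n_permutations,
                                     n_thread,
                                     max_permutations_per_job) {
  stopifnot(n_windows >= 1, n_permutations >= 1,
            n_thread >= 1, max_permutations_per_job >= 1)

  # Each thread should receive at least one SNP window
  enough_windows <- n_windows >= n_thread

  # All permutations must fit into the null model held by a thread
  permutations_fit <- n_permutations <= max_permutations_per_job

  if (enough_windows && permutations_fit) {
    "mtmcskat_SNPs"
  } else {
    "mtmcskat_NullModels"
  }
}

# Many windows, few permutations
many_windows <- choose_mtmcskat_function(
  n_windows = 5000, n_permutations = 500,
  n_thread = 8, max_permutations_per_job = 10000)

# Few windows, many permutations
few_windows <- choose_mtmcskat_function(
  n_windows = 3, n_permutations = 1e6,
  n_thread = 8, max_permutations_per_job = 10000)
```

Under these assumptions the first call suggests 'mtmcskat_SNPs' and the second suggests 'mtmcskat_NullModels'. Real workloads may warrant a different threshold, since the memory footprint of a null model also depends on sample size.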

Value

A dataframe with four columns, for 1) scaffold ID, 2) SNP window position, 3) p-values from the model used in SKAT without resampling, and 4) empirical p-values
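
To illustrate what an empirical p-value from resampling represents, the sketch below uses one common convention: the proportion of permuted p-values at least as small as the observed one, with a +1 correction. This is an assumption for illustration and may not match the exact calculation used inside the package; the simulated `permuted_p` values are stand-ins, not package output.

```r
# Illustrative only: one common way to compute an empirical p-value.
# The +1 correction is an assumed convention, not confirmed for mtmcskat.
set.seed(1)

observed_p <- 0.003        # hypothetical p-value from SKAT without resampling
permuted_p <- runif(500)   # stand-in for p-values from 500 permuted phenotypes

# Proportion of permutations at least as extreme as the observed result
empirical_p <- (sum(permuted_p <= observed_p) + 1) / (length(permuted_p) + 1)
empirical_p
```

With this convention the smallest attainable empirical p-value is 1 / (n_permutations + 1), which is why windows with very low initial p-values need many more permutations.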

Examples


data("small_phenodata")
data("small_covariates")
data("small_pre_allocated_windows")

# Multithreading over SNP windows
mtmcskat_SNPs(
  this_phenotype = small_phenodata,
  covariates = small_covariates,
  n_permutations = 500,
  pre_allocated_SNP_windows = small_pre_allocated_windows[2:4],
  scaffold_ID = small_pre_allocated_windows[[1]][[3]],
  n_thread = 2)

# Multithreading over null models, where all necessary permutations can
#  simultaneously fit into memory and computation can be completed in a
#  single "batch."
mtmcskat_NullModels(
  this_phenotype = small_phenodata,
  covariates = small_covariates,
  n_permutations = 500,
  n_thread = 2,
  max_permutations_per_job = 251,
  pre_allocated_SNP_windows = small_pre_allocated_windows[2:4],
  scaffold_ID = small_pre_allocated_windows[[1]][[3]])


# Multithreading over null models, where the number of permutations
#   is greater than that which can fit into memory (as indicated by the
#   user or upstream functions through the `max_permutations_per_job`
#   argument), thus requiring multiple sequential "batches" of computation.
mtmcskat_NullModels(
  this_phenotype = small_phenodata,
  covariates = small_covariates,
  n_permutations = 500,
  n_thread = 2,
  max_permutations_per_job = 249,
  pre_allocated_SNP_windows = small_pre_allocated_windows[2:4],
  scaffold_ID = small_pre_allocated_windows[[1]][[3]])
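
The difference between the single-batch and multi-batch examples above comes down to simple arithmetic. The sketch below assumes that each batch runs one null model per thread, each holding at most `max_permutations_per_job` permutations; `n_batches` is a hypothetical helper, not a package function, and the package's internal scheduling may differ.

```r
# Hypothetical helper (NOT part of mtmcskat): estimate the number of
# sequential batches, assuming each batch runs `n_thread` null models of
# at most `max_permutations_per_job` permutations each.
n_batches <- function(n_permutations, n_thread, max_permutations_per_job) {
  ceiling(n_permutations / (n_thread * max_permutations_per_job))
}

# 2 threads x 251 permutations = 502 >= 500, so one batch suffices
single_batch <- n_batches(500, n_thread = 2, max_permutations_per_job = 251)

# 2 threads x 249 permutations = 498 < 500, so a second batch is needed
two_batches <- n_batches(500, n_thread = 2, max_permutations_per_job = 249)
```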


naglemi/mtmcskat documentation built on Aug. 23, 2023, 5:35 p.m.