run.gadgets: A function to run the GADGETS algorithm to detect multi-SNP...

View source: R/run.gadgets.R

run.gadgets R Documentation

A function to run the GADGETS algorithm to detect multi-SNP effects in case-parent triad studies.

Description

This function runs the GADGETS algorithm to detect multi-SNP effects in case-parent triad studies.

Usage

run.gadgets(
  data.list,
  n.chromosomes,
  chromosome.size,
  results.dir,
  cluster.type,
  registryargs = list(file.dir = NA, seed = 1500),
  resources = list(),
  cluster.template = NULL,
  n.workers = min(detectCores() - 2, n.islands/island.cluster.size),
  n.chunks = NULL,
  n.different.snps.weight = 2,
  n.both.one.weight = 1,
  weight.function.int = 2,
  generations = 500,
  gen.same.fitness = 50,
  initial.sample.duplicates = FALSE,
  snp.sampling.type = "chisq",
  crossover.prop = 0.8,
  n.islands = 1000,
  island.cluster.size = 4,
  migration.generations = 50,
  n.migrations = 20,
  recessive.ref.prop = 0.75,
  recode.test.stat = 1.64,
  n.random.chroms = 10000,
  null.mean.vec = NULL,
  null.sd.vec = NULL
)

Arguments

data.list

The output list from preprocess.genetic.data.

n.chromosomes

An integer specifying the number of chromosomes to use for each island in GADGETS.

chromosome.size

An integer specifying the number of SNPs in each chromosome.

results.dir

The directory to which island results will be saved.

cluster.type

A character string indicating the type of cluster on which to evolve solutions in parallel. Supported options are interactive, socket, multicore, sge, slurm, lsf, openlava, or torque. See the documentation for package batchtools for more information.

registryargs

A list of the arguments to be provided to batchtools::makeRegistry.

resources

A named list of key-value pairs to be substituted into the template file. Options available are specified in batchtools::submitJobs.

cluster.template

A character string of the path to the template file required for the cluster specified in cluster.type. Defaults to NULL. Required for options sge, slurm, lsf, openlava and torque of argument cluster.type.

n.workers

An integer indicating the number of workers for the cluster specified in cluster.type, if socket or multicore. Defaults to the smaller of parallel::detectCores() - 2 and n.islands/island.cluster.size.

n.chunks

An integer specifying the number of chunks into which the jobs running island clusters should be split when dispatching jobs using batchtools. For multicore or socket cluster.type, this defaults to n.workers, so that the total number of island cluster jobs (equal to n.islands/island.cluster.size) is split into n.chunks chunks. All chunks then run in parallel, with jobs within a chunk running sequentially. For other cluster types, this defaults to 1 chunk, with the recommendation that users of HPC clusters that support array jobs specify chunks.as.arrayjobs = TRUE in argument resources. For those users, the setup will submit an array of n.islands/island.cluster.size jobs to the cluster. For HPC clusters that do not support array jobs, the default setting should not be used. See batchtools::submitJobs for more information on job chunking.
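
As a rough illustration of the chunking arithmetic for a multicore or socket cluster, the base R sketch below counts the island cluster jobs and assigns them to chunks. The value of n.workers and the contiguous assignment are assumptions made for illustration; batchtools may distribute jobs across chunks differently.

```r
# Hypothetical settings mirroring the documented defaults
n.islands <- 1000
island.cluster.size <- 4
n.workers <- 6   # assumed value, e.g. detectCores() - 2 on an 8-core machine

# One job per island cluster
n.jobs <- n.islands / island.cluster.size

# For multicore or socket clusters, n.chunks defaults to n.workers
n.chunks <- n.workers

# Illustrative assignment of jobs to chunks (not necessarily what batchtools does)
chunk.id <- sort(rep_len(seq_len(n.chunks), n.jobs))
jobs.per.chunk <- table(chunk.id)
```

Under these assumed settings, each chunk holds roughly n.jobs/n.chunks jobs, which run one after another while the chunks themselves run in parallel.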

n.different.snps.weight

The number by which the number of different SNPs between a case and complement or unaffected sibling is multiplied in computing the family weights. Defaults to 2.

n.both.one.weight

The number by which the number of SNPs equal to 1 in both the case and complement or unaffected sibling is multiplied in computing the family weights. Defaults to 1.

weight.function.int

An integer used to assign family weights. Specifically, weight.function.int is used in a function that takes as its argument the weighted sum of the number of different SNPs and the number of SNPs both equal to one, denoted x, and returns a family weight equal to weight.function.int^x. Defaults to 2. If set to NULL, the family weight is not exponentiated and is instead set to x.
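
The weighting rule described by n.different.snps.weight, n.both.one.weight, and weight.function.int can be sketched as follows. The helper family.weight and its count arguments are hypothetical names used only for illustration; this is not the package's internal implementation.

```r
# Hypothetical helper, not part of the package
family.weight <- function(n.different, n.both.one,
                          n.different.snps.weight = 2,
                          n.both.one.weight = 1,
                          weight.function.int = 2) {
  # Weighted sum of the two SNP counts, denoted x in the argument description
  x <- n.different.snps.weight * n.different + n.both.one.weight * n.both.one
  # With weight.function.int = NULL the weight is x itself
  if (is.null(weight.function.int)) {
    return(x)
  }
  weight.function.int^x
}

family.weight(n.different = 2, n.both.one = 1)    # x = 2*2 + 1*1 = 5, weight = 2^5
family.weight(2, 1, weight.function.int = NULL)   # weight = x = 5
```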

generations

The maximum number of generations for which GADGETS will run. Defaults to 500.

gen.same.fitness

The number of consecutive generations with the same fitness score required for algorithm termination. Defaults to 50.
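
One way to picture how generations and gen.same.fitness interact is the sketch below, which scans a made-up trajectory of top fitness scores and reports the generation at which a run would stop. The helper stopping.generation is hypothetical, and the exact counting convention used by the package (and its interaction with migrations) is an assumption here.

```r
# Hypothetical helper, not part of the package
stopping.generation <- function(top.fitness, generations = 500,
                                gen.same.fitness = 50) {
  n.same <- 1   # consecutive generations sharing the current top score
  for (g in seq_along(top.fitness)) {
    if (g > 1) {
      n.same <- if (top.fitness[g] == top.fitness[g - 1]) n.same + 1 else 1
    }
    # Stop on a long enough plateau or when the generation cap is reached
    if (n.same >= gen.same.fitness || g >= generations) {
      return(g)
    }
  }
  length(top.fitness)
}

# Made-up trajectory: the score plateaus at 2 from generation 2 onward
stopping.generation(c(1, 2, 2, 2, 2), generations = 10, gen.same.fitness = 3)
```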

initial.sample.duplicates

A logical indicating whether the same SNP can appear in more than one chromosome in the initial sample of chromosomes (the same SNP may appear in more than one chromosome thereafter, regardless). Defaults to FALSE.

snp.sampling.type

A string indicating how SNPs are to be sampled for mutations. Options are 'chisq', 'random', or 'manual'. The 'chisq' option takes into account the marginal association between a SNP and disease status, with larger marginal associations corresponding to higher sampling probabilities. The 'random' option gives each SNP the same sampling probability regardless of marginal association. The 'manual' option should be used when snp.sampling.probs are manually input into function preprocess.genetic.data. Defaults to 'chisq'.
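
The difference between the 'chisq' and 'random' options can be illustrated with base R's sample. The statistics below are made up, and taking sampling probabilities proportional to the marginal statistic is an assumption; the package may transform the statistics differently.

```r
set.seed(1500)

# Made-up marginal association statistics for four SNPs
chisq.stats <- c(snp1 = 0.5, snp2 = 8.0, snp3 = 1.5, snp4 = 0.2)

# 'chisq'-style: larger marginal association, higher sampling probability
# (proportionality is assumed here)
chisq.probs <- chisq.stats / sum(chisq.stats)

# 'random'-style: every SNP has the same sampling probability
random.probs <- rep(1 / length(chisq.stats), length(chisq.stats))

# Draw two distinct SNPs as mutation candidates under the 'chisq'-style weights
sampled <- sample(names(chisq.stats), size = 2, prob = chisq.probs)
```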

crossover.prop

A numeric between 0 and 1 indicating the proportion of chromosomes to be subjected to crossover. The remaining proportion will be mutated. Defaults to 0.8.

n.islands

An integer indicating the number of islands to be used. Defaults to 1000.

island.cluster.size

An integer specifying the number of islands in a given cluster. Must evenly divide n.islands and defaults to 4. More specifically, under the default settings, the 1000 n.islands are split into 250 distinct clusters each containing 4 islands (island.cluster.size). Within a cluster, migrations of top chromosomes from one cluster island to another are periodically permitted (controlled by migration.generations), and distinct clusters evolve completely independently.
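
The split of islands into clusters under the default settings can be sketched in base R as below. The island and cluster numbering is an assumption chosen for illustration and need not match the labels the package uses internally.

```r
n.islands <- 1000
island.cluster.size <- 4

# island.cluster.size must evenly divide n.islands
stopifnot(n.islands %% island.cluster.size == 0)

# Number of independently evolving clusters
n.clusters <- n.islands / island.cluster.size

# Assumed numbering: islands 1-4 form cluster 1, islands 5-8 form cluster 2, ...
cluster.id <- rep(seq_len(n.clusters), each = island.cluster.size)
islands.by.cluster <- split(seq_len(n.islands), cluster.id)
```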

migration.generations

An integer equal to the number of generations between migrations among islands of a distinct cluster. Argument generations must be an integer multiple of this value. Defaults to 50.

n.migrations

The number of chromosomes that migrate among islands. This value must be less than n.chromosomes and greater than 0, defaulting to 20.
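
The divisibility and range constraints stated for island.cluster.size, migration.generations, and n.migrations can be checked before launching a long run. The helper check.gadgets.args below is hypothetical and not part of the package. It assumes the migration-related constraints apply only when a cluster contains more than one island, since the example on this page sets island.cluster.size = 1 together with n.migrations = 0.

```r
# Hypothetical pre-flight check, not part of the package
check.gadgets.args <- function(n.chromosomes, generations = 500,
                               n.islands = 1000, island.cluster.size = 4,
                               migration.generations = 50, n.migrations = 20) {
  problems <- character(0)
  if (n.islands %% island.cluster.size != 0) {
    problems <- c(problems, "island.cluster.size must evenly divide n.islands")
  }
  # Assumption: migration settings only matter for clusters of more than one island
  if (island.cluster.size > 1) {
    if (generations %% migration.generations != 0) {
      problems <- c(problems,
                    "generations must be an integer multiple of migration.generations")
    }
    if (n.migrations <= 0 || n.migrations >= n.chromosomes) {
      problems <- c(problems,
                    "n.migrations must be greater than 0 and less than n.chromosomes")
    }
  }
  problems
}

# Returns a character vector of problems; empty when none are found
check.gadgets.args(n.chromosomes = 50)
```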

recessive.ref.prop

The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75.

recode.test.stat

For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64. See the GADGETS paper for specific details.

n.random.chroms

(experimental) The number of random chromosomes used to construct the reference null mean and standard deviation vectors used to compute the E-GADGETS (GxGxE) fitness score.

null.mean.vec

(experimental) A vector of estimated null means for each of the components of the E-GADGETS fitness score. This needs to be specified when running permutations under the no-GxE null, and should be set to the values in the "null.mean" element of the "null.mean.sd.info.rds" file stored in the results.dir directory for the observed data. It should also be specified if the analyst wants to replicate the results of a previous E-GADGETS run, or if some of the islands of a run failed to complete and the analyst did not set the seed prior to running run.gadgets.

null.sd.vec

A vector of estimated null standard deviations for the components of the E-GADGETS fitness score. See argument null.mean.vec for reasons this argument might be specified. For a given run, the previously used vector can also be found in the "null.se" element of the file "null.mean.sd.info.rds" stored in the results.dir directory.
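
Recovering the two vectors from an earlier run might look like the sketch below. So that it runs on its own, the sketch first writes a mock "null.mean.sd.info.rds" file to a temporary directory; the directory, the vector lengths, and the numeric values are made up, and only the file name and the "null.mean" and "null.se" element names come from the argument descriptions.

```r
# Stand-in for the results.dir of an earlier run (mocked for illustration)
results.dir <- tempfile("gadgets_results")
dir.create(results.dir)
saveRDS(list(null.mean = c(0.1, 0.2), null.se = c(1.1, 0.9)),   # made-up values
        file.path(results.dir, "null.mean.sd.info.rds"))

# Read back the vectors stored by the earlier run
null.info <- readRDS(file.path(results.dir, "null.mean.sd.info.rds"))
null.mean.vec <- null.info[["null.mean"]]
null.sd.vec <- null.info[["null.se"]]

# The vectors could then be supplied to a new run, e.g.
# run.gadgets(..., null.mean.vec = null.mean.vec, null.sd.vec = null.sd.vec)

# Remove the mock directory
unlink(results.dir, recursive = TRUE)
```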

Value

For each island, a list of two elements will be written to results.dir:

top.chromosome.results

A data.table of the final generation chromosomes, their fitness scores, and, for GADGETS, additional information pertaining to nominated risk-related genotypes. See the package vignette for an example and the documentation for chrom.fitness.score for additional details.

n.generations

The total number of generations run.

Examples


data(case)
case <- as.matrix(case)
data(dad)
dad <- as.matrix(dad)
data(mom)
mom <- as.matrix(mom)
pp.list <- preprocess.genetic.data(case[, 1:10],
                                   father.genetic.data = dad[ , 1:10],
                                   mother.genetic.data = mom[ , 1:10],
                                   ld.block.vec = c(10))
run.gadgets(pp.list, n.chromosomes = 4, chromosome.size = 3,
            results.dir = 'tmp', cluster.type = 'interactive',
            registryargs = list(file.dir = 'tmp_reg', seed = 1500),
            generations = 2, n.islands = 2, island.cluster.size = 1,
            n.migrations = 0)

unlink('tmp_bm', recursive = TRUE)
unlink('tmp', recursive = TRUE)
unlink('tmp_reg', recursive = TRUE)


mnodzenski/epistasisGA documentation built on Jan. 17, 2023, 7:07 p.m.