GADGETS: A function to run the GADGETS method

View source: R/GADGETS.R

GADGETS    R Documentation

A function to run the GADGETS method

Description

This function runs the GADGETS method on a given cluster of islands. It is a wrapper for the underlying Rcpp function run_GADGETS.

Usage

GADGETS(
  cluster.number,
  results.dir,
  case.genetic.data,
  complement.genetic.data,
  case.genetic.data.n,
  mother.genetic.data.n,
  father.genetic.data.n,
  exposure.mat,
  weight.lookup.n,
  ld.block.vec,
  n.chromosomes,
  chromosome.size,
  snp.chisq,
  weight.lookup,
  null.mean.vec = c(0, 0),
  null.se.vec = c(1, 1),
  island.cluster.size = 4,
  n.migrations = 20,
  n.different.snps.weight = 2,
  n.both.one.weight = 1,
  migration.interval = 50,
  gen.same.fitness = 50,
  max.generations = 500,
  initial.sample.duplicates = FALSE,
  crossover.prop = 0.8,
  recessive.ref.prop = 0.75,
  recode.test.stat = 1.64,
  E_GADGETS = FALSE
)

Arguments

cluster.number

An integer indicating the cluster number (used for labeling the output file).

results.dir

The directory to which island results will be saved.

case.genetic.data

The genetic data of the disease-affected children from case-parent trios or disease-discordant sibling pairs. If searching for maternal SNPs that are related to risk of disease in the child, some of the columns in case.genetic.data may contain maternal SNP genotypes (see argument mother.snps for how to indicate which SNP columns correspond to maternal genotypes). Columns are SNP allele counts, and rows are individuals. This object should be of class 'matrix'. The ordering of the columns must be consistent with the LD structure specified in ld.block.vec. The genotypes cannot be dosages imputed with uncertainty. If any data are missing for a particular family member for a particular SNP, that SNP's genotype should be coded as -9 for every member of the family (case.genetic.data and father.genetic.data/mother.genetic.data, or case.genetic.data and complement.genetic.data). If running the experimental E-GADGETS method, this argument should be set to a 1x1 matrix whose only value is 0.0, and case.genetic.data.n will be used to specify the case genotypes.

complement.genetic.data

A genetic dataset for the controls corresponding to the genotypes in case.genetic.data. For SNPs that correspond to the affected child in case.genetic.data, the corresponding column in complement.genetic.data should be set equal to mother allele count + father allele count - case allele count. If using disease-discordant siblings, this argument should be the genotypes for the unaffected siblings. For SNPs in case.genetic.data that represent maternal genotypes (if any), the corresponding column in complement.genetic.data should be the paternal genotypes for that SNP. This object should be of class 'matrix'. Columns are SNP allele counts, and rows are families. If not specified, father.genetic.data and mother.genetic.data must be specified. The genotypes cannot be dosages imputed with uncertainty. If any data are missing for a particular family for a particular SNP, that SNP's genotype should be coded as -9 for the entire family (case.genetic.data and complement.genetic.data). If running the experimental E-GADGETS method, this argument should be set to a 1x1 matrix whose only value is 0.0, and complement.genetic.data.n will be used to specify the complement genotypes.
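As an illustration of the complement rule above, the following sketch computes complement genotypes for a toy set of trios. The matrices mother, father, and case and all of their values are made up for this example, and the handling of -9 simply follows the missing-data coding described above; it is not taken from the package source.

```r
# Toy allele counts (0, 1, or 2) for three trios (rows) and two SNPs (columns).
# All values are made up for illustration.
mother <- matrix(c(1L, 2L, 0L, 1L, 1L, 2L), nrow = 3)
father <- matrix(c(1L, 1L, 1L, 0L, 2L, 1L), nrow = 3)
case   <- matrix(c(2L, 2L, 0L, 1L, 1L, 2L), nrow = 3)

# Complement = mother allele count + father allele count - case allele count.
complement <- mother + father - case

# If any family member is missing (-9) at a SNP, code the whole family as -9
# at that SNP, in both the case and the complement matrices.
missing <- mother == -9L | father == -9L | case == -9L
case[missing] <- -9L
complement[missing] <- -9L
```

Here the first family has mother = 1, father = 1, and case = 2 at the first SNP, so the complement carries the two untransmitted alleles and has an allele count of 0.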

case.genetic.data.n

(experimental) A matrix, to be used in the experimental E-GADGETS method, containing the same data as described above for case.genetic.data, but the genotypes here are stored as floating point values, as opposed to integer values in case.genetic.data. If not running E-GADGETS, this should be specified as a 1x1 matrix whose only value is 0.0.

mother.genetic.data.n

(experimental) A matrix, to be used in the experimental E-GADGETS method, containing the genotypes for the mothers of case.genetic.data, where the genotypes are stored as floating point values, as opposed to integer values. If not running E-GADGETS, this should be specified as a 1x1 matrix whose only value is 0.0.

father.genetic.data.n

(experimental) A matrix, to be used in the experimental E-GADGETS method, containing the genotypes for the fathers of case.genetic.data, where the genotypes are stored as floating point values, as opposed to integer values. If not running E-GADGETS, this should be specified as a 1x1 matrix whose only value is 0.0.

exposure.mat

(experimental) A matrix of the input categorical and continuous exposures, if specified, to be used in the experimental E-GADGETS method. If not running E-GADGETS, this should be a 1x1 matrix whose only entry is 0.0.

weight.lookup.n

(experimental) A vector that maps a family weight to the weighted sum of the number of different SNPs and SNPs both equal to one, to be used by the experimental E-GADGETS method. The vector should store values as floating point values, not integers. If not running E-GADGETS, this argument should be specified as 0.0, and will not be used in the GA. Instead, for GADGETS, computation of the family weights will be based on argument weight.lookup, which is computed in the same way, except that it stores values as integers.

ld.block.vec

An integer vector specifying the linkage blocks of the input SNPs. As an example, for 100 candidate SNPs, suppose we specify ld.block.vec <- c(25, 75, 100). This vector indicates that the input genetic data has 3 distinct linkage blocks, with SNPs 1-25 in the first linkage block, 26-75 in the second block, and 76-100 in the third block. Note that this means the ordering of the columns (SNPs) in case.genetic.data must be consistent with the LD blocks specified in ld.block.vec. In the absence of outside information, a reasonable default is to consider SNPs to be in LD if they are located on the same biological chromosome. If case.genetic.data includes both maternal and child SNP genotypes, we recommend considering any maternal SNP and any child SNP located on the same nominal biological chromosome as 'in linkage'. E.g., we recommend considering any maternal SNPs located on chromosome 1 as being 'linked' to any child SNPs located on chromosome 1, even though, strictly speaking, the maternal and child SNPs are located on separate pieces of DNA. If running E-GADGETS, this argument should be specified as 0, and will not be used.
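The block layout described above can be checked with a short sketch that expands ld.block.vec into a per-SNP block label, and that builds ld.block.vec from chromosome labels when no other LD information is available. The names snp.block and snp.chromosome are hypothetical helpers for this illustration and are not part of the package.

```r
# The example from the text: 100 SNPs in 3 linkage blocks.
ld.block.vec <- c(25, 75, 100)

# Block sizes are the differences between consecutive block end positions.
block.sizes <- diff(c(0, ld.block.vec))

# Label each SNP (column) with the index of the block it belongs to.
snp.block <- rep(seq_along(ld.block.vec), times = block.sizes)

# Default suggested in the text: one block per biological chromosome.
# snp.chromosome gives a made-up chromosome for each SNP column, with the
# columns already ordered by chromosome.
snp.chromosome <- c(1, 1, 1, 2, 2, 3)
ld.block.from.chr <- cumsum(rle(snp.chromosome)$lengths)
```

With these inputs, block.sizes is 25, 50, 25, and ld.block.from.chr is 3, 5, 6, meaning SNPs 1-3, 4-5, and 6 form the three blocks.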

n.chromosomes

An integer specifying the number of chromosomes to use in the GA.

chromosome.size

An integer specifying the number of SNPs on each chromosome.

snp.chisq

A vector of statistics to be used in sampling SNPs for mutation. By default, these are the square roots of the chi-square marginal SNP-disease association statistics for each column in case.genetic.data, but can also be manually specified or uniformly 1 (corresponding to totally random sampling).
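To give a rough sense of how such statistics can steer sampling, the sketch below draws SNP indices with probability proportional to snp.chisq using base R. This is only an illustration under that assumption; the sampling performed inside the underlying Rcpp code may differ in its details, and the values of snp.chisq here are made up.

```r
set.seed(1)

# Made-up sampling statistics for three SNPs; the third is strongly favored.
snp.chisq <- c(1, 1, 8)

# Draw SNP indices with probability proportional to the statistics.
draws <- sample(seq_along(snp.chisq), size = 10000, replace = TRUE,
                prob = snp.chisq)
draw.props <- table(factor(draws, levels = seq_along(snp.chisq))) / length(draws)

# Setting every statistic to 1 gives each SNP the same sampling probability.
uniform.draws <- sample(seq_along(snp.chisq), size = 10000, replace = TRUE,
                        prob = rep(1, length(snp.chisq)))
```

Under proportional sampling, the third SNP is expected to account for about 80% of the draws in the first call and about a third of the draws in the second.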

weight.lookup

A vector that maps a family weight to the weighted sum of the number of different SNPs and SNPs both equal to one. This should store values as integers.

null.mean.vec

(experimental) A vector of estimated null means for each component of the E-GADGETS (GxGxE) fitness score. For all other uses, this should be specified as rep(0, 2) and will not be used.

null.se.vec

(experimental) A vector of estimated null standard deviations for each component of the E-GADGETS (GxGxE) fitness score. For all other uses, this should be specified as rep(1, 2) and will not be used.

island.cluster.size

An integer specifying the number of islands in the cluster. See run.gadgets for additional details.

n.migrations

The number of chromosomes that migrate among islands. This value must be less than n.chromosomes and greater than 0, defaulting to 20.

n.different.snps.weight

The number by which the number of different SNPs between a case and complement is multiplied in computing the family weights. Defaults to 2.

n.both.one.weight

The number by which the number of SNPs equal to 1 in both the case and complement is multiplied in computing the family weights. Defaults to 1.
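The way n.different.snps.weight, n.both.one.weight, and weight.lookup fit together can be sketched for a single family and a single chromosome of three SNPs. The genotype vectors below are made up, the lookup vector is the one used in the Examples section, and indexing weight.lookup directly by the weighted sum is an assumption made for this illustration rather than a statement about the Rcpp implementation.

```r
n.different.snps.weight <- 2
n.both.one.weight <- 1

# Made-up case and complement genotypes at the 3 SNPs of one chromosome.
case.geno <- c(2L, 1L, 0L)
comp.geno <- c(0L, 1L, 0L)

# Count SNPs that differ, and SNPs equal to 1 in both case and complement.
n.different <- sum(case.geno != comp.geno)
n.both.one <- sum(case.geno == 1L & comp.geno == 1L)

# Weighted sum of the two counts.
weighted.sum <- n.different.snps.weight * n.different +
  n.both.one.weight * n.both.one

# Lookup vector from the Examples section: entry x holds 2^x.
weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1)

# Assumption: the weighted sum is used directly as the index into the lookup.
family.weight <- weight.lookup[weighted.sum]
```

Here one SNP differs and one SNP equals 1 in both members, so the weighted sum is 2 * 1 + 1 * 1 = 3 and the looked-up family weight is 2^3 = 8. The lookup has length 6 because, with chromosome.size = 3 and these multipliers, the largest possible weighted sum is 3 * 2 = 6.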

migration.interval

The interval of generations for which GADGETS will run prior to migration of top chromosomes among islands in a cluster. Defaults to 50. In other words, top chromosomes will migrate among cluster islands every migration.interval generations. We also check for convergence at each of these intervals.

gen.same.fitness

The number of consecutive generations with the same fitness score required for algorithm termination. Defaults to 50.

max.generations

The maximum number of generations for which GADGETS will run. Defaults to 500.
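As a rough guide to how migration.interval and max.generations interact, the sketch below lists the generations at which migration and convergence checks would fall if the run is not stopped early. The name checkpoints is hypothetical, and the exact behavior at the final generation is an assumption; the run may also terminate sooner once gen.same.fitness is satisfied.

```r
# Default settings from the Usage section.
migration.interval <- 50
max.generations <- 500

# Generations at which migration and convergence checks would occur.
checkpoints <- seq(migration.interval, max.generations, by = migration.interval)

# Settings used in the Examples section.
example.checkpoints <- seq(5, 10, by = 5)
```

With the defaults there are 10 such checkpoints (generations 50, 100, ..., 500), while the settings in the Examples section give only two (generations 5 and 10).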

initial.sample.duplicates

A logical indicating whether the same SNP can appear in more than one chromosome in the initial sample of chromosomes (the same SNP may appear in more than one chromosome thereafter, regardless). Defaults to FALSE.

crossover.prop

A numeric between 0 and 1 indicating the proportion of chromosomes to be subjected to cross over. The remaining proportion will be mutated. Defaults to 0.8.

recessive.ref.prop

The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75.

recode.test.stat

For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64. See the GADGETS paper for specific details.

E_GADGETS

(experimental) A logical indicating whether to run the experimental 'E_GADGETS' method. Defaults to FALSE.

Value

For each island in the cluster, an rds object containing a list with the following elements will be written to results.dir.

top.chromosome.results

A data.table of the final generation chromosomes, their fitness scores, and, for GADGETS, additional information pertaining to nominated risk-related genotypes. See the package vignette for an example and for additional details.

n.generations

The total number of generations run.
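One way to collect the island results after a run is to read every rds file in results.dir back into a list. The sketch below is self-contained: it writes a mock result to a temporary directory first, so the file name mock_island.rds, the column name fitness.score, and all values are hypothetical stand-ins rather than actual GADGETS output (in particular, the real top.chromosome.results is a data.table with more columns).

```r
# Stand-in for results.dir; in practice, use the directory passed to GADGETS().
results.dir <- tempfile("gadgets_results_")
dir.create(results.dir)

# Mock island result so the sketch runs on its own (hypothetical contents).
mock.result <- list(
  top.chromosome.results = data.frame(fitness.score = c(2.5, 1.2)),
  n.generations = 10
)
saveRDS(mock.result, file.path(results.dir, "mock_island.rds"))

# Read every rds file in the directory, one list element per island.
island.files <- list.files(results.dir, pattern = "\\.rds$", full.names = TRUE)
island.results <- lapply(island.files, readRDS)

# Number of generations each island ran.
n.generations <- vapply(island.results, function(x) x$n.generations, numeric(1))
```

Each element of island.results then holds the top.chromosome.results and n.generations elements described above for one island.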

Examples


library(epistasisGA)
set.seed(10)

# load the example case-parent trio data and coerce to matrices
data(case)
case <- as.matrix(case)
data(dad)
dad <- as.matrix(dad)
data(mom)
mom <- as.matrix(mom)

# preprocess the first 10 SNPs, treated as a single LD block
data.list <- preprocess.genetic.data(case[, 1:10],
                                     father.genetic.data = dad[, 1:10],
                                     mother.genetic.data = mom[, 1:10],
                                     ld.block.vec = c(10))

# square roots of the marginal chi-square statistics, used for SNP sampling
chisq.stats <- sqrt(data.list$chisq.stats)
ld.block.vec <- data.list$ld.block.vec
case.genetic.data <- data.list$case.genetic.data
complement.genetic.data <- data.list$complement.genetic.data

# required inputs but not actually used in function below
case.genetic.data.n <- matrix(0.0, 1, 1)
mother.genetic.data.n <- matrix(0.0, 1, 1)
father.genetic.data.n <- matrix(0.0, 1, 1)
exposure.mat <- data.list$exposure.mat + 0.0

# family weight lookup: entry x holds 2^x
weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1)

# run one cluster of islands, writing the island results to 'tmp'
dir.create('tmp')
GADGETS(cluster.number = 1, results.dir = 'tmp',
        case.genetic.data = case.genetic.data,
        complement.genetic.data = complement.genetic.data,
        case.genetic.data.n = case.genetic.data.n,
        mother.genetic.data.n = mother.genetic.data.n,
        father.genetic.data.n = father.genetic.data.n,
        exposure.mat = exposure.mat,
        weight.lookup.n = weight.lookup + 0.0,
        ld.block.vec = ld.block.vec,
        n.chromosomes = 10, chromosome.size = 3, snp.chisq = chisq.stats,
        weight.lookup = weight.lookup, n.migrations = 2,
        migration.interval = 5,
        gen.same.fitness = 10, max.generations = 10)


mnodzenski/epistasisGA documentation built on Jan. 17, 2023, 7:07 p.m.