A function to assign a fitness score to a chromosome


This function assigns a fitness score to a chromosome. It is a wrapper for the Rcpp function chrom_fitness_score.

  n.different.snps.weight = 2, = 1,
  recessive.ref.prop = 0.75,
  recode.test.stat = 1.64,
  epi.test = FALSE


The genetic data of the disease-affected children from case-parent trios or disease-discordant sibling pairs. If searching for maternal SNPs that are related to risk of disease in the child, some of the columns in this dataset may contain maternal SNP genotypes (see argument mother.snps for how to indicate which SNP columns correspond to maternal genotypes). Columns are SNP allele counts, and rows are individuals. This object may be of class 'matrix' or of class 'big.matrix'. If of class 'big.matrix', it must be file-backed as type 'integer' (see the bigmemory package for more information). The ordering of the columns must be consistent with the LD structure specified in ld.block.vec. The genotypes cannot be dosages imputed with uncertainty.

A genetic dataset for the controls corresponding to the case genotypes. For SNPs that correspond to the affected child in the case data, the corresponding column in this dataset should be set equal to mother allele count + father allele count - case allele count. If using disease-discordant siblings, this argument should be the genotypes for the unaffected siblings. For SNPs in the case data that represent maternal genotypes (if any), the corresponding column in this dataset should be the paternal genotypes for that SNP. Regardless, this object may be of class 'matrix' or of class 'big.matrix'. If of class 'big.matrix', it must be file-backed as type 'integer' (see the bigmemory package for more information). Columns are SNP allele counts, rows are families. If not specified, and must be specified. The genotypes cannot be dosages imputed with uncertainty.


An integer vector of the columns corresponding to the collection of SNPs, or chromosome, for which the fitness score will be computed.


An integer vector specifying the linkage blocks of the input SNPs. As an example, for 100 candidate SNPs, suppose we specify ld.block.vec <- c(25, 75, 100). This vector indicates that the input genetic data has 3 distinct linkage blocks, with SNPs 1-25 in the first linkage block, 26-75 in the second block, and 76-100 in the third block. Note that this means the ordering of the columns (SNPs) in the input genetic data must be consistent with the LD blocks specified in ld.block.vec. In the absence of outside information, a reasonable default is to consider SNPs to be in LD if they are located on the same biological chromosome. If the input genetic data includes both maternal and child SNP genotypes, we recommend considering any maternal SNP and any child SNP located on the same nominal biological chromosome as 'in linkage'. For example, we recommend considering any maternal SNPs located on chromosome 1 as being 'linked' to any child SNPs located on chromosome 1, even though, strictly speaking, the maternal and child SNPs are located on separate pieces of DNA. If not specified, this defaults to assuming all input SNPs are in linkage, which may be overly conservative and could adversely affect performance.
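The block boundaries described for ld.block.vec can be expanded into a per-SNP block label to check that the column ordering matches the intended LD structure. This is a minimal base R sketch, not part of the package; block.id is a hypothetical helper name introduced only for illustration.

```r
# Cumulative end positions of each linkage block, as in the text above
ld.block.vec <- c(25, 75, 100)

# Block sizes are the gaps between consecutive end positions: 25, 50, 25
block.sizes <- diff(c(0, ld.block.vec))

# block.id (hypothetical name): one block label per SNP column
block.id <- rep(seq_along(ld.block.vec), times = block.sizes)

# SNPs on either side of each boundary
block.id[c(25, 26, 75, 76)]
```

SNPs 25 and 26 fall in different blocks, as do SNPs 75 and 76, matching the 1-25, 26-75, 76-100 split.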


A vector that maps a family weight to the weighted sum of the number of different SNPs and SNPs both equal to one.


The number by which the number of different SNPs between a case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 2.

The number by which the number of SNPs equal to 1 in both the case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 1.
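The arithmetic behind a single family weight, as described by the two weights above and the weight lookup vector, can be sketched in base R. This is an illustration under assumptions, not the package's internal code: the genotype vectors are made up, and the variable names other than weight.lookup are hypothetical.

```r
# Made-up allele counts at three SNPs for one family (illustration only)
case.row <- c(1L, 0L, 2L)  # affected child
comp.row <- c(1L, 1L, 0L)  # complement or unaffected sibling

# Number of SNPs where case and complement differ
n.different <- sum(case.row != comp.row)

# Number of SNPs equal to 1 in both case and complement
n.both.one <- sum(case.row == 1L & comp.row == 1L)

# Weighted sum using the documented defaults of 2 and 1
weighted.sum <- 2 * n.different + 1 * n.both.one

# Lookup vector as constructed in the example for this function
weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1)

# Family weight is the lookup entry at the weighted sum
family.weight <- weight.lookup[weighted.sum]
```

With three SNPs and the default weights, the weighted sum cannot exceed 6, which is why a lookup vector of length 6 suffices here.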


The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75.


For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64.


A logical indicating whether the function should return the information required to run function epistasis.test for a given SNP-set.


A list:


The chromosome fitness score.


The weighted mean difference vector corresponding to the chromosome, with each element divided by its pseudo-standard error. The magnitudes of these values are not particularly important, but the sign is useful. A positive value for a given SNP indicates the minor allele is positively associated with disease status, while a negative value implies the reference (‘wild type’) allele is positively associated with the disease.


The number of cases with a risk-related genotype at each locus over the total number of cases or controls that have a full set of risk genotypes at each locus, among families where only one of the case or control has the full risk set.


A vector indicating the number of risk alleles a case or complement must have for each SNP in target.snps for the case or complement to be classified as having the proposed risk set. '1+' indicates at least one copy of the risk allele is required, while '2' indicates 2 copies are needed. The risk allele can be determined based on the signs of the elements of sum_dif_vecs, where a negative value indicates the major allele for a given SNP is the risk allele, while a positive value implicates the minor allele.
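The sign rule above can be applied directly to the returned sum_dif_vecs element. The following base R sketch uses made-up values in place of real output, and risk.allele is a hypothetical name; how an exactly zero value should be treated is not stated in this documentation, so the sketch simply groups it with the negative case.

```r
# Made-up standardized difference values for three SNPs (illustration only)
sum_dif_vecs <- c(2.31, -1.75, 0.42)

# Positive sign implicates the minor allele, negative sign the major allele
risk.allele <- ifelse(sum_dif_vecs > 0, "minor", "major")
risk.allele
```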


An integer vector of the informative family rows. Only returned if epi.test = TRUE.


case <- as.matrix(case)
dad <- as.matrix(dad)
mom <- as.matrix(mom)
comp <- mom + dad - case
weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1)
storage.mode(weight.lookup) <- "integer"
block.ld.vec <- cumsum(rep(25, 4))
chrom.fitness.score(case, comp, c(1, 4, 7),
                    block.ld.vec, weight.lookup)

mnodzenski/epistasisGA documentation built on Jan. 17, 2023, 7:07 p.m.