applyFnToGenes: apply a function to the genotypes (markers) in each gene...

applyFnToGenes    R Documentation

apply a function to the genotypes (markers) in each gene transcript and/or base pair range


This function generates base pair ranges from its input arguments. Each range specifies a chromosome, a start base pair and end base pair. Typically, a range could be a gene transcript, though it could be a whole chromosome, or a run of base pairs on a chromosome. Once the ranges are generated, applyFnToRanges is called to find all the rows (i.e. markers) from the markers data frame that fall in each range. For these markers, a matrix of the genotypes is generated. Finally, the op function is called for each range with the arguments: markers, range, and 'environment'.
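The selection step described above can be pictured with a small base R sketch. It does not call Mega2R: the toy `markers` data frame, its `name` and `position` column names, and the helper `markers_in_range` are assumptions made for illustration only. The real work is done inside applyFnToRanges, and treating the range end points as inclusive is also an assumption.

```r
# Toy marker table; column names and values are illustrative assumptions.
markers <- data.frame(
  name       = c("m1", "m2", "m3", "m4"),
  chromosome = c(1L, 1L, 1L, 2L),
  position   = c(4000000L, 6000000L, 12000000L, 6000000L),
  stringsAsFactors = FALSE
)

# One range: chromosome, start base pair, end base pair.
rng <- c(1L, 5000000L, 10000000L)

# Hypothetical helper: keep the rows on the range's chromosome whose
# position lies between start and end (inclusive here, by assumption).
markers_in_range <- function(markers, rng) {
  keep <- markers$chromosome == rng[1] &
          markers$position   >= rng[2] &
          markers$position   <= rng[3]
  markers[keep, , drop = FALSE]
}

selected <- markers_in_range(markers, rng)
print(selected$name)
```

Only the marker on chromosome 1 that lies inside the start and end positions survives the filter; the marker at the same position on chromosome 2 is excluded because the chromosome is part of the range.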


applyFnToGenes(op           = function (markers, range, envir) {},
               genes_arg    = NULL,
               ranges_arg   = matrix(ncol = 3, nrow = 0),
               chrs_arg     = vector("integer", 0),
               markers_arg  = vector("character", 0),
               type_arg     = "TX",
               fuzz_arg     = 0,
               envir        = ENV)



op: a function of three arguments. applyFnToGenes calls it repeatedly in a try/catch context. Its arguments are:


markers: marker data for each marker selected. This is a data frame with the following 5 columns:


is the ordinal ranking of this marker among all loci


is the position of corresponding marker genotype data in the unified_genotype_table


is the text name of the marker


is the integer chromosome number


is the integer base pair position of marker


range: an indicator of which range argument these markers correspond to.


envir: an 'environment' holding Mega2R data frames and state data.


genes_arg: a character vector of gene names. All the transcripts identified with the specified gene in the Bioconductor annotation package TxDb.Hsapiens.UCSC.hg19.knownGene are selected. This produces multiple "range" elements, each containing a chromosome, a start base pair, and an end base pair. (If the gene name is "*", all the transcripts will be selected.) Note: a Bioconductor annotation is used to convert from gene name to ENTREZ gene id.


ranges_arg: an integer matrix of three columns. Each row defines a range: a chromosome number, a start base pair value, and an end base pair value.


chrs_arg: an integer vector of chromosome numbers. All of the base pairs on each chromosome will be selected as a single range.


markers_arg: a data frame with the following 5 columns:


is the ordinal ranking of this marker among all loci


is the position of corresponding marker genotype data in the unified_genotype_table


is the text name of the marker


is the integer chromosome number


is the integer base pair position of marker


type_arg: a character vector of length 1 that contains "TX" or does not. If it is "TX", which is the default, the TX fields of the Bioconductor annotation package TxDb.Hsapiens.UCSC.hg19.knownGene are used to define the base pair ranges and chromosome. Otherwise, the CDS fields are used.


fuzz_arg: an integer vector of length one or two. The first value is used to reduce the start base pair selected from each transcript and the second to increase the end base pair position. (If only one value is present, it is used for both adjustments.) Note: The values can be positive or negative.


envir: an 'environment' that contains all the data frames created from the SQLite database.
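The start and end adjustment controlled by fuzz_arg amounts to simple arithmetic on the start and end columns of a range matrix. The base R sketch below shows one reading of that description. `apply_fuzz` is a hypothetical helper, not a Mega2R function, and the package's own handling (for example near chromosome ends) may differ.

```r
# Hypothetical helper illustrating the fuzz_arg description:
# the first value is subtracted from each start, the second is added to
# each end; a single value is used for both adjustments.
apply_fuzz <- function(ranges, fuzz) {
  stopifnot(ncol(ranges) == 3, length(fuzz) %in% c(1, 2))
  if (length(fuzz) == 1) fuzz <- c(fuzz, fuzz)
  ranges[, 2] <- ranges[, 2] - fuzz[1]
  ranges[, 3] <- ranges[, 3] + fuzz[2]
  ranges
}

# Two ranges on chromosome 1: chromosome, start, end.
ranges <- matrix(c(1,  5000000, 10000000,
                   1, 10000000, 15000000),
                 ncol = 3, byrow = TRUE)

widened  <- apply_fuzz(ranges, 1000)           # same padding on both sides
lopsided <- apply_fuzz(ranges, c(2000, -500))  # a negative value shrinks that side
print(widened)
print(lopsided)
```

Under this reading, a positive value widens the range on that side and a negative value narrows it.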




If you want subsequent calls to op to share information, data can be placed in a data frame that is added to the 'environment'.
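As a rough illustration of sharing state between calls, the sketch below defines an op-style callback that appends one row per call to a data frame kept in the 'environment'. It is exercised here with a plain `new.env()` and a toy marker data frame instead of a Mega2R database. The name `range_counts`, the toy columns, and the assumption that `range` can be pasted into a label are all illustrative, not part of the package.

```r
# op-style callback: record how many markers each range contained.
count_markers <- function(markers, range, envir) {
  row <- data.frame(range     = paste(range, collapse = ":"),
                    n_markers = nrow(markers),
                    stringsAsFactors = FALSE)
  # Create the accumulator on first use, then append to it on later calls.
  if (!exists("range_counts", envir = envir, inherits = FALSE)) {
    envir$range_counts <- row
  } else {
    envir$range_counts <- rbind(envir$range_counts, row)
  }
  invisible(NULL)
}

# Stand-in for the Mega2R environment (assumption: any environment works
# for the purpose of this illustration).
e   <- new.env()
toy <- data.frame(chromosome = c(1L, 1L), position = c(100L, 200L))

count_markers(toy,       c(1, 50, 150),  e)  # called with two markers
count_markers(toy[0, ],  c(1, 500, 900), e)  # called with no markers
print(e$range_counts)
```

With a real database, a function of this shape could be passed as op, and the accumulated data frame read back from the environment once applyFnToGenes returns.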


  db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)

  show = function(m, r, e) {
      print(head(getgenotypes(m, envir = e)))
  }

   # apply function "show" to all transcripts on gene CEP104
   applyFnToGenes(show, genes_arg = c("CEP104"))

   # apply function "show" to all genotypes on chromosome 1 for two base
   # pair ranges
   applyFnToGenes(show, ranges_arg = matrix(c(1, 5000000, 10000000,
                  1, 10000000, 15000000), ncol = 3, nrow = 2, byrow = TRUE))

   # apply function "show" to all genotypes for first marker in each chromosome
   applyFnToGenes(show, markers_arg = ENV$markers[! duplicated(ENV$markers$chromosome), 3])

   # apply function "show" to all genotypes on chromosomes 24 and 26
   applyFnToGenes(show, chrs_arg=c(24, 26))

Mega2R documentation built on May 29, 2024, 1:14 a.m.