applyFnToGenes: apply a function to the genotypes (markers) in each gene...
In Mega2R: Accessing and Processing a 'Mega2' Genetic Database

applyFnToGenes

R Documentation

apply a function to the genotypes (markers) in each gene transcript and/or base pair range

Description

This function generates base pair ranges from its input arguments. Each range specifies a chromosome, a start base pair and end base pair. Typically, a range could be a gene transcript, though it could be a whole chromosome, or a run of base pairs on a chromosome. Once the ranges are generated, applyFnToRanges is called to find all the rows (i.e. markers) from the markers data frame that fall in each range. For these markers, a matrix of the genotypes is generated. Finally, the op function is called for each range with the arguments: markers, range, and 'environment'.

Usage

applyFnToGenes(op           = function (markers, range, envir) {},
               genes_arg    = NULL,
               ranges_arg   = matrix(ncol = 3, nrow = 0),
               chrs_arg     = vector("integer", 0),
               markers_arg  = vector("character", 0),
               type_arg     = "TX",
               fuzz_arg     = 0,
               envir        = ENV)

Arguments

`op`	Is a function of three arguments. It will be called repeatedly by `applyFnToGenes` in a try/catch context. The arguments are: markers Marker data for each marker selected. A marker is a data frame with the following 5 observations: locus_link is the ordinal ranking of this marker among all loci locus_link_fill is the position of corresponding marker genotype data in the unified_genotype_table MarkerName is the text name of the marker chromosome is the integer chromosome number position is the integer base pair position of marker range An indicator of which range argument these markers correspond to. envir An 'environment' holding Mega2R data frames and state data.
`genes_arg`	a character vector of gene names. All the transcripts identified with the specified gene in BioConductor Annotation, TxDb.Hsapiens.UCSC.hg19.knownGene, are selected. This produces multiple "range" elements containing chromosome, start base pair, end base pair. (If the gene name is "", all the transcript will be selected.) Note: BioCoductor Annotation org.Hs.eg.db* is used to convert from gene name to ENTREZ gene id.
`ranges_arg`	an integer matrix of three columns. The columns define a range: a chromosome number, a start base pair value, and an end base pair value.
`chrs_arg`	an integer vector of chromosome numbers. All of the base pairs on each chromosomes will be selected as a single range.
`markers_arg`	a data frame with the following 5 observations: locus_link is the ordinal ranking of this marker among all loci locus_link_fill is the position of corresponding marker genotype data in the unified_genotype_table MarkerName is the text name of the marker chromosome is the integer chromosome number position is the integer base pair position of marker
`type_arg`	a character vector of length 1 that contains "TX" or does not. If it is "TX", which is the default, the TX fields of BioConductor Annotation, TxDb.Hsapiens.UCSC.hg19.knownGene are used to define the base pair ranges and chromosome. Otherwise, the CDS fields are used.
`fuzz_arg`	is an integer vector of length one or two. The first argument is used to reduce the start base pair selected from each transcript and the second to increase the end base pair position. (If only one value is present, it is used for both adjustments.) Note: The values can be positive or negative.
`envir`	an 'environment' that contains all the data frames created from the SQLite database.

Value

None

Note

If you want subsequent calls to op to share information, data can be placed in a data frame that is added to the 'environment'.

Examples

  db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)

  show = function(m, r, e) {
      print(r)
      print(m)
      print(head(getgenotypes(m, envir = e)))
  }

   # apply function "show" to all transcripts on genes ELL2 and CARD15

    # donttestcheck: time
    applyFnToGenes(show, genes_arg = c("CEP104"))


   # apply function "show" to all genotypes on chromosomes 11 for two base
   # pair ranges
   applyFnToGenes(show, ranges_arg = matrix(c(1, 5000000, 10000000,
                  1, 10000000, 15000000), ncol = 3, nrow = 2, byrow = TRUE))

   # apply function "show" to all genotypes for first marker in each chromosome
   applyFnToGenes(show, markers_arg = ENV$markers[! duplicated(ENV$markers$chromosome), 3])

   # apply function "show" to all genotypes on chromosomes 24 and 26
   applyFnToGenes(show, chrs_arg=c(24, 26))

Mega2R documentation built on May 29, 2024, 1:14 a.m.