gscores: Accessing genomic gscores

gscoresR Documentation

Accessing genomic gscores

Description

Functions to access genomic gscores through GScores objects.

Usage

## S4 method for signature 'GScores,GenomicRanges'
gscores(x, ranges, ...)
## S4 method for signature 'GScores,character'
gscores(x, ranges, ...)
## S4 method for signature 'GScores'
score(x, ..., simplify=TRUE)

Arguments

x

A GScores object.

ranges

A GenomicRanges object with positions from where to retrieve genomic scores, or a character string vector with identifiers associated by the data producer with the genomic scores, e.g., dbSNP 'rs' identifiers in the case of some MafDb.* packages.

...

In the call to the gscores() method one can additionally specify the following optional parameters:

  • pop: Character string vector specifying the scores populations to query, when there is more than one. By default, its value is 'defaultPopulation(x)'. Use populations() to find out the available scores populations.

  • type: Character string specifying the type of genomic position being sought, which can be a single nucleotide range (snr), by default, or a nonsnr spanning multiple nucleotides. The latter is the case of indel variants in minor allele frequency data.

  • scores.only: Flag setting whether only the scores should be returned as a numeric vector (TRUE), instead of returning them as a metadata column in a GRanges object (FALSE, default).

  • summaryFun: Function to summarize genomic scores when more than one position is retrieved. By default, this is set to the arithmetic mean, i.e., the mean() function.

  • quantized: Flag setting whether the genomic scores should be returned quantized (TRUE) or dequantized (FALSE, default).

  • ref: Vector of reference alleles in the form of either a character vector, a DNAStringSet object or a DNAStringSetList object. This argument is used only when either there are multiple scores per position or x is a MafDb.* package.

  • alt: Vector of alternative alleles in the form of either a character vector, a DNAStringSet object or a DNAStringSetList object. This argument is used only when either there are multiple scores per position or x is a MafDb.* package.

  • minoverlap: Integer value passed internally to the function findOverlaps() from the IRanges package, when querying genomic positions associated with multiple-nucleotide ranges (nonSNRs). By default, minoverlap=1L, which assumes that the sought nonSNRs are stored as in VCF files, using the nucleotide composition of the reference sequence. This argument is only relevant for genomic scores associated with nonSNRs.

  • caching: Flag setting whether genomic scores per chromosome should be kept cached in memory (TRUE, default) or not (FALSE). The latter option minimizes the memory footprint but slows down the performance when the gscores() method is called multiple times.

simplify

Flag setting whether the result should be simplified to a vector (TRUE, default) if possible. This happens when scores from a single population are queried.

Details

The method gscores() takes as first argument a GScores object, previouly loaded from either an annotation package or an AnnotationHub resource; see getGScores().

The arguments ref and alt serve two purposes. One, when there are multiple scores per position, such as with CADD or M-CAP, and we want to select a score matching a specific combination of reference and alternate alleles. The other purpose is when the GScores object x is a MafDb.* package, then by providing ref and alt alelles we will get separate frequencies for reference and alternate alleles. The current lossy compression of these values yields a correct assignment for biallelic variants in the corresponding MafDb.* package and an approximation for multiallelic ones.

Value

The method gscores() returns a GRanges object with the genomic scores in a metadata column called score. The method score() returns a numeric vector with the genomic scores.

Author(s)

R. Castelo

References

Puigdevall, P. and Castelo, R. GenomicScores: seamless access to genomewide position-specific scores from R and Bioconductor. Bioinformatics, 18:3208-3210, 2018.

See Also

phastCons100way.UCSC.hg38 MafDb.1Kgenomes.phase1.hs37d5

Examples

## one genomic range of width 5
gr1 <- GRanges(seqnames="chr22", IRanges(start=50528591, width=5))
gr1

## five genomic ranges of width 1
gr2 <- GRanges(seqnames="chr22", IRanges(start=50528591:50528596, width=1))
gr2

## accessing genomic gscores from an annotation package
if (require(phastCons100way.UCSC.hg38)) {
  library(GenomicRanges)

  gsco <- phastCons100way.UCSC.hg38
  gsco
  gscores(gsco, gr1)
  score(gsco, gr1)
  gscores(gsco, gr2)
  populations(gsco)
  gscores(gsco, gr2, pop="DP2")
}

if (require(MafDb.1Kgenomes.phase1.hs37d5)) {
  mafdb <- MafDb.1Kgenomes.phase1.hs37d5
  mafdb
  populations(mafdb)

  ## lookup allele frequencies for SNP rs1129038, located at 15:28356859, a
  ## SNP associated to blue and brown eye colors as reported by Eiberg et al.
  ## Blue eye color in humans may be caused by a perfectly associated founder
  ## mutation in a regulatory element located within the HERC2 gene
  ## inhibiting OCA2 expression. Human Genetics, 123(2):177-87, 2008
  ## [http://www.ncbi.nlm.nih.gov/pubmed/18172690]
  gscores(mafdb, GRanges("15:28356859"), pop=populations(mafdb))
  gscores(mafdb, "rs1129038", pop=populations(mafdb))
}

rcastelo/GenomicScores documentation built on Jan. 19, 2024, 2:09 p.m.