builds: Utilities for working with _HUMAN_ genome builds

builds    R Documentation

Utilities for working with HUMAN genome builds

Description

A few functions are available to search for build versions, either from NCBI or UCSC.

  • translateBuild: translates between UCSC and NCBI build versions

  • extractBuild: use grep patterns to find the first build within the string input

  • uniformBuilds: replace builds that occur below a threshold frequency with the alternative build

  • correctBuild: Ensure that the build annotation is correct based on the NCBI/UCSC website. If not, use translateBuild with the indicated 'style' input

  • isCorrect: Check to see if the build is exactly as annotated
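
As an illustration of the pattern search that extractBuild describes, here is a minimal base-R sketch. It is not the TCGAutils implementation: the helper extract_sketch and the candidates vector are hypothetical, and it assumes that "first" means the match at the earliest position in the string.

```r
# Hypothetical candidate names; the pattern list used by TCGAutils may differ.
candidates <- c("GRCh38", "GRCh37", "hg38", "hg19")

# Hypothetical helper: return the candidate that matches earliest in `string`,
# ignoring case, or NA_character_ when nothing matches.
extract_sketch <- function(string, candidates) {
    stopifnot(is.character(string), length(string) == 1L)
    # Match position of each candidate; regexpr() gives -1 for no match
    pos <- vapply(
        candidates,
        function(p) as.integer(regexpr(p, string, ignore.case = TRUE)),
        integer(1L)
    )
    found <- pos[pos > 0L]
    if (!length(found))
        return(NA_character_)
    names(found)[which.min(found)]
}

extract_sketch(
    "SCENA_p_TCGAb29and30_SNP_N_GenomeWideSNP_6_G05_569110.nocnv_grch38.seg.txt",
    candidates
)
```

The sketch returns the candidate as spelled in the hypothetical list rather than as it appears in the input.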

Usage

translateBuild(from, to = c("UCSC", "NCBI"))

correctBuild(build, style = c("UCSC", "NCBI"))

isCorrect(build, style = c("UCSC", "NCBI"))

extractBuild(string, build = c("UCSC", "NCBI"))

uniformBuilds(builds, cutoff = 0.2, na = c("", "NA"))

Arguments

from

character() A vector of build versions typically from genome() (e.g., "37"). The build vector must be homogeneous (i.e., length(unique(from)) == 1L).

to

character(1) The name of the desired build version (either "UCSC" or "NCBI"; default: "UCSC")

build

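For correctBuild and isCorrect, the build version name to check (e.g., "hg19"). For extractBuild, the build style(s) to search for (default: c("UCSC", "NCBI"))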

style

character(1) The annotation style, either 'UCSC' or 'NCBI'

string

A single character string

builds

A character vector of builds

cutoff

numeric(1L) An inclusive tolerance threshold for the proportion of missing values and of builds to be translated (default: 0.2)

na

character() The values to be considered as missing (default: c("", "NA"))
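
To make the roles of cutoff and na concrete, the following base-R sketch applies an inclusive threshold to minority and missing entries. It is an approximation under stated assumptions, not the TCGAutils code: the helper uniform_sketch is hypothetical, it measures each minority build and the missing values as a share of the whole vector, and it fills in the most frequent build rather than translating names.

```r
# Hypothetical helper; a base-R approximation, not the TCGAutils implementation.
uniform_sketch <- function(builds, cutoff = 0.2, na = c("", "NA")) {
    # Values listed in `na`, and true NA entries, count as missing
    is_missing <- is.na(builds) | builds %in% na
    counts <- table(builds[!is_missing])
    if (length(counts) == 0L)
        stop("no non-missing builds")
    majority <- names(counts)[which.max(counts)]
    # Share of the whole vector held by each minority build and by missing values
    minority <- counts[names(counts) != majority] / length(builds)
    shares <- c(minority, missing = mean(is_missing))
    # Inclusive threshold: a share equal to `cutoff` is still tolerated
    if (any(shares > cutoff))
        stop("a minority build or the missing values exceed the cutoff")
    rep(majority, length(builds))
}

uniform_sketch(rep(c("GRCh37", "hg19"), times = c(5, 1)))
uniform_sketch(c(rep(c("GRCh37", "hg19"), times = c(5, 1)), "NA"))
```

In this sketch, an evenly split vector such as c("GRCh37", "hg19") stops with an error because the minority share (0.5) exceeds the default cutoff.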

Details

The correctBuild function checks that the input build matches the specified style. Otherwise, it returns the build name in the correct style for use with seqlevelsStyle. Currently, the function does not support patched builds (e.g., 'GRCh38.p13'). Build names are taken from the NCBI website: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/
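
The check-then-translate behavior described here can be sketched in base R as follows. This is only an illustration: build_map, is_exact_sketch, and correct_sketch are hypothetical names, the lookup table is a small hand-written subset of well-known UCSC/NCBI equivalents, and the case-insensitive matching is an assumption rather than a documented feature.

```r
# Hypothetical lookup table; not the table TCGAutils uses internally.
build_map <- data.frame(
    UCSC = c("hg18", "hg19", "hg38"),
    NCBI = c("NCBI36", "GRCh37", "GRCh38"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only when `build` exactly matches a name in `style`
is_exact_sketch <- function(build, style = c("UCSC", "NCBI")) {
    style <- match.arg(style)
    build %in% build_map[[style]]
}

# Hypothetical helper: return the name of `build` in the requested style
correct_sketch <- function(build, style = c("UCSC", "NCBI")) {
    style <- match.arg(style)
    if (is_exact_sketch(build, style))
        return(build)
    # Otherwise look the build up in either column, ignoring case
    hit <- which(
        tolower(build_map$UCSC) == tolower(build) |
        tolower(build_map$NCBI) == tolower(build)
    )
    if (length(hit) != 1L)
        stop("Unrecognized build: ", build)
    build_map[[style]][hit]
}

correct_sketch("grch38", "NCBI")
correct_sketch("hg19", "NCBI")
```

Because the sketch matches whole names only, a patched build such as 'GRCh38.p13' stops with an error instead of being guessed at.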

Value

translateBuild: A character vector of translated genome builds

extractBuild: A character string of the build information available

uniformBuilds: A character vector of builds where all builds are identical, i.e., identical(length(unique(builds)), 1L)

correctBuild: A character string of the 'corrected' build name

isCorrect: A logical indicating if the build is exactly as annotated

Examples


translateBuild("GRCh35", "UCSC")


correctBuild("grch38", "NCBI")
correctBuild("hg19", "NCBI")


isCorrect("GRCh38", "NCBI")

isCorrect("hg19", "UCSC")


extractBuild(
"SCENA_p_TCGAb29and30_SNP_N_GenomeWideSNP_6_G05_569110.nocnv_grch38.seg.txt"
)


buildvec <- rep(c("GRCh37", "hg19"), times = c(5, 1))
uniformBuilds(buildvec)

navec <- c(rep(c("GRCh37", "hg19"), times = c(5, 1)), "NA")
uniformBuilds(navec)


waldronlab/TCGAutils documentation built on Feb. 25, 2024, midnight