available.genomes: Find available/installed genomes

View source: R/available.genomes.R

available.genomesR Documentation

Find available/installed genomes

Description

available.genomes gets the list of BSgenome data packages that are available in the Bioconductor repositories for your version of R/Bioconductor.

installed.genomes gets the list of BSgenome data packages that are currently installed on your system.

getBSgenome searchs the installed BSgenome data packages for the specified genome and returns it as a BSgenome object.

Usage

available.genomes(splitNameParts=FALSE, type=getOption("pkgType"))

installed.genomes(splitNameParts=FALSE)

getBSgenome(genome, masked=FALSE, load.only=FALSE)

Arguments

splitNameParts

Whether to split or not the package names in parts. In that case the result is returned in a data frame with 5 columns.

type

Character string indicating the type of package ("source", "mac.binary" or "win.binary") to look for.

genome

A BSgenome object, or the full name of an installed BSgenome data package, or a short string specifying the name of an NCBI assembly (e.g. "GRCh38", "TAIR10.1", etc...) or UCSC genome (e.g. "hg38", "bosTau9", "galGal6", "ce11", etc...). The supplied short string must refer unambiguously to an installed BSgenome data package.

masked

TRUE or FALSE. Whether to search for the masked BSgenome object (i.e. the object that contains the masked sequences) or not (the default).

load.only

TRUE or FALSE. By default getBSgenome loads and attaches the BSgenome data package containing the requested genome, resulting in its addition to the search path. Use load.only=TRUE to prevent this, in which case the BSgenome data package is loaded but not attached.

Details

A BSgenome data package contains the full genome sequences for a given organism.

Its name typically has 4 parts (5 parts if it's a masked BSgenome data package i.e. if it contains masked sequences) separated by a dot e.g. BSgenome.Mmusculus.UCSC.mm10 or BSgenome.Mmusculus.UCSC.mm10.masked:

  1. The 1st part is always BSgenome.

  2. The 2nd part is the name of the organism in abbreviated form e.g. Mmusculus, Hsapiens, Celegans, Scerevisiae, Ecoli, etc...

  3. The 3rd part is the name of the organisation who provided the genome sequences. We formally refer to it as the provider of the genome. E.g. UCSC, NCBI, TAIR, etc...

  4. The 4th part is a short string specifying the name of an NCBI assembly (e.g. GRCh38, TAIR10.1, etc...) or UCSC genome (e.g. hg38, mm10, susScr11, bosTau9, galGal6, ce11, etc...).

  5. If the package contains masked sequences, its name has the .masked suffix added to it, which is typically the 5th part.

A BSgenome data package contains a single top-level object (a BSgenome object) named like the package itself that can be used to access the genome sequences.

Value

For available.genomes and installed.genomes: by default (i.e. if splitNameParts=FALSE), a character vector containing the names of the BSgenome data packages that are available (for available.genomes) or currently installed (for installed.genomes). If splitNameParts=TRUE, the list of packages is returned in a data frame with one row per package and the following columns: pkgname (character), organism (factor), provider (factor), genome (character), and masked (logical).

For getBSgenome: the BSgenome object containing the sequences for the specified genome. Or an error if the object cannot be found in the BSgenome data packages currently installed.

Author(s)

H. Pagès

See Also

  • BSgenome objects.

  • available.packages.

Examples

## ---------------------------------------------------------------------
## available.genomes() and installed.genomes()
## ---------------------------------------------------------------------

# What genomes are currently installed:
installed.genomes()

# What genomes are available:
available.genomes()

# Split the package names in parts:
av_gen <- available.genomes(splitNameParts=TRUE)
table(av_gen$organism)
table(av_gen$provider)

# Make your choice and install with:
if (interactive()) {
    if (!require("BiocManager"))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Scerevisiae.UCSC.sacCer1")
}

# Have a coffee 8-)

# Load the package and display the index of sequences for this genome:
library(BSgenome.Scerevisiae.UCSC.sacCer1)
Scerevisiae  # same as BSgenome.Scerevisiae.UCSC.sacCer1

## ---------------------------------------------------------------------
## getBSgenome()
## ---------------------------------------------------------------------

## Specify the full name of an installed BSgenome data package:
genome <- getBSgenome("BSgenome.Celegans.UCSC.ce2")
genome

## Specify a UCSC genome:
genome <- getBSgenome("hg38")
class(genome)  # BSgenome object
seqinfo(genome)
genome$chrM

genome <- getBSgenome("hg38", masked=TRUE)
class(genome)  # MaskedBSgenome object
seqinfo(genome)
genome$chr22

Bioconductor/BSgenome documentation built on Oct. 31, 2024, 10:46 p.m.