src_organism    R Documentation

Create a sqlite database from TxDb and corresponding Org packages

Description

The database provides a convenient way to map between gene, transcript, and protein identifiers.

'select_.tbl_organism()' is DEPRECATED, please use 'select()'.

Usage

src_organism(txdb = NULL, dbpath = NULL, overwrite = FALSE)

src_ucsc(organism, genome = NULL, id = NULL, dbpath = NULL, verbose = TRUE)

supportedOrganisms()

## S3 method for class 'tbl_organism'
select_(.data, ...)

## S3 method for class 'src_organism'
src_tbls(x, ...)

## S3 method for class 'src_organism'
tbl(src, from, ...)

## S4 method for signature 'src_organism'
orgPackageName(x)

## S4 method for signature 'src_organism'
seqinfo(x)

Arguments

txdb

character(1) naming a TxDb.* package (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene) or a TxDb object instantiating the content of a TxDb.* package.

dbpath

character(1) path or BiocFileCache instance representing the location where an Organism.dplyr SQLite database will be accessed or created. If no path is specified, the SQLite file is created in the default BiocFileCache() location.

overwrite

logical(1) Overwrite an existing 'dbpath' that contains an Organism.dplyr SQLite database different from the version implied by 'txdb'?

organism

organism or common name

genome

genome name

id

choose from "knownGene", "ensGene" and "refGene"

verbose

logical. Should R report extra information on progress? Default is TRUE.

.data

A tbl.

...

Comma-separated list of unquoted expressions. You can treat variable names as if they were positions. Use positive values to select variables; use negative values to drop variables.

x

A src_organism object

src

A src_organism object

from

character(1) name of temporary table in 'src'.

Details

src_organism() and src_ucsc() are the constructors for src_organism objects, which provide an integrated presentation of identifiers and genomic coordinates.

src_organism() creates a dplyr database integrating org.* and TxDb.* information from a given TxDb. src_ucsc() creates the same kind of database from a given organism name, genome, and/or id.

supportedOrganisms() lists all organisms supported by this package, with the corresponding OrgDb and TxDb for each.

The 'tbl.src_organism()' parameter '.load_tbl_only' has been removed. The function behaves as '.load_tbl_only = FALSE' (the previous default); for '.load_tbl_only = TRUE', use 'tbl(src$con, ...)'.
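The replacement for '.load_tbl_only = TRUE' described above can be sketched as follows. This is a minimal illustration, assuming Organism.dplyr is installed and reusing the small hg38light() database from the Examples section; the variable names via_src and via_con are illustrative only and not part of the package.

```r
library(Organism.dplyr)

## open the small example database shipped with the package
src <- src_organism(dbpath = hg38light())

## default access through the tbl() method for src_organism
via_src <- tbl(src, "id")

## direct access through the underlying connection, the documented
## substitute for the removed '.load_tbl_only = TRUE'
via_con <- tbl(src$con, "id")

## inspect the classes to see how the two results differ
class(via_src)
class(via_con)
```

Which form to prefer depends on whether the src_organism-specific behaviour is wanted; the first form is the one used throughout the Examples.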

Value

src_organism() and src_ucsc() return a dplyr src_dbi instance representing the data tables.

tbl() returns a tibble of the requested table from the temporary database of the src_organism object.
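Because the tables live in a SQLite database, a query result can be brought into memory with dplyr's collect(). The sketch below assumes Organism.dplyr is installed and uses the hg38light() database, with the "id" table and the entrez and symbol columns taken from the Examples section; id_tbl and ada are illustrative names.

```r
library(Organism.dplyr)

src <- src_organism(dbpath = hg38light())

## table backed by the database of the src_organism object
id_tbl <- tbl(src, "id")

## build the query with dplyr verbs, then collect() the result
## into an ordinary in-memory tibble
ada <- id_tbl %>%
    filter(symbol == "ADA") %>%
    dplyr::select(entrez, symbol) %>%
    collect()

ada
```

Filtering and selecting before collect() keeps the work in the database, so only the requested rows and columns are transferred into R.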

Author(s)

Yubo Cheng.

See Also

dplyr for details about using dplyr to manipulate data.

transcripts_tbl for generic functions to extract genomic features from a src_organism object.

select,src_organism-method for "select" interface on src_organism objects.

Examples

library(Organism.dplyr)

## create human sqlite database with TxDb.Hsapiens.UCSC.hg38.knownGene and
## corresponding org.Hs.eg.db
## Not run: src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")
src <- src_organism(dbpath=hg38light())

## query using dplyr
inner_join(tbl(src, "id"), tbl(src, "id_go")) %>%
     filter(symbol == "ADA") %>%
     dplyr::select(entrez, ensembl, symbol, go, evidence, ontology)

## create human sqlite database using hg38 genome
## Not run: human <- src_ucsc("human")

## all supported organisms with corresponding OrgDb and TxDb
supportedOrganisms()

## Look at all available tables
src_tbls(src)

## Look at data in table "id"
tbl(src, "id")

## Look at fields of one table
colnames(tbl(src, "id"))

## name of org package of src_organism object
orgPackageName(src)

## seqinfo of src_organism object
seqinfo(src)


Bioconductor/Organism.dplyr documentation built on Nov. 2, 2023, 12:57 a.m.