load_biomart_annotations: Extract annotation information from biomart.

View source: R/annotation_biomart.R

load_biomart_annotations R Documentation

Extract annotation information from biomart.

Description

Biomart is an amazing resource of information, but using it is a bit annoying. This function hopes to alleviate some common headaches.

Usage

load_biomart_annotations(
  species = "hsapiens",
  overwrite = FALSE,
  do_save = TRUE,
  host = NULL,
  trymart = "ENSEMBL_MART_ENSEMBL",
  archive = TRUE,
  default_hosts = c("useast.ensembl.org", "uswest.ensembl.org", "www.ensembl.org",
    "asia.ensembl.org"),
  year = NULL,
  month = NULL,
  drop_haplotypes = FALSE,
  trydataset = NULL,
  gene_requests = c("ensembl_gene_id", "version", "ensembl_transcript_id",
    "transcript_version", "description", "gene_biotype"),
  length_requests = c("ensembl_transcript_id", "cds_length", "chromosome_name", "strand",
    "start_position", "end_position"),
  gene_tx_map = TRUE,
  gene_id_column = "ensembl_gene_id",
  gene_version_column = "version",
  tx_id_column = "ensembl_transcript_id",
  tx_version_column = "transcript_version",
  symbol_columns = NULL,
  include_lengths = TRUE,
  do_load = TRUE,
  savefile = NULL
)

Arguments

species

Choose a species.

overwrite

Overwrite an existing save file?

do_save

Create a savefile of annotations for future runs?

host

Ensembl hostname to use.

trymart

Name of the mart to try first. Biomart has become a circular dependency: to list the marts, one now needs to have a mart loaded already, so a starting mart must be named here.

archive

Try an archive server instead of a mirror? If this is a character string, it is treated as a specific archive hostname.

default_hosts

List of biomart mirrors to try.

year

Choose specific year(s) for the archive servers?

month

Choose specific month(s) for the archive server?

drop_haplotypes

Drop annotations on chromosomes whose odd names mark them as non-standard haplotypes? Setting this to FALSE keeps them.

trydataset

Choose the biomart dataset from which to query.

gene_requests

Set of columns to query for description-ish annotations.

length_requests

Set of columns to query for location-ish annotations.

gene_tx_map

Provide a gene2tx map for things like salmon (perhaps rename this to tx_gene_map?)

gene_id_column

Column containing the gene ID.

gene_version_column

Column containing the ensembl gene version.

tx_id_column

Column containing the transcript ID.

tx_version_column

Column containing the ensembl transcript version.

symbol_columns

Vector of columns containing the gene symbols.

include_lengths

Also perform a search on structural elements in the genome?

do_load

Load the data?

savefile

Use this savefile.

Details

Tested in test_40ann_biomart.R. This goes to some lengths to find the relevant tables in biomart. But biomart is incredibly complex, so if the function fails, one should carefully inspect the output to see whether there are more appropriate marts, datasets, and columns to download.
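
When the function cannot find a suitable table, the candidate marts, datasets, and columns can also be listed directly with biomaRt. The following is a minimal sketch and not part of hpgltools: inspect_mart() is a hypothetical helper name, it assumes the biomaRt package is installed and the chosen host is reachable, and the host and dataset shown are only illustrations.

```r
## Hypothetical helper (not part of hpgltools): list what a host offers.
inspect_mart <- function(host = "https://www.ensembl.org",
                         mart_name = "ENSEMBL_MART_ENSEMBL",
                         dataset = NULL) {
  ## Marts available on this host.
  marts <- biomaRt::listMarts(host = host)
  ## Connect to the named mart, then list its datasets.
  mart <- biomaRt::useMart(biomart = mart_name, host = host)
  datasets <- biomaRt::listDatasets(mart)
  attributes <- NULL
  if (!is.null(dataset)) {
    ## Attributes (columns) can only be listed once a dataset is chosen.
    mart <- biomaRt::useDataset(dataset, mart = mart)
    attributes <- biomaRt::listAttributes(mart)
  }
  list(marts = marts, datasets = datasets, attributes = attributes)
}

## Usage (requires network access):
## found <- inspect_mart(dataset = "hsapiens_gene_ensembl")
## head(found$datasets)
## grep("length", found$attributes$name, value = TRUE)
```

The names found this way are the kind of values one would pass as trymart, trydataset, gene_requests, and length_requests.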

Value

List containing: a data frame of the found annotations, a copy of the mart instance to help with finding problems, the hostname queried, the name of the mart queried, a vector of rows queried, a vector of the available attributes, and the ensembl dataset queried.

See Also

[biomaRt::listDatasets()] [biomaRt::getBM()] [find_working_mart()]

Examples

 ## This downloads the hsapiens annotations by default.
 hs_biomart_annot <- load_biomart_annotations()
 summary(hs_biomart_annot)
 dim(hs_biomart_annot$annotation)
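
The sketch below shows offline what a transcript-to-gene map of the kind mentioned under gene_tx_map might look like. It does not call load_biomart_annotations(): the data frame is invented toy data using the default column names from Usage, make_tx_gene_map() is a hypothetical helper, and joining ID and version with "." is an assumption about the desired format rather than a description of what the function itself produces.

```r
## Toy annotations with the default column names; the values are invented.
toy_annot <- data.frame(
  ensembl_gene_id = c("ENSG01", "ENSG01", "ENSG02"),
  version = c(3, 3, 1),
  ensembl_transcript_id = c("ENST01", "ENST02", "ENST03"),
  transcript_version = c(2, 5, 1),
  stringsAsFactors = FALSE
)

## Hypothetical helper: one row per transcript, versioned IDs as "ID.version".
make_tx_gene_map <- function(annot,
                             gene_id_column = "ensembl_gene_id",
                             gene_version_column = "version",
                             tx_id_column = "ensembl_transcript_id",
                             tx_version_column = "transcript_version") {
  tx_gene <- data.frame(
    transcript = paste0(annot[[tx_id_column]], ".", annot[[tx_version_column]]),
    gene = paste0(annot[[gene_id_column]], ".", annot[[gene_version_column]]),
    stringsAsFactors = FALSE
  )
  ## Keep the first row for each transcript in case the input repeats them.
  tx_gene[!duplicated(tx_gene[["transcript"]]), , drop = FALSE]
}

tx_gene_map <- make_tx_gene_map(toy_annot)
print(tx_gene_map)
```

With real data, the same helper could be pointed at the annotation data frame from the example above, provided it contains the four columns named in the arguments.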

elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.