View source: R/annotation_biomart.R
load_biomart_annotations | R Documentation |
Biomart is an amazing resource of information, but using it is a bit annoying. This function hopes to alleviate some common headaches.
load_biomart_annotations(
species = "hsapiens",
overwrite = FALSE,
do_save = TRUE,
host = NULL,
trymart = "ENSEMBL_MART_ENSEMBL",
archive = TRUE,
default_hosts = c("useast.ensembl.org", "uswest.ensembl.org", "www.ensembl.org",
"asia.ensembl.org"),
year = NULL,
month = NULL,
drop_haplotypes = FALSE,
trydataset = NULL,
gene_requests = c("ensembl_gene_id", "version", "ensembl_transcript_id",
"transcript_version", "description", "gene_biotype"),
length_requests = c("ensembl_transcript_id", "cds_length", "chromosome_name", "strand",
"start_position", "end_position"),
gene_tx_map = TRUE,
gene_id_column = "ensembl_gene_id",
gene_version_column = "version",
tx_id_column = "ensembl_transcript_id",
tx_version_column = "transcript_version",
symbol_columns = NULL,
include_lengths = TRUE,
do_load = TRUE,
savefile = NULL
)
species |
Choose a species. |
overwrite |
Overwrite an existing save file? |
do_save |
Create a savefile of annotations for future runs? |
host |
Ensembl hostname to use. |
trymart |
Name of the mart to try loading first. Biomart has become a circular dependency, which makes me sad: in order to list the marts, one must already have a mart loaded. |
archive |
Try an archive server instead of a mirror? If this is a character, it will assume it is a specific archive hostname. |
default_hosts |
List of biomart mirrors to try. |
year |
Choose specific year(s) for the archive servers? |
month |
Choose specific month(s) for the archive server? |
drop_haplotypes |
Some chromosomes have odd names because they come from non-standard haplotypes; when this is TRUE, those entries are dropped. Setting this to FALSE keeps them. |
trydataset |
Choose the biomart dataset from which to query. |
gene_requests |
Set of columns to query for description-ish annotations. |
length_requests |
Set of columns to query for location-ish annotations. |
gene_tx_map |
Provide a gene2tx map for things like salmon (perhaps rename this to tx_gene_map?) |
gene_id_column |
Column containing the gene ID. |
gene_version_column |
Column containing the ensembl gene version. |
tx_id_column |
Column containing the transcript ID. |
tx_version_column |
Column containing the ensembl transcript version. |
symbol_columns |
Vector of columns containing the gene symbols. |
include_lengths |
Also perform a search on structural elements in the genome? |
do_load |
Load the data? |
savefile |
Use this savefile. |
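The column arguments above (gene_id_column, gene_version_column, tx_id_column, tx_version_column) and drop_haplotypes are easier to picture on a small table. The following is only a sketch on made-up data, not the function's actual implementation: the IDs, the helper name make_tx_gene_map, the "id.version" naming, and the rule that haplotype chromosomes start with "CHR_" are all assumptions for illustration.

```r
# Toy stand-in for an annotation table; every value here is invented.
annot <- data.frame(
  ensembl_gene_id = c("geneA", "geneA", "geneB"),
  version = c(4L, 4L, 7L),
  ensembl_transcript_id = c("txA1", "txA2", "txB1"),
  transcript_version = c(2L, 1L, 3L),
  chromosome_name = c("1", "1", "CHR_TOY_HAP1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a two-column transcript-to-gene map of the
# kind that salmon-style quantifications are usually summarized with.
# Versioned IDs are assumed to look like "id.version".
make_tx_gene_map <- function(annot,
                             gene_id_column = "ensembl_gene_id",
                             gene_version_column = "version",
                             tx_id_column = "ensembl_transcript_id",
                             tx_version_column = "transcript_version") {
  data.frame(
    transcript = paste0(annot[[tx_id_column]], ".", annot[[tx_version_column]]),
    gene = paste0(annot[[gene_id_column]], ".", annot[[gene_version_column]]),
    stringsAsFactors = FALSE
  )
}

tx_gene_map <- make_tx_gene_map(annot)

# Assumed rule for non-standard haplotypes: chromosome names starting
# with "CHR_". The real function may use a different test.
is_standard <- !grepl("^CHR_", annot[["chromosome_name"]])
annot_standard <- annot[is_standard, , drop = FALSE]
```

If the quantification tool reports transcript IDs without versions, the same helper could be adapted to skip the paste0() step; which form is needed depends on the reference used for quantification.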
Tested in test_40ann_biomart.R. This goes to some lengths to find the relevant tables in biomart. But biomart is incredibly complex, and if this function fails one should carefully inspect the output to see whether there are more appropriate marts, datasets, and columns to download.
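When the defaults do not find what is wanted, it may help to look at what biomart offers directly. The sketch below uses only biomaRt calls (listMarts, useMart, listDatasets, useDataset, listAttributes); the wrapper name inspect_biomart and its default host and dataset are assumptions for illustration, and it is not how load_biomart_annotations itself searches.

```r
# Hypothetical helper: collect the marts, datasets, and attributes that a
# given Ensembl host exposes. Calling it requires the biomaRt package and
# network access to the host.
inspect_biomart <- function(host = "https://www.ensembl.org",
                            mart_name = "ENSEMBL_MART_ENSEMBL",
                            dataset = "hsapiens_gene_ensembl") {
  # Marts available on this host.
  marts <- biomaRt::listMarts(host = host)
  # Load the mart without a dataset so its datasets can be listed.
  mart <- biomaRt::useMart(biomart = mart_name, host = host)
  datasets <- biomaRt::listDatasets(mart)
  # Attach one dataset and list the attributes (columns) it can return.
  ensembl <- biomaRt::useDataset(dataset, mart = mart)
  attributes <- biomaRt::listAttributes(ensembl)
  list(marts = marts, datasets = datasets, attributes = attributes)
}

# Not run here because it needs network access:
# found <- inspect_biomart()
# head(found$datasets)
# head(found$attributes)
```

The names found in the attributes table are the kind of values that gene_requests and length_requests expect, and the dataset names are candidates for trydataset.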
List containing: a data frame of the found annotations, a copy of the mart instance to help with finding problems, the hostname queried, the name of the mart queried, a vector of the rows queried, a vector of the available attributes, and the ensembl dataset queried.
[biomaRt::listDatasets()] [biomaRt::getBM()] [find_working_mart()]
## This downloads the hsapiens annotations by default.
hs_biomart_annot <- load_biomart_annotations()
summary(hs_biomart_annot)
dim(hs_biomart_annot$annotation)