knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Cellosaurus is a comprehensive knowledge resource dedicated to cell lines, providing a wealth of information about various types of cells used in biomedical research. It serves as a centralized repository that offers detailed data on cell lines, including their origins, characteristics, authentication methods, references, and more. Please view the Cellosaurus website at https://web.expasy.org/cellosaurus/ for more information and a detailed description can be found at https://www.cellosaurus.org/description.html.
The AnnotationGx
package provides a wrapper around the Cellosaurus API to
map cell line identifiers to the Cellosaurus database fields.
library(AnnotationGx) library(data.table) # set options to warn to quiet info logs options("log_level" = "WARN")
The main function that is provided by the package is mapCell2Accession
. This function
takes in a vector of cell line identifiers and returns a data.table
.
By default, the function will try to map using the common identifiers and synonyms (from = "idsy"
) and
will return the the Standardized Identifier as cellLineName
and the Cellosaurus Accession ID accession
.
The function also returns an additional column query
which can be used to identify the original query if needed.
Let's see how we can use this function to map the "HeLa" and "A549" cell line names to the Cellosaurus database.
mapCell2Accession("hela")
mapCell2Accession("A549")
Functionality for mapping multiple cell lines is also supported. ``` {r map BT474 each} mapCell2Accession(c("A549", "THIS SHOULD FAIL", "BT474"))
By default, the function will parse the API responses to return the most common mapping. To return all possible mappings, set `parsed = FALSE`. ```{R parsed} # parsed mapCell2Accession(c("A549", "hela", "BT474"), parsed = TRUE) # no parsing mapCell2Accession(c("A549", "hela", "BT474"), parsed = FALSE)
The backend of the function also tries to map any misspellings or synonyms of the cell line names. ```{R mispellings}
samples <- c("SK23", "SJCRH30") mapCell2Accession(samples)
If some cell lines still cannot be found, there is an additional parameter for fuzzy searching ```{R fuzzy} # No fuzzy mapCell2Accession("DOR 13") # Fuzzy mapCell2Accession("DOR 13", fuzzy =T)
Once accession IDs are obtained and the mappings are satisfactory, they can then be mapped to other fields in the Cellosaurus database.
A list of available fields can be found using cellosaurus_fields()
```{R fields} cellosaurus_fields()
The `annotateCellAccession()` function can be used to map the accession IDs to the desired fields. By default the function will try to map to `"id", "ac", "hi", "sy", "ca", "sx", "ag", "di", "derived-from-site", "misspelling", "dt"` ```{R annotate} # Annotate the A549 cell line mappedAccessions <- mapCell2Accession("A549") annotateCellAccession(accessions = mappedAccessions$accession)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.