knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
suppressPackageStartupMessages({
  library(JASPAR2020)
  library(TFBSTools)
})
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants, and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles.
The easiest way to use the JASPAR2020 data package [@Fornes:2019] is through the TFBSTools package interface [@Tan:2016], which provides functions to retrieve and manipulate data from the JASPAR database. This vignette demonstrates how to use those functions.
library(JASPAR2020)
library(TFBSTools)
Matrices from JASPAR can be retrieved using either the getMatrixByID or the getMatrixByName function by providing a matrix ID or a matrix name from JASPAR, respectively. These functions accept either a single element as the ID/name parameter or a vector of values. The former case returns a PFMatrix object, while the latter returns a PFMatrixList containing multiple PFMatrix objects.
## the user assigns a single matrix ID to the argument ID
pfm <- getMatrixByID(JASPAR2020, ID = "MA0139.1")
## the function returns a PFMatrix object
pfm
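Once a PFMatrix has been retrieved, its individual fields can be inspected with the accessor functions that TFBSTools exports. The following is a minimal sketch, assuming both packages are installed and that the accessors ID, name, matrixClass and Matrix behave as described in the TFBSTools documentation:

```r
library(JASPAR2020)
library(TFBSTools)

## retrieve one profile by its matrix ID
pfm <- getMatrixByID(JASPAR2020, ID = "MA0139.1")

## accessor functions exported by TFBSTools
ID(pfm)           # the JASPAR matrix ID
name(pfm)         # the TF name attached to the profile
matrixClass(pfm)  # the structural class of the TF
Matrix(pfm)       # the underlying count matrix, one row per nucleotide
```

The count matrix returned by Matrix is an ordinary R matrix, so it can be passed to base R functions such as dim or colSums.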
The user can utilise the PFMatrix object for further analysis and visualisation. Here is an example of how to plot a sequence logo of a given matrix using functions available in the TFBSTools package.
seqLogo(toICM(pfm))
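To give an intuition for what an information content matrix represents, the sketch below computes one by hand in base R. It is a simplified illustration, not the TFBSTools implementation: the toy matrix counts is invented for this example, a uniform background is assumed, and no pseudocounts or small-sample correction are applied, so the numbers will generally differ from what toICM returns.

```r
## hypothetical toy count matrix: 4 nucleotides x 3 positions
counts <- matrix(
  c(10,  0, 5,
     0, 10, 5,
     0,  0, 5,
     0,  0, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

## convert counts to per-position frequencies (each column sums to 1)
freqs <- sweep(counts, 2, colSums(counts), "/")

## information content per position in bits, assuming a uniform background:
## IC = 2 + sum(p * log2(p)), treating 0 * log2(0) as 0
plogp <- ifelse(freqs > 0, freqs * log2(freqs), 0)
ic <- 2 + colSums(plogp)

## scale each column of frequencies by its information content;
## these values correspond to letter heights in a sequence logo
icm <- sweep(freqs, 2, ic, "*")
icm
```

In this toy matrix the first two positions are fully conserved and carry 2 bits each, while the third position is uniform and carries 0 bits, which is why it would appear empty in a logo.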
## the user assigns multiple matrix IDs to the argument ID
pfmList <- getMatrixByID(JASPAR2020, ID = c("MA0139.1", "MA1102.1"))
## the function returns a PFMatrixList object
pfmList
## PFMatrixList can be subsetted to extract enclosed PFMatrix objects
pfmList[[2]]
getMatrixByName retrieves matrices by name. If multiple matrix names are supplied, the function returns a PFMatrixList object.
## a single name returns a PFMatrix object
pfm <- getMatrixByName(JASPAR2020, name = "Arnt")
pfm
## several names return a PFMatrixList object
pfmList <- getMatrixByName(JASPAR2020, name = c("Arnt", "Ahr::Arnt"))
pfmList
The getMatrixSet function fetches all matrices that match criteria defined by the named arguments, and it returns a PFMatrixList object.
## select all matrices found in a specific species and constructed from the SELEX experiment
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
opts[["all_versions"]] <- TRUE
PFMatrixList <- getMatrixSet(JASPAR2020, opts)
PFMatrixList

## retrieve all matrices constructed from SELEX experiment
opts2 <- list()
opts2[["type"]] <- "SELEX"
PFMatrixList2 <- getMatrixSet(JASPAR2020, opts2)
PFMatrixList2
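Other criteria can be combined in the same way. The sketch below restricts the query to one collection and one taxonomic group. It assumes that the collection and tax_group options are accepted by getMatrixSet as listed in the TFBSTools documentation; the variable names opts3 and pfmVertebrates are illustrative only.

```r
library(JASPAR2020)
library(TFBSTools)

## select the latest version of every vertebrate profile in the CORE collection
opts3 <- list()
opts3[["collection"]] <- "CORE"
opts3[["tax_group"]] <- "vertebrates"
pfmVertebrates <- getMatrixSet(JASPAR2020, opts3)

## number of profiles that matched the criteria
length(pfmVertebrates)
```

Because all_versions is not set here, only the most recent version of each profile is expected in the result.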
Additional details about TFBS matrix analysis can be found in the TFBSTools documentation.
Here is the output of sessionInfo() on the system on which this document was compiled:
sessionInfo()