knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressPackageStartupMessages({
  library(JASPAR2022)
  library(TFBSTools)
})

Introduction

JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.

The easiest way to use the JASPAR2022 data package [@10.1093/nar/gkab1113] is through the TFBSTools package interface [@Tan:2016], which provides functions to retrieve and manipulate data from the JASPAR database. This vignette demonstrates how to use those functions.

library(JASPAR2022)
library(TFBSTools)

Retrieving matrices from JASPAR2022 by ID or name

Matrices can be retrieved from JASPAR with the getMatrixByID or getMatrixByName function, given a JASPAR matrix ID or matrix name, respectively. Both functions accept either a single value or a vector of values as the ID/name argument. A single value returns a PFMatrix object, while a vector returns a PFMatrixList containing multiple PFMatrix objects.

## the user assigns a single matrix ID to the argument ID 
pfm <- getMatrixByID(JASPAR2022, ID="MA0139.1")
## the function returns a PFMatrix object
pfm
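
As a brief illustration of what the returned object holds, the sketch below reads a few fields from the PFMatrix using accessor functions from TFBSTools (ID, name, matrixClass, Matrix and tags). The variable name counts is introduced here for illustration only.

```r
library(JASPAR2022)
library(TFBSTools)

pfm <- getMatrixByID(JASPAR2022, ID = "MA0139.1")

ID(pfm)                # JASPAR matrix ID, including the version suffix
name(pfm)              # name of the transcription factor
matrixClass(pfm)       # structural class of the transcription factor
counts <- Matrix(pfm)  # the underlying count matrix
dim(counts)            # rows are the bases A, C, G, T; columns are motif positions
tags(pfm)              # list of additional annotation stored with the profile
```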

The PFMatrix object can be used for further analysis and visualisation. The following example plots the sequence logo of a matrix using functions from the TFBSTools package.

seqLogo(toICM(pfm))
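
To make the logo easier to interpret, the base R sketch below computes per-position information content from a small count matrix. It is an illustration only: the counts are hypothetical, a uniform background is assumed, and no pseudocounts or small-sample correction are applied, so the values will generally differ from those produced by toICM.

```r
# Hypothetical 4 x 3 count matrix (rows: A, C, G, T; columns: motif positions)
counts <- matrix(
  c(10,  0, 5,
     0, 10, 5,
     0,  0, 5,
     0,  0, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Convert counts to per-position base probabilities
probs <- sweep(counts, 2, colSums(counts), "/")

# Shannon entropy per position, treating 0 * log2(0) as 0
entropy <- -colSums(ifelse(probs > 0, probs * log2(probs), 0))

# Information content per position in bits, against a uniform background
ic <- log2(4) - entropy

# Letter heights in a logo: each base probability scaled by the column's
# information content
icm <- sweep(probs, 2, ic, "*")
ic
```

A position occupied by a single base carries the maximum of 2 bits, while a position where all four bases are equally frequent carries 0 bits and is drawn with no height in the logo.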
## the user assigns multiple matrix IDs to the argument ID 
pfmList <- getMatrixByID(JASPAR2022, ID=c("MA0139.1", "MA1102.1"))
## the function returns a PFMatrixList object
pfmList

## PFMatrixList can be subsetted to extract enclosed PFMatrix objects
pfmList[[2]]
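
Building on the positional subsetting shown above, the following sketch collects the ID and name of every matrix in the list into a data frame. It indexes by position only; the helper variables idx, ids and tfNames are introduced here for illustration.

```r
library(JASPAR2022)
library(TFBSTools)

pfmList <- getMatrixByID(JASPAR2022, ID = c("MA0139.1", "MA1102.1"))

# Number of PFMatrix objects in the list
length(pfmList)

# Collect the ID and TF name of each enclosed PFMatrix by position
idx <- seq_len(length(pfmList))
ids <- vapply(idx, function(i) ID(pfmList[[i]]), character(1))
tfNames <- vapply(idx, function(i) name(pfmList[[i]]), character(1))
data.frame(ID = ids, name = tfNames)
```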

getMatrixByName retrieves matrices by name. If multiple matrix names are supplied, the function returns a PFMatrixList object.

pfm <- getMatrixByName(JASPAR2022, name="Arnt")
pfm

pfmList <- getMatrixByName(JASPAR2022, name=c("Arnt", "Ahr::Arnt"))
pfmList

The use of filtering criteria

The getMatrixSet function fetches all matrices that match criteria defined by the named arguments, and it returns a PFMatrixList object.

## select all versions of the human (NCBI taxonomy ID 9606) matrices derived from SELEX experiments
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
opts[["all_versions"]] <- TRUE
PFMatrixList <- getMatrixSet(JASPAR2022, opts)
PFMatrixList

## retrieve all matrices derived from SELEX experiments
opts2 <- list()
opts2[["type"]] <- "SELEX"
PFMatrixList2 <- getMatrixSet(JASPAR2022, opts2)
PFMatrixList2
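
Other criteria can be combined in the same way. The sketch below restricts the query to the CORE collection and the vertebrate taxonomic group. The option names collection and tax_group follow the getMatrixSet documentation in TFBSTools as far as we are aware; check ?getMatrixSet for the options supported by your installed version. The names opts3 and coreVertebrates are introduced here for illustration.

```r
library(JASPAR2022)
library(TFBSTools)

# Criteria can also be given in a single list() call
opts3 <- list(
  collection = "CORE",        # assumed option name: JASPAR collection
  tax_group = "vertebrates",  # assumed option name: taxonomic group
  all_versions = FALSE        # keep only the latest version of each profile
)
coreVertebrates <- getMatrixSet(JASPAR2022, opts3)
length(coreVertebrates)
```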

Additional details about TFBS matrix analysis can be found in the TFBSTools documentation.

Session Info

Here is the output of sessionInfo() on the system on which this document was compiled:

sessionInfo()

Bibliography


