knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressPackageStartupMessages({
  library(JASPAR2020)
  library(TFBSTools)
})

Introduction

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 157 PFMs were updated (125 for vertebrates, 28 for plants, and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles.

The easiest way to use the JASPAR2020 data package [@Fornes:2019] is by TFBSTools package interface [@Tan:2016], which provides functions to retrieve and manipulate data from the JASPAR database. This vignette demonstrates how to use those functions.

library(JASPAR2020)
library(TFBSTools)

Retrieving matrices from JASPAR2020 by ID or name

Matrices from JASPAR can be retrieved using either getMatrixByID or getMatrixByName function by providing a matrix ID or a matrix name from JASPAR, respectively. These functions accept either a single element as the ID/name parameter or a vector of values. The former case returns a PFMatrix object, while the later one returns a PFMatrixList with multiple PFMatrix objects.

## the user assigns a single matrix ID to the argument ID 
pfm <- getMatrixByID(JASPAR2020, ID="MA0139.1")
## the function returns a PFMatrix object
pfm

The user can utilise the PFMatrix object for further analysis and visualisation. Here is an example of how to plot a sequence logo of a given matrix using functions available in TFBSTools package.

seqLogo(toICM(pfm))
## the user assigns multiple matrix IDs to the argument ID 
pfmList <- getMatrixByID(JASPAR2020, ID=c("MA0139.1", "MA1102.1"))
## the function returns a PFMatrix object
pfmList

## PFMatrixList can be subsetted to extract enclosed PFMatrix objects
pfmList[[2]]

getMatrixByName retrieves matrices by name. If multiple matrix names are supplied, the function returns a PFMatrixList object.

pfm <- getMatrixByName(JASPAR2020, name="Arnt")
pfm

pfmList <- getMatrixByName(JASPAR2020, name=c("Arnt", "Ahr::Arnt"))
pfmList

The use of filtering criteria

The getMatrixSet function fetches all matrices that match criteria defined by the named arguments, and it returns a PFMatrixList object.

## select all matrices found in a specific species and constructed from the SELEX experiment
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
opts[["all_versions"]] <- TRUE
PFMatrixList <- getMatrixSet(JASPAR2020, opts)
PFMatrixList

## retrieve all matrices constructed from SELEX experiment
opts2 <- list()
opts2[["type"]] <- "SELEX"
PFMatrixList2 <- getMatrixSet(JASPAR2020, opts2)
PFMatrixList2

Additional details about TFBS matrix analysis can be found in the TFBSTools documantation.

Session Info

Here is the output of sessionInfo() on the system on which this document was compiled:

sessionInfo()

Bibliography



da-bar/JASPAR2020 documentation built on May 3, 2020, 4:58 p.m.