getMatrixSet-methods: Advanced JASPAR database search functions 'get_MatrixSet'

Description

This function fetches matrix data for all matrices in the database that match the criteria defined by the named arguments, and returns a PFMatrixList object.

Usage

## S4 method for signature 'character'
getMatrixSet(x, opts)
## S4 method for signature 'SQLiteConnection'
getMatrixSet(x, opts)
## S4 method for signature 'JASPAR2014'
getMatrixSet(x, opts)

Arguments

x

a character vector of length 1 giving the path to a JASPAR SQLite file, a SQLiteConnection object, or a JASPAR2014 object.

opts

a list of search options. See Details.
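
The example at the end of this page passes a file path and a JASPAR2014 object as x. A minimal sketch of the remaining form, a SQLiteConnection, might look as follows. It assumes the DBI and RSQLite packages are installed; the variable names con and viaConnection are chosen here for illustration only.

```r
library(TFBSTools)
library(JASPAR2014)

# Path to the SQLite file shipped with the JASPAR2014 data package
db <- file.path(system.file("extdata", package = "JASPAR2014"),
                "JASPAR2014.sqlite")

# Open the connection ourselves (assumes DBI and RSQLite are installed)
con <- DBI::dbConnect(RSQLite::SQLite(), db)

# Same search options as in the Examples section
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"

# Dispatches to the 'SQLiteConnection' method
viaConnection <- getMatrixSet(con, opts)

# Close the connection once the query is done
DBI::dbDisconnect(con)
```

Managing the connection yourself can be convenient when several queries are run against the same file.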

Details

The search options include three categories:

(1) Basic database criteria:

all=c(TRUE, FALSE)

ID: a unique identifier for each model. CORE matrices always have IDs of the form "MAnnnn.Version".

name: The name of the transcription factor. As far as possible, the name is based on the standardized Entrez gene symbols. When the model describes a transcription factor heterodimer, the two names are concatenated, such as RXR-VDR. In a few cases, different splice forms of the same gene have different binding specificities; the splice form information is then added to the name, based on the relevant literature.

collection=c("CORE", "CNE", "PHYLOFACTS", "SPLICE", "POLII", "FAM", "PBM", "PBM_HOMEO", "PBM_HLH", "UNVALIDATED")

all_versions=c(FALSE,TRUE): We constantly update the profiles in JASPAR. Some profiles may have multiple versions. By default, only the latest version will be returned.

species: The species source for the sequences, in Latin (Homo sapiens) or NCBI tax IDs (9606).

matrixtype=c("PFM", "PWM", "ICM")

(2) Tag-based criteria:

class: Structural class of the transcription factor, based on the TFCaT system. Examples: "Zipper-Type", "Helix-Turn-Helix", etc.

type: Methodology used for matrix construction: "SELEX", "ChIP-seq", "PBM", etc.

tax_group: Group of species, currently consisting of "plants", "vertebrates", "insects", "urochordat", "nematodes", "fungi".

family: Structural sub-class of the transcription factor, based on the TFCaT system.

Acc: A representative protein accession number in GenBank for the transcription factor. Human takes precedence if several exist.

medline: The relevant publication reporting the sites used in model building.

Pazar_tf_id: PAZAR database id.

(3) Further criteria:

min_ic (minimum total information content of the matrix)

length (minimum sites length)

sites (minimum average sites number per base)

When all is TRUE, every matrix is returned; this option has priority over all others. ID has the second highest priority: when it is given, all the following options are ignored. The remaining options are combined with AND, while multiple elements within one option are combined with OR.
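
As a hedged illustration of how these rules combine, the sketch below asks for CORE vertebrate matrices built by either of two methods and having a minimum total information content. The option values and the threshold of 10 are arbitrary choices made for this example, not recommendations.

```r
library(TFBSTools)
library(JASPAR2014)

opts <- list()
opts[["collection"]] <- "CORE"
opts[["tax_group"]] <- "vertebrates"
# Two elements under one option: a matrix matches if its type is
# "SELEX" OR "ChIP-seq"
opts[["type"]] <- c("SELEX", "ChIP-seq")
# Arbitrary threshold on total information content (illustration only)
opts[["min_ic"]] <- 10

# Different options are combined with AND: collection, tax_group,
# type and min_ic must all be satisfied
pfms <- getMatrixSet(JASPAR2014, opts)
length(pfms)
```

Because neither all nor ID is set here, every option in the list takes part in the search.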

Value

A PFMatrixList object.
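
A short sketch of how the returned object might be inspected, assuming the TFBSTools accessors ID, name and Matrix for single matrices; the variable names pfmList and first are illustrative.

```r
library(TFBSTools)
library(JASPAR2014)

opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
pfmList <- getMatrixSet(JASPAR2014, opts)

length(pfmList)        # number of matrices that matched
head(names(pfmList))   # names of the first few list entries

first <- pfmList[[1]]  # a single PFMatrix
ID(first)              # its JASPAR identifier
name(first)            # transcription factor name
Matrix(first)          # the count matrix, one row per base
```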

Author(s)

Ge Tan

See Also

getMatrixByID, getMatrixByName

Examples

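    library(TFBSTools)
    library(JASPAR2014)
    # Path to the SQLite file shipped with the JASPAR2014 package
    db <- file.path(system.file("extdata", package="JASPAR2014"),
                    "JASPAR2014.sqlite")
    # Search options: human (NCBI tax ID 9606) SELEX matrices, latest versions only
    opts <- list()
    opts[["species"]] <- 9606
    opts[["type"]] <- "SELEX"
    opts[["all_versions"]] <- FALSE
    # Query using the path to the SQLite file
    siteList <- getMatrixSet(db, opts)
    # Same query using the JASPAR2014 object
    siteList2 <- getMatrixSet(JASPAR2014, opts)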
  

TFBSTools documentation built on Nov. 8, 2020, 8:14 p.m.