MotifDb | R Documentation |
Approximately 2000 position frequency matrices collected from public sources, with ample accompanying metadata, and search and export capabilities provided.
MotifDb is an R object of class MotifList, whose entries are numeric
matrices, accompanied by a 'parallel' metadata structure, a DataFrame, in
which each row provides information about the corresponding matrix.
This object is automatically created and fully populated with data from the public sources listed below when the package is loaded into your R environment via the library call.
The matrices are obtained from the following public sources:
FlyFactorSurvey: | 614 |
hPDI: | 437 |
JASPAR_CORE: | 459 |
jolma2013: | 843 |
ScerTF: | 196 |
stamlab: | 683 |
UniPROBE: | 380 |
cisbp 1.02: | 874 |
These represent 49 organisms in total, primarily:
Hsapiens: | 2328 |
Dmelanogaster: | 1008 |
Scerevisiae: | 701 |
Mmusculus: | 660 |
Athaliana: | 160 |
Celegans: | 44 |
other: | 177 |
All the matrices are stored as position frequency matrices (PFM), in which each column (each position) sums to 1.0. When the number of sequences that contributed to the motif is known, that number will be found in the matrix's metadata. With this information, one can transform the matrices into either PCM (position count matrices) or PWM (position weight matrices), also known as PSSM (position-specific scoring matrices). The latter transformation requires that a model of the background distribution be known, or assumed.
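As a minimal sketch of these transformations in base R, the following uses the first four positions of the Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750 matrix and its sequenceCount of 20, both shown elsewhere on this page. The uniform background (0.25 per base) and the pseudocount of 0.01 are assumptions chosen for illustration, not values prescribed by MotifDb, and the variable names are hypothetical.

```r
# First four positions of the example matrix; each column sums to 1.0
pfm <- matrix(c(0.00, 0.30, 0.40, 0.30,
                0.50, 0.15, 0.05, 0.30,
                0.20, 0.25, 0.50, 0.05,
                0.35, 0.00, 0.65, 0.00),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), as.character(1:4)))

# Number of contributing sequences, taken from the metadata (sequenceCount)
sequence.count <- 20

# PCM: scale the frequencies back to counts
pcm <- round(pfm * sequence.count)

# PWM: log2 odds of each base against the background model
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)  # assumed: uniform
pseudocount <- 0.01                                      # assumed: avoids log2(0)
smoothed <- (pfm + pseudocount) / (1 + 4 * pseudocount)  # columns still sum to 1
pwm <- log2(smoothed / background)  # background recycles down rows A, C, G, T
```

Positive PWM entries mark bases that are enriched relative to the background at that position, and negative entries mark depleted ones. A different background model or pseudocount changes the scores but not the procedure.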
The names of the matrices are the same as rownames of the metadata DataFrame, and have been chosen to balance the needs of concision and full description, including the organism in which the motif was discovered, the data source, and the name of the motif in the data source from which it was obtained. For example: "Hsapiens-JASPAR_CORE-SP1-MA0079.2" and "Scerevisiae-ScerTF-GSM1-badis".
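The naming scheme can be taken apart with a regular expression when only the names are at hand. The sketch below is base R and assumes that neither the organism nor the data source field contains a hyphen, so that everything after the second hyphen is the motif's name in its source. The column name `motif` is a hypothetical label, not a MotifDb metadata field.

```r
motif.names <- c("Hsapiens-JASPAR_CORE-SP1-MA0079.2",
                 "Scerevisiae-ScerTF-GSM1-badis")

# Capture organism, data source, and the remainder (which may itself contain hyphens)
parts <- regmatches(motif.names,
                    regexec("^([^-]+)-([^-]+)-(.+)$", motif.names))

parsed <- data.frame(
  organism   = sapply(parts, `[`, 2),
  dataSource = sapply(parts, `[`, 3),
  motif      = sapply(parts, `[`, 4),
  row.names  = motif.names,
  stringsAsFactors = FALSE)
```

In practice the organism and dataSource metadata columns hold these values directly, so parsing names is mainly useful for files or lists that have been exported without their metadata.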
Subsets of the matrices may be obtained in several ways:
By integer index, e.g., MotifDb [[1]]
By query, e.g., as.list (query (MotifDb, 'FBgn0000014'))
By subset (interactively only), e.g., as.list (subset (MotifDb, geneSymbol=='Abda' & !is.na (pubmedID)))
The matrices are stored in a SimpleList, which has semantics very similar to the familiar base R list. To examine a matrix, however, you must sidestep the MotifDb show method. These three commands display quite different results:
> MotifDb [1]
MotifDb object of length 1
| Created from downloaded public sources: 2012-Jul6
| 1 position frequency matrices from 1 source:
| FlyFactorSurvey: 1
| 1 organism/s
| Dmelanogaster: 1
Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750

> MotifDb [[1]]
    1    2    3    4 5 6 7 8 9   10   11   12   13   14   15   16   17   18   19   20   21
A 0.0 0.50 0.20 0.35 0 0 1 0 0 0.55 0.35 0.05 0.20 0.45 0.20 0.10 0.40 0.40 0.25 0.50 0.30
C 0.3 0.15 0.25 0.00 1 1 0 0 0 0.10 0.65 0.70 0.45 0.25 0.10 0.25 0.25 0.10 0.10 0.25 0.25
G 0.4 0.05 0.50 0.65 0 0 0 1 1 0.00 0.00 0.05 0.05 0.15 0.05 0.20 0.05 0.15 0.55 0.15 0.45
T 0.3 0.30 0.05 0.00 0 0 0 0 0 0.35 0.00 0.20 0.30 0.15 0.65 0.45 0.30 0.35 0.10 0.10 0.00

> as.list (MotifDb [1])
$`Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750`
    1    2    3    4 5 6 7 8 9   10   11   12   13   14   15   16   17   18   19   20   21
A 0.0 0.50 0.20 0.35 0 0 1 0 0 0.55 0.35 0.05 0.20 0.45 0.20 0.10 0.40 0.40 0.25 0.50 0.30
C 0.3 0.15 0.25 0.00 1 1 0 0 0 0.10 0.65 0.70 0.45 0.25 0.10 0.25 0.25 0.10 0.10 0.25 0.25
G 0.4 0.05 0.50 0.65 0 0 0 1 1 0.00 0.00 0.05 0.05 0.15 0.05 0.20 0.05 0.15 0.55 0.15 0.45
T 0.3 0.30 0.05 0.00 0 0 0 0 0 0.35 0.00 0.20 0.30 0.15 0.65 0.45 0.30 0.35 0.10 0.10 0.00
There are fifteen kinds of metadata, though not every matrix has a full complement, because not all of the public sources are complete in this regard. The information falls into these categories, using the Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750 entry as an example (see above for the associated position frequency matrix):
providerName: "ab_SANGER_10_FBgn0259750"
providerId: "FBgn0259750"
dataSource: "FlyFactorSurvey"
geneSymbol: "Ab"
geneId: "FBgn0259750"
geneIdType: "FLYBASE"
proteinId: "E1JHF4"
proteinIdType: "UNIPROT"
organism: "Dmelanogaster"
sequenceCount: 20
bindingSequence: NA
bindingDomain: NA
tfFamily: NA
experimentType: "bacterial 1-hybrid, SANGER sequencing"
pubmedID: NA
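Because some fields are NA for some matrices, it can help to check which ones are missing before filtering on them. The sketch below rebuilds the example entry as a one-row base R data.frame, which stands in here for the package's DataFrame (an assumption made so the code runs without MotifDb installed), and lists the fields with no value. The variable names are hypothetical.

```r
# One metadata row, keyed by the matrix name, copied from the example entry
meta <- data.frame(
  providerName    = "ab_SANGER_10_FBgn0259750",
  providerId      = "FBgn0259750",
  dataSource      = "FlyFactorSurvey",
  geneSymbol      = "Ab",
  geneId          = "FBgn0259750",
  geneIdType      = "FLYBASE",
  proteinId       = "E1JHF4",
  proteinIdType   = "UNIPROT",
  organism        = "Dmelanogaster",
  sequenceCount   = 20,
  bindingSequence = NA_character_,
  bindingDomain   = NA_character_,
  tfFamily        = NA_character_,
  experimentType  = "bacterial 1-hybrid, SANGER sequencing",
  pubmedID        = NA_character_,
  row.names = "Dmelanogaster-FlyFactorSurvey-ab_SANGER_10_FBgn0259750",
  stringsAsFactors = FALSE)

# Names of the metadata fields that hold no value for this entry
missing.fields <- colnames(meta)[colSums(is.na(meta)) > 0]
```

With the package loaded, the corresponding table is the one returned by values (MotifDb), as used in the examples on this page. Conditions on fields that may be NA are best guarded with !is.na (...), as in the subset example.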
Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. Circuitry and dynamics of human transcription factor regulatory networks. Cell. 2012 Sep 14;150(6):1274-86.
Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010 Jan;38(Database issue):D105-10. Epub 2009 Nov 11.
Robasky K, Bulyk ML. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2011 Jan;39(Database issue):D124-8. Epub 2010 Oct 30.
Spivak AT, Stormo GD. ScerTF: a comprehensive database of benchmarked position weight matrices for Saccharomyces species. Nucleic Acids Res. 2012 Jan;40(Database issue):D162-8. Epub 2011 Dec 2.
Xie Z, Hu S, Blackshaw S, Zhu H, Qian J. hPDI: a database of experimental human protein-DNA interactions. Bioinformatics. 2010 Jan 15;26(2):287-9. Epub 2009 Nov 9.
Zhu LJ, et al. 2011. FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res. 2011 Jan;39(Database issue):D111-7. Epub 2010 Nov 19.
Jolma A, et al. 2013. DNA-binding specificities of human transcription factors. Cell 2013 Jan 17.
query, subset, export, flyFactorSurvey, hPDI, jaspar, ScerTF, uniprobe
library (MotifDb)

# are there any matrices for Sox4? we find two
mdb.sox4 <- MotifDb [grep ('sox4', values (MotifDb)$geneSymbol, ignore.case=TRUE)]

# the same two matrices can be obtained this way also
if (interactive ())
  mdb.sox4 <- subset (MotifDb, tolower(geneSymbol)=='sox4')

# and like this
mdb.sox4 <- query (MotifDb, 'sox4')   # matches against all fields in the metadata

# implicitly invoke the 'show' method
mdb.sox4

# get their full names
names (mdb.sox4)

# examine their metadata
values (mdb.sox4)

# examine the matrices with names included
as.list (mdb.sox4)

# export the matrices in meme format
destination.file <- tempfile ()
export (mdb.sox4, destination.file, 'meme')