```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
NOTE: I copied this section straight from Dave Bray's old README and haven't updated it at all.
The hTF array design was generated using the `hTF_v01_nextPBM_design.R` file included in this repository. The microarray itself is available to purchase from Agilent (Design ID: 086290). To reconstitute the files and annotations associated with the design, run `hTF_v01_nextPBM_design.R` as an Rscript as follows:
```sh
Rscript hTF_v01_nextPBM_design.R hTF_v01
```
The only argument is a prefix used to name the individual DNA probes and the design as a whole. In the example above, I've used `hTF_v01` as this prefix. The script depends on several R and Bioconductor packages, so if you would like to rerun it, please install the following first:
* TFBSTools
* JASPAR2018
* plyr
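A minimal sketch for installing the packages listed above, assuming a standard R setup with access to CRAN and Bioconductor. `missing_packages()` and `install_design_deps()` are hypothetical helper names, not functions shipped with this repository; the only real API relied on is `BiocManager::install()`, which resolves both Bioconductor and CRAN packages.

```r
# Packages needed by hTF_v01_nextPBM_design.R, as listed above.
# TFBSTools and JASPAR2018 are on Bioconductor; plyr is on CRAN.
design_deps <- c("TFBSTools", "JASPAR2018", "plyr")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
# Note that requireNamespace() loads the namespace of any installed package.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical helper: install whatever is missing. BiocManager::install()
# handles Bioconductor and CRAN packages alike, so one call covers all three.
install_design_deps <- function(pkgs = design_deps) {
  todo <- missing_packages(pkgs)
  if (length(todo) == 0) {
    message("All design dependencies are already installed.")
    return(invisible(character(0)))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(todo)
  invisible(todo)
}
```

Run `install_design_deps()` once in an R session before calling the script with `Rscript`. Defining the installer as a function, rather than installing at the top level, keeps the check and the installation separate so nothing is downloaded until you ask for it.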
The following are the steps executed by the design Rscript to build the array design, starting from the TF binding models included in the open-source JASPAR 2018 database.
The `JASPAR2018` Bioconductor R package was used to fetch the motif matrices from the database.

`GACTACTACGTGTCGACGATCGAGCACGCAGATC`
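The fetch step can be sketched with the TFBSTools `getMatrixSet()` API against the `JASPAR2018` database object. The query options below (human taxonomy ID 9606, CORE collection, latest version of each matrix) are our assumption, chosen to match the "human JASPAR CORE motifs" mentioned in the Q&A; the design script may query differently, and this sketch is not guaranteed to return the same number of matrices. `fetch_human_core_pfms()` is a hypothetical helper name.

```r
# Hypothetical helper: fetch human CORE position frequency matrices from the
# JASPAR 2018 release. Requires the Bioconductor packages TFBSTools and
# JASPAR2018 (the latter provides the database object).
fetch_human_core_pfms <- function() {
  opts <- list(
    species      = 9606,    # NCBI taxonomy ID for Homo sapiens
    collection   = "CORE",  # JASPAR's curated, non-redundant collection
    all_versions = FALSE,   # keep only the latest version of each matrix
    matrixtype   = "PFM"    # raw position frequency (count) matrices
  )
  TFBSTools::getMatrixSet(JASPAR2018::JASPAR2018, opts)
}

# Only run the query when both packages are available.
if (requireNamespace("TFBSTools", quietly = TRUE) &&
    requireNamespace("JASPAR2018", quietly = TRUE)) {
  pfms <- fetch_human_core_pfms()
  print(length(pfms))                  # number of matrices returned
  print(TFBSTools::Matrix(pfms[[1]]))  # count matrix, one row per base
}
```

Each element of the returned `PFMatrixList` is a `PFMatrix`; `TFBSTools::Matrix()` extracts its count matrix, which is the usual starting point for deriving a consensus sequence.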
Probe structure: GC cap + 34-base target (TF site or SNV + backbone) + 24-base double-stranding primer.
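The probe structure above is simple concatenation, which a small base-R sketch makes concrete. `assemble_probe()` is a hypothetical helper; the GC cap and primer below are placeholders, not the sequences actually printed on the array. The target is the 34-base sequence shown above, used here only because it has the right length.

```r
# Hypothetical helper: join the three probe segments in the order described
# above, checking the two fixed-length segments before concatenating.
assemble_probe <- function(gc_cap, target, primer) {
  if (nchar(target) != 34) stop("target must be exactly 34 bases")
  if (nchar(primer) != 24) stop("primer must be exactly 24 bases")
  if (!grepl("^[ACGT]+$", target)) stop("target must contain only A/C/G/T")
  paste0(gc_cap, target, primer)
}

target <- "GACTACTACGTGTCGACGATCGAGCACGCAGATC"  # 34-base sequence shown above
gc_cap <- "GC"                                  # placeholder cap, not the real one
primer <- strrep("N", 24)                       # placeholder, not the real primer

probe <- assemble_probe(gc_cap, target, primer)
nchar(probe)
#> [1] 60
```

The length checks matter because a TF site or SNV has to be fitted into the fixed 34-base target window; an off-by-one in that step would otherwise silently shift the primer.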
Q: Why was a size threshold used? Why not just filter out a consensus sequence whenever it is a subsequence of a larger consensus sequence?
A: It may be biophysically important to study a half-site (e.g., from a TF complex) or a core element in isolation from other half-sites or flanking bases. For this reason, I wanted to avoid filtering out half-sites or smaller, more degenerate sites.
Q: Why was a relative size filter of 0.90 selected? Why not another number?
A: This choice was completely arbitrary. I adjusted it until I obtained a suitable number of final consensus sequences to fit within the limits of the Agilent 4x180K microarray design.
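One reading of the two answers above, sketched in base R: a shorter consensus is treated as equivalent to a longer one only when it occurs inside it and is at least 90% of its length, so a half-site that is merely a subsequence survives. The exact comparison used by the design script is not described here; checking both strands, the helper names `revcomp()` and `is_equivalent()`, and the example sequences are all hypothetical.

```r
# Reverse complement of an upper-case DNA string (A/C/G/T only).
revcomp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Hypothetical rule: `short` is equivalent to `long` only if it occurs inside
# `long` (on either strand) AND is at least `rel_size` of its length.
is_equivalent <- function(short, long, rel_size = 0.90) {
  contained <- grepl(short, long, fixed = TRUE) ||
    grepl(revcomp(short), long, fixed = TRUE)
  contained && nchar(short) / nchar(long) >= rel_size
}

long <- "AACGTGACGTCACGTTAGCA"  # made-up 20-base consensus
near <- substr(long, 1, 19)     # 19 of 20 bases: a near-duplicate
half <- substr(long, 1, 10)     # 10 of 20 bases: a "half-site"

is_equivalent(near, long)
#> [1] TRUE
is_equivalent(half, long)
#> [1] FALSE
```

Both `near` and `half` are subsequences of `long`; only the size ratio separates them, which is exactly why a pure subsequence filter would have discarded the half-site.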
Q: What happened to all of the consensus sequences that were filtered out?
A: None of the 452 human JASPAR CORE motifs from the 2018 build were actually "filtered out". Motifs whose consensus sequences were duplicates of, or close enough in sequence identity to, a consensus sequence already on the design were simply flagged and added as "equivalent". Details can be viewed for individual TFs within the full array annotation or the sample results dataset.
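The flag-rather-than-drop behavior described above can be sketched as a single greedy pass from the longest consensus to the shortest: each sequence either goes on the array or is annotated as equivalent to a longer sequence already placed, and every input row appears in the output. This is an illustration under assumptions (the same hypothetical containment-plus-0.90-size rule, checked on both strands), not the design script's actual code; `flag_equivalents()` and the TF names are made up.

```r
# Hypothetical sketch: annotate each consensus as "on array" or "equivalent".
flag_equivalents <- function(tf, consensus, rel_size = 0.90) {
  rc <- function(x) paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]),
                          collapse = "")
  ord <- order(nchar(consensus), decreasing = TRUE)  # longest first
  status <- character(length(consensus))
  equivalent_to <- rep(NA_character_, length(consensus))
  placed <- integer(0)                               # indices already on array
  for (i in ord) {
    s <- consensus[i]
    # placed sequences that contain s (either strand) and are close in size
    hit <- Filter(function(j) {
      (grepl(s, consensus[j], fixed = TRUE) ||
         grepl(rc(s), consensus[j], fixed = TRUE)) &&
        nchar(s) / nchar(consensus[j]) >= rel_size
    }, placed)
    if (length(hit) > 0) {
      status[i] <- "equivalent"
      equivalent_to[i] <- tf[hit[1]]
    } else {
      status[i] <- "on array"
      placed <- c(placed, i)
    }
  }
  data.frame(tf = tf, consensus = consensus, status = status,
             equivalent_to = equivalent_to, stringsAsFactors = FALSE)
}

tf <- c("TF_A", "TF_B", "TF_C", "TF_D")
consensus <- c("AACGTGACGTCACGTTAGCA",  # placed
               "AACGTGACGTCACGTTAGC",   # near-duplicate of TF_A
               "AACGTGACGT",            # half-site of TF_A, kept
               "GGGATTTCCC")            # unrelated, kept
res <- flag_equivalents(tf, consensus)
res[, c("tf", "status", "equivalent_to")]
```

Processing longest-first guarantees that whenever a sequence is flagged, the sequence it points to is already on the array, so the annotation never refers to a probe that was itself flagged.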