knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

hTF Array design details

NOTE: I copied this section straight from Dave Bray's old README and haven't updated it at all.

The hTF array design was generated using the hTF_v01_nextPBM_design.R script included in this repository. The microarray itself can be purchased from Agilent (Design ID: 086290). To regenerate the files and annotations associated with the design, run hTF_v01_nextPBM_design.R with Rscript as follows:

Rscript hTF_v01_nextPBM_design.R hTF_v01

The only argument is a prefix used to name the individual DNA probes and the design as a whole. In the above example, I've used hTF_v01 as this prefix. The script depends on several R and Bioconductor packages, so if you would like to rerun it, please install the following first: TFBSTools, JASPAR2018, and plyr.
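Before rerunning the design script, it can help to check which of these dependencies are already present. The following is a minimal sketch, not part of the repository: it only reports missing packages and leaves the install commands commented out, since they need network access. TFBSTools and JASPAR2018 are distributed through Bioconductor and plyr through CRAN; the variable names are illustrative.

```r
# Packages named in the text above, grouped by where they are distributed.
bioc_pkgs <- c("TFBSTools", "JASPAR2018")  # Bioconductor
cran_pkgs <- c("plyr")                     # CRAN

# TRUE if the package can be loaded in this R installation.
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

needed <- c(bioc_pkgs, cran_pkgs)
missing_pkgs <- needed[!vapply(needed, is_installed, logical(1))]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # To install them (requires network access), uncomment:
  # install.packages(c("BiocManager", "plyr"))
  # BiocManager::install(c("TFBSTools", "JASPAR2018"))
} else {
  message("All design dependencies are installed.")
}
```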

The design script executes the following steps to build the array design, starting from the TF binding models included in the open-source JASPAR 2018 database.

1. Obtain all core TF motifs in the JASPAR database and their consensus sequences

2. Flag "equivalent" motif seed sequences for "filtering" from the final design

3. Pad the consensus sequence to create the final target sequence and generate single nucleotide variant probes

4. Generate a backbone probe sequence

5. Construct final probes
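To make steps 1 and 3 more concrete, here is a small base-R sketch. It is not the code from hTF_v01_nextPBM_design.R. The count matrix is a toy example rather than a real JASPAR entry, the function names (consensus_from_pfm, pad_target, snv_variants) and the "GC" flanks are hypothetical, and taking the most frequent base per column is only one simple way to call a consensus. Generating variants across the whole padded target, rather than only the consensus positions, is likewise an assumption.

```r
# Toy position frequency matrix (rows = bases, columns = motif positions).
# Illustrative numbers only, not a real JASPAR motif.
pfm <- matrix(
  c(10, 0, 0, 8,
     0, 9, 1, 0,
     0, 1, 9, 0,
     0, 0, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Step 1 (assumed approach): take the most frequent base in each column.
consensus_from_pfm <- function(m) {
  paste(rownames(m)[apply(m, 2, which.max)], collapse = "")
}

# Step 3a (hypothetical flanks): pad the consensus to form the target sequence.
pad_target <- function(consensus, left = "GC", right = "GC") {
  paste0(left, consensus, right)
}

# Step 3b: every single nucleotide variant of a sequence,
# i.e. three alternative bases at each position.
snv_variants <- function(target) {
  bases <- c("A", "C", "G", "T")
  chars <- strsplit(target, "")[[1]]
  out <- character(0)
  for (i in seq_along(chars)) {
    for (b in setdiff(bases, chars[i])) {
      v <- chars
      v[i] <- b
      # Name each variant by position and substitution, e.g. "pos3_A>C".
      out[sprintf("pos%d_%s>%s", i, chars[i], b)] <- paste(v, collapse = "")
    }
  }
  out
}

consensus <- consensus_from_pfm(pfm)
target <- pad_target(consensus)
variants <- snv_variants(target)
head(variants)
```

A target of length L yields 3 * L variant sequences, which is one reason the number of consensus sequences has to be limited to fit the array.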

Design FAQ

Q: Why was a size threshold used? Why not just filter a consensus sequence if it is a subsequence of a larger consensus sequence?
A: It may be biophysically important to study a half-site (i.e., from a TF complex) or a core element in isolation from other half-sites or flanking bases. I wanted to avoid filtering out half-sites or smaller, more degenerate sites for this reason.

Q: Why was a relative size filter of 0.90 selected? Why not another number?
A: This choice was completely arbitrary. I adjusted it until I obtained a suitable number of final consensus sequences to fit within the limits of the Agilent 4x180K microarray design.
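One way to read the two answers above is that a consensus sequence is flagged only when it is contained in a longer consensus sequence and is at least 0.90 times its length. The base-R sketch below illustrates that reading. It is an assumption about the logic, not the code in the design script. The function flag_equivalent and its argument min_rel_size are hypothetical names, the sequences are made up, and reverse complements are ignored for brevity.

```r
# Flag a sequence as "equivalent" to a longer retained sequence when it is a
# substring of it AND its relative length is at least min_rel_size.
flag_equivalent <- function(seqs, min_rel_size = 0.90) {
  # Process longest first so each sequence is compared against longer ones.
  seqs <- seqs[order(nchar(seqs), decreasing = TRUE)]
  equivalent_to <- rep(NA_character_, length(seqs))
  for (i in seq_along(seqs)) {
    for (j in seq_len(i - 1)) {
      if (!is.na(equivalent_to[j])) next  # compare only against retained sequences
      rel_size <- nchar(seqs[i]) / nchar(seqs[j])
      if (rel_size >= min_rel_size && grepl(seqs[i], seqs[j], fixed = TRUE)) {
        equivalent_to[i] <- seqs[j]
        break
      }
    }
  }
  data.frame(consensus = seqs, equivalent_to = equivalent_to,
             stringsAsFactors = FALSE)
}

# Made-up sequences for illustration.
seqs <- c("ACGTACGTACG",  # 11-mer, retained
          "CGTACGTACG",   # 10-mer contained in the 11-mer, 10/11 >= 0.90: flagged
          "ACGT",         # contained in the 11-mer, but 4/11 < 0.90: retained
          "TTGACA")       # not contained in any longer sequence: retained
res <- flag_equivalent(seqs)
print(res)
```

Under this reading, the short "ACGT" site stays on the design even though it occurs inside a longer consensus, which matches the stated goal of keeping half-sites and core elements.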

Q: What happened to all of the consensus sequences that were filtered out?
A: None of the 452 human JASPAR CORE motifs from the 2018 build were actually "filtered out". Those determined to be duplicates of, or to have close enough sequence identity to, a consensus sequence on the design were simply flagged and added as "equivalent". Details can be viewed for individual TFs within the full array annotation or the sample results dataset.



Siggers-Lab/hTF_array documentation built on Feb. 7, 2024, 11:25 p.m.