Introduction

suppressPackageStartupMessages({
library(BiocStyle)
library(TFutils)
library(org.Hs.eg.db)
library(GO.db)
library(data.table)
})

A central concern of genome biology is improving understanding of gene transcription. Transcription factors (TFs) are proteins that bind to DNA, typically near gene promoter regions. The role of TFs in gene expression variation is of great interest. Progress in deciphering genetic and epigenetic processes that affect TF abundance and function will be essential in clarifying and interpreting gene expression variation patterns and their effects on phenotype. Difficulties of identifying TFs, and opportunities for doing so in systems biology contexts, are reviewed in @Weirauch2014.

This paper describes an R/Bioconductor package called TFutils, which assembles various resources intended to clarify and unify approaches to working with TF concepts in bioinformatic analysis. Computations described in this paper can be carried out with Bioconductor version 3.6. The package can be installed with

library(BiocInstaller) # use source("http://www.bioconductor.org/biocLite.R") if not available
biocLite("TFutils")

Enumerating transcription factors

Various sources of human TFs

library(TFutils)
library(AnnotationDbi)
# GO: genes annotated to GO:0003700 (DNA binding transcription factor activity)
tfdf = select(org.Hs.eg.db::org.Hs.eg.db, 
    keys="GO:0003700", keytype="GO", 
    columns=c("ENTREZID", "SYMBOL"))
tfdf = tfdf[, c("ENTREZID", "SYMBOL")]
TFs_GO = TFCatalog(name="GO.0003700", nativeIds=tfdf$ENTREZID,
 HGNCmap=tfdf)

# MSigDb: names of the TF target gene sets
data(tftColl)
data(tftCollMap)
TFs_MSIG = TFCatalog(name="MsigDb.TFT", nativeIds=names(tftColl),
 HGNCmap=data.frame(tftCollMap,stringsAsFactors=FALSE))

# CIS-BP: first column of the table holds the native identifiers
data(cisbpTFcat)
TFs_CISBP = TFCatalog(name="CISBP.info", nativeIds=cisbpTFcat[,1],
 HGNCmap = cisbpTFcat)

# HOCOMOCO: first column of the table holds the native identifiers
data(hocomoco.mono)
TFs_HOCO = TFCatalog(name="hocomoco11", nativeIds=hocomoco.mono[,1],
 HGNCmap=hocomoco.mono)

We have four basic enumerations of TFs with diverse forms of metadata.

TFs_GO
TFs_MSIG
TFs_CISBP
TFs_HOCO

Catalog sizes: GO, 820; HOCOMOCO, 680; CIS-BP, 1734 (how many map to HGNC?). Counts for MSigDb and TFclass are not given here.
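The open question in the note above (how many catalog entries map to HGNC symbols) could be answered with a small tally. The following is only a sketch on a made-up mapping table: the data frame `toy_map`, its column names, and the helper `count_mapped` are hypothetical and are not part of TFutils.

```r
# Hypothetical mapping table: native catalog identifiers and HGNC symbols.
# NA marks an entry with no known HGNC symbol.
toy_map <- data.frame(
  native_id = c("M0001", "M0002", "M0003", "M0004", "M0005"),
  HGNC      = c("STAT3", NA, "E2F1", "STAT3", NA),
  stringsAsFactors = FALSE
)

# Count all entries, entries that carry a symbol, and distinct symbols.
count_mapped <- function(map, symbol_col = "HGNC") {
  sym <- map[[symbol_col]]
  mapped <- !is.na(sym) & nzchar(sym)
  c(entries = nrow(map),
    mapped = sum(mapped),
    distinct_symbols = length(unique(sym[mapped])))
}

tally <- count_mapped(toy_map)
print(tally)
```

With a real catalog, the same tally would presumably be run on whatever table was supplied as `HGNCmap`, after checking which column actually holds the symbols.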

A simple way of enumerating genes coding for TFs is to interrogate Gene Ontology Annotation. In Bioconductor 3.6, the annotations are derived from the November 2017 latest-lite table. The number of distinct gene symbols annotated to the term DNA binding transcription factor activity is found as

1

These annotations are accompanied by evidence codes.
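As a sketch of the counting step and of how evidence codes might be summarized, the example below uses a small hand-made table in place of the real annotation query. The rows are invented for illustration. The column names `SYMBOL` and `EVIDENCE` are an assumption about what the annotation query returns for a GO key, not something shown in this paper.

```r
# Invented rows standing in for the result of a GO annotation query.
toy_go <- data.frame(
  SYMBOL   = c("STAT3", "STAT3", "E2F1", "MYC", "MYC", "MYC", "GATA1"),
  EVIDENCE = c("IDA", "TAS", "IDA", "IEA", "IDA", "TAS", "IEA"),
  stringsAsFactors = FALSE
)

# Number of distinct gene symbols annotated to the term.
n_symbols <- length(unique(toy_go$SYMBOL))

# How often each evidence code is used.
evidence_counts <- table(toy_go$EVIDENCE)

# Symbols supported by at least one annotation that is not IEA
# (IEA denotes "inferred from electronic annotation").
curated <- unique(toy_go$SYMBOL[toy_go$EVIDENCE != "IEA"])

print(n_symbols)
print(evidence_counts)
print(curated)
```

Filtering on evidence codes is one possible way to derive a stricter TF list; whether that is appropriate depends on the downstream analysis.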

Another relevant resource is the HOCOMOCO project (@Kulakovskiy2018). In the conclusion of the 2018 Nucleic Acids Research paper, these authors indicate that their database identifies 680 human TFs.

Enumerating TF targets

The Broad Institute MSigDb (@Subramanian15545) includes a gene set collection devoted to cataloging TF targets. We have used Bioconductor's GSEABase package to import and serialize the gmt representation of this collection.

TFutils::tftColl

Names of TFs for which target sets are assembled are encoded in a somewhat systematic way. We attempt to decode with string operations:

tftn = names(TFutils::tftColl)
stftn = strsplit(tftn, "_")
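One way to carry the decoding further is to take the first underscore-delimited token of each name and compare it with a list of known TF names. The sketch below does this on invented labels written in the style of the collection's names. `tftn_demo` and `hoco_demo` are hypothetical stand-ins for `names(TFutils::tftColl)` and the HOCOMOCO TF names, and the matching rule is an assumption rather than the package's method.

```r
# Hypothetical labels in the style of MSigDb TF target set names.
tftn_demo <- c("NFKB_Q6", "NFKAPPAB_01", "STAT3_02", "E2F1_Q3",
               "AAANWWTGC_UNKNOWN")

# Hypothetical TF symbols standing in for HOCOMOCO names.
hoco_demo <- c("STAT3", "E2F1", "NFKB1", "RELA")

# Split each label on "_" and keep the first token.
parts <- strsplit(tftn_demo, "_")
first_tok <- vapply(parts, function(p) p[1], character(1))

# Tokens that match a known TF name exactly, and those that do not.
exact <- intersect(first_tok, hoco_demo)
unmatched <- setdiff(first_tok, hoco_demo)

print(exact)
print(unmatched)
```

In this toy input the two NF-kappaB style labels fail to match any symbol, which is the kind of nomenclature mismatch that calls for closer inspection.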

So there are some exact matches between components of the MSigDb TF target collection names and the HOCOMOCO TF names. However, we observe some peculiarity in nomenclature in the MSigDb labels:

grep("NFK", names(TFutils::tftColl), value=TRUE)

Some manual curation will be in order to improve the precision with which MSigDb TF target sets can be used.
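Such curation could be recorded as an explicit lookup table so that it stays reviewable and reproducible. The sketch below is hypothetical throughout: the table `curation`, the helper `apply_curation`, and the token-to-symbol assignments are illustrative and have not been verified against MSigDb.

```r
# Hypothetical curation table: label token -> preferred gene symbol.
# The entries are illustrative only.
curation <- c(NFKAPPAB = "NFKB1", NFKB = "NFKB1")

# Translate tokens via the lookup, leaving unlisted tokens unchanged.
apply_curation <- function(tokens, lookup) {
  hit <- tokens %in% names(lookup)
  out <- tokens
  out[hit] <- unname(lookup[tokens[hit]])
  out
}

tokens <- c("NFKAPPAB", "NFKB", "STAT3")
curated_tokens <- apply_curation(tokens, curation)
print(curated_tokens)
```

Keeping the table as plain data, separate from the function that applies it, means individual assignments can be corrected without touching code.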

Quantitative data on TF binding sites

The introduction provides context as to why the software tool was developed and what need it addresses. It is good scholarly practice to mention previously developed tools that address similar needs, and why the current tool is needed.

Methods

Implementation

For software tool papers, this section should address how the tool works and any relevant technical details required for implementation of the tool by other developers.

Operation

This part of the methods should include the minimal system requirements needed to run the software and an overview of the workflow for the tool for users of the tool.

Results

This section is only required if the paper includes novel data or analyses, and should be written as a traditional results section.

Use Cases

This section is required if the paper does not include novel data or analyses. Examples of input and output files should be provided with some explanatory context. Any novel or complex variable parameters should also be explained in sufficient detail to allow users to understand and use the tool's functionality.

Discussion

This section is only required if the paper includes novel data or analyses, and should be written in the same style as a traditional discussion section. Please include a brief discussion of allowances made (if any) for controlling bias or unwanted sources of variability, and the limitations of any novel datasets.

Conclusions

This section is only required if the paper includes novel data or analyses, and should be written as a traditional conclusion.

Summary

This section is required if the paper does not include novel data or analyses. It allows authors to briefly summarize the key points from the article.

Data availability

Please add details of where any datasets that are mentioned in the paper, and that have not previously been formally published, can be found. If previously published datasets are mentioned, these should be cited in the references, as per usual scholarly conventions.

Software availability

This section will be generated by the Editorial Office before publication. Authors are asked to provide some initial information to assist the Editorial Office, as detailed below.

  1. URL link to where the software can be downloaded from or used by a non-coder (AUTHOR TO PROVIDE; optional)
  2. URL link to the author's version control system repository containing the source code (AUTHOR TO PROVIDE; required)
  3. Link to source code as at time of publication (F1000Research TO GENERATE)
  4. Link to archived source code as at time of publication (F1000Research TO GENERATE)
  5. Software license (AUTHOR TO PROVIDE; required)

Author contributions

In order to give appropriate credit to each author of an article, the individual contributions of each author to the manuscript should be detailed in this section. We recommend using author initials and then stating briefly how they contributed.

Competing interests

All financial, personal, or professional competing interests for any of the authors that could be construed to unduly influence the content of the article must be disclosed and will be displayed alongside the article. If there are no relevant competing interests to declare, please add the following: 'No competing interests were disclosed'.

Grant information

Please state who funded the work discussed in this article, whether it is your employer, a grant funder etc. Please do not list funding that you have that is not relevant to this specific piece of research. For each funder, please state the funder’s name, the grant number where applicable, and the individual to whom the grant was assigned. If your work was not funded by any grants, please include the line: 'The author(s) declared that no grants were involved in supporting this work.'

Acknowledgments

This section should acknowledge anyone who contributed to the research or the article but who does not qualify as an author based on the criteria provided earlier (e.g. someone or an organization that provided writing assistance). Please state how they contributed; authors should obtain permission to acknowledge from all those mentioned in the Acknowledgments section.

Please do not list grant funding in this section.

USING R MARKDOWN

Some examples of commonly used markdown syntax are listed below, to help you get started.

Cross-references

For portability between different output formats, use the syntax introduced by bookdown, such as (\#label) for labels and \@ref(label) for cross-references. The following sections provide examples of referencing tables, figures, and equations.

Citations

You can include references in a standard BibTeX file. The name of this file is given in the header of the markdown document (in our case it is sample.bib). References to entries in the BibTeX file are made using square brackets and use an @ plus the key for the entry you are referencing [@Smith:2012qr]. You can combine multiple entries by separating them with a semicolon [@Smith:2012qr; @Smith:2013jd]. The default bibliography style uses numerical citations. For superscript or author-year citations set the header metadata field natbiboptions to either super or round, respectively.

Code chunks

You can embed an R code chunk like this:

x <- 1:10
x

If you specify a figure caption to a code chunk using the chunk option fig.cap, the plot will be automatically labeled and numbered. The figure label is generated from the label of the code chunk by prefixing it with fig:, e.g., see Figure \@ref(fig:plot).

plot(x)

Tables

Markdown syntax tends to lack some of the more sophisticated formatting features available in LaTeX, so you may need to edit the tables later to get the desired format.

| First name  | Last Name | Grade |
| ----------- | --------- | ----- |
| John        | Doe       | 7.5   |
| Richard     | Miles     | 2     |

Table: Caption to table.

Just like figures, tables with captions will also be numbered and can be referenced. Captions are entered as a paragraph beginning with the string "Table:" (or just ":"), which may appear either before or after the table. A label for the table should appear in the beginning of the caption in the form of (\#tab:label), e.g., see Table \@ref(tab:table).

: (\#tab:table) A table with text justification.

| First name  | Last Name | Grade |
| ----------- | :-------: | ----: |
| John        | Doe       |   7.5 |
| Richard     | Miles     |     2 |

Figures

You can include static figures (i.e. not generated by code) using the include_graphics() function from the knitr package, in a standard code chunk.

knitr::include_graphics('frog.jpg')

You can again use the fig.cap option to provide the figure caption, and reference the image based on the code chunk label. You can also use options such as fig.align and fig.width to adjust the position and size of the image within the final document, e.g. Figure \@ref(fig:frog-picture) is a frog.

Alternatively, you can use the standard markdown syntax like so:

![This is a smaller version of the same picture, inserted using the standard markdown syntax](frog.jpg){width=25%}

Please give figures appropriate filenames, e.g.: figure1.pdf, figure2.png.

Figure legends should briefly describe the key messages of the figure such that the figure can stand alone from the main text. However, all figures should also be discussed in the article text. Each legend should have a concise title of no more than 15 words. The legend itself should be succinct, while still explaining all symbols and abbreviations. Avoid lengthy descriptions of methods.

For any figures reproduced from another publication (as long as appropriate permission has been obtained from the copyright holder; see under the heading 'Submission'), please include a line in the legend to state that: 'This figure has been reproduced with kind permission from [include original publication citation]'.

Mathematics

You can use LaTeX syntax to typeset mathematical expressions. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables with $\text{E}[X_i] = \mu$ and $\text{Var}[X_i] = \sigma^2 < \infty$, and let $$S_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$ denote their mean. Then as $n$ approaches infinity, the random variables $\sqrt{n}(S_n - \mu)$ converge in distribution to a normal $\mathcal{N}(0, \sigma^2)$.

To number and refer to equations, put them in the equation environments and assign labels to them, as for Equation \@ref(eq:binom).

\begin{equation}
  f\left(k\right) = \binom{n}{k} p^k\left(1-p\right)^{n-k}
  (\#eq:binom)
\end{equation}

Lists

You can make ordered lists

  1. Like this,
  2. and like this.

or bullet points

  * Like this,
  * and like this.

shwetagopaul92/TFutils documentation built on May 26, 2019, 4:32 a.m.