To footprint the binding of transcription factors on DNA, the traditional motif mapping analysis requires previously curated motifs, and is limited to profile only 1 motif at a time. To overcome such issues, this package provides the implementation of the mutual-infomation based footprint analysis for transcription factors.
Check also https://www.nature.com/articles/s41586-018-0549-5 for details on how to use.
To use the functions in this package. Please note that the package fjComm needs to be installed before hand. There is an install_github
function to install R packages hosted on GitHub in the devtools package. It requests developer’s name.:
install_github("aquaflakes/fjComm")
The input data needs to be stored in a dataframe with only 1 column, each row corresponds to one sequence from the illumina sequencing. The data frame look like follows:
## .
## 1 ACTACTATCTATTCTATCATTACTACTACATATCAT
## 2 ACTACTATCTATTCTATCATTACTACTACATATCAT
## 3 ACTACTATCTATTCTATCATTACTACTACATATCAT
## 4 ACTACTATCTATTCTATCATTACTACTACATATCAT
## 5 ACTACTATCTATTCTATCATTACTACTACATATCAT
Use the following command to perform calculation of the mutual infomation and generate a data frame, only the first few rows were displayed :
df= ic_related_calc (input_seq, kmerLen=2L, filter_for_spacing=TRUE, spacing=0, verbose=F, pseudo=10L, type="maxBias", maxBias_dimer_Params=list(type="topMI",topNo=5L) )
The generated data frame contains 6 columns
## # A tibble: 6 x 6
## pos1 pos2 topMIsum bottomMIsum topk1 topk2
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 1 4 0.00247 -0.00142 TAA CGC
## 2 1 5 0.00234 -0.00106 CAC CTC
## 3 1 6 0.00220 -0.000986 AAA GCC
## 4 1 7 0.00176 -0.000991 CTA CGA
## 5 1 8 0.00173 -0.00105 CCC CCC
## 6 1 9 0.00176 -0.000937 TAA CTA
This package also provides ready-made functions to aid the visualization of necessary infomations. There are two basic way to visualize the data:
2D Heat Map for All Positional Combinations
To visualize all infomation contained in the dataframe, we can plot a 2-dimensional heatmap. This provide a quick overview for the whole dataset.
fjComm::gg_heat2D_MI(df, grad_colors = fjComm::gg_steelblue_red)
1D Heat Map for Neighbouring Positions
As TFs in general do not bind across a large span of DNA. For most of the cases, the mutual infomation between the neighbouring positions are the strongest. Therefore, most infomations are already contained by visualizing the diagnal of the 2D plot.
fjComm::gg_heat2D_diag(df, grad_colors = fjComm::gg_steelblue_red)
For more instructions about individual functions, please read the help documents under https://github.com/aquaflakes/MutualInfo/man Fangjie Zhu 2020-02-18
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.