README.md

primirTSS

primirTSS is a first R package to predict pri-miRNA Transcription Start Site.

1 Introduction

Identifying human miRNA transcriptional start sites (TSSs) plays a significant role in understanding the transcriptional regulation of miRNA. However, due to the quick capping of pri-miRNA and many miRNA genes may lie in the introns or even exons of other genes, it is difficult to detect miRNA TSSs. miRNA TSSs are cell-specific. And miRNA TSSs are cell-specific, which implies the same miRNA in different cell-lines may start transcribing at different TSSs.

High throughput sequencing, like ChIP-seq, has gradually become an essential and versatile approach for us to identify and understand genomes and their transcriptional processes. By integrating H3k4me4 and Pol II data, parting of false positive counts after scoring can be filter out. Besides, DNase I hypersensitive sites(DHS) also imply TSSs, where miRNAs will be accessible and functionally related to transcription activities. And additionally, the expression profile of miRNA and genes in certain cell-line will be considered as well for improve fidelity. By employing all these different kinds of data, here we have developed the primirTSS package to assist users to identify miRNA TSSs in human and to provide them with related information about where miRNA genes lie in the genome, with both command-line and graphical interfaces.

2 Find the best putative TSS

Installation

1 primirTSS

Install the latest release of R, then get primirTSS by starting R and entering the commands:

devtools::install_github("ipumin/primirTSS")

2 Install Java SE Development Kit(JDK)

As Java development environment is indispensable for the main function in our package, it is necessary for users to install Java SE Development Kit 10 before using primirTSS.

3 Getting Started

Step 1: Process of H3K4me3 and Pol II data

H3K4me3 and Pol II data are key points for accurate prediction our method. If one of these to peak data is input, before execute the main function find_TSS, the function peak_merge should be used to merge adjacent peaks whose distance between each other is less than n base pairs and return the merged peaks as an output.

library(primirTSS)
peak_df <- data.frame(chrom = c("chr1", "chr2", "chr1"),
                       chromStart = c(450, 460, 680),
                       chromEnd = c(470, 480, 710),
                       stringsAsFactors = FALSE)
peak <-  as(peak_df, "GRanges")
peak_merge(peak, n =250)

If both of H3K4me3 and Pol II data, after separately merging these two kinds of data first, peak_join should be employed to integrate H3K4me3 and Pol II peaks and return the result as bed_merged parameter for the main function find_tss.

peak_df1 <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"),
                       start = c(100, 460, 600, 70),
                       end = c(200, 500, 630, 100),
                       stringsAsFactors = FALSE)
peak1 <-  as(peak_df1, "GRanges")

peak_df2 <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"),
                       start = c(160, 470, 640, 71),
                       end = c(210, 480, 700, 90),
                       stringsAsFactors = FALSE)
peak2 <-  as(peak_df2, "GRanges")

peak_join(peak1, peak2)

Step 2: Predict most possible TSS for miRNA

There will be different circumstances where not all miRNA expression profiles, DHS data, protein-coding gene('gene') expression profiles are available:

Circumstance 1: no miRNA expression data; then suggest DHS check and protein-coding gene check.

peakfile <- system.file("testdata", "HMEC_h3.csv", package = "primirTSS")
DHSfile <- system.file("testdata", "HMEC_DHS.csv", package = "primirTSS")
peak_h3 <- read.csv(peakfile, stringsAsFactors = FALSE)
DHS <- read.csv(DHSfile, stringsAsFactors = FALSE)
peak_h3 <-  as(peak_h3, "GRanges")
peak <- peak_merge(peak_h3)
no_ownmiRNA <- find_tss(peak, ignore_DHS_check = FALSE,
                        DHS = DHS, allmirdhs_byforce = FALSE,
                        expressed_gene = "all",
                        allmirgene_byforce = FALSE,
                        seek_tf = FALSE)

Circumstance 2: miRNA expression data provided; then no need for DHS check but protein-coding gene check.

bed_merged <- data.frame(
                chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"),
                start = c(9910686, 9942202, 9996940, 10032962, 9830615),
                end = c(9911113, 9944469, 9998065, 10035458, 9917994),
                stringsAsFactors = FALSE)
bed_merged <- as(bed_merged, "GRanges")

expressed_mir <- c("hsa-mir-5697")

ownmiRNA <- find_tss(bed_merged, expressed_mir = expressed_mir,
                     ignore_DHS_check = TRUE,
                     expressed_gene = "all",
                     allmirgene_byforce = TRUE,
                     seek_tf = FALSE)

Step 3: Searching for TFs

Step4: Analysis of results

Here is a demo of predicting TSS for hsa-mir-5697, ignore DHS check.

PART1, $tss_df:

ownmiRNA$tss_df

The first part of the result returns details of predicted TSSs, composed of seven columns: mir_name, chrom, stem_loop_p1, stem_loop_p2, strand mir_context, tss_type gene and predicted_tss:

Entry|Implication :----:|---- mir_name| Name of miRNA. chrom | Chromosome. stem_loop_p1| The start site of a stem-loop. stem_loop_p2| The end site of a stem-loop. strand|Polynucleotide strands. (+/-) mir_context|2 types of relative position relationship between stem-loop and protein-coding gene. (intra/inter) tss_type|4 types of predicted TSSs. See the section below TSS types for details.(host_TSS/intra_TSS/overlap_inter_TSS/inter_TSS) gene|Ensembl gene ID. predicted_tss| Predicted transcription start sites(TSSs). pri_tss_distance|The distance between a predicted TSS and the start site of the stem-loop.

TSSs are cataloged into 4 types as below:

(See Xu HUA et al 2016 for more details)

PART2, $log:

The second part of the result returns 4 logs created during the process of prediction:

4 Plot the prediction of TSS for miRNA

plot_primiRNA(expressed_mir, bed_merged,
              flanking_num = 1000, threshold = 0.7,
              ignore_DHS_check = TRUE,
              DHS, allmirdhs_byforce = TRUE,
              expressed_gene = "all",
              allmirgene_byforce = TRUE)

plot_tss    Figure S1. Visualized result for miRNA TSSs by Plot pri-miRNA TSS()

As Figure S1 shows, the picture contains information of the pri-miRNA’s coordinate, the closest gene to the miRNA, the eponine score of the miRNA’s candidate TSS and the conservation score of the miRNA’s candidate TSS. There are six tracks plotted in return:

Entry|Implication :----:|----- Chromosome |Position of miRNA on the chromosome. hg19 |Reference genome coordinate in hg19. pri-miRNA: |Position of pri-miRNA. Ensemble genes |Position of related proterin-coding gene. Eponine score |Score of best putative TSS by Eponine method. Conservation score |Conservation score of TSS.                         

5 Graphical web interface for prediction

TAG1: Find the best putative TSS

shiny_tss  Figure S2. Graphical web interface of Find pri-miRNA TSS()

As Figure S2 shows, if we want to use the shiny app, we should select the appropriate options or upload the appropriate files. Histone peaks, Pol II peaks and DHS files are comma-separated values (CSV) files, whose first line is chrom,start,end. Every line of miRNA expression profiles has only one miRNA name which start with hsa-mir, such as hsa-mir-5697. Every line of gene expression profiles has only one gene name which derived from Ensembl, such as ENSG00000261657. All of miRNA expression profiles and gene expression profiles do not have column names. If we have prepared, we can push the Start the analysis button to start finding the TSSs. The process of analysis may need to take a few minutes, and a process bar will appear in right corner.

As a result, we will view first six rows of the result. The first five columns are about miRNA information, next five columns are about TSS information. The column of gene denotes the gene whose TSS is closest to the miRNA TSS. The column of pri_tss_distance denotes the distance between miRNA TSS and stem-loop. If users choose to get TFs simultaneously, they will have an additional column, tf, which stores related TFs.

TAG2: Plot pri-miRNA TSS

shiny_plot  Figure S3. Graphical web interface of Plot pri-miRNA TSS()

As Figure S4 shows, if we select the appropriate options and upload the appropriate files, we can have a picture of miRNA TSSs.

Session info

Here is the output of sessionInfo() on the system on which this document was compiled:

Session info -------------------------------
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.1.442)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       Asia/Shanghai               
 date     2018-07-24                  
Packages -----------------------------------
 package                     * version    date       source        
 acepack                       1.4.1      2016-10-29 CRAN (R 3.5.0)
 annotate                      1.58.0     2018-05-01 Bioconductor  
 AnnotationDbi                 1.42.1     2018-05-08 Bioconductor  
 AnnotationFilter              1.4.0      2018-05-01 Bioconductor  
 AnnotationHub                 2.12.0     2018-05-01 Bioconductor  
 assertthat                    0.2.0      2017-04-11 CRAN (R 3.5.0)
 backports                     1.1.2      2017-12-13 CRAN (R 3.5.0)
 base                        * 3.5.0      2018-04-24 local         
 base64enc                     0.1-3      2015-07-28 CRAN (R 3.5.0)
 bindr                         0.1.1      2018-03-13 CRAN (R 3.5.0)
 bindrcpp                    * 0.2.2      2018-03-29 CRAN (R 3.5.0)
 Biobase                       2.40.0     2018-05-01 Bioconductor  
 BiocGenerics                  0.26.0     2018-05-01 Bioconductor  
 BiocInstaller                 1.30.0     2018-05-01 Bioconductor  
 BiocParallel                  1.14.2     2018-07-08 Bioconductor  
 biomaRt                       2.36.1     2018-05-24 Bioconductor  
 Biostrings                    2.48.0     2018-05-01 Bioconductor  
 biovizBase                    1.28.1     2018-07-10 Bioconductor  
 bit                           1.1-14     2018-05-29 CRAN (R 3.5.0)
 bit64                         0.9-7      2017-05-08 CRAN (R 3.5.0)
 bitops                        1.0-6      2013-08-17 CRAN (R 3.5.0)
 blob                          1.1.1      2018-03-25 CRAN (R 3.5.0)
 BSgenome                      1.48.0     2018-05-01 Bioconductor  
 BSgenome.Hsapiens.UCSC.hg38   1.4.1      2018-07-18 Bioconductor  
 caTools                       1.17.1     2014-09-10 CRAN (R 3.5.0)
 checkmate                     1.8.5      2017-10-24 CRAN (R 3.5.0)
 cluster                       2.0.7-1    2018-04-13 CRAN (R 3.5.0)
 CNEr                          1.16.1     2018-06-01 Bioconductor  
 colorspace                    1.3-2      2016-12-14 CRAN (R 3.5.0)
 compiler                      3.5.0      2018-04-24 local         
 crayon                        1.3.4      2017-09-16 CRAN (R 3.5.0)
 curl                          3.2        2018-03-28 CRAN (R 3.5.0)
 data.table                    1.11.4     2018-05-27 CRAN (R 3.5.0)
 datasets                    * 3.5.0      2018-04-24 local         
 DBI                           1.0.0      2018-05-02 CRAN (R 3.5.0)
 DelayedArray                  0.6.1      2018-06-15 Bioconductor  
 devtools                      1.13.6     2018-06-27 CRAN (R 3.5.0)
 dichromat                     2.0-0      2013-01-24 CRAN (R 3.5.0)
 digest                        0.6.15     2018-01-28 CRAN (R 3.5.0)
 DirichletMultinomial          1.22.0     2018-05-01 Bioconductor  
 dplyr                         0.7.6      2018-06-29 CRAN (R 3.5.1)
 ensembldb                     2.4.1      2018-05-07 Bioconductor  
 foreign                       0.8-70     2017-11-28 CRAN (R 3.5.0)
 Formula                       1.2-3      2018-05-03 CRAN (R 3.5.0)
 GenomeInfoDb                  1.16.0     2018-05-01 Bioconductor  
 GenomeInfoDbData              1.1.0      2018-05-30 Bioconductor  
 GenomicAlignments             1.16.0     2018-05-01 Bioconductor  
 GenomicFeatures               1.32.0     2018-05-01 Bioconductor  
 GenomicRanges                 1.32.4     2018-07-13 Bioconductor  
 GenomicScores                 1.4.1      2018-05-23 Bioconductor  
 ggplot2                       3.0.0      2018-07-03 CRAN (R 3.5.0)
 glue                          1.3.0      2018-07-17 CRAN (R 3.5.0)
 GO.db                         3.6.0      2018-05-30 Bioconductor  
 graphics                    * 3.5.0      2018-04-24 local         
 grDevices                   * 3.5.0      2018-04-24 local         
 grid                          3.5.0      2018-04-24 local         
 gridExtra                     2.3        2017-09-09 CRAN (R 3.5.0)
 gtable                        0.2.0      2016-02-26 CRAN (R 3.5.0)
 gtools                        3.8.1      2018-06-26 CRAN (R 3.5.0)
 Gviz                          1.24.0     2018-05-01 Bioconductor  
 Hmisc                         4.1-1      2018-01-03 CRAN (R 3.5.0)
 hms                           0.4.2      2018-03-10 CRAN (R 3.5.0)
 htmlTable                     1.12       2018-05-26 CRAN (R 3.5.0)
 htmltools                     0.3.6      2017-04-28 CRAN (R 3.5.0)
 htmlwidgets                   1.2        2018-04-19 CRAN (R 3.5.0)
 httpuv                        1.4.4.2    2018-07-02 CRAN (R 3.5.0)
 httr                          1.3.1      2017-08-20 CRAN (R 3.5.0)
 interactiveDisplayBase        1.18.0     2018-05-01 Bioconductor  
 IRanges                       2.14.10    2018-05-16 Bioconductor  
 JASPAR2018                    1.1.1      2018-05-30 Bioconductor  
 KEGGREST                      1.20.1     2018-06-27 Bioconductor  
 knitr                         1.20       2018-02-20 CRAN (R 3.5.0)
 later                         0.7.3      2018-06-08 CRAN (R 3.5.0)
 lattice                       0.20-35    2017-03-25 CRAN (R 3.5.0)
 latticeExtra                  0.6-28     2016-02-09 CRAN (R 3.5.0)
 lazyeval                      0.2.1      2017-10-29 CRAN (R 3.5.0)
 magrittr                      1.5        2014-11-22 CRAN (R 3.5.0)
 Matrix                        1.2-14     2018-04-13 CRAN (R 3.5.0)
 matrixStats                   0.53.1     2018-02-11 CRAN (R 3.5.0)
 memoise                       1.1.0      2017-04-21 CRAN (R 3.5.0)
 methods                     * 3.5.0      2018-04-24 local         
 mime                          0.5        2016-07-07 CRAN (R 3.5.0)
 munsell                       0.5.0      2018-06-12 CRAN (R 3.5.0)
 nnet                          7.3-12     2016-02-02 CRAN (R 3.5.0)
 parallel                      3.5.0      2018-04-24 local         
 phastCons100way.UCSC.hg38     3.7.1      2018-07-18 Bioconductor  
 pillar                        1.3.0      2018-07-14 CRAN (R 3.5.0)
 pkgconfig                     2.0.1      2017-03-21 CRAN (R 3.5.0)
 plyr                          1.8.4      2016-06-08 CRAN (R 3.5.0)
 png                           0.1-7      2013-12-03 CRAN (R 3.5.0)
 poweRlaw                      0.70.1     2017-08-29 CRAN (R 3.5.0)
 prettyunits                   1.0.2      2015-07-13 CRAN (R 3.5.0)
 primirTSS                   * 0.0.0.9000 2018-07-24 local         
 progress                      1.2.0      2018-06-14 CRAN (R 3.5.0)
 promises                      1.0.1      2018-04-13 CRAN (R 3.5.0)
 ProtGenerics                  1.12.0     2018-05-01 Bioconductor  
 purrr                         0.2.5      2018-05-29 CRAN (R 3.5.0)
 R.methodsS3                   1.7.1      2016-02-16 CRAN (R 3.5.0)
 R.oo                          1.22.0     2018-04-22 CRAN (R 3.5.0)
 R.utils                       2.6.0      2017-11-05 CRAN (R 3.5.0)
 R6                            2.2.2      2017-06-17 CRAN (R 3.5.0)
 RColorBrewer                  1.1-2      2014-12-07 CRAN (R 3.5.0)
 Rcpp                          0.12.17    2018-05-18 CRAN (R 3.5.0)
 RCurl                         1.95-4.11  2018-07-15 CRAN (R 3.5.0)
 readr                         1.1.1      2017-05-16 CRAN (R 3.5.0)
 reshape2                      1.4.3      2017-12-11 CRAN (R 3.5.0)
 rlang                         0.2.1      2018-05-30 CRAN (R 3.5.0)
 rpart                         4.1-13     2018-02-23 CRAN (R 3.5.0)
 Rsamtools                     1.32.2     2018-07-03 Bioconductor  
 RSQLite                       2.1.1      2018-05-06 CRAN (R 3.5.0)
 rstudioapi                    0.7        2017-09-07 CRAN (R 3.5.0)
 rtracklayer                   1.40.3     2018-06-02 Bioconductor  
 S4Vectors                     0.18.3     2018-06-08 Bioconductor  
 scales                        0.5.0      2017-08-24 CRAN (R 3.5.0)
 seqLogo                       1.46.0     2018-05-01 Bioconductor  
 shiny                         1.1.0      2018-05-17 CRAN (R 3.5.0)
 splines                       3.5.0      2018-04-24 local         
 stats                       * 3.5.0      2018-04-24 local         
 stats4                        3.5.0      2018-04-24 local         
 stringi                       1.2.3      2018-06-12 CRAN (R 3.5.0)
 stringr                       1.3.1      2018-05-10 CRAN (R 3.5.0)
 SummarizedExperiment          1.10.1     2018-05-11 Bioconductor  
 survival                      2.42-6     2018-07-13 CRAN (R 3.5.0)
 TFBSTools                     1.18.0     2018-05-01 Bioconductor  
 TFMPvalue                     0.0.8      2018-05-16 CRAN (R 3.5.0)
 tibble                        1.4.2      2018-01-22 CRAN (R 3.5.0)
 tidyr                         0.8.1      2018-05-18 CRAN (R 3.5.0)
 tidyselect                    0.2.4      2018-02-26 CRAN (R 3.5.0)
 tools                         3.5.0      2018-04-24 local         
 utils                       * 3.5.0      2018-04-24 local         
 VariantAnnotation             1.26.1     2018-07-04 Bioconductor  
 VGAM                          1.0-5      2018-02-07 CRAN (R 3.5.0)
 withr                         2.1.2      2018-03-15 CRAN (R 3.5.0)
 XML                           3.98-1.12  2018-07-15 CRAN (R 3.5.0)
 xtable                        1.8-2      2016-02-05 CRAN (R 3.5.0)
 XVector                       0.20.0     2018-05-01 Bioconductor  
 yaml                          2.1.19     2018-05-01 CRAN (R 3.5.0)
 zlibbioc                      1.26.0     2018-05-01 Bioconductor 


ipumin/primirTSS documentation built on June 10, 2020, 9:52 a.m.