Package: pulseTD
Type: Package
Author: XinWang
Maintainer: The package maintainer xindd_2014@163.com
Title: Identification of Transcriptional Dynamics using Pulse Models via 4su-Seq Data and RNA-Seq Data
Version: 0.1.0
Date: 2019-5-5
Description: A tool for analyzing the transcription, processing and degradation rates of genes by 4sU-seq (the Metabolic Label 4-thiouridine) data and RNA-seq (RNA sequencing) data. It can not only recognize the transcriptional dynamic rates at the measurement time points, but also obtain continuous changes in transcriptional dynamics. More importantly, it is able to predict the trend of mRNA (Mature RNA) transcription and expression changes in the future.
License: GPL-2
Encoding: UTF-8
Depends: R (>= 3.4.0)
Imports: AnnotationDbi,SummarizedExperiment,Rsamtools,Biobase,S4Vectors,methods,parallel,GenomicFeatures, ggplot2, grid, GenomicAlignments
URL: https://github.com/bioWzz/pulseTD_0.1.0
RoxygenNote: 6.1.1
Suggests: knitr, rmarkdown, TxDb.Hsapiens.UCSC.hg19.knownGene
VignetteBuilder: knitr
The life cycle of intracellular mRNA mainly undergoes transcriptional production, splicing maturation and degradation processes. We refer to the dynamic changes of these processes over time as transcriptional dynamics. Under the influence of external disturbances and other factors, the common regulation of transcriptional dynamic processes leads to different levels of RNA expression. The emergence of labeled RNA (4sU-RNA) has made it possible for us to analyze this transcriptional dynamics. The pulseTD package is analyzed with 4sU-seq data to resolve the transcriptional dynamics of the gene. PulseTD constructed based on pulse model can identify the transcriptional dynamic rate of the measurement time node, and can also recognize the continuous change of mRNA transcription dynamics during the monitoring time, and the model can also predict the trend of mRNA transcription and expression changes in the future.
The workflow of the pulseTD packages is:
In the process of calculating the expression value, we use the RPKM calculation method. The user only needs to provide a list of the bam files of the reads alignment result, and the annotation file in the txdb format.The expression value can be obtained by using the estimateExpression() function. Test files are provided in the pulseTD package.
library(pulseTD) library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene test_path <- file.path(system.file(package="pulseTD"),'extdata/test1.sorted.bam') test_path2 <- file.path(system.file(package="pulseTD"),'extdata/test2.sorted.bam')
rpkmres <- estimateExpression(txdb,c(test_path,test_path2), by='gene')
data('rpkmres', package='pulseTD') head(rpkmres$total_exp) head(rpkmres$pre_exp)
pulseTD uses a pulse model, which is multiplied by two sigmoid functions, the parameter vector $(h_{0},h_{1},h_{2},t_{1}, t_{2},\beta)$, where $h_{0},h_{1},h_{2}$ represent the initial state rate, the value, the peak rate value and the steady state rate value again, $t_{1}, t_{2}$ are the maximum times of the first and second rise or fall changes, respectively, and $\beta$ is the slope of the two changes. The unknown vector $X=(\Theta_{\alpha},\Theta_\gamma,\Theta _\beta)$ has 15 parameters. We use the R stats function nlminb to optimize.
data('rpkmSim', package='pulseTD') rpkm_TL <- rpkmSim$labexon[1:2,] rpkm_PT <- rpkmSim$totintr[1:2,] rpkm_TT <- rpkmSim$totexon[1:2,] TimeGrid <- c(0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180) tL <- 10 pulseRates<- estimateParams(rpkm_TL, rpkm_TT, rpkm_PT,TimeGrid, tL, clusterNumber=1,loopnumber=10)
In the process of parameter fitting, we adopt the method of random initial value. Due to the size of the random initial value or the overflow of the number of iterations, the parameters of the fitting failure may occur, which may cause the data of some genes lose meaning. Not conducive to our subsequent analysis. To this end, we re-estimate the genes that failed to fit, and if they fail again, these genes will be filtered out. Only need to use the correctionParams function to complete the re-evaluation of the parameters.
data('pulseRates', package='pulseTD') pulseRates_correct = correctionParams(pulseRates) pulseRates_correct@fitfailure
If we need to analyze the steady state characteristics of the transitional dynamics, we can obtain the pulse function parameters corresponding to the transcription, processing and degradation of each gene, and also use the pulseModel to obtain the corresponding rate curve.
data('pulseRates', package='pulseTD') TimeGrid <- c(0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180) pulseRates_correct <- correctionParams(pulseRates) transcription_params = getParams(pulseRates,'transcription') degradation_params = getParams(pulseRates, 'degradation') processing_params = getParams(pulseRates, 'processing') head(transcription_params) head(degradation_params) head(processing_params) transcription_params = getParams(pulseRates, 'transcription', genename=c(1,2,3)) head(transcription_params) # get pulse Model value transcription_pulse = pulseModel(as.matrix(transcription_params[1,]), TimeGrid) degradation_pulse = pulseModel(as.matrix(degradation_params[1,]), TimeGrid) processing_pulse = pulseModel(as.matrix(processing_params[1,]), TimeGrid)
The pulseTD method can recognize the transcription, splicing maturation, degradation rate and predict the future trend of transcriptional dynamic rate at any time during the detection period.
This function is used to calculate the transcriptional dynamic rate of a gene. You can get the discrete or continuous rate values of the measurement time node. At the same time, it has a predictive function that provides rate values for any future time node or any range of time.
data('pulseRates', package='pulseTD') pulseRates_correct <- correctionParams(pulseRates) transcription = getRates(pulseRates_correct,'transcription') degradation = getRates(pulseRates_correct, 'degradation') processing = getRates(pulseRates_correct, 'processing') head(transcription)
If you want to get the coefficients of the parameters, you only need to divide the expression on the basis of the rate:
if(length(pulseRates_correct@fitfailure)==0){ genename=pulseRates_correct@genenames }else{ genename=pulseRates_correct@genenames[-pulseRates_correct@fitfailure] } data('rpkmSim', package='pulseTD') simTL <- rpkmSim$labexon[c(1,2,3),] simPT <- rpkmSim$totintr[c(1,2,3),] simTT <- rpkmSim$totexon[c(1,2,3),] trans_factor <- getRates(pulseRates_correct,'transcription', genename=c(1,2,3)) degr_factor <- getRates(pulseRates_correct, 'degradation', genename=c(1,2,3)) /(simTT-simPT) proc_factor <- getRates(pulseRates_correct,'processing',genename=c(1,2,3))/simPT head(degr_factor)
Predicting the transcription dynamic rate requires only adding a specific time series to the getRates function, for example:
transcription_pre <- getRates(pulseRates_correct, 'transcription', timevector=seq(0,360, 15)) degradation_pre <- getRates(pulseRates_correct, 'degradation', timevector=seq(0,360, 15)) processing_pre <- getRates(pulseRates_correct, 'processing', timevector <- seq(0,360, 15)) head(degradation_pre)
data('pulseRates', package='pulseTD') plotRates(pulseRates, 15)
plotRates(pulseRates, 15, predict=c(0,360,15))
This function is used to predict the expression of all gene at a given time, including the expression of pre-mRNA and the expression of total mRNA. End time and time interval can be arbitrarily defined
data('pulseRates', package='pulseTD') pulseRates_correct = correctionParams(pulseRates) TimeGrid = seq(0,180,15)
preExp = predictExpression(pulseRates_correct, tg=TimeGrid)
data('preExp', package='pulseTD') df = data.frame(preExp[['NM_001002011']]) head(df)
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.