View source: R/inference_V2_new.R
calrate | R Documentation |
Calculate gene elongation rates from a pair of Pro-seq or Gro-seq datasets, using the LSS (least sum of squares) or HMM (hidden Markov model) method.
calrate(
  time1file,
  time2file,
  targetfile = NULL,
  gene_ids = NULL,
  genomename = "mm10",
  time,
  strandmethod = 1,
  threads = 1,
  lencutoff = 70000,
  fpkmcutoff = 1,
  startshorten = 1000,
  endshorten = 1000,
  window_num = 40,
  method = "LSS",
  pythonpath = NULL,
  hmmseed = 1234,
  difftype = 1,
  utr = FALSE,
  utrexts = NULL,
  textsize = 13,
  titlesize = 15,
  face = "bold"
)
time1file |
The reference Pro-seq/Gro-seq bam file, corresponding to the experimental condition without transcriptional inhibitor treatment. Can be a string giving the path to the bam file, or a GAlignmentPairs object, a GAlignments object, or a GRanges object derived from the original bam file. |
time2file |
The treatment Pro-seq/Gro-seq bam file, corresponding to the experimental condition of transcriptional inhibitor treatment for a specific time (e.g. DRB treatment for 15 min). Can be a string giving the path to the bam file, or a GAlignmentPairs object, a GAlignments object, or a GRanges object derived from the original bam file. |
targetfile |
A txt file with the genes whose transcriptional rates need
to be calculated. Should contain columns named as chr, start, end, strand,
and gene_id. It can also be NULL, so that the genes in the genome set by
the parameter |
gene_ids |
A vector of gene symbols indicating the genes that need to be
analyzed. In addition to |
genomename |
Specify the genome of the genes to be analyzed, when the
parameter |
time |
An integer indicating the inhibitor treatment time difference
between the |
strandmethod |
Indicates the strand-specific method used when preparing the sequencing library: 1 for the directional ligation method, 2 for the dUTP method, and 0 for a non-strand-specific library. In addition, if the sample is sequenced using a single strand method, set it to 3. |
threads |
Number of threads to perform parallelization. Default is 1. |
lencutoff |
The cutoff on gene length (bp). Only genes longer than this cutoff can be considered for analysis. Default is 70000. |
fpkmcutoff |
The cutoff value on gene FPKM. Only genes with an FPKM
value greater than the cutoff in |
startshorten |
Before inferring a gene's transcription rate, its first
1000 bp (or other length) and last 1000 bp (or other length) regions will
be discarded to avoid the unstable reads at the transcription starting and
ending stages. However, these regions' lengths can be changed by setting
this parameter |
endshorten |
Before inferring a gene's transcription rate, its first
1000 bp (or other length) and last 1000 bp (or other length) regions will
be discarded to avoid the unstable reads at the transcription starting and
ending stages. However, these regions' lengths can be changed by setting
this parameter |
window_num |
Before inferring a gene's transcription rate, the function
will divide this gene into 40 bins (or other bin number). For each bin,
the normalized read count ratio between the treatment and the reference
files will be calculated, so a vector with 40 ratios (or other bin number)
will be generated. Then, the LSS or HMM method will be used to find the
transition bin between the gene's transcription inhibited region and the
normal reads region. After that, this identified transition bin and its
downstream neighbor will be merged and expanded to the single-base-pair
level, and the LSS or HMM method will be further used on them to find the
transition base pair in this region. The parameter |
method |
The method to be used for transcription rate inference. The default value is "LSS", so that the least sum of squares method will be used. Can also be "HMM", so that the hidden Markov model will be used. |
pythonpath |
The HMM method is based on |
hmmseed |
The HMM method involves random processes, so a random seed should be set via this parameter to make the results reproducible. Default is 1234; any other integer, such as 2023, can also be used. |
difftype |
In most cases, the treatment and reference Pro-seq/Gro-seq
files are from experiments treating cells with transcription inhibitors,
such as DRB (5,6-dichloro-1-beta-d-ribofuranosylbenzimidazole), so that
the normal transcription will be repressed for a specific time, generating
a reads-depleted region upstream of the normal transcription region. For
such inhibitor-based experiments, this parameter |
utr |
In addition to inferring transcription rates from Pro-seq/Gro-seq
data, |
utrexts |
When the former parameter |
textsize |
In addition to returning a data frame to show the inference results, this function will also generate several plots to show them, and the font size for the plot texts is set by this parameter. Default is 13. |
titlesize |
The font size for the plot titles. Default is 15. |
face |
The font face for the plot texts. Default is "bold". |
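The bin-level search described under window_num can be illustrated with a minimal sketch in base R. This is not the package's implementation: the helper name lss_transition, the toy ratio vector, the 80000 bp gene length, and the final distance-over-time step are all assumptions made for illustration. The sketch fits a two-level step function to the per-bin ratios and picks the split with the least sum of squared residuals.

```r
# Hypothetical helper (not part of proRate): return the index k of the last
# bin of the left segment for the two-segment, piecewise-constant fit with
# the least sum of squared residuals.
lss_transition <- function(ratios) {
  n <- length(ratios)
  stopifnot(n >= 2)
  # Try every split k: bins 1..k form the left segment, k+1..n the right one.
  sse <- vapply(seq_len(n - 1), function(k) {
    left <- ratios[seq_len(k)]
    right <- ratios[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  which.min(sse)
}

# Toy data (assumed): 40 bins, the first 12 depleted by the inhibitor
# (ratio near 0.1), the remaining 28 unaffected (ratio near 1).
set.seed(1)
ratios <- c(rnorm(12, mean = 0.1, sd = 0.02),
            rnorm(28, mean = 1, sd = 0.05))
k <- lss_transition(ratios)

# Coarse bin-level rate estimate under stated assumptions: an 80000 bp gene,
# 1000 bp trimmed from each end, and the transition placed at the end of bin k.
gene_length <- 80000
startshorten <- 1000
endshorten <- 1000
window_num <- 40
time_min <- 15
bin_width <- (gene_length - startshorten - endshorten) / window_num
transition_bp <- startshorten + k * bin_width
rate_bp_per_min <- transition_bp / time_min
```

The function itself goes one step further than this sketch: as described above, it expands the transition bin and its downstream neighbor to single-base-pair resolution and repeats the search there, so its reported rates need not match this coarse estimate.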
A list including a slot named "report", which is a data frame with the inferred transcription elongation rates, or the genes' proximal and distal polyA sites, as well as other information, such as the genes' coordinates and the significance of the results. In addition, the result list contains other slots, such as "binplots" and "expandplots", which contain the data that can be used to plot the inference results.
library(proRate)
wt0file <- system.file("extdata", "wt0.bam", package = "proRate")
wt15file <- system.file("extdata", "wt15.bam", package = "proRate")
wtrates <- calrate(time1file = wt0file,
                   time2file = wt15file,
                   time = 15,
                   strandmethod = 1,
                   genomename = "mm10",
                   lencutoff = 40000,
                   fpkmcutoff = 1,
                   threads = 4,
                   startshorten = 1000,
                   endshorten = 1000,
                   window_num = 40,
                   method = "LSS",
                   pythonpath = NULL,
                   difftype = 1)
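To restrict the analysis to specific genes, a target file with the columns chr, start, end, strand, and gene_id can be prepared beforehand. The sketch below is only an illustration: the gene names and coordinates are made up, and the tab-separated layout with a header row is an assumption about the expected file format rather than something stated on this page.

```r
# Hypothetical gene table using the column names listed for targetfile.
targets <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(4807788L, 10000000L),
  end = c(4886770L, 10090000L),
  strand = c("+", "-"),
  gene_id = c("GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Write it to a temporary txt file. Tab separation and a header row are
# assumptions about the layout the function expects.
targetpath <- file.path(tempdir(), "targets.txt")
write.table(targets, targetpath, sep = "\t", quote = FALSE,
            row.names = FALSE)

# Read the file back to confirm the columns survive the round trip.
check <- read.table(targetpath, header = TRUE, sep = "\t",
                    colClasses = c("character", "integer", "integer",
                                   "character", "character"))

# Both genes are longer than the default lencutoff of 70000 bp.
long_enough <- (check$end - check$start) > 70000

# With the package and bam files available, the path could then be passed
# on (left commented out here because it needs proRate and real data):
# targetrates <- calrate(time1file = wt0file, time2file = wt15file,
#                        targetfile = targetpath, time = 15)
# head(targetrates$report)
```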