View source: R/inference_V2_new.R
mcalrate | R Documentation |
Calculate gene elongation rate for multiple pairs of Pro-seq or Gro-seq data with the LSS (least sum of squares) or HMM (hidden Markov model) method.
mcalrate(
time1files,
time2files,
targetfile = NULL,
gene_ids = NULL,
genomename = "mm10",
times,
strandmethod = 0,
threads = 1,
mergerefs = TRUE,
mergecases = FALSE,
lencutoff = 70000,
fpkmcutoff = 1,
startshorten = 1000,
endshorten = 1000,
window_num = 40,
method = "LSS",
pythonpath = NULL,
hmmseed = 1234,
difftype = 1,
utr = FALSE,
utrexts = NULL,
textsize = 13,
titlesize = 15,
face = "bold"
)
time1files |
The reference Pro-seq/Gro-seq bam files, corresponding to the experimental condition without transcriptional inhibitor treatment. Can be a character vector whose elements give the paths of the bam files. |
time2files |
The treatment Pro-seq/Gro-seq bam files, corresponding to transcriptional inhibitor treatment for specific times (e.g. DRB treatment for 15 min, 30 min, etc.). Should be a character vector whose elements give the paths of the bam files. |
targetfile |
A txt file with the genes whose transcriptional rates need
to be calculated. Should contain columns named as chr, start, end, strand,
and gene_id. It can also be NULL, so that the genes in the genome set by
the parameter |
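As an illustration of the column layout described for targetfile, the sketch below writes and re-reads a small table with the columns chr, start, end, strand, and gene_id. The tab separator, the gene names, and the coordinates are assumptions made for this example; the documentation above only specifies the column names.

```r
# Toy target table; gene names and coordinates are placeholders.
target <- data.frame(
  chr     = c("chr1", "chr2"),
  start   = c(100000, 250000),
  end     = c(220000, 400000),
  strand  = c("+", "-"),
  gene_id = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Write to a temporary txt file (tab-separated is an assumption).
targetfile <- tempfile(fileext = ".txt")
write.table(target, targetfile, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the expected column names are present.
readback <- read.table(targetfile, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
print(names(readback))
```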
gene_ids |
A vector of gene symbols indicating the genes that need to be
analyzed. In addition to |
genomename |
Specify the genome of the genes to be analyzed, when the
parameter |
times |
The treatment time differences between the |
strandmethod |
The strand-specific method used when preparing the sequencing libraries. Can be 1 for the directional ligation method, 2 for the dUTP method, and 0 for non-strand-specific libraries. If the samples are sequenced using a single strand method, set it to 3. |
threads |
Number of threads to use for parallelization. Default is 1. |
mergerefs |
Whether to merge all the reference data contained in the
|
mergecases |
Whether to merge all the treatment data contained in the
|
lencutoff |
The cutoff on gene length (bp). Only genes longer than this cutoff can be considered for analysis. Default is 70000. |
fpkmcutoff |
The cutoff value on gene FPKM. Only genes with an FPKM value greater than the cutoff in the reference data can be considered for analysis. Default is 1. |
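The two cutoffs above can be pictured as a simple row filter on a gene table. The following is a minimal sketch on toy data, not the function's internal code; the data frame, its column names, and the 1-based inclusive length convention are assumptions for illustration.

```r
# Toy gene table; coordinates and FPKM values are made up for illustration.
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD"),
  start   = c(1000, 50000, 200000, 400000),
  end     = c(151000, 90000, 275000, 520000),
  fpkm    = c(5.2, 8.0, 0.4, 1.0),
  stringsAsFactors = FALSE
)

lencutoff  <- 70000  # default from the usage above
fpkmcutoff <- 1      # default from the usage above

# Gene length in bp, assuming 1-based inclusive coordinates.
genes$length <- genes$end - genes$start + 1

# Keep genes strictly longer than lencutoff and with FPKM strictly
# greater than fpkmcutoff.
kept <- genes[genes$length > lencutoff & genes$fpkm > fpkmcutoff, ]
print(kept$gene_id)
```

In this toy table only geneA passes: geneB is too short, geneC has too low an FPKM, and geneD has an FPKM equal to, not greater than, the cutoff.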
startshorten |
Before inferring a gene's transcription rate, its first
1000 bp (or other length) and last 1000 bp (or other length) regions will
be discarded to avoid the unstable reads at the transcription starting and
ending stages. However, these regions' lengths can be changed by setting
this parameter |
endshorten |
Before inferring a gene's transcription rate, its first
1000 bp (or other length) and last 1000 bp (or other length) regions will
be discarded to avoid the unstable reads at the transcription starting and
ending stages. However, these regions' lengths can be changed by setting
this parameter |
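The trimming controlled by startshorten and endshorten can be sketched as below. The helper trim_gene is hypothetical and not part of the documented API, and the strand-aware handling (trimming startshorten from the 5' end of the gene) is an assumption about how the two lengths map to genomic coordinates.

```r
# Hypothetical helper: drop `startshorten` bp from the transcription start
# side and `endshorten` bp from the transcription end side of a gene.
trim_gene <- function(start, end, strand,
                      startshorten = 1000, endshorten = 1000) {
  stopifnot(strand %in% c("+", "-"))
  if (strand == "+") {
    # Transcription runs from `start` to `end`.
    c(start = start + startshorten, end = end - endshorten)
  } else {
    # Transcription runs from `end` to `start`, so the 5' end is at `end`.
    c(start = start + endshorten, end = end - startshorten)
  }
}

plus_gene  <- trim_gene(10000, 90000, "+")
minus_gene <- trim_gene(10000, 90000, "-",
                        startshorten = 2000, endshorten = 500)
print(plus_gene)
print(minus_gene)
```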
window_num |
Before inferring a gene's transcription rate, the function
will divide this gene into 40 bins (or other bin number). For each bin,
the normalized read count ratio between the treatment and the reference
files will be calculated, so a vector with 40 ratios (or other bin number)
will be generated. Then, the LSS or HMM method will be used to find the
transition bin between the gene's transcription inhibited region and the
normal reads region. After that, this identified transition bin and its
downstream neighbor will be merged and expanded to the single-base-pair
level, and the LSS or HMM method will be further used on them to find the
transition base pair in this region. The parameter |
method |
The method to be used for transcription rate inference. The default value is "LSS", so that the least sum of squares method will be used. Can also be "HMM", so that the hidden Markov model will be used. |
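To make the least sum of squares idea concrete, the sketch below fits a two-level step function to a vector of per-bin ratios and picks the split with the smallest total squared error. It is a simplified illustration under stated assumptions, not the package's implementation: the helper lss_transition and the toy ratio vector are hypothetical, and the refinement to single-base-pair level is omitted.

```r
# Hypothetical helper: for every possible split k, model bins 1..k and
# bins (k+1)..n by their own means and sum the squared residuals.
# Returns the k with the least sum of squares, i.e. the last bin of the
# first (depleted) segment.
lss_transition <- function(ratios) {
  n <- length(ratios)
  stopifnot(n >= 2)
  sse <- vapply(seq_len(n - 1), function(k) {
    left  <- ratios[1:k]
    right <- ratios[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  which.min(sse)
}

# Toy vector with 40 bins (the default window_num): low ratios in the
# first 15 bins, ratios near 1 in the remaining 25 bins.
ratios <- c(rep(0.1, 15), rep(1.0, 25))
k <- lss_transition(ratios)
print(k)
```

With real data the ratios are noisy, so the minimum is less sharp than in this idealized step; the same search still applies.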
pythonpath |
The HMM method is based on |
hmmseed |
The HMM method involves random processes, so a random seed should be set via this parameter to make the results reproducible. Default value is 1234; other integers, such as 2023, can also be used. |
difftype |
In most cases, the treatment and reference Pro-seq/Gro-seq
files are from experiments treating cells with transcription inhibitors,
such as DRB (5,6-dichloro-1-beta-d-ribofuranosylbenzimidazole), so that
the normal transcription will be repressed for a specific time, generating
a reads-depleted region upstream of the normal transcription region. For
such inhibitor-based experiments, this parameter |
utr |
In addition to inferring transcription rates from Pro-seq/Gro-seq
data, |
utrexts |
When the former parameter |
textsize |
In addition to returning a data frame to show the inference results, this function will also generate several plots to show them, and the font size for the plot texts is set by this parameter. Default is 13. |
titlesize |
The font size for the plot titles. Default is 15. |
face |
The font face for the plot texts. Default is "bold". |
A list with several sub-lists, each of which includes a slot named "report". This slot is a data frame with the inferred transcription rates, or the genes' proximal and distal polyA sites, as well as other information, such as the genes' coordinates and the significance of the results. Each sub-list also contains other slots, such as "binplots" and "expandplots", which contain the data that can be used to plot the inference results.
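One way to work with such a return value is to pull the "report" slot out of every sub-list and stack the data frames. The sketch below uses a mock object: the sub-list names (pair1, pair2), the report columns (gene_id, rate), and all values are hypothetical placeholders, since the actual names are not given above.

```r
# Mock result mimicking the described structure; names and values are
# placeholders, not real output.
res <- list(
  pair1 = list(
    report      = data.frame(gene_id = c("geneA", "geneB"),
                             rate = c(2.1, 1.8),
                             stringsAsFactors = FALSE),
    binplots    = list(),
    expandplots = list()
  ),
  pair2 = list(
    report      = data.frame(gene_id = "geneA",
                             rate = 2.3,
                             stringsAsFactors = FALSE),
    binplots    = list(),
    expandplots = list()
  )
)

# Extract each "report" data frame and tag its rows with the sub-list name.
reports <- lapply(res, function(x) x$report)
combined <- do.call(
  rbind,
  Map(function(name, df) cbind(pair = name, df), names(res), reports)
)
print(combined)
```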