View source: R/11.setup_CPsSearch.R
setup_CPsSearch | R Documentation |
prepare data for predicting cleavage and polyadenylation (CP) sites
setup_CPsSearch(
sqlite_db,
genome = getInPASGenome(),
chr.utr3,
seqname,
background = c("same_as_long_coverage_threshold", "1K", "5K", "10K", "50K"),
TxDb = getInPASTxDb(),
hugeData = TRUE,
outdir = getInPASOutputDirectory(),
silence = FALSE,
minZ = 2,
cutStart = 10,
MINSIZE = 10,
coverage_threshold = 5
)
sqlite_db |
A path to the SQLite database for InPAS, i.e. the output of
|
genome |
An object of BSgenome::BSgenome |
chr.utr3 |
An object of GenomicRanges::GRanges, an element of the
output of |
seqname |
A character(1), the name of a chromosome/scaffold |
background |
A character(1) vector, the range for calculating cutoff threshold of local background. It can be "same_as_long_coverage_threshold", "1K", "5K","10K", or "50K". |
TxDb |
an object of GenomicFeatures::TxDb |
hugeData |
A logical(1) vector, indicating whether it is huge data |
outdir |
A character(1) vector, a path with write permission for storing InPAS analysis results. If it doesn't exist, it will be created. |
silence |
report progress or not. By default it doesn't report progress. |
minZ |
A numeric(1), a Z score cutoff value |
cutStart |
An integer(1) vector a numeric(1) vector. What percentage or how many nucleotides should be removed from 5' extremities before searching for CP sites? It can be a decimal between 0, and 1, or an integer greater than 1. 0.1 means 10 percent, 25 means cut first 25 bases |
MINSIZE |
A integer(1) vector, specifying the minimal length in bp of a short/proximal 3' UTR. Default, 10 |
coverage_threshold |
An integer(1) vector, specifying the cutoff threshold of coverage for first 100 nucleotides. If the coverage of first 100 nucleotides is lower than coverage_threshold, that transcript will be not considered for further analysis. Default, 5. |
A file storing a list as described below:
The type of methods for background coverage calculation
Z-score cutoff thresholds for each 3' UTRs
A named vector containing depth weight
A matrix storing condition/sample-specific coverage for 3' UTR and next.exon.gap (if exist)
A logical vector, indicating whether a 3'UTR has a convergent 3' UTR of its downstream transcript
A GRangesList, storing extracted 3' UTR annotation of transcript on a given chr
Jianhong Ou, Haibo Liu
if (interactive()) {
library(BSgenome.Mmusculus.UCSC.mm10)
library("TxDb.Mmusculus.UCSC.mm10.knownGene")
genome <- BSgenome.Mmusculus.UCSC.mm10
TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene
## load UTR3 annotation and convert it into a GRangesList
data(utr3.mm10)
utr3 <- split(utr3.mm10, seqnames(utr3.mm10), drop = TRUE)
bedgraphs <- system.file("extdata", c(
"Baf3.extract.bedgraph",
"UM15.extract.bedgraph"
),
package = "InPAS"
)
tags <- c("Baf3", "UM15")
metadata <- data.frame(
tag = tags,
condition = c("Baf3", "UM15"),
bedgraph_file = bedgraphs
)
outdir <- tempdir()
write.table(metadata,
file = file.path(outdir, "metadata.txt"),
sep = "\t", quote = FALSE, row.names = FALSE
)
sqlite_db <- setup_sqlitedb(
metadata = file.path(
outdir,
"metadata.txt"
),
outdir
)
addLockName(filename = tempfile())
coverage <- list()
for (i in seq_along(bedgraphs)) {
coverage[[tags[i]]] <- get_ssRleCov(
bedgraph = bedgraphs[i],
tag = tags[i],
genome = genome,
sqlite_db = sqlite_db,
outdir = outdir,
chr2exclude = "chrM"
)
}
data4CPsitesSearch <- setup_CPsSearch(sqlite_db,
genome,
chr.utr3 = utr3[["chr6"]],
seqname = "chr6",
background = "10K",
TxDb = TxDb,
hugeData = TRUE,
outdir = outdir
)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.