setup_parCPsSearch: Prepare data for predicting cleavage and polyadenylation (CP)...

View source: R/11.setup_CPsSearch.R

setup_parCPsSearch    R Documentation

Prepare data for predicting cleavage and polyadenylation (CP) sites using parallel computing

Description

Prepare data for predicting cleavage and polyadenylation (CP) sites using parallel computing

Usage

setup_parCPsSearch(
  sqlite_db,
  genome = getInPASGenome(),
  utr3,
  seqnames,
  background = c("same_as_long_coverage_threshold", "1K", "5K", "10K", "50K"),
  TxDb = getInPASTxDb(),
  future.chunk.size = 1,
  chr2exclude = getChr2Exclude(),
  hugeData = TRUE,
  outdir = getInPASOutputDirectory(),
  silence = FALSE,
  minZ = 2,
  cutStart = 10,
  MINSIZE = 10,
  coverage_threshold = 5
)

Arguments

sqlite_db

A path to the SQLite database for InPAS, i.e. the output of setup_sqlitedb().

genome

An object of class BSgenome::BSgenome.

utr3

An object of class GenomicRanges::GRangesList, the output of extract_UTR3Anno().

seqnames

A character vector, the names of all chromosomes/scaffolds with both coverage and 3' UTR annotation. Users can get this by calling get_chromosomes().

background

A character(1) vector, the range used to calculate the cutoff threshold of the local background. It can be "same_as_long_coverage_threshold", "1K", "5K", "10K", or "50K".
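The Usage block lists all five choices as the default for background. In R this pattern is commonly resolved with match.arg(), which picks the first choice when the argument is omitted and rejects anything not in the list. The sketch below illustrates that idiom with a hypothetical helper, resolve_background(); whether setup_parCPsSearch() resolves the argument exactly this way is an assumption, not something this page states.

```r
# Hypothetical helper (not part of InPAS) showing the match.arg() idiom
# for an argument whose default is a vector of allowed choices.
resolve_background <- function(
    background = c("same_as_long_coverage_threshold",
                   "1K", "5K", "10K", "50K")) {
  # Returns the first choice if 'background' is missing; otherwise
  # validates the supplied value against the allowed choices.
  match.arg(background)
}

default_choice <- resolve_background()       # argument omitted
explicit_choice <- resolve_background("5K")  # one valid choice supplied

# An unsupported value raises an error instead of being silently accepted.
invalid_choice <- try(resolve_background("2K"), silent = TRUE)
```

Under this idiom, passing a single string such as "10K" is the safest way to state the intended background range explicitly.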

TxDb

An object of class GenomicFeatures::TxDb.

future.chunk.size

The average number of elements per future ("chunk"). If Inf, then all elements are processed in a single future. If NULL, then argument future.scheduling = 1 is used by default. Users can set future.chunk.size to the total number of elements divided by the number of cores set for the backend. See the future.apply package for details.
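The suggested setting (total number of elements divided by the number of cores) can be computed as in the minimal sketch below. The element and worker counts are made-up values, and the split() call only illustrates what an even partition looks like; the actual chunking is done by future.apply and may differ in detail. The last part runs only if future.apply is installed.

```r
# Hypothetical numbers: 24 chromosomes/scaffolds and 4 workers.
n_elements <- 24L
n_workers <- 4L

# Average number of elements per future, rounded up so that no more
# chunks than workers are created.
chunk_size <- ceiling(n_elements / n_workers)

# Illustration only: an even partition of the element indices.
chunks <- split(seq_len(n_elements),
                ceiling(seq_len(n_elements) / chunk_size))

# Optional: pass the same value to future.apply::future_lapply().
if (requireNamespace("future.apply", quietly = TRUE)) {
  squares <- future.apply::future_lapply(
    seq_len(n_elements),
    function(i) i^2,
    future.chunk.size = chunk_size
  )
}
```

With the default future.chunk.size = 1, each element is processed in its own future, which gives finer load balancing at the cost of more scheduling overhead.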

chr2exclude

A character vector, NA or NULL, specifying chromosomes or scaffolds to be excluded from InPAS analysis. chrM and alternative scaffolds representing different haplotypes should be excluded.

hugeData

A logical(1) vector, indicating whether the input is huge data.

outdir

A character(1) vector, a path with write permission for storing InPAS analysis results. If it doesn't exist, it will be created.

silence

A logical(1) vector, indicating whether to report progress or not. By default it doesn't report progress.

minZ

A numeric(1) vector, the Z-score cutoff value.

cutStart

An integer(1) or numeric(1) vector. What percentage or how many nucleotides should be removed from the 5' extremities before searching for CP sites? It can be a decimal between 0 and 1, or an integer greater than 1. For example, 0.1 means cut the first 10 percent, and 25 means cut the first 25 bases.
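The two interpretations of cutStart can be made concrete with a small sketch. The helper n_bases_to_cut() and the 3' UTR lengths are hypothetical, and rounding the fractional case down with floor() is an assumption; this page does not say how InPAS rounds.

```r
# Hypothetical helper (not part of InPAS): translate cutStart into a
# number of bases to remove from the 5' end of a region of given length.
n_bases_to_cut <- function(cutStart, region_length) {
  stopifnot(length(cutStart) == 1L, cutStart >= 0)
  if (cutStart < 1) {
    # Fraction: cut that share of the region (rounding down is assumed).
    floor(region_length * cutStart)
  } else {
    # Count: cut that many bases, but never more than the region holds.
    min(cutStart, region_length)
  }
}

cut_fraction <- n_bases_to_cut(0.1, region_length = 1000)  # 10 percent
cut_count <- n_bases_to_cut(25, region_length = 1000)      # 25 bases
```

Because a fraction scales with 3' UTR length while a count does not, the two forms behave differently on short 3' UTRs; the default of 10 is a fixed count.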

MINSIZE

An integer(1) vector, specifying the minimal length in bp of a short/proximal 3' UTR. Default, 10.

coverage_threshold

An integer(1) vector, specifying the cutoff threshold of coverage for the first 100 nucleotides. If the coverage of the first 100 nucleotides is lower than coverage_threshold, that transcript will not be considered for further analysis. Default, 5.
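The filter described for coverage_threshold can be sketched as below. The helper passes_coverage_threshold() is hypothetical, the coverage vectors are made-up data, and summarizing the first 100 nucleotides by their mean is an assumption; this page does not state which summary InPAS uses.

```r
# Hypothetical helper (not part of InPAS): keep a transcript only if the
# coverage over its first n nucleotides reaches the threshold.
passes_coverage_threshold <- function(coverage, coverage_threshold = 5,
                                      n = 100) {
  # Use at most the first n positions; shorter vectors are used whole.
  head_cov <- coverage[seq_len(min(n, length(coverage)))]
  # Assumption: the first n positions are summarized by their mean.
  mean(head_cov) >= coverage_threshold
}

# Made-up per-base coverage for two 300-nt regions.
well_covered <- c(rep(12, 100), rep(3, 200))
poorly_covered <- c(rep(2, 100), rep(30, 200))

keep_first <- passes_coverage_threshold(well_covered)
keep_second <- passes_coverage_threshold(poorly_covered)
```

In this sketch the second region is dropped even though its downstream coverage is high, because only the first 100 nucleotides are inspected.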

Value

A list of lists, as described below:

background

The type of method used for background coverage calculation

z2s

Z-score cutoff thresholds for each 3' UTR

depth.weight

A named vector containing depth weights

chr.cov.merge

A list of matrices storing condition/sample-specific coverage for 3' UTRs and next.exon.gap (if it exists)

conn_next_utr3

A logical vector, indicating whether a 3' UTR has a convergent 3' UTR of its downstream transcript

chr.utr3

A GRangesList, storing the extracted 3' UTR annotation of transcripts on a given chromosome

Author(s)

Jianhong Ou, Haibo Liu


jianhong/InPAS documentation built on Jan. 3, 2025, 10:29 p.m.