buildRefPACdsAnno: Build a reference annotation of PACdataset
In BMILAB/movAPA: movAPA: Modeling and Visualization of Dynamics of Alternative PolyAdenylation

buildRefPACdsAnno

R Documentation

Build a reference annotation of PACdataset

Description

buildRefPACdsAnno builds a reference annotation of PACdataset, which can be used with 'mergePACdsByRef'. First, high-quality pAs that meet the min.counts/min.smps/max.width thresholds are selected from PACdsList. Then the selected pAs and refPACds are merged into a single PACds, grouping pAs within 'd' nt using by=coord/range. If a merged pA is combined from multiple sources, its final range is chosen by priority: source=ref >> max.counts >> min.width >> most distant one.
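
The priority rule above can be illustrated with a minimal base-R sketch. It does not call movAPA and is not the package's implementation: the `cand` table and the `pick_range` helper are hypothetical, width is assumed to be UPA_end - UPA_start + 1, and the final "most distant one" tiebreak is left out because the text does not define it further.

```r
# Toy candidates that were merged into one pA (made-up values, not movAPA output)
cand <- data.frame(
  source    = c("ds2", "ref", "ds1"),
  counts    = c(120, 40, 300),
  UPA_start = c(1000, 1005, 990),
  UPA_end   = c(1030, 1020, 1040),
  stringsAsFactors = FALSE
)
# Assumption: width is the inclusive span of the UPA range
cand$width <- cand$UPA_end - cand$UPA_start + 1

# Hypothetical helper: rank by ref first, then highest counts, then smallest width
pick_range <- function(x) {
  ord <- order(x$source != "ref", -x$counts, x$width)
  x[ord[1], ]
}

best       <- pick_range(cand)                          # ref wins despite low counts
best_noref <- pick_range(cand[cand$source != "ref", ])  # otherwise highest counts wins
print(best)
print(best_noref)
```

With these toy values the ref candidate is chosen first; once it is removed, the candidate with the highest counts is chosen.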

Usage

buildRefPACdsAnno(
  refPACds,
  PACdsList = NULL,
  d = 24,
  by = "coord",
  min.counts = 10,
  min.smps = 10,
  max.width = 100,
  verbose = TRUE
)

Arguments

`refPACds`	a reference PACds, usually a high-confidence PACds such as polyA sites from 3'seq data. refPACds serves as the base reference, to which further high-confidence pAs from the other sources in PACdsList are added.
`PACdsList`	a PACdataset, a list of multiple PACdataset objects, or NULL (default). Each PACds@anno should have the columns chr/strand/coord. If NULL, refPACds is only reduced by 'd' nt. min.counts, min.smps, and max.width are used only when PACdsList is not NULL. The names of the list are used for the 'source' column in the final refPACds@anno, which records the source of each pA.
`d`	distance (in nt) within which nearby pAs are grouped; default is 24.
`by`	a character string, either "coord" or another value. If "coord", the pA's coord is used for merging; otherwise UPA_start and UPA_end in the anno slot are used for merging pAs.
`min.counts`	pAs with total tagnum >=min.counts are retained from PACdsList for building the reference. Set it to NULL to skip filtering by counts.
`min.smps`	pAs expressed in >=min.smps samples (columns of counts) are retained from PACdsList for building the reference. Set it to NULL to skip filtering by number of samples.
`max.width`	pAs spanning a region <=max.width are retained from PACdsList for building the reference. Set it to NULL to skip filtering by pA width. If all three parameters (min.counts, min.smps, max.width) are NULL, then all pAs in PACdsList are used for building the reference. This is useful when you just want to merge all pAs' ranges in a smarter way than calling `mergePACds`.
`verbose`	TRUE to show message on screen.
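
As a rough illustration of how the three thresholds combine, here is a base-R sketch on a toy count matrix. It is not movAPA code: the `counts` and `anno` objects are made up, "expressed" is assumed to mean a count above zero, and width is assumed to be UPA_end - UPA_start + 1.

```r
# Toy data: 3 pAs x 3 samples (made-up values)
counts <- matrix(c(20, 0,  5,
                    3, 4,  6,
                    0, 0, 50),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("pA1", "pA2", "pA3"), c("s1", "s2", "s3")))
anno <- data.frame(UPA_start = c(100, 500, 900),
                   UPA_end   = c(120, 700, 910),
                   row.names = rownames(counts))

min.counts <- 10
min.smps   <- 2
max.width  <- 100

# A NULL threshold skips the corresponding filter, as described above
keep <- rep(TRUE, nrow(counts))
if (!is.null(min.counts)) keep <- keep & rowSums(counts) >= min.counts
if (!is.null(min.smps))   keep <- keep & rowSums(counts > 0) >= min.smps   # assumption: expressed = count > 0
if (!is.null(max.width))  keep <- keep & (anno$UPA_end - anno$UPA_start + 1) <= max.width

kept <- rownames(counts)[keep]
print(kept)
```

In this toy case pA2 fails the width filter and pA3 is expressed in only one sample, so only pA1 passes all three filters.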

Value

A PACdataset representing the reference pAs. The counts slot stores the counts of the samples finally used, and is for reference only. The colData slot is simply set to 'group', which is meaningless. The anno slot contains these columns: chr, strand, coord, UPA_start, UPA_end, source, counts, merged_start, merged_end. The 'source' column is the data source of the reference pA: 'ref' for refPACds, and the other names for the other sources in PACdsList. merged_start and merged_end give the larger range of a pA that may be combined from multiple sources, while UPA_start and UPA_end give the smaller range retrieved from a single source (with ref having first priority).
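
To make the relation between the grouping distance and the merged range concrete, here is a small base-R sketch. It is only an illustration under assumptions: the `pa` table is made up for a single chr and strand, pAs are chained into one group whenever the gap to the previous coord is at most d, and the merged range is taken as the smallest start and largest end in the group. The package may compute these differently.

```r
d <- 24

# Toy pAs on one chr/strand (made-up values)
pa <- data.frame(
  source    = c("ref", "ds1", "ds2", "ds1"),
  coord     = c(1000, 1010, 1030, 2000),
  UPA_start = c(995, 1000, 1020, 1990),
  UPA_end   = c(1005, 1015, 1035, 2005),
  stringsAsFactors = FALSE
)
pa <- pa[order(pa$coord), ]

# Assumption: start a new group whenever the gap to the previous coord exceeds d
grp <- cumsum(c(TRUE, diff(pa$coord) > d))

# Assumption: the merged range spans all member ranges of a group
merged_start <- tapply(pa$UPA_start, grp, min)
merged_end   <- tapply(pa$UPA_end, grp, max)

print(data.frame(group = names(merged_start),
                 merged_start = as.vector(merged_start),
                 merged_end   = as.vector(merged_end)))
```

Here the first three pAs fall into one group with a merged range of 995 to 1035, while the pA at 2000 stays on its own.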

Examples

## make example pacds
ds1=makeExamplePACds(seed=123)
## only use ds1 to build ref (which means only reduce ranges by d=24nt)
ref=buildRefPACdsAnno(refPACds=ds1, by='coord', d=24)
table(ref@anno$source)

## make a modified copy: shift the first three pAs by 1000 nt and reset their ranges
ds2=ds1
ds2@anno$coord[1:3]=ds2@anno$coord[1:3]-1000
ds2@anno$UPA_start[1:3]=ds2@anno$coord[1:3]-10
ds2@anno$UPA_end[1:3]=ds2@anno$coord[1:3]+5

## using refPACds and high-quality pAs from PACdsList to build ref
ref=buildRefPACdsAnno(refPACds=ds1, PACdsList=list(ds1=ds1, ds2=ds2),
                      by='coord', d=24,
                      min.counts = 50, min.smps=3, max.width=100,
                      verbose=TRUE)
table(ref@anno$source)


## no pAs in PACdsList pass the high-quality filters, so only pAs from the reduced refPACds are output
ref2=buildRefPACdsAnno(refPACds=ds2, PACdsList=list(ds1, ds2),
                       by='range', d=0,
                       min.counts = 5000, min.smps=3, max.width=50,
                       verbose=TRUE)
table(ref2@anno$source)

BMILAB/movAPA documentation built on Jan. 3, 2024, 11:09 p.m.