mergePACds: Merge multiple PACdatasets
In BMILAB/movAPA: movAPA: Modeling and Visualization of Dynamics of Alternative PolyAdenylation

Description

mergePACds groups nearby PACs from one or more PACdataset objects, with or without a reference PACds. It first reduces all pA regions lying within d nt of each other into a single GRanges object, and then counts each sample over the merged ranges. Two ways of reducing the GRanges are provided: with refPACds or without. This function is suitable for both bulk and single-cell data.

mergePACds is particularly useful for grouping nearby cleavage sites into PACs. It is also useful when you have multiple PA or PAC files, each from one sample, and need to merge the resulting PACds into one PACds for differential expression (DE) or other analyses. After grouping and/or merging, you may need to call annotatePAC to annotate the merged PACs with a GFF annotation.
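The grouping idea can be pictured with the base-R sketch below. It is a hypothetical helper (groupNearbySites is not part of movAPA) and assumes that "within d nt" means the gap between a site and the end of the current group is at most d; movAPA itself works on GRanges, and its exact gap rule may differ. Passing start = end = coord mimics by = "coord", while passing UPA_start/UPA_end mimics range-based merging.

```r
## Hypothetical sketch, NOT movAPA's implementation: group sites on the same
## chr/strand whose intervals lie within d nt of each other, then sum counts.
groupNearbySites <- function(chr, strand, start, end, counts, d = 24) {
  ## order sites by chromosome, strand and start position
  o <- order(chr, strand, start)
  chr <- chr[o]; strand <- strand[o]; start <- start[o]; end <- end[o]
  counts <- counts[o, , drop = FALSE]
  n <- length(start)
  grp <- integer(n)
  g <- 1L
  grp[1] <- g
  curEnd <- end[1]
  for (i in seq_len(n)[-1]) {
    sameBlock <- chr[i] == chr[i - 1] && strand[i] == strand[i - 1]
    ## open a new group on a new chr/strand or when the gap exceeds d
    if (!sameBlock || start[i] - curEnd > d) {
      g <- g + 1L
      curEnd <- end[i]
    } else {
      curEnd <- max(curEnd, end[i])
    }
    grp[i] <- g
  }
  first <- !duplicated(grp)
  anno <- data.frame(chr = chr[first], strand = strand[first],
                     UPA_start = as.vector(tapply(start, grp, min)),
                     UPA_end = as.vector(tapply(end, grp, max)))
  ## sum each sample's counts within every merged group
  list(anno = anno, counts = rowsum(counts, grp))
}

## three sites on chr1 (+); the first two are 10 nt apart
cnt <- matrix(c(5, 3, 2, 7, 1, 4), ncol = 2,
              dimnames = list(NULL, c("s1", "s2")))
res <- groupNearbySites(chr = rep("chr1", 3), strand = rep("+", 3),
                        start = c(100, 110, 200), end = c(100, 110, 200),
                        counts = cnt, d = 24)
```

Here the sites at 100 and 110 fall into one group spanning 100 to 110, the site at 200 stays alone, and the total count is unchanged, matching the read-conservation check in the examples.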

Usage

mergePACds(PACdsList, d = 24, by = "coord", refPACds = NULL)

Arguments

`PACdsList`	a PACdataset, or a list of multiple PACdataset objects. The anno slot (PACds@anno) must have the columns chr, strand, and coord. If a PACds has no colData, its sample label is set to groupN. If PACdsList is a single PACdataset, it is treated as PAs (cleavage sites), and nearby PAs are grouped into PACs.
`d`	distance (in nt) within which nearby PACs are grouped; the default is 24 nt.
`by`	a character string, either "coord" or any other string (e.g. "range"). If "coord", each PA's coord is used for merging; otherwise, the UPA_start and UPA_end columns in the anno slot are used for merging PAs.
`refPACds`	a reference PACds used to guide the merging of PACdsList. Providing refPACds is useful when merging multiple large pA lists, because it prevents the generation of pAs that span very wide ranges. If reference pAs from 3'-seq are available, using them is recommended. It is also recommended to call `buildRefPACdsAnno` with both refPACds and PACdsList (setting min.cells/min.counts/max.width) to obtain a high-confidence pA list as the reference. If no refPACds from 3'-seq is available, you are still encouraged to call `buildRefPACdsAnno` with PACdsList alone (setting one of the PACds as the ref) to obtain high-quality pAs as the reference.
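Why a reference limits the width of merged pAs can be illustrated with a simple model. Without a reference, a chaining rule lets sites that are each within d nt of the next grow into one wide cluster (sites at 90, 110, 130 and 150 with d = 24 would form a single 60-nt cluster). If instead each query pA snaps to the nearest reference pA within d nt, clusters cannot grow by chaining. The helper assignToRef below is hypothetical and only sketches this interpretation; it is not movAPA's algorithm.

```r
## Hypothetical sketch, NOT movAPA's algorithm: assign each query pA to the
## nearest reference pA on the same chr/strand if it lies within d nt;
## otherwise the query keeps its own coordinate.
assignToRef <- function(qChr, qStrand, qCoord, rChr, rStrand, rCoord, d = 24) {
  out <- qCoord
  for (i in seq_along(qCoord)) {
    same <- rChr == qChr[i] & rStrand == qStrand[i]
    if (!any(same)) next
    dist <- abs(rCoord[same] - qCoord[i])
    j <- which.min(dist)
    ## snap to the reference coordinate only when it is close enough
    if (dist[j] <= d) out[i] <- rCoord[same][j]
  }
  out
}

## reference pAs at 100 and 150; query pAs at 90, 120, 160 and 300
res <- assignToRef(rep("chr1", 4), rep("+", 4), c(90, 120, 160, 300),
                   c("chr1", "chr1"), c("+", "+"), c(100, 150), d = 24)
```

The query pAs at 90 and 120 map to 100, 160 maps to 150, and 300 has no reference within 24 nt, so it keeps its own coordinate.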

Value

A merged PACdataset. The counts slot stores the counts of all merged samples. If any sample names are duplicated across the PACdataset objects, a suffix .i is appended to the name of each sample in PACds[[i]]. The colData slot stores the merged sample annotation, taken from the first column of each @colData. The anno slot contains the columns chr, strand, coord, UPA_start, and UPA_end. Note: the tottag, nPA, and maxtag columns, which were output by previous versions of movAPA (and by the current function mergePACds_v0), are no longer output.
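The sample-renaming rule can be sketched as follows. This is a hypothetical helper (suffixSampleNames is not a movAPA function), and it assumes that once any name is duplicated, every sample of the i-th object receives the suffix .i; the package may apply the rule differently.

```r
## Hypothetical sketch of the renaming rule (assumption: if any sample name is
## duplicated across objects, every sample of the i-th object gets ".i").
suffixSampleNames <- function(nameList) {
  allNames <- unlist(nameList, use.names = FALSE)
  ## no clash: keep the original names
  if (!anyDuplicated(allNames)) return(allNames)
  unlist(lapply(seq_along(nameList),
                function(i) paste0(nameList[[i]], ".", i)),
         use.names = FALSE)
}

renamed <- suffixSampleNames(list(c("A", "B"), c("A", "C")))
kept <- suffixSampleNames(list("A", "B"))
```

With the clash on "A", the names become A.1, B.1, A.2 and C.2; without a clash, the names are kept as they are.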

Examples

## make example pacds
ds1=makeExamplePACds(seed=123)
## create ds2 by slightly shifting the first three PACs of ds1
ds2=ds1
ds2@anno$coord[1:3]=ds2@anno$coord[1:3]-1
ds2@anno$UPA_start[1:3]=ds2@anno$UPA_start[1:3]-10
ds2@anno$UPA_end[1:3]=ds2@anno$UPA_end[1:3]+5

newc=ds2@anno[1:3, ]$coord
oldc=ds1@anno[1:3, ]$coord

### merge two pacds by coord
## without using refPACds
p1=mergePACds(list(ds1, ds2), d=0, by='coord')

## merge with refPACds ds2
## p2 uses ds2 as ref,
## so the final output will use ds2's info if a merged pA comes from both ds1 and ds2
p2=mergePACds(PACdsList=list(ds1, ds2), refPACds=ds2, d=0, by='coord')

## use ds1 as ref
p3=mergePACds(PACdsList=list(ds1, ds2), refPACds=ds1, d=0, by='coord')

summary(p1)
summary(p2)
summary(p3)

## the width of the final PA ranges
summary(p1@anno$UPA_end-p1@anno$UPA_start)
summary(p2@anno$UPA_end-p2@anno$UPA_start)
summary(p3@anno$UPA_end-p3@anno$UPA_start)

## the total number of reads does not change after merging
sum(ds1@counts)+sum(ds2@counts)
sum(p1@counts)
sum(p2@counts)
sum(p3@counts)

## all TRUE, because p2 uses ds2 as reference
newc %in% p2@anno$coord
oldc %in% p2@anno$coord
## all TRUE, because p3 uses ds1 as reference
oldc %in% p3@anno$coord
newc %in% p3@anno$coord

### merge two pacds by ranges
p1=mergePACds(list(ds1, ds2), d=0, by='range')
p2=mergePACds(PACdsList=list(ds1, ds2), refPACds=ds2, d=0, by='range')
p3=mergePACds(PACdsList=list(ds1, ds2), refPACds=ds1, d=0, by='range')

newc=ds2@anno[1:3, ]$UPA_start
oldc=ds1@anno[1:3, ]$UPA_start

## all TRUE, because p2 uses ds2 as reference
newc %in% p2@anno$UPA_start
oldc %in% p2@anno$UPA_start
## all TRUE, because p3 uses ds1 as reference
oldc %in% p3@anno$UPA_start
newc %in% p3@anno$UPA_start

BMILAB/movAPA documentation built on Jan. 3, 2024, 11:09 p.m.