reducePeaks: Reduce the number of peaks in a countsfile and/or peaksfile

View source: R/scAPAtrap_funlib.R

reducePeaks    R Documentation

Reduce the number of peaks in a countsfile and/or peaksfile

Description

Reduce the peaks in a countsfile or counts table by min.cells/max.cells and min.count/max.count, and remove the same peaks from the peaksfile (if provided). This function is useful for retrieving highly expressed, lowly expressed, or moderately expressed peaks.
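
As a rough illustration of the filtering idea (not the package's actual implementation), the sketch below applies the four thresholds to a toy triplet table in base R. The column names peakID, cell, and count, and the helper name filter_peaks_sketch, are assumptions made for this example. It also assumes that "read count" means the total count of a peak summed over all cells, and it uses the strict "<" upper bounds given in the Arguments section.

```r
# Toy triplet table: one row per (peak, cell) pair with a non-zero count.
# Column names are assumed for illustration only.
counts <- data.frame(
  peakID = c("p1", "p1", "p1", "p2", "p2", "p3"),
  cell   = c("c1", "c2", "c3", "c1", "c2", "c1"),
  count  = c(5, 4, 6, 1, 1, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows of peaks that pass every non-NULL threshold.
filter_peaks_sketch <- function(counts, min.cells = NULL, min.count = NULL,
                                max.cells = NULL, max.count = NULL) {
  ncells <- tapply(counts$count > 0, counts$peakID, sum)  # cells expressing each peak
  nreads <- tapply(counts$count, counts$peakID, sum)      # total reads per peak (assumed meaning)
  keep <- rep(TRUE, length(ncells))
  if (!is.null(min.cells)) keep <- keep & ncells >= min.cells
  if (!is.null(min.count)) keep <- keep & nreads >= min.count
  if (!is.null(max.cells)) keep <- keep & ncells < max.cells
  if (!is.null(max.count)) keep <- keep & nreads < max.count
  counts[counts$peakID %in% names(ncells)[keep], , drop = FALSE]
}

# Highly expressed peaks: at least 2 cells and at least 10 reads.
high <- filter_peaks_sketch(counts, min.cells = 2, min.count = 10)

# Lowly expressed peaks: fewer than 3 cells and fewer than 10 reads.
low <- filter_peaks_sketch(counts, max.cells = 3, max.count = 10)
```

Setting a threshold to NULL simply skips that test, which mirrors how the max.* arguments default to NULL (unlimited).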

Usage

reducePeaks(
  countsfile,
  peaksfile = NULL,
  min.cells = 10,
  min.count = 10,
  max.cells = NULL,
  max.count = NULL,
  suffix = ".reduced",
  toSparse = FALSE,
  ...
)

Arguments

countsfile

Path to the decompressed counts.tsv.gz file generated by countPeaks, or a count table with three columns.

peaksfile

Path to a peaks file, or a peak table with five columns. If not NULL, the peaks are filtered after the counts have been filtered.

min.cells

Retain peaks expressed in >= min.cells cells. The default is 10.

min.count

Retain peaks with read count >= min.count. The default is 10.

max.cells

Retain peaks expressed in < max.cells cells. The default is NULL (unlimited). Use this to retain peaks that are expressed in few cells.

max.count

Retain peaks with read count < max.count. The default is NULL (unlimited). Use this to retain peaks with low read counts.

suffix

Applicable only when both countsfile and peaksfile are provided. The filtered counts and peaks are then written to <countsfile><suffix> and <peaksfile><suffix>; with the default suffix these are <countsfile>.reduced and <peaksfile>.reduced.

toSparse

If TRUE, output a sparseMatrix (gene-cell); if FALSE, keep the triplet table format of the input.

...

Arguments passed to other methods and/or advanced arguments. Advanced arguments:

verbose

If 'TRUE', basic status updates are printed along the way.

logf

If not NULL, a character string giving a file name; messages are then written to 'logf'.

Value

A data.frame (toSparse=FALSE) or a sparse Matrix (toSparse=TRUE) of counts; or, if peaksfile is not NULL, a list of file names (countsfile, peaksfile).
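
To make the two in-memory return shapes concrete, the following sketch converts a toy triplet table into a sparse matrix with Matrix::sparseMatrix. It only illustrates the general triplet-to-sparse idea and is not taken from the package source. The column names (peakID, cell, count) and the orientation (peaks in rows, cells in columns) are assumptions.

```r
library(Matrix)

# Toy triplet table; column names are assumed for illustration only.
counts <- data.frame(
  peakID = c("p1", "p1", "p2", "p3"),
  cell   = c("c1", "c2", "c1", "c3"),
  count  = c(5, 4, 1, 20),
  stringsAsFactors = FALSE
)

# Factors give each peak and each cell a stable integer index.
peaks <- factor(counts$peakID)
cells <- factor(counts$cell)

# Build a peak-by-cell sparse matrix from the (row, column, value) triplets.
mat <- sparseMatrix(
  i = as.integer(peaks),
  j = as.integer(cells),
  x = counts$count,
  dims = c(nlevels(peaks), nlevels(cells)),
  dimnames = list(levels(peaks), levels(cells))
)
```

Entries that are absent from the triplet table are stored as implicit zeros, which is why the sparse form is usually much smaller than a dense matrix for single-cell counts.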

Examples

## Not run: 
countsfile='../dataFly/APA.tails.no/counts.tsv.gz'
peaksfile='../dataFly/APA.tails.no/peaks-notails.saf'
## only filter countsfile or counts-table, return a df
reducePeaks(countsfile, min.cells = 10, min.count = 10, toSparse=FALSE)

## retain large peaks, and output both counts and peaks, save to .reduced file (>=10 & >=50)
reducePeaks(countsfile=countsfile, peaksfile=peaksfile, min.cells = 10, min.count = 50)

## retain low-expressed peaks, and output both counts and peaks, save to .reduced file (<=9 & <=49)
reducePeaks(countsfile=countsfile, peaksfile=peaksfile, max.cells = 9, max.count = 49,
            min.cells=NULL, min.count=NULL, suffix='.small')

## retain only peaks expressed in <=9 cells
reducePeaks(countsfile=countsfile, peaksfile=peaksfile, max.cells = 9,  max.count=NULL,
           min.cells=NULL, min.count=NULL, suffix='.smallcells')

## retain only peaks with read count <=49
reducePeaks(countsfile=countsfile, peaksfile=peaksfile, max.cells = NULL,  max.count=49,
            min.cells=NULL, min.count=NULL, suffix='.smallcounts')

smallpeaks=.loadPeaks('../dataFly/APA.tails.no/peaks-notails.saf.small')
largepeaks=.loadPeaks('../dataFly/APA.tails.no/peaks-notails.saf.reduced')
smallpeaks1=.loadPeaks('../dataFly/APA.tails.no/peaks-notails.saf.smallcells')
smallpeaks2=.loadPeaks('../dataFly/APA.tails.no/peaks-notails.saf.smallcounts')
fullpeaks=.loadPeaks(peaksfile)
nrow(smallpeaks); nrow(largepeaks); nrow(fullpeaks)
smallset2=unique(rbind(smallpeaks1, smallpeaks2))
nrow(smallset2) + nrow(largepeaks);  nrow(fullpeaks)
## the two numbers should be the same; if they differ, some peakIDs in fullpeaks may be missing from the counts table


## End(Not run)
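
The final check in the example relies on a simple set argument: a peak either passes both lower bounds, or it fails at least one of them. A minimal sketch of that reasoning on made-up per-peak statistics is given below. The data frame, its column names (ncells, nreads), and the inclusive upper bounds (<=9, <=49, as written in the example comments) are assumptions for illustration, not output of reducePeaks.

```r
# Hypothetical per-peak summary: number of expressing cells and total reads.
stats <- data.frame(
  peakID = paste0("p", 1:6),
  ncells = c(12, 10, 9, 3, 15, 9),
  nreads = c(80, 50, 60, 20, 49, 49),
  stringsAsFactors = FALSE
)

# "Large" peaks pass both lower bounds.
large <- stats$peakID[stats$ncells >= 10 & stats$nreads >= 50]

# "Small" peaks fail at least one bound; build them as two separate sets.
smallcells  <- stats$peakID[stats$ncells <= 9]
smallcounts <- stats$peakID[stats$nreads <= 49]

# The union removes peaks that appear in both small sets.
smallset <- union(smallcells, smallcounts)

# Large and small sets are disjoint and together cover every peak.
covers_all <- length(smallset) + length(large) == nrow(stats)
```

Using the union, rather than adding the sizes of the two small sets, matters because a peak can be small in both cells and counts, which is what unique(rbind(...)) handles in the example above.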

BMILAB/scAPAtrap documentation built on Oct. 13, 2023, 2:36 a.m.