summariseOverrep-methods: Summarise Overrepresented Sequences

summariseOverrep R Documentation

Summarise Overrepresented Sequences

Description

Summarise the over-represented sequences found in one or more QC files.

Usage

summariseOverrep(x, ...)

## S4 method for signature 'FastpData'
summariseOverrep(x, step = c("Before", "After"), min_count = 0, ...)

## S4 method for signature 'FastpDataList'
summariseOverrep(
  x,
  min_count = 0,
  step = c("Before", "After"),
  vals = c("count", "rate"),
  fn = c("mean", "sum", "max"),
  by = c("reads", "sequence"),
  ...
)

## S4 method for signature 'FastqcDataList'
summariseOverrep(
  x,
  min_count = 0,
  vals = c("Count", "Percentage"),
  fn = c("mean", "sum", "max"),
  pattern = ".*",
  ...
)

## S4 method for signature 'FastqcData'
summariseOverrep(
  x,
  min_count = 0,
  vals = c("Count", "Percentage"),
  fn = c("mean", "sum", "max"),
  pattern = ".*",
  by = "Filename",
  ...
)

Arguments

x

An object of a suitable class

...

Not used

step

Can be 'Before', 'After' or both to obtain data from the Before_filtering or After_filtering modules

min_count

Remove sequences with counts less than this value, both before and after filtering

vals

Values to use for creating summaries across multiple files. For FastpDataList objects these can be "count" and/or "rate", whilst for FastqcDataList objects these values can be "Count" and/or "Percentage"

fn

Functions to use when summarising values across multiple files

by

Character vector of columns to summarise by. See dplyr::summarise

pattern

Regular expression to filter the Possible_Source column by

Details

This function prepares a useful summary of all over-represented sequences as reported by either fastp or FastQC

Value

A tibble

Tibble columns will vary between Fastp*, FastqcDataList and FastqcData objects. Calling this function on list-type objects will attempt to summarise the presence of each over-represented sequence across all files.

In particular, FastqcData objects will provide the requested summary statistics across all sequences within a file
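
As a rough illustration of what a cross-file summary of this kind involves, the following sketch uses base R on a small made-up table. The object names (overrep, kept, summary_tbl), the result column names and all values are assumptions for illustration only; this is not the package's implementation, and the tibble returned by summariseOverrep may be structured differently.

```r
# Toy table of over-represented sequences: one row per sequence per file.
# Filenames, sequences and counts are invented for illustration.
overrep <- data.frame(
  Filename = c("a.fastq", "a.fastq", "b.fastq", "b.fastq"),
  Sequence = c("ACGT", "TTTT", "ACGT", "GGGG"),
  Count    = c(120, 60, 200, 15),
  stringsAsFactors = FALSE
)

# Hypothetical threshold, playing the role of the min_count argument
min_count <- 50

# Drop sequences whose count falls below the threshold
kept <- overrep[overrep$Count >= min_count, ]

# Group the remaining counts by sequence, across all files
by_seq <- split(kept$Count, kept$Sequence)

# Apply mean, sum and max to each group, and count the files
# in which each sequence was reported
summary_tbl <- data.frame(
  Sequence   = names(by_seq),
  n_files    = lengths(by_seq),
  Count_mean = vapply(by_seq, mean, numeric(1)),
  Count_sum  = vapply(by_seq, sum, numeric(1)),
  Count_max  = vapply(by_seq, max, numeric(1)),
  row.names  = NULL
)

print(summary_tbl)
```

Grouping by Sequence rather than by Filename corresponds loosely to the choice made through the by argument: one row per sequence describes how widespread a sequence is, whereas one row per file describes how heavily that file is affected.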

Examples

## For operations on a FastpData object
f <- system.file("extdata/fastp.json.gz", package = "ngsReports")
fp <- FastpData(f)
summariseOverrep(fp, min_count = 100)

## Applying the function to a FastqcDataList
packageDir <- system.file("extdata", package = "ngsReports")
fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE)
fdl <- FastqcDataList(fl)
summariseOverrep(fdl)

# An alternative viewpoint can be obtained using
fdl |> lapply(summariseOverrep) |> dplyr::bind_rows()



steveped/ngsReports documentation built on July 24, 2024, 10:45 a.m.