summarize_numerical: Summarize numerical data over groupings of annotated regions
In rcavalcante/annotatr: Annotation of Genomic Regions to Genomic Annotations

summarize_numerical

R Documentation

Description

Given a GRanges of annotated regions, summarize numerical data columns based on a grouping.

Usage

summarize_numerical(
  annotated_regions,
  by = c("annot.type", "annot.id"),
  over,
  quiet = FALSE
)

Arguments

`annotated_regions`	The `GRanges` result of `annotate_regions()`.
`by`	A character vector of the columns of `as.data.frame(annotated_regions)` to group by. Default is `c("annot.type", "annot.id")`.
`over`	A character vector of the numerical columns in `as.data.frame(annotated_regions)` to summarize. After grouping by the `by` columns, each column is summarized by its count, `mean`, and `sd`. NOTE: If `over` has more than one value, the naming scheme for the resulting `dplyr::tbl` summary columns is `COLNAME_n`, `COLNAME_mean`, `COLNAME_sd`. If `over` has length one, the column names are `n`, `mean`, `sd`.
`quiet`	If `FALSE` (the default), print progress messages; if `TRUE`, suppress them.
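
The grouping and naming scheme described for `by` and `over` can be sketched on a toy data frame. This is only an illustration in base R, not the package's implementation: `summarize_sketch()` and the `toy` table are hypothetical, the values are made up, and the real function returns a `dplyr` tibble rather than a plain data frame.

```r
# Toy stand-in for as.data.frame(annotated_regions); all values are made up.
toy <- data.frame(
  annot.type = c("exons", "exons", "exons", "introns"),
  annot.id   = c("exon:1", "exon:1", "exon:2", "intron:1"),
  coverage   = c(10, 20, 30, 40),
  mu1        = c(1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count, mean, and sd of each `over` column per `by` group.
summarize_sketch <- function(df, by, over) {
  # One sub-data-frame per observed combination of the `by` columns.
  groups <- split(df, df[by], drop = TRUE)
  rows <- lapply(groups, function(g) {
    stats <- lapply(over, function(col) {
      out <- data.frame(n = nrow(g), mean = mean(g[[col]]), sd = sd(g[[col]]))
      # Prefix with the column name only when several columns are summarized.
      if (length(over) > 1) names(out) <- paste(col, names(out), sep = "_")
      out
    })
    cbind(g[1, by, drop = FALSE], do.call(cbind, stats))
  })
  res <- do.call(rbind, rows)
  rownames(res) <- NULL
  res
}

# Several `over` columns: summary columns are named COLNAME_n, COLNAME_mean, COLNAME_sd.
multi <- summarize_sketch(toy, by = c("annot.type", "annot.id"), over = c("coverage", "mu1"))
print(names(multi))

# A single `over` column: summary columns are named n, mean, sd.
single <- summarize_sketch(toy, by = "annot.type", over = "coverage")
print(names(single))
```

Groups that contain a single row get an `sd` of `NA` in this sketch, because the standard deviation of one value is undefined.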

Details

NOTE: Unlike the other summarize_*() functions, this function does not take the distinct values of seqnames, start, end, annot.type before summarizing. If a region intersects two distinct exons, distinct() would keep only one of the two rows, so the region's value would be dropped from the mean of the numerical column over the other exon, which is not desirable.
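
A minimal sketch of the problem, using a made-up table in base R. The `toy` data frame and its values are hypothetical, and `duplicated()` stands in for `dplyr::distinct()`, which likewise keeps the first row of each duplicate set.

```r
# Toy rows: region chr1:100-200 overlaps two distinct exons; a second
# region chr1:500-600 overlaps only exon:2. All values are made up.
toy <- data.frame(
  seqnames   = c("chr1", "chr1", "chr1"),
  start      = c(100, 100, 500),
  end        = c(200, 200, 600),
  annot.type = c("exons", "exons", "exons"),
  annot.id   = c("exon:1", "exon:2", "exon:2"),
  coverage   = c(10, 10, 30),
  stringsAsFactors = FALSE
)

# Keeping every row: each exon is averaged over all regions that intersect it.
mean_all <- tapply(toy$coverage, toy$annot.id, mean)

# Deduplicating on region coordinates and annotation type drops the second
# row, so the first region no longer contributes to exon:2.
key <- toy[, c("seqnames", "start", "end", "annot.type")]
deduped <- toy[!duplicated(key), ]
mean_dedup <- tapply(deduped$coverage, deduped$annot.id, mean)

print(mean_all)
print(mean_dedup)
```

In this toy case the mean coverage over `exon:2` is computed from two regions when all rows are kept, but from only one region after deduplication.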

Value

A grouped dplyr::tbl_df containing the by grouping columns and, for each column in over, its count, mean, and sd within each group.

Examples

### Test on a very simple BED file to demonstrate different options

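# Load the package so that read_regions(), annotate_regions(), etc. are available
library(annotatr)

# Get premade CpG annotations
data('annotations', package = 'annotatr')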

r_file = system.file('extdata', 'test_read_multiple_data_nohead.bed', package='annotatr')
extraCols = c(pval = 'numeric', mu1 = 'integer', mu0 = 'integer', diff_exp = 'character')
r = read_regions(con = r_file, genome = 'hg19', extraCols = extraCols, rename_score = 'coverage')

a = annotate_regions(
       regions = r,
       annotations = annotations,
       ignore.strand = TRUE)

# Summarize using the default grouping (annotation type and annotation ID)
sn1 = summarize_numerical(
       annotated_regions = a,
       by = c('annot.type', 'annot.id'),
       over = c('coverage', 'mu1', 'mu0'),
       quiet = FALSE)

# Summarize using a different grouping column
sn2 = summarize_numerical(
       annotated_regions = a,
       by = c('diff_exp'),
       over = c('coverage', 'mu1', 'mu0'))

rcavalcante/annotatr documentation built on Aug. 22, 2024, 7:33 a.m.