summarize_numerical: Summarize numerical data over groupings of annotated regions

View source: R/summarize.R

summarize_numerical    R Documentation

Summarize numerical data over groupings of annotated regions

Description

Given a GRanges of annotated regions, summarize numerical data columns based on a grouping.

Usage

summarize_numerical(
  annotated_regions,
  by = c("annot.type", "annot.id"),
  over,
  quiet = FALSE
)

Arguments

annotated_regions

The GRanges result of annotate_regions().

by

A character vector of the columns of as.data.frame(annotated_regions) to group over. Default is c("annot.type", "annot.id").

over

A character vector of the numerical columns in as.data.frame(annotated_regions) to count, take the mean of, and take the sd of after grouping according to the by columns. NOTE: If over has more than one value, the resulting dplyr::tbl summary columns are named COLNAME_n, COLNAME_mean, and COLNAME_sd. If over has length one, the columns are named n, mean, and sd.
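The naming rule above can be illustrated on a toy data frame. The following is a minimal base-R sketch, not the package's implementation: `summarize_toy()` is a hypothetical helper, the data are made up, and the column-naming behaviour is copied from the description of over rather than taken from the annotatr source.

```r
# Toy stand-in for as.data.frame(annotated_regions); all values are made up.
toy <- data.frame(
  annot.type = c("cpg_islands", "cpg_islands", "cpg_shores", "cpg_shores"),
  coverage   = c(10, 20, 30, 50),
  mu1        = c(1, 3, 5, 9)
)

# Hypothetical helper: count, mean, and sd of each `over` column per group.
summarize_toy <- function(df, by, over) {
  # One data frame per combination of the `by` columns.
  groups <- split(df, df[by], drop = TRUE)
  rows <- lapply(groups, function(g) {
    stats <- lapply(over, function(col) {
      x <- g[[col]]
      out <- c(n = length(x), mean = mean(x), sd = sd(x))
      # Mimic the documented naming: prefix only when several columns are summarized.
      if (length(over) > 1) names(out) <- paste(col, names(out), sep = "_")
      out
    })
    # Group labels followed by the summary columns, as a one-row data frame.
    cbind(g[1, by, drop = FALSE], as.data.frame(as.list(unlist(stats))))
  })
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

multi  <- summarize_toy(toy, by = "annot.type", over = c("coverage", "mu1"))
single <- summarize_toy(toy, by = "annot.type", over = "coverage")
```

Under these assumptions, `names(multi)` carries the COLNAME_ prefixes, while `names(single)` contains only the grouping column followed by n, mean, and sd.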

quiet

Logical. Print progress messages (FALSE, the default) or not (TRUE).

Details

NOTE: Unlike the other summarize_*() functions, this function does not take the distinct values of seqnames, start, end, and annot.type before summarizing. If a region intersects two distinct exons, distinct() would keep only one of the resulting rows, and the region's contribution to the mean of the numerical column over the other exon would be lost, which is not desirable.
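The effect described in this note can be reproduced on a toy table. This is a sketch under stated assumptions: the rows, exon IDs, and coverage values are invented, and `duplicated()` is used as a base-R stand-in for dplyr's distinct() on the four key columns.

```r
# Toy annotated regions: the region chr1:100-200 overlaps two distinct exons.
# All values and IDs are made up for illustration.
toy <- data.frame(
  seqnames   = c("chr1", "chr1", "chr1"),
  start      = c(100, 100, 500),
  end        = c(200, 200, 600),
  annot.type = c("exons", "exons", "exons"),
  annot.id   = c("exon:1", "exon:2", "exon:2"),
  coverage   = c(10, 10, 30)
)

# Mean coverage per exon using every row, i.e. without de-duplicating.
mean_all <- tapply(toy$coverage, toy$annot.id, mean)

# Stand-in for taking distinct (seqnames, start, end, annot.type) rows first:
# the second row is dropped because it repeats the first row's key.
key <- toy[, c("seqnames", "start", "end", "annot.type")]
toy_distinct <- toy[!duplicated(key), ]
mean_distinct <- tapply(toy_distinct$coverage, toy_distinct$annot.id, mean)
```

Here `mean_all` averages both regions that overlap exon:2, whereas `mean_distinct` sees only one of them, so the mean for exon:2 changes even though no underlying data changed.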

Value

A grouped dplyr::tbl_df containing the count, mean, and sd of the over columns for each group defined by the by columns.

Examples

### Test on a very simple bed file to demonstrate different options

# Get premade CpG annotations
data('annotations', package = 'annotatr')

r_file = system.file('extdata', 'test_read_multiple_data_nohead.bed', package='annotatr')
extraCols = c(pval = 'numeric', mu1 = 'integer', mu0 = 'integer', diff_exp = 'character')
r = read_regions(con = r_file, genome = 'hg19', extraCols = extraCols, rename_score = 'coverage')

a = annotate_regions(
       regions = r,
       annotations = annotations,
       ignore.strand = TRUE)

# Testing over normal by
sn1 = summarize_numerical(
       annotated_regions = a,
       by = c('annot.type', 'annot.id'),
       over = c('coverage', 'mu1', 'mu0'),
       quiet = FALSE)

# Testing over a different by
sn2 = summarize_numerical(
       annotated_regions = a,
       by = c('diff_exp'),
       over = c('coverage', 'mu1', 'mu0'))


rcavalcante/annotatr documentation built on March 25, 2023, 9:51 a.m.