smooth.ratio

Window Size Independent Linear Time Smoothing for Biological Count Data

Note: This software was developed as part of a group project for a graduate level course at the University of Maryland, College Park (CMSC 702). The software was never formally published or peer-reviewed, so please use at your own risk.

Installation

You can use devtools to install smooth.ratio directly from Github:

library(devtools)
install_github('khughitt/smooth.ratio')

Usage Example: Whole Genome Bisulfite Sequencing

Overview

This example demonstrates the use of our smoothing approach for improving the signal-to-noise ratio for Whole Genome Bisulfite Sequencing (WGBS) methylation data.

WGBS data is inherently noisy, and as such the signatures of interesting features such as CpG islands is often obscured by numerous low coverage reads.

The data used for this analysis comes from a study by Hansen et al. (2011)^1, and consists of three components:

  1. cpgsites - Genomic location (offset) of CpG sites in the dataset.
  2. methylation - Number of methylation reads at a given CpG site.
  3. coverage - Total number of reads at a given site.

The first variable (cpgsites) is a mx1 integer vector, while the later two variables are each mxn matrices, each with $n = 6$ separate biological samples.

The smoothing method smooths the methylation values across time, weighting the contribution of each site by the ratio of the number of methylated reads to total reads at the site.

Analysis

First let's load sample data and take a look at what we have:

library(smoothratio)
load(file='data/sample.rda')

# Methylation reads at each site
head(methylation)
summary(methylation)

# Total number of reads at each site
head(coverage)
summary(coverage)

# Methylation locations (offset in Chromosome)
head(cpgsites)
range(cpgsites)

Let's now smooth a 20kb region of the chromosome.

indices = which((cpgsites > 830000) & (cpgsites < 850000))
result = smooth.ratio(cpgsites[indices], methylation[indices,], coverage[indices,], sigma_d=250)

# smoothed curve
head(result@fitted)

Finally, let's create some simple plots of our results.

plot(result, columns=1:2)

Done!

References



khughitt/smooth.ratio documentation built on May 20, 2019, 9:23 a.m.