Overview

I need to benchmark the SparseAssays class against the JointSparseAssays class using whole-genome bisulfite sequencing (WGBS) data for m-tuples of various sizes, with $NIL = 0$ and $NIL \geq 0$. I use the ADS sample from the Lister dataset.

Installation

To install SparseSummarizedExperiment, we need to use the development version of Bioconductor. We use packrat to avoid polluting the general R library:

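# Install packrat only if it is not already available
if (!requireNamespace("packrat", quietly = TRUE)) {
  install.packages("packrat")
}
library(packrat)
# Initialise a project-private library in the current directory
packrat::init()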

Now, install the development version of Bioconductor:

source("https://bioconductor.org/biocLite.R")
# useDevel() comes from BiocInstaller, which biocLite.R loads
useDevel()

Now, install SparseSummarizedExperiment:

install.packages("devtools")
devtools::install_github("PeteHaitch/SparseSummarizedExperiment")
library(SparseSummarizedExperiment)

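We also need data.table, pryr, GenomicTuples, and MethylationTuples:

install.packages(c("data.table", "pryr"))
devtools::install_github("PeteHaitch/GenomicTuples@devel")
devtools::install_github("PeteHaitch/MethylationTuples")
library(data.table)
library(pryr)
library(GenomicTuples)
# MTuples() and MethInfo() are used below, so attach MethylationTuples too
library(MethylationTuples)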

Convert methtuple output to MethPat and SparseMethPat objects

Read in some data

ADS_2 <- fread("DM_ADS.CG.2.tsv")
pryr::object_size(ADS_2)

ADS_2ac <- fread("DM_ADS.CG.2ac.tsv")
pryr::object_size(ADS_2ac)

ADS_4 <- fread("DM_ADS.CG.4.tsv")
pryr::object_size(ADS_4)
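
To compare the sizes of several objects side by side, a small helper can collect them into one table. The following is only a sketch: `size_table()` is a hypothetical helper that is not part of any package used here, and it relies on base R's `utils::object.size()` rather than `pryr::object_size()` so that it runs without extra packages. The two functions can report different values, so the numbers are not interchangeable with those above.

```r
# Hypothetical helper: tabulate the size in bytes of each named argument.
size_table <- function(...) {
  objs <- list(...)
  data.frame(
    object = names(objs),
    # utils::object.size() is base R; pryr::object_size() may report other values
    bytes = unname(vapply(objs,
                          function(x) as.numeric(utils::object.size(x)),
                          numeric(1))),
    stringsAsFactors = FALSE
  )
}

# Toy objects standing in for the real data
sizes <- size_table(small = numeric(10), large = numeric(1000))
print(sizes)
```

With the data read in above, the equivalent call would be `size_table(ADS_2 = ADS_2, ADS_2ac = ADS_2ac, ADS_4 = ADS_4)`.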

SparseMethPat object

Convert genomic co-ordinates into MTuples objects

mt_ADS_2 <- MTuples(gtuples = GTuples(seqnames = ADS_2[, chr],
                                      tuples = as.matrix(ADS_2[, .(pos1, pos2)]),
                                      strand = ADS_2[, strand]),
                    methinfo = MethInfo("CG"))
mt_ADS_2ac <- MTuples(gtuples = GTuples(seqnames = ADS_2ac[, chr],
                                        tuples = as.matrix(ADS_2ac[, .(pos1, pos2)]),
                                        strand = ADS_2ac[, strand]),
                      methinfo = MethInfo("CG"))
mt_ADS_4 <- MTuples(gtuples = GTuples(seqnames = ADS_4[, chr],
                                      tuples = as.matrix(ADS_4[, .(pos1, pos2, pos3, pos4)]),
                                      strand = ADS_4[, strand]),
                    methinfo = MethInfo("CG"))

Convert counts into a SparseAssays object

counts <- SparseSummarizedExperiment:::.sparsify(ADS_2[, .(MM, MU, UM, UU)])
# TODO: Setting colnames shouldn't be necessary, I thought?
colnames(counts[["data"]]) <- c("MM", "MU", "UM", "UU")
sa_ADS_2 <- SparseAssays(SimpleList(counts = SimpleList(ADS = counts)))
pryr::object_size(sa_ADS_2)

# TODO: Repeat for the 2ac and 4-tuple data
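
The idea behind a sparse representation of count data can be illustrated with base R: rows that occur more than once are stored only once, together with an integer map from each original row to its stored copy. This is a minimal sketch under that assumption, not the implementation of `.sparsify()`. The functions `sparsify_sketch()` and `densify_sketch()` and the element name `map` are hypothetical; only the element name `data` is taken from the code above.

```r
# Hypothetical illustration of row de-duplication; NOT the package's .sparsify().
sparsify_sketch <- function(x) {
  x <- as.matrix(x)
  # One string key per row, so that identical rows share a key
  row_keys <- apply(x, 1L, paste, collapse = "\r")
  unique_keys <- unique(row_keys)
  list(
    # For each original row, the index of its unique row in `data`
    map = match(row_keys, unique_keys),
    # Each distinct row stored exactly once, in order of first appearance
    data = x[match(unique_keys, row_keys), , drop = FALSE]
  )
}

# Expand the sparse form back to the original dense matrix
densify_sketch <- function(s) {
  s[["data"]][s[["map"]], , drop = FALSE]
}

# Toy counts with the same column names as the methtuple output
toy <- cbind(MM = c(1L, 0L, 1L, 0L, 0L),
             MU = c(0L, 0L, 0L, 0L, 2L),
             UM = c(0L, 0L, 0L, 0L, 0L),
             UU = c(3L, 5L, 3L, 5L, 1L))
sparse_toy <- sparsify_sketch(toy)
```

Here five rows collapse to three stored rows, and `densify_sketch(sparse_toy)` recovers `toy`. The saving grows with the proportion of repeated rows, which is why the benchmark looks at different tuple sizes.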

Convert counts into JointSparseAssays objects

TODO: It is rather fiddly to define [ and [<- methods for the JointSparseAssays class. Working on this is a low priority, but I will try to make the methods for RangedSummarizedExperiment objects agnostic to whether the SparseAssays slot holds a SparseAssays or a JointSparseAssays object.

Create the SparseMethPat object

smp_ADS_2 <- SparseMethPat(sparseAssays = sa_ADS_2, 
                           rowTuples = mt_ADS_2,
                           colData = DataFrame(row.names = "ADS"))
pryr::object_size(smp_ADS_2)
# TODO: Repeat for the 2ac and 4-tuple data


PeteHaitch/SparseSummarizedExperiment documentation built on May 8, 2019, 1:31 a.m.