Working with Large Data
In sesame: Tools For Analyzing Illumina Infinium DNA Methylation Arrays

File-based Beta Value Storage

Preprocessing IDATs Directly to Files

When a large number of samples are being analyzed, it is desirable to have random access to specific CpG methylation without loading all the data. SeSAMe provides such interface through the fileSet object which is in essence an indexed file-based numeric matrix.

The one function to generate a fileSet is through the openSesameToFile function. In this case, there is no concrete output from the function. The consequence is the generation of a file at the given path. One can operate on the fileSet by referencing the path to the file.

library(sesame)
options(rmarkdown.html_vignette.check_title = FALSE)

The following openSesameToFile call does three things - generates a file called mybetas. - generates an index file called mybetas_idx.rds - returns a fileSet object which serves as an interface to the two files.

fset <- openSesameToFile('mybetas',
    system.file('extdata',package='sesameData'))

Introduction to fileSet

When printed to console, the number of samples and the number of probes are shown.

fset

One can obtain the samples and probes information with the $ operator.

head(fset$samples) # sample IDs
head(fset$probes) # probe IDs

Query fileSet

One can query the specific CpG by probe name(s) and sample name(s). Note that every query to fset is a disk read. Therefore it can be slower than in-memory processing. Here we only retrieve the beta values for the two probes cg00006414 and cg00007981 in the sample 4207113116_B.

sliceFileSet(fset, '4207113116_B', c('cg00006414','cg00007981'))

Read Existing fileSet

In the previous example, we preprocessed IDATs directly to fileSet. We can also read a pre-existing fileSet using the file path using readFileSet function.

fset <- readFileSet('mybetas')
sliceFileSet(fset, '4207113116_A', 'cg00000292')

Write fileSet by Allocation and Filling

fileSet size is always fixed. One cannot dynamically expand or shrink a fileSet. We can write a fileSet by filling the space one sample by one sample. This is achieved by first allocating the space given the number of samples and the probe IDs (optional if platform is one if HM27, HM450 or EPIC).

fset2 <- initFileSet('mybetas2', 'HM450', c('sample1', 'sample2'))

Then one can fill in the beta values by mapFileSet. Here I am illustrating using a randomly generated beta values.

hypothetical_betas <- setNames(runif(fset2$n), fset2$probes)
mapFileSet(fset2, 'sample2', hypothetical_betas)

The mapped value should be equal to the generated beta value. Let's spot-check.

abs(sliceFileSet(fset2,'sample2','cg00000108') -
        hypothetical_betas['cg00000108']) < 1e-7

Any scripts or data that you put into this service are public.

sesame documentation built on Nov. 15, 2020, 2:08 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sesame
Tools For Analyzing Illumina Infinium DNA Methylation Arrays

Working with Large Data
In sesame: Tools For Analyzing Illumina Infinium DNA Methylation Arrays

File-based Beta Value Storage

Preprocessing IDATs Directly to Files

Introduction to fileSet

Query fileSet

Read Existing fileSet

Write fileSet by Allocation and Filling

Try the sesame package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

sesame Tools For Analyzing Illumina Infinium DNA Methylation Arrays

Working with Large Data In sesame: Tools For Analyzing Illumina Infinium DNA Methylation Arrays

File-based Beta Value Storage

Preprocessing IDATs Directly to Files

Introduction to fileSet

Query fileSet

Read Existing fileSet

Write fileSet by Allocation and Filling

Try the sesame package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

sesame
Tools For Analyzing Illumina Infinium DNA Methylation Arrays

Working with Large Data
In sesame: Tools For Analyzing Illumina Infinium DNA Methylation Arrays