Working with HDF5-based objects

read_beds function

For memory-efficient reading, one can use an HDF5-based scMethrix object. Only a small number of bedgraph files are held in memory at any one time, and the resulting object is stored on disk rather than in memory.

Additional arguments are needed to use HDF5:

meth <- read_beds(
  files = bed_files,
  ref_cpgs = mm19_cpgs,
  chr_idx = 1,
  start_idx = 2,
  strand_idx = 3,
  cov_idx = 4,
  M_idx = 5,
  stranded = FALSE,
  zero_based = TRUE,
  #collapse_strands = FALSE,
  colData = sample_anno,
  batch_size = 2,  # number of bedgraph files held in memory at once
  h5 = TRUE        # store the resulting object on disk as HDF5
)

Basic scMethrix operations work with HDF5-based objects as well. Functions relying on external packages (e.g. imputation and clustering) will require casting to an in-memory matrix before processing.

meth <- scMethrix::remove_uncovered(meth)
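
The casting step mentioned above can be sketched with the HDF5Array package alone. This is a minimal illustration, not the scMethrix implementation: the toy matrix, the dataset name "score" and the use of kmeans as the external function are all assumptions made for the example.

```r
library(HDF5Array)

set.seed(1)
# Toy data standing in for a methylation matrix: 20 sites x 6 samples
mat <- matrix(runif(120), nrow = 20, ncol = 6)

# Write the matrix to a temporary HDF5 file; the returned object is a
# DelayedMatrix whose values stay on disk ("score" is a hypothetical name)
h5_mat <- writeHDF5Array(mat, filepath = tempfile(fileext = ".h5"), name = "score")

# Cast to an ordinary in-memory matrix before calling code that expects one
in_mem <- as.matrix(h5_mat)

# Example of an external function that needs a plain matrix:
# cluster the samples (columns) into two groups
clusters <- kmeans(t(in_mem), centers = 2)
```

Note that as.matrix loads the whole dataset into memory, so for large objects it may be worth subsetting to the regions of interest first.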

It is also possible to transform non-HDF5-based objects to HDF5-based ones and back.

m <- convert_HDF5_scMethrix(meth)
m2 <- convert_scMethrix(m)

Saving and loading

Saving and loading an HDF5-based object is not possible with the standard save or saveRDS functions. scMethrix offers easy-to-use saving and loading tools, which are essentially wrappers around the saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment functions from the HDF5Array package.

target_dir <- file.path(getwd(), "temp")
save_HDF5_methrix(meth, dir = target_dir, replace = TRUE)
meth <- load_HDF5_methrix(dir = target_dir)
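
For orientation, the two underlying HDF5Array functions named above can be called directly on any SummarizedExperiment. The sketch below uses a toy object rather than a real scMethrix one; the assay name "score", the sample names and the output directory are assumptions for illustration only.

```r
library(SummarizedExperiment)
library(HDF5Array)

set.seed(1)
# Toy assay: 20 sites x 3 samples (hypothetical names)
mat <- matrix(runif(60), nrow = 20, ncol = 3,
              dimnames = list(paste0("site", 1:20), c("s1", "s2", "s3")))
se <- SummarizedExperiment(assays = list(score = mat))

# Write the assay data and the object skeleton into one directory
out_dir <- file.path(tempdir(), "h5_se_demo")
saveHDF5SummarizedExperiment(se, dir = out_dir, replace = TRUE)

# Reload: the assay comes back as a disk-backed DelayedMatrix
reloaded <- loadHDF5SummarizedExperiment(dir = out_dir)
```

The directory holds both the HDF5 file and the serialized object, so it should be moved or shared as a whole.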

Working with a large number of samples

The primary goal of scMethrix is to allow users to handle whole-genome methylation data. The functions are optimized to keep speed high and memory use low. In addition, effort was made so that scMethrix handles large numbers of samples (even > 1000) in the same efficient way. Many functions therefore implement the argument batch_size to split these datasets into digestible chunks and n_threads to parallelize the processing of these chunks. Functions currently supporting the arguments batch_size and n_threads: read_beds and get_region_summary.

The multicore option is platform independent.
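
The general idea behind batch_size and n_threads can be sketched with base R's parallel package. This is not how scMethrix implements it internally; the toy matrix, the chosen values and the per-batch computation (column means) are assumptions for illustration.

```r
library(parallel)

set.seed(1)
# Toy methylation matrix: 100 sites (rows) x 10 samples (columns)
beta <- matrix(runif(1000), nrow = 100, ncol = 10)

batch_size <- 4  # samples per chunk (hypothetical value)
n_threads <- 2   # number of worker processes (hypothetical value)

# Split the sample indices into consecutive batches of at most batch_size
idx <- seq_len(ncol(beta))
batches <- split(idx, ceiling(idx / batch_size))

# Cut the matrix into one sub-matrix per batch so that each worker
# receives only the columns it needs
chunks <- lapply(batches, function(cols) beta[, cols, drop = FALSE])

# A socket cluster runs on Windows, macOS and Linux alike
cl <- makeCluster(n_threads)
batch_means <- parLapply(cl, chunks, colMeans, na.rm = TRUE)
stopCluster(cl)

# Reassemble the per-batch results in the original sample order
sample_means <- unlist(batch_means, use.names = FALSE)
```

A smaller batch_size lowers peak memory per worker at the cost of more scheduling overhead, so the best value depends on the data and the machine.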



CompEpigen/scMethrix documentation built on Nov. 6, 2021, 3:09 p.m.