read_beds: Versatile BedGraph reader.


View source: R/read_beds.R

Description

Versatile BedGraph reader.

Usage

read_beds(
  files,
  ref_cpgs = NULL,
  colData = NULL,
  genome_name = "hg19",
  batch_size = min(20, length(files)),
  n_threads = 1,
  h5 = FALSE,
  h5_dir = NULL,
  h5_temp = NULL,
  desc = NULL,
  verbose = TRUE,
  zero_based = FALSE,
  replace = FALSE,
  fill = TRUE,
  pipeline = c("Custom", "Bismark_cov", "MethylDackel", "MethylcTools", "BisSNP",
    "BSseeker2_CGmap"),
  stranded = FALSE,
  strand_collapse = FALSE,
  chr_idx = NULL,
  start_idx = NULL,
  end_idx = NULL,
  beta_idx = NULL,
  M_idx = NULL,
  U_idx = NULL,
  strand_idx = NULL,
  cov_idx = NULL
)

Arguments

files

list of strings; file paths of the BED files

ref_cpgs

data.table; reference CpG sites with chr, start and end columns (tab-delimited). Coordinates must be zero-based.

colData

list of strings; sample names (see Details). Derived from the filenames if not provided.

genome_name

string; Name of genome. Default hg19

batch_size

integer; maximum number of files to hold in memory at once. Default min(20, length(files)), i.e. 20 or the number of files if fewer.

n_threads

integer; number of threads to use. Default 1. Be careful: memory usage increases linearly with the number of threads. This option does not work on Windows.

h5

boolean; whether to store the coverage and methylation matrices as HDF5Array objects. Default FALSE

h5_dir

string; directory in which to store the HDF5-based object. May be NULL, in which case the experiment can be saved manually later.

h5_temp

string; temporary directory for storing HDF5 data

desc

string; Description of the experiment

verbose

boolean; whether to print messages. Default TRUE

zero_based

boolean; whether the input coordinates are zero-based. Default FALSE

replace

boolean; flag for whether to delete the contents of h5_dir before saving

fill

boolean; whether to fill the output matrices with all CpGs in ref_cpgs. This must be TRUE for HDF5-based experiments.

pipeline

string; Default "Custom" (the first value in the usage signature). Currently supports "Bismark_cov", "MethylDackel", "MethylcTools", "BisSNP" and "BSseeker2_CGmap". If the pipeline is not known, use the idx arguments for manual column assignments.

stranded

boolean; whether the input data is stranded. Default FALSE

strand_collapse

boolean; whether to collapse the Crick strand into the Watson strand. Default FALSE

chr_idx

integer; column index for chromosome in bedgraph files

start_idx

integer; column index for start position in bedgraph files

end_idx

integer; column index for end position in bedgraph files

beta_idx

integer; column index for beta values in bedgraph files

M_idx

integer; column index for read counts supporting methylation in bedgraph files

U_idx

integer; column index for read counts supporting unmethylation in bedgraph files

strand_idx

integer; column index for strand information in bedgraph files

cov_idx

integer; column index for total-coverage in bedgraph files
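For the "Custom" pipeline, or when the pipeline is not known, the idx arguments tell the reader which column holds which field. The base R sketch below illustrates that mapping on a small in-memory table. The helper extract_columns() is hypothetical and not part of scMethrix, and it assumes that coverage is M + U and that beta is M / (M + U) when count columns are given; it is not the package's implementation.

```r
# Hypothetical helper (NOT part of scMethrix): select fields by column index,
# mirroring what chr_idx / start_idx / end_idx / M_idx / U_idx / beta_idx describe.
extract_columns <- function(df, chr_idx, start_idx, end_idx,
                            M_idx = NULL, U_idx = NULL, beta_idx = NULL) {
  out <- data.frame(
    chr = as.character(df[[chr_idx]]),
    start = as.integer(df[[start_idx]]),
    end = as.integer(df[[end_idx]]),
    stringsAsFactors = FALSE
  )
  if (!is.null(M_idx) && !is.null(U_idx)) {
    M <- as.numeric(df[[M_idx]])
    U <- as.numeric(df[[U_idx]])
    # Assumed definitions: coverage = M + U, beta = M / coverage (NA if no reads).
    out$cov <- M + U
    out$beta <- ifelse(out$cov > 0, M / out$cov, NA_real_)
  } else if (!is.null(beta_idx)) {
    out$beta <- as.numeric(df[[beta_idx]])
  }
  out
}

# Made-up layout for illustration: chr, start, end, M, U.
bg <- read.table(text = "chr1 10468 10469 3 1\nchr1 10470 10471 0 4",
                 header = FALSE)
res <- extract_columns(bg, chr_idx = 1, start_idx = 2, end_idx = 3,
                       M_idx = 4, U_idx = 5)
print(res)
```

Supplying M and U counts lets both coverage and beta be derived; a file that only carries a beta column would use beta_idx (and cov_idx, if available) instead.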

Details

Reads BED files and generates methylation matrices. Optionally, the matrices can be serialized as on-disk HDF5 arrays.
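Conceptually, building a methylation matrix with fill = TRUE means aligning every sample to the full ref_cpgs table, so each reference CpG gets a row and samples with no data at that site get NA. The base R sketch below shows only that idea; the helper fill_matrix() and the chr:start matching key are assumptions for illustration, not how scMethrix implements it.

```r
# Hypothetical helper (NOT part of scMethrix): align per-sample beta values
# to a reference CpG table, producing one row per reference CpG.
fill_matrix <- function(ref_cpgs, samples) {
  ref_key <- paste(ref_cpgs$chr, ref_cpgs$start, sep = ":")
  cols <- lapply(samples, function(s) {
    s_key <- paste(s$chr, s$start, sep = ":")
    s$beta[match(ref_key, s_key)]  # NA where the sample lacks this CpG
  })
  mat <- do.call(cbind, cols)
  dimnames(mat) <- list(ref_key, names(samples))
  mat
}

# Made-up reference sites and samples for illustration.
ref_cpgs <- data.frame(chr = "chr1", start = c(10468L, 10470L, 10483L))
samples <- list(
  sampleA = data.frame(chr = "chr1", start = c(10468L, 10483L), beta = c(0.8, 0.1)),
  sampleB = data.frame(chr = "chr1", start = 10470L, beta = 0.5)
)
mat <- fill_matrix(ref_cpgs, samples)
print(mat)
```

Keeping every reference row gives all samples the same shape, which is one plausible reason fill must be TRUE for HDF5-backed experiments.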

colData should be supplied as a data.table with a header and a column named "Sample" whose values match the input filenames. Additional columns may hold any relevant sample metadata (e.g. cell type or collection date). On input, colData is left-joined onto the input files, so it may contain rows for samples that are not included in the analysis. These data are updated accordingly whenever the object is subset or merged.
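The left join described above can be pictured with base R's merge(all.x = TRUE): every input file keeps a row, and annotation rows without a matching file are dropped. This is a sketch only; deriving sample names by stripping the directory and extension from each path is an assumption, and the file paths and metadata values are made up.

```r
# Made-up input files; sample names assumed to be the filenames without extension.
files <- c("data/sampleA.bedgraph", "data/sampleB.bedgraph")
samples <- data.frame(
  Sample = tools::file_path_sans_ext(basename(files)),
  stringsAsFactors = FALSE
)

# Annotation table with an extra row (sampleC) that has no input file.
colData <- data.frame(
  Sample = c("sampleA", "sampleB", "sampleC"),
  cell_type = c("T cell", "B cell", "NK cell"),
  stringsAsFactors = FALSE
)

# Left join on "Sample": keep all input files, drop unmatched annotation rows.
annot <- merge(samples, colData, by = "Sample", all.x = TRUE, sort = FALSE)
print(annot)
```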

The reader assumes that the first input file contains the maximum methylation score. This assumption is very unlikely to be violated in practice.

Value

An object of class scMethrix

Examples

## Not run: 
#Do Nothing

## End(Not run)

CompEpigen/scMethrix documentation built on Nov. 6, 2021, 3:09 p.m.