sesamizeToBeds: process IDAT pairs directly into tabix'ed BED files using...


View source: R/sesamizeToBeds.R

Description

Given either a character vector of IDAT stubs or a list-like object (e.g. a data.frame) with an element $Basename and, optionally, an element $Sample_Name, write the beta values for each sample to its own BED file (see Details, below). If a data.frame is provided, the IDATs may come from multiple platforms; sesame::readIDATpair() will try to determine the platform of each pair.

Usage

sesamizeToBeds(
  target,
  refversion = c("hg19", "hg38"),
  renameBeds = TRUE,
  rschProbes = FALSE,
  tabix = TRUE,
  verbose = TRUE,
  BPPARAM = SerialParam(),
  ...
)

Arguments

target

an IDAT stub (e.g. "5723646052_R02C02") or a data.frame

refversion

a reference genome assembly: "hg19" (default) or "hg38"

renameBeds

if target has an element named Sample_Name, use it? (TRUE)

rschProbes

retain SNP and CpH probes in the output? (FALSE)

tabix

compress and tabix the file(s) generated? (TRUE)

verbose

be verbose while processing? (TRUE)

BPPARAM

a BiocParallelParam object of some sort (SerialParam())

...

additional arguments for sesame::getBetas()

Details

If target is a stub (which can be a path), read the pair of IDATs for it, apply the usual preprocessing (noob/nonlinearDyeBias/pOOBAH) and the usual masking, and write out a BED file holding the beta value, (M+15)/(M+15+U+15), at each probe.
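The beta value formula above can be sketched in a few lines of R. This is an illustration only: it reads the formula as (M+15)/(M+15+U+15), the helper name beta_with_offset is hypothetical, and it is not taken from the package or from sesame's internals.

```r
# Hypothetical helper (not part of the package): beta value with a fixed
# offset added to both the methylated (M) and unmethylated (U) intensities.
beta_with_offset <- function(M, U, offset = 15) {
  (M + offset) / ((M + offset) + (U + offset))
}

# Made-up intensities for three probes.
M <- c(0, 1000, 5000)
U <- c(0, 1000, 0)

betas <- beta_with_offset(M, U)
print(round(betas, 3))
```

The offset keeps the ratio defined when both intensities are zero and pulls low-intensity probes toward 0.5.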

The BED file(s) will be named target.platform.meth.refversion.bed.gz, e.g.

5723646052_R02C02.HM450.meth.hg19.bed.gz

in the single-sample hm450 example below. The file is compressed and tabix-indexed when tabix = TRUE (the default), in which case the corresponding tabix index will be

5723646052_R02C02.HM450.meth.hg19.bed.gz.tbi
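The naming scheme described above can be reproduced with a small helper. This is a sketch of the pattern as stated in the text, not the package's own code; the function name bed_filename and its arguments are assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): assemble a BED filename
# following the prefix.platform.meth.refversion.bed.gz pattern.
bed_filename <- function(prefix, platform, refversion = c("hg19", "hg38")) {
  refversion <- match.arg(refversion)  # defaults to "hg19"
  paste(prefix, platform, "meth", refversion, "bed.gz", sep = ".")
}

bed <- bed_filename("5723646052_R02C02", "HM450")
tbi <- paste0(bed, ".tbi")  # name of the tabix index that accompanies the BED file

print(bed)
print(tbi)
```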

If target is a data.frame with a column Basename and, optionally, a column Sample_Name, process the corresponding IDAT pairs in parallel, using Basename as the stub. If the column Sample_Name is present and renameBeds is TRUE, the sample name replaces the stub as the prefix of each BED file. In the hm450 example below, this yields

GroupA_3.HM450.meth.hg19.bed.gz # and GroupA_3.HM450.meth.hg19.bed.gz.tbi
...
GroupB_2.HM450.meth.hg19.bed.gz # and GroupB_2.HM450.meth.hg19.bed.gz.tbi
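The choice of filename prefix for a data.frame target can be sketched as follows. The helper bed_prefix, the stub paths, and the sample sheet are all hypothetical; in particular, whether the package strips the directory from Basename is not stated here, so this sketch returns it unchanged.

```r
# Hypothetical helper (not part of the package): use Sample_Name as the
# prefix when it is present and renameBeds is TRUE, otherwise use Basename.
bed_prefix <- function(target, renameBeds = TRUE) {
  if (renameBeds && "Sample_Name" %in% names(target)) {
    target$Sample_Name
  } else {
    target$Basename
  }
}

# Made-up sample sheet; the paths are placeholders, not real IDAT stubs.
sheet <- data.frame(
  Basename = c("/path/to/stub_1", "/path/to/stub_2"),
  Sample_Name = c("GroupA_3", "GroupB_2"),
  stringsAsFactors = FALSE
)

renamed <- bed_prefix(sheet)                      # sample names
as_stub <- bed_prefix(sheet, renameBeds = FALSE)  # stubs, as for single-sample runs
print(renamed)
print(as_stub)
```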

If renameBeds is FALSE, the files will be named as for single-sample runs. The resulting BED files, regardless of arguments, are always of the format

chrom start end probeName value *
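A file in this six-column layout can be read back with base R. The sketch below writes a tiny, made-up, tab-delimited example to a temporary gzip file and reads it in; the probe names, coordinates, and values are invented, the tab delimiter is an assumption based on the BED convention, and "strand" is only an assumed label for the final column, which always holds "*".

```r
# Write a two-line stand-in for a generated BED file (all values are made up).
bed_path <- tempfile(fileext = ".bed.gz")
con <- gzfile(bed_path, "w")
writeLines(c(
  "chr1\t100\t101\tprobe_example_1\t0.85\t*",
  "chr1\t200\t201\tprobe_example_2\t0.12\t*"
), con)
close(con)

# Read it back, naming the columns after the layout given above.
bed <- read.table(
  gzfile(bed_path),
  sep = "\t",
  header = FALSE,
  col.names = c("chrom", "start", "end", "probeName", "value", "strand"),
  stringsAsFactors = FALSE
)
print(bed)

unlink(bed_path)  # remove the temporary file
```

For region queries against the tabix index, a tabix-aware reader would be the more natural choice; plain read.table() loads the whole file.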

If BED files exist for the samples being processed, they may be overwritten.

The platform is included in the BED filename for use in annotating across multiple BED files or platforms as necessary.

Value

the filename(s) of the generated BED file(s) and tabix index(es)

Examples

# 450k data, hg19 mappings
if (require("minfiData")) {
  hm450BaseDir <- system.file("extdata", package = "minfiData")
  hm450Sheet <- minfi::read.metharray.sheet(hm450BaseDir) 
  hm450Files <- sesamizeToBeds(hm450Sheet[1, ]) # single-sample list
  unlink(hm450Files)
  hm450Files <- sesamizeToBeds(hm450Sheet$Basename[1]) # single-sample string
  unlink(hm450Files)
  with(hm450Sheet, sesamizeToBeds(Basename)) # multi-sample character vector
}

# EPIC data, hg38 mappings
if (require("minfiDataEPIC")) {
  epicBaseDir <- system.file("extdata", package = "minfiDataEPIC")
  epicSheet <- minfi::read.metharray.sheet(epicBaseDir) 
  epicFiles <- sesamizeToBeds(epicSheet$Basename[1], refversion="hg38")
  unlink(epicFiles)
  sesamizeToBeds(epicSheet, refversion="hg38") # multi-sample df 
}

trichelab/h5testR documentation built on July 12, 2020, 5:18 p.m.