getSingleMoleculeMatrices: Extract list of methylation matrices

View source: R/bam2methMats.R

getSingleMoleculeMatrices    R Documentation

Extract list of methylation matrices

Description

The input sampleTable must have two columns: FileName, containing the path to a BAM file of aligned sequences, and SampleName, containing the name of the sample. The function returns a table of all matrix files for all samples and all regions listed in regionGRs, together with some information about the matrices. Matrices contain values between 0 (not methylated) and 1 (methylated), or NA (undefined).
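As a minimal sketch of the expected input, a sampleTable could be assembled as a plain data.frame. The file paths and sample names below are hypothetical placeholders, and the use of a data.frame (rather than another table class) is an assumption.

```r
# Hypothetical BAM paths and sample names, for illustration only
sampleTable <- data.frame(
  FileName = c("/path/to/sampleA.bam", "/path/to/sampleB.bam"),
  SampleName = c("sampleA", "sampleB"),
  stringsAsFactors = FALSE
)

# Check that both required columns are present before passing the table on
stopifnot(all(c("FileName", "SampleName") %in% colnames(sampleTable)))
```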

Usage

getSingleMoleculeMatrices(
  sampleTable,
  genomeFile,
  regionGRs,
  regionType,
  genomeMotifGR,
  minConversionRate = 0.8,
  maxNAfraction = 0.2,
  bedFilePrefix = NULL,
  path = ".",
  convRatePlots = FALSE,
  nThreads = 1,
  samtoolsPath = "",
  overwriteMatrixLog = FALSE
)

Arguments

sampleTable

Table with FileName column listing the full path to bam files belonging to the samples listed in the SampleName column

genomeFile

String with path to fasta file with genome sequence

regionGRs

A genomic regions object with all regions for which matrices should be extracted. The metadata columns must contain a column called "ID" with a unique ID for that region.

regionType

A collective name for this list of regions (e.g. TSS or amplicons)

genomeMotifGR

A GenomicRanges object with a unique set of non-overlapping CG, GC and GCGorCGC sites

minConversionRate

Minimal fraction of Cs from a non-methylation context that must be converted to Ts for the read to be included in the final matrix (default=0.8)

maxNAfraction

Maximum fraction of CpG/GpC positions that can be undefined (default=0.2). Because building the matrix is time-consuming, reads are not removed from the matrix using this threshold, so a different threshold can be applied later. The threshold is only used to give an indication of read quality in the summary table of the matrices.

bedFilePrefix

The full path and prefix of the bed files for C, G, CG and GC positions in the genome (i.e. the path and name of the file without the ".C.bed", ".G.bed", ".CG.bed" or ".GC.bed" suffix). Default is NULL, which assumes the bed files are in the same location as the genome sequence file.

path

Path for output. "plots", "csv" and "rds" directories will be created here. Default is current directory.

convRatePlots

Boolean value: should bisulfite conversion rate plots be created for each region? (default=FALSE)

nThreads

Number of threads for parallelisation (default=1)

samtoolsPath

Path to the samtools executable, if it is not in the unix $PATH (default="")

overwriteMatrixLog

Should the matrixLog file be overwritten (e.g. after a change in analysis or data), or should already computed matrices be reused, with the script skipping to the next matrix (e.g. after premature termination of an analysis)? (default=FALSE)
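To illustrate what the minConversionRate threshold measures, the sketch below computes a per-read conversion rate from the bases observed at reference Cs outside a methylation context. It is not the package's implementation: the helper conversionRate, its input format, and the use of >= for the comparison are all assumptions made for illustration.

```r
# Hypothetical helper: fraction of non-methylation-context Cs that read as T.
# readBases holds the bases one read shows at reference C positions
# outside CG/GC context; other bases are treated as uninformative.
conversionRate <- function(readBases) {
  informative <- readBases %in% c("C", "T")
  if (!any(informative)) {
    return(NA_real_)
  }
  sum(readBases[informative] == "T") / sum(informative)
}

minConversionRate <- 0.8

# Two made-up reads: 9 of 10 Cs converted, and 5 of 10 Cs converted
wellConverted <- c("T", "T", "T", "C", "T", "T", "T", "T", "T", "T")
poorlyConverted <- c("T", "C", "T", "C", "T", "C", "T", "C", "T", "C")

rateGood <- conversionRate(wellConverted)
ratePoor <- conversionRate(poorlyConverted)

# Assumed rule: a read is kept when its rate reaches the threshold
keepGood <- !is.na(rateGood) && rateGood >= minConversionRate
keepPoor <- !is.na(ratePoor) && ratePoor >= minConversionRate
```

Under these assumptions the first read would be included in the final matrix and the second would be excluded.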

Value

A list (by sample) of lists (by regions) of methylation matrices
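The nested structure, and the later use of a maxNAfraction-style threshold on an already built matrix, can be sketched with mock data. The sample name, region ID, read names and positions are hypothetical, and the layout (reads as rows, CpG/GpC positions as columns) is an assumption rather than something this page states.

```r
# Mock methylation matrix: 0 = not methylated, 1 = methylated, NA = undefined
# Assumed layout: one row per read, one column per CpG/GpC position
mat <- matrix(
  c( 1,  0,  1,  1,
    NA, NA, NA,  0,
     0, NA,  1,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("read1", "read2", "read3"),
                  c("101", "115", "130", "142"))
)

# Mock return value: list by sample, then by region ID
methMats <- list(sampleA = list(region1 = mat))

# Pull out one matrix and compute the fraction of undefined positions per read
m <- methMats[["sampleA"]][["region1"]]
naFraction <- rowMeans(is.na(m))

# Apply a threshold after the fact; a different value can be tried cheaply
strict <- m[naFraction <= 0.2, , drop = FALSE]
relaxed <- m[naFraction <= 0.5, , drop = FALSE]
```

Because the reads are kept in the matrix regardless of maxNAfraction, this kind of filtering can be repeated with different thresholds without rebuilding the matrices.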


jsemple19/methMatrix documentation built on Aug. 19, 2022, 3:57 p.m.