getMaxLikelihoodMatrix: Generate a matrix of the scaled likelihood of most likely CpG...

View source: R/Likelihood_functions.R

getMaxLikelihoodMatrixR Documentation

Generate a matrix of the scaled likelihood of most likely CpG status for a multi-sample BSseq object.

Description

This function generates a matrix of the scaled likelihoods for most likely CpG status for a multi-sample BSseq object. Each element of the matrix represents the scaled likelihood of the most likely CpG status for the locus in the sample. If no data is available for a locus in a sample, the entry in the CpGMatrix is 2 (nonCpG) and the corresponding MaxLikelihood is 1/3.

Usage

getMaxLikelihoodMatrix(BSseq, e = NULL, allCpG = FALSE)

Arguments

BSseq

An object of class BSseq.

e

An optional numeric vector representing error rates for each sample. If NULL, the error rate for each sample is estimated using estimateErrorRate.

allCpG

A logical value indicating whether to classify loci as allCpG and non-CpG loci and sum the scaled likelihood of homozygous CpG and heterozygous CpG. Should be the same for getMaxLikelihoodMatrix and getCpGMatrix

Value

A numeric matrix where each row represents a locus, each column represents a sample, and the values correspond to the quality scores.

Author(s)

Søren Blikdal Hansen (soren.blikdal.hansen@sund.ku.dk)

See Also

BSseq for the BSseq class, read.bedMethyl for details on reading data into a BSseq object, estimateErrorRate for estimating the CpG-specific error rate. getCpGs for filtering a single-sample BSseg object. getCpGMatrix for generating a matrix with the most likely CpG status matching the getMaxLikelihoodMatrix.

Examples

# Example input files
infiles <- c(system.file("extdata/HG002_nanopore_test.bedMethyl.gz",
                         package = "bsseq"),
             system.file("extdata/HG002_pacbio_test.bedMethyl.gz",
                         package = "bsseq"))

# Run the function to import data
bsseq <- read.bedMethyl(files = infiles,
                        colData = DataFrame(row.names = c("test_nanopore", 
                                                          "test_pacbio")),
                        strandCollapse = TRUE,
                        verbose = TRUE)

# Single samples can be filtered using the getCpGs function
bsseq_nano <- bsseq[, 1]
bsseq_nano_99All_filtered <- bsseq[getCpGs(bsseq_nano, 
                                           type = "allCpG", threshold = 0.99)]

bsseq_pacbio <- bsseq[, 2]
bsseq_pacbio_99All_filtered <- bsseq[getCpGs(bsseq_pacbio, 
                                             type = "allCpG", threshold = 0.99)]

# For filtering multiple samples, we can use a CpGMatrix and a MaxLikelihoodMatrix
# Construct the CpGMatrix and QualityMatrix for the bsseq object
CpGMatrix <- getCpGMatrix(bsseq, allCpG = TRUE)
MaxLikelihoodMatrix <- getMaxLikelihoodMatrix(bsseq, allCpG = TRUE)

# Filter for allCpG loci with a likelihood > 0.99 in both samples
bsseq_combined_99All_filtered <- bsseq[which(rowAlls(CpGMatrix == 0) 
                                          & rowMins(MaxLikelihoodMatrix) > 0.99)]

hansenlab/bsseq documentation built on June 12, 2025, 7:42 p.m.