filter_mismatch_positions: Filter Mismatch Positions in Sequencing Data

View source: R/get_training_data.R

filter_mismatch_positions    R Documentation

Filter Mismatch Positions in Sequencing Data

Description

This function filters read positions from sequencing data based on mismatch rates and specific genomic regions. It is designed to process data from BAM files and can include or exclude specific regions and positions as needed.

Usage

filter_mismatch_positions(
  read_positions,
  bam_file,
  mm_rate_max = 1,
  bed_include_path = NULL,
  positions_to_exclude_paths = NULL
)

Arguments

read_positions

A dataframe containing read positions (for example, the dataframe obtained from extract_features_from_bam()). Each row should represent a unique read position with relevant information such as the observed nucleotide.

bam_file

A string specifying the path to a BAM file.

mm_rate_max

A numeric value giving the maximum mismatch rate allowed at a position. Positions with a mismatch rate above this threshold are excluded. Defaults to 1, which excludes no positions on the basis of mismatch rate.
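
To illustrate how such a threshold behaves, the sketch below computes a per-position mismatch rate as mismatches divided by coverage and keeps the positions at or below the threshold. It is not the package's implementation: the column names (chr, pos, n_mismatches, coverage) and the helper keep_by_mismatch_rate() are hypothetical, and this page does not show how dreams derives the rate from the BAM file.

```r
# Hypothetical per-position summary; the column names are assumptions
# made for illustration and are not taken from the dreams package.
positions <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  pos = c(100L, 250L, 75L),
  n_mismatches = c(0L, 12L, 3L),
  coverage = c(200L, 40L, 300L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep positions whose mismatch rate
# (mismatches / coverage) does not exceed mm_rate_max.
keep_by_mismatch_rate <- function(positions, mm_rate_max = 1) {
  mm_rate <- positions$n_mismatches / positions$coverage
  positions[mm_rate <= mm_rate_max, , drop = FALSE]
}

# A threshold of 1 keeps every row, since a rate cannot exceed 1.
kept_default <- keep_by_mismatch_rate(positions)

# A threshold of 0.05 drops chr1:250, whose rate is 12 / 40 = 0.3.
kept_strict <- keep_by_mismatch_rate(positions, mm_rate_max = 0.05)
```

Positions with zero coverage would give an undefined rate in this sketch and would need separate handling.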

bed_include_path

An optional string specifying the path to a BED file that contains genomic regions to include in the analysis. If NULL, all regions are included.

positions_to_exclude_paths

An optional vector of strings specifying paths to files containing positions that should be excluded from the analysis. If NULL, no positions are excluded.
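
The two optional filters can be pictured with the base R sketch below, which restricts positions to BED regions and then drops listed positions. It rests on several assumptions that this page does not confirm: positions are 1-based, the exclusion files reduce to a table with chr and pos columns, and the helpers in_bed() and is_excluded() are hypothetical names. The only fixed fact used is that BED intervals are 0-based and half-open.

```r
# Hypothetical inputs; column names and the layout of the exclusion
# table are assumptions made for illustration.
positions <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2"),
  pos = c(100L, 101L, 200L, 50L),
  stringsAsFactors = FALSE
)
bed <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(100L, 0L),
  end = c(200L, 60L),
  stringsAsFactors = FALSE
)
exclude <- data.frame(chr = "chr2", pos = 50L, stringsAsFactors = FALSE)

# BED intervals are 0-based and half-open, so a 1-based position p
# lies inside a region when start < p <= end.
in_bed <- function(positions, bed) {
  vapply(seq_len(nrow(positions)), function(i) {
    any(bed$chr == positions$chr[i] &
          bed$start < positions$pos[i] &
          positions$pos[i] <= bed$end)
  }, logical(1))
}

# Match positions against the exclusion table using a "chr:pos" key.
is_excluded <- function(positions, exclude) {
  paste(positions$chr, positions$pos, sep = ":") %in%
    paste(exclude$chr, exclude$pos, sep = ":")
}

# chr1:100 falls just outside the first region and chr2:50 is excluded,
# leaving chr1:101 and chr1:200.
kept <- positions[in_bed(positions, bed) & !is_excluded(positions, exclude), ,
                  drop = FALSE]
```

The row-by-row loop keeps the interval logic easy to read; for large inputs an interval-overlap package would be the more practical choice.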

Value

A list with two elements: data containing the filtered read positions dataframe, and info containing a dataframe with summary information such as the number of mismatches and total coverage.


JakobPedersenLab/dreams documentation built on Feb. 2, 2024, 3:14 p.m.