get_training_data_from_bam_indel: Extract Training Data from BAM File (indels)

View source: R/get_training_data_indels.R

get_training_data_from_bam_indel    R Documentation

Description

Extracts indel training data from a BAM file, using a reference genome and optional BED files. The function locates indel positions, extracts features, filters positions by mismatch rate, and combines positive and negative samples into a single training dataset.

Usage

get_training_data_from_bam_indel(
  bam_path,
  reference_path,
  bed_include_path = NULL,
  factor = 1,
  positions_to_exclude_paths = NULL,
  mm_rate_max = 1
)

Arguments

bam_path

Path to the BAM file.

reference_path

Path to the reference genome file.

bed_include_path

Optional; path to a BED file defining the regions to include in the analysis. Defaults to NULL.

factor

Ratio of negative to positive samples in the output. Defaults to 1 (equal numbers of each).

positions_to_exclude_paths

Optional; paths to files defining positions to exclude from training.

mm_rate_max

Maximum mismatch rate allowed at a position. Defaults to 1.
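
The sketch below illustrates, on toy data, what factor and mm_rate_max are described as controlling. It is not the package's implementation: the data frame, its column names (pos, mm_rate), the inclusive threshold, the count of positives, and the sampling step are all assumptions made for illustration.

```r
# Toy illustration only: column names and logic are assumptions,
# not the internals of get_training_data_from_bam_indel().
set.seed(1)

# Hypothetical candidate positions, each with a mismatch rate
candidates <- data.frame(
  pos     = 1:10,
  mm_rate = c(0.00, 0.02, 0.50, 0.01, 0.00, 0.30, 0.03, 0.00, 0.04, 0.01)
)

mm_rate_max <- 0.05  # assumed inclusive upper bound on the mismatch rate
factor      <- 2     # two negative samples per positive sample

# Drop positions whose mismatch rate exceeds the threshold
kept <- candidates[candidates$mm_rate <= mm_rate_max, ]

# Assume 3 positive samples were found; draw factor * 3 negatives,
# capped at the number of positions that survived the filter
n_positive <- 3
n_negative <- min(nrow(kept), factor * n_positive)
negatives  <- kept[sample(nrow(kept), n_negative), ]

nrow(kept)       # 8 positions pass the filter
nrow(negatives)  # 6 negatives for 3 positives
```

With the defaults (factor = 1, mm_rate_max = 1), the same toy logic would keep every position and draw as many negatives as positives.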

Value

A list with two elements: data, a data.frame containing the combined positive and negative training data, and info, a data.frame containing metadata about the training set.
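
A minimal sketch of a call and of inspecting the returned list follows. The file names are hypothetical placeholders, the mm_rate_max value is arbitrary, and the sketch assumes the function is exported from the dreams package. The call is guarded so that nothing runs unless the package and both input files are available.

```r
# Hypothetical input paths; replace with real files.
bam_path       <- "sample.bam"
reference_path <- "reference.fa"

# Only attempt the call if the package and inputs are present
inputs_ready <- requireNamespace("dreams", quietly = TRUE) &&
  file.exists(bam_path) && file.exists(reference_path)

if (inputs_ready) {
  training <- dreams::get_training_data_from_bam_indel(
    bam_path       = bam_path,
    reference_path = reference_path,
    factor         = 1,     # one negative per positive (the default)
    mm_rate_max    = 0.05   # arbitrary threshold chosen for illustration
  )

  # The documented return value is a list with elements data and info
  str(training$data)
  str(training$info)
}
```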


JakobPedersenLab/dreams documentation built on Feb. 2, 2024, 3:14 p.m.