assembly_correct_gaps: Correct uneven gaps in an assembled genome

View source: R/assembly_correct_gaps.R

assembly_correct_gaps    R Documentation

Correct uneven gaps in an assembled genome

Description

Uneven gap sizes can arise during genome assembly when different assembly, polishing, and gap-closing tools are applied in succession. This complicates submission of a genome assembly, because varying gap sizes are ambiguous; ideally, gaps of unknown length should have a standardised size.

This function takes a genome assembly and corrects gap sizes given a user-defined threshold. It is assumed that any run of 'N's at least as long as the threshold is a gap of unknown size.

Usage

assembly_correct_gaps(genomeSS, threshold, correct)

Arguments

genomeSS

DNAStringSet: the genome assembly as a 'DNAStringSet' object from the Biostrings package.

threshold

Integer: the minimum length of a run of 'N's to be considered a gap.

correct

Integer: the standardised number of 'N's for gaps of unknown size.

Value

Returns the genome assembly as a DNAStringSet object, with all gaps meeting the threshold size standardised to the corrected size. Sequences with corrected gaps are labelled with '_correct_gaps'.
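
Examples

The behaviour described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `standardise_gaps` is a hypothetical helper that works on a named character vector rather than a DNAStringSet. It assumes that gaps are written as uppercase 'N's, that the threshold is inclusive, and that the '_correct_gaps' label is appended to the sequence name.

```r
# Hypothetical helper for illustration only; not part of genomalicious.
# Assumptions: gaps are uppercase 'N' runs, the threshold is inclusive,
# and the '_correct_gaps' label is appended to the sequence name.
standardise_gaps <- function(seqs, threshold, correct) {
  stopifnot(is.character(seqs), !is.null(names(seqs)))

  # Match any run of 'N's that is at least `threshold` long
  pattern <- sprintf("N{%d,}", as.integer(threshold))

  # Every matched run is replaced by exactly `correct` 'N's
  replacement <- strrep("N", as.integer(correct))

  fixed <- gsub(pattern, replacement, seqs)
  names(fixed) <- names(seqs)

  # Label only those sequences whose content actually changed
  changed <- unname(fixed != seqs)
  names(fixed)[changed] <- paste0(names(seqs)[changed], "_correct_gaps")

  fixed
}

# Toy assembly: contig1 has a 12-N gap, contig2 has a 3-N gap
seqs <- c(
  contig1 = paste0("ACGT", strrep("N", 12), "ACGT"),
  contig2 = paste0("ACGT", strrep("N", 3), "ACGT")
)

# Runs of 10 or more 'N's are treated as unknown gaps and set to 5 'N's
result <- standardise_gaps(seqs, threshold = 10, correct = 5)
print(result)
```

With these toy inputs, only contig1 meets the threshold, so only its gap is resized and only its name is relabelled; contig2 is returned unchanged. With a real assembly, the equivalent call would be assembly_correct_gaps(genomeSS, threshold, correct) on a DNAStringSet.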


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.