length_filter: Remove Length Outliers from BLAST Results

View source: R/length_filter.R

length_filter    R Documentation


Description

Filters BLAST hits by removing ORFs whose length (in amino acids) is an outlier within their gene group, as judged by the interquartile range (IQR). For each gene, Q1 and Q3 are the first and third quartiles of length, and IQR = Q3 - Q1. Hits whose length falls outside the interval [Q1 - down_IQR * IQR, Q3 + up_IQR * IQR] are discarded.
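The bounds can be illustrated on a single gene group. The following is a minimal base-R sketch, not gclink's implementation. It assumes the quartiles come from R's default `stats::quantile()` (type 7); the helper name `iqr_bounds` and the toy lengths are hypothetical.

```r
# Hypothetical helper, not part of gclink: IQR bounds for one gene group.
# Assumes quartiles from stats::quantile() with its default type 7.
iqr_bounds <- function(len, down_IQR = 1.5, up_IQR = 1.5) {
  q <- stats::quantile(len, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]                     # IQR = Q3 - Q1
  c(lower = q[1] - down_IQR * iqr,       # Q1 - down_IQR * IQR
    upper = q[2] + up_IQR * iqr)         # Q3 + up_IQR * IQR
}

# Toy ORF lengths (amino acids) for one gene symbol
len <- c(300, 310, 305, 298, 302, 90, 620)
b <- iqr_bounds(len)
b

# Keep hits inside the closed interval [lower, upper]
keep <- len >= b[["lower"]] & len <= b[["upper"]]
len[keep]
```

Here Q1 = 299, Q3 = 307.5 and IQR = 8.5, so the bounds are 286.25 and 320.25; the 90 aa and 620 aa hits fall outside them and are dropped.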

Usage

length_filter(Data = bin_genes, down_IQR = 1.5, up_IQR = 1.5)

Arguments

Data

A data frame containing BLAST results. Must include the columns gene (gene symbol) and length (ORF length in amino acids).

down_IQR

Numeric multiplier of the IQR subtracted from Q1 to set the lower bound (default: 1.5). Larger values retain shorter ORFs.

up_IQR

Numeric multiplier of the IQR added to Q3 to set the upper bound (default: 1.5). Larger values retain longer ORFs.
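Because the two multipliers act independently, down_IQR moves only the lower bound and up_IQR only the upper one. The base-R sketch below uses hypothetical data and a hypothetical helper `kept_lengths` (not a gclink function), and assumes quartiles from R's default `stats::quantile()`; it shows which lengths survive under three settings.

```r
# Hypothetical toy lengths (amino acids) for a single gene group
len <- c(280, 298, 300, 302, 305, 310, 330)

# Hypothetical helper, not part of gclink: lengths kept for given multipliers
kept_lengths <- function(len, down_IQR, up_IQR) {
  q <- stats::quantile(len, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - down_IQR * iqr
  upper <- q[2] + up_IQR * iqr
  len[len >= lower & len <= upper]
}

kept_lengths(len, 1.5, 1.5)  # default: 280 and 330 are removed
kept_lengths(len, 3, 1.5)    # wider lower bound: 280 kept, 330 still removed
kept_lengths(len, 1.5, 3)    # wider upper bound: 330 kept, 280 still removed
```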

Details

  • Filtering is performed within each gene group: the quartiles and IQR are computed separately for every gene symbol, so each gene has its own bounds.

  • Progress messages report the number of rows before and after filtering.

  • Missing values in length are ignored when computing quantiles.
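The per-gene behaviour can be sketched in base R with `ave()`, which applies the bounds test separately within each gene group. This is a hypothetical reimplementation for illustration only: `filter_by_gene_iqr` is not a gclink function, it assumes R's default quantile type, and dropping rows whose length is NA is an assumption of this sketch, since the documentation only states that missing values are ignored when computing quantiles. The message wording is also illustrative.

```r
# Hypothetical sketch, not gclink's code: per-gene IQR length filter.
filter_by_gene_iqr <- function(df, down_IQR = 1.5, up_IQR = 1.5) {
  # ave() splits df$length by df$gene, applies FUN to each group and
  # returns the results in the original row order.
  in_bounds <- ave(df$length, df$gene, FUN = function(len) {
    q <- stats::quantile(len, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    as.numeric(len >= q[1] - down_IQR * iqr & len <= q[2] + up_IQR * iqr)
  })
  message("Rows before filtering: ", nrow(df))
  # NA lengths yield NA in in_bounds; this sketch drops those rows (assumption)
  out <- df[!is.na(in_bounds) & in_bounds == 1, , drop = FALSE]
  message("Rows after filtering: ", nrow(out))
  out
}

# Hypothetical toy BLAST table with the two required columns
hits <- data.frame(
  gene   = rep(c("geneA", "geneB"), each = 5),
  length = c(290, 295, 300, 305, 900, 480, 490, 500, 510, 100)
)
filtered <- filter_by_gene_iqr(hits)
filtered$length
```

With this toy input, the 900 aa hit in geneA (bounds 280 to 320) and the 100 aa hit in geneB (bounds 450 to 530) are removed, leaving 8 of 10 rows in their original order.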

Value

The input data frame with outlier rows removed. The returned object is ungrouped regardless of the input grouping.


gclink documentation built on Sept. 9, 2025, 5:39 p.m.