R/filter_calls.R

#' Filter CNV calls according to various criteria
#'
#' This function takes a \code{data.frame} representing a set of calls, such as
#' those generated by \code{\link{getdels}}, and filters them according to
#' different criteria. Calls can be filtered on the number of data points
#' supporting the events, the individuals in which the events are found, or
#' the genomic regions in which the events are located.
#'
#' @param calls a \code{data.frame} of events such as those generated by
#'   \code{\link{getdels}}
#' @param overall_minlength a single integer or numeric value. The minimum
#'   number of supporting data points for a CNV of any type (homozygous
#'   deletion, hemizygous deletion, duplication) to be kept.
#' @param hetdel_minlength a single integer or numeric value. The minimum
#'   number of supporting data points for a hemizygous deletion to be kept.
#'   Applied in addition to \code{overall_minlength}.
#' @param dup_minlength a single integer or numeric value. The minimum
#'   number of supporting data points for a duplication to be kept.
#'   Applied in addition to \code{overall_minlength}.
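#' @param individuals an optional character vector of samples for which to
#'   extract the CNV calls, matched against the \code{ind} column of
#'   \code{calls}. If \code{NULL} (the default), all individuals are kept.
#' @param het_sites an optional \code{data.frame} of genomic ranges to be used
#'   for filtering out events located in these regions. If \code{NULL} (the
#'   default), no region-based filtering is done.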
#' @param min_overlap a single numeric value between 0 and 1. The minimum
#'   proportion of the length of the event that must overlap with a region
#'   listed in \code{het_sites} for this event to be filtered out. A value of
#'   0 removes any event that overlaps a listed region by even a single
#'   nucleotide, whereas a value of 1 removes only events located entirely
#'   within a listed region.
#'
#' @return a \code{data.frame} of CNV calls similar to that given as input,
#'   but with events removed according to the specified filters.
#' @export
#'
#' @examples
#' NULL
filter_calls <- function(calls, overall_minlength, hetdel_minlength,
                         dup_minlength, individuals = NULL,
                         het_sites = NULL, min_overlap = 0) {

  # Keeping only the requested individuals
  if(!is.null(individuals)) {
    calls <- calls[calls$ind %in% individuals, ]
  }

  # Removing calls supported by fewer than overall_minlength data points
  calls <- calls[calls$length >= overall_minlength, ]
  # Removing heterozygous deletions supported by fewer than hetdel_minlength data points
  calls <- calls[!(calls$type == "hetdel" & calls$length < hetdel_minlength), ]
  # Removing duplications supported by fewer than dup_minlength data points
  calls <- calls[!(calls$type == "dup" & calls$length < dup_minlength), ]

  if(!is.null(het_sites)) {
    if(is.null(min_overlap)) stop("min_overlap must be provided for filtering sites out")
    calls <- filter_out(calls, het_sites, min_overlap = min_overlap)
  }

  calls
}
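
# A minimal sketch of one way to read the min_overlap threshold documented
# above, assuming 1-based inclusive start/end coordinates on one chromosome.
# The helpers overlap_fraction() and would_be_removed() and the toy
# coordinates are hypothetical and are not part of the package. filter_out()
# is not shown in this file and may compute the overlap differently.
overlap_fraction <- function(start, end, region_start, region_end) {
  # Number of positions shared by the event and the region (0 if disjoint)
  shared <- pmax(0, pmin(end, region_end) - pmax(start, region_start) + 1)
  # Proportion of the event length covered by the region
  shared / (end - start + 1)
}

# Three toy events compared against a single region spanning 100-200:
# no overlap, fully contained, and 11 of 100 positions overlapping
event_start <- c(50, 120, 190)
event_end   <- c(99, 180, 289)
frac <- overlap_fraction(event_start, event_end, 100, 200)

# Under this reading, an event is dropped when it overlaps the region at all
# and the covered proportion reaches min_overlap
would_be_removed <- function(frac, min_overlap) {
  frac > 0 & frac >= min_overlap
}

# min_overlap = 0 flags any overlapping event (second and third events)
removed_any <- would_be_removed(frac, min_overlap = 0)
# min_overlap = 1 flags only the fully contained event (second event)
removed_full <- would_be_removed(frac, min_overlap = 1)
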
malemay/delgbs documentation built on Feb. 1, 2024, 8:38 a.m.