#' Filter CNV calls according to various criteria
#'
#' This function takes a \code{data.frame} representing a set of calls, such as
#' that generated by \code{\link{getdels}}, and filters them according to
#' different criteria. The function allows filtering based on the number of
#' data points supporting the events, the individuals in which the events are
#' found, or the genomic regions in which the events are located.
#'
#' @param calls a \code{data.frame} of events, such as that generated by
#'   \code{\link{getdels}}. The columns \code{ind}, \code{type} and
#'   \code{length} are used for filtering.
#' @param overall_minlength a single integer or numeric value. The minimum
#' number of supporting data points for a CNV of any type (homozygous
#' deletion, hemizygous deletion, duplication) to be kept.
#' @param hetdel_minlength a single integer or numeric value. The minimum
#' number of supporting data points for a hemizygous deletion to be kept.
#' @param dup_minlength a single integer or numeric value. The minimum
#' number of supporting data points for a duplication to be kept.
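#' @param individuals an optional character vector of samples for which to
#'   extract the CNV calls. If \code{NULL} (the default), calls from all
#'   individuals are kept.
#' @param het_sites an optional \code{data.frame} of genomic ranges to be used
#'   for filtering out events located in these regions. If \code{NULL} (the
#'   default), no region-based filtering is applied.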
#' @param min_overlap a single numeric value between 0 and 1. The minimal
#'   proportion of the length of the event that must overlap with a region
#'   listed in \code{het_sites} for this event to be filtered out. With a
#'   value of 0, an event is removed if it overlaps a listed region by even a
#'   single nucleotide, whereas with a value of 1, only events located entirely
#'   within a listed region are removed.
#'
#' @return a \code{data.frame} of CNV calls similar to that given as input,
#'   but with events removed according to the specified filters.
#' @export
#'
#' @examples
#' NULL
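#' # A minimal sketch on a toy data.frame, not real output of getdels().
#' # It assumes only the columns that filter_calls() itself reads (ind, type,
#' # length); the sample names, the "homdel" label and all values are
#' # hypothetical.
#' toy_calls <- data.frame(ind = c("A", "A", "B", "B"),
#'                         type = c("homdel", "hetdel", "dup", "hetdel"),
#'                         length = c(5, 8, 20, 30),
#'                         stringsAsFactors = FALSE)
#' # het_sites is left at NULL, so the region-based filter is skipped.
#' # The "hetdel" call of length 8 falls below hetdel_minlength and is dropped.
#' filter_calls(toy_calls, overall_minlength = 3, hetdel_minlength = 10,
#'              dup_minlength = 15)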
filter_calls <- function(calls, overall_minlength, hetdel_minlength,
                         dup_minlength, individuals = NULL,
                         het_sites = NULL, min_overlap = 0) {
  # Keeping only the requested individuals
  if (!is.null(individuals)) {
    calls <- calls[calls$ind %in% individuals, ]
  }

  # Removing calls supported by fewer than overall_minlength data points
  calls <- calls[calls$length >= overall_minlength, ]

  # Removing hemizygous deletions (type "hetdel") supported by fewer than
  # hetdel_minlength data points
  calls <- calls[!(calls$type == "hetdel" & calls$length < hetdel_minlength), ]

  # Same thing with duplications (type "dup") and dup_minlength
  calls <- calls[!(calls$type == "dup" & calls$length < dup_minlength), ]

  # Removing events that overlap the regions listed in het_sites
  if (!is.null(het_sites)) {
    if (is.null(min_overlap)) {
      stop("min_overlap must be provided for filtering sites out")
    }
    calls <- filter_out(calls, het_sites, min_overlap = min_overlap)
  }

  calls
}