QFeatures-filtering: Filter features based on their rowData

QFeatures-filteringR Documentation

Filter features based on their rowData

Description

The filterFeatures methods enables users to filter features based on a variable in their rowData. The features matching the filter will be returned as a new object of class QFeatures. The filters can be provided as instances of class AnnotationFilter (see below) or as formulas.

Usage

VariableFilter(field, value, condition = "==", not = FALSE)

## S4 method for signature 'QFeatures,AnnotationFilter'
filterFeatures(object, filter, i, na.rm = FALSE, keep = FALSE, ...)

## S4 method for signature 'QFeatures,formula'
filterFeatures(object, filter, i, na.rm = FALSE, keep = FALSE, ...)

Arguments

field

character(1) refering to the name of the variable to apply the filter on.

value

character() or integer() value for the CharacterVariableFilter and NumericVariableFilter filters respectively.

condition

character(1) defining the condition to be used in the filter. For NumericVariableFilter, one of "==", "!=", ">", "<", ">=" or "<=". For CharacterVariableFilter, one of "==", "!=", "startsWith", "endsWith" or "contains". Default condition is "==".

not

logical(1) indicating whether the filtering should be negated or not. TRUE indicates is negated (!). FALSE indicates not negated. Default not is FALSE, so no negation.

object

An instance of class QFeatures.

filter

Either an instance of class AnnotationFilter or a formula.

i

A numeric, logical or character vector pointing to the assay(s) to be filtered.

na.rm

logical(1) indicating whether missing values should be removed. Default is FALSE.

keep

logical(1) indicating whether to keep the features of assays for which at least one of the filtering variables are missing in the rowData. When FALSE (default), all such assay will contain 0 features; when TRUE, the assays are untouched.

...

Additional parameters. Currently ignored.

Value

An filtered QFeature object.

The filtering procedure

filterFeatures() will go through each assay of the QFeatures object and apply the filtering on the corresponding rowData. Features that do not pass the filter condition are removed from the assay. In some cases, one may want to filter for a variable present in some assay, but not in other. There are two options: either provide keep = FALSE to remove all features for those assays (and thus leaving an empty assay), or provide keep = TRUE to ignore filtering for those assays.

Because features in a QFeatures object are linked between different assays with AssayLinks, the links are automatically updated. However, note that the function doesn't propagate the filter to parent assays. For example, suppose a peptide assay with 4 peptides is linked to a protein assay with 2 proteins (2 peptides mapped per protein) and you apply filterFeatures(). All features pass the filter except for one protein. The peptides mapped to that protein will remain in the QFeatures object. If propagation of the filtering rules to parent assay is desired, you may want to use x[i, , ] instead (see the Subsetting section in ?QFeature).

Variable filters

The variable filters are filters as defined in the AnnotationFilter package. In addition to the pre-defined filter, users can arbitrarily set a field on which to operate. These arbitrary filters operate either on a character variables (as CharacterVariableFilter objects) or numerics (as NumericVariableFilters objects), which can be created with the VariableFilter constructor.

Author(s)

Laurent Gatto

See Also

The QFeatures man page for subsetting and the QFeatures vignette provides an extended example.

Examples


## ----------------------------------------
## Creating character and numberic
## variable filters
## ----------------------------------------

VariableFilter(field = "my_var",
               value = "value_to_keep",
               condition = "==")

VariableFilter(field = "my_num_var",
               value = 0.05,
               condition = "<=")

example(aggregateFeatures)

## ----------------------------------------------------------------
## Filter all features that are associated to the Mitochondrion in
## the location feature variable. This variable is present in all
## assays.
## ----------------------------------------------------------------

## using the forumla interface, exact mathc
filterFeatures(feat1, ~  location == "Mitochondrion")

## using the forumula intefrace, martial match
filterFeatures(feat1, ~startsWith(location, "Mito"))

## using a user-defined character filter
filterFeatures(feat1, VariableFilter("location", "Mitochondrion"))

## using a user-defined character filter with partial match
filterFeatures(feat1, VariableFilter("location", "Mito", "startsWith"))
filterFeatures(feat1, VariableFilter("location", "itochon", "contains"))

## ----------------------------------------------------------------
## Filter all features that aren't marked as unknown (sub-cellular
## location) in the feature variable
## ----------------------------------------------------------------

## using a user-defined character filter
filterFeatures(feat1, VariableFilter("location", "unknown", condition = "!="))

## using the forumula interface
filterFeatures(feat1, ~ location != "unknown")

## ----------------------------------------------------------------
## Filter features that have a p-values lower or equal to 0.03
## ----------------------------------------------------------------

## using a user-defined numeric filter
filterFeatures(feat1, VariableFilter("pval", 0.03, "<="))

## using the formula interface
filterFeatures(feat1, ~ pval <= 0.03)

## you can also remove all p-values that are NA (if any)
filterFeatures(feat1, ~ !is.na(pval))

## ----------------------------------------------------------------
## Negative control - filtering for an non-existing markers value,
## returning empty results.
## ----------------------------------------------------------------

filterFeatures(feat1, VariableFilter("location", "not"))

filterFeatures(feat1, ~ location == "not")

## ----------------------------------------------------------------
## Filtering for a  missing feature variable. The outcome is controled
## by keep
## ----------------------------------------------------------------
data(feat2)

filterFeatures(feat2, ~ y < 0)

filterFeatures(feat2, ~ y < 0, keep = TRUE)

## ----------------------------------------------------------------
## Example with missing values
## ----------------------------------------------------------------

data(feat1)
rowData(feat1[[1]])[1, "location"] <- NA
rowData(feat1[[1]])

## The row with the NA is not removed
rowData(filterFeatures(feat1, ~ location == "Mitochondrion")[[1]])
rowData(filterFeatures(feat1, ~ location == "Mitochondrion", na.rm = FALSE)[[1]])

## The row with the NA is removed
rowData(filterFeatures(feat1, ~ location == "Mitochondrion", na.rm = TRUE)[[1]])

## Note that in situations with missing values, it is possible to
## use the `%in%` operator or filter missing values out
## explicitly.

rowData(filterFeatures(feat1, ~ location %in% "Mitochondrion")[[1]])
rowData(filterFeatures(feat1, ~ location %in% c(NA, "Mitochondrion"))[[1]])

## Explicit handling
filterFeatures(feat1, ~ !is.na(location) & location == "Mitochondrion")

## Using the pipe operator
feat1 |>
   filterFeatures( ~ !is.na(location)) |>
   filterFeatures( ~ location == "Mitochondrion")

lgatto/Features documentation built on April 17, 2024, 3:24 p.m.