View source: R/purity-filter.R
purity_filter | R Documentation |
Filter that targets possible contamination between cell lines based on a numeric quantification (likely abundance or sequence count).
purity_filter(
  x,
  lineages = blood_lineages_default(),
  aggregation_key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
  group_key = c("CellMarker", "Tissue"),
  selected_groups = NULL,
  join_on = "CellMarker",
  min_value = 3,
  impurity_threshold = 10,
  by_timepoint = TRUE,
  timepoint_column = "TimePoint",
  value_column = "seqCount_sum"
)
x | An aggregated integration matrix, obtained via aggregate_values_by_key() |
lineages | A data frame containing cell lineages information |
aggregation_key | The key used for aggregating |
group_key | A character vector of column names for re-aggregation. Column names must be either in x or in lineages. See details. |
selected_groups | Either NULL, a character vector or a data frame for group selection. See details. |
join_on | Common columns to perform a join operation on |
min_value | A minimum value to filter the input matrix. Integrations with a value strictly lower than min_value are excluded. |
impurity_threshold | The ratio threshold for impurity in groups |
by_timepoint | Should filtering be applied on each time point? |
timepoint_column | Column in x containing the time point |
value_column | Column in x containing the numeric quantification of interest |
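The roles of min_value and impurity_threshold can be illustrated with a small base R sketch. This is not the package's implementation: the data frame, the is_id column and the exact form of the ratio comparison are assumptions made for illustration only.

```r
# Hypothetical aggregated values for two integration sites (IS) across groups.
x <- data.frame(
  is_id = c("IS1", "IS1", "IS1", "IS2", "IS2"),
  CellMarker = c("CD34", "CD14", "CD19", "CD34", "CD14"),
  seqCount_sum = c(500, 20, 2, 40, 30)
)
min_value <- 3
impurity_threshold <- 10

# Step 1: rows with a value strictly lower than min_value are set aside
# (assumed here to be dropped).
kept <- x[x$seqCount_sum >= min_value, ]

# Step 2 (assumed reading of the ratio threshold): within each IS, a group is
# treated as impure when the largest value observed for that IS is more than
# impurity_threshold times the group's own value.
max_per_is <- ave(kept$seqCount_sum, kept$is_id, FUN = max)
purified <- kept[kept$seqCount_sum * impurity_threshold >= max_per_is, ]

print(purified)
```

Under these assumptions, IS1 survives only in the CD34 group, while IS2 is kept in both of its groups because their values are within a factor of 10 of each other.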
The input matrix can be re-aggregated with the provided group_key argument. This key contains the names of the columns to group on (besides the columns holding genomic coordinates of the integration sites), and each of these columns must be present in at least one of the x or lineages data frames. If not all key columns are found in x, a join operation with the lineages data frame is performed on the common column(s) given in join_on.
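The join and re-aggregation described above can be sketched in base R as follows. The data, the coordinate column names (chr, integration_locus, strand) and the use of merge() and aggregate() are illustrative assumptions, not the function's internal code.

```r
# Hypothetical aggregated matrix: the group key column is NOT in x.
x <- data.frame(
  chr = c("1", "1", "1", "2"),
  integration_locus = c(1000, 1000, 1000, 2500),
  strand = c("+", "+", "+", "-"),
  CellMarker = c("CD34", "CD14", "CD15", "CD19"),
  seqCount_sum = c(120, 15, 5, 60)
)
# Hypothetical lineages table that provides the missing column.
lineages <- data.frame(
  CellMarker = c("CD34", "CD14", "CD15", "CD19"),
  HematoLineage = c("CD34", "Myeloid", "Myeloid", "Lymphoid")
)
group_key <- c("HematoLineage")
join_on <- "CellMarker"

# Join with lineages only when some key column is missing from x.
if (!all(group_key %in% colnames(x))) {
  x <- merge(x, lineages, by = join_on, all.x = TRUE)
}

# Re-aggregate by genomic coordinates plus the group key.
agg_formula <- reformulate(
  c("chr", "integration_locus", "strand", group_key),
  response = "seqCount_sum"
)
regrouped <- aggregate(agg_formula, data = x, FUN = sum)

print(regrouped)
```

Here CD14 and CD15 both map to the "Myeloid" lineage, so their values for the same integration site are summed into one row.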
The user can specify which groups the filter logic is applied to. For example, if we have
group_key = c("HematoLineage")
and we set
selected_groups = c("CD34", "Myeloid", "Lymphoid")
then a single integration is evaluated by the filter only in groups whose "HematoLineage" column holds the value "CD34", "Myeloid" or "Lymphoid". If the same integration is present in other groups, it is kept as it is.
selected_groups can be set to NULL to apply the logic to every group present in the data frame. It can be set to a simple character vector, as in the example above, if the group key has length 1 (and there is no need to filter on time point). If the group key is longer than 1, the filter is applied only on the first element of the key.
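The effect of a character vector selection can be pictured with the base R sketch below. The data frame and the is_id column are hypothetical, and the split into two subsets only mimics the described behaviour, it is not the package's code.

```r
# Hypothetical data: one integration site observed in four lineages.
x <- data.frame(
  is_id = c("IS1", "IS1", "IS1", "IS1"),
  HematoLineage = c("CD34", "Myeloid", "Lymphoid", "Erythroid"),
  seqCount_sum = c(300, 40, 12, 8)
)
group_key <- c("HematoLineage")
selected_groups <- c("CD34", "Myeloid", "Lymphoid")

# The character vector is matched against the first element of the group key.
in_selection <- x[[group_key[1]]] %in% selected_groups

to_evaluate <- x[in_selection, ]   # rows the filter logic would look at
kept_as_is <- x[!in_selection, ]   # rows left untouched by the filter

print(to_evaluate)
print(kept_as_is)
```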
If a more refined selection on groups is needed, a data frame can be provided instead:
group_key = c("CellMarker", "Tissue")
selected_groups = tibble::tribble(
  ~ CellMarker, ~ Tissue,
  "CD34", "BM",
  "CD14", "BM",
  "CD14", "PB"
)
Columns in the data frame should be the same as the group key (plus, optionally, the time point column). In this example, only the groups identified by the rows of the provided data frame are processed.
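A data frame selection amounts to matching rows on all group key columns at once. The base R sketch below illustrates the idea with hypothetical data; the key_of() helper and the is_id column are introduced only for this example and are not part of the package.

```r
# Hypothetical data: one integration site observed in four marker/tissue groups.
x <- data.frame(
  is_id = "IS1",
  CellMarker = c("CD34", "CD14", "CD14", "CD19"),
  Tissue = c("BM", "BM", "PB", "PB"),
  seqCount_sum = c(200, 30, 25, 10)
)
group_key <- c("CellMarker", "Tissue")
selected_groups <- data.frame(
  CellMarker = c("CD34", "CD14", "CD14"),
  Tissue = c("BM", "BM", "PB")
)

# Hypothetical helper: build one composite key per row from the group key columns.
key_of <- function(df) do.call(paste, c(df[group_key], sep = "|"))

in_selection <- key_of(x) %in% key_of(selected_groups)

to_evaluate <- x[in_selection, ]   # groups listed in selected_groups
kept_as_is <- x[!in_selection, ]   # every other group is left untouched

print(to_evaluate)
print(kept_as_is)
```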
A data frame
Other Data cleaning and pre-processing: aggregate_metadata(), aggregate_values_by_key(), compute_near_integrations(), default_meta_agg(), outlier_filter(), outliers_by_pool_fragments(), realign_after_collisions(), remove_collisions(), threshold_filter()
data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
aggreg <- aggregate_values_by_key(
  x = integration_matrices,
  association_file = association_file,
  value_cols = c("seqCount", "fragmentEstimate")
)
filtered_by_purity <- purity_filter(
  x = aggreg,
  value_column = "seqCount_sum"
)
head(filtered_by_purity)