gimap_filter: A function to run filtering

View source: R/02-gimap_filter.R

Description

This function applies filters to a gimap dataset. By default it runs both the zero count filter (checked across all samples) and the low plasmid CPM filter, but users can run only one of these filters or adjust the behavior of each.

Usage

gimap_filter(
  .data = NULL,
  gimap_dataset,
  filter_type = "both",
  cutoff = NULL,
  filter_zerocount_target_col = NULL,
  filter_plasmid_target_col = NULL,
  filter_replicates_target_col = NULL,
  min_n_filters = 1
)

Arguments

.data

Data can be piped from function to function with tidyverse pipes (%>%), but the piped data must still be a gimap_dataset.

gimap_dataset

A special dataset structure that is set up using the 'setup_data()' function.

filter_type

Can be one of the following: 'zero_count_only', 'low_plasmid_cpm_only', or 'both'. Options that may be added in the future include 'rep_variation', 'zero_in_last_time_point', or a vector combining several of these filters.

cutoff

Default is NULL. Used only by the low plasmid CPM filter: the cutoff below which log2 CPM values at the plasmid time point are considered low. If not specified, the lower outlier threshold is used, defined as the lower quartile minus 1.5 times the interquartile range (Q1 - 1.5 * IQR).
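To make the default cutoff concrete, here is a minimal base R sketch of the lower outlier rule applied to a made-up vector of plasmid log2 CPM values. The data, the variable names, and the strict `<` comparison are assumptions for illustration, not gimap's internal code. `quantile()` and `IQR()` use R's default quantile type 7, which may differ from what the package uses.

```r
# Hypothetical plasmid log2 CPM values, one per pgRNA construct
log2_cpm_plasmid <- c(5.1, 6.3, 5.8, 6.0, 0.4, 5.5, 6.7, 5.9)

# Lower outlier threshold: lower quartile minus 1.5 * interquartile range
q1 <- quantile(log2_cpm_plasmid, probs = 0.25, names = FALSE)
cutoff <- q1 - 1.5 * IQR(log2_cpm_plasmid)

# Assumed rule: constructs strictly below the cutoff are flagged as low CPM
low_cpm_flag <- log2_cpm_plasmid < cutoff
which(low_cpm_flag)
```

Here only the fifth construct (log2 CPM of 0.4) falls below the computed cutoff; passing an explicit `cutoff` to gimap_filter() replaces this data-driven threshold with a fixed value.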

filter_zerocount_target_col

Default is NULL. The sample column(s) to check for counts of 0. If NULL, all sample columns are used.
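A minimal sketch of how a zero count check over the selected columns could work, assuming a construct is flagged when any of the target columns has a count of 0. The `counts` matrix and the helper `flag_zero_counts()` are hypothetical and are not part of gimap's API.

```r
# Hypothetical raw counts: rows are pgRNA constructs, columns are samples
counts <- rbind(
  pg1 = c(plasmid = 10, day0 = 12, day22 = 9),
  pg2 = c(plasmid = 0, day0 = 3, day22 = 4),
  pg3 = c(plasmid = 25, day0 = 30, day22 = 0),
  pg4 = c(plasmid = 7, day0 = 0, day22 = 0)
)

# Hypothetical helper; assumed rule: flag if any target column is 0
flag_zero_counts <- function(counts, target_cols = NULL) {
  # NULL means "use all sample columns", mirroring the documented default
  if (is.null(target_cols)) target_cols <- seq_len(ncol(counts))
  rowSums(counts[, target_cols, drop = FALSE] == 0) > 0
}

flag_zero_counts(counts)                        # pg2, pg3 and pg4 flagged
flag_zero_counts(counts, target_cols = c(1, 2)) # pg2 and pg4 flagged
```

Restricting the target columns, as in the c(1, 2) call from the Examples section, can leave constructs such as pg3 unflagged when their only zero is in an excluded column.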

filter_plasmid_target_col

Default is NULL. The plasmid column(s) used by the low plasmid CPM filter. If NULL, only the first column is used.

filter_replicates_target_col

Default is NULL. The sample columns that hold the final time point replicates. If NULL, the last 3 sample columns are used. This function uses these columns only to record which pgRNA IDs have a zero count in all of these samples; they do not affect which constructs are filtered.
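The record described above can be sketched in base R as follows. The counts matrix and its column names are made up for illustration; only the "last 3 sample columns" default and the "zero in all replicates" rule come from this page.

```r
# Hypothetical raw counts: a plasmid column plus three final time point replicates
counts <- rbind(
  pg1 = c(plasmid = 50, rep1 = 12, rep2 = 9, rep3 = 15),
  pg2 = c(plasmid = 40, rep1 = 0, rep2 = 0, rep3 = 0),
  pg3 = c(plasmid = 35, rep1 = 0, rep2 = 4, rep3 = 0)
)

# Documented default: the last 3 sample columns are the final replicates
rep_cols <- tail(seq_len(ncol(counts)), 3)

# A construct is recorded only if it is zero in every replicate column
all_reps_zero <- rowSums(counts[, rep_cols, drop = FALSE] == 0) == length(rep_cols)
all_reps_zerocount_ids <- rownames(counts)[all_reps_zero]
all_reps_zerocount_ids
```

pg3 has zeros in two replicates but a nonzero count in rep2, so only pg2 is recorded.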

min_n_filters

Default is 1. When a combination of filters is used, the minimum number of independent filters that must flag a pgRNA construct before it is filtered out. Decide on the appropriate filters based on the results of your QC report.
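One way to read min_n_filters is as a threshold on the number of flags each construct receives. The sketch below illustrates that reading with two made-up logical flag vectors; the helper `combine_filter_flags()` is hypothetical and not gimap's implementation.

```r
# Hypothetical per-filter flags for five pgRNA constructs
pg_ids <- paste0("pg", 1:5)
zero_count_flag <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
low_plasmid_cpm_flag <- c(TRUE, TRUE, FALSE, FALSE, FALSE)

# Hypothetical helper: remove a construct when at least
# min_n_filters of the supplied filters flag it
combine_filter_flags <- function(..., min_n_filters = 1) {
  flags <- cbind(...)                 # one logical column per filter
  rowSums(flags) >= min_n_filters     # count flags per construct
}

remove_1 <- combine_filter_flags(zero_count_flag, low_plasmid_cpm_flag, min_n_filters = 1)
remove_2 <- combine_filter_flags(zero_count_flag, low_plasmid_cpm_flag, min_n_filters = 2)
pg_ids[remove_1]  # removed when either filter flags them
pg_ids[remove_2]  # removed only when both filters flag them
```

With min_n_filters = 1, pg1, pg2 and pg3 are removed; with min_n_filters = 2, only pg1, which both filters flag, is removed.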

Value

A filtered version of the gimap_dataset, with the results stored in its $filtered_data element, which contains:

filter_step_run

A boolean reporting whether the filter step was run (filtering is optional).

metadata_pg_ids

The subset of pgRNA IDs that remain in the dataset after filtering.

transformed_log2_cpm

The subset of the log2 CPM data that remains in the dataset after filtering.

removed_pg_ids

A record of which pgRNAs were filtered out.

all_reps_zerocount_ids

Not necessarily filtered data; a record of which pgRNAs have a zero count in all final time point replicates.
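As a rough illustration of how the kept and removed pieces relate, the sketch below splits a made-up log2 CPM matrix with a logical removal mask into list elements named after those described above. This is a simplified mock; the actual element types and structure inside a real gimap_dataset may differ.

```r
# Hypothetical log2 CPM matrix: rows are pgRNA constructs
log2_cpm <- matrix(
  c(5.2, 6.1, 0.3, 5.8,
    4.9, 6.4, 0.0, 5.5),
  nrow = 4,
  dimnames = list(paste0("pg", 1:4), c("plasmid", "day22"))
)
remove <- c(FALSE, FALSE, TRUE, FALSE)  # hypothetical combined filter result

# Simplified mock of the filtered output; kept and removed IDs are complementary
filtered_data <- list(
  filter_step_run = TRUE,
  metadata_pg_ids = rownames(log2_cpm)[!remove],
  transformed_log2_cpm = log2_cpm[!remove, , drop = FALSE],
  removed_pg_ids = rownames(log2_cpm)[remove]
)
filtered_data$removed_pg_ids
```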

Examples



gimap_dataset <- get_example_data("gimap", data_dir = tempdir()) %>%
  gimap_filter()

# To see filtered data
# gimap_dataset$filtered_data

# If you want to only use a single filter or some subset,
# specify which using the filter_type parameter
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(filter_type = "zero_count_only")
# or
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(filter_type = "low_plasmid_cpm_only")

# To use multiple filters and require more than one of them to flag a
# pgRNA construct before it is filtered out, use the `min_n_filters` argument
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(
    filter_type = "both",
    min_n_filters = 2
  )

# You can also specify which columns the filters will be applied to
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(
    filter_type = "zero_count_only",
    filter_zerocount_target_col = c(1, 2)
  )


gimap documentation built on June 8, 2025, 10:13 a.m.