View source: R/02-gimap_filter.R
gimap_filter | R Documentation
This function applies filters to the gimap data. By default it runs both the zero count filter (across all samples) and the low plasmid CPM filter, but users can select a subset of these filters or adjust the behavior of each one.
gimap_filter(
  .data = NULL,
  gimap_dataset,
  filter_type = "both",
  cutoff = NULL,
  filter_zerocount_target_col = NULL,
  filter_plasmid_target_col = NULL,
  filter_replicates_target_col = NULL,
  min_n_filters = 1
)
.data |
Data can be piped from function to function with tidyverse pipes (%>%), but the data must still be a gimap_dataset. |
gimap_dataset |
A special dataset structure that is set up using the `setup_data()` function. |
filter_type |
One of 'zero_count_only', 'low_plasmid_cpm_only', or 'both'. Future versions may also support 'rep_variation', 'zero_in_last_time_point', or a vector combining several of these filters. |
cutoff |
Default is NULL. Used only by the low plasmid CPM filter: the cutoff below which log2 CPM values for the plasmid time point are considered low. If not specified, the lower outlier threshold (the lower quartile minus 1.5 times the interquartile range) is used. |
filter_zerocount_target_col |
Default is NULL. Which sample column(s) should be checked for counts of 0. If NULL, all sample columns are selected. |
filter_plasmid_target_col |
Default is NULL, in which case only the first column is selected. Use this parameter to specify which column(s) hold the plasmid samples. |
filter_replicates_target_col |
Default is NULL. Which sample columns are the final time point replicates. If NULL, the last 3 sample columns are used. This function uses it only to save a list of which pgRNA IDs have a zero count in all of these samples. |
min_n_filters |
Default is 1. The minimum number of independent filters that must flag a pgRNA construct before it is filtered out when a combination of filters is used. You should decide on the appropriate filter based on the results of your QC report. |
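The default `cutoff` described above (lower quartile minus 1.5 times the interquartile range) can be illustrated with a minimal base-R sketch. The helper `lower_outlier_cutoff()` and the toy values are hypothetical and are not part of gimap; the sketch also assumes a single plasmid column and that values strictly below the cutoff are flagged.

```r
# Hypothetical helper (not part of gimap): default low plasmid CPM cutoff,
# computed as Q1 - 1.5 * IQR of the plasmid log2 CPM values.
lower_outlier_cutoff <- function(log2_cpm) {
  q <- stats::quantile(log2_cpm, probs = c(0.25, 0.75),
                       na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  q[1] - 1.5 * iqr
}

# Made-up plasmid log2 CPM values for seven pgRNAs (one clear low outlier)
plasmid_log2_cpm <- c(5.1, 5.4, 5.8, 6.0, 6.2, 6.5, 0.3)

cutoff <- lower_outlier_cutoff(plasmid_log2_cpm)  # 3.975 for these values

# Assumption: constructs strictly below the cutoff are flagged as low
flag_low_plasmid <- plasmid_log2_cpm < cutoff
which(flag_low_plasmid)  # only the 7th construct is flagged
```

Passing an explicit `cutoff` to `gimap_filter()` replaces this data-driven threshold with a fixed value.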
A filtered version of the gimap_dataset, with the results returned in its $filtered_data section:
- filter_step_run: a boolean reporting whether the filter step was run (filtering is optional).
- metadata_pg_ids: the subset of pgRNA IDs that remain in the dataset after filtering.
- transformed_log2_cpm: the subset of the log2 CPM data that remains in the dataset after filtering.
- removed_pg_ids: a record of which pgRNAs were filtered out.
- all_reps_zerocount_ids: not necessarily filtered data; a record of which pgRNAs have a zero count in all final time point replicates.
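The all_reps_zerocount_ids record can be illustrated with a small base-R sketch that mirrors the documented default for filter_replicates_target_col (the last 3 sample columns). The helper `all_reps_zero_ids()` and the count matrix are hypothetical illustrations, not gimap's implementation.

```r
# Made-up raw counts: rows are pgRNAs, columns are samples
counts <- matrix(
  c(10, 12, 0, 0, 0,
     8,  0, 5, 3, 0,
     0,  7, 0, 0, 0,
     9, 11, 4, 6, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pg", 1:4), paste0("sample", 1:5))
)

# Hypothetical helper: IDs of pgRNAs with a zero count in every
# final time point replicate. If rep_cols is NULL, the last 3 sample
# columns are used, as documented for filter_replicates_target_col.
all_reps_zero_ids <- function(counts, rep_cols = NULL) {
  if (is.null(rep_cols)) {
    rep_cols <- utils::tail(seq_len(ncol(counts)), 3)
  }
  reps <- counts[, rep_cols, drop = FALSE]
  rownames(counts)[rowSums(reps == 0) == ncol(reps)]
}

all_reps_zero_ids(counts)  # "pg1" "pg3"
```

As the Value section notes, this is only a record; these pgRNAs are not necessarily removed.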
library(gimap)

gimap_dataset <- get_example_data("gimap", data_dir = tempdir()) %>%
  gimap_filter()

# To see the filtered data
# gimap_dataset$filtered_data

# To use only a single filter or some subset of filters,
# specify which with the filter_type parameter
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(filter_type = "zero_count_only")
# or
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(filter_type = "low_plasmid_cpm_only")

# To require more than one filter to flag a pgRNA construct
# before it is filtered out, use the `min_n_filters` argument
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(
    filter_type = "both",
    min_n_filters = 2
  )

# You can also specify which columns the filters are applied to
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter(
    filter_type = "zero_count_only",
    filter_zerocount_target_col = c(1, 2)
  )
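The effect of `min_n_filters` in the examples above can be sketched in base R: a construct is removed once the number of independent filters that flag it reaches `min_n_filters`. The per-filter flags and the helper `removed_ids()` below are made up for illustration and are not gimap's internal code.

```r
# Made-up per-filter flags for five pgRNAs (TRUE = flagged by that filter)
flags <- data.frame(
  pg_id = paste0("pg", 1:5),
  zero_count = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  low_plasmid_cpm = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove a construct when at least `min_n_filters`
# of the independent filters flag it.
removed_ids <- function(flags, min_n_filters = 1) {
  filter_cols <- setdiff(names(flags), "pg_id")
  n_flags <- rowSums(flags[, filter_cols, drop = FALSE])
  flags$pg_id[n_flags >= min_n_filters]
}

removed_ids(flags, min_n_filters = 1)  # "pg1" "pg2" "pg3"
removed_ids(flags, min_n_filters = 2)  # "pg1"
```

With two filters, `min_n_filters = 1` removes a construct flagged by either filter, while `min_n_filters = 2` removes only constructs flagged by both.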