threshold_filter: Filter data frames with custom predicates


View source: R/analysis-functions.R

Description

Lifecycle: experimental.

Filter a single data frame or a list of data frames with custom predicates assembled from the function parameters.

Usage

threshold_filter(x, threshold, cols_to_compare = "Value", comparators = ">")

Arguments

x

A data frame or a list of data frames

threshold

A numeric/integer vector or a named list of numeric/integer vectors

cols_to_compare

A character vector or a named list of character vectors

comparators

A character vector or a named list of character vectors. Each element must be one of the allowed values: "<", ">", "==", "!=", ">=", "<="

Details

A single data frame as input

If the user chooses to operate on a single data frame, the other parameters should only be vectors: numeric vector for threshold and character vectors for both cols_to_compare and comparators. A filtering condition is obtained by combining element by element cols_to_compare + comparators + threshold (similarly to the paste function). For example:

threshold = c(20, 35, 50)
cols_to_compare = c("a", "b", "c")
comparators = "<"

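Given these vectors, the input data frame is filtered by keeping only the rows where the value in column "a" is less than 20 AND the value in column "b" is less than 35 AND the value in column "c" is less than 50. Things the user should keep in mind are: the conditions are combined with a logical AND, so only rows that satisfy all of them are kept, and a vector of length 1 (comparators in this example) is reused for every condition.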
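The element-by-element combination described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the data frame `df` and the use of `eval(parse())` are assumptions made for the example.

```r
# Hypothetical input data, not taken from the package documentation
df <- data.frame(
    a = c(10, 25, 15),
    b = c(30, 20, 40),
    c = c(45, 10, 60)
)

threshold <- c(20, 35, 50)
cols_to_compare <- c("a", "b", "c")
comparators <- "<" # length 1: paste() recycles it for every column

# Combine column + comparator + threshold element by element
conditions <- paste(cols_to_compare, comparators, threshold)
print(conditions)
#> [1] "a < 20" "b < 35" "c < 50"

# Join the conditions with a logical AND and evaluate them on the data frame
predicate <- paste(conditions, collapse = " & ")
keep <- eval(parse(text = predicate), envir = df)
result <- df[keep, , drop = FALSE]
print(result)
```

Only the first row of `df` satisfies all three conditions, so `result` contains that single row.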

A list of data frames as input

The input for the function may also be a list of data frames, either named or unnamed.

Unnamed list

If the input is a simple unnamed list, the other parameters should be simple vectors (as for data frames). All the predicates will simply be applied to every data frame in the list: this is useful for applying the same conditions to several data frames that share the same structure but hold different data.
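As a rough illustration of applying one set of predicates to every element of an unnamed list, the base R sketch below uses a hypothetical helper, `filter_one()`, which is not part of ISAnalytics. The input data frames are also invented for the example.

```r
# Hypothetical helper (not a package function): applies every
# column/comparator/threshold triple to one data frame and keeps the rows
# that satisfy all of them
filter_one <- function(df, threshold, cols_to_compare, comparators) {
    tests <- Map(
        function(col, op, thr) match.fun(op)(df[[col]], thr),
        cols_to_compare, comparators, threshold
    )
    df[Reduce(`&`, tests), , drop = FALSE]
}

# Two data frames with the same structure but different data
dfs <- list(
    data.frame(a = c(20, 30, 40), b = c(40, 50, 60)),
    data.frame(a = c(5, 50), b = c(45, 55))
)

# The same conditions (a > 20 AND b < 60) are applied to each element
filtered <- lapply(dfs, filter_one,
    threshold = c(20, 60),
    cols_to_compare = c("a", "b"),
    comparators = c(">", "<")
)
print(filtered)
```

Because the parameters are plain vectors, `lapply()` passes them unchanged to every data frame in the list.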

Named list

It is also possible to filter different data frames with different sets of conditions. Besides defining the other parameters as simple vectors, which gives the same result as operating on an unnamed list, the user can define the parameters as named lists containing vectors. For example:

example_df <- tibble::tibble(a = c(20, 30, 40),
                             b = c(40, 50, 60),
                             c = c("a", "b", "c"),
                             d = c(3L, 4L, 5L))
example_list <- list(first = example_df,
                     second = example_df,
                     third = example_df)
print(example_list)
## $first
## # A tibble: 3 x 4
##       a     b c         d
##   <dbl> <dbl> <chr> <int>
## 1    20    40 a         3
## 2    30    50 b         4
## 3    40    60 c         5
## 
## $second
## # A tibble: 3 x 4
##       a     b c         d
##   <dbl> <dbl> <chr> <int>
## 1    20    40 a         3
## 2    30    50 b         4
## 3    40    60 c         5
## 
## $third
## # A tibble: 3 x 4
##       a     b c         d
##   <dbl> <dbl> <chr> <int>
## 1    20    40 a         3
## 2    30    50 b         4
## 3    40    60 c         5
filtered <- threshold_filter(example_list,
                             threshold = list(first = c(20, 60),
                                              third = c(25)),
                             cols_to_compare = list(first = c("a", "b"),
                                                    third = c("a")),
                             comparators = list(first = c(">", "<"),
                                                third = c(">=")))
print(filtered)
## $first
## # A tibble: 1 x 4
##       a     b c         d
##   <dbl> <dbl> <chr> <int>
## 1    30    50 b         4
## 
## $second
## # A tibble: 3 x 4
##       a     b c         d
##   <dbl> <dbl> <chr> <int>
## 1    20    40 a         3
## 2    30    50 b         4
## 3    40    60 c         5
## 
## $third
## # A tibble: 2 x 4
##       a     b c         d
##   <dbl> <dbl> <chr> <int>
## 1    30    50 b         4
## 2    40    60 c         5

The above signature will roughly be translated as:
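A hand-written base R equivalent, given here as a sketch: it assumes plain `data.frame` objects in place of tibbles and uses `subset()` for the filtering, which may differ from what the package does internally.

```r
# Same data as in the example above, built with base R instead of tibble
example_df <- data.frame(
    a = c(20, 30, 40),
    b = c(40, 50, 60),
    c = c("a", "b", "c"),
    d = c(3L, 4L, 5L)
)
example_list <- list(
    first = example_df,
    second = example_df,
    third = example_df
)

# Each named element gets its own conditions; "second" is not mentioned
# in the parameters, so it is passed through untouched
manual <- list(
    first = subset(example_list$first, a > 20 & b < 60),
    second = example_list$second,
    third = subset(example_list$third, a >= 25)
)
print(manual)
```

The row counts match the printed output above: one row for `first`, three for `second`, and two for `third`.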

It is also possible to use some parameters as vectors and some as lists: vectors will be recycled for every element filtered.

filtered <- threshold_filter(example_list,
                             threshold = list(first = c(20, 60),
                                              third = c(25, 65)),
                             cols_to_compare = c("a", "b"),
                             comparators = list(first = c(">", "<"),
                                                third = c(">=", "<=")))

In this example, different thresholds and comparators will be applied to the same columns in all filtered data frames.
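To make the recycling concrete, the sketch below writes out by hand the conditions that the mixed call implies for the `first` and `third` elements. It uses a base R `data.frame` rather than a tibble, and the variable names are hypothetical. The handling of `second` is left out because the text does not state it for this call.

```r
# Same data as in the examples above, built with base R instead of tibble
example_df <- data.frame(
    a = c(20, 30, 40),
    b = c(40, 50, 60),
    c = c("a", "b", "c"),
    d = c(3L, 4L, 5L)
)

# The column vector is shared (recycled) across the filtered elements
cols <- c("a", "b")

# first: thresholds c(20, 60) with comparators c(">", "<")
first_manual <- example_df[
    example_df[[cols[1]]] > 20 & example_df[[cols[2]]] < 60, ,
    drop = FALSE
]

# third: thresholds c(25, 65) with comparators c(">=", "<=")
third_manual <- example_df[
    example_df[[cols[1]]] >= 25 & example_df[[cols[2]]] <= 65, ,
    drop = FALSE
]

print(first_manual)
print(third_manual)
```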

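Things the user should keep in mind are: when the parameters are named lists, data frames whose names do not appear in them are returned unfiltered, as happens to second in the first example above.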

Value

A data frame or a list of data frames

See Also

Other Analysis functions: CIS_grubbs(), comparison_matrix(), compute_abundance(), cumulative_count_union(), sample_statistics(), separate_quant_matrices(), top_integrations()

Examples

example_df <- tibble::tibble(
    a = c(20, 30, 40),
    b = c(40, 50, 60),
    c = c("a", "b", "c"),
    d = c(3L, 4L, 5L)
)
example_list <- list(
    first = example_df,
    second = example_df,
    third = example_df
)

filtered <- threshold_filter(example_list,
    threshold = list(
        first = c(20, 60),
        third = c(25)
    ),
    cols_to_compare = list(
        first = c("a", "b"),
        third = c("a")
    ),
    comparators = list(
        first = c(">", "<"),
        third = c(">=")
    )
)

ISAnalytics documentation built on April 9, 2021, 6:01 p.m.