View source: R/analysis-functions.R
[Experimental] Filter a single data frame or a list of data frames with custom predicates assembled from the function parameters.
threshold_filter(x, threshold, cols_to_compare = "Value", comparators = ">")
x: A data frame or a list of data frames
threshold: A numeric/integer vector or a named list of numeric/integer vectors
cols_to_compare: A character vector or a named list of character vectors
comparators: A character vector or a named list of character vectors. Must be one of the allowed values between
If the user chooses to operate on a single data frame, the other parameters should only be vectors: a numeric vector for threshold and character vectors for both cols_to_compare and comparators.
A filtering condition is obtained by combining cols_to_compare + comparators + threshold element by element (similarly to the paste function). For example:
threshold = c(20, 35, 50)
cols_to_compare = c("a", "b", "c")
comparators = "<"
Given these vectors, the input data frame is filtered by keeping only the rows where the value in column "a" is less than 20 AND the value in column "b" is less than 35 AND the value in column "c" is less than 50.
Things the user should keep in mind are:
Vectors of length 1 are recycled if one or more of the other parameters are longer (in the example, the comparators value)
Vectors that are not of length 1 must all have the same length
Columns to compare must be present in the input data frame and must be numeric/integer
The filter performs a logical "AND" on all the conditions: only rows that satisfy ALL the conditions are preserved
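The rules above can be illustrated with a minimal base R sketch. The helper `sketch_filter` and the sample data are hypothetical and written only for illustration. This is not the package's implementation, and it assumes the comparators are standard R comparison operators passed as strings.

```r
# Hypothetical helper that mimics the documented behaviour in base R.
# It is an illustration only, not the code used by threshold_filter().
sketch_filter <- function(df, threshold, cols_to_compare = "Value",
                          comparators = ">") {
  lens <- c(length(threshold), length(cols_to_compare), length(comparators))
  n <- max(lens)
  # Every vector must have length 1 or the common length n
  stopifnot(all(lens %in% c(1L, n)))
  # Recycle length-1 vectors to the common length
  threshold <- rep_len(threshold, n)
  cols_to_compare <- rep_len(cols_to_compare, n)
  comparators <- rep_len(comparators, n)
  keep <- rep(TRUE, nrow(df))
  for (i in seq_len(n)) {
    compare <- match.fun(comparators[i]) # e.g. "<" becomes the `<` function
    # Logical AND across all conditions
    keep <- keep & compare(df[[cols_to_compare[i]]], threshold[i])
  }
  df[keep, , drop = FALSE]
}

# Made-up data: only the first row has a < 20, b < 35 and c < 50
df <- data.frame(a = c(10, 25, 15), b = c(30, 20, 40), c = c(45, 60, 49))
result <- sketch_filter(df,
  threshold = c(20, 35, 50),
  cols_to_compare = c("a", "b", "c"),
  comparators = "<"
)
print(result)
```

Here the single comparator "<" is recycled across all three columns, and the conditions are combined with AND, so a row failing any one condition is dropped.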
The input for the function may also be a list of data frames, either named or unnamed.
If the input is a simple unnamed list, the other parameters should be simple vectors (as for data frames). All the predicates are applied to every data frame in the list: this is useful for applying the same conditions to data frames that share the same structure but contain different data.
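Conceptually, this is the same as mapping one predicate over the list. The base R sketch below uses made-up data and `lapply` to show the idea; it is an assumption about the behaviour described here, not the function's internals.

```r
# Two data frames with the same structure but different data (made-up values)
df_list <- list(
  data.frame(Value = c(1, 5, 10)),
  data.frame(Value = c(7, 2))
)
# Apply the same predicate (Value > 4) to every element of the list
filtered_list <- lapply(df_list, function(df) df[df$Value > 4, , drop = FALSE])
print(filtered_list)
```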
It is also possible to filter different data frames with different sets of conditions. The other parameters can still be defined as simple vectors, which gives the same result as operating on an unnamed list, but the user can also define them as named lists containing vectors. For example:
example_df <- tibble::tibble(a = c(20, 30, 40), b = c(40, 50, 60), c = c("a", "b", "c"), d = c(3L, 4L, 5L))
example_list <- list(first = example_df, second = example_df, third = example_df)
print(example_list)
## $first
## # A tibble: 3 x 4
## a b c d
## <dbl> <dbl> <chr> <int>
## 1 20 40 a 3
## 2 30 50 b 4
## 3 40 60 c 5
##
## $second
## # A tibble: 3 x 4
## a b c d
## <dbl> <dbl> <chr> <int>
## 1 20 40 a 3
## 2 30 50 b 4
## 3 40 60 c 5
##
## $third
## # A tibble: 3 x 4
## a b c d
## <dbl> <dbl> <chr> <int>
## 1 20 40 a 3
## 2 30 50 b 4
## 3 40 60 c 5
filtered <- threshold_filter(example_list,
  threshold = list(first = c(20, 60), third = c(25)),
  cols_to_compare = list(first = c("a", "b"), third = c("a")),
  comparators = list(first = c(">", "<"), third = c(">="))
)
print(filtered)
## $first
## # A tibble: 1 x 4
## a b c d
## <dbl> <dbl> <chr> <int>
## 1 30 50 b 4
##
## $second
## # A tibble: 3 x 4
## a b c d
## <dbl> <dbl> <chr> <int>
## 1 20 40 a 3
## 2 30 50 b 4
## 3 40 60 c 5
##
## $third
## # A tibble: 2 x 4
## a b c d
## <dbl> <dbl> <chr> <int>
## 1 30 50 b 4
## 2 40 60 c 5
The call above roughly translates to:
Filter the element "first" in the list by checking that values in column "a" are greater than 20 AND values in column "b" are less than 60
Don't apply any filter to the element "second" (the data frame is returned as is)
Filter the element "third" by checking that values in column "a" are greater than or equal to 25
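The three steps above can be written out by hand in base R to check the expected result. This sketch uses `data.frame` and `subset` instead of tibble so that it runs without extra packages; it reproduces the described outcome and is not how the function is implemented.

```r
example_df <- data.frame(
  a = c(20, 30, 40),
  b = c(40, 50, 60),
  c = c("a", "b", "c"),
  d = c(3L, 4L, 5L)
)
example_list <- list(first = example_df, second = example_df, third = example_df)

manual <- example_list
# "first": a > 20 AND b < 60
manual$first <- subset(example_list$first, a > 20 & b < 60)
# "second": not named in the parameters, so it is left untouched
# "third": a >= 25
manual$third <- subset(example_list$third, a >= 25)
print(manual)
```

The row counts (1, 3 and 2) match the printed output of the filtered list shown earlier.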
It is also possible to use some parameters as vectors and some as lists: vectors will be recycled for every element filtered.
filtered <- threshold_filter(example_list, threshold = list(first = c(20, 60), third = c(25, 65)), cols_to_compare = c("a", "b"), comparators = list(first = c(">", "<"), third = c(">=", "<=")))
In this example, different thresholds and comparators are applied to the same columns in each filtered data frame.
Things the user should keep in mind are:
Names for the list parameters must be the same names in the input list
Only elements explicitly named in the list parameters will be filtered
Lengths of both vectors and lists must be consistent
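As an illustration of the first rule, the following sketch checks that every name used in a list parameter exists in the input list. The function `check_list_params` is hypothetical and is not part of the package; the actual validation performed by threshold_filter may differ.

```r
# Hypothetical pre-flight check for list parameters; illustration only.
check_list_params <- function(x, threshold, cols_to_compare, comparators) {
  for (param in list(threshold, cols_to_compare, comparators)) {
    if (is.list(param)) {
      # Names used in a list parameter must exist in the input list
      unknown <- setdiff(names(param), names(x))
      if (length(unknown) > 0) {
        stop("Names not found in input list: ", paste(unknown, collapse = ", "))
      }
    }
  }
  invisible(TRUE)
}

x <- list(first = data.frame(a = 1), second = data.frame(a = 2))
# Passes: "first" is a name in x, and plain vectors are not checked
ok <- check_list_params(x, list(first = 20), "a", ">")
# Fails: "fourth" is not a name in x, so the error is caught and FALSE returned
bad <- tryCatch(
  check_list_params(x, list(fourth = 1), "a", ">"),
  error = function(e) FALSE
)
```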
A data frame or a list of data frames
Other Analysis functions: CIS_grubbs(), comparison_matrix(), compute_abundance(), cumulative_count_union(), sample_statistics(), separate_quant_matrices(), top_integrations()
example_df <- tibble::tibble(
  a = c(20, 30, 40),
  b = c(40, 50, 60),
  c = c("a", "b", "c"),
  d = c(3L, 4L, 5L)
)
example_list <- list(
  first = example_df,
  second = example_df,
  third = example_df
)
filtered <- threshold_filter(example_list,
  threshold = list(
    first = c(20, 60),
    third = c(25)
  ),
  cols_to_compare = list(
    first = c("a", "b"),
    third = c("a")
  ),
  comparators = list(
    first = c(">", "<"),
    third = c(">=")
  )
)