top_perc | R Documentation |
The top_perc function selects the top percentage of data based on a specified trait and computes summary statistics.
It allows for grouping by additional columns and offers flexibility in the type of statistics calculated.
The function can also retain the selected data if needed.
top_perc(data, perc, trait, by = NULL, type = "mean_sd", keep_data = FALSE)
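data | A data frame
perc | Numeric vector of percentages for data selection, each between -1 and 1
trait | Character string specifying the selection column
by | Optional character vector of grouping columns (default NULL)
type | Character string specifying the type of summary statistics to compute (default "mean_sd")
keep_data | Logical flag; if TRUE, the selected data are returned along with the summary statistics (default FALSE)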
A list or data frame:
If keep_data is FALSE, a data frame with summary statistics.
If keep_data is TRUE, a list where each element is a list containing summary statistics (stat) and the selected top data (data).
The perc parameter accepts values between -1 and 1. Positive values select the top percentage, while negative values select the bottom percentage.
The function performs initial checks to ensure required arguments are provided and valid.
Grouping by additional columns (by) is optional and allows for more granular analysis.
The type parameter specifies the type of summary statistics to compute, with "mean_sd" as the default.
If keep_data is set to TRUE, the function will return both the summary statistics and the selected top data for each percentage.
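The sign convention for perc and the optional grouping can be illustrated with a minimal base R sketch. This is not the mintyr implementation: select_by_perc is a hypothetical helper, the rounding rule for the number of rows is an assumption, and ties are cut at a fixed row count, which may differ from what dplyr::top_frac() does.

```r
# Hypothetical helper (not part of mintyr): keep the top or bottom share of rows.
select_by_perc <- function(data, perc, trait) {
  stopifnot(is.data.frame(data), length(perc) == 1,
            perc != 0, abs(perc) <= 1, trait %in% names(data))
  # Assumption: the share is rounded to a whole number of rows, at least one
  n_keep <- max(1, round(abs(perc) * nrow(data)))
  # Positive perc sorts descending (top rows); negative sorts ascending (bottom rows)
  ord <- order(data[[trait]], decreasing = perc > 0)
  data[ord[seq_len(n_keep)], , drop = FALSE]
}

top10    <- select_by_perc(iris, 0.1, "Petal.Width")   # top 10% of rows
bottom20 <- select_by_perc(iris, -0.2, "Petal.Width")  # bottom 20% of rows

# Grouped variant: apply the same selection within each Species,
# then compute the mean and standard deviation of the selected rows
by_species <- lapply(split(iris, iris$Species), select_by_perc,
                     perc = 0.1, trait = "Petal.Width")
stats <- do.call(rbind, lapply(names(by_species), function(g) {
  x <- by_species[[g]]$Petal.Width
  data.frame(Species = g, n = length(x), mean = mean(x), sd = sd(x))
}))
stats
```

Splitting before selecting means the percentage is applied within each group rather than to the whole data set, which is the behaviour the by argument is described as providing.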
rstatix::get_summary_stats() | Statistical summary computation
dplyr::top_frac() | Percentage-based data selection
# Example 1: Basic usage with single trait
# This example selects the top 10% of observations based on Petal.Width
# keep_data=TRUE returns both summary statistics and the filtered data
top_perc(iris,
         perc = 0.1,                # Select top 10%
         trait = c("Petal.Width"),  # Column to analyze
         keep_data = TRUE)          # Return both stats and filtered data
# Example 2: Using grouping with 'by' parameter
# This example performs the same selection but separately for each Species
# keep_data defaults to FALSE, so only summary statistics are returned
top_perc(iris,
         perc = 0.1,                # Select top 10%
         trait = c("Petal.Width"),  # Column to analyze
         by = "Species")            # Group by Species
# Example 3: Complex example with multiple percentages and grouping variables
# Reshape data from wide to long format for Sepal.Length and Sepal.Width
iris |>
  tidyr::pivot_longer(1:2,
                      names_to = "names",
                      values_to = "values") |>
  mintyr::top_perc(
    perc = c(0.1, -0.2),         # Top 10% and bottom 20%
    trait = "values",
    by = c("Species", "names"),
    type = "mean_sd")