Find the "best" record within subgroups of a data frame.

Description

Finding the extreme record for each group within a dataset is a routine but surprisingly awkward task in R and SQL. This function provides an easy interface to that functionality, using either R (fast for small data frames) or SQL (fastest for large data).

Usage

bestBy(df, by, best, clmns=names(df), inverse=FALSE, sql=FALSE)

Arguments

df

a data frame.

by

the factor (or name of a factor in df) used to determine the grouping.

clmns

the columns to include in the output.

best

the column to sort on (both globally and for each sub/group)

inverse

whether to reverse the sorting order of the sort column specified by 'best'.

sql

whether or not to use SQLite to perform the operation.

Value

A data frame of 'best' records from each factor level

Author(s)

David Schruth

See Also

groupBy

Examples

blast.results <- data.frame(score=c(1,2,34,4,5,3,23), 
                            query=c('z','x','y','z','x','y','z'), 
                            target=c('a','b','c','d','e','f','g')
                            )
best.hits.R <- bestBy(blast.results, by='query', best='score', inverse=TRUE)
best.hits.R
## or using SQLite
best.hits.sql <- bestBy(blast.results, by='query', best='score', inverse=TRUE, sql=TRUE)
best.hits.sql
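
For comparison, the same kind of per-group selection can be sketched in base R without the package. This is only an illustration, not the package's implementation: it assumes that inverse=TRUE means the highest value of 'best' wins, and the name best.hits.base is made up for this example.

```r
# Illustrative base-R sketch; NOT the bestBy implementation.
blast.results <- data.frame(score  = c(1, 2, 34, 4, 5, 3, 23),
                            query  = c('z', 'x', 'y', 'z', 'x', 'y', 'z'),
                            target = c('a', 'b', 'c', 'd', 'e', 'f', 'g'))

# Sort all rows by score, highest first (assumed meaning of inverse=TRUE).
sorted <- blast.results[order(blast.results$score, decreasing = TRUE), ]

# Keep only the first (highest-scoring) row encountered for each query.
best.hits.base <- sorted[!duplicated(sorted$query), ]
best.hits.base
```

Sorting once and then dropping repeated group keys keeps the sketch short; for ties in 'score', whichever tied row sorts first is the one kept.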
