R/filter_by_keyword.R

Defines functions filter_by_keyword_ filter_by_keyword

Documented in filter_by_keyword

#' Filter genes by keyword
#'
#' Filter the HGNC data set by a keyword (or a regular expression), searching
#' the columns that contain gene names or symbols. By default, the search covers
#' `symbol`, `name`, `alias_symbol`, `alias_name`, `prev_symbol` and
#' `prev_name`. List-columns are searched element by element, and a gene entry
#' is returned if at least one of its strings matches the `keyword`.
#'
#' @param tbl A tibble containing the HGNC data set, typically obtained with
#'   `import_hgnc_dataset()`.
#' @param keyword One or more keywords or regular expressions to use as search
#'   criteria.
#' @param cols A character vector with the names of the columns to search.
#'
#' @return A [tibble][tibble::tibble-package] of the HGNC data set containing
#'   only the observations that match the `keyword`.
#'
#' @examples
#' \dontrun{
#' # Start by retrieving the HGNC data set
#' hgnc_tbl <- import_hgnc_dataset()
#'
#' # Search for entries containing "TP53" in the HGNC data set
#' hgnc_tbl |>
#'   filter_by_keyword('TP53') |>
#'   dplyr::select(1:4)
#'
#' # The same as above but restrict the search to the `symbol` column
#' hgnc_tbl |>
#'   filter_by_keyword('TP53', cols = 'symbol') |>
#'   dplyr::select(1:4)
#'
#' # Match "TP53" exactly in the `symbol` column
#' hgnc_tbl |>
#'   filter_by_keyword('^TP53$', cols = 'symbol') |>
#'   dplyr::select(1:4)
#'
#' # `filter_by_keyword()` is vectorised over `keyword`; the matches for each
#' # keyword are row-bound in the order the keywords are given
#' hgnc_tbl |>
#'   filter_by_keyword(c('^TP53$', '^PIK3CA$'), cols = 'symbol') |>
#'   dplyr::select(1:4)
#' }
#'
#' @export
filter_by_keyword <-
  function(tbl,
           keyword,
           cols = c('symbol',
                    'name',
                    'alias_symbol',
                    'alias_name',
                    'prev_symbol',
                    'prev_name')) {

    # Apply the single-keyword filter to each keyword and row-bind the results.
    purrr::map_dfr(keyword, .f = filter_by_keyword_, tbl = tbl, cols = cols)
  }


# Internal worker (not exported): filters `tbl` by a single `keyword`.
filter_by_keyword_ <-
  function(tbl,
           keyword,
           cols = c('symbol',
                    'name',
                    'alias_symbol',
                    'alias_name',
                    'prev_symbol',
                    'prev_name')) {

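    # Keep rows where at least one of the selected columns matches `keyword`.
    # `all_of()` makes explicit that `cols` is an external character vector.
    dplyr::filter(tbl,
                  dplyr::if_any(.cols = dplyr::all_of(cols),
                                str_detect2, pattern = keyword))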
  }
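
The matching rule described in the documentation above (search inside list-columns, keep a row if at least one string matches) can be illustrated with a minimal base R sketch. The source of the internal helper `str_detect2()` is not shown in this file, so `detect_any()` and `filter_rows_any()` below are hypothetical stand-ins written only for illustration, not the package's implementation, and the gene symbols in the toy table are made up.

```r
# Hypothetical stand-in for the unseen `str_detect2()` helper: returns TRUE
# for each element of `x` (a character vector or a list-column) that holds
# at least one string matching `pattern`. Missing values never match.
detect_any <- function(x, pattern) {
  vapply(
    x,
    function(el) any(grepl(pattern, el), na.rm = TRUE),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Hypothetical row filter: keep rows where any of the columns in `cols`
# matches `keyword` (an "if any" across columns).
filter_rows_any <- function(tbl, keyword, cols) {
  hits <- lapply(tbl[cols], detect_any, pattern = keyword)
  tbl[Reduce(`|`, hits), , drop = FALSE]
}

# Toy table with made-up symbols; `alias_symbol` is a list-column.
genes <- data.frame(symbol = c("AAA1", "BBB2", "CCC3"),
                    stringsAsFactors = FALSE)
genes$alias_symbol <- list(c("XX1", "AAA1L"), "YY2", NA_character_)

search_cols <- c("symbol", "alias_symbol")

# Row 1 matches through both `symbol` and one of its aliases.
filter_rows_any(genes, "AAA1", search_cols)$symbol
# → "AAA1"

# Row 2 matches only through a string inside the list-column.
filter_rows_any(genes, "YY", search_cols)$symbol
# → "BBB2"

# Anchoring the pattern requires an exact match; no alias equals "AAA1".
nrow(filter_rows_any(genes, "^AAA1$", "alias_symbol"))
# → 0
```

The sketch uses `grepl()` and `vapply()` only to stay dependency-free; the package itself goes through `dplyr::filter()` and `dplyr::if_any()` as shown in the listing above.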


hgnc documentation built on June 18, 2025, 9:09 a.m.