knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%", warning = FALSE, message = FALSE )
The goal of funspotr (R function spotter) is to make it easy to identify which R functions and packages are used in files and projects. It was initially written to create reference tables of the functions and packages used in a few popular github repositories[^prior-posts].
There are roughly three types of functions in funspotr:
list_files_*()
: that identify files in a repository or related locationspot_*()
: that identify functions or packages in files[^prior-posts]: Prior posts (some of which used a now deprecated API):
- Identifying R Functions & Packages Used in GitHub Repos (funspotr part 1)
- Identifying R Functions & Packages in Github Gists (funspotr part 2)
- Network Plots of Code Collections (funspotr part 3)
funspotr is set-up for parsing R, Rmarkdown or Quarto files. If you want to parse a Jupyter notebook you should first convert it to an appropriate file type. If you pass in a file type that is not recognized (e.g. a .txt file) funspotr will attempt to parse it as if it is a .R script.
funspotr is primarily designed for identifying the functions / packages in self-contained files or collections of self-contained files (e.g. a blogdown project^[Rather than, for example, targets workflows. Also, in some cases funspotr may not identify every function and/or package in a file (see [Limitations, problems, musings] or read the source code for details).]). Though see [Package dependencies in another file] for examples of using it in other contexts.
Install the latest stable version of funspotr from CRAN with:
install.packages("funspotr")
You can install the development version of funspotr from GitHub with:
# install.packages("devtools") devtools::install_github("brshallo/funspotr")
funspotr can be used to create reference tables of the functions and packages used in R projects.
The primary function in funspotr is spot_funs()
which returns a dataframe showing the functions and associated packages used in a file.
library(funspotr) file_lines <- " library(dplyr) require(tidyr) as_tibble(mpg) %>% mutate(class = as.character(class)) %>% group_by(class) %>% nest() %>% mutate(stats = purrr::map(data, ~lm(cty ~ hwy, data = .x))) made_up_fun() " file_output <- tempfile(fileext = ".R") writeLines(file_lines, file_output) spot_funs(file_path = file_output)
funs
: functions in filepkgs
: best guess as to the package the functions came from spot_funs()
reference page for details.] funspotr has a few list_files_*()
functions that return a dataframe of relative_paths
and absolute_paths
of all the R, Rmarkdown, or quarto files in a specified location (currently: github repo, gists, or local). These can be combined with a variant of spot_funs()
that maps the function across each file path found, spot_funs_files()
:
library(dplyr) # repo for an old presentation I gave gh_ex <- list_files_github_repo( repo = "brshallo/feat-eng-lags-presentation", branch = "main") %>% spot_funs_files() gh_ex
relative_paths
: relative filepathabsolute_paths
: absolute filepath (in this case URL to raw file on github)spotted
: purrr::safely()
style list-column of results^[list-column output where each item is a list containing result
and error
.] from mapping spot_funs()
across absolute_paths
. These results may then be unnested with the helper funspotr::unnest_results()
to provide a table of functions and packages by filepath. This can be manipulated like any other dataframe -- say we want to filter to only those files where here, readr or rsample packages are used.
gh_ex %>% unnest_results() %>% filter(pkgs %in% c("here", "readr", "rsample"))
The outputs from funspotr::unnest_results()
can also be passed into funspotr::network_plot()
to build a network visualization of the connections between functions/packages and files^[Took inspiration from plot()
method in cranly.].
You might only want to parse certain file types or a subset of the files in a repo.
preview_files <- list_files_github_repo( repo = "brshallo/feat-eng-lags-presentation", branch = "main") preview_files
Say we only want to parse the "types-of-splits.R" and "Rmd-to-R.R" files.
preview_files %>% filter(stringr::str_detect(relative_paths, "types-of-splits|Rmd-to-R")) %>% spot_funs_files() %>% unnest_results()
Note that if you have a lot of files in a repo you may need to set-up sleep periods or clone the repo locally and then parse the files from there so as to stay within the limits of github API hits.
Functions created in the file as well as functions from unavailable packages (or packages that don't exist) will output as pkgs = "(unknown)"
.
file_lines_missing_pkgs <- " library(dplyr) as_tibble(mpg) hello_world <- function() print('hello world') madeuppkg::made_up_fun() hello_world() " missing_pkgs_ex <- tempfile(fileext = ".R") writeLines(file_lines_missing_pkgs, missing_pkgs_ex) spot_funs(file_path = missing_pkgs_ex)
To spot which package a function is from you must have the package installed locally. Hence for files on others' github repos or that you created on a different machine, it is a good idea to start with funspotr::check_pkgs_availability()
to see which packages you are missing and install the missing packages locally. If you don't want to edit your global library you may want to use renv or other environment management tools.
funspotr has an internal helper funspotr::install_missing_pkgs()
for installing missing packages:
spot_pkgs(file_output) %>% check_pkgs_availability() %>% funspotr::install_missing_pkgs()
Alternatively, you may want to clone the repository locally and then use renv::dependencies()
and only then start using funspotr^[renv is a more robust approach to finding and installing dependencies -- particularly in cases where you are missing many dependencies or don't want to alter the packages in your global library.].
spot_funs()
is currently set-up for self-contained files. But spot_funs_custom()
allows the user to explicitly specify pkgs
where functions may come from. This is useful in cases where the packages loaded are not in the same location as the file_path
(e.g. they are loaded via source()
, or a DESCRIPTION file, or some other workflow). For example, below is a made-up example where the library()
calls are made in a separate file and source()
'd in.
# file where packages are loaded file_libs <- "library(dplyr) library(lubridate)" file_libs_output <- tempfile(fileext = ".R") writeLines(file_libs, file_libs_output) # File of interest where things happen file_run <- glue::glue( "source('{ file_libs_output }') tibble::tibble(days_from_today = 0:10) %>% mutate(date = today() + days(days_from_today)) ", file_libs_output = stringr::str_replace_all(file_libs_output, "\\\\", "/") ) file_run_output <- tempfile(fileext = ".R") writeLines(file_run, file_run_output) # Identify packages using both files and then pass in explicitly to `spot_funs_custom()` pkgs <- c(spot_pkgs(file_libs_output), spot_pkgs(file_run_output, show_explicit_funs = TRUE)) spot_funs_custom( pkgs = pkgs, file_path = file_run_output)
Also see funspotr::spot_pkgs_from_description()
.
Passing in show_each_use = TRUE
to ...
in spot_funs()
or spot_funs_files()
will return all instances of a function call rather than just once for each file.
Compared to the initial example, mutate()
now shows-up at both rows 4 and 8:
spot_funs(file_path = file_output, show_each_use = TRUE)
To automatically have your packages used as the tags for a blog post you can add an inline function funspotr::spot_tags()
to a bullet in the tags
or categories
argument of your YAML header. For example:
cat(htmltools::includeText("inst/ex-rmd-tags.Rmd"))
spot_funs()
worksfunspotr mimics the search space of each file prior to identifying pkgs
/funs
. At a high-level...
pkg::fun()
) are loaded individually via import and are loaded last (putting them at the top of the search space)^[This heuristic is imperfect and means that a file with "library(dplyr); select(); MASS::select()" would view both select()
calls as coming from {MASS} -- when what it should do is view the first was as coming from {dplyr} and the second from {MASS}.].(steps 1 and 2 needed so that step 4 has the best chance of identifying the package a function comes from in the file.)
utils::getParseData()
and filter to just functionsutils::find()
to identify associated packageExplainer slide from Rstudio Conf 2022 presentation:
np.*
), all of the stuff with recreating the search space would likely be unnecessary and you could more easily just identify packages/functions by parsing the text.].mean
in this example: lapply(x, mean)
. Similarly it will not identify functions within switch()
. See #13.knitr::read_chunk()
and knitr::purl()
in a file passed to funspotr will also frequently cause an error in parsing. See knitr#1753 & knitr#1938spot_funs()
is primarily for cases where package dependencies are loaded in the same file that they are used in^[i.e. in interactive R scripts or Rmd or qmd documents where you use library()
or related calls within the script.]. Scripts that are not self-contained typically should have the pkgs
argument provided explicitly via spot_funs_custom()
.aes()
function from ggplot.].spot_funs()
through other folder structures not yet mentioned.renv::dependencies()
or a parsing based approach. The simple regex's I use have a variety of problems^[e.g. in this case lines <- "library(pkg)"
the pkg
would show-up as a dependency despite just being part of a quote rather than actually loaded. See #14 for disucssion of other approaches.]. R CMD check
does function parsing. funspotr's current approach is comparatively slow and uses imperfect heuristics.+
list_files*()
)list_files_github_repo()
it may make sense to instead clone the repo locally and then run list_files_wd()
from the repo prior to running spot_funs_files()
as this will limit the number of API hits to github.Sys.sleep()
, ...).Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.