Store your data

has_tokens <- nzchar(Sys.getenv("GITHUB_PAT")) && nzchar(Sys.getenv("GITLAB_PAT_PUBLIC"))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4,
  eval = has_tokens
)

Set connections to hosts.

The example workflow uses public GitHub and GitLab, but you will likely work with internal Git platforms, where you need to define the host parameter. See the vignette("set_hosts") article for details.

library(GitStats)

git_stats <- create_gitstats() |>
  set_github_host(
    orgs = "r-world-devs",
    token = Sys.getenv("GITHUB_PAT")
  ) |>
  set_gitlab_host(
    orgs = c("mbtests"),
    token = Sys.getenv("GITLAB_PAT_PUBLIC")
  )

Optionally speed up processing.

git_stats |>
  set_parallel(10L)

As the scanning scope was set to organizations (the orgs parameter in set_*_host()), GitStats will pull all repositories from these organizations.

repos <- get_repos(git_stats, progress = FALSE)
dplyr::glimpse(repos)

You can always go for the lighter version of get_repos(), i.e. get_repos_urls(), which returns a vector of URLs instead of the whole table.

repos_urls <- get_repos_urls(git_stats)
dplyr::glimpse(repos_urls)

Local Storage

After pulling, the data is by default stored in the GitStats object.

commits <- git_stats |>
  get_commits(
    since = "2025-06-01",
    until = "2025-06-14",
    progress = FALSE
  )
git_stats
dplyr::glimpse(commits)

SQLite Storage

For local saving, however, we recommend SQLite storage. You can set it up with the set_sqlite_storage() function. All data pulled with get_*() functions will then be stored in the SQLite database and retrieved from there when you run the function again.

commits <- git_stats |>
  set_sqlite_storage("my_local_db") |>
  get_commits(
    since = "2025-06-01",
    until = "2025-06-14",
    progress = FALSE
  )
dplyr::glimpse(commits)
git_stats

The data is therefore no longer dependent on the GitStats object but on the local database, so you can even create a new GitStats object, connect it to the same database, and the data will be there.

new_git_stats <- create_gitstats() |>
  set_github_host(
    orgs = "r-world-devs",
    token = Sys.getenv("GITHUB_PAT")
  ) |>
  set_gitlab_host(
    orgs = c("mbtests"),
    token = Sys.getenv("GITLAB_PAT_PUBLIC")
  ) |>
  set_sqlite_storage("my_local_db")

commits <- new_git_stats |>
  get_commits(
    since = "2025-06-01",
    until = "2025-06-14",
    verbose = TRUE
  )
dplyr::glimpse(commits)

The caching feature is turned on by default. You may switch it off:

commits <- new_git_stats |>
  get_commits(
    since = "2025-06-01",
    until = "2025-06-14",
    verbose = TRUE,
    cache = FALSE,
    progress = FALSE
  )
dplyr::glimpse(commits)

Incremental pulling

When you pull data with get_*() functions, it is stored in the local database. If you run the same function again, GitStats checks whether data for the same parameters already exists and pulls only the missing part. This way, you can keep your database up to date without pulling all the data again.

commits <- new_git_stats |>
  get_commits(
    since = "2025-06-01",
    until = "2025-06-30",
    verbose = TRUE,
    progress = FALSE
  )
dplyr::glimpse(commits)

Remove Storage

Remove the storage if you no longer need it.

new_git_stats |>
  remove_sqlite_storage()

Postgres Storage

For more permanent storage, you can set up a connection to your database with the set_postgres_storage() function. All data pulled with get_*() functions will then be stored in the database and retrieved from there when you run the function again.
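A minimal sketch of such a setup follows. The connection parameters shown here (dbname, host, port, user, password) are assumptions based on typical Postgres connections and may differ from the actual signature — check ?set_postgres_storage before use.

new_git_stats <- create_gitstats() |>
  set_github_host(
    orgs = "r-world-devs",
    token = Sys.getenv("GITHUB_PAT")
  ) |>
  set_postgres_storage(
    dbname = "my_db",                     # hypothetical parameter names --
    host = "localhost",                   # verify against the function's
    port = 5432,                          # documentation
    user = Sys.getenv("DB_USER"),
    password = Sys.getenv("DB_PASSWORD")
  )

As with SQLite storage, subsequent get_*() calls on this object would read from and write to the shared database, so several users or sessions can reuse the same pulled data.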




GitStats documentation built on April 23, 2026, 9:10 a.m.