This file provides guidance to AI assistants working with this Framework project. You can freely edit sections without regeneration markers; ai_regenerate() does not overwrite them.
This project uses Framework for reproducible data analysis. Every notebook and script
MUST begin with scaffold(), which initializes the environment.
When you call scaffold(), it automatically:
- Loads environment variables from .env (database credentials, API keys)
- Attaches packages marked auto_attach: true (see Packages section below)
- Loads all functions from the functions/ directory, so they are globally available

DO NOT call library() for packages listed in the auto-attach section below.
They are already loaded by scaffold(). Calling library() again wastes time and clutters output.
DO NOT use source() to load functions from the functions/ directory.
They are auto-loaded by scaffold(). Just call them directly.
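To audit an existing script for the two mistakes above, a minimal base-R sketch like the following can flag library() calls for auto-attached packages and source() calls that point into functions/. The helper find_redundant_calls() and its auto_attached argument are hypothetical and not part of Framework. Fill auto_attached from the auto-attach list in this file. The check is regex-based, so it only catches literal calls and will miss forms such as library(pkg_name, character.only = TRUE).

```r
# Hypothetical helper (not part of Framework): flag library() calls for
# auto-attached packages and source() calls into functions/.
# Regex-based, so it only catches simple literal calls such as
# library(dplyr) or source("functions/clean.R").
find_redundant_calls <- function(code_lines, auto_attached) {
  issues <- character(0)
  lib_pattern <- "library\\([[:space:]]*[\"']?([A-Za-z0-9.]+)"
  src_pattern <- "source\\([[:space:]]*[\"'](\\./)?functions/"
  for (i in seq_along(code_lines)) {
    line <- code_lines[[i]]
    # regexec() returns the full match plus the captured package name
    lib <- regmatches(line, regexec(lib_pattern, line))[[1]]
    if (length(lib) == 2 && lib[2] %in% auto_attached) {
      issues <- c(issues, sprintf("line %d: library(%s) is redundant after scaffold()", i, lib[2]))
    }
    if (grepl(src_pattern, line)) {
      issues <- c(issues, sprintf("line %d: source() into functions/ is redundant after scaffold()", i))
    }
  }
  issues
}

# Example: in practice, auto_attached would mirror the auto-attach list in this file
script <- c("library(dplyr)", "source('functions/helpers.R')", "library(lme4)")
find_redundant_calls(script, auto_attached = c("dplyr", "ggplot2"))  # flags lines 1 and 2
```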
These packages are loaded automatically by scaffold(). NEVER use library() for them:
Configure packages in settings.yml and run ai_regenerate() to update this section.
These are installed but not auto-loaded. Use library() only when needed.
ALWAYS use Framework's package management:
# Add a CRAN package (will be installed on next scaffold)
package_add("janitor")
# Add and auto-attach
package_add("forcats", auto_attach = TRUE)
# Add from GitHub
package_add("tidyverse/dplyr@main")
DO NOT use install.packages() directly - it bypasses Framework's tracking.
CRITICAL: All data operations MUST go through Framework functions. This ensures integrity tracking, encryption support, and reproducibility.
ALWAYS use data_read():
# From data catalog (preferred)
survey <- data_read("inputs.raw.survey")
# Direct path
customers <- data_read("inputs/raw/customers.csv")
NEVER use these functions:
- ❌ read.csv() - no tracking, no encryption support
- ❌ read_csv() - no tracking, no encryption support
- ❌ readRDS() - no tracking, no encryption support
- ❌ read_excel() - no tracking, no encryption support
If you see code using these functions, replace it with data_read().
ALWAYS use data_save():
# Save to intermediate (tracked, integrity-checked)
data_save(cleaned_df, "inputs/intermediate/cleaned.csv")
# Save to final (locked, prevents accidental overwrites)
data_save(final_df, "inputs/final/analysis_ready.csv", locked = TRUE)
NEVER use these functions:
- ❌ write.csv() - no tracking
- ❌ write_csv() - no tracking
- ❌ saveRDS() - no tracking
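Before replacing these calls with data_read() and data_save(), it can help to list where they occur. The base-R sketch below does this with a single regular expression. The helper find_untracked_io() and the forbidden_io vector are hypothetical and simply mirror the two lists above. The sketch catches bare and namespaced calls such as readr::read_csv(), but not calls made indirectly, for example through do.call() or an alias.

```r
# Functions this project forbids for file I/O (copied from the lists above)
forbidden_io <- c("read.csv", "read_csv", "readRDS", "read_excel",
                  "write.csv", "write_csv", "saveRDS")

# Hypothetical helper (not part of Framework): return the lines that call a
# forbidden function, so they can be rewritten with data_read() / data_save().
find_untracked_io <- function(code_lines, banned = forbidden_io) {
  # Make "." literal so "read.csv" does not also match e.g. "readXcsv"
  escaped <- gsub(".", "[.]", banned, fixed = TRUE)
  # The leading group stops "my_read_csv(" from matching, while still
  # allowing namespaced calls such as "readr::read_csv("
  pattern <- paste0("(^|[^A-Za-z0-9._])(", paste(escaped, collapse = "|"),
                    ")[[:space:]]*\\(")
  hits <- which(grepl(pattern, code_lines))
  data.frame(line = hits, code = code_lines[hits], stringsAsFactors = FALSE)
}
```

Each reported line is a candidate for rewriting with data_read() or data_save(). Review it before changing it, because a regex cannot tell a real call from the same text inside a string or comment.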
| Purpose | Directory | Notes |
|---------|-----------|---------|
| Raw data (immutable) | inputs/raw/ | Source files, never modify |
| Cleaned data | inputs/intermediate/ | After cleaning, before analysis |
| Analysis-ready | inputs/final/ | Curated datasets for analysis |
| Output tables | outputs/tables/ | CSV/Excel exports |
| Output figures | outputs/figures/ | Saved plots |
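If a checkout is missing any of these folders, a short base-R sketch can recreate the layout from the table. The project_dirs vector is copied by hand from the table rather than read from Framework. This file does not say whether scaffold() creates these folders, so treat ensure_project_dirs() (a hypothetical name) as a manual fallback rather than Framework's setup step.

```r
# Directory layout copied from the table above (not read from Framework)
project_dirs <- c("inputs/raw", "inputs/intermediate", "inputs/final",
                  "outputs/tables", "outputs/figures")

# Hypothetical helper: create any missing folders under `root`
ensure_project_dirs <- function(root = ".") {
  paths <- file.path(root, project_dirs)
  # recursive = TRUE also creates the parent inputs/ and outputs/ folders;
  # showWarnings = FALSE keeps existing folders from producing warnings
  for (p in paths) dir.create(p, recursive = TRUE, showWarnings = FALSE)
  invisible(dir.exists(paths))
}
```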
data_read(): Read data from a catalog entry or a file path. Supports CSV, RDS, Excel, Stata, SPSS, and SAS.
df <- data_read("inputs.raw.survey") # From catalog
df <- data_read("inputs/raw/file.csv") # Direct path
data_save(): Save data with integrity tracking.
data_save(df, "inputs/intermediate/cleaned.csv")
data_save(df, "inputs/final/analysis_ready.csv", locked = TRUE)
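This file does not say how data_save() implements integrity tracking. One common approach is to record a file hash at save time and compare it on later reads. The base-R sketch below illustrates that idea with tools::md5sum(). The helpers record_hash() and verify_hash() are hypothetical, and this is not Framework's mechanism.

```r
# Hypothetical illustration of hash-based integrity checking (not Framework's code)
record_hash <- function(path) {
  # tools::md5sum() returns a character vector named by file path; drop the name
  unname(tools::md5sum(path))
}

verify_hash <- function(path, expected) {
  # TRUE only if the file still exists and its contents are unchanged
  file.exists(path) && identical(record_hash(path), expected)
}
```

In this scheme, record_hash() runs right after a file is written and its result is stored. verify_hash() then runs before the file is read again, and any edit to the file's bytes changes the hash.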
cache_fetch(): Compute once and cache the result. Use it for expensive operations.
model <- cache_fetch("my_model", {
# This only runs if cache doesn't exist or is expired
train_expensive_model(data)
})
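The compute-once behaviour relies on R's lazy evaluation: a block passed as a function argument runs only if the function actually uses it. The in-memory sketch below uses that to skip the expensive block on a cache hit. memo_fetch() and its expire_days handling are hypothetical stand-ins. This file does not describe how the real cache_fetch() stores or expires entries.

```r
# In-memory store for this sketch only; entries vanish when the R session ends
.memo_store <- new.env(parent = emptyenv())

# Hypothetical stand-in for cache_fetch(): `expr` is a lazy argument, so the
# expensive code inside it runs only when `value <- expr` forces it
memo_fetch <- function(key, expr, expire_days = Inf) {
  entry <- .memo_store[[key]]  # NULL if the key has never been stored
  if (!is.null(entry)) {
    age_days <- as.numeric(difftime(Sys.time(), entry$time, units = "days"))
    if (age_days < expire_days) return(entry$value)  # hit: expr is never evaluated
  }
  value <- expr  # miss or expired: forcing the argument runs the block
  assign(key, list(value = value, time = Sys.time()), envir = .memo_store)
  value
}

runs <- 0
slow_square <- function(x) { runs <<- runs + 1; x^2 }
memo_fetch("sq", slow_square(4))  # computes and stores 16
memo_fetch("sq", slow_square(4))  # served from the store
runs                              # 1: slow_square() ran only once
```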
cache() / cache_get(): Manual cache write and read.
cache("processed_data", large_dataframe) # Write
df <- cache_get("processed_data") # Read (NULL if missing)
result_save(): Save analysis results with metadata.
result_save("regression_model", model, type = "model")
result_save("summary_stats", stats_df, type = "table")
save_table(): Quick export to outputs/tables/.
save_table(summary_df, "quarterly_summary")
save_table(report_df, "annual_report", format = "xlsx")
query_get(): Execute a SQL query and return the results.
users <- query_get("SELECT * FROM users WHERE active = 1", "main_db")
make_notebook() / make_script(): Create new files from templates.
make_notebook("01-data-cleaning") # Creates notebooks/01-data-cleaning.qmd
make_script("data-processing") # Creates scripts/data-processing.R
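Typical workflow:
- Read inputs with data_read()
- Save processed data with data_save()
- Export results with result_save() or save_table()

# Cache model fitting (re-runs only if the cache is missing or expired)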
model <- cache_fetch("fitted_model", {
fit_complex_model(training_data)
}, expire_days = 7)
- Never modify files in inputs/raw/

Add your project-specific notes, conventions, and documentation here.
This section is never modified by ai_regenerate().