This file provides guidance to AI assistants working with this Framework project. Edit the sections without regeneration markers freely - they won't be overwritten.
This project uses Framework for reproducible data analysis. Every notebook and script
MUST begin with scaffold(), which initializes the environment.
When you call scaffold(), it automatically:

- Loads environment variables from .env (database credentials, API keys)
- Attaches packages marked auto_attach: true (see Packages section below)
- Sources every file in the functions/ directory - they are globally available

DO NOT call library() for packages listed in the auto-attach section below.
They are already loaded by scaffold(). Calling library() again wastes time and clutters output.

DO NOT use source() to load functions from the functions/ directory.
They are auto-loaded by scaffold(). Just call them directly.
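A minimal notebook or script header might therefore look like this (a sketch only - the package name framework is an assumption; use whatever name this project's Framework package actually exports):

```r
# First chunk of every notebook/script: initialize the environment.
# The package name 'framework' is assumed - adjust to the project's setup.
library(framework)
scaffold()  # loads .env, attaches auto_attach packages, sources functions/
```

Everything after this line can rely on credentials, auto-attached packages, and functions/ helpers being available.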
These packages are loaded automatically by scaffold(). NEVER use library() for them:
Configure packages in settings.yml and run ai_regenerate() to update this section.
These are installed but not auto-loaded. Use library() only when needed.
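For example, if a package such as janitor (added later in this guide) is installed but not auto-attached, load it explicitly at the point of use:

```r
# janitor is installed but not auto-attached, so load it on demand
library(janitor)

df <- data.frame(`First Name` = c("Ada", "Grace"), check.names = FALSE)
df <- clean_names(df)  # renames the column to snake_case: first_name
```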
ALWAYS use Framework's package management:
```r
# Add a CRAN package (will be installed on next scaffold)
package_add("janitor")

# Add and auto-attach
package_add("forcats", auto_attach = TRUE)

# Add from GitHub
package_add("tidyverse/dplyr@main")
```
DO NOT use install.packages() directly - it bypasses Framework's tracking.
CRITICAL: All data operations MUST go through Framework functions. This ensures integrity tracking, encryption support, and reproducibility.
ALWAYS use data_read():
```r
# From data catalog (preferred)
survey <- data_read("inputs.raw.survey")

# Direct path
customers <- data_read("inputs/private/raw/customers.csv")
```
NEVER use these functions:
- ❌ read.csv() - no tracking, no encryption support
- ❌ read_csv() - no tracking, no encryption support
- ❌ readRDS() - no tracking, no encryption support
- ❌ read_excel() - no tracking, no encryption support
If you see code using these functions, replace it with data_read().
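A sketch of such a replacement (the path is illustrative):

```r
# Before: untracked read, bypasses integrity checks and encryption
# survey <- read.csv("inputs/private/raw/survey.csv")

# After: the same read through Framework's tracked reader
survey <- data_read("inputs/private/raw/survey.csv")
```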
ALWAYS use data_save():
```r
# Save to intermediate (tracked, integrity-checked)
data_save(cleaned_df, "inputs/private/intermediate/cleaned.csv")

# Save to public final (de-identified only!)
data_save(final_df, "inputs/public/final/analysis_ready.csv", locked = TRUE)
```
NEVER use these functions:
- ❌ write.csv() - no tracking
- ❌ write_csv() - no tracking
- ❌ saveRDS() - no tracking
| Purpose | Directory | Notes |
|---------|-----------|-------|
| Private raw data | inputs/private/raw/ | PII/PHI, never commit |
| Public raw data | inputs/public/raw/ | De-identified source files |
| Private intermediate | inputs/private/intermediate/ | Cleaned data with PII |
| Public intermediate | inputs/public/intermediate/ | De-identified cleaned data |
| Private final | inputs/private/final/ | Analysis-ready with PII |
| Public final | inputs/public/final/ | Safe to share |
| Private outputs | outputs/private/ | Reports with PII |
| Public outputs | outputs/public/ | Shareable artifacts |
Read data from catalog or file path. Supports CSV, RDS, Excel, Stata, SPSS, SAS.
```r
df <- data_read("inputs.raw.survey")    # From catalog
df <- data_read("inputs/raw/file.csv")  # Direct path
```
Save data with integrity tracking.
```r
data_save(df, "inputs/intermediate/cleaned.csv")
data_save(df, "inputs/final/analysis_ready.csv", locked = TRUE)
```
Compute once, cache result. Use for expensive operations.
```r
model <- cache_fetch("my_model", {
  # This only runs if the cache doesn't exist or is expired
  train_expensive_model(data)
})
```
Manual cache read/write.
```r
cache("processed_data", large_dataframe)  # Write
df <- cache_get("processed_data")         # Read (NULL if missing)
```
Save analysis results with metadata.
```r
result_save("regression_model", model, type = "model")
result_save("summary_stats", stats_df, type = "table")
```
Quick export to outputs/tables/.
```r
save_table(summary_df, "quarterly_summary")
save_table(report_df, "annual_report", format = "xlsx")
```
Execute SQL and return results.
```r
users <- query_get("SELECT * FROM users WHERE active = 1", "main_db")
```
Create new files from templates.
```r
make_notebook("01-data-cleaning")  # Creates notebooks/01-data-cleaning.qmd
make_script("data-processing")     # Creates scripts/data-processing.R
```
This is a privacy-sensitive project. Critical rules:
- NEVER commit files under the inputs/private/ or outputs/private/ directories - they contain PII/PHI
- Keep all identifiable data in private/ subdirectories; only de-identified data belongs in public/ directories
- Use data_save(..., private = TRUE) for sensitive outputs
- Run framework check:sensitive before commits to scan for data leaks

Data moves from private to public as it is de-identified:

```
Raw PII data -> inputs/private/raw/
    |
    v (clean, de-identify)
Intermediate -> inputs/private/intermediate/
    |
    v (aggregate, anonymize)
Public-safe  -> inputs/public/final/
```
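An end-to-end pipeline following this flow might be sketched as below. The function names come from this guide; the column names, file paths, and cleaning steps are illustrative assumptions, not the project's actual schema:

```r
scaffold()

# Stage 1: raw PII stays in the private raw directory
raw <- data_read("inputs/private/raw/survey.csv")

# Stage 2: cleaned but still identifiable -> private intermediate
cleaned <- raw |> dplyr::filter(!is.na(id))
data_save(cleaned, "inputs/private/intermediate/survey_clean.csv")

# Stage 3: aggregate away row-level identifiers -> public final
public <- cleaned |> dplyr::count(region)
data_save(public, "inputs/public/final/survey_by_region.csv", locked = TRUE)
```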
Add your project-specific notes, conventions, and documentation here.
This section is never modified by ai_regenerate().