knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE # Set to FALSE since API calls require credentials
)
rsynthbio is an R package that provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. This package enables researchers to easily access AI-generated transcriptomic data for various modalities, including bulk RNA-seq, single-cell RNA-seq, microarray data, and more.
You can install rsynthbio from CRAN:
install.packages("rsynthbio")
If you want the development version, you can install it from GitHub using the remotes package:
# Install remotes if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("synthesizebio/rsynthbio")
Once installed, load the package:
library(rsynthbio)
Before using the Synthesize Bio API, you need to set up your API token. The package provides a secure way to handle authentication:
# Securely prompt for and store your API token
# The token will not be visible in the console
set_synthesize_token()
# You can also store the token in your system keyring for persistence
# across R sessions (requires the 'keyring' package)
set_synthesize_token(use_keyring = TRUE)
Loading your API token in a new session
# In future sessions, load the stored token
load_synthesize_token_from_keyring()
# Check if a token is already set
has_synthesize_token()
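If you run scripts non-interactively, it can help to make sure a token is available before the first API call. The sketch below is a hypothetical convenience wrapper, not part of rsynthbio: it assumes has_synthesize_token() returns TRUE or FALSE and that load_synthesize_token_from_keyring() sets the token for the session. The package functions are passed as default arguments, so R only evaluates them if the wrapper actually uses them, which also lets you test the logic with stand-in functions.

```r
# Hypothetical wrapper (not part of rsynthbio): ensure a token is set,
# falling back to the keyring, and fail early with a clear message otherwise.
# Defaults are lazily evaluated, so rsynthbio is only touched if they are used.
ensure_synthesize_token <- function(
    has_token = rsynthbio::has_synthesize_token,
    load_token = rsynthbio::load_synthesize_token_from_keyring) {
  if (!has_token()) {
    # Try the keyring before giving up
    load_token()
  }
  if (!has_token()) {
    stop("No Synthesize Bio API token found; run set_synthesize_token() first.")
  }
  invisible(TRUE)
}
```

Calling ensure_synthesize_token() at the top of a script with no arguments uses the package's own functions.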
You can obtain an API token by registering at Synthesize Bio.
For security reasons, remember to clear your token when you're done:
# Clear token from current session
clear_synthesize_token()
# Clear token from both session and keyring
clear_synthesize_token(remove_from_keyring = TRUE)
Never hard-code your token in scripts that will be shared or committed to version control.
Some Synthesize Bio models can generate several types (modalities) of gene expression data.
For the v2 model, use "bulk" for bulk gene expression.
# Check available modalities
get_valid_modalities()
The first step in generating gene expression data is to create a query. The package provides a sample query that you can modify:
# Get a sample query
query <- get_valid_query()
# Inspect the query structure
str(query)
The query consists of:
output_modality: The type of gene expression data to generate (see get_valid_modalities())
mode: The prediction mode (e.g., "mean estimation" or "sample generation")
inputs: A list of biological conditions to generate data for
We train our models with diverse multi-omics datasets. There are two model types/modes available today:
Sample generation: This runs in "diffusion" mode and generates different results for each sample requested. Use this mode to understand the distribution of expression across sample groups.
Mean estimation: This is deterministic. For a given metadata specification, you will get the same values.
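To make the query structure concrete, here is a minimal sketch of a query built by hand as a plain R list. It assumes the field names match those returned by get_valid_query() (output_modality, mode, and inputs, where each input holds metadata and num_samples); the metadata values are only illustrative, borrowed from the examples in this vignette. Starting from get_valid_query() and modifying it remains the safer route, since it reflects what the API currently expects.

```r
# Hypothetical hand-built query mirroring the structure described above.
# Field names are assumed to match get_valid_query(); values are illustrative.
my_query <- list(
  output_modality = "bulk",
  mode = "sample generation", # or "mean estimation" for deterministic output
  inputs = list(
    list(
      metadata = list(cell_line = "MCF7", perturbation = "TP53"),
      num_samples = 5
    ),
    list(
      metadata = list(
        tissue = "lung",
        disease = "adenocarcinoma",
        sex = "male",
        age = "57 years",
        sample_type = "primary tissue"
      ),
      num_samples = 3
    )
  )
)
# Total number of samples requested across all conditions
total_samples <- sum(vapply(my_query$inputs, function(x) x$num_samples, numeric(1)))
str(my_query, max.level = 2)
```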
# Request raw counts data
result <- predict_query(query)
This result will be a list of two data frames: metadata and expression.
You can customize the query to fit your specific research needs:
# Change output modality
query$output_modality <- "single_cell_rna-seq"
# Adjust number of samples
query$inputs[[1]]$num_samples <- 10
# Modify cell line information
query$inputs[[1]]$metadata$cell_line <- "MCF7"
query$inputs[[1]]$metadata$perturbation <- "TP53"
# Add a new condition at the end of the inputs list
query$inputs[[length(query$inputs) + 1]] <- list(
  metadata = list(
    tissue = "lung",
    disease = "adenocarcinoma",
    sex = "male",
    age = "57 years",
    sample_type = "primary tissue"
  ),
  num_samples = 3
)
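If you add many conditions, a small helper keeps the edits tidy. The add_condition() function below is hypothetical and not part of rsynthbio; it assumes each entry in inputs is a list with metadata and num_samples, as in the sample query, and simply appends a new entry at the end.

```r
# Hypothetical helper (not part of rsynthbio): append one condition to a query.
# Assumes inputs is a list of list(metadata = ..., num_samples = ...) entries.
add_condition <- function(query, metadata, num_samples = 1) {
  stopifnot(is.list(metadata), length(num_samples) == 1, num_samples >= 1)
  query$inputs[[length(query$inputs) + 1]] <- list(
    metadata = metadata,
    num_samples = num_samples
  )
  query
}
```

For example, query <- add_condition(get_valid_query(), list(tissue = "lung", sex = "male"), num_samples = 3) adds a lung condition to the sample query.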
Once your query is ready, you can send it to the API to generate gene expression data.
# Request raw counts data
result <- predict_query(query, as_counts = TRUE)
If you want the full API response rather than just the metadata and expression data frames, set raw_response = TRUE.
# Access metadata and expression matrices
metadata <- result$metadata
expression <- result$expression
# Check dimensions
dim(expression)
# View metadata sample
head(metadata)
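A common first step with raw counts is to normalize for library size and summarize by condition. The sketch below uses a small mock result rather than a real API response, and it assumes that expression has one row per sample and one column per gene, with metadata rows in the same sample order; check dim(expression) on your own result and transpose if the orientation differs. Counts per million (CPM) only make sense for count data, so skip this step if you requested non-count output.

```r
# Hypothetical mock of a predict_query() result, for illustration only.
# Assumption: one row per sample in both data frames, in the same order,
# and expression columns are genes holding raw counts.
mock_result <- list(
  metadata = data.frame(sample_id = c("s1", "s2", "s3"), condition = c("A", "A", "B")),
  expression = data.frame(GENE1 = c(10, 20, 30), GENE2 = c(0, 5, 1))
)
expr_mat <- as.matrix(mock_result$expression)
# Library size (total counts) per sample
lib_size <- rowSums(expr_mat)
# Counts per million: a length-nrow vector recycles down the columns,
# so row i is divided by lib_size[i]
cpm <- expr_mat / lib_size * 1e6
# Mean CPM per gene within each condition
group_means <- aggregate(
  as.data.frame(cpm),
  by = list(condition = mock_result$metadata$condition),
  FUN = mean
)
group_means
```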
You may want to process the data in chunks or save it for later use:
# Save results to RDS file
saveRDS(result, "synthesize_results.rds")
# Load previously saved results
result <- readRDS("synthesize_results.rds")
# Export as CSV
write.csv(result$expression, "expression_matrix.csv")
write.csv(result$metadata, "sample_metadata.csv")
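For large results, you can process samples in chunks so each step works on only part of the data at a time. The process_in_chunks() helper below is a hypothetical sketch, not part of rsynthbio. It assumes samples are rows of the expression table, and the function you pass in should return a matrix or data frame with one row per sample in its chunk so the pieces can be stacked back together with rbind().

```r
# Hypothetical helper: apply fun() to consecutive blocks of rows (samples)
# and stack the per-chunk results back together in the original order.
process_in_chunks <- function(expression, chunk_size, fun) {
  n <- nrow(expression)
  if (n == 0) {
    return(NULL)
  }
  starts <- seq(1, n, by = chunk_size)
  results <- lapply(starts, function(start) {
    end <- min(start + chunk_size - 1, n)
    # drop = FALSE keeps a single-row chunk as a matrix/data frame
    fun(expression[start:end, , drop = FALSE])
  })
  do.call(rbind, results)
}
```

For example, process_in_chunks(result$expression, 100, function(chunk) data.frame(total = rowSums(chunk))) computes per-sample totals 100 samples at a time, assuming every column of the expression table is numeric.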
You can validate your queries before sending them to the API:
# Validate structure
validate_query(query)
# Validate modality
validate_modality(query)
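The package's validators are the authoritative check. If you also want a quick local sanity check that collects every problem at once instead of stopping at the first, a hypothetical sketch like the one below can help. It is not the logic of validate_query() or validate_modality(); it only checks that the three top-level fields exist, that output_modality appears in a supplied vector of valid modalities, and that each num_samples is a positive whole number.

```r
# Hypothetical local pre-check (NOT the package's validation logic).
# Returns a character vector of problems; character(0) means none found.
check_query_locally <- function(query, valid_modalities) {
  problems <- character(0)
  absent <- setdiff(c("output_modality", "mode", "inputs"), names(query))
  if (length(absent) > 0) {
    problems <- c(problems, paste("missing field(s):", paste(absent, collapse = ", ")))
  }
  if (!is.null(query$output_modality) && !(query$output_modality %in% valid_modalities)) {
    problems <- c(problems, paste0("unknown output_modality: ", query$output_modality))
  }
  for (i in seq_along(query$inputs)) {
    n <- query$inputs[[i]]$num_samples
    if (is.null(n) || !is.numeric(n) || length(n) != 1 || is.na(n) || n < 1 || n != round(n)) {
      problems <- c(problems, sprintf("inputs[[%d]]: num_samples must be a positive whole number", i))
    }
  }
  problems
}
```

Assuming get_valid_modalities() returns a character vector, you could call check_query_locally(query, get_valid_modalities()) before validate_query().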
sessionInfo()