compare_model_predictions: Compare predictions from different models
In mLLMCelltype: Cell Type Annotation Using Large Language Models

View source: R/compare_model_predictions.R

Description

This function runs the same input through multiple models and compares their predictions. It returns both the individual predictions and a consensus analysis.

Usage

compare_model_predictions(
  input,
  tissue_name,
  models = c("claude-opus-4-7", "gpt-5.5", "gemini-3.1-pro-preview", "deepseek-v4-flash",
    "qwen3.6-plus", "grok-4.3"),
  api_keys,
  top_gene_count = 10,
  consensus_threshold = 0.5,
  base_urls = NULL
)

Arguments

`input`	Either a data frame from Seurat's FindAllMarkers() containing the columns 'cluster', 'gene', and 'avg_log2FC', or a list with a 'genes' field for each cluster.
`tissue_name`	Tissue context (e.g., 'human PBMC', 'mouse brain') for more accurate annotations
`models`	Vector of model names to use for comparison. The default includes one model from each of six providers (Anthropic, OpenAI, Google, DeepSeek, Alibaba, and X.AI).
`api_keys`	Named list of API keys for the models, with provider or model names as keys. Every model in `models` must resolve to a non-NULL API key.
`top_gene_count`	Number of top genes to use per cluster when `input` is a FindAllMarkers() data frame. Default: 10.
`consensus_threshold`	Minimum agreement threshold for consensus (0-1). Default: 0.5. Consensus is only evaluated when at least two non-missing model predictions are available for a cluster.
`base_urls`	Optional base URLs for the API endpoints. Can be a single string or a named list of provider-specific custom endpoints.
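The sketch below illustrates how `top_gene_count` might be applied to a FindAllMarkers()-style data frame. It assumes genes are ranked by descending avg_log2FC within each cluster; the helper `top_genes_per_cluster()` and the toy marker table are hypothetical and not part of mLLMCelltype.

```r
# Hypothetical helper, not part of mLLMCelltype.
# Assumes genes are ranked by descending avg_log2FC within each cluster.
top_genes_per_cluster <- function(markers, top_gene_count = 10) {
  stopifnot(all(c("cluster", "gene", "avg_log2FC") %in% names(markers)))
  # One data frame per cluster
  by_cluster <- split(markers, markers$cluster)
  lapply(by_cluster, function(df) {
    # Highest avg_log2FC first, then keep the first top_gene_count genes
    df <- df[order(df$avg_log2FC, decreasing = TRUE), , drop = FALSE]
    head(as.character(df$gene), top_gene_count)
  })
}

# Toy marker table in the FindAllMarkers() column layout
markers <- data.frame(
  cluster    = c(0, 0, 0, 1, 1),
  gene       = c("CD3D", "CD4", "IL7R", "CD14", "LYZ"),
  avg_log2FC = c(2.1, 1.5, 3.0, 2.8, 3.4)
)

top_genes_per_cluster(markers, top_gene_count = 2)
```

For cluster 0 this keeps IL7R and CD3D, and for cluster 1 LYZ and CD14. The package's own marker selection may differ, so treat this only as an illustration of what the parameter controls.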

Value

A list containing the individual model predictions and the consensus analysis. If a cluster has fewer than two valid predictions after alignment/padding, its consensus-related outputs are NA.
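A minimal sketch of the per-cluster consensus rule described above, assuming agreement is the share of non-missing predictions that match the most frequent label, and that consensus is reached when this share is at least `consensus_threshold`. The helper `cluster_consensus()` is hypothetical. The package appears to standardize cell type names before comparing them (it uses create_standardization_prompt), whereas this sketch uses exact string matching.

```r
# Hypothetical sketch, not mLLMCelltype's implementation.
# predictions: one label per model for a single cluster (NA = missing).
cluster_consensus <- function(predictions, consensus_threshold = 0.5) {
  valid <- predictions[!is.na(predictions)]
  # Documented rule: fewer than two valid predictions -> consensus outputs are NA
  if (length(valid) < 2) {
    return(list(consensus = NA_character_, agreement = NA_real_, reached = NA))
  }
  counts <- table(valid)                      # levels are sorted alphabetically
  top <- names(counts)[which.max(counts)]     # most frequent label
  agreement <- max(counts) / length(valid)    # share of valid predictions agreeing
  list(consensus = top,
       agreement = agreement,
       reached = agreement >= consensus_threshold)
}

cluster_consensus(c("T cell", "T cell", "NK cell", NA))
cluster_consensus(c("Monocyte", NA, NA))
```

In the first call two of the three valid predictions agree, so agreement is 2/3 and consensus is reached at the default threshold; the second call has only one valid prediction and returns NA for every field. Ties go to the alphabetically first label because `table()` sorts its levels; the actual function may resolve ties differently.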

Note

This function uses create_standardization_prompt from prompt_templates.R.

Supported models:

OpenAI: 'gpt-5.5', 'gpt-5.4', 'gpt-5.4-mini'
Anthropic: 'claude-opus-4-7', 'claude-opus-4-6', 'claude-sonnet-4-6', 'claude-haiku-4-5-20251001'
DeepSeek: 'deepseek-v4-flash', 'deepseek-v4-pro'
Google: 'gemini-3.1-pro-preview', 'gemini-3-flash-preview', 'gemini-3.1-flash-lite'
Alibaba: 'qwen3.6-max-preview', 'qwen3.6-plus', 'qwen3.6-flash'
Stepfun: 'step-3.5-flash', 'step-3.5-flash-2603', 'step-3'
Zhipu/Z.AI: 'glm-5.1', 'glm-5-turbo', 'glm-5'
MiniMax: 'MiniMax-M2.7', 'MiniMax-M2.7-highspeed', 'MiniMax-M2.5'
X.AI: 'grok-4.3', 'grok-4.3-latest', 'grok-latest'
OpenRouter: Provides access to models from multiple providers through a single API. Format: 'provider/model-name'
- OpenAI models: 'openai/gpt-5.5', 'openai/gpt-5.4-mini'
- Anthropic models: 'anthropic/claude-opus-4.7', 'anthropic/claude-sonnet-4.6'
- Google models: 'google/gemini-3.1-pro-preview', 'google/gemini-3-flash-preview'
- X.AI models: 'x-ai/grok-4.3'
- Stepfun models: 'stepfun/step-3.5-flash'
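OpenRouter model ids follow the 'provider/model-name' pattern shown above. The hypothetical helper below, which is not part of mLLMCelltype, splits such an id into its two parts and treats ids without a '/' as direct-provider model names.

```r
# Hypothetical helper, not part of mLLMCelltype.
# Splits an OpenRouter id such as "anthropic/claude-opus-4.7".
parse_openrouter_id <- function(model) {
  if (!grepl("/", model, fixed = TRUE)) {
    # No '/' -> treat as a direct-provider model name
    return(list(via_openrouter = FALSE, provider = NA_character_, model = model))
  }
  parts <- strsplit(model, "/", fixed = TRUE)[[1]]
  list(via_openrouter = TRUE,
       provider = parts[1],                         # text before the first '/'
       model = paste(parts[-1], collapse = "/"))    # everything after it
}

parse_openrouter_id("anthropic/claude-opus-4.7")
parse_openrouter_id("gpt-5.5")
```

The model part of an OpenRouter id is not always the same string as the direct-provider name listed above: for example, 'anthropic/claude-opus-4.7' uses a dot where the direct Anthropic id 'claude-opus-4-7' uses a hyphen.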

API keys can be supplied in either of two forms:

With provider names as keys: list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")
With model names as keys: list("gpt-5.5" = "sk-...", "claude-sonnet-4-6" = "sk-ant-...")

The function first looks up the API key by provider name. If no provider key is found, it falls back to the model name. Example:

api_keys <- list(
  "openai" = Sys.getenv("OPENAI_API_KEY"),
  "anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
  "openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
  "claude-opus-4-7" = "your-claude-opus-key"
)
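The documented lookup order can be illustrated with the hypothetical helper `resolve_api_key()` below, which is not the package's internal function. The provider names passed to it ('anthropic', 'deepseek', 'openai') are assumptions for illustration.

```r
# Hypothetical helper, not mLLMCelltype's internal function.
# Mirrors the documented order: provider-name key first, model-name key second.
resolve_api_key <- function(model, provider, api_keys) {
  key <- api_keys[[provider]]  # NULL when no entry is named after the provider
  if (is.null(key)) {
    key <- api_keys[[model]]   # fall back to an entry named after the model
  }
  key                          # NULL if neither entry exists
}

# Provider names used below are assumed for illustration
example_keys <- list(
  "anthropic" = "anthropic-provider-key",
  "claude-opus-4-7" = "claude-opus-model-key",
  "deepseek-v4-pro" = "deepseek-model-key"
)

resolve_api_key("claude-opus-4-7", "anthropic", example_keys)  # "anthropic-provider-key"
resolve_api_key("deepseek-v4-pro", "deepseek", example_keys)   # "deepseek-model-key"
resolve_api_key("gpt-5.5", "openai", example_keys)             # NULL
```

Under this order, a model-specific entry such as "claude-opus-4-7" in the example above only takes effect when no "anthropic" entry is present.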

Examples

## Not run: 
# Compare predictions using different models
api_keys <- list(
  "claude-sonnet-4-6" = "your-anthropic-key",
  "deepseek-v4-pro" = "your-deepseek-key",
  "gemini-3.1-pro-preview" = "your-gemini-key",
  "qwen3.6-plus" = "your-qwen-key"
)

results <- compare_model_predictions(
  input = list(gs1=c('CD4','CD3D'), gs2='CD14'),
  tissue_name = 'PBMC',
  api_keys = api_keys
)

## End(Not run)

mLLMCelltype documentation built on May 11, 2026, 9:06 a.m.