Annotation Boost (Optional)
Overview
Annotation Boost is an advanced validation tool that enhances annotation confidence through multiple iterations of analysis. It's particularly useful for:
- Validating low-confidence annotations
- Getting detailed insights into specific cell clusters
- Resolving ambiguous cell type assignments
- Generating comprehensive validation reports
Quick Start
    CASSIA.runCASSIA_annotationboost(
        full_result_path="batch_results_summary.csv",
        marker=marker_data,
        cluster_name="CD4+ T cell",
        major_cluster_info="Human PBMC",
        output_name="Cluster1_report",
        model="anthropic/claude-sonnet-4.5",
        provider="openrouter",
    )
Input
- Full results CSV from CASSIA batch analysis (_summary.csv)
- Original marker gene file (ideally use the raw marker file; do not filter!)
- Cluster context information
- Specific cluster identifier
- (Optional) Conversations JSON file from batch annotation (_conversations.json)
Parameters
Required Parameters
| Parameter | Description |
|---|---|
| full_result_path | Path to the CASSIA results CSV file (_summary.csv) |
| marker | Marker gene data (data frame or path). Use the same marker data as the initial analysis (do not filter) |
| cluster_name | Exact name of the target cluster to validate |
| major_cluster_info | Context about the dataset (e.g., "Human PBMC", "Mouse Brain") |
| output_name | Base name for the output validation report |
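
Because cluster_name must match the target cluster exactly, it can help to list the names present in the results CSV before calling Annotation Boost. The following is a minimal sketch, not part of CASSIA: the helper names list_cluster_names and is_exact_match are hypothetical, and the name of the column holding cluster labels is an assumption you need to check against the header of your own _summary.csv.

```python
import csv
from typing import Iterable, List


def list_cluster_names(lines: Iterable[str], column: str) -> List[str]:
    """Return the sorted, de-duplicated values found in one CSV column.

    `lines` is any iterable of CSV text lines, e.g. an open file handle.
    `column` is the header of the cluster-name column (an assumption:
    check the header row of your own results file).
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    # Skip empty or missing cells, then de-duplicate and sort.
    return sorted({row[column] for row in reader if row[column]})


def is_exact_match(cluster_name: str, lines: Iterable[str], column: str) -> bool:
    """True only if cluster_name matches a CSV value character for character."""
    return cluster_name in list_cluster_names(lines, column)
```

With a real file, pass an open handle, for example `with open("batch_results_summary.csv", newline="", encoding="utf-8") as fh: print(list_cluster_names(fh, "Cluster"))`, where "Cluster" is only a placeholder column name.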
Optional Parameters
| Parameter | Default | Description |
|---|---|---|
| num_iterations | 5 | Number of validation rounds |
| model | - | LLM model to use. Recommended: anthropic/claude-sonnet-4.5 or better |
| provider | - | API provider for the model |
| conversations_json_path | "auto" | Path to the conversations JSON file, or "auto" to auto-detect from full_result_path (e.g., batch_summary.csv → batch_conversations.json) |
| conversation_history_mode | "full" | How to use prior conversation history: "full", "final", or "none" |
| search_strategy | "breadth" | Strategy for exploring hypotheses: "breadth" or "depth" |
| report_style | "per_iteration" | Format of the final report: "per_iteration" or "total_summary" |
| reasoning | - | Reasoning effort level: "low", "medium", "high". Only supported by OpenAI GPT-5 series models |
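
The "auto" setting for conversations_json_path is documented only through the example batch_summary.csv → batch_conversations.json. The sketch below illustrates that naming convention, assuming the rule is simply to swap a trailing _summary.csv for _conversations.json in the same directory. The helper guess_conversations_path is hypothetical and is not part of CASSIA; the library's actual detection logic may differ.

```python
from pathlib import Path
from typing import Optional


def guess_conversations_path(full_result_path: str) -> Optional[Path]:
    """Hypothetical helper: derive the conversations JSON path by name.

    Assumes the convention '<base>_summary.csv' -> '<base>_conversations.json'.
    Returns None when the input does not end in '_summary.csv'.
    """
    path = Path(full_result_path)
    suffix = "_summary.csv"
    if not path.name.endswith(suffix):
        return None
    base = path.name[: -len(suffix)]
    # Keep the same directory, change only the file name.
    return path.with_name(base + "_conversations.json")
```

If the guessed file does not exist on disk, passing an explicit path to conversations_json_path, or setting conversation_history_mode to "none", avoids relying on auto-detection.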
Output
The analysis generates the following output files:
- {output_name}_summary.html: HTML report with detailed analysis results and visualizations.
- {output_name}_raw_conversation.txt: Raw conversation text containing the full analysis dialogue.
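
To confirm that a run finished, you can check for the two files named above. This is a minimal sketch, assuming the files are written relative to the current working directory (or to whatever directory prefix output_name carries). The helpers expected_outputs and missing_outputs are hypothetical and not part of CASSIA.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(output_name: str) -> Dict[str, Path]:
    """Build the two documented output paths from the output_name argument."""
    return {
        "report": Path(f"{output_name}_summary.html"),
        "raw_conversation": Path(f"{output_name}_raw_conversation.txt"),
    }


def missing_outputs(output_name: str) -> List[Path]:
    """Return the documented output files that are not present on disk."""
    return [p for p in expected_outputs(output_name).values() if not p.exists()]
```

For the Quick Start call, `missing_outputs("Cluster1_report")` returning an empty list would indicate that both files were found.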