Fast Mode
CASSIA's Fast Mode offers a streamlined, one-line solution for running the complete analysis pipeline. This mode combines annotation, scoring, and annotation boost for correcting low quality annotations in a single function call, using optimized default parameters.
runCASSIA_pipeline(
output_file_name = "my_analysis",
tissue = "brain",
species = "human",
marker = marker_data,
max_workers = 4
)
R
Input Type Description markerData frame or path Marker gene data with cluster and gene columns tissueString Tissue type of the sample (e.g., "brain", "lung") speciesString Species of the sample (e.g., "human", "mouse")
Parameter Type Description output_file_nameString Base name for the output folder and files tissueString The tissue type of the sample speciesString The species of the sample markerData frame/path Marker gene data
Parameter Default Description max_workers6 Number of parallel processes to use overall_provider"openrouter" Main provider for all pipeline stages ("openai", "anthropic", "openrouter") score_threshold75 Annotations below this score (0-100) trigger Annotation Boost additional_infoNULL Optional experimental context (e.g., "treated with drug X") validator_involvement"v1" Validation strictness ("v1" = moderate, "v0" = high) merge_annotationsTRUE If TRUE, merges detailed cell types into broader categories annotation_model"anthropic/claude-sonnet-4.5" Model for initial cell type annotation annotation_provider"openrouter" Provider for annotation model score_model"openai/gpt-5.1" Model for quality scoring score_provider"openrouter" Provider for score model annotationboost_model"anthropic/claude-sonnet-4.5" Model for refining low-confidence annotations annotationboost_provider"openrouter" Provider for annotation boost model merge_model"google/gemini-2.5-flash" Model for the merging step merge_provider"openrouter" Provider for merge model overall_reasoningNULL Reasoning effort for all stages ("low", "medium", "high") annotation_reasoningNULL Override reasoning level for annotation stage only score_reasoningNULL Override reasoning level for scoring stage only annotationboost_reasoningNULL Override reasoning level for annotation boost stage only merge_reasoningNULL Override reasoning level for merging stage only
The pipeline generates a timestamped main folder (CASSIA_Pipeline_{tissue}_{species}_{timestamp}) containing three organized subfolders:
CASSIA_Pipeline_brain_human_20240115_143022/
├── 01_annotation_report/
│ └── {name}_report.html # Interactive HTML report
├── 02_annotation_boost/
│ └── {cluster_name}/ # One folder per low-scoring cluster
│ └── {name}_{cluster}_boosted_report.html
└── 03_csv_files/
├── {name}_summary.csv # Initial annotation results
├── {name}_conversations.json # Full conversation history
├── {name}_scored.csv # Results with quality scores
├── {name}_merged.csv # Merged annotations (if enabled)
└── {name}_FINAL_RESULTS.csv # Combined final results
Folder File Description 01_annotation_report{name}_report.htmlInteractive HTML report with all annotations 02_annotation_boostPer-cluster folders Boost analysis for clusters scoring below threshold 03_csv_files{name}_FINAL_RESULTS.csvMain output - combined results with scores and merged annotations03_csv_files{name}_summary.csvInitial cell type annotations 03_csv_files{name}_scored.csvAnnotations with quality scores 03_csv_files{name}_merged.csvBroader category groupings (if do_merge_annotations = TRUE) 03_csv_files{name}_conversations.jsonFull LLM conversation history for reproducibility
seurat_corrected <- add_cassia_to_seurat(
seurat_obj = seurat_corrected, # The seurat object you want to add the CASSIA results to
cassia_results_path = "/FastAnalysisResults_scored.csv", #where the scored results saved, specify the path
cluster_col = "celltype", # Column in Seurat object with cell types
cassia_cluster_col="True Cell Type" # Column in the scored results with the true cell types
)
# This will add six new columns to the seurat object: the general celltype, all three sub cell types, the most likely celltype, the second likely celltype, the third likely celltype, and mixed celltype, and the quality score of each cell type.
R
For optimal performance, adjust max_workers based on your system's CPU cores
Use additional_info to provide relevant experimental context
Monitor score_threshold to balance stringency with throughput
Next we introduce each function in detail...