Uncertainty Quantification (Optional)
Uncertainty quantification in CASSIA helps assess annotation reliability through multiple analysis iterations and similarity scoring. This process is crucial for:
Identifying robust cell type assignments
Detecting mixed or ambiguous clusters
Quantifying annotation confidence
Understanding prediction variability
Cost Warning: Running multiple iterations with LLMs can incur significant costs. Each iteration makes separate API calls, so the total cost is approximately n times the cost of a single run.
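As a rough illustration of the scaling described in the warning, the arithmetic can be sketched as below. The helper name estimate_total_cost and the per-run figure are hypothetical and not part of CASSIA; real costs depend on the model, the provider's pricing, and token usage.

```python
def estimate_total_cost(single_run_cost: float, n: int) -> float:
    """Approximate cost of n iterations, assuming each costs about the same.

    Hypothetical budgeting helper; not part of CASSIA.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return single_run_cost * n


# Made-up per-run cost of 0.25 (in whatever currency the provider bills).
print(estimate_total_cost(0.25, n=5))
```

Running a single iteration first and checking the provider's billing dashboard gives a more realistic per-run figure to plug in.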
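from CASSIA import runCASSIA_n_times_similarity_score

# Single cluster: annotate the same marker list n times and score the agreement
result = runCASSIA_n_times_similarity_score(
    tissue="large intestine",
    species="human",
    marker_list=["CD38", "CD138", "JCHAIN", "MZB1", "SDC1"],
    model="openai/gpt-5.1",
    provider="openrouter",
    n=5,                 # number of iterations; each makes separate API calls
    reasoning="medium"   # reasoning effort; only for GPT-5 models
)

# Consensus cell type and consistency across iterations (0-1)
print(f"Main cell type: {result['general_celltype_llm']}")
print(f"Similarity score: {result['similarity_score']}")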
import CASSIA

# Batch workflow: marker_data is the marker gene data for all clusters
# (a DataFrame or a file path), prepared before this point.

# Step 1: Run multiple iterations
CASSIA.runCASSIA_batch_n_times(
    n=5,
    marker=marker_data,
    output_name="my_annotation",   # base name for the per-iteration output files
    model="openai/gpt-5.1",
    provider="openrouter",
    tissue="large intestine",
    species="human",
    reasoning="medium"
)

# Step 2: Calculate similarity scores
CASSIA.runCASSIA_similarity_score_batch(
    marker=marker_data,
    file_pattern="my_annotation_*_summary.csv",   # matches the Step 1 outputs
    output_name="similarity_results",
    model="openai/gpt-5.1",
    provider="openrouter",
    reasoning="medium"
)
| Input | Description | Format |
|---|---|---|
| marker_list | Marker genes for single cluster | List of gene names |
| marker | Marker gene data for batch | DataFrame or file path |
| tissue | Tissue type context | String (e.g., "brain", "large intestine") |
| species | Species context | String (e.g., "human", "mouse") |
| file_pattern | Pattern to match iteration results | Glob pattern with * wildcard |
Parameters for runCASSIA_n_times_similarity_score (single cluster):

| Parameter | Required | Default | Description |
|---|---|---|---|
| tissue | Yes | - | Tissue type for context |
| species | Yes | - | Species for context |
| marker_list | Yes | - | List of marker genes |
| model | Yes | - | LLM model to use |
| provider | Yes | - | API provider ("openrouter", "openai", "anthropic") |
| n | No | 5 | Number of analysis iterations |
| temperature | No | 0.3 | LLM temperature (lower = more consistent) |
| max_workers | No | 3 | Parallel processing workers |
| main_weight | No | 0.5 | Weight for main cell type in similarity (0-1) |
| sub_weight | No | 0.5 | Weight for subtype in similarity (0-1) |
| validator_involvement | No | "v1" | Validator mode ("v0" strict, "v1" moderate) |
| additional_info | No | None | Additional context string |
| generate_report | No | True | Generate HTML report |
| report_output_path | No | "uq_report.html" | Path for HTML report |
| reasoning | No | None | Reasoning effort level ("low", "medium", "high"); only for GPT-5 models |
Parameters for runCASSIA_batch_n_times (batch, Step 1):

| Parameter | Required | Default | Description |
|---|---|---|---|
| n | Yes | - | Number of analysis iterations (recommended: 5) |
| marker | Yes | - | Marker gene data (DataFrame or path) |
| output_name | Yes | - | Base name for output files |
| model | Yes | - | LLM model to use |
| provider | Yes | - | API provider |
| tissue | Yes | - | Tissue type |
| species | Yes | - | Species |
| max_workers | No | 4 | Overall parallel processing limit |
| batch_max_workers | No | 2 | Workers per iteration |
| reasoning | No | None | Reasoning effort level ("low", "medium", "high"); only for GPT-5 models |
Parameters for runCASSIA_similarity_score_batch (batch, Step 2):

| Parameter | Required | Default | Description |
|---|---|---|---|
| marker | Yes | - | Marker gene data |
| file_pattern | Yes | - | Pattern to match iteration results (e.g., "output_*_summary.csv") |
| output_name | Yes | - | Base name for results |
| model | Yes | - | LLM model for scoring |
| provider | Yes | - | API provider |
| max_workers | No | 4 | Number of parallel workers |
| main_weight | No | 0.5 | Importance of main cell type match (0-1) |
| sub_weight | No | 0.5 | Importance of subtype match (0-1) |
| generate_report | No | True | Generate HTML report |
| report_output_path | No | "uq_batch_report.html" | Path for HTML report |
| reasoning | No | None | Reasoning effort level ("low", "medium", "high"); only for GPT-5 models |
| File | Description |
|---|---|
| {output_name}_{n}_summary.csv | Results from each iteration |
| {output_name}_similarity.csv | Similarity scores across iterations |
| uq_report.html / uq_batch_report.html | HTML visualization report |
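To see which files a given file_pattern would pick up, the wildcard matching can be tried on a list of names before spending API calls. The sketch below uses Python's standard fnmatch module, which implements the same shell-style wildcards as glob. The helper iteration_files and the directory listing are hypothetical, and the exact file names CASSIA writes may differ from this example.

```python
from fnmatch import fnmatch


def iteration_files(filenames, file_pattern):
    """Return the names matching a glob-style pattern, sorted for stable output."""
    return sorted(name for name in filenames if fnmatch(name, file_pattern))


# Hypothetical listing after a 3-iteration batch run with
# output_name="my_annotation"; real output names may differ.
listing = [
    "my_annotation_1_summary.csv",
    "my_annotation_2_summary.csv",
    "my_annotation_3_summary.csv",
    "my_annotation_similarity.csv",
    "uq_batch_report.html",
]

matched = iteration_files(listing, "my_annotation_*_summary.csv")
print(matched)
```

In this listing only the three per-iteration summary files match; the similarity CSV and the HTML report are excluded because they do not end in _summary.csv.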
| Key | Description |
|---|---|
| general_celltype_llm | Consensus main cell type |
| sub_celltype_llm | Consensus sub cell type |
| similarity_score | Overall similarity across iterations (0-1) |
| consensus_types | Cell types that appeared most frequently |
| Possible_mixed_celltypes_llm | Detected mixed cell type populations |
| original_results | Raw results from each iteration |
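A minimal sketch of reading these keys from the returned dictionary follows. It assumes similarity_score is numeric and that an empty Possible_mixed_celltypes_llm means nothing was detected; the value types are not specified above, so treat both as assumptions. The helper summarize_uq_result and the example values are made up for illustration and are not real CASSIA output.

```python
def summarize_uq_result(result: dict) -> str:
    """Build a one-line summary from the documented result keys.

    Uses .get() so a missing key yields a placeholder instead of a KeyError.
    """
    main = result.get("general_celltype_llm", "unknown")
    sub = result.get("sub_celltype_llm", "unknown")
    score = result.get("similarity_score")
    score_text = "n/a" if score is None else f"{score:.2f}"
    # An empty or missing value is reported as "none reported" (assumption).
    mixed = result.get("Possible_mixed_celltypes_llm") or "none reported"
    return f"{main} / {sub} (similarity {score_text}; mixed: {mixed})"


# Made-up values for illustration only.
example = {
    "general_celltype_llm": "plasma cell",
    "sub_celltype_llm": "IgA plasma cell",
    "similarity_score": 0.93,
    "Possible_mixed_celltypes_llm": [],
}
print(summarize_uq_result(example))
```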
| Score Range | Interpretation | Action |
|---|---|---|
| > 0.9 | High consistency | Robust annotation |
| 0.75 - 0.9 | Moderate consistency | Review recommended |
| < 0.75 | Low consistency | Use Annotation Boost or Subclustering |
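The thresholds in the table can be applied programmatically, for example when triaging many clusters. The function below is a hypothetical helper, not part of CASSIA. Following the table as written, scores of exactly 0.9 and 0.75 are placed in the moderate band.

```python
def interpret_similarity(score: float) -> str:
    """Map a similarity score (0-1) to the interpretation and action in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity score is expected to lie in [0, 1]")
    if score > 0.9:
        return "High consistency: robust annotation"
    if score >= 0.75:
        return "Moderate consistency: review recommended"
    return "Low consistency: use Annotation Boost or Subclustering"


# Made-up scores, one per band.
for s in (0.95, 0.80, 0.60):
    print(s, "->", interpret_similarity(s))
```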
Review Data: Check marker gene quality and cluster heterogeneity
Try Advanced Agents: Use Annotation Boost Agent or Subclustering
Adjust Parameters: Increase iteration count for more reliable consensus