Uncertainty Quantification (Optional)
Uncertainty quantification in CASSIA helps assess annotation reliability through multiple analysis iterations and similarity scoring. This process is crucial for:
Identifying robust cell type assignments
Detecting mixed or ambiguous clusters
Quantifying annotation confidence
Understanding prediction variability
Cost Warning: Running multiple iterations with LLM models can incur significant costs. Each iteration makes separate API calls, so the total cost will be approximately n times the cost of a single run.
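As a quick back-of-the-envelope check before committing to a run, the total can be estimated directly from the per-run cost. The dollar figure below is an assumed placeholder, not a CASSIA estimate; substitute your own observed single-run cost.

# Rough cost estimate: total scales roughly linearly with n.
single_run_cost <- 0.50                              # assumed USD cost of one batch annotation run
n_iterations    <- 5
estimated_total <- n_iterations * single_run_cost    # ~2.50 USD for 5 iterations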
library(CASSIA)

# Step 1: Run multiple iterations
runCASSIA_batch_n_times(
  n = 5,
  marker = marker_data,
  output_name = "my_annotation",
  model = "openai/gpt-5.1",
  provider = "openrouter",
  tissue = "brain",
  species = "human",
  reasoning = "medium"
)

# Step 2: Calculate similarity scores
runCASSIA_similarity_score_batch(
  marker = marker_data,
  file_pattern = "my_annotation_*_summary.csv",
  output_name = "similarity_results",
  model = "openai/gpt-5.1",
  provider = "openrouter",
  reasoning = "medium"
)
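Before scoring, it can help to confirm that the per-iteration summary files actually match the glob you pass to file_pattern. A minimal base-R check, assuming your working directory is where Step 1 wrote its outputs:

# List the iteration summaries produced by Step 1; the count should equal n.
iteration_files <- Sys.glob("my_annotation_*_summary.csv")
length(iteration_files)   # expect 5 for n = 5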
| Input | Description | Format |
|---|---|---|
| marker | Marker gene data | Data frame or file path |
| tissue | Tissue type context | String (e.g., "brain", "large intestine") |
| species | Species context | String (e.g., "human", "mouse") |
| file_pattern | Pattern to match iteration results | Glob pattern with * wildcard |
Parameters for runCASSIA_batch_n_times:

| Parameter | Required | Default | Description |
|---|---|---|---|
| n | Yes | - | Number of analysis iterations (recommended: 5) |
| marker | Yes | - | Marker gene data (data frame or path) |
| output_name | Yes | - | Base name for output files |
| model | Yes | - | LLM model to use |
| provider | Yes | - | API provider |
| tissue | Yes | - | Tissue type |
| species | Yes | - | Species |
| max_workers | No | 4 | Overall parallel processing limit |
| batch_max_workers | No | 2 | Workers per iteration (max_workers * batch_max_workers should match your cores) |
| reasoning | No | NULL | Reasoning effort level ("low", "medium", "high"); only for GPT-5 models |
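If you need to cap parallelism explicitly, the two worker arguments from the table can be set alongside the required ones. A sketch assuming an 8-core machine (4 x 2 workers):

runCASSIA_batch_n_times(
  n = 5,
  marker = marker_data,
  output_name = "my_annotation",
  model = "openai/gpt-5.1",
  provider = "openrouter",
  tissue = "brain",
  species = "human",
  max_workers = 4,        # overall parallel processing limit
  batch_max_workers = 2   # workers per iteration; 4 * 2 matches 8 cores
)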
Parameters for runCASSIA_similarity_score_batch:

| Parameter | Required | Default | Description |
|---|---|---|---|
| marker | Yes | - | Marker gene data |
| file_pattern | Yes | - | Pattern to match iteration results (e.g., "output_*_summary.csv") |
| output_name | Yes | - | Base name for results |
| model | Yes | - | LLM model for scoring |
| provider | Yes | - | API provider |
| max_workers | No | 4 | Number of parallel workers |
| main_weight | No | 0.5 | Importance of main cell type match (0-1) |
| sub_weight | No | 0.5 | Importance of subtype match (0-1) |
| generate_report | No | TRUE | Generate HTML report |
| report_output_path | No | "uq_batch_report.html" | Path for HTML report |
| reasoning | No | NULL | Reasoning effort level ("low", "medium", "high"); only for GPT-5 models |
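When agreement on the main cell type matters more than agreement on the subtype (or vice versa), the two weights can be rebalanced. A sketch that favors main-type agreement, using only parameters from the table above:

runCASSIA_similarity_score_batch(
  marker = marker_data,
  file_pattern = "my_annotation_*_summary.csv",
  output_name = "similarity_results",
  model = "openai/gpt-5.1",
  provider = "openrouter",
  main_weight = 0.7,                           # emphasize main cell type agreement
  sub_weight = 0.3,                            # de-emphasize subtype agreement
  generate_report = TRUE,
  report_output_path = "uq_batch_report.html"
)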
| File | Description |
|---|---|
| {output_name}_{n}_summary.csv | Results from each iteration |
| {output_name}_similarity.csv | Similarity scores across iterations |
| uq_batch_report.html | HTML visualization report |
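The similarity CSV can be inspected directly in R once scoring finishes. The file name below follows the naming scheme in the table with output_name = "similarity_results"; no assumptions are made about its column layout.

# Load and preview the per-cluster similarity scores.
similarity <- read.csv("similarity_results_similarity.csv")
head(similarity)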
| Score Range | Interpretation | Action |
|---|---|---|
| > 0.9 | High consistency | Robust annotation |
| 0.75 - 0.9 | Moderate consistency | Review recommended |
| < 0.75 | Low consistency | Use Annotation Boost or Subclustering |
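These thresholds can be applied programmatically to triage clusters. The column name similarity_score is a hypothetical placeholder for whatever the similarity CSV actually calls its score column:

# Bucket clusters by consistency using the cutoffs from the table.
# NOTE: similarity_score is a placeholder column name; adjust to the real one.
similarity$flag <- cut(
  similarity$similarity_score,
  breaks = c(-Inf, 0.75, 0.9, Inf),
  labels = c("low: boost/subcluster", "moderate: review", "high: robust")
)
table(similarity$flag)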
If consistency is low for a cluster:
Review Data: Check marker gene quality and cluster heterogeneity
Try Advanced Agents: Use the Annotation Boost Agent or Subclustering
Adjust Parameters: Increase the iteration count for a more reliable consensus (see the sketch below)
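A sketch of the third option, rerunning with a larger n (remember that cost scales with n). The new output_name is an illustrative choice that keeps the original run's files intact:

runCASSIA_batch_n_times(
  n = 10,                             # more iterations for a more stable consensus
  marker = marker_data,
  output_name = "my_annotation_n10",  # separate prefix so earlier outputs are kept
  model = "openai/gpt-5.1",
  provider = "openrouter",
  tissue = "brain",
  species = "human"
)
# Then rescore with file_pattern = "my_annotation_n10_*_summary.csv".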