Extended Analysis
This tutorial covers advanced features of CASSIA, including uncertainty quantification, annotation boost, and specialized agents.
Optional Step: Uncertainty Quantification
This could be useful to study the uncertainty of the annotation, and potentially improve the accurracy.
Note: This is step could be costy, since multiple iteration will be performed.
output_name="intestine_detailed" # Run multiple iterations iteration_results = CASSIA.runCASSIA_batch_n_times( n=2, marker=unprocessed_markers, output_name=output_name + "_Uncertainty", model="anthropic/claude-sonnet-4.5", provider="openrouter", tissue="large intestine", species="human", max_workers=6, batch_max_workers=3 # Conservative setting for API rate limits ) # Calculate similarity scores similarity_scores = CASSIA.runCASSIA_similarity_score_batch( marker=unprocessed_markers, file_pattern=output_name + "_Uncertainty_*_summary.csv", output_name="intestine_uncertainty", max_workers=6, model="openai/gpt-5.1", provider="openrouter", main_weight=0.5, sub_weight=0.5 )Python
Optional Step: Annotation Boost on Selected Cluster
The monocyte cluster is sometimes annotated as mixed population of immune cell and neuron/glia cells.
Here we use annotation boost agent to test these hypotheses in more detail.
# Run validation plus for the high mitochondrial content cluster CASSIA.runCASSIA_annotationboost( full_result_path = output_name + "_summary.csv", marker = unprocessed_markers, output_name = "monocyte_annotationboost", cluster_name = "monocyte", major_cluster_info = "Human Large Intestine", num_iterations = 5, model = "anthropic/claude-sonnet-4.5", provider = "openrouter", conversations_json_path = output_name + "_conversations.json" # Provides annotation context )Python
Optional Step: Retrieve Augmented Generation (RAG)
This is particularly useful if you have a very specific and detialed annottaion to work with. It can significantly imrpove the granularity and accuracy of the annotation. It automatically extract marker information and genearte a report as additional informatyion for default CASSIA pipeline.
pip install cassia-ragbash
from cassia_rag import run_complete_analysis import os os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key" os.environ["OPENAI_API_KEY"] = "your-openai-key" run_complete_analysis( tissue_type="Liver", # tissue you are analyzing target_species="Tiger", # species you are analyzing reference_species="Human", # either Human or mouse, if other species, then use Human instead of mouse model_choice='claude', # either claude or gpt, highly recommend claude compare=True, # if you want to compare with reference species, for example fetal vs human, then set to True db_path="~/Canonical_Marker (1).csv", # path to the database max_workers=8 )Python
Optional Step: Compare the Subtypes Using Multiple LLMs (Symphony Compare)
This agent can be used after you finish the default CASSIA pipeline, and are still unsure about a celltype. You can use this agent to get a more confident subtype annotation. Here we use the Plasma Cells cluster as examples. To distinguish if it is more like a general plasma cell or other celltypes.
# The markers here are copied from CASSIA's previous results. marker = "IGLL5, IGLV6-57, JCHAIN, FAM92B, IGLC3, IGLC2, IGHV3-7, IGKC, TNFRSF17, IGHG1, AC026369.3, IGHV3-23, IGKV4-1, IGKV1-5, IGHA1, IGLV3-1, IGLV2-11, MYL2, MZB1, IGHG3, IGHV3-74, IGHM, ANKRD36BP2, AMPD1, IGKV3-20, IGHA2, DERL3, AC104699.1, LINC02362, AL391056.1, LILRB4, CCL3, BMP6, UBE2QL1, LINC00309, AL133467.1, GPRC5D, FCRL5, DNAAF1, AP002852.1, AC007569.1, CXorf21, RNU1-85P, U62317.4, TXNDC5, LINC02384, CCR10, BFSP2, APOBEC3A, AC106897.1" # Run Symphony Compare results = CASSIA.symphonyCompare( tissue = "large intestine", celltypes = ["Plasma Cells", "IgA-secreting Plasma Cells", "IgG-secreting Plasma Cells", "IgM-secreting Plasma Cells"], marker_set = marker, species = "human", model_preset = "premium", output_basename = "plasma_cell_subtype", enable_discussion = True ) print(f"Consensus: {results['consensus']} (confidence: {results['confidence']:.1%})")Python
Optional Step: Subclustering
This agent can be used to study subclustered population, such as a T cell population or a Fibroblast cluster. We recommend to apply the default cassia first, and on a target cluster, apply Seurat pipeline to subcluster the cluster and get the findallmarke results to be used here. Here we present the results for the cd8-positive, alpha-beta t cell cluster as example. This cluster is a cd8 population mixed with other celltypes.
CASSIA.runCASSIA_subclusters(marker = subcluster_results, major_cluster_info = "cd8 t cell", output_name = "subclustering_results", model = "anthropic/claude-sonnet-4.5", provider = "openrouter")Python
It is recommended to run the CS score for the subclustering to get a more confident answer.
CASSIA.runCASSIA_n_subcluster( n=5, marker=subcluster_results, major_cluster_info="cd8 t cell", base_output_name="subclustering_results_n", model="anthropic/claude-sonnet-4.5", temperature=0, provider="openrouter", max_workers=5, n_genes=50 ) # Calculate similarity scores CASSIA.runCASSIA_similarity_score_batch( marker = subcluster_results, file_pattern = "subclustering_results_n_*.csv", output_name = "subclustering_uncertainty", max_workers = 6, model = "openai/gpt-5.1", provider = "openrouter", main_weight = 0.5, sub_weight = 0.5 )Python