
Google AI Releases C2S-Scale 27B Model that Translates Complex Single-Cell Gene Expression Data into ‘cell sentences’ that LLMs can Understand

A team of researchers from Google Research, Google DeepMind, and Yale has released C2S-Scale 27B, a 27-billion-parameter foundation model for single-cell analysis built on Gemma-2. The model formalizes single-cell RNA-seq (scRNA-seq) profiles as “cell sentences” (ordered lists of gene symbols), so that a language model can natively parse and reason over cellular states. Beyond benchmarking gains, the research team reports an experimentally validated, context-dependent pathway: CK2 inhibition (silmitasertib/CX-4945) combined with low-dose interferon amplifies antigen presentation, a mechanism that could make “cold” tumors more responsive to immunotherapy. The result is an ~50% increase in antigen presentation in vitro under the combined condition.

Understanding the model

C2S-Scale converts a high-dimensional expression vector into text by rank-ordering genes and emitting the top-K symbols as a gene-name sequence. This representation aligns single-cell data with standard LLM toolchains and allows tasks such as cell-type prediction, tissue classification, cluster captioning, perturbation prediction, and biological QA to be phrased as text prompts and completions.

https://github.com/vandijklab/cell2sentence
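
Conceptually, the conversion is a rank-and-truncate step. The following is a minimal sketch in plain NumPy, illustrative only and not the cell2sentence package’s actual API:

```python
import numpy as np

def cell_to_sentence(expression: np.ndarray, gene_names: list[str], top_k: int = 100) -> str:
    """Rank genes by expression (descending) and emit the top-K symbols as a 'cell sentence'."""
    order = np.argsort(expression)[::-1]                               # gene indices, highest expression first
    top = [gene_names[i] for i in order[:top_k] if expression[i] > 0]  # keep the K most-expressed, drop zero counts
    return " ".join(top)

# Toy example: three genes measured in one cell
genes = ["CD3D", "MS4A1", "NKG7"]
counts = np.array([5.0, 0.0, 12.0])
print(cell_to_sentence(counts, genes, top_k=2))  # -> "NKG7 CD3D"
```

The resulting gene-symbol string can then be dropped into ordinary text prompts, which is what lets downstream tasks reuse standard LLM tooling.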

Training data, stack, and release

C2S-Scale-Gemma-2-27B is built on Gemma-2 27B (decoder-only Transformer), trained on Google TPU v5, and released under CC-BY-4.0. The training corpus aggregates >800 public scRNA-seq datasets spanning >57M cells (human and mouse) with associated metadata and textual context; pretraining unifies transcriptomic tokens and biological text into a single multimodal corpus.
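
Because the weights are open, the model can be loaded like any other causal LM. The sketch below uses Hugging Face transformers; the repository ID and prompt wording are illustrative assumptions, so consult the vandijklab model card for the exact identifiers (a 2B variant is also published).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vandijklab/C2S-Scale-Gemma-2-27B"  # assumed repo ID; verify on the Hugging Face model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 27B weights require multi-GPU or offloading at this precision
    device_map="auto",
)

# Illustrative prompt: ask for a cell-type prediction from a (truncated) cell sentence.
prompt = "Predict the cell type of the following cell sentence: MALAT1 B2M TMSB4X ACTB HLA-B ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```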

The key result: an interferon-conditional amplifier

The research team built a dual-context virtual screen over >4,000 drugs to find compounds that boost antigen presentation (MHC-I program) only in immune-context-positive settings (i.e., primary patient samples with low interferon tone) while having negligible effect in immune-context-neutral cell-line data. The model predicted a striking context split for silmitasertib (a CK2 inhibitor): strong MHC-I upregulation with low-dose interferon, little to none without interferon. The research team reports in-lab validation in human neuroendocrine models unseen during training, with the combination (silmitasertib + low-dose interferon) producing a marked, synergistic increase in antigen presentation (≈50% in their assays).
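
The screening logic itself reduces to a two-context comparison per compound. The sketch below is a conceptual illustration, not the team’s released pipeline; the `predict_mhc1_shift` helper and the context labels are hypothetical stand-ins for prompting the model and scoring its predicted MHC-I response.

```python
def dual_context_screen(compounds, predict_mhc1_shift, min_split=0.5):
    """Keep compounds predicted to boost MHC-I only in the immune-context-positive setting."""
    hits = []
    for drug in compounds:
        boost_with_ifn = predict_mhc1_shift(drug, context="primary_sample_low_interferon")
        boost_neutral = predict_mhc1_shift(drug, context="cell_line_immune_neutral")
        split = boost_with_ifn - boost_neutral
        if boost_with_ifn > 0 and split >= min_split:  # effect present only in the immune-positive context
            hits.append((drug, boost_with_ifn, boost_neutral, split))
    # Rank candidates by the size of the context split, largest first
    return sorted(hits, key=lambda h: h[3], reverse=True)
```

Under this framing, silmitasertib is the kind of candidate that survives the filter: a large predicted effect in the interferon-primed context and essentially none in the neutral one.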

The amplifier lowers the response threshold to interferon rather than initiating antigen presentation de novo; flow-cytometry readouts show HLA-A,B,C upregulation only under combined treatment (including IFN-β and IFN-γ), across two neuroendocrine models, with representative MFI gains (e.g., 13.6% @10 nM and 34.9% @1000 nM silmitasertib in one model).

Key Takeaways

  • C2S-Scale 27B (Gemma-2) encodes scRNA-seq profiles as textual “cell sentences,” enabling LLM-native single-cell analysis workflows.
  • In a two-context virtual screen (>4,000 compounds), the model predicted an interferon-conditional amplifier: CK2 inhibition (silmitasertib) boosts MHC-I antigen presentation only with low-dose IFN.
  • Wet-lab tests in human neuroendocrine cell models confirmed the prediction, with an ~50% antigen-presentation increase for silmitasertib + IFN versus either alone; this remains preclinical/in vitro.
  • Open weights and usage docs are live on Hugging Face (vandijklab), with both 27B and 2B Gemma variants for research use.

Editorial Comments

C2S-Scale 27B is a technically credible step for LLMs in biology: translating scRNA-seq into “cell sentences” lets a Gemma-2 model run programmatic queries over cell states and perturbations, and in practice it surfaced an interferon-conditional amplifier, silmitasertib (CK2 inhibition), that increases MHC-I antigen presentation only with low-dose IFN, a mechanism the team then validated in vitro. The value here isn’t headline rhetoric but the workflow: text-native screening across >4k compounds under dual immune contexts to propose a context-dependent pathway that could convert immune-“cold” tumors toward visibility. That said, all evidence is preclinical and bench-scale; the right read is “hypothesis-generating AI” with open weights enabling replication and stress-testing, not a clinical claim.


Check out the Technical Paper, Model on Hugging Face, and GitHub Page for further technical details.
