Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture
A team of researchers from China has released AntAngelMed, a large open-source medical language model that the team describes as the largest and most capable of its kind currently available.
What Is AntAngelMed?
AntAngelMed is a medical-domain language model with 103 billion total parameters, but it does not activate all of those parameters during inference. Instead, it uses a Mixture-of-Experts (MoE) architecture with a 1/32 activation ratio, meaning only 6.1 billion parameters are active at any given time when processing a query.
It helps to understand how MoE architectures work. In a standard dense model, every parameter participates in processing every token. In an MoE model, the network is divided into many 'expert' sub-networks, and a routing mechanism selects only a small subset of them to handle each input. This lets a model have a very large total parameter count, which generally correlates with strong knowledge capacity, while keeping the actual compute cost of inference proportional to the much smaller active parameter count.
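To make that concrete, here is a minimal PyTorch sketch of top-k expert routing. It illustrates the general MoE pattern only, not AntAngelMed's actual implementation; the dimensions, expert count, and top-k value are made up for readability.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Minimal MoE layer: a router picks the top-k experts for each token.

    Purely illustrative; sizes and expert count do not match AntAngelMed.
    """
    def __init__(self, d_model=64, d_ff=128, n_experts=32, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # one score per expert
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only k experts per token
        weights = weights.softmax(dim=-1)               # normalize the selected scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TinyMoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Only `top_k` of the `n_experts` sub-networks run for any given token, so the forward-pass cost tracks the active parameter count even as the total parameter count grows.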
AntAngelMed inherits this design from Ling-flash-2.0, a base model developed by inclusionAI and guided by what the team calls the Ling Scaling Laws. The specific optimizations layered on top include refined expert granularity, a tuned shared-expert ratio, attention stability mechanisms, sigmoid routing without an auxiliary loss, an MTP (Multi-Token Prediction) layer, QK-Norm, and Partial-RoPE (Rotary Position Embedding applied to a subset of attention heads rather than all of them). According to the research team, these design choices collectively let small-activation MoE models deliver up to 7× the efficiency of dense architectures, which means that with only 6.1B activated parameters, AntAngelMed can roughly match the performance of a 40B dense model. Separately, as output length grows during inference, the relative speed advantage can also reach 7× or more over dense models of comparable size.
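Of those optimizations, sigmoid routing is the easiest to contrast with the softmax gating used in many earlier MoE designs: each expert receives an independent gate score instead of competing through a softmax, which is one ingredient in dispensing with a separate auxiliary load-balancing loss. The sketch below is a generic illustration of that idea under our own simplifying assumptions, not the actual Ling-flash-2.0 router.

```python
import torch

def sigmoid_route(scores: torch.Tensor, top_k: int = 2):
    """Generic sigmoid-gating sketch (an illustration under assumptions,
    not the actual Ling/AntAngelMed router).

    scores: (tokens, n_experts) raw router logits.
    Returns per-token mixture weights and indices for the top-k experts.
    """
    gates = torch.sigmoid(scores)             # independent gate per expert,
                                              # no softmax competition across experts
    weights, idx = gates.topk(top_k, dim=-1)  # still dispatch to the k best
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize mixture
    return weights, idx

w, i = sigmoid_route(torch.randn(4, 32))
print(w.shape, i.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```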

Training Pipeline
AntAngelMed uses a three-stage training process designed to layer deep medical domain adaptation on top of general language understanding.
The first stage is continual pre-training on large-scale medical corpora, including encyclopedias, web text, and academic publications. This phase builds on the Ling-flash-2.0 checkpoint, giving the model a solid general reasoning foundation before medical specialization begins.
The second stage is Supervised Fine-Tuning (SFT), where the model is trained on a multi-source instruction dataset. This dataset mixes general reasoning tasks (math, programming, logic) to preserve chain-of-thought capabilities with medical scenarios such as doctor-patient Q&A, diagnostic reasoning, and safety-and-ethics cases.
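As a toy illustration of what such a multi-source mixture might look like in practice, the snippet below samples SFT examples from weighted sources. The source names and sampling weights are invented for the example, not AntAngelMed's actual data recipe.

```python
import random

# Hypothetical multi-source SFT sampler. Source names and weights are
# illustrative assumptions, not the authors' published recipe.
sources = {
    "general_math":      ["Solve: if 3x + 2 = 11, what is x? ..."],
    "general_code":      ["Write a function that reverses a linked list. ..."],
    "doctor_patient_qa": ["Patient: I've had a dry cough for two weeks. ..."],
    "diagnostic_cases":  ["A 54-year-old presents with chest pain and ..."],
    "safety_ethics":     ["Should I double my prescribed dose if ..."],
}
weights = {"general_math": 0.15, "general_code": 0.15, "doctor_patient_qa": 0.35,
           "diagnostic_cases": 0.25, "safety_ethics": 0.10}

def sample_sft_batch(n: int) -> list[str]:
    names = list(sources)
    picked = random.choices(names, weights=[weights[s] for s in names], k=n)
    return [random.choice(sources[s]) for s in picked]

print(sample_sft_batch(4))
```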
The third stage is Reinforcement Learning using the GRPO (Group Relative Policy Optimization) algorithm, combined with task-specific reward models. GRPO, originally introduced in the DeepSeekMath paper, is a PPO variant that estimates baselines from group scores rather than from a separate critic model, making it computationally lighter. Here, the reward signals are designed to shape model behavior toward empathy, structured clinical responses, safety boundaries, and evidence-based reasoning, all with the goal of reducing hallucinations on medical questions.
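The group-relative part of GRPO is simple enough to show directly. For each prompt, the policy samples a group of responses, a reward model scores them, and each response's advantage is its reward standardized against the group's own mean and standard deviation, replacing the learned critic of standard PPO:

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantage estimate used by GRPO (per DeepSeekMath).

    group_rewards: (G,) scalar rewards for G sampled responses to ONE prompt.
    Each response is scored against its own group's statistics, so no
    separate critic model is needed to supply a baseline.
    """
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# Four sampled answers to one medical question, scored by a reward model:
rewards = torch.tensor([0.2, 0.9, 0.5, 0.4])
print(grpo_advantages(rewards))
# Better-than-average answers get positive advantage and are reinforced.
```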
Inference Performance
On H20 hardware, AntAngelMed exceeds 200 tokens per second, which the research team reports is roughly 3× faster than a 36-billion-parameter dense model. With YaRN (Yet Another RoPE extensioN) extrapolation, it supports a 128K context length, long enough to handle full clinical documents, extended patient histories, or multi-turn medical dialogues.
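For readers who want to try the long context, YaRN-style RoPE scaling is typically exposed through the model config in Hugging Face transformers. The snippet below is a hypothetical sketch: the model id and the exact `rope_scaling` keys and values are assumptions, so check the AntAngelMed model card for what the release actually ships with.

```python
# Hypothetical sketch of loading the model with YaRN rope scaling overridden
# at load time. The repo id, scaling factor, and key names are assumptions;
# the released config may already enable 128K out of the box.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/AntAngelMed"  # placeholder repo id, not verified

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={                        # forwarded into the model config
        "rope_type": "yarn",              # YaRN frequency interpolation
        "factor": 4.0,                    # e.g. 32K native -> 128K extended
        "original_max_position_embeddings": 32768,  # assumed native length
    },
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```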
The research team has also released an FP8-quantized version of the model. When this quantization is combined with the EAGLE3 speculative-decoding optimization, inference throughput at a concurrency of 32 improves substantially over FP8 alone: by 71% on HumanEval, by 45% on GSM8K, and by 94% on Math-500. These benchmarks measure coding and math reasoning tasks, not medical tasks directly, but they serve as proxies for the model's throughput stability across output types.
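The speculative-decoding side of that speedup is easiest to see schematically: a cheap drafter proposes several tokens, and the large model verifies the whole run in a single forward pass, keeping the longest agreeing prefix. The greedy sketch below illustrates only the verification step; EAGLE3 itself drafts from the target model's hidden features and uses a probabilistic acceptance rule, so treat this as a simplified stand-in.

```python
import torch

def verify_draft(target_logits_fn, prefix: torch.Tensor, draft: torch.Tensor):
    """Greedy speculative-decoding verification (simplified; NOT EAGLE3 itself).

    target_logits_fn: callable mapping a token sequence to (seq_len, vocab) logits.
    prefix: (P,) already-accepted tokens; draft: (D,) tokens from a drafter.
    Returns the accepted draft tokens plus the target's correction token, if any.
    """
    seq = torch.cat([prefix, draft])
    logits = target_logits_fn(seq)                  # ONE big-model forward pass
    preds = logits[len(prefix) - 1 : len(seq) - 1].argmax(-1)  # target's own picks
    accepted = 0
    for target_tok, draft_tok in zip(preds, draft):
        if target_tok != draft_tok:                 # first disagreement: stop
            break
        accepted += 1
    correction = preds[accepted] if accepted < len(draft) else None
    return draft[:accepted], correction

# Toy check with a fake target model that always predicts token 7:
fake_target = lambda seq: torch.nn.functional.one_hot(
    torch.full((len(seq),), 7), num_classes=16).float()
kept, fix = verify_draft(fake_target, torch.tensor([1, 2]), torch.tensor([7, 7, 3]))
print(kept.tolist(), fix)  # [7, 7] tensor(7)
```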
Benchmark Results
On HealthBench, OpenAI's open-source medical evaluation benchmark, which uses simulated multi-turn clinical dialogues to measure real-world medical performance, AntAngelMed ranks first among all open-source models and surpasses a range of top proprietary models as well, with a particularly large advantage on the HealthBench-Hard subset.
On MedAIBench, an evaluation system maintained by China's National Artificial Intelligence Medical Industry Pilot Facility, AntAngelMed ranks in the top tier, with particularly strong scores in the medical knowledge Q&A and medical ethics and safety categories.
On MedBench, a benchmark for Chinese healthcare LLMs covering 36 independently curated datasets and roughly 700,000 samples across five dimensions (medical knowledge question answering, medical language understanding, medical language generation, complex medical reasoning, and safety and ethics), AntAngelMed ranks first overall.
Key Takeaways
- AntAngelMed is a 103B-parameter open-source medical LLM that activates only 6.1B parameters at inference time, using a 1/32 activation-ratio MoE architecture inherited from Ling-flash-2.0.
- It uses a three-stage training pipeline: continual pre-training on medical corpora, SFT with mixed general and medical instruction data, and GRPO-based reinforcement learning for safety and diagnostic reasoning.
- On H20 hardware, the model exceeds 200 tokens/s and supports a 128K context length via YaRN extrapolation, roughly 3× faster than a comparable 36B dense model.
- AntAngelMed ranks first among open-source models on OpenAI's HealthBench, surpasses a number of proprietary models, and tops both the MedAIBench and MedBench leaderboards.
- The model is available on Hugging Face, ModelScope, and GitHub; the model weights are released under Apache 2.0, the code under MIT, and an FP8-quantized version has also been released.
Check out the model weights on Hugging Face, the GitHub repo, and the technical details.
