IBM Releases New Granite 4.0 Models with a Novel Hybrid Mamba-2/Transformer Architecture: Drastically Reducing Memory Use without Sacrificing Performance

IBM has launched Granite 4.0, an open-source LLM family that swaps the monolithic Transformer for a hybrid Mamba-2/Transformer stack to cut serving memory while maintaining quality. Sizes span a 3B dense “Micro,” a 3B hybrid “H-Micro,” a 7B hybrid MoE “H-Tiny” (~1B active), and a 32B hybrid MoE “H-Small” (~9B active). The models are Apache-2.0 licensed, cryptographically signed, and, per IBM, the first open models covered by an accredited ISO/IEC 42001:2023 AI management system certification. They are available on watsonx.ai and through Docker Hub, Hugging Face, LM Studio, NVIDIA NIM, Ollama, Replicate, Dell Pro AI Studio/Enterprise Hub, Kaggle, with Azure AI Foundry…
So, what’s new?
Granite 4.0 introduces a hybrid design that interleaves a small fraction of self-attention blocks with a majority of Mamba-2 state-space layers (roughly a 9:1 ratio). According to IBM’s technical blog, relative to conventional Transformer LLMs, Granite 4.0-H can reduce RAM by more than 70% for long-context and multi-session inference, which translates into lower GPU cost at a given throughput/latency target. IBM’s internal comparisons also show the smallest Granite 4.0 models outperforming Granite 3.3-8B despite using fewer parameters.
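To make the 9:1 interleaving concrete, here is a minimal, purely illustrative PyTorch sketch of such a layer stack. The block internals are stand-ins (a plain linear mixer instead of a real Mamba-2 scan), and the layer count and dimensions are arbitrary assumptions, not IBM’s actual architecture:

```python
import torch
import torch.nn as nn

class Mamba2BlockStub(nn.Module):
    """Stand-in for a Mamba-2 state-space block (constant memory w.r.t. sequence length)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Linear(d_model, d_model)  # placeholder for the real SSM scan

    def forward(self, x):
        return x + self.mixer(self.norm(x))

class AttentionBlock(nn.Module):
    """Standard self-attention block (its KV cache grows with context length)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class HybridStack(nn.Module):
    """Interleaves SSM and attention blocks at a 9:1 ratio, as described for Granite 4.0-H."""
    def __init__(self, d_model: int = 512, n_layers: int = 40, ratio: int = 9):
        super().__init__()
        layers = []
        for i in range(n_layers):
            # every (ratio + 1)-th layer is attention; the rest are Mamba-2-style blocks
            if (i + 1) % (ratio + 1) == 0:
                layers.append(AttentionBlock(d_model))
            else:
                layers.append(Mamba2BlockStub(d_model))
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(1, 1024, 512)   # (batch, sequence, hidden)
print(HybridStack()(x).shape)   # torch.Size([1, 1024, 512])
```

Because most layers carry no per-token KV cache, the memory footprint of long or concurrent sessions is dominated by the few attention layers, which is the intuition behind IBM’s reported savings.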
What are the launched variants?
IBM is shipping both Base and Instruct variants across four initial models:
- Granite-4.0-H-Small: 32B total, ~9B active (hybrid MoE).
- Granite-4.0-H-Tiny: 7B total, ~1B active (hybrid MoE).
- Granite-4.0-H-Micro: 3B (hybrid dense).
- Granite-4.0-Micro: 3B (dense Transformer for stacks that don’t yet support hybrids).
All are Apache-2.0 and cryptographically signed; IBM states Granite is the first open model family with accredited ISO/IEC 42001 coverage for its AI management system (AIMS). Reasoning-optimized (“Thinking”) variants are planned for later in 2025.
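For a quick local test of an Instruct checkpoint, a minimal Hugging Face transformers sketch might look like the following. The repository id is assumed from IBM’s Hugging Face organization and should be verified against the published model cards; the hybrid (H) variants may also require a recent transformers release:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; the dense 3B Micro is used since it needs no hybrid-architecture support.
model_id = "ibm-granite/granite-4.0-micro"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # public checkpoints are published in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Granite 4.0 lineup in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```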
How was it trained, and what are the context length and dtype?
Granite 4.0 was trained on samples up to 512K tokens and evaluated at context lengths up to 128K tokens. Public checkpoints on Hugging Face are BF16 (quantized and GGUF conversions are also published), while FP8 is an execution option on supported hardware, not the format of the released weights.
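For the published GGUF conversions, a local run with llama-cpp-python could look roughly like this. The file name and context size are illustrative assumptions, and since llama.cpp enablement for the hybrid variants was still in progress at launch, the dense Micro model is the safer starting point:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="granite-4.0-micro-Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=32768,                                  # well below the 128K evaluated context
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a state-space model, in one sentence?"}],
    max_tokens=64,
)
print(result["choices"][0]["message"]["content"])
```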
Let’s understand its performance signals (enterprise-relevant)
IBM highlights instruction-following and tool-use benchmarks:
- IFEval (HELM): Granite-4.0-H-Small leads most open-weights models, trailing only Llama 4 Maverick, which operates at a far larger scale.
- BFCLv3 (Function Calling): H-Small is competitive with larger open and closed models at lower price points.
- MTRAG (multi-turn RAG): Improved reliability on complex retrieval workflows.

How can I get access?
Granite 4.0 is live on IBM watsonx.ai and distributed via Dell Pro AI Studio/Enterprise Hub, Docker Hub, Hugging Face, Kaggle, LM Studio, NVIDIA NIM, Ollama, OPAQUE, and Replicate. IBM notes ongoing enablement of vLLM, llama.cpp, NexaML, and MLX for hybrid serving.
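As a served-deployment sketch, offline batch generation with vLLM might look like the following. The repository id is an assumption, and the dense Micro variant is used here because hybrid enablement in vLLM was still ongoing at launch:

```python
from vllm import LLM, SamplingParams

# Assumed repo id; swap in a hybrid (H) variant once vLLM hybrid support is confirmed.
llm = LLM(model="ibm-granite/granite-4.0-micro")
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["List two use cases for small open-weight LLMs."], params)
print(outputs[0].outputs[0].text)
```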
My thoughts/comments
I see Granite 4.0’s hybrid Mamba-2/Transformer stack and active-parameter MoE as a practical path to lower TCO: the >70% memory reduction and long-context throughput gains translate directly into smaller GPU fleets without sacrificing instruction-following or tool-use accuracy (IFEval, BFCLv3, MTRAG). The BF16 checkpoints with GGUF conversions simplify local evaluation pipelines, and ISO/IEC 42001 plus signed artifacts address provenance/compliance gaps that often stall enterprise deployment. Net result: a lean, auditable base model family (1B–9B active) that is easier to productionize than prior 8B-class Transformers.
Check out the Hugging Face Model Card and Technical details.