IBM Releases Granite 4.1, Open-Source AI Models Trained on 15T Tokens
ARMONK, N.Y. — IBM on Tuesday released Granite 4.1, a family of three open-source large language models that the company says can match or exceed the performance of its previous-generation models while using significantly simpler and smaller architectures.
The release, detailed in a technical blog post on Hugging Face, includes models at 3 billion, 8 billion and 30 billion parameters — all dense decoder-only transformers licensed under Apache 2.0. The models were trained on 15 trillion tokens across a multi-phase pipeline using NVIDIA GB200 NVL72 GPU clusters, according to IBM’s technical disclosure.
The 8 billion-parameter Granite 4.1 model matches or outperforms the company’s previous Granite 4.0-H-Small, a 32 billion-parameter mixture-of-experts model, across multiple benchmarks including instruction following, math reasoning and tool calling, IBM said in the Hugging Face post.
Architecture and Training
The Granite 4.1 models use grouped query attention, rotary position embeddings and SwiGLU activations — standard components in modern transformer architectures, according to the technical breakdown. IBM moved away from the mixture-of-experts approach used in prior Granite releases in favor of dense models, which the company said simplifies deployment.
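For readers unfamiliar with these components, the following plain-Python sketch shows what each one does at toy scale. It is not IBM's implementation. The vector sizes, the RoPE base of 10,000 and the 32-query/8-key-value head split are illustrative assumptions, not Granite 4.1's published configuration.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def silu(x: float) -> float:
    # SiLU ("swish") activation: x * sigmoid(x)
    return x * sigmoid(x)

def matvec(w, x):
    # Multiply matrix w (a list of rows) by vector x
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: W_down(silu(W_gate x) * (W_up x)), elementwise product
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    return matvec(w_down, [silu(g) * u for g, u in zip(gate, up)])

def rope(vec, pos, base=10000.0):
    # Rotary position embedding: rotate each pair (x[2i], x[2i+1])
    # by angle pos * base^(-2i/d); the vector length d must be even
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c]
    return out

def kv_head_for(q_head, n_q_heads, n_kv_heads):
    # Grouped query attention: each block of n_q_heads // n_kv_heads
    # consecutive query heads reads the same key/value head
    return q_head // (n_q_heads // n_kv_heads)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical head counts, for illustration only
N_Q, N_KV = 32, 8
print([kv_head_for(h, N_Q, N_KV) for h in range(8)])  # → [0, 0, 0, 0, 1, 1, 1, 1]
print(f"KV cache is {N_Q // N_KV}x smaller than with one KV head per query head")
```

RoPE's useful property is that the score between a rotated query and key depends only on their distance apart: `dot(rope(q, 5), rope(k, 3))` equals `dot(rope(q, 2), rope(k, 0))` up to floating-point error. Grouped query attention, in turn, shrinks the key-value cache that long contexts require.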
Pre-training proceeded in five phases, starting with 10 trillion tokens of general web, code and technical data and then shifting progressively toward higher-quality, more specialized content. The final phases included chain-of-thought reasoning data and a long-context extension stage that pushed the models’ context windows to 128,000 tokens for the 3B model and 512,000 tokens for the 8B and 30B variants, according to the post.
Post-training included supervised fine-tuning on approximately 4.1 million curated samples and a four-stage reinforcement learning pipeline using on-policy Group Relative Policy Optimization (GRPO) with DAPO loss, as described in the technical documentation. The RL stages targeted, in order, math, general chat quality, identity calibration and recovery of math performance.
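The core idea of GRPO is to sample several responses to the same prompt and judge each one relative to its own group, rather than against a separately trained value model. What follows is a minimal plain-Python sketch, without gradients, of the group-relative advantage and a DAPO-style clipped loss averaged over tokens. The clipping bounds (0.2 and 0.28) and the small epsilon are illustrative assumptions, not IBM's reported hyperparameters, and DAPO's other components, such as dynamic sampling, are omitted.

```python
import math
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    # GRPO: normalize each response's reward by its group's mean and
    # (population) standard deviation; no learned critic is needed
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def dapo_style_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    # logp_new / logp_old: one list of per-token log-probs per response
    # advantages: one scalar per response, shared by all of its tokens
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            # Decoupled ("clip-higher") bounds: [1 - eps_low, 1 + eps_high]
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Token-level aggregation: divide by every token in the group
    return -total / n_tokens

# Four sampled answers to one prompt, rewarded 1 (correct) or 0 (wrong)
print([round(a, 3) for a in group_relative_advantages([1.0, 0.0, 1.0, 0.0])])
# → [1.0, -1.0, 1.0, -1.0]

# Strictly on-policy, so old and new log-probs match and the ratio is 1
seqs = [[-0.1, -0.2, -0.3], [-0.5]]
print(dapo_style_loss(seqs, seqs, [1.0, -1.0]))  # → -0.5
```

In the last line, the three-token response with a positive advantage outweighs the one-token response with a negative advantage, giving -0.5; averaging within each response first would give 0.0. In real training, the same arithmetic would run on tensors in a framework with automatic differentiation, so that gradients flow through `logp_new`.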
Performance
On standard benchmarks, the instruction-tuned models scored 80.16 on MMLU for the 30B model, 92.49 on GSM8K math reasoning for the 8B model, and 89.63 on HumanEval code generation for the 30B model, according to IBM’s reported results. Tool-calling performance on the Berkeley Function Calling Leaderboard v3 ranged from 60.80 for the 3B model to 73.68 for the 30B model.
The models support 12 languages, including English, German, Spanish, French, Japanese, Portuguese, Arabic and Chinese, according to the documentation.
Enterprise Implications
IBM’s Granite models serve as the foundation for the company’s watsonx enterprise AI platform. The Apache 2.0 license permits commercial use and modification with few conditions beyond preserving copyright and license notices, positioning Granite 4.1 against other open-weight models from Meta, Mistral and others.
FP8 quantized versions are also available, reducing disk footprint and GPU memory requirements by approximately 50 percent while preserving model quality, according to IBM. The models are optimized for the vLLM inference framework.
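The roughly 50 percent figure follows from storage arithmetic: FP8 stores each weight in one byte, versus two bytes for 16-bit formats such as BF16. Here is a back-of-the-envelope sketch that assumes a 16-bit baseline and uses the nominal parameter counts from the announcement (actual counts differ slightly). It counts weights only; the key-value cache, activations and quantization scale factors add to real-world memory use.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Nominal sizes from the announcement; exact parameter counts will differ
MODELS = {"3B": 3e9, "8B": 8e9, "30B": 30e9}

def weight_gb(n_params: float, fmt: str) -> float:
    # Raw weight storage in gigabytes (10^9 bytes), ignoring scale factors
    return n_params * BYTES_PER_PARAM[fmt] / 1e9

for name, n in MODELS.items():
    bf16, fp8 = weight_gb(n, "bf16"), weight_gb(n, "fp8")
    print(f"{name}: {bf16:.0f} GB in BF16 -> {fp8:.0f} GB in FP8 "
          f"({1 - fp8 / bf16:.0%} smaller)")
```

For the 30B model this gives about 60 GB of weights in BF16 versus about 30 GB in FP8.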
The release adds to a growing field of open-source models. Meta’s Llama family, Mistral’s models and Alibaba’s Qwen series have all seen recent updates, with Granite 4.1 among the options for organizations seeking deployable open models with permissive licensing.
All three Granite 4.1 models are available immediately on the Hugging Face Model Hub, with additional documentation on IBM’s Granite documentation site and source code on GitHub.