Ever since the groundbreaking research paper “Attention is All You Need” debuted in 2017, the concept of transformers has dominated the generative AI landscape.
Transformers, however, are not the only path forward for generative AI. A new approach from AI21 Labs dubbed “Jamba” looks to go beyond transformers. Jamba combines the Mamba model, which is based on the structured state space model (SSM), with a transformer architecture to create an optimized gen AI model. The name Jamba stands for Joint Attention and Mamba architecture, and it aims to bring the best attributes of SSMs and transformers together. Jamba is being released as an open-source model under the Apache 2.0 license.
To be clear, Jamba is unlikely to replace current transformer-based large language models (LLMs) today, but it will likely supplement them in certain areas. According to AI21 Labs, Jamba can outperform traditional transformer-based models on generative reasoning tasks as measured by benchmarks such as HellaSwag. However, it does not currently outperform transformer-based models on other critical benchmarks, such as Massive Multitask Language Understanding (MMLU) for problem-solving.
Jamba isn’t just a new Jurassic take from AI21 Labs
AI21 Labs has a particular focus on gen AI for enterprise use cases. The company raised $155 million in Aug. 2023 to support its growing efforts.
The company’s enterprise tools include Wordtune, an optimized service to help enterprises generate content that matches an organization’s tone and brand. AI21 Labs told VentureBeat in 2023 that it often competes, and directly wins, against gen AI giant OpenAI for enterprise business.
To date, AI21 Labs’ LLM technology has relied on the transformer architecture, just like every other LLM. Just over a year ago, the company introduced its Jurassic-2 LLM family, which is part of the AI21 Studio natural language processing (NLP)-as-a-service platform and is also available via APIs for enterprise integrations.
Jamba is not an evolution of Jurassic; as a hybrid SSM and transformer model, it is something quite different.
Attention isn’t all you need, you also need context
Transformers have dominated the gen AI landscape to date, but they still have some shortcomings. Most notable is the fact that inference generally slows as context windows grow.
As the AI21 Labs researchers note, a transformer’s attention mechanism scales with sequence length and slows down throughput, as each token depends on the entire sequence that came before it. This places long context use cases outside the scope of efficient production.
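As a rough, back-of-the-envelope illustration (not AI21 Labs’ code), the Python sketch below counts the query-key comparisons a vanilla decoder performs while generating a sequence; because each new token attends to everything before it, the total work grows quadratically with context length.

```python
# Rough illustration of why vanilla self-attention slows down as context grows:
# generating token t requires comparing it against all t previous tokens,
# so total score computations over a sequence grow quadratically.

def attention_score_ops(seq_len: int) -> int:
    """Query-key comparisons needed to generate seq_len tokens autoregressively."""
    return sum(t for t in range(1, seq_len + 1))  # 1 + 2 + ... + seq_len

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_score_ops(n):>15,} score computations")
# Each 10x increase in context length costs roughly 100x more attention work.
```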
The other issue highlighted by AI21 Labs is the large memory footprint requirement for scaling transformers. The transformer memory footprint scales with context length, making it challenging to run long context windows or numerous parallel batches without extensive hardware resources.
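The memory side can be made concrete with a similar back-of-the-envelope estimate of the key/value cache a transformer keeps during generation; the layer count, head configuration and precision below are illustrative assumptions, not the specs of Jamba or any particular model.

```python
# Back-of-the-envelope KV-cache size for a transformer decoder.
# The cache stores a key and a value vector per token, per layer, per head,
# so its size grows linearly with context length (and with batch size).

def kv_cache_bytes(context_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_param=2, batch_size=1):
    # The leading 2 accounts for storing both keys and values.
    return 2 * context_len * n_layers * n_kv_heads * head_dim * bytes_per_param * batch_size

for ctx in (4_096, 32_768, 262_144):  # 262,144 tokens = a 256K context window
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens -> ~{gib:.1f} GiB of KV cache (illustrative config)")
```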
Context length and memory consumption are two concerns that the SSM approach looks to solve.
The Mamba SSM architecture was originally proposed by researchers at Carnegie Mellon and Princeton Universities; it requires less memory and uses a mechanism other than attention to handle large context windows. However, the Mamba approach struggles to match the output quality of a transformer model. The Jamba hybrid SSM-transformer approach is an attempt to combine the resource and context optimization of the SSM architecture with the strong output capabilities of a transformer.
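The appeal of the SSM side is easiest to see in its recurrent form: the model carries a fixed-size state from token to token, so per-token compute and memory stay flat as the context grows. The toy linear SSM below sketches that general idea only; it is not Mamba’s selective mechanism or Jamba’s implementation.

```python
import numpy as np

# Toy linear state space model in its recurrent form:
#   h_t = A @ h_{t-1} + B @ x_t
#   y_t = C @ h_t
# The state h has a fixed size, so processing each new token costs the same
# compute and memory no matter how long the sequence already is.

rng = np.random.default_rng(0)
d_state, d_in, d_out = 16, 4, 4
A = rng.normal(scale=0.1, size=(d_state, d_state))
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))

def ssm_step(h, x):
    h = A @ h + B @ x   # update the fixed-size hidden state
    return h, C @ h     # emit an output from the state

h = np.zeros(d_state)
for x in rng.normal(size=(100_000, d_in)):  # 100K "tokens"
    h, y = ssm_step(h, x)
print("state size stays constant:", h.shape)  # (16,) regardless of sequence length
```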
AI21 Labs’ Jamba model offers a 256K context window and can deliver 3x throughput on long contexts compared to Mixtral 8x7B. AI21 Labs also claims that Jamba is the only model in its size class that fits up to 140K context on a single GPU.
Of note, just like Mixtral, Jamba uses a mixture-of-experts (MoE) approach. However, Jamba applies MoE within its hybrid SSM-transformer architecture, which allows for a further level of optimization. Specifically, Jamba’s MoE layers allow it to draw on just 12B of its available 52B parameters at inference, making those 12B active parameters more efficient than a transformer-only model of equivalent size, according to AI21 Labs.
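To see why an MoE model touches only a fraction of its weights per token, consider the minimal top-2 routed feed-forward layer below; the expert count, layer sizes and top-2 routing are generic illustrative choices, not Jamba’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal top-k mixture-of-experts feed-forward layer (illustrative only)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # route each token to a few experts
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e           # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64]); only 2 of 16 experts run per token
```

Each token is processed by only its two highest-scoring experts, so most expert parameters sit idle for any given token even though they all count toward the model’s total parameter count.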
It’s still early days for Jamba, and it’s not yet part of an enterprise offering from AI21 Labs. The company plans to offer an instruct version on the AI21 Platform as a beta soon.
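In the meantime, because the base weights are open, developers can experiment with them directly. The sketch below assumes the checkpoint is published on Hugging Face under an identifier such as ai21labs/Jamba-v0.1 and that sufficient GPU memory is available; it is a minimal example, not official AI21 Labs usage guidance.

```python
# Hedged sketch: loading the open Jamba weights with Hugging Face transformers.
# The repository id and hardware assumptions are illustrative; check AI21 Labs'
# model card for the supported transformers version and recommended settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # assumed identifier for the open checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # load in the checkpoint's native precision
    device_map="auto",         # spread layers across available GPUs
    trust_remote_code=True,    # may be needed if the architecture predates native support
)

inputs = tokenizer("A hybrid SSM-transformer architecture is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```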