Meet “Jais”, The World’s Most Advanced Arabic Large Language Model

  • Publish date: Wednesday, 30 August 2023

Open Sourced by G42’s Inception


Developed in partnership with MBZUAI, Jais was trained on the Condor Galaxy 1 AI supercomputer on 116 billion Arabic tokens and 279 billion English tokens of data.

Abu Dhabi, August 30 — Inception, the pioneering G42 company dedicated to pushing the boundaries of AI, announced the open-source release of Jais, the world’s highest quality Arabic Large Language Model. Jais is a 13-billion parameter model trained on a newly developed 395-billion-token Arabic and English dataset.

With a name inspired by the UAE’s highest peak, Jais will bring the advantages of generative AI across the Arabic-speaking world. The model is the result of a collaboration between Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) -- the world’s first graduate research university dedicated to AI -- and Cerebras Systems. It was trained on Condor Galaxy, the recently announced multi-exaFLOP AI supercomputer built by G42 and Cerebras.

Jais’ release marks a significant milestone in the realm of AI for the Arabic world. It is a model homegrown in the UAE’s capital, Abu Dhabi, offering more than 400 million Arabic speakers the opportunity to harness the potential of generative AI. It will facilitate and expedite innovation, highlighting Abu Dhabi’s leading position as a hub for AI, innovation, culture preservation, and international collaboration.

By open-sourcing Jais, Inception aims to engage the scientific, academic, and developer communities to accelerate the growth of a vibrant Arabic language AI ecosystem. This can serve as a model for other languages currently underrepresented in mainstream AI.

“We believe that innovation thrives when we collaborate,” says Andrew Jackson, CEO of Inception. “With this release, we are setting a new standard for AI advancement in the Middle East and ensuring that the Arabic language, with its depth and heritage, finds its voice within the AI landscape. Jais is a testament to our commitment to excellence and our dedication to democratizing AI and promoting innovation.”

Jais outperforms existing Arabic models by a sizable margin. It is also competitive with English models of similar size despite being trained on significantly less English data. This exciting result shows that the model’s English component learned from the Arabic data and vice versa, opening a new era in LLM development and training.

MBZUAI President and University Professor Eric Xing said, “Developing such a high-caliber Arabic LLM demanded cutting-edge AI research in addition to an in-depth and nuanced understanding of the Arabic language, its diversity and heritage, and the growing importance of LLMs across all echelons of society. Thanks to our research and partnerships with Inception and other top regional and global organizations, MBZUAI will continue pioneering LLMs that are efficient, effective, and accurate.”

In tandem with the model release, Inception and MBZUAI also established an academic partnership that gives partners early-release access to current and future Arabic LLMs developed by the team for testing purposes. Academic partners for the launch included Carnegie Mellon University, Ecole Polytechnique, Hamad bin Khalifa University, Sorbonne Paris Nord - LIPN, NYU Abu Dhabi’s CAMeL Lab, and The University of Edinburgh. Several organizations, including the UAE Ministry of Foreign Affairs, the UAE Ministry of Industry and Advanced Technology, The Department of Health – Abu Dhabi, Abu Dhabi National Oil Company (ADNOC), Etihad Airways, First Abu Dhabi Bank (FAB), and e&, will start using Jais, offering valuable insights to further enhance the model.

Jais’ development and training

Jais is a transformer-based large language model that incorporates many cutting-edge features, including ALiBi position embeddings, which enable the model to extrapolate to much longer inputs, providing better context handling and accuracy. Other state-of-the-art techniques include SwiGLU and maximal update parameterization to improve the model’s training efficiency and accuracy.
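
To make two of the named techniques concrete, the following is a minimal Python sketch, not Jais’ actual implementation. It assumes the slope schedule described in the original ALiBi paper for a power-of-two number of attention heads, and the common SwiGLU gate form, Swish(a) * b. The function names and the 8-head example are illustrative assumptions and do not come from the release.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8h/n) for h = 1..n (power-of-two n only)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores: a linear penalty on query-key distance.

    Future positions (key index > query index) get -inf, i.e. a causal mask.
    """
    return [
        [-slope * (q - k) if k <= q else -math.inf for k in range(seq_len)]
        for q in range(seq_len)
    ]


def swiglu_gate(a, b):
    """Element-wise SwiGLU gate: Swish(a) * b, with Swish(x) = x * sigmoid(x).

    Not numerically hardened: very large negative inputs overflow exp().
    """
    return [x / (1.0 + math.exp(-x)) * y for x, y in zip(a, b)]


if __name__ == "__main__":
    slopes = alibi_slopes(8)  # hypothetical head count, chosen for illustration
    print(slopes[0], slopes[-1])
    print(alibi_bias(3, slopes[0]))
    print(swiglu_gate([0.0, 2.0], [3.0, 1.0]))
```

Because the bias depends only on the distance between query and key, it can be computed for any sequence length at inference time, which is the property behind the extrapolation to longer inputs described above.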

Jais’ training, fine-tuning, and evaluation were undertaken by an Inception/MBZUAI joint team on the Condor Galaxy 1 (CG-1), the recently announced, state-of-the-art AI supercomputer co-developed by G42 and Cerebras Systems. The 13-billion parameter open-source model was trained on a unique and purpose-built dataset of 116 billion Arabic tokens designed to capture the complexity, nuance, and richness of Arabic. The dataset also included 279 billion English tokens aimed at increasing the model’s performance through cross-language transfer. Inception and MBZUAI will continue to expand and refine Jais as its user community grows.
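
The dataset figures quoted in this release can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (116 billion Arabic tokens, 279 billion English tokens, 13 billion parameters). The function name and the tokens-per-parameter ratio are illustrative additions, not metrics reported by Inception.

```python
ARABIC_TOKENS = 116_000_000_000   # from the release
ENGLISH_TOKENS = 279_000_000_000  # from the release
PARAMETERS = 13_000_000_000       # from the release


def dataset_summary(arabic, english, params):
    """Return total tokens, per-language shares, and tokens per parameter."""
    total = arabic + english
    return {
        "total_tokens": total,
        "arabic_share": arabic / total,
        "english_share": english / total,
        "tokens_per_param": total / params,
    }


if __name__ == "__main__":
    s = dataset_summary(ARABIC_TOKENS, ENGLISH_TOKENS, PARAMETERS)
    print(f"total tokens:      {s['total_tokens']:,}")
    print(f"Arabic share:      {s['arabic_share']:.1%}")
    print(f"English share:     {s['english_share']:.1%}")
    print(f"tokens per param:  {s['tokens_per_param']:.1f}")
```

The total comes to 395 billion tokens, matching the dataset size given earlier in the release, with Arabic making up a little under 30 percent of the mix.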

“Our strategic partnership with G42 is already delivering pioneering results. A few weeks ago, we introduced the first multi-exaFLOP AI supercomputer, Condor Galaxy 1 (CG-1). Now, the partnership delivers another key breakthrough: the leading Arabic LLM for the open-source community,” said Andrew Feldman, co-founder and CEO of Cerebras Systems. “At Cerebras our passion is building groundbreaking technology. One of the great rewards is seeing the innovative ways it is used. Jais is a significant contribution to the international open-source community. It is also a testament to how incredibly easy CG-1 is to use and how it enables extremely rapid AI model development.”

Today, Inception sits at the intersection of the academic, business, and regulatory realms to unlock synergies, foster collaboration, and accelerate the commercialization of AI across industries.

Jais is available for download on Hugging Face. Users can also try Jais online by registering their interest on Jais’ website and receiving an invite to access the playground environment. To learn more about Jais and how it benchmarks against other models, you can read the Jais white paper.

About Inception

Inception, a G42 company focused on applied AI research and advancements, has been at the forefront of driving innovation in AI since 2017. As a disruptor in the field, Inception operates at the intersection of academic, research, business, and regulatory domains to unlock synergies, foster collaboration, and accelerate the commercialization of responsible AI across industries. To learn more about Inception, visit: www.inceptioniai.org

About Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
MBZUAI is a graduate research university focused on artificial intelligence, computer science, and digital technologies across industrial sectors. The university aims to empower students, businesses, and governments to advance artificial intelligence as a global force for positive progress. MBZUAI offers various graduate programs designed to pursue advanced, specialized knowledge and skills in artificial intelligence, including computer vision, machine learning, natural language processing, robotics, and computer science. For more information, please visit www.mbzuai.ac.ae

To apply for admission, visit mbzuai.ac.ae or contact admissions@mbzuai.ac.ae. For press inquiries, please contact:

About Cerebras Systems

Cerebras Systems is a team of pioneering deep learning researchers, computer architects, and solutions specialists of all types. We have come together to bring generative AI to enterprises and organizations of all sizes around the world. Our flagship product, the CS-2 system, powered by WSE-2, the world’s largest and fastest AI processor, makes training large models simple and easy, by avoiding the complexity of distributed computing. Our software tools simplify the deployment and training process, providing deep insights and ensuring best in class accuracy. Through our team of world-class ML researchers and practitioners who bring decades of experience developing and deploying the most advanced AI models, we help our customers stay on the cutting edge of AI. Cerebras solutions are available in the cloud, through the Cerebras AI Model Studio or on premise. For further information, visit https://www.cerebras.net.