About Us

We are a group of volunteer researchers dedicated to promoting equal access to multimodal and multilingual AI. Our goal is to build a permissive and open stack for developing multimodal LLMs. This initiative is a collaborative effort led by OntocordAI.

The "M" in Aurora-M2 refers to our focus on multimodal, multilingual, and multidomain mixture-of-experts (MoE) models, each of which we aim to explore and develop through ongoing research.

Building on our previous success, Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code, we are training a family of models aligned with laws, regulations, and policies for controllable AI. The series will include models with 3B, 8B, and 21B parameters, aligned with the comprehensive policy framework of the EU AI Act, specifically Annex III of the Act.

As part of our commitment to openness, we plan to open-source the entire training pipeline and experimental process, including data synthesis and the evolving methodologies we employ in model training. Stay tuned!

Team

Huu Nguyen
Harsh Raj
Ken Tsui
Minh Chien Vu
Diganta Misra
Victory May
Marianna Nezhurina
Christoph Schuhmann
Robert Kaczmarczyk

Contact

Github: Aurora-M2

X: @Ontocord

Email: engage@ontocord.ai

Published on 2025-01-01, updated on 2025-04-14