Pre-Training Permissive-Data 2025-04-12
MixtureVitae: A Permissive, High-Performance, Open-Access Pretraining Dataset
We introduce an open-source pretraining dataset designed to reduce legal uncertainty around copyright while still delivering high performance. Our dataset, ca...