Developing models together, openly
Marin is an open lab for building foundation models—together. We’re training powerful models from scratch and sharing and programmatically documenting every step: the code, the data, the experiments, the mistakes…all in real time. We invite anyone who shares our vision of open science and open source to join and contribute, whether you want to try out a new architecture, training algorithm, dataset, or evaluation…there is a lot to do!
News (2025-05-19): Read our announcement!
Want to jump in? Install the Marin code and run your first experiment!
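To give a feel for what “running an experiment” means here, below is a minimal sketch of an experiment script. The module path `marin.execution.executor` and the names `ExecutorStep` and `executor_main` are assumptions on our part; follow the installation tutorial above for the exact imports and launch command.

```python
# Minimal, illustrative experiment script. The marin.execution.executor
# module and the ExecutorStep / executor_main names are assumptions here;
# see the tutorial linked above for the real API.
from marin.execution.executor import ExecutorStep, executor_main


def count_documents(input_path: str, output_path: str) -> None:
    """Toy step: count the documents (lines) in a text file and record the result."""
    with open(input_path) as f:
        n_docs = sum(1 for _ in f)
    with open(output_path, "w") as f:
        f.write(f"{n_docs}\n")


# An experiment is a DAG of steps; this toy experiment has just one.
count_step = ExecutorStep(
    name="count_documents",                     # names the step's output directory
    fn=count_documents,                         # the function the executor runs
    config={"input_path": "data/sample.txt"},   # hypothetical config shape
)

if __name__ == "__main__":
    # The executor records provenance so the run is reproducible and documented.
    executor_main(steps=[count_step])
```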
Building a foundation model requires countless experiments trying out endless variants of algorithms and datasets. All the experiments we’re doing are captured as GitHub issues (here is a summary).
Here’s the lifecycle of an experiment:
Some examples:
We trained some models in Marin:
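If you just want to try one of the released checkpoints, they can be loaded with the Hugging Face transformers library. A minimal sketch follows; the repository id `marin-community/marin-8b-instruct` is our assumption here, so check the model pages for the exact names.

```python
# Sketch: load a released Marin checkpoint with Hugging Face transformers.
# The repo id below is an assumption; see the model pages for the exact names.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "marin-community/marin-8b-instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "Explain what an open lab for foundation models is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```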
Have a new architecture or training procedure that you think is more efficient? Participate in the Marin speedrun competition (inspired by the nanogpt speedrun), pick your compute budget, and create the fastest method to train a model to a certain quality! Here’s an example submission. We will offer free compute to scale up top performers. Get started here.
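To make “pick your compute budget” concrete, the sketch below uses the standard back-of-the-envelope estimate that training a dense transformer costs roughly 6 × N × D FLOPs (N parameters, D tokens). The specific numbers are illustrative, not the competition’s budget tiers; see the speedrun page for those.

```python
# Back-of-the-envelope compute budgeting for a speedrun-style run.
# Uses the common approximation: training FLOPs ≈ 6 * params * tokens.
# The budget and model size below are illustrative, not competition rules.

def tokens_within_budget(flops_budget: float, n_params: float) -> float:
    """How many training tokens fit in a FLOPs budget for a model of n_params."""
    return flops_budget / (6 * n_params)


flops_budget = 1e19   # example budget: 1e19 FLOPs
n_params = 125e6      # example model size: 125M parameters

tokens = tokens_within_budget(flops_budget, n_params)
print(f"~{tokens / 1e9:.1f}B tokens fit in a {flops_budget:.0e} FLOPs budget "
      f"for a {n_params / 1e6:.0f}M-parameter model")
```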
Want to add new capabilities to the Marin models? Visit our datashop, where you can upload a dataset or craft a prompt to curate a relevant dataset for your task.
For example, we used Llama 3 70B to filter for mathematical educational data (like FineMath). [issue, PR, code, execution]
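As a sketch of how that kind of prompt-based filtering works: an LLM scores each document for educational math value, and only documents above a threshold are kept. The prompt wording, threshold, and `score_document` hook below are illustrative, not the exact ones used in the linked PR.

```python
# Sketch of LLM-based quality filtering (FineMath-style).
# The prompt, threshold, and scoring hook are illustrative; the actual
# pipeline in the linked PR uses Llama 3 70B behind a serving endpoint.
from typing import Callable, Iterable, Iterator

SCORING_PROMPT = (
    "Rate the following document from 1 to 5 for its value as educational "
    "mathematics content (clear explanations, worked examples, correct math). "
    "Reply with a single integer.\n\nDocument:\n{document}"
)


def filter_math_docs(
    documents: Iterable[str],
    score_document: Callable[[str], int],  # e.g. calls an LLM with SCORING_PROMPT
    threshold: int = 4,
) -> Iterator[str]:
    """Yield only documents whose LLM-assigned score meets the threshold."""
    for doc in documents:
        if score_document(SCORING_PROMPT.format(document=doc)) >= threshold:
            yield doc
```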
Marin wouldn’t be possible without the generous support of the Google TPU Research Cloud program. We also benefit immensely from the broader open-source ecosystem and the many tools and datasets it has released.