A compact but powerful language model trained with a distinctive iterative method, the "Tootsie Roll" process. Marin 8B delivers strong results on code generation and complex mathematical reasoning, demonstrating that an innovative training approach can matter more than simply scaling up parameters.
The "Tootsie Roll" Process

A core premise of the Marin 8B run was that we didn't fully know the best recipe, so we just started training with what we had and planned to adapt along the way. Internally, we referred to this as the "Tootsie" process, a reference to Tootsie Rolls, which use a "graining" process where each day's batch contains a bit of the previous day's, seeding crystallization or something. (We are not food scientists.) This is admittedly a bit of a strained metaphor, but the idea was that we'd keep folding in new data, training techniques, and whatever else as the training process went on. (As it would turn out, dear reader, we would often change more than the data...)

Model Basics

Model Size

We decided to build a roughly 7-8 billion parameter model mostly out of pragmatism: we initially only had reserved capacity to train a model of that size for long enough.

Architecture

We settled on the Llama architecture for the usual reasons: it has been shown to work well, it is easy to plug into existing inference stacks, no one ever got fired for buying IBM, etc. We used the same settings as Llama 3.1 8B.
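For concreteness, the Llama 3.1 8B settings mentioned above can be sketched as a small config. The numbers are the published Llama 3.1 8B architecture hyperparameters; the class and field names here are illustrative, not the actual Marin training config:

```python
from dataclasses import dataclass

@dataclass
class LlamaArchConfig:
    """Sketch of the Llama 3.1 8B architecture hyperparameters (illustrative names)."""
    hidden_size: int = 4096          # model (residual stream) dimension
    intermediate_size: int = 14336   # SwiGLU MLP hidden width
    num_layers: int = 32             # transformer blocks
    num_attention_heads: int = 32    # query heads
    num_kv_heads: int = 8            # key/value heads (grouped-query attention)
    vocab_size: int = 128256         # Llama 3 tokenizer vocabulary
    rope_theta: float = 500_000.0    # RoPE base frequency

    @property
    def head_dim(self) -> int:
        # per-head dimension: 4096 / 32 = 128
        return self.hidden_size // self.num_attention_heads

    @property
    def kv_group_size(self) -> int:
        # query heads sharing each KV head: 32 / 8 = 4
        return self.num_attention_heads // self.num_kv_heads

cfg = LlamaArchConfig()
print(cfg.head_dim, cfg.kv_group_size)  # 128 4
```

Grouped-query attention (8 KV heads shared by 32 query heads) is what keeps the KV cache small relative to a full multi-head design, one of the reasons this architecture slots easily into existing inference stacks.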