StripedHyena is a hybrid architecture that challenges classical Transformers. By combining attention with the gated convolutions of Hyena blocks, the model achieves strong efficiency on long contexts and complex text generation.
StripedHyena is the first alternative model architecture competitive with the best open-source Transformers of similar size on both short- and long-context evaluations. It is a deep signal-processing hybrid architecture composed of rotary (grouped) attention and gated convolutions arranged in Hyena blocks, with improved scaling over decoder-only Transformers. StripedHyena is designed to leverage the specialization of each of its layer classes: Hyena layers implement the bulk of the computation required for sequence processing, while attention layers supplement the ability to perform targeted pattern recall. Key properties:

- Efficient autoregressive generation via a recurrent mode (>500k-token generation on a single 80GB GPU)
- Lower latency, faster decoding, and higher throughput than Transformers
- Significantly faster training and finetuning at long context (>3x at 131k)
- Improved scaling laws over state-of-the-art architectures (e.g., Transformer++) on both natural language and biological sequences
- Robust to training beyond the compute-optimal frontier, e.g., training well beyond Chinchilla-optimal token counts
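To make the "gated convolution" idea concrete, here is a minimal NumPy sketch of the core mixing step: a causal long convolution over the sequence (computed via FFT) combined with elementwise gating. This is a hypothetical simplification for illustration, not the actual StripedHyena implementation; the function names, shapes, and projection weights are all assumptions.

```python
import numpy as np

def causal_fft_conv(x, h):
    """Causal 1-D convolution of signal x with filter h via FFT.

    Zero-padding to length 2L avoids circular wraparound, so
    output[t] depends only on x[0..t] (causality is preserved).
    """
    L = x.shape[-1]
    n = 2 * L
    y = np.fft.irfft(np.fft.rfft(x, n=n) * np.fft.rfft(h, n=n), n=n)
    return y[..., :L]

def gated_conv_mix(x, h, w_val, w_gate, w_out):
    """One gated-convolution mixing step (illustrative sketch).

    x      : (L, d) input sequence
    h      : (L,)   long convolution filter shared across channels
    w_val, w_gate, w_out : (d, d) projection matrices (hypothetical)

    The value stream is convolved over the time axis with a long
    filter, then gated elementwise by a second projection of the
    input, and finally projected back out.
    """
    v = x @ w_val                      # value stream, (L, d)
    g = x @ w_gate                     # gate stream,  (L, d)
    y = causal_fft_conv(v.T, h).T      # convolve each channel over time
    return (g * y) @ w_out             # gate, then output projection
```

In the real architecture the long filters are parameterized implicitly and the operator is interleaved with rotary grouped attention; the sketch only shows why such a layer admits both an FFT-based parallel mode for training and a recurrent mode for fast autoregressive generation.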