CD-GraB (WT2): accelerating language model training

Q: Who developed CD-GraB (WT2)?

CD-GraB (WT2) was developed at Cornell University (United States of America).

// tasks

Language modeling

// description

An optimization algorithm developed at Cornell University to speed up language model training. CD-GraB uses gradient balancing to choose the order in which training examples are visited, so that training converges faster than with standard random reshuffling. The method is aimed at distributed training, where it makes training neural networks on large datasets more efficient.

// abstract

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale-up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
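To make the idea of balance-based example ordering concrete, here is a minimal single-machine sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `balance_reorder`, the dictionary of per-example gradients, and the use of the current epoch's mean for centering are all choices made for this example. It also does not attempt the coordination across workers that CD-GraB adds. Each centered gradient receives a sign chosen to keep a running signed sum small; examples signed `+` go to the front of the next epoch's order, and examples signed `-` go to the back in reverse.

```python
def balance_reorder(order, grads):
    """Return next epoch's example order from this epoch's gradients.

    order: list of example indices in the order visited this epoch.
    grads: dict mapping example index -> gradient (list of floats) seen
           this epoch; these are stale by the time the new order is used.
    Simplification (assumption): gradients are centered with this epoch's mean.
    """
    n = len(order)
    dim = len(grads[order[0]])
    mean = [sum(grads[i][d] for i in order) / n for d in range(dim)]

    running = [0.0] * dim      # running signed sum of centered gradients
    front, back = [], []       # '+' examples and '-' examples
    for i in order:
        c = [g - m for g, m in zip(grads[i], mean)]
        plus = sum((r + x) ** 2 for r, x in zip(running, c))
        minus = sum((r - x) ** 2 for r, x in zip(running, c))
        if plus <= minus:      # adding keeps the running sum no larger
            running = [r + x for r, x in zip(running, c)]
            front.append(i)
        else:                  # subtracting keeps it smaller
            running = [r - x for r, x in zip(running, c)]
            back.append(i)
    return front + back[::-1]


# Toy usage with 1-D gradients: two positive examples, then two negative.
toy_grads = {0: [1.0], 1: [1.0], 2: [-1.0], 3: [-1.0]}
print(balance_reorder([0, 1, 2, 3], toy_grads))  # prints [0, 2, 3, 1]
```

In the toy run, the original order lets the prefix sum of gradients reach 2 before it falls back to 0, while the reordered sequence keeps every prefix sum within 1 in absolute value. Keeping such prefix sums small is the intuition behind ordering examples by balancing rather than by random reshuffling.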

// similar models