An advanced variant of the BERT architecture, specifically adapted for the complex morphology of the Arabic language. Thanks to improved text segmentation, the model understands context more accurately and produces better results on downstream tasks. A strong choice for Arabic-language NLP projects.
AraBERT is an Arabic pretrained language model based on Google's BERT architecture, using the same BERT-Base configuration. More details are available in the AraBERT paper and in the AraBERT Meetup.

There are two versions of the model, AraBERTv0.1 and AraBERTv1. The difference is that AraBERTv1 uses pre-segmented text, where prefixes and suffixes are split off using the Farasa Segmenter.

We evaluated the AraBERT models on different downstream tasks and compared them to mBERT and other state-of-the-art models (to the best of our knowledge). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR), Named Entity Recognition on the ANERcorp corpus, and Arabic Question Answering on Arabic-SQuAD and ARCD.
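To make the pre-segmentation idea concrete, here is a minimal toy sketch of the input format AraBERTv1 expects: split prefixes are marked with a trailing "+" before the stem. This is NOT the real Farasa Segmenter (which uses full morphological analysis); the prefix list and the `toy_segment` helper below are purely illustrative assumptions.

```python
# Toy illustration of Farasa-style pre-segmented input (NOT the real segmenter).
# A handful of common Arabic prefixes, longest first so compound forms match early.
COMMON_PREFIXES = ("وال", "بال", "ال", "و", "ب", "ل")

def toy_segment(word: str) -> str:
    """Split off one known prefix and mark it with a trailing '+'.

    The real Farasa Segmenter performs proper morphological segmentation;
    this helper only demonstrates the surface format of the segmented text.
    """
    for prefix in COMMON_PREFIXES:
        # Require at least two remaining characters so we do not strip
        # a "prefix" that is actually part of a short stem.
        if word.startswith(prefix) and len(word) > len(prefix) + 1:
            return prefix + "+ " + word[len(prefix):]
    return word

print(toy_segment("الكتاب"))  # the definite article "ال" is split off: "ال+ كتاب"
print(toy_segment("كتاب"))    # no known prefix, word is left unchanged
```

In the pre-segmented corpus used for AraBERTv1, such split-off clitics let the WordPiece vocabulary model Arabic stems and affixes separately instead of memorizing every inflected surface form.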