This version of CLIP, trained on the large-scale LAION-2B dataset, is a state-of-the-art model for computer vision and language-image understanding. The network aligns text descriptions with visual content, achieving strong accuracy on image classification and retrieval tasks.
## Model Description

A CLIP ViT-H/14 model trained with the LAION-2B English subset of [LAION-5B](https://laion.ai/blog/laion-5b/) using [OpenCLIP](https://github.com/mlfoundations/open_clip). Model training was done by Romain Beaumont on the stability.ai cluster.

## Uses

As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the [LAION-5B blog post](https://laion.ai/blog/laion-5b/) and upcoming paper include further discussion as it relates specifically to the training dataset.
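As a minimal sketch of the zero-shot classification use case described above, the snippet below loads the checkpoint through OpenCLIP and scores an image against a set of candidate text labels. The model name `ViT-H-14` and pretrained tag `laion2b_s32b_b79k` are assumed to match this checkpoint in recent open_clip releases (verify with `open_clip.list_pretrained()`); the image path and labels are placeholders.

```python
import torch
from PIL import Image
import open_clip

# Load the ViT-H/14 checkpoint trained on LAION-2B, along with its
# preprocessing transform and tokenizer. The pretrained tag below is an
# assumption based on recent open_clip releases; check
# open_clip.list_pretrained() if loading fails.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")
model.eval()

# Placeholder inputs: swap in your own image and candidate labels.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog", "a diagram"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product below is cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Scaled similarities -> per-label probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # one probability per candidate label
```

The highest-probability label is the zero-shot prediction; no task-specific fine-tuning is involved, which is exactly the research setting the model card describes.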