aLLMA Lab | Encoder-only foundation models for Azerbaijani

We are building foundation models for Azerbaijani. The models are encoder-only, which makes them suitable for a range of natural language understanding (NLU) tasks. The first iteration of the project is complete. We present our results at ACL 2024, as part of the 1st SIGTURK Workshop. Our models are publicly hosted on Hugging Face. As part of the project, we have also released a text corpus, a tokenizer, and several evaluation datasets for Azerbaijani.