Web-scale text corpus for Azerbaijani
Several Azerbaijani text corpora already exist at the scale of hundreds of millions of words. We intend to push this to billions of words without sacrificing quality. Doing so requires a sophisticated, automated pipeline made up of several stages.
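To make the idea of a staged pipeline concrete, here is a minimal sketch in Python. The stages shown (normalization, length filter, language heuristic, exact deduplication) are assumptions chosen for illustration, not the project's actual design, and all function names and thresholds are hypothetical. In particular, the language check is a crude stand-in for a real language identifier: it only measures how often the letter "ə" occurs.

```python
import hashlib
import re
import unicodedata
from typing import Iterable, Iterator, List

# Assumption: the schwa is used as a cheap marker of Azerbaijani Latin script.
AZ_MARKERS = set("əƏ")


def normalize(docs: Iterable[str]) -> Iterator[str]:
    """Stage 1: apply Unicode NFC and collapse runs of whitespace."""
    for doc in docs:
        text = unicodedata.normalize("NFC", doc)
        yield re.sub(r"\s+", " ", text).strip()


def keep_long_enough(docs: Iterable[str], min_words: int = 5) -> Iterator[str]:
    """Stage 2: drop fragments with fewer than min_words words."""
    for doc in docs:
        if len(doc.split()) >= min_words:
            yield doc


def looks_azerbaijani(docs: Iterable[str], min_ratio: float = 0.02) -> Iterator[str]:
    """Stage 3: keep documents whose share of marker letters is high enough."""
    for doc in docs:
        letters = [ch for ch in doc if ch.isalpha()]
        if not letters:
            continue
        ratio = sum(ch in AZ_MARKERS for ch in letters) / len(letters)
        if ratio >= min_ratio:
            yield doc


def drop_exact_duplicates(docs: Iterable[str]) -> Iterator[str]:
    """Stage 4: drop documents whose case-folded text was already seen."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha1(doc.casefold().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def run_pipeline(docs: Iterable[str]) -> List[str]:
    """Chain the stages lazily; each one consumes the previous one's output."""
    stages = drop_exact_duplicates(
        looks_azerbaijani(keep_long_enough(normalize(docs)))
    )
    return list(stages)
```

Because every stage is a generator, documents stream through one at a time, which matters at the scale of billions of words. Only the set of hashes in the deduplication stage grows with the input. A production pipeline would likely replace the heuristic stages with trained classifiers and near-duplicate detection.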