Web-scale text corpus for Azerbaijani

Several Azerbaijani text corpora already exist at the scale of hundreds of millions of words. We intend to push this to billions of words without sacrificing quality. Doing so requires a sophisticated, automated pipeline with several stages.