The "RoBERTa" designation suggests this data has been pre-processed or formatted for use with RoBERTa (Robustly Optimized BERT Pretraining Approach), a pretrained Transformer language model. Likely uses include cross-lingual transfer and testing a model's metalinguistic knowledge.

Included Linguistic Features (Chapters 37–70)

Articles: Definite (37A) and indefinite (38A) article systems.
Possession: Obligatory possessive inflection (58A) and possessive classification (59A).
Conjunction: Noun phrase conjunction (63A) and whether nominal and verbal conjunction are expressed the same way (64A).
Verbal Categories (Chapters 65–70):

This specific set is often used for the following purposes:

Leveraging the broad cross-linguistic data in WALS to improve how models handle the hundreds of languages that lack large amounts of training text.

For more information on the specific data points, you can explore the Official WALS Features List or the WALS-Bench dataset on Hugging Face.
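One common way to test a masked language model such as RoBERTa for metalinguistic knowledge is a cloze prompt. The sketch below shows how the WALS feature IDs listed above could be parsed, restricted to chapters 37–70, and turned into such a prompt. It is hypothetical throughout: the `FEATURES` mapping, the helper names, and the prompt wording are assumptions, not the WALS-Bench schema. The only external fact it relies on is that `<mask>` is RoBERTa's mask token in its Hugging Face tokenizer.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping built from the features named in the text;
# not the actual WALS-Bench data format.
FEATURES = {
    "37A": "definite articles",
    "38A": "indefinite articles",
    "58A": "obligatory possessive inflection",
    "59A": "possessive classification",
    "63A": "noun phrase conjunction",
    "64A": "nominal and verbal conjunction",
}

# RoBERTa's mask token, as used by its Hugging Face tokenizer.
MASK = "<mask>"

# A WALS feature ID is a chapter number followed by a letter, e.g. "37A".
_ID_RE = re.compile(r"^(\d{1,3})([A-Z])$")


@dataclass(frozen=True)
class FeatureId:
    chapter: int
    letter: str


def parse_feature_id(fid: str) -> FeatureId:
    """Split an ID such as '37A' into chapter 37 and letter 'A'."""
    m = _ID_RE.match(fid.strip())
    if m is None:
        raise ValueError(f"not a WALS feature ID: {fid!r}")
    return FeatureId(int(m.group(1)), m.group(2))


def in_subset(fid: str, first: int = 37, last: int = 70) -> bool:
    """True if the feature's chapter lies in the 37-70 range covered here."""
    return first <= parse_feature_id(fid).chapter <= last


def cloze_probe(language: str, fid: str) -> str:
    """Build a hypothetical cloze prompt for one language/feature pair."""
    if not in_subset(fid):
        raise ValueError(f"{fid} is outside chapters 37-70")
    if fid not in FEATURES:
        raise KeyError(f"{fid} has no description in this sketch")
    return f"In {language}, the value of WALS feature {fid} ({FEATURES[fid]}) is {MASK}."


if __name__ == "__main__":
    print(cloze_probe("Swahili", "37A"))
```

Running the script prints `In Swahili, the value of WALS feature 37A (definite articles) is <mask>.` A fixed template keeps probes comparable across languages. Note that a single `<mask>` position predicts only one subword token, so multi-word WALS values would instead need to be scored as complete candidate strings.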