Common Voice Scripted Speech 23.0 - Yidgha

Locale: ydg

Size: 85.26 MB

Task: ASR

Format: MP3

License: CC-0


[Yadgha] — Yadgha (ydg)

This datasheet is for version 23.0 of the Mozilla Common Voice Scripted Speech dataset for Yadgha (ydg). The dataset contains 12 hours of recorded speech (11 hours validated) from 15 speakers.

Language

Yadgha (ISO 639-3: ydg), also known as Lutkohiwar, is spoken in the Lutkoh Valley, situated approximately 46 km west of Chitral town. The Yadgha people trace their origins to the Munjan Valley in Afghanistan, having migrated to the Lutkoh Valley 31 generations ago. The Yadgha community consists of around 6,000 speakers, although this number is gradually decreasing as speakers shift to Khowar, the lingua franca of the Chitral valley. Yadgha is a written language, and several poets compose poetry in it. However, limited literacy activities are currently underway to support the language's preservation.

Variants

The language has no distinct varieties.

Demographic information

The dataset includes the following distribution of age and gender.

Gender

Self-declared gender information. Frequency refers to the number of clips annotated with this gender.

Age

Self-declared age information. Frequency refers to the number of clips annotated with this age band.
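As a rough illustration of how such per-clip frequencies could be tallied, the sketch below counts clips per value of a metadata column. It assumes the clip metadata is a tab-separated file with `age` and `gender` columns, as in other Common Voice releases; the function name `clip_frequencies`, the label `undeclared`, and the example rows are hypothetical and are not data from this dataset.

```python
import csv
import io
from collections import Counter


def clip_frequencies(tsv_text, column):
    """Count clips per value of `column` in tab-separated clip metadata.

    Clips with an empty value are grouped under "undeclared"
    (a label chosen for this sketch).
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return Counter((row.get(column) or "undeclared") for row in reader)


# Made-up rows for illustration only; column names are assumed.
example = (
    "path\tage\tgender\n"
    "a.mp3\tband_1\tgroup_a\n"
    "b.mp3\tband_1\t\n"
    "c.mp3\t\t\n"
)

age_counts = clip_frequencies(example, "age")
gender_counts = clip_frequencies(example, "gender")
```

Counting per clip rather than per speaker matches the definition of frequency given above; a prolific speaker therefore weighs more heavily in these counts.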

Text corpus

The text comes from my own writing. The corpus contains 2,000 sentences.

Writing system

Yadgha is written in a Perso-Arabic script, developed a few years ago by the community with support from the Forum for Language Initiatives.

Symbol table

آ ا ب پ ت ٹ ث ج چ ح خ ݯ ځ څ ݮ د ذ ر ز ڑ ژ ݱ س ش ݰ ص ض ط ظ ع غ ف ڤ ک گ ګ م ن ں و ہ ة ھ ء ی ے
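One possible use of the symbol table is to flag characters in a sentence that fall outside it. The sketch below is an illustration only: the function name `unexpected_chars` is hypothetical, and the choice to ignore whitespace and combining marks (such as short-vowel diacritics) is an assumption of this sketch, not a rule stated in the datasheet.

```python
import unicodedata

# Symbol table copied from this datasheet (space-separated letters).
SYMBOLS = "آ ا ب پ ت ٹ ث ج چ ح خ ݯ ځ څ ݮ د ذ ر ز ڑ ژ ݱ س ش ݰ ص ض ط ظ ع غ ف ڤ ک گ ګ م ن ں و ہ ة ھ ء ی ے"

# Normalise to NFC so composed and decomposed spellings compare equal.
ALPHABET = set(unicodedata.normalize("NFC", SYMBOLS)) - {" "}


def unexpected_chars(text):
    """Return the set of characters in `text` not covered by the symbol table.

    Whitespace and combining marks (Unicode category Mn) are skipped;
    that policy is an assumption of this sketch.
    """
    extras = set()
    for ch in unicodedata.normalize("NFC", text):
        if ch.isspace() or unicodedata.category(ch) == "Mn":
            continue
        if ch not in ALPHABET:
            extras.add(ch)
    return extras
```

Punctuation and digits are reported as unexpected under this policy, so a real check would probably need an explicit allow-list for them.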

Sample

There follows a randomly selected sample of five sentences from the corpus.

نَمن یاغو شَماؤ نغن غور ڤے انسان خدان پیدا کڑے تو چر زیمونے نے ہورغن تیار اوئے

Sources

I wrote the sentences myself. Very little written material exists in the language: a word list and an alphabet book.

Text domains

General

Processing

I wrote the sentences myself. I am a poet in the language and regularly write my own poetry. Using this skill, I developed the corpus, which covers various general topics.

Funding

Meesum Alam

Licence

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree to not determine the identity of speakers in the dataset.