Common Voice Scripted Speech 23.0 - Yidgha
Locale: ydg
Size: 85.26 MB
Task: ASR
Format: MP3
License: CC-0
Yadgha (ydg)
This datasheet is for version 23.0 of the Mozilla Common Voice Scripted Speech dataset
for Yadgha (ydg). The dataset contains 12 hours of recorded
speech (11 hours validated) from 15 speakers.
Language
Yadgha (ISO 639-3: ydg), also known as Lutkohiwar, is spoken in the Lutkoh Valley, approximately 46 km west of Chitral town. The Yadgha people trace their origins to the Munjan Valley in Afghanistan, from which they migrated to the Lutkoh Valley 31 generations ago. The language has around 6,000 speakers, although this number is gradually decreasing as speakers shift to Khowar, the lingua franca of the Chitral Valley. Yadgha is a written language, and several poets compose poetry in it. However, only limited literacy activities are currently underway to support the language's preservation.
Variants
The language has no distinct varieties.
Demographic information
The dataset includes the following distribution of age and gender.
Gender
Self-declared gender information; frequency refers to the number of clips annotated with this gender.
Age
Self-declared age information; frequency refers to the number of clips annotated with this age band.
Text corpus
The text comes from my own writing. The corpus contains 2,000 sentences.
Writing system
Yadgha is written in a Perso-Arabic script, which the community developed a few years ago with the support of the Forum for Language Initiatives.
Symbol table
آ ا ب پ ت ٹ ث ج چ ح خ ݯ ځ څ ݮ د ذ ر ز ڑ ژ ݱ س ش ݰ ص ض ط ظ ع غ ف ڤ ک گ ګ م ن ں و ہ ة ھ ء ی ے
Sample
There follows a randomly selected sample of five sentences from the corpus.
نَمن یاغو شَماؤ نغن غور ڤے انسان خدان پیدا کڑے تو چر زیمونے نے ہورغن تیار اوئے
Sources
The sentences are my own writing. Very little written material exists in the language; the only available materials are a word list and an alphabet book.
Text domains
General
Processing
I wrote the sentences myself. I am a poet who writes regularly in the language, and I drew on that skill to develop a corpus covering various general topics.
Funding
Meesum Alam
Licence
This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree not to attempt to determine the identity of speakers in the dataset.