Common Voice Scripted Speech 23.0 - Wakhi
Locale: wbl
Size: 118.27 MB
Task: ASR
Format: MP3
License: CC-0
Wakhi (Wuk̃hikwor) - Wakhi (wbl)
This datasheet is for version 23.0 of the Mozilla Common Voice Scripted Speech dataset
for Wakhi (wbl). The dataset contains 16 hours of recorded
speech (13 hours validated) from 13 speakers.
Language
Wakhi, or Wakhani, is known indigenously as K̃hikwor (a contraction of Wuk̃hikwor). It is an old Eastern Iranian (Iranic) language within the Pamiri branch. The Wakhi (or K̃hikwor) language is spoken indigenously in Pakistan, China, Afghanistan, Tajikistan and Kyrgyzstan, although diaspora communities also live in Russia and Turkey as well as in Europe, the Americas and Australia.
Variants
The dataset has two key distinct varieties: Hunza Wakhi and the Wakhi varieties spoken outside Hunza. Hunza Wakhi differs not only in accent but also, to some extent, in grammatical features. Wakhi speakers of the Ghizer and Chitral districts have relatively similar accents. Grammatically, the Wakhi of these districts shares more with the varieties spoken in the adjacent countries of indigenous Wakhi settlement. However, it is important to recognize that vocabulary also differs significantly from country to country, especially in the technological realm, in addition to differences that follow the respective national languages and other considerations.
Demographic information
The dataset includes the following distribution of age and gender.
Gender
Self-declared gender information. Frequency refers to the number of clips annotated with this gender.
Age
Self-declared age information. Frequency refers to the number of clips annotated with this age band.
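The frequencies described above can be recomputed from the clip metadata. The following is a minimal sketch, assuming the metadata is available as tab-separated text with `age` and `gender` columns; the column names, the label values and the sample rows are assumptions made for illustration and are not taken from this datasheet.

```python
import csv
import io
from collections import Counter


def clip_frequencies(tsv_text, column):
    """Count clips per self-declared value in `column`.

    Clips with an empty value are grouped under 'undeclared'.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return Counter((row.get(column) or "undeclared") for row in reader)


# Hypothetical rows for illustration only, not taken from the dataset.
SAMPLE_TSV = (
    "path\tage\tgender\n"
    "clip_1.mp3\ttwenties\tfemale\n"
    "clip_2.mp3\tthirties\tmale\n"
    "clip_3.mp3\t\t\n"
    "clip_4.mp3\ttwenties\tmale\n"
)

gender_counts = clip_frequencies(SAMPLE_TSV, "gender")
age_counts = clip_frequencies(SAMPLE_TSV, "age")
```

To apply the same function to a real release, read the metadata file into a string and pass it in place of `SAMPLE_TSV`, after checking the actual column names in that file.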
Text corpus
The corpus consists of sentences in Hunza Wakhi and is based on extensive anthropological and linguistic fieldwork in Pakistan, China, Afghanistan and Tajikistan.
Writing system
The script used is the Roman Anglicized writing system, which is the script approved by the Wakhi Tajik Cultural Association (WTCA), Pakistan, and the Ishkoman Wakhi Welfare Organization (IWWO). Through this script, literate Wakhi people interact easily and happily with each other across borders on social media forums. It thus facilitates their creativity and the expression of their thoughts in written form, and binds them together.
Symbol table
D̃d̃ Dh Ee Ẽẽ Ff Gg Gh g̃h Ii Jj J̃j̃ Kk Kh K̃h Ll Mm Nn Oo Pp Qq Rr Ss Sh S̃h Tt T̃t̃ Th Uu Ũũ Vv Ww Yy Zz Zh Z̃h Z̃z̃.
Sample
There follows a randomly selected sample of five sentences from the corpus.
Kum insone ki cẽ haq en k̃hat e disht, yowe k̃hũ Khũdhoy disht.
Yemi ya inson ki yowes̃h aql-e bũnyodher bafig̃h et shakig̃hev yewerd.
Agar ki aql-e ya jũz cam en nik̃hinden, insoni cẽ kũ haywon’v en be lup darinda.
Woz sakes̃h dem k̃hũ jahon insonev winen ki dẽ aql en qiti, cerenges̃h darinda wocen.
Parwardigore haya dẽstan inson-e rũwes̃h e jũr k̃hak dẽstan, bihisht et dũz̃akh-e tasawũr ratk.
Sources
1) Sentences I composed myself during the assignment period.
2) Texts from selected Wakhi poetry.
3) New Wakhi transcriptions of interviews from my extensive fieldwork in Pakistan, China, Afghanistan and Tajikistan.
4) Wakhi publications from the official website: www.fazalamin.com
Text domains
General
Community links
There are countless social media forums (Facebook, YouTube channels, or Instagram) created under the name of Wakhi or in the Wakhi language and cultural context. I can't remember them all. Here I list some of those where I normally post links to my creative work published on my website or YouTube channels. As a visually disabled person, I find it difficult to trace their links, so I am giving only their names.
Facebook page of Wakhi Tajik Cultural Association (WTCA)
Facebook page of K̃hikwor Zik-e Razhek
Facebook page of PamirTimes
Facebook group of K̃hik Dũnyo (Wakhi World)
Facebook group of Wuk̃h TV (Wux TV)
Facebook group of Wakhi Research Center
Facebook group of Wakhan Corridor
Facebook group of Wakhi Music
Facebook group of Wakhi Crowd
Facebook group of Pamirs
YouTube channel of Wakhi Tajik Cultural Association (WTCA)
YouTube channel of Pamir Television
Contribute
Datasheet authors
Mazdak Beg, Ahmad Jami Sakhi, Mr. Amanullah, Fazal Amin Beg
Funding
Mozilla Foundation
Licence
This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree to not determine the identity of speakers in the dataset.