Common Voice Scripted Speech 23.0 - Wakhi

Locale: wbl

Size: 118.27 MB

Task: ASR

Format: MP3

License: CC-0


Wakhi (Wuk̃hikwor) — Wakhi (wbl)

This datasheet is for version 23.0 of the Mozilla Common Voice Scripted Speech dataset for Wakhi (wbl). The dataset contains 16 hours of recorded speech (13 hours validated) from 13 speakers.

Language

Wakhi, or Wakhani, is known indigenously as K̃hikwor (a contraction of Wuk̃hikwor). It is an old Eastern Iranian (Iranic) language within the Pamiri branch. The Wakhi (or K̃hikwor) language is spoken indigenously in Pakistan, China, Afghanistan, Tajikistan and Kyrgyzstan, though diaspora communities also live in Russia and Turkey as well as in Europe, the Americas and Australia.

Variants

The dataset covers two main varieties: Hunza Wakhi and the varieties spoken outside Hunza. Hunza Wakhi differs not only in accent but also, to some extent, in grammatical features. Wakhi speakers of the Ghizer and Chitral districts have relatively similar accents. Grammatically, the Wakhi of these districts shares more with the varieties spoken in the adjacent countries where Wakhi is indigenous. However, it is important to recognize that vocabulary also differs significantly from country to country, especially in the technological domain, as speakers follow the respective national languages, along with some other considerations.

Demographic information

The dataset includes the following distribution of age and gender.

Gender

Self-declared gender information; frequency refers to the number of clips annotated with this gender.

Age

Self-declared age information; frequency refers to the number of clips annotated with this age band.
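As a rough illustration of how such frequencies could be recomputed, the sketch below tallies clips per self-declared value from tab-separated clip metadata. It assumes a metadata file with `gender` and `age` columns; the helper name `tally_column`, the `undeclared` label and the sample rows are invented for illustration and are not taken from this dataset.

```python
import csv
import io
from collections import Counter


def tally_column(tsv_text: str, column: str) -> Counter:
    """Count clips per value of `column`; empty values are counted as 'undeclared'."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        counts[value or "undeclared"] += 1
    return counts


# Invented sample rows (NOT real data from this dataset); one row per clip.
SAMPLE = (
    "path\tage\tgender\n"
    "clip1.mp3\ttwenties\tfemale_feminine\n"
    "clip2.mp3\t\tmale_masculine\n"
    "clip3.mp3\ttwenties\t\n"
)

if __name__ == "__main__":
    print(tally_column(SAMPLE, "gender"))
    print(tally_column(SAMPLE, "age"))
```

To apply this to real metadata, the file contents would be read into a string (for example with `open(path, encoding="utf-8").read()`) and passed in place of `SAMPLE`. The column names should be checked against the actual file header first.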

Text corpus

The corpus consists of sentences in Hunza Wakhi and is based on extensive anthropological and linguistic fieldwork in Pakistan, China, Afghanistan and Tajikistan.

Writing system

The script used is the Roman Anglicized writing system, which is the script approved by the Wakhi Tajik Cultural Association (WTCA), Pakistan, and the Ishkoman Wakhi Welfare Organization (IWWO). Through this script, literate Wakhi people interact easily and happily with each other across borders on social media forums. It thus facilitates their creativity and the expression of thought in textual form, and binds them together.

Symbol table

D̃d̃ Dh Ee Ẽẽ Ff Gg Gh g̃h Ii Jj J̃j̃ Kk Kh K̃h Ll Mm Nn Oo Pp Qq Rr Ss Sh S̃h Tt T̃t̃ Th Uu Ũũ Vv Ww Yy Zz Zh Z̃h Z̃z̃.

Sample

There follows a randomly selected sample of five sentences from the corpus.

  1. Kum insone ki cẽ haq en k̃hat e disht, yowe k̃hũ Khũdhoy disht.
  2. Yemi ya inson ki yowes̃h aql-e bũnyodher bafig̃h et shakig̃hev yewerd.
  3. Agar ki aql-e ya jũz cam en nik̃hinden, insoni cẽ kũ haywon’v en be lup darinda.
  4. Woz sakes̃h dem k̃hũ jahon insonev winen ki dẽ aql en qiti, cerenges̃h darinda wocen.
  5. Parwardigore haya dẽstan inson-e rũwes̃h e jũr k̃hak dẽstan, bihisht et dũz̃akh-e tasawũr ratk.

Sources

  1. Sentences I composed myself during the assignment period.
  2. Texts from selected Wakhi poetry.
  3. New Wakhi transcriptions of interviews from my extensive fieldwork in Pakistan, China, Afghanistan and Tajikistan.
  4. Wakhi publications from the official website: www.fazalamin.com

Text domains

General

Community links

There are countless social media forums (Facebook, YouTube channels, or Instagram) created under the name of Wakhi or in the Wakhi language and cultural context. I can’t remember them all. Here I offer some of them, where I normally post links to my creative work published on my website or YouTube channels. It is difficult for a visually disabled person like me to trace their links, so I am only mentioning their names.

Facebook page of Wakhi Tajik Cultural Association (WTCA)
Facebook page of K̃hikwor Zik-e Razhek
Facebook page of PamirTimes
Facebook group of K̃hik Dũnyo (Wakhi World)
Facebook group of Wuk̃h TV (Wux TV)
Facebook group of Wakhi Research Center
Facebook group of Wakhan Corridor
Facebook group of Wakhi Music
Facebook group of Wakhi Crowd
Facebook group of Pamirs
YouTube channel of Wakhi Tajik Cultural Association (WTCA)
YouTube channel of Pamir Television

Contribute

http://www.fazalamin.com

Datasheet authors

Mazdak Beg, Ahmad Jami Sakhi, Mr. Amanullah, Fazal Amin Beg

Funding

Mozilla Foundation

Licence

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree to not determine the identity of speakers in the dataset.