Mozilla Data Collective

[Ushojo] — Ushojo (`ush`)

This datasheet is for version 23.0 of the Mozilla Common Voice Spontaneous Speech dataset for Ushojo (ush). The dataset contains 6 hours of recorded speech (5 hours validated) from 10 speakers.

Language

Ushojo is an Indo-Aryan language spoken by about 1,000 to 1,200 people in Bishigram, near Madyan, in Swat, Pakistan.

Demographic information

The dataset includes the following distribution of age and gender.

Gender

Self-declared gender information. Frequency refers to the number of clips annotated with this gender.

Age

Self-declared age information. Frequency refers to the number of clips annotated with this age band.
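Because frequency counts clips rather than speakers, a speaker who records many clips contributes to a gender or age band many times. The sketch below illustrates that counting under stated assumptions: the field names `gender` and `age`, the helper `clip_frequency`, and the sample rows are all hypothetical and are not taken from the dataset files.

```python
from collections import Counter


def clip_frequency(rows, field):
    """Count clips per self-declared value of `field`.

    Missing or empty values are grouped under 'undeclared'.
    """
    return Counter((row.get(field) or "undeclared") for row in rows)


# Hypothetical per-clip metadata (invented for illustration, not corpus data).
rows = [
    {"clip": "clip_001", "speaker": "s1", "gender": "male", "age": "twenties"},
    {"clip": "clip_002", "speaker": "s1", "gender": "male", "age": "twenties"},
    {"clip": "clip_003", "speaker": "s2", "gender": "female", "age": "thirties"},
    {"clip": "clip_004", "speaker": "s3", "gender": "", "age": ""},
]

if __name__ == "__main__":
    # Speaker s1 recorded two clips, so 'male' and 'twenties' are each counted twice.
    print(clip_frequency(rows, "gender"))
    print(clip_frequency(rows, "age"))
```

Counting per clip keeps the figures aligned with the amount of audio in each band, at the cost of hiding how many distinct speakers stand behind it.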

Transcriptions

Speakers respond spontaneously to questions presented by the system, and the recorded audio is then transcribed into text.

Writing system

The orthography follows the Shina and Torwali conventions, which are based on the Perso-Arabic script.

Symbol table

The following letters differ from Urdu: ݜ، ڙ، ڇ، أ، نڑ
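For tooling such as tokenizers or text normalizers, it can help to inspect the letters in the symbol table by code point and to check where they occur in transcriptions. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper names `describe` and `special_letters_in` are hypothetical and not part of any Common Voice tooling.

```python
import unicodedata

# Letters the symbol table lists as differing from Urdu.
# The last entry is a two-letter sequence, so entries are strings, not single characters.
SPECIAL_LETTERS = ["ݜ", "ڙ", "ڇ", "أ", "نڑ"]


def describe(letter):
    """Return (code point, Unicode name) pairs for each character in `letter`."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in letter]


def special_letters_in(text):
    """Return the listed letters that occur in `text`, in symbol-table order."""
    return [letter for letter in SPECIAL_LETTERS if letter in text]


if __name__ == "__main__":
    for letter in SPECIAL_LETTERS:
        print(letter, describe(letter))
    # One of the sample responses quoted in this datasheet.
    print(special_letters_in("می تہ ݜیلو رونگ لالو خوش ہنو۔"))
```

Printing the code points rather than hard-coding them avoids guessing how each letter is encoded in a given copy of the text.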

Questions

There follows a randomly selected sample of questions from the corpus. تُو کامیک رونگ خوشاریلا؟ می تا ہر فن خوشاریما کے فن خو فن بینو. تو کدا کدا خارو کئی بجونو خوشاریما؟ پٹوئیا کارے جے کاروبار اِسٹارٹ بینو؟ تو آسو جیب رس بئیلا؟

Responses

There follows a randomly selected sample of transcribed responses from the corpus. می تہ ہر فن خوشاریما کے فن خو فن بینو۔ می تہ ݜیلو رونگ لالو خوش ہنو۔ آسو شیِدلے موسم در خار کئی بجونو خوشاریما۔ کے گرمی نی بیلو۔ موسم برابر بیلو۔ ہاں۔ مہ تی توسی جیب شنوٹو شنوٹو رز بئیلا۔ مہ تی کامن وائس بارا در تپوس کیلا۔

Community links

The datasheet author works directly with the Ushojo-speaking community and has a good network among its members.

Discussions

Contribute

Datasheet authors

1. Zubair Torwali, ztorwali@gmail.com
2. Javid Iqbal Torwali, jitorwali@gmail.com

Citation guidelines

1. Javid Iqbal Torwali
2. Ihsan Ullah
3. Tariq Aziz
4. Zubair Torwali

Funding

Yes, we acknowledge.

Licence

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data, you agree not to determine the identity of speakers in the dataset.

Common Voice Spontaneous Speech 1.0 - Ushojo

