Common Voice Spontaneous Speech 1.0 - Ushojo

Locale: ush

Size: 102.40 MB

Task: ASR

Format: MP3

License: CC-0


Ushojo (ush)

This datasheet is for version 23.0 of the Mozilla Common Voice Spontaneous Speech dataset for Ushojo (ush). The dataset contains 6 hours of recorded speech (5 hours validated) from 10 speakers.

Language

Ushojo is an Indo-Aryan language spoken by about 1,000 to 1,200 people in Bishigram, near Madyan in Swat, Pakistan.

Demographic information

The dataset includes the following distribution of age and gender.

Gender

Self-declared gender information; frequency refers to the number of clips annotated with this gender.

Age

Self-declared age information; frequency refers to the number of clips annotated with this age band.

Transcriptions

Spontaneous speech recorded in response to prompts and then transcribed into text.

Writing system

Perso-Arabic script, based on the orthographies used for Shina and Torwali.

Symbol table

The following symbols differ from Urdu: ݜ، ڙ، ڇ، أ، نڑ
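Because ASR text normalizers written for Urdu may fold these symbols into Urdu equivalents, it can help to check that they survive preprocessing. The following is a minimal sketch, not part of the dataset tooling; the helper name `ushojo_specific_symbols` is hypothetical. It assumes `نڑ` is written as the two-character sequence noon + rreh, so it matches substrings rather than single code points, and it applies Unicode NFC first so a decomposed alef + hamza is counted as `أ`.

```python
import unicodedata

# Symbols listed in the symbol table as differing from Urdu, written as
# escapes to avoid ambiguity: ݜ (U+075C), ڙ (U+0699), ڇ (U+0687),
# أ (U+0623), and the two-character sequence نڑ (U+0646 U+0691).
USHOJO_SPECIFIC = ["\u075c", "\u0699", "\u0687", "\u0623", "\u0646\u0691"]


def ushojo_specific_symbols(text: str) -> dict[str, int]:
    """Hypothetical helper: count each Ushojo-specific symbol in a transcription.

    NFC normalization composes alef + hamza above (U+0627 U+0654) into
    U+0623 so both spellings are counted the same way. Counts are
    non-overlapping substring matches; symbols that do not occur are omitted.
    """
    text = unicodedata.normalize("NFC", text)
    return {sym: text.count(sym) for sym in USHOJO_SPECIFIC if sym in text}
```

Running it on a transcription before and after normalization shows whether, for example, `أ` has been folded into plain alef.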

Questions

The following is a randomly selected sample of questions from the corpus:

تُو کامیک رونگ خوشاریلا؟ می تا ہر فن خوشاریما کے فن خو فن بینو. تو کدا کدا خارو کئی بجونو خوشاریما؟ پٹوئیا کارے جے کاروبار اِسٹارٹ بینو؟ تو آسو جیب رس بئیلا؟

Responses

The following is a randomly selected sample of transcribed responses from the corpus:

می تہ ہر فن خوشاریما کے فن خو فن بینو۔ می تہ ݜیلو رونگ لالو خوش ہنو۔ آسو شیِدلے موسم در خار کئی بجونو خوشاریما۔ کے گرمی نی بیلو۔ موسم برابر بیلو۔ ہاں۔ مہ تی توسی جیب شنوٹو شنوٹو رز بئیلا۔ مہ تی کامن وائس بارا در تپوس کیلا۔

Recommended post-processing

More data is needed.

Community links

Direct contact with the community; I have a good network with its members.

Discussions

No

Contribute

NA

Datasheet authors

  1. Zubair Torwali, ztorwali@gmail.com
  2. Javid Iqbal Torwali, jitorwali@gmail.com

Citation guidelines

  1. Javid Iqbal Torwali
  2. Ihsan Ullah
  3. Tariq Aziz
  4. Zubair Torwali

Funding

Yes; funding is acknowledged.

Licence

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree not to attempt to determine the identity of speakers in the dataset.