License: CC-BY-4.0
Steward: SPIRE Lab
Task: TTS
Release Date: 5/6/2026
Format: JSON, WAV
Size: 20.87 GB
As part of the SYSPIN project, we are releasing 40 hours of studio-recorded Telugu male TTS data. Validated audio and text files are publicly available. This release opens opportunities for academic researchers, students, small and large-scale industries, and research labs to develop algorithms and text-to-speech synthesizers in all nine Indian languages covered by the project. It should also encourage competition among research groups in developing high-quality synthesized speech to improve voice-based services. Open voice data is the foundation for local AI innovators to build applications geared towards the specific capabilities and requirements of users in India.
Licensing
Creative Commons Attribution 4.0 International (CC-BY-4.0)
https://spdx.org/licenses/CC-BY-4.0.html
Version: S1.0
Released by: SPIRE Lab, Indian Institute of Science (IISc), Bengaluru, India
Dataset URL: https://syspin.iisc.ac.in/datasets/telugu%20male%20tts%20data
Download Portal: https://spiredatasets.ee.iisc.ac.in/syspincorpus
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Citation: Abhayjeet et al., "SYSPIN_S1.0 Corpus — A TTS Corpus of 900+ hours in nine Indian Languages", 2025
The SYSPIN Telugu Male TTS Dataset is a single-speaker, studio-recorded, read-speech corpus for Text-to-Speech (TTS) synthesis in the Telugu language, spoken by a professional male voice artist. It is one of eighteen speaker-language datasets produced under the SYSPIN (SYnthesizing SPeech in INdian languages) initiative, a large-scale, open-source effort by SPIRE Lab at IISc Bengaluru to build high-quality TTS corpora across nine Indian languages: Bengali, Bhojpuri, Chhattisgarhi, Hindi, Kannada, Magahi, Maithili, Marathi, and Telugu.
| Field | Detail |
|---|---|
| Project name | SYSPIN (SYnthesizing SPeech in INdian languages) |
| Research lab | SPIRE Lab, Dept. of Electrical Engineering, IISc Bengaluru |
| Project lead | Prof. Prasanta Kumar Ghosh, IISc Bengaluru |
| Funding | German Development Cooperation (GIZ) / ARTPARK |
| Associated challenges | LIMMITS 2023 (ICASSP), LIMMITS 2024 (ICASSP), LIMMITS 2025 |
| Sister project | RESPIN (ASR corpus, same language set) |
| Total corpus size | 920 hours across 9 languages, 18 speakers, 462,311 sentences |
| Property | Value |
|---|---|
| Language | Telugu (ISO 639-1: te) |
| Script | Telugu script (Unicode block U+0C00–U+0C7F) |
| Speaker gender | Male |
| Number of speakers | 1 (single speaker) |
| Speaker type | Professional native voice artist |
| Duration | ~40 hours |
| Domains | agriculture, books, education, finance, general, health, others, politics, running text from book, weather |
| Recording environment | Professional studio (anechoic / acoustically treated) |
| Data type | Read speech (scripted utterances) |
| Property | Value |
|---|---|
| File format | WAV (PCM) |
| Sample rate | 48,000 Hz |
| Bit depth | 24-bit |
| Channels | Mono |
| Encoding | Linear PCM |
| Background noise | Absent (studio-controlled environment) |
Note: Downstream users may resample to 16 kHz or 22.05 kHz as required by their synthesis frameworks (e.g., ESPnet, VITS, FastPitch). The original 48 kHz / 24-bit files are the canonical release format.
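The audio specification and resampling note above can be sanity-checked with a few lines of Python. The sketch below is illustrative only and not part of the release: the helper names (`spec_mismatches`, `resample_factors`, `pcm_bytes`) are hypothetical, and it relies on the standard-library `wave` module, which may reject files written with an extensible-format WAV header on Python versions before 3.12. The integer factors returned by `resample_factors` are the kind of `up`/`down` pair that a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)` takes.

```python
import math
import wave

# Expected values taken from the audio specification table (24-bit = 3 bytes).
EXPECTED = {"sample_rate": 48_000, "sample_width_bytes": 3, "channels": 1}


def read_wav_spec(path):
    """Return the header fields of a PCM WAV file using the stdlib wave module."""
    with wave.open(path, "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
            "channels": wf.getnchannels(),
        }


def spec_mismatches(path, expected=EXPECTED):
    """Map each field that differs from `expected` to a (found, expected) pair."""
    found = read_wav_spec(path)
    return {k: (found[k], v) for k, v in expected.items() if found[k] != v}


def resample_factors(src_hz, dst_hz):
    """Smallest integer (up, down) pair with src_hz * up / down == dst_hz."""
    g = math.gcd(src_hz, dst_hz)
    return dst_hz // g, src_hz // g


def pcm_bytes(hours, sample_rate=48_000, sample_width_bytes=3, channels=1):
    """Raw PCM payload size in bytes, ignoring WAV headers and transcripts."""
    return int(hours * 3600) * sample_rate * sample_width_bytes * channels


print(resample_factors(48_000, 16_000))  # (1, 3)
print(resample_factors(48_000, 22_050))  # (147, 320)
print(pcm_bytes(40) / 1e9)               # 20.736
```

Exactly 40 hours of 48 kHz / 24-bit mono PCM comes to about 20.7 GB of raw samples, which is roughly in line with the listed 20.87 GB given that the stated duration is approximate.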
| Property | Detail |
|---|---|
| Script encoding | UTF-8, Telugu Unicode |
| Text composition | Original sentences composed by native Telugu speakers |
| Domain coverage | Agriculture (crop terminology, farming practices) and Finance (banking, economics) |
| Text normalization | Symbols, numbers, abbreviations, and dates converted to context-specific spoken form |
| Phonetic diversity | Texts designed for broad phoneme and prosody coverage |
| Transcription accuracy | Target: 0% word error rate (zero-tolerance standard required for TTS training) |
| Validation pipeline | Combined automated and manual quality checks by native speakers |
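As an illustration of the kind of automated check that could complement manual review, the sketch below flags characters in a transcript that fall outside the Telugu Unicode block (U+0C00 to U+0C7F) and a small allow-list of punctuation. This is an assumption-laden example, not the SYSPIN validation pipeline: the function name `unexpected_chars` and the default allow-list are hypothetical, and the allow-list would need tuning for real data (for instance, zero-width joiners sometimes appear in Telugu text and would be flagged here).

```python
# Telugu Unicode block, U+0C00 to U+0C7F inclusive.
TELUGU_BLOCK = range(0x0C00, 0x0C80)


def unexpected_chars(text, extra_allowed=" .,!?\"'-"):
    """Return sorted unique characters that are neither in the Telugu block
    nor in the explicitly allowed set (hypothetical default allow-list)."""
    return sorted(
        {ch for ch in text if ord(ch) not in TELUGU_BLOCK and ch not in extra_allowed}
    )


print(unexpected_chars("తెలుగు"))       # []
print(unexpected_chars("తెలుగు TTS."))  # ['S', 'T']
```

A non-empty result does not necessarily indicate an error; it only marks text that a reviewer may want to look at, such as untransliterated Latin words or stray digits.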
Neural TTS training: End-to-end models (VITS, NaturalSpeech, YourTTS), two-stage acoustic models (autoregressive Tacotron 2, non-autoregressive FastSpeech 2), and codec-based systems (EnCodec, SoundStorm)
Vocoder training/evaluation: HiFi-GAN, WaveGlow, BigVGAN
Multi-speaker and multi-lingual TTS: Alongside other SYSPIN speaker datasets for cross-lingual voice transfer
Voice cloning research: Base corpus for speaker-adaptation experiments
TTS evaluation / MOS benchmarking: Ground-truth natural speech reference for Mean Opinion Score studies
Acoustic model research: Prosody modeling, duration modeling, pitch analysis for Telugu
Language/speech technology accessibility: Downstream applications in agriculture advisory services, financial literacy, e-governance, and healthcare for Telugu-speaking populations
The dataset represents a single speaker and does not capture speaker diversity, dialectal variation, or prosodic style variation across the Telugu-speaking population.
Speech rate is controlled and uniform, consistent with professional TTS recording standards; spontaneous or conversational speech characteristics are absent.
Domain-specific vocabulary (agricultural and financial terminology) may affect model generalization to other domains without fine-tuning.
telugu_male_tts/
├── wav/ # Audio files (WAV, 48 kHz, 24-bit, mono)
│ ├── ....wav
│ └── ...
├── ..._Transcripts.json # transcripts
The Transcripts.json file contains a "Transcripts" key whose value maps each utterance ID to its transcript and domain, as follows:
"Transcripts": {
"IISc_SYSPINProject_te_m_GENE_02044": {
"Transcript": "ఇంపెడెన్స్ విశ్లేషణ ఉపయోగించి బయోమెట్రిక్ సమాచారాన్ని కొలిచే స్మార్ట్ స్కేల్స్ అడుగుల ద్వారా తేలికపాటి విద్యుత్ ప్రేరణలను పంపుతాయి. ",
"Domain": "GENERAL"
},
"IISc_SYSPINProject_te_m_EDUC_02060": {
"Transcript": "\"అనుష్కతో కలిసి మరీన్ పరేడ్ బీచ్లో బెంచ్పై కూర్చొని ఆస్వాదించిన క్షణాలు జీవితాంతం గుర్తుండి పోతాయి\" అని బిసిసిఐ టివికి ఇచ్చిన ఇంటర్వ్యూలో కోహ్లి తెలిపాడు.",
"Domain": "EDUCATION"
},
...
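A minimal sketch of reading this structure in Python follows. It assumes the file is a JSON object with a top-level "Transcripts" key, as shown above. The helper names are hypothetical, and the interpretation of the utterance ID as prefix, language, gender, domain code, and index is inferred from the two sample IDs only, so it should be verified against the full file. The mapping from utterance IDs to WAV file names is not shown because the file names are elided in the listing above.

```python
import json
from collections import Counter


def load_transcripts(path):
    """Load the mapping of utterance ID -> {"Transcript": ..., "Domain": ...}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["Transcripts"]


def parse_utterance_id(utt_id):
    """Split an ID such as IISc_SYSPINProject_te_m_GENE_02044 into parts.

    Assumed layout (inferred from the samples): prefix_lang_gender_domain_index.
    """
    prefix, lang, gender, domain_code, index = utt_id.rsplit("_", 4)
    return {
        "prefix": prefix,
        "lang": lang,
        "gender": gender,
        "domain_code": domain_code,
        "index": index,
    }


def domain_counts(transcripts):
    """Count utterances per value of the "Domain" field."""
    return Counter(entry["Domain"] for entry in transcripts.values())


def clean_text(entry):
    """Return the transcript with surrounding whitespace removed."""
    return entry["Transcript"].strip()
```

Stripping whitespace is worthwhile because at least one sample transcript above ends with a trailing space. A typical use would be `transcripts = load_transcripts(path)` followed by `domain_counts(transcripts)` to see how utterances are distributed across domains.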
This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Users are free to share and adapt the material for any purpose, including commercial use, provided appropriate credit is given.
Abhayjeet et al., "SYSPIN_S1.0 Corpus — A TTS Corpus of 900+ hours
in nine Indian Languages", 2025.