Voicebank-Demand Dataset
Voicebank-Demand, also known as VCTK-Demand, is a noise suppression dataset sampled at 48 kHz.
It provides two training sets: a 28-speaker version and a 56-speaker version.
In many papers, including ours, the 28-speaker version is used.
Preparing dataset
Download
Download the training data, test data, and logfiles from here.
Download the transcript file of the test set from here.
Downsample
If needed, downsample the dataset using scripts/resample.py.
For example, to downsample to 16 kHz, run the command below:
python -m scripts.resample --to-sr 16000 --from-dir ~/Datasets/voicebank-demand/48k --to-dir ~/Datasets/voicebank-demand/16k
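As a worked illustration of the 48 kHz to 16 kHz conversion, the sketch below computes the integer up/down factors and the resulting sample count that a polyphase resampler would produce. It does not reproduce scripts/resample.py, whose internals are not shown here; the function names are hypothetical.

```python
from math import gcd


def resample_factors(from_sr: int, to_sr: int) -> tuple[int, int]:
    """Return (up, down) such that to_sr == from_sr * up / down."""
    if from_sr <= 0 or to_sr <= 0:
        raise ValueError("sampling rates must be positive")
    g = gcd(from_sr, to_sr)
    return to_sr // g, from_sr // g


def resampled_length(num_samples: int, from_sr: int, to_sr: int) -> int:
    """Output length after resampling: ceil(num_samples * up / down)."""
    up, down = resample_factors(from_sr, to_sr)
    return -((-num_samples * up) // down)  # ceiling division on integers


# One second of 48 kHz audio becomes 16000 samples at 16 kHz (up=1, down=3).
print(resample_factors(48000, 16000), resampled_length(48000, 48000, 16000))
```

Downsampling by a factor of 3 shrinks each file to roughly one third of its original length, so the 16k directory needs about a third of the disk space of the 48k one.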
After downloading and downsampling to 16 kHz, the directory may look like this:
voicebank-demand
├─ 16k
| ├─ clean_testset_wav
| ├─ clean_trainset_28spk_wav
| ├─ noisy_testset_wav
| └─ noisy_trainset_28spk_wav
├─ 48k
| ├─ clean_testset_wav
| ├─ clean_trainset_28spk_wav
| ├─ noisy_testset_wav
| └─ noisy_trainset_28spk_wav
└─ logfiles
  ├─ log_readme.txt
  ├─ log_testset.txt
  ├─ log_trainset_28spk.txt
  └─ transcript_testset.txt
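To confirm that the layout above is in place before training, a small check along the following lines may help. This is a minimal sketch and not part of the repository; `missing_dirs` and `EXPECTED_SUBDIRS` are hypothetical names, and the root path is the one used in the example command.

```python
from pathlib import Path

# Subdirectories expected under each sampling-rate folder (28-speaker version).
EXPECTED_SUBDIRS = (
    "clean_testset_wav",
    "clean_trainset_28spk_wav",
    "noisy_testset_wav",
    "noisy_trainset_28spk_wav",
)


def missing_dirs(root, rates=("16k", "48k")):
    """Return the expected directories that do not exist under root."""
    root = Path(root).expanduser()
    expected = [root / rate / sub for rate in rates for sub in EXPECTED_SUBDIRS]
    expected.append(root / "logfiles")
    return [p for p in expected if not p.is_dir()]


if __name__ == "__main__":
    for path in missing_dirs("~/Datasets/voicebank-demand"):
        print(f"missing: {path}")
```

Pass `rates=("16k",)` if you keep only the downsampled copy.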
Modify Configuration file
Set the dataset path and sampling rate in the configuration file.
For example, to train FastEnhancer-B, change the data section in configs/fastenhancer/b.yaml.
Dataset Code
For the Voicebank-Demand dataset, we load clean and noisy speech pairs.
To check or modify the dataset code, see utils/data/voicebank_demand.py.
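In Voicebank-Demand, each noisy file shares its filename with the corresponding clean file, so pairs can be formed by name. The sketch below illustrates that idea only; it is not the code in utils/data/voicebank_demand.py, and `pair_wavs` is a hypothetical helper.

```python
from pathlib import Path


def pair_wavs(clean_dir, noisy_dir):
    """Return sorted (clean, noisy) path pairs matched by filename."""
    clean = {p.name: p for p in Path(clean_dir).glob("*.wav")}
    noisy = {p.name: p for p in Path(noisy_dir).glob("*.wav")}
    # Any filename present on only one side indicates an incomplete download.
    unmatched = sorted(set(clean) ^ set(noisy))
    if unmatched:
        raise FileNotFoundError(f"files without a counterpart: {unmatched[:5]}")
    return [(clean[name], noisy[name]) for name in sorted(clean)]
```

Raising on unmatched files, rather than silently skipping them, makes a partial download or a mismatched directory pair visible before training starts.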
Training Code
To check or modify the training code for Voicebank-Demand, see wrappers/ns.py.
Clean and noisy pairs are loaded for training.