Technical Name Improving the Efficiency of Dysarthria Voice Conversion System by the Data Augmentation Approach
Project Operator National Chung Cheng University
Project Host 賴穎暉
Summary
The DVC 3.1 system adopts text-to-speech (TTS) and multi-domain speech conversion models to synthesize large sets of speech similar to that of the patients and the target speakers. It then uses phonetic posteriorgrams (PPGs) and Gated CNN models to perform voice conversion for the patient's voice characteristics. Finally, a WaveGlow vocoder turns the converted features into speech for the listener.
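The summary does not describe the Gated CNN architecture, so the following is only a minimal sketch of the gating mechanism such models are commonly built on (a gated linear unit over a 1-D convolution). The function names, kernel values, and toy input are hypothetical and are not taken from the DVC 3.1 system.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d(frames, kernel, bias):
    """Valid 1-D convolution over time (cross-correlation form, as is
    conventional in deep learning) for a single-channel sequence."""
    k = len(kernel)
    return [
        sum(frames[t + i] * kernel[i] for i in range(k)) + bias
        for t in range(len(frames) - k + 1)
    ]

def gated_conv1d(frames, w, b, v, c):
    """Gated linear unit: the linear branch conv(frames, w) + b is scaled
    elementwise by the gate sigmoid(conv(frames, v) + c)."""
    linear = conv1d(frames, w, b)
    gate = conv1d(frames, v, c)
    return [h * sigmoid(g) for h, g in zip(linear, gate)]

# Hypothetical toy input: one feature dimension over five frames.
features = [0.0, 1.0, 2.0, 1.0, 0.0]

# Assumed illustrative weights: a zero gate kernel gives sigmoid(0) = 0.5,
# so every output is half of the linear branch.
out = gated_conv1d(features, w=[1.0, 1.0, 1.0], b=0.0, v=[0.0, 0.0, 0.0], c=0.0)
print(out)
```

In a real model the two kernels are learned, and the gate decides per frame how much of the linear branch passes through to the next layer.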
Scientific Breakthrough
When building a dysarthric speech conversion system with AI technology, an insufficient training corpus and the variability of patients' speech are major challenges. Two patients were invited to test the DVC 3.1 system. The results show that the speech intelligibility of the two dysarthria patients improved from 17.81 and 40.14 to 80.24 and 83.44, respectively, without increasing the recording burden on the patients. Compared with two baseline systems, the DVC 3.1 system also achieved better voice conversion performance.
Industrial Applicability
This study proposes a speech augmentation technology to overcome the heavy recording burden on patients, and uses the DVC 3.1 system to enhance patients' speech intelligibility. The data augmentation technology increases the diversity of the training data and can also improve the effectiveness of dysarthric speech recognition. At present, this technology has been successfully transferred to industry for DVC product development, and it will be applied to various speech applications to improve system efficiency and reduce data recording costs.
Keyword Speech assistive devices, data augmentation, voice conversion, text-to-speech, AI hearing assistive devices, Biomedical Engineering, ASR, vocoder
  • Contact
  • Hui-ya Lin