Research on Speech Dereverberation Methods Based on Deep Neural Networks

Abstract (English)
Abstract (Chinese)
List of Symbols
List of Abbreviations
1. Introduction
    1.1 Scope and Motivation
    1.2 Room Acoustics and Reverberation Evaluation
    1.3 Effects of Reverberation
        1.3.1 Effects of Reverberation on Speech Perception
        1.3.2 Effects of Reverberation on Automatic Speech Recognition
    1.4 Signal Model and Problem Formulation
    1.5 Related Works
        1.5.1 Speech Enhancement Approaches
        1.5.2 Channel Inversion and Equalization
        1.5.3 Beamforming Using Microphone Arrays
    1.6 Contributions and Organization
        1.6.1 Contributions
        1.6.2 Organization
2. Deep Neural Network Approach to Speech Dereverberation
    2.1 Introduction
    2.2 System Overview
        2.2.1 DNN Training Procedure
        2.2.2 Feature Extraction
        2.2.3 Waveform Reconstruction
    2.3 Output Layer Activation and Target Feature Normalization
        2.3.1 Sigmoid Activation and Min-Max Normalization
        2.3.2 Linear Activation and Mean-Variance Normalization
    2.4 Experiment and Analysis
        2.4.1 Evaluation with 40-Hour Training Data
        2.4.2 Evaluation with 4-Hour Training Data
    2.5 Discussions on Generalization Capabilities
        2.5.1 Generalization to Room Sizes
        2.5.2 Generalization to Loudspeaker and Microphone Positions
        2.5.3 Generalization to Recorded RIRs
    2.6 The Importance of Phase in Speech Dereverberation
        2.6.1 Experimental Validation
        2.6.2 Frequency Sampling of STFT on Unwrapped Phase
    2.7 Conclusion
3. Reverberation-Time-Aware DNN Approach to Speech Dereverberation
    3.1 Introduction
    3.2 System Overview
    3.3 Key Parameters in DNN Dereverberation
        3.3.1 Frame Shift Size in Speech Framing
        3.3.2 Acoustic Context Window Size at DNN Input
    3.4 Experiment and Analysis
        3.4.1 Frame-Shift-Aware DNN (FSA-DNN oracle)
        3.4.2 Acoustic-Context-Aware DNN (ACA-DNN oracle)
        3.4.3 Reverberation-Time-Aware DNN (RTA-DNN oracle)
        3.4.4 Reverberation-Time-Aware DNN (Estimated RT60)
    3.5 Discussions on Generalization Capabilities
        3.5.1 Generalization to Room Sizes
        3.5.2 Generalization to Loudspeaker and Microphone Positions
        3.5.3 Generalization to Recorded RIRs
    3.6 Conclusion
4. Reverberation-Time-Aware DNN Approach for Microphone Array Dereverberation
    4.1 Introduction
    4.2 System Overview
        4.2.1 Standard Multi-Microphone DNN-based Systems
        4.2.2 Proposed Multi-Microphone DNN-based System (DNNSpatial)
        4.2.3 Proposed Reverberation-Time-Aware DNNSpatial (RTA-DNNSpatial)
    4.3 Experiment and Analysis
        4.3.1 DNNSpatial
        4.3.2 RTA-DNNSpatial
    4.4 Discussions on Generalization Capabilities
        4.4.1 Generalization to Room Size
        4.4.2 Generalization to Array Geometry
        4.4.3 Generalization to Loudspeaker Position
        4.4.4 Robustness to RT60 Estimation Error
    4.5 Conclusion
5. End-to-End Deep Learning for Speech Dereverberation and Recognition
    5.1 Introduction
    5.2 System Overview
        5.2.1 Reverberant Speech Characteristics
        5.2.2 Dereverberation Module
        5.2.3 Recognition Module
        5.2.4 End-to-End Dereverberation and Robust Speech Recognition
    5.3 Experimental Setup
        5.3.1 Dereverberation Module Configuration
        5.3.2 Recognition Module Configuration
    5.4 Experimental Results
        5.4.1 Speech Dereverberation Results
        5.4.2 ASR Results with Clean-Condition Training
        5.4.3 ASR Results with Multi-Condition Training
        5.4.4 ASR Results with Multi-Channel-Condition (MCC) Training
        5.4.5 ASR Results with MCC Training and MCC Testing
        5.4.6 A Preliminary Investigation with Real Recordings
    5.5 Conclusion
6. Conclusion
    6.1 Contributions
    6.2 Suggestions for Future Research
References
Acknowledgement
About the Author

The thesis has 143 pages in total.