Email of the Laboratory of Acoustic Expertise and Correction
This page contains excerpts of correspondence between the Laboratory and its users.
Below is an excerpt of the correspondence between Kristen, a student at Colorado State University (USA), and Arkadiy Mykolaiovych Prodeus, professor of the Department of Acoustics and Acoustoelectronics. The correspondence is instructive because it lets our Ukrainian students see how persistently and meticulously some American students work on their master's theses.
In short, Kristen's questions reflect her effort to master the use of noise suppression algorithms, the study of their capabilities, and the assessment of speech quality with objective measures such as PESQ, which is widely used in communication systems to assess the quality of communication links.
October, request from Kristen to Arkadiy Prodeus
I am a graduate student in Electrical and Computer Engineering at Colorado State University. I am interested in speech signal processing and have worked on some small speech enhancement techniques, including Wiener filtering, for my Master's thesis.
The reason I am contacting you is that, while I was trying to implement your code, it mentioned that we need to download the PESQ.exe file from the ITU website, but the package available on the website is an implementation in C. No executable file is downloaded from there, so I would like to ask you to send me the PESQ.exe that was used in your implementation of the PESQ algorithm.
I would be very thankful if you could send me the entire code that takes the input and provides a measure of the quality of the enhanced speech. I am interested in your research area and hope to become as expert as you are.
I look forward to hearing from you soon. Thank you for your time and consideration.
October, answer from Arkadiy Prodeus to Kristen
On the web page http://www.mathworks.com/matlabcentral/fileexchange/47333-pesq-matlab-driver you can find the following words: “If you have problems with downloading or compilation, you can try get pesq2.exe from here: https://yadi.sk/d/NwFNZ25RZDTXg”
So, click the link and download it.
November, request from Kristen to Arkadiy Prodeus
Thank you for this function. I was able to successfully resample the file to 16 kHz, but the problem I have is that the PESQ measure gives MOS and LQ scores. I am having a hard time understanding which score is the one that gives the speech quality. I would appreciate it if you could clarify that for me.
Also, I am getting scores of “WB MOS LQO = 1.360” when I compare the clean with the enhanced speech. According to the PESQ algorithm, this seems to be poor. What do you think? Is it common in speech enhancement to get such low scores, or should I work on my noise reduction algorithm?
I sincerely appreciate all your help in this matter.
November, answer from Arkadiy Prodeus to Kristen
In the book P. Loizou, Speech Enhancement: Theory and Practice, 2013, p. 673, you can find:
“If the detected sampling frequency is 8 kHz, then it returns 2 PESQ scores, the raw PESQ score according to ITU P.862 [9] and the MOS-mapped score according to ITU P.862.1[10]. If the detected sampling frequency is 16 kHz, then it returns the MOS-mapped score according to ITU P.862.2 [11], which covers the wideband implementation of the PESQ measure.”
For the mapping rules, see pp. 502-503 (Fig. 11.14 and Eqs. (11.35)-(11.36)) of the book.
Another useful book is N. Cote, Integral and Diagnostic Intrusive Prediction of Speech Quality, p. 75.
So you need to use MOS_LQO for both NB-PESQ and WB-PESQ, because it is the mapped score…
As for your question: “I am getting scores of ‘WB MOS LQO = 1.360’ when I compare the clean with the enhanced speech. According to the PESQ algorithm, this seems to be poor. What do you think? Is it common in speech enhancement to get such low scores or I should work on my algorithm of noise reduction?”
I think the last assumption is more probable and you should work on your algorithm…
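Editorial note: the mapping from the raw PESQ score to MOS-LQO mentioned in this answer can be sketched in a few lines. The sketch below is not the Laboratory's code. It assumes the logistic mapping functions as they are commonly cited from ITU-T P.862.1 (narrowband) and ITU-T P.862.2 (wideband); the function names are hypothetical, and the coefficients should be checked against the recommendations before being relied on.

```python
import math


def pesq_to_mos_lqo_nb(raw_pesq: float) -> float:
    """Narrowband mapping (assumed from ITU-T P.862.1): raw PESQ -> MOS-LQO."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


def pesq_to_mos_lqo_wb(raw_pesq: float) -> float:
    """Wideband mapping (assumed from ITU-T P.862.2): raw PESQ -> MOS-LQO."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.3669 * raw_pesq + 3.8224))


if __name__ == "__main__":
    # Raw narrowband score taken from Kristen's next message.
    print(round(pesq_to_mos_lqo_nb(1.854), 3))
```

With the raw narrowband score of 1.854 that Kristen reports in her next message, this mapping gives about 1.524, which agrees with the NB MOS LQO value she lists there.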
November, request from Kristen to Arkadiy Prodeus
Thank you so much for clarifying my doubts on PESQ scores. After I run the speech enhancement on my wav files, I resample them to 16 kHz according to your reply on resampling. After that I run PESQ, which means I am providing a 16 kHz signal to the PESQ algorithm, but I am getting two scores for NB and one score for WB.
NB PESQ MOS = 1.854
NB MOS LQO = 1.524
WB MOS LQO = 1.156
So, which one should I prefer?
I am trying to work on Two Step Noise Reduction by Cyril Plapous, Claude Marro, and Pascal Scalart, which implements the algorithm in two steps: noise reduction and then harmonic regeneration.
I have attached the article and my version of the code to this email. I would appreciate it if you could have a look and let me know where I can improve this algorithm to get better PESQ scores.
I know I am asking a bit much, but I have to show some improvements in speech intelligibility. Thank you for your time and all the support in this matter.
November, answer from Arkadiy Prodeus to Kristen
> So, which one should I prefer?
Prefer NB MOS LQO = 1.524 if the original wav file was narrowband (3.5-4 kHz), and WB MOS LQO = 1.156 if it was wideband (7.5-8 kHz). As your original file had Fs = 25 kHz, it seems to me you can consider your signal wideband, so WB MOS LQO = 1.156 is the right one.
> …let me know where I can improve this algorithm to get better PESQ scores?
Dear Kristen, it isn’t as simple a task as you think 🙂
Some time ago, I studied the TSNR algorithm and found it isn’t as good as its authors announced. You can find my article here; there I compared the TSNR algorithm with the spectral subtraction, MMSE and logMMSE algorithms, and found that the TSNR algorithm is worse…
Of course, the next step needs to be improvement of the algorithm, but I have no time now to solve this task. Excuse me…
November, request from Kristen to Arkadiy Prodeus
I understand your concerns about the algorithm, and this is exactly what I was suspecting from all the results I have been getting. I would be highly interested in trying other algorithms.
Your article was very good and informative; it compared the four algorithms very well. It seems from figure 3(b) that logMMSE outperformed the other algorithms and was the best among all.
I would like to investigate this kind of comparison as well with my database. Would you be able to send me the Matlab implementation of this work? I would like to see if I can use logMMSE for speech enhancement. I would really appreciate all your help and support in this matter.
November, answer from Arkadiy Prodeus to Kristen
I used Matlab programs from VoiceBox Toolbox in my investigations: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
First, there was the ssubmmse.m Matlab program. There are detailed comments in it, so you can use it without any effort… Of course, its usage requires some other programs and functions, but they are all in the Toolbox, so you can find and download them easily.
Second, the program specsub.m performs speech enhancement using spectral subtraction; I used it as well.
Third, I used the estnoisem.m algorithm for noise spectrum estimation; it is activated by default in the above programs.
November, request from Kristen to Arkadiy Prodeus
Thank you so much for directing me to VoiceBox, it was surely helpful. I was able to locate all the speech enhancement algorithms you mentioned in the email.
Now, when it comes to evaluating the speech quality for these techniques, I would use PESQ for speech quality assessment, right? I had another question relating to PESQ: does the code that you provided to me compute any correlation coefficients?
Also, I read in some of the articles that PESQ scores are computed from disturbance values and average asymmetric disturbance values using PESQ = a0 - a1·Dind - a2·Aind. Is this accurate? If so, where can I find these constants a0, a1 and a2 in your Matlab code (pesq2_mtlb.m file)?
November, answer from Arkadiy Prodeus to Kristen
> I had another question relating to PESQ, from the code that you had provided to me, does it compute any correlation coefficients?
No, the PESQ assessment program does not compute any correlation coefficients.
> Also, I read in some of the articles that PESQ scores are computed using disturbance values and average asymmetric disturbance values using PESQ = a0 - a1·Dind - a2·Aind. Is this accurate? If so, then where can I find these constants a0, a1 and a2 in your Matlab code (pesq2_mtlb.m file)?
The PESQ source code was downloaded from the ITU-T website. So, I think, it is a “pure” PESQ score, without any correction coefficients. You can use it “as is” and say so in your report.
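Editorial note: the linear combination Kristen asks about can be illustrated as follows. This is a sketch and not part of pesq2_mtlb.m or of the ITU-T reference code. The default constants a0 = 4.5, a1 = 0.1 and a2 = 0.0309 are the values commonly quoted in the literature for narrowband P.862 and are stated here as an assumption; if they apply, they would presumably sit inside the compiled reference implementation rather than in the Matlab driver. The function and parameter names are hypothetical.

```python
def raw_pesq_score(d_sym: float, d_asym: float,
                   a0: float = 4.5, a1: float = 0.1, a2: float = 0.0309) -> float:
    """Combine the average symmetric disturbance (d_sym, Kristen's Dind) and the
    average asymmetric disturbance (d_asym, her Aind) into a raw PESQ score.

    The default constants are assumed values, not read from any code in this
    correspondence.
    """
    return a0 - a1 * d_sym - a2 * d_asym


if __name__ == "__main__":
    # Illustrative disturbance values only; they are not measured data.
    print(raw_pesq_score(10.0, 20.0))
```

Under this assumption, zero disturbance gives the maximum raw score a0, and larger disturbances lower the score linearly.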
November, request from Kristen to Arkadiy Prodeus
Thank you for your feedback, I appreciate it. Your response has clarified my doubts about using the results from PESQ. Also, I was reviewing the setup for PESQ in the case of noise suppression algorithms (ITU-T P.835), and it seems that the clean (reference) and enhanced (degraded) signals need to be processed in order to be used by PESQ. Do we need to perform any preprocessing (see figure I.1/P.835 in ITU-T P.835) before using PESQ for speech quality?
As of now, I am just adding 0 dB, 5 dB, 10 dB noise (ssn or babble) to the clean speech, performing noise reduction to obtain the enhanced speech, and then comparing it with the clean speech using your code in Matlab. Is there anything I am missing here? Please clarify.
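Editorial note: the noise-mixing step described in this message can be sketched as below. The sketch assumes that "0 dB, 5 dB, 10 dB" are target signal-to-noise ratios computed over the whole utterance, which the message does not state explicitly. It uses a sine tone and Gaussian noise as stand-ins for real speech and for SSN or babble recordings, and all function names are hypothetical.

```python
import math
import random


def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale noise so that 10*log10(P_clean / P_noise) equals snr_db, then add it."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(mean_power(clean) / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR actually obtained, treating (noisy - clean) as the noise."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(residual))


# Stand-in signals: one second at 16 kHz (assumed sampling rate).
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (0, 5, 10):
    noisy = add_noise_at_snr(clean, noise, target)
    print(target, round(measured_snr_db(clean, noisy), 3))
```

Computing the SNR over the whole file is the simplest choice. For real speech, pauses lower the average power, so a level measured over active speech only would give a different noise gain.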
November, answer from Arkadiy Prodeus to Kristen
First, regarding figure I.1/P.835 in ITU-T P.835: “Reference condition: SNR constant, MNRU varies”. I think it isn’t applicable to you because: “…The Modulated Noise Reference Unit (MNRU) is a reference condition described in ITU–T Rec. P.810 (1996) which simulates quantizing noise produced by logarithmic PCM technique, e.g. ITU–T Rec. G.726 (1990). In this specific case the noise is correlated to the speech signal. The MNRUs are used quite extensively in the assessment of speech codecs.” (Cote N., Integral and Diagnostic Intrusive Prediction of Speech Quality, 2011, p. 219).
In my opinion, you can forget about MNRU, because the quality of the noise reduction algorithm (NRA) is the subject of your master’s thesis.
Another question: “do we need to perform any preprocessing”.
In my opinion, NO, because the PESQ algorithm performs power normalization and time alignment of the degraded and reference signals without our assistance.
But you do need to perform preprocessing, such as power normalization and time alignment of the degraded and reference signals, when you use other quality measures.
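Editorial note: for quality measures that do not align and normalize the signals themselves, the preprocessing mentioned above could look roughly like the sketch below. It is not the Laboratory's code and not what PESQ does internally. It assumes a single constant delay between the signals, found by brute-force cross-correlation, and plain RMS normalization. The function names, the target RMS of 0.1 and the synthetic test signal are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def normalize_rms(x, target_rms=0.1):
    """Scale x so that its RMS level equals target_rms."""
    level = rms(x)
    if level == 0.0:
        raise ValueError("signal has zero power")
    return [v * target_rms / level for v in x]


def best_lag(reference, degraded, max_lag):
    """Lag d (in samples) that maximizes sum_n reference[n] * degraded[n + d]."""
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(degraded):
                score += r * degraded[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def align(reference, degraded, max_lag):
    """Remove a constant delay and trim both signals to a common length."""
    lag = best_lag(reference, degraded, max_lag)
    if lag >= 0:
        ref, deg = reference, degraded[lag:]
    else:
        ref, deg = reference[-lag:], degraded
    n = min(len(ref), len(deg))
    return ref[:n], deg[:n], lag


# Synthetic check: the "degraded" signal is the reference delayed by 3 samples
# and attenuated by a factor of 2.
reference = [math.sin(0.37 * n * n) for n in range(200)]
degraded = [0.0] * 3 + [0.5 * v for v in reference]

ref_a, deg_a, lag = align(reference, degraded, max_lag=10)
ref_n, deg_n = normalize_rms(ref_a), normalize_rms(deg_a)
print(lag, max(abs(a - b) for a, b in zip(ref_n, deg_n)))
```

The brute-force search costs one pass over the signal per candidate lag, which is acceptable for short files and small delays. For long recordings an FFT-based cross-correlation would be the usual choice, and a delay that varies over time needs segment-wise alignment.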
Here is one more example of correspondence, this time with a student from Iran.
November, request from Arash to Arkadiy Prodeus
Theme: Sensitivity of Automatic Speech Recognition to Excessive Noise and Late Reverberation Reduction
I would be grateful if you could give me your code.
November, answer from Arkadiy Prodeus to Arash