Automation of Subjective Measurements of Logatom Intelligibility in Classrooms

Many rooms are dedicated to voice communication between a single speaker and a group of listeners. Such rooms may be as small as meeting rooms and classrooms or as large as auditoriums and theatres. They are expected to provide the highest possible speech intelligibility for all listeners. Classrooms are a typical example of rooms where very good speech intelligibility is required: a teacher talks to a group of students who want to understand what the teacher says. To determine the intelligibility of a room (its intelligibility map), measurements must be taken at many points in the room. The number of measurement points depends on the size of the room and on the required precision of the intelligibility map. Despite considerable progress in instrumental measurement techniques, the only reliable method, subjective measurement of speech intelligibility, is still very time-consuming and expensive, and it demands high skills and a specially trained group of listeners. The first part of this chapter presents the idea of subjective speech intelligibility measurements. Such measurements are taken by a properly trained team according to the described standards, under conditions that have to be controlled and repeatable. The subsequent sections focus on one of the classic subjective methods of measuring speech intelligibility in rooms (classrooms) and on its automated version, the modified intelligibility test with forced choice (MIT-FC). Finally, the last section compares the results of subjective speech intelligibility measurements in rooms taken with the classic and the automated method, and gives the relation between the intelligibility obtained with the classic method and that obtained with the forced-choice method. This relation makes it possible to compare results taken with both methods and to use relations known from earlier research on speech intelligibility.
However, the biggest advantage of automating the speech intelligibility measurement is the shorter measurement time and the possibility of taking simultaneous measurements at several points of the room. The speech intelligibility result is available immediately after the measurements end, so an intelligibility map can be obtained within a few minutes. The precision increases with the number of listeners at each point.


Introduction
Speech quality is a multi-dimensional term and a complex psycho-acoustic phenomenon within the process of human perception. Every person interprets speech quality in a different way. The pioneering work on speech intelligibility was carried out by Fletcher at Bell Labs in the early 1940s. Fletcher and his team established not only the effects of bandwidth on intelligibility but also the degree to which each octave and ⅓-octave band contributes.
Speech intelligibility is one of the fundamental parameters for assessing the quality of speech signal transmission in analogue and digital telecommunication chains, in rooms and sound reinforcement systems, and in the selection of hearing aids (ANSI, 2009; Basciuk & Brachmanski, 1997; Brachmanski, 2002, 2004; Davies, 1989; International Organization for Standardization [ISO], 1991; Majewski, 1988, 1998, 2000; Polish Standard, 1991, 1999; Sotschek, 1976). Satisfactory speech intelligibility should be provided by telecommunication channels, rooms and hearing aids. Speech intelligibility concerns only the linguistic information (i.e. what was said) and does not take into account features of speech such as its naturalness or the individuality of the speaker's voice. Nevertheless, intelligibility should be, and up to now is, viewed as the basic and most important aspect of the quality of all systems which transmit, code, enhance and process speech signals. Satisfactory speech intelligibility requires adequate audibility and clarity.
The aim of aural evaluation is the quantitative evaluation and qualitative differentiation of the acoustical signals reaching a listener. The process of aural evaluation can be presented as
$$B \rightarrow S \rightarrow R$$
where $B$ is the stimulus reaching the listener, $S$ is the listener and $R$ is the listener's reaction. The reaction $R$ depends on the stimulus $B$ reaching the listener's receptors and on the conditions in which the listener is placed. Generally it can be assumed that the reaction $R$ depends on the sum of the external stimuli acting on the listener and the internal factors acting on his organism. That assumption, however, does not take into consideration the listener's characteristic features such as cognitive abilities, rate of information processing, memory etc. The reaction can then be presented as a function of both the stimulus and the listener:
$$R = f(B, S)$$
The physiological and psychological process connected with a reaction to a sound (audio) signal consists of a sensational reaction and an emotional reaction. The total reaction of the listener is the sum of both types of reaction.
The sensational reaction is the effect of a physiological process which occurs during listening. It arises when a stimulus exceeds the sensitivity thresholds or the thresholds of an aural sensation category. The emotional reaction is more complex and more difficult to analyse because it is not a direct result of the features of the received signal but of the listener's habits and individuality. In conclusion, the sensational reaction reflects the acoustical picture created in the listener's mind, whereas the emotional reaction reflects the listener's attitude to that picture. Psychological and psychoacoustical research has shown that, when the evaluation conditions are stable in time, the differences between the sensational reactions of individual listeners are substantially smaller than the differences between their emotional reactions. Therefore, one aim of the objectivisation of aural evaluation is to limit the influence of the emotional reaction on the final assessment result. That aim is achieved by collecting a sufficiently large number of assessments for statistical treatment, by properly choosing and training the listening team, and by properly choosing the test material and the rules for carrying out the listening tests. The analysis of the results also plays a big role in minimising the influence of the emotional reaction on the assessment result.
Among the different subjective methods that have been proposed for the assessment of speech quality in rooms, the preferred ones are based on intelligibility tests or listening-only tests (ITU-T, 1996a, 1996b; ITU-R, 1997). The results of a subjective measurement should depend mostly on the physical parameters of the tested room and not on the structure of the tested language material. Semantic information is eliminated by using lists of logatoms (i.e. pseudo-words), from which the logatom intelligibility is obtained. The problem is that speech (logatom) intelligibility is not a simple parameter to measure.

The traditional method of logatom intelligibility
The measurement of logatom intelligibility consists in transmitting logatom lists, read out by a speaker, through the tested channel (room). The listeners write down the received logatoms, and the correctness of the record is checked by a group of experts who calculate the average logatom intelligibility. It is recommended to use lists of 50 or 100 logatoms (Fig. 1).
Logatom lists are based on short nonsense words of the CVC type (consonant-vowel-consonant). Sometimes CV only, but also CVVC and CVCVC words are used. The logatoms are presented in isolation or in a carrier phrase, e.g. "Now, please write down the logatom you hear". Each list should be phonetically and structurally balanced (Fig. 2). The measurement should be carried out in rooms in which the level of internal noise together with external noise (not introduced on purpose) does not exceed 40 dBA. If no requirements as to background noise are specified for the tested chain, articulation should be measured at a noise level of 60 dBA in the receiving room, with noise of the Hoth spectrum (Fig. 3).
The listeners should be selected from persons who have normal, good hearing and normal experience in pronunciation in the language used in the test. A person is considered to have normal hearing if her/his hearing threshold does not exceed 10 dB at any frequency in the band 125 Hz - 4000 Hz and 15 dB in the band 4000 Hz - 6000 Hz. The hearing threshold should be tested by means of a diagnostic audiometer.
Fig. 3. The room noise power density spectrum (Hoth noise) (Polish Standard, 1991, 1999).
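The normal-hearing criterion quoted above can be expressed as a small check. This is only a sketch: the function name and the sample audiogram are hypothetical, and it assumes that 4000 Hz, which belongs to both bands in the text, is judged against the stricter 10 dB limit.

```python
# Hypothetical helper illustrating the normal-hearing criterion.
# Assumption: 4000 Hz itself is judged against the stricter 10 dB limit.

def has_normal_hearing(thresholds_db):
    """thresholds_db maps audiometric frequency (Hz) to hearing threshold (dB)."""
    for freq_hz, level_db in thresholds_db.items():
        if 125 <= freq_hz <= 4000:
            limit_db = 10
        elif 4000 < freq_hz <= 6000:
            limit_db = 15
        else:
            continue  # frequencies outside 125-6000 Hz are not part of the criterion
        if level_db > limit_db:
            return False
    return True


# Made-up audiogram, for illustration only (not data from the chapter).
audiogram = {125: 5, 250: 5, 500: 0, 1000: 5, 2000: 10, 4000: 10, 6000: 15}
print(has_normal_hearing(audiogram))
```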
The size of the listening group should be such that the averaged test results do not change as the group size is further increased (minimum 5 persons). The listeners who are to take part in logatom intelligibility measurements should be trained (2-3 training sessions are recommended). Logatoms should be spoken clearly and equally loudly, without accenting their beginnings or ends. The time interval between individual logatoms should allow the listener to record the received logatom at leisure; pauses of 3-5 s between logatoms are recommended. The time interval between sessions should not be shorter than 24 h and not longer than 3 days. The total duration of a session should not exceed 3 hours (including 10-minute breaks after each 20-minute listening period).
Listeners write the received logatoms on a special form, which also records the date of the test, the test list number, the speaker's name or symbol (no.), the listener's name and any additional information which the measurement manager may need from the listener. The writing should be legible to prevent wrong interpretation of a logatom.
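The received logatoms may be written in phonetic transcription (a group of specially trained listeners is needed for this) or in the orthographic form specific to the given language. In the next step, the group of experts checks the correctness of the received logatoms, and the average logatom intelligibility is calculated according to Eqs. (3) and (4):
$$W_L = \frac{1}{N K} \sum_{n=1}^{N} \sum_{k=1}^{K} W_{n,k} \; [\%] \qquad (3)$$
$$W_{n,k} = \frac{P_{n,k}}{T_k} \cdot 100 \; [\%] \qquad (4)$$
$$s = \sqrt{\frac{1}{N K - 1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( W_{n,k} - W_L \right)^2} \qquad (5)$$
where $N$ is the number of listeners, $K$ the number of test lists, $W_{n,k}$ the logatom intelligibility for the n-th listener and the k-th logatom list, $P_{n,k}$ the number of logatoms from the k-th list correctly received by the n-th listener, and $T_k$ the number of logatoms in the k-th list. The standard deviation $s$, calculated according to Eq. (5), expresses the spread of the logatom intelligibility values $W_{n,k}$ over listeners.
If $|W_{n,k} - W_L| > 3s$, the result of that measurement is not taken into account when the average intelligibility is calculated, and $W_L$ and $s$ must be recalculated according to Eqs. (3) and (5) for the reduced number of measurements.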
The obtained average logatom intelligibility value can be used to determine the quality class according to Table 1.

Modified intelligibility test with forced choice (MIT-FC)
In the traditional intelligibility tests the listeners write down the received utterances (in orthographic form) on a sheet of paper, and a professional team then checks the results and calculates the average intelligibility. This is the most time-consuming and difficult operation. To eliminate manual checking of the tests, a method called the "modified intelligibility test with forced choice" (MIT-FC) has been designed and investigated in the Institute of Telecommunications, Teleinformatics and Acoustics at the Wroclaw University of Technology.
In the MIT-FC method all experiments are controlled by a computer. Automating the subjective measurement entails a basic change in the way logatoms are generated and in the way a listener makes a decision. The computer presents the utterances one after another to the listeners via a D/A converter and a loudspeaker (for the logatom test, a list consists of 100 phonetically balanced nonsense words), and for each spoken utterance it displays several logatoms that have previously been selected as perceptually similar.
It has been found (Brachmanski, 1995) that the optimal number of logatoms presented visually to the listeners is seven (six alternative logatoms and the one transmitted logatom to be recognized). The listener chooses one logatom from the list displayed on the computer monitor. The computer counts the correct answers and calculates the average logatom intelligibility and the standard deviation. The measurement time for one logatom set (3 lists) consisting of 300 logatoms is 20 minutes.
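The forced-choice step can be illustrated with a minimal sketch: the transmitted logatom is mixed with six alternatives, the listener picks one of the seven, and the percentage of correct picks is counted. The function names are hypothetical, the logatoms are invented placeholders rather than items from the actual lists, and the random ordering of the on-screen choices is an assumption.

```python
import random


def build_trial(target, alternatives, rng):
    """Return the seven on-screen choices (target plus six alternatives), shuffled."""
    if len(alternatives) != 6:
        raise ValueError("MIT-FC presents exactly six alternative logatoms")
    choices = [target] + list(alternatives)
    rng.shuffle(choices)  # assumption: the display order is randomised
    return choices


def score_session(trials, responses):
    """Percentage of trials in which the chosen item equals the transmitted logatom.

    trials    -- list of (target, choices) pairs
    responses -- list of chosen indices (0..6), one per trial
    """
    correct = sum(
        1 for (target, choices), picked in zip(trials, responses)
        if choices[picked] == target
    )
    return 100.0 * correct / len(trials)


rng = random.Random(1)
# Invented placeholder logatoms, not taken from the real test lists.
target = "sep"
choices = build_trial(target, ["zep", "tep", "dep", "kep", "gep", "fep"], rng)
trials = [(target, choices)]
responses = [choices.index(target)]  # a listener who recognised the logatom
print(choices, score_session(trials, responses))
```

With seven alternatives, pure guessing yields one correct answer in seven on average, so MIT-FC scores and traditional scores are not expected to coincide at low intelligibility.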
All measurement procedures are fully automated, and the operator can flexibly set the measurement parameters and options. The application which implements the MIT-FC method can also be upgraded with more sophisticated processing of the scores. A block diagram of the system for subjective measurements of logatom intelligibility in rooms by the MIT-FC method is presented in Fig. 4. The average logatom intelligibility value obtained with the MIT-FC method can be used to determine the quality class according to Table 2.

MIT-FC measurement system
The program for the subjective assessment of speech transmission in rooms by the logatom intelligibility method with forced choice (MIT-FC) is based on TCP client/server technology, i.e. communication takes place over a local network. The program requires a PC with Windows 9x, a network card, and a hub for communication between the Server and the Clients (the members of the listening team).
Work with the program starts with the installation of the Server program on the Server computer. The Server is supervised by the person leading the subjective assessment. The next step is the installation of the Client program on the Client (listener) computers, which are used by the members of the listening team. Before the assessment starts, the Server and Client programs should be configured. Configuring the Server program requires giving the path to the directory with the signal files (the test signals, i.e. logatoms), the number of logatoms per session, the interval between reproduced logatoms, the number of sessions, and the port for communication with the Clients (usually 3000) (Fig. 5).
The Client program is configured by giving the name of the measurement point (e.g. a room, the location of the listener in the room, etc.), the listener's login, the IP address and the port number (usually 3000) (Fig. 6). The next step after configuration is the connection process. The number of people logged on can be seen on the right-hand side of the main window of the program. During a multi-session assessment (the number of sessions is chosen during the configuration of the Server), the program waits after the first session for the measurement positions to be reconfigured. For example, to change the listener at a certain measurement position, the Client should first be disconnected and the user name changed. After this, choosing the Continue option starts the next session. The computer plays the logatoms and, for each spoken utterance, displays six alternative logatoms and the one transmitted logatom to be recognized (Fig. 7a). The listener chooses one logatom from the list displayed on the computer monitor (Fig. 7b). During the tests, the listener confirms the response with one of the keys '1' to '7'. All other keys are inactive throughout the testing session.
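The configuration items and the key handling described above can be summarised in a short sketch. It does not reproduce the actual Server and Client programs: the class names, field names, directory path and IP address are hypothetical, and only the default port 3000 and the active keys '1' to '7' come from the text.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    signal_dir: str            # directory with the logatom signal files
    logatoms_per_session: int
    interval_s: float          # interval between reproduced logatoms, in seconds
    sessions: int
    port: int = 3000           # "usually 3000" according to the text


@dataclass
class ClientConfig:
    measure_point: str         # e.g. room and listener location
    login: str                 # listener's login
    server_ip: str
    port: int = 3000


def key_to_choice(key):
    """Map keys '1'..'7' to choice indices 0..6; any other key is ignored (None)."""
    if len(key) == 1 and key in "1234567":
        return int(key) - 1
    return None


# Hypothetical values, for illustration only.
server = ServerConfig("signals/logatoms", logatoms_per_session=100, interval_s=4.0, sessions=3)
client = ClientConfig("Room A, Mp1", "listener01", "192.168.0.10")
print(server.port == client.port, key_to_choice("3"), key_to_choice("x"))
```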
After all measurement sessions are finished, a dialog window appears which presents the results in two options:
1. Result of session nr: the number of the session for which the results are to be shown should be given.
2. Summary: a summary of all sessions with a detailed list of listeners and measurement points is shown.
Fig. 7. The example of the listener's window.

Experiments
The goals of the experiment were:
- to decide whether the results of the traditional method and of the modified method with forced choice allow finding a relation which would make it possible to convert results from one method to the other and to classify rooms tested with both methods,
- to measure the experimental relations between the traditional and the modified logatom intelligibility methods.
Given the comparative character of the experiment, it was planned so that only the logatom intelligibility method varied. To this end, each measurement was done with both the traditional and the modified method without changing the surroundings or the measurement system (only the logatom lists were changed). The subjective tests were done according to Polish Standard PN-90/T-05100 with a team of 12 listeners aged 18 to 25 years. The listening team was selected from students of Wroclaw University of Technology with normal hearing. The qualification was based on audiometric tests of the hearing threshold. The measurements of logatom intelligibility were done using the traditional method and the MIT-FC method.
The measurements were taken in two unoccupied rooms (Fig. 8). In each room, four measurement points (Mp) were selected. These positions were chosen in the expectation of yielding a wide range of logatom intelligibility. The sound sources (voice and white or pink noise) were positioned in the part of the room normally used for speaking. One loudspeaker was the voice source and the second was the noise source. The various conditions were obtained by combining 14 levels of white and pink noise with the four measurement points. The 14 signal-to-noise ratios (SNR, in dB) used were: 39 (without added noise, background noise only), 36, 33, 30, 27, 24, 21, 18, 15, 12, 9, 6, 3, ...
The testing material consisted of phonetically and structurally balanced lists of logatoms and sentences uttered by a professional male speaker whose native language was Polish. The logatom lists reproduced through the loudspeaker were recorded on a digital tape recorder in an anechoic chamber using a linear omnidirectional microphone. The microphone was positioned 200 mm from the speaker's lips. The active speech level was controlled during recording with a meter conforming to Recommendation P.56 (ITU-T, 1993). At the beginning of each logatom set recording, a 20-second 1000 Hz calibration tone and 30 seconds of pink noise were inserted at a level equal to the mean active speech level.
For each measurement point (Mp), i.e. the place where the listening position was situated, a list of 300 logatoms was prepared. The same logatom list was never played for a given condition on consecutive days. The logatom lists reproduced at the four listener locations were recorded on the digital tape recorder, and these recordings were later played back to the subjects over headphones. This way of carrying out the subjective measurements provides the same listening conditions for both the traditional and the forced-choice method. In each room, for each listener position (Mp) and each signal-to-noise ratio (SNR), the logatom intelligibility was obtained by averaging the results of the group of listeners.
The listening tests were done in the studio of the Institute of Telecommunications, Teleinformatics and Acoustics of Wroclaw University of Technology (Fig. 9). The background noise level did not exceed 40 dBA. The subjective measurements of logatom intelligibility were taken under binaural listening conditions, using headphones, at the optimum speech signal level of 80 dBA.
Prior to the proper measurements the listening team underwent 6 hours of training (two 3-h sessions). The duration of a measurement session did not exceed 3 h (including 10 min breaks after every 20 min of listening). The subjective logatom intelligibility results obtained for different signal-to-noise ratios are presented graphically in Fig. 10. The curves representing the relationship between logatom intelligibility and the signal-to-noise ratio in a room were approximated by a fourth-degree polynomial calculated by the least-squares method. Fig. 10 also includes the values of the coefficient of determination $R^2$, a measure of the conformity between the polynomial and the results of the subjective tests. There is very good agreement between the fitted curve and the empirical results; the value of $R^2$ exceeds 0.99 in each case. For a few randomly selected measurement points the distribution of the $W_{n,k}$ values was compared with the normal distribution. The agreement between the two distributions was tested with the Kolmogorov-Smirnov test. It was found that at the significance level α = 0.05 there are no grounds to reject the hypothesis of goodness of fit. Thus, it is reasonable to use the average logatom intelligibility $W_L$ as an estimator of the logatom intelligibility for a given measurement point.
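The fourth-degree least-squares fit and the R^2 measure mentioned above can be sketched in plain Python. The sketch solves the normal equations by Gaussian elimination and computes R^2 as 1 - SS_res/SS_tot, which is assumed here to correspond to the coefficient reported in the figures. All function names are hypothetical, the SNR axis is rescaled to [0, 1] to keep the system well conditioned, and the intelligibility values are a synthetic placeholder curve, not measurements from the chapter.

```python
import math


def polyfit(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = acc / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the polynomial with Horner's scheme (coefficients in ascending order)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def r_squared(x, y, coeffs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - polyval(coeffs, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# SNR values listed in the text; intelligibility is a synthetic placeholder curve.
snr_db = [39, 36, 33, 30, 27, 24, 21, 18, 15, 12, 9, 6, 3]
w_l = [100.0 * (1.0 - math.exp(-s / 12.0)) for s in snr_db]
x = [s / 39.0 for s in snr_db]  # rescaled to [0, 1]
coeffs = polyfit(x, w_l, 4)
print("R^2 =", r_squared(x, w_l, coeffs))
```

The same routine could be applied to pairs of MIT-FC and traditional scores to obtain a conversion curve between the two methods.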
The main goal of the presented research was to assess whether a relation exists between the results of the traditional method and those of the modified intelligibility test with forced choice (MIT-FC) in rooms and, if so, to find it. The results revealed a monotonic relation between the results of the traditional method and of MIT-FC. The results were plotted in the plane of MIT-FC logatom intelligibility versus traditional logatom intelligibility and approximated by a fourth-order polynomial. The obtained relations are presented in Fig. 11, which also includes the values of the coefficient of determination $R^2$, a measure of the conformity between the polynomial and the results of the subjective tests.

Conclusion
The presented MIT-FC method offers a simple, easy-to-use, stable and fully automated system for the assessment of speech quality in rooms. The results of the experiments have shown that the MIT-FC method is very useful in the evaluation of speech quality in rooms. The time needed to carry out a measurement with the MIT-FC method is the same as with the traditional one, but the results are available right after the measurement process is finished.
The experiments carried out to find the relation between the logatom intelligibility measured with the traditional method and with the semi-automatic forced-choice method in rooms have shown that a monotonic and repeatable relation exists between them. This allows both methods to be used interchangeably and results to be converted between them.
The obtained relations are applied in the Institute of Telecommunications, Teleinformatics and Acoustics of Wrocław University of Technology to the design of subjective tests for the verification of results yielded by a new objective method based on automatic speech recognition techniques.

Fig. 2. Example of phonetic balance of two sets of three lists of 100 logatoms each for the Polish language.

Notation for the logatom intelligibility equations: $N$ - number of listeners, $K$ - number of test lists, $W_{n,k}$ - logatom intelligibility for the n-th listener and the k-th logatom list, $P_{n,k}$ - number of logatoms from the k-th logatom list correctly received by the n-th listener, $T_k$ - number of logatoms in the k-th logatom list.

Fig. 4. The measuring system for the assessment of logatom intelligibility in a room.

Fig. 5. The example of the window of the Server program.
The measurement conditions were obtained as 14 SNR × 4 Mp × 2 rooms. The speech and noise signal levels were controlled by means of a Bruel & Kjaer 2606 instrument, measured on a logarithmic scale with the A-weighting curve.

Fig. 11. Relationship between logatom intelligibility measured with the traditional and the MIT-FC method for rooms.

Table 1. Speech intelligibility quality classes for analog channels in the traditional logatom intelligibility method.

Table 2. Classes of speech intelligibility quality for analog channels for the MIT-FC method.