The Advanced Voice Function Assessment Databases (AVFAD): Tools for Voice Clinicians and Speech Research

A new open access resource called Advanced Voice Function Assessment Databases (AVFAD) was developed, based on a sample of 709 individuals (346 clinically diagnosed with vocal pathology and 363 with no vocal alterations) recruited in Portugal. All clinical conditions were registered according to the Classification Manual of Voice Disorders-I. Participants were audio-recorded producing the following vocal tasks: sustaining the vowels /a, i, u/; reading six CAPE-V sentences; reading a phonetically balanced text; and spontaneous speech. The AVFAD comprise 8648 uncompressed audio files and an additional database file with 19 Praat Voice Report parameter values and 16 clinical data entries per participant. An annotated segment of the vowel /a/ for each participant was analysed automatically with a Praat script. Radial graphs were generated considering that all variables had an approximately normal distribution, and using previously calculated average and standard deviation values for all parameters. The normal and pathological f0 mean, Jitter ppq5, Shimmer apq11 and Harmonics-to-Noise Ratio characteristics were compared. An additional analysis of the relation between the acoustic parameters and gender, age group, smoking habits, body mass index and voice usage was considered. The AVFAD will allow future cooperative work and testing of non-invasive methods for voice pathology diagnosis.


Introduction
The multidimensionality of voice requires the use of several types of evaluation and measures to allow the correct characterization of vocal quality [2]. The instrumental evaluation of voice [29] is considered one of the most important elements for a correct vocal diagnosis and must precede intervention. It should include perceptive, acoustic, physiological and aerodynamic evaluation, and a self-assessment of vocal quality.
Acoustic voice analysis [20,21] is an effective and noninvasive tool that can be used to confirm an initial diagnosis and provide an objective determination of the impairment [38]. It is also an important tool for the early detection and treatment of laryngeal tumors, which can reduce both morbidity and mortality.
The collection of voice databases for testing and comparing the analysis methods is regarded as an important research area. However, despite the variety of models and methods developed by signal processing engineers, voice clinicians still express their disappointment with regard to the performance of the existing approaches for assessing voice quality.
Reference acoustic databases allow the standardization of acoustic analysis and the benchmarking and comparison of different voice analysis techniques. They also make it possible to differentiate normal from pathological voice, to evaluate and monitor it clinically, and to diminish the subjectivity that underlies the acoustic-perceptual analysis [22] by establishing a correlation between quantitative data. Results can be interpreted reliably, as long as they are collected with the same equipment, and the same data collection methods and recording techniques are used [43]. However, the reliability of acoustic analysis of the voice signal is still hindered by the "scarcity of sufficiently comprehensive databases" [38, p. 4].
A database of normal and pathological voices is a reference for the identification of clinically relevant perturbations in voice quality and data collection and analysis suitability developed for specific applications [26, p. 131].
The most widely used clinical graphical and numerical representation of normal and pathological voices [47] is the Multidimensional Voice Program (MDVP), and even when the acoustic analysis of voice is performed with freeware [30], reference values from the MDVP can be found in the manuals. However, these values should be used with great caution because they are based on only 15 normal voices [47, p. 227] and "may not be appropriate for various age-sex subpopulations. At this time, the MDVP normative values should be regarded as preliminary and not as commonly recognized criteria by which abnormality is established. However, the concept of an integral database is important" [26, p. 135].
The University of Aveiro in Portugal collected, annotated and analyzed the Advanced Voice Function Assessment Databases (AVFAD), an open access resource that facilitates vocal evaluation and represents the first normative database for European Portuguese (EP). Databases collected by clinicians enable the interpretation of automatically extracted descriptors of the speech signal and lead to the development of models for the interaction of these descriptors.
One of the purposes of this book chapter is to compare, acoustically, participants with normal voice to participants with voice pathology, regarding the parameters fundamental frequency (f0) mean (Hz), Jitter ppq5 (%), Shimmer apq11 (%) and Harmonics-to-Noise Ratio (HNR) in dB. An additional analysis of the relation between these two groups and participants' demographic data was considered, including gender, age group, smoking habits, body mass index (BMI), and voice usage. Generally, the main goal was to study EP speakers' acoustical characteristics and verify if it is possible to differentiate voice disorders through an acoustical analysis of voice.
The normative voice data presented in this book chapter is important to typify voice pathologies and when evaluating treatment success. For instance, it has long been known [32] that the voices of speakers with organic disorders of the larynx have higher Jitter and Shimmer and a lower HNR relative to the voices of normal speakers [8].
The AVFAD are distributed freely using a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, through the Advanced Communication and Swallowing Assessment (ACSA) platform at http://acsa.web.ua.pt/.

Method
The work reported in this book chapter is part of a larger ongoing project of the University of Aveiro in Portugal, which aims to build and validate a comprehensive set of resources for voice clinicians, including a standardized voice case history form [11,24], a voice evaluation protocol [1,25] and a reference voice database (AVFAD).
The sample used in this study includes 709 individuals, 346 of whom were clinically diagnosed with vocal pathology and 363 of whom had no vocal alterations, matched for gender and chronological age. Healthy controls were recruited at hospitals, from the University of Aveiro (UA) staff and students, and from institutions with UA protocols. The recruitment process took place in the otorhinolaryngology departments of three hospitals that have a long-standing cooperation with the UA. Local clinicians discussed their medical diagnoses with the research team, and sociodemographic and anthropometric information was collected.
All clinical conditions were classified with the wording and numeric coding system proposed by Verdolini et al. [46]. The Classification Manual of Voice Disorders-I (CMVD-I) "lists most conditions that may negatively affect a patient's ability to produce voice, based on current understanding" [46, p. 2]. CMVD-I's Dimension 1 uses nine categories to classify these conditions [46, p. 4]. The 346 participants clinically diagnosed with vocal pathology are distributed as uniformly as possible across these categories. Participants were recruited through a convenience sampling method, fulfilling a set of predefined inclusion criteria: aged 18 or older; Portuguese nationality; and EP as mother tongue.
Verdolini et al.'s [46] classification was derived from notes on diagnosis collected from the local hospitals' voice clinicians. These notes were carefully analyzed by two independent speech and language therapists (SLTs), who reached a consensus after clarifying some participants' diagnoses with the original clinical team.
Chronological age and gender matching participants with vocal pathology and healthy participants was implemented in 5-year clusters [15], that is, each participant with vocal pathology with a certain age and gender was matched to a control with the same gender and age within a 5-year range. For example, participants with vocal pathology aged 18-22 years were matched to controls within the same age range.
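The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used by the authors: it assumes that the clusters are consecutive, five years wide and start at the minimum inclusion age of 18 (the text states only the 18-22 range), and the function names and tuple layout are hypothetical.

```python
from collections import defaultdict

MIN_AGE = 18       # inclusion criterion stated in the text
CLUSTER_WIDTH = 5  # assumption: consecutive 5-year clusters starting at 18


def age_cluster(age):
    """Return the (low, high) bounds of the 5-year cluster containing `age`."""
    if age < MIN_AGE:
        raise ValueError("participants must be aged 18 or older")
    low = MIN_AGE + ((age - MIN_AGE) // CLUSTER_WIDTH) * CLUSTER_WIDTH
    return (low, low + CLUSTER_WIDTH - 1)


def match_controls(patients, controls):
    """Pair each patient with an unused control of the same gender and cluster.

    Both arguments are lists of (participant_id, gender, age) tuples.
    Returns a dict patient_id -> control_id (None when no control is left).
    """
    pool = defaultdict(list)
    for control_id, gender, age in controls:
        pool[(gender, age_cluster(age))].append(control_id)

    pairs = {}
    for patient_id, gender, age in patients:
        candidates = pool[(gender, age_cluster(age))]
        pairs[patient_id] = candidates.pop(0) if candidates else None
    return pairs
```

Under these assumptions a 20-year-old female patient and a 22-year-old female control fall in the same 18-22 cluster and can be paired, whereas a 23-year-old control belongs to the next cluster (23-27).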
Informed consent was collected from all participants prior to any data collection, authorizing the use of recordings for the present study and also for other studies and by other researchers in the area of voice. The following participants' clinical and demographic data were then registered: smoking habits, age group, BMI, gender, and voice usage.
All participants were sitting in a comfortable chair, so they were as static as possible during the recording. A microphone was held on a tripod placed at a distance of 30 cm [44] from the participant's mouth (on-axis to the lips). Acoustic signals resulting from different voice assessment tasks were recorded via a Behringer ECM 8000 omnidirectional electret microphone connected to an audio interface (Presonus AudioBox USB; AudioBox Driver Version 1.57.0.5385; 16 bits and 48,000 Hz sampling frequency), using Praat version 5.3.56 [4].
Participants were recorded producing the following vocal tasks: production of sustained vowels /a, i, u/ (three repetitions each); reading of the six Portuguese CAPE-V sentences [19] (three repetitions each); reading a phonetically balanced text [23]; and spontaneous speech.
Raw recordings were segmented into eleven .wav files (one for each speech sample), using Audacity 2.0.5 and Praat 5.4.04. The /a/ vowel repetition considered to be closest to the speaker's natural voice and produced with a comfortable pitch and volume was selected for analysis, and an interval corresponding to one hundred consecutive cycles, two hundred milliseconds after phonation began, was annotated and analyzed automatically (through a script written specifically for this purpose) with Praat version 5.4.08.
Firstly, for all files and for a 75-ms section of Praat's editor window, the incorrect identification of the periods by the program (e.g., situations in which period-doubling or period-halving occurred) was monitored for each participant. Those whose samples did not allow the correct identification of the periods or did not present a segment in which it was possible to identify, in the same sequence, 100 cycles of oscillation of the vocal folds were dropped out of the final version of the database. It was possible to identify all the participants that had to be dropped out based on these criteria, since the following parameters were also extracted automatically (also available within Praat's Voice Report): number of pulses; number of periods; mean period; standard deviation of period; fraction of locally unvoiced frames; number of voice breaks; and degree of voice breaks. These parameters were not included in the final database.
The Praat scripting language call used to extract the data was voiceReport$ = Voice report… analysisStart analysisEnd 75 500 1.3 1.6 0.03 0.45, where the arguments correspond to the following designations in Praat's menus (View & Edit → Pitch → Pitch settings… and Advanced Pitch settings…): time range (s): analysisStart-analysisEnd (beginning and end of the previously annotated /a/ segment); pitch range (Hz): 75-500; maximum period factor: 1.3; maximum amplitude factor: 1.6; silence threshold: 0.03; and voicing threshold: 0.45.
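To make the correspondence between the positional arguments and their menu designations explicit, the following Python sketch assembles the script line from named settings. It is an illustration only: the helper names are hypothetical, and it assumes that the typeset ellipsis in the command name stands for the three ASCII dots of Praat's older script syntax.

```python
# Settings quoted in the text, in the order they appear in the command.
VOICE_REPORT_SETTINGS = [
    ("pitch floor (Hz)", 75),
    ("pitch ceiling (Hz)", 500),
    ("maximum period factor", 1.3),
    ("maximum amplitude factor", 1.6),
    ("silence threshold", 0.03),
    ("voicing threshold", 0.45),
]


def format_number(value):
    """Format a number without trailing zeros (0.2 -> '0.2', 75 -> '75')."""
    text = f"{value:.6f}".rstrip("0").rstrip(".")
    return text if text else "0"


def build_voice_report_line(analysis_start, analysis_end):
    """Return the Praat script line for one annotated /a/ segment.

    `analysis_start` and `analysis_end` are the segment boundaries in seconds.
    Assumption: "Voice report..." (three dots) is the command spelling.
    """
    if analysis_end <= analysis_start:
        raise ValueError("analysis_end must be later than analysis_start")
    arguments = [analysis_start, analysis_end]
    arguments += [value for _, value in VOICE_REPORT_SETTINGS]
    joined = " ".join(format_number(a) for a in arguments)
    return f"voiceReport$ = Voice report... {joined}"


if __name__ == "__main__":
    # Hypothetical segment boundaries, for illustration only.
    print(build_voice_report_line(0.2, 0.95))
```

Keeping the settings in a single named list makes it easier to check that a re-analysis of the recordings uses the same pitch range and thresholds as the published parameter values.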
The normal and pathological acoustic characteristics were compared using IBM SPSS Statistics 22, in order to explore differences between the parameters through the Mann-Whitney U test. In this book chapter, the following parameters are analyzed: f0 mean, Jitter ppq5, Shimmer apq11, and HNR.
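For readers unfamiliar with the Mann-Whitney U test, the sketch below computes the U statistic and a two-sided p-value in plain Python. It is not the SPSS procedure used in this study: it relies on the normal approximation with a tie correction and no continuity correction, so p-values may differ slightly from those of a statistics package, especially for small groups. The function names are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    # Variance of U with the usual correction for tied values.
    n = n1 + n2
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical: no evidence of a difference
        return u, 1.0

    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p
```

The test compares ranks rather than raw values, which is why it suits perturbation measures such as Jitter and Shimmer, whose distributions are often skewed.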
An additional analysis of the relation between the acoustic parameters and gender, age group, smoking habits, BMI and voice usage was considered using the Kruskal-Wallis test. The participants' ages were grouped for the purpose of this analysis, and with the additional objective of analyzing voice changes across the life span [17], according to the following classification [45, p. 3]: young adulthood (18-45 years of age); middle adulthood (46-65 years of age); and older adulthood (older than 65). The BMI values were grouped into four categories according to the WHO's [48, p. 9] criteria: underweight (less than 18.5); normal range (18.5-24.99); overweight (25.00-29.99); and obese (greater than or equal to 30.00).
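The two grouping schemes above translate directly into code. The following Python sketch uses the cut-off values given in the text; the function names are hypothetical, ages are assumed to be whole years, and BMI is computed with the standard formula of weight in kilograms divided by the square of height in meters.

```python
def age_group(age_years):
    """Classify an age in whole years into the three life-span groups."""
    if age_years < 18:
        raise ValueError("participants must be aged 18 or older")
    if age_years <= 45:
        return "young adulthood"
    if age_years <= 65:
        return "middle adulthood"
    return "older adulthood"


def body_mass_index(weight_kg, height_m):
    """Return BMI in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(bmi):
    """Classify a BMI value using the cut-offs quoted in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal range"
    if bmi < 30.0:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    # Hypothetical participant, for illustration only.
    bmi = body_mass_index(70.0, 1.75)
    print(age_group(52), round(bmi, 1), bmi_category(bmi))
```

Using strict upper bounds (less than 25.0, less than 30.0) avoids leaving gaps between categories for values such as 24.995.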
Radial graphs were generated in Excel 2013, considering that all variables had an approximately normal distribution, and using previously calculated average and standard deviation values for all parameters. After the standardization of each variable, a grey circular area was drawn for each gender, corresponding, in each direction, to the average range of two standard deviations (that is, about 95% of normal distribution) of the healthy population. Applying that same standardization to each individual, a polygon in the radial graph was drawn, which allows the visualization of variables that are out of the expected range. The goal of radial graphs "is not only to determine if changes occur in the magnitude of certain parameters, but also to determine if there are configurational adjustments in a multi-dimensional profile" [26, p. 131] of voice.
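The standardization behind the radial graphs can be illustrated as follows. This Python sketch is not the Excel tool distributed with the AVFAD: the reference means and standard deviations are placeholders chosen for readability, not AVFAD normative values, and the function names are hypothetical. Each value is converted to a z-score against the healthy reference, and values more than two standard deviations from the mean are flagged as lying outside the grey area.

```python
import math

# Placeholder (mean, standard deviation) pairs for one gender.
# Illustrative numbers only: these are NOT the AVFAD normative values.
REFERENCE = {
    "f0 mean (Hz)": (200.0, 20.0),
    "Jitter ppq5 (%)": (0.5, 0.25),
    "Shimmer apq11 (%)": (4.0, 1.0),
    "HNR (dB)": (20.0, 4.0),
}


def standardize(value, mean, sd):
    """Return the z-score of `value` for a reference mean and SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


def normal_coverage(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


def radial_profile(measurements, reference=REFERENCE, limit=2.0):
    """Map each parameter to (z_score, outside_reference_area)."""
    profile = {}
    for name, value in measurements.items():
        mean, sd = reference[name]
        z = standardize(value, mean, sd)
        profile[name] = (z, abs(z) > limit)
    return profile


if __name__ == "__main__":
    # Hypothetical participant, for illustration only.
    print(radial_profile({"f0 mean (Hz)": 210.0, "HNR (dB)": 10.0}))
    print(round(normal_coverage(2.0), 4))
```

Because every parameter is expressed in standard deviation units, parameters measured in Hz, % and dB can share one radial axis, and `normal_coverage(2.0)` confirms that the two-standard-deviation band covers roughly 95% of a normally distributed healthy population.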
Ethical approval was obtained from all authorities required by Portuguese bylaws for clinical research: national data protection committee; independent ethics committees. The AVFAD include 709 participants, from 18 to 93 years old, of whom 346 (49%) had a medical diagnosis of vocal pathology and 363 (51%) did not present any vocal pathology; 499 (70%) were females and 210 (30%) were males, which is a typical male/female ratio in the Portuguese hospitals where the present study was conducted. Within the group diagnosed with vocal pathology, there are 26 different diagnoses based on Verdolini et al.'s [46] classification, including 249 (72%) female and 97 (28%) male participants. The control group was composed of 250 (69%) females and 113 (31%) males.

Results
The acoustic parameters f0, Jitter ppq5, Shimmer apq11, and HNR were compared between the participants without vocal pathology and the group of participants with a diagnosis of vocal pathology. The analysis considered gender, since it is generally accepted that differences in anatomic structures affect the parameter f0. Table 1 shows the results for males.
The results show that, for all of the assessed parameters, there were statistically significant differences (p < α) between the two groups. Normal participants presented lower f0, Jitter ppq5 and Shimmer apq11 values, and higher HNR values, as expected. For females (Table 2), the results showed statistically significant differences (p < α) between the groups in three parameters: Jitter ppq5, Shimmer apq11, and HNR. The fundamental frequency was unaffected by pathology in females.
Generally, the results showed that in both genders, there was a difference between normal participants' voices and pathological voices in most parameters that should be further considered and analyzed. For that purpose, multiple comparisons between normal participants and each of the six groups of pathology (nodules; polyp(s); cyst; Reinke's Edema; Reflux; Unilateral Vocal Fold Paralysis-UVFP) with the largest dimension were performed. The Bonferroni correction to control the chance of overall false-positive results leads to α = 0.05/6 = 0.0083. Table 3 presents the results of the comparison between normal and each type of pathological voices for male participants. Note that only the Reflux group has n ≥ 20.
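The Bonferroni adjustment used above amounts to a single division. The Python sketch below reproduces the threshold for six comparisons and applies it to a set of p-values; the p-values in the usage example are hypothetical and are not results from this study, and the function names are illustrative.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Return the per-comparison significance level after correction."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    return alpha / n_comparisons


def significant_after_correction(p_values, alpha=0.05):
    """Map each comparison name to True when p is below the corrected level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


if __name__ == "__main__":
    print(round(bonferroni_alpha(0.05, 6), 4))  # corrected level for 6 groups

    # Hypothetical p-values, for illustration only.
    example = {
        "Nodules": 0.001,
        "Polyp(s)": 0.020,
        "Cyst": 0.300,
        "Reinke's Edema": 0.008,
        "Reflux": 0.009,
        "UVFP": 0.049,
    }
    print(significant_after_correction(example))
```

With the corrected level of about 0.0083, a p-value of 0.02 that would count as significant in a single test at α = 0.05 is no longer treated as significant.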
The results show that the differences presented before, when pathological voices were analyzed as a single group, are noticeable for only two diagnoses, and in the same two parameters. Males with Reinke's Edema or Reflux showed statistically significant differences in Shimmer apq11 and HNR when compared to normal voices.
For females, the results of the multiple comparisons show several statistically significant differences between the normal group and all the pathological groups. The parameters Jitter ppq5, Shimmer apq11, and HNR are affected by pathology, that is, all the pathology groups presented statistically significant differences from the normal group. As far as the parameter f0 is concerned, results are not consistent across pathologies. The participants diagnosed with Nodules and Reinke's Edema presented statistically significant differences in comparison with the normal group, whereas the other groups did not. In other words, based on this sample of female voices, Polyp(s), Cyst, Reflux, and UVFP seem to cause alterations in Jitter ppq5, Shimmer apq11, and HNR but not in f0, and Nodules and Reinke's Edema cause alterations in all the parameters.
Based on these results, it is possible to conclude that males with a diagnosis of Reinke's Edema or Reflux presented a higher Shimmer apq11 and a lower HNR than the males without any vocal pathology. For females, there was no consistency in the behavior of f0, because it was only affected by the diagnosis of Nodules or Reinke's Edema. All the other parameters (Jitter ppq5, Shimmer apq11 and HNR) showed a consistent behavior: the group without vocal pathology showed, in comparison with all the other groups, lower values of Jitter ppq5 and Shimmer apq11 and higher values of HNR.
The influence of age, BMI, smoking habits, and voice usage of participants on the acoustic characteristics of voice was also investigated. Table 5 shows the results for age, grouped by gender and diagnosis.
The results showed some significant differences between the groups (young adulthood; middle adulthood and older adulthood) for the variable age. These differences are noticed in f0, independently of the gender or diagnosis, also in Shimmer apq11 in all participants except males with normal voice and in HNR in the group of females with normal voice.
Table 3. Multiple comparisons between normal and pathological voices for the male gender.
Advances in Speech-language Pathology
Table 6 shows the influence of smoking habits on the acoustic parameters, by gender and diagnosis.
The results show significant differences between groups (nonsmoker; former smoker; smoker) in f0 for females and males with normal voice and Shimmer apq11 for males with normal voice. Table 7 shows the influence of BMI on the acoustic parameters by gender and diagnosis.
The results show that BMI only had an influence on females with normal voice. None of the other groups showed statistically significant differences between BMI categories (underweight; normal range; overweight; obese). Table 8 shows the influence of voice usage on the acoustic parameters, by gender and diagnosis.
For the variable voice usage, the results showed that there are no differences between groups (singers and nonsingers), except for the parameter Jitter ppq5 in males with vocal pathology.
Nonparametric Kruskal-Wallis Test; *statistically significant differences for α = 0.05.

Discussion
The number of patients with voice disorders has been increasing dramatically over the last decade, due mainly to unhealthy social habits and voice abuse. It has been reported that approximately 30% of the global population suffer from some kind of voice disorder during their lives.
Previous studies have shown that impairment of vocal function can have a major impact on the quality of life, severely limiting communication at work and affecting all social aspects of daily life.
With an increasing concern to improve the assessment of voice for as many European languages as possible, we collected the first complete and representative EP pathological voice database. This database will be of huge significance for voice clinicians' assessment and for testing and developing innovative, automated methods and devices for voice analysis.
Besides the great importance of the database for this project, it will also be of great significance for future studies in the field of voice function assessment.  The development of voice is influenced by individual characteristics, but there is no consensus among authors about which specific characteristics affect vocal quality. Gender, however, has been identified as having a significant impact [28,17], so the male and female samples of the study were analyzed individually.
The male f0 results showed a statistically significant difference between normal and pathological voices, but the literature [40] indicates that the f0 values of both normal and pathological voices in our databases are within the normal range. Female f0 values were also within the normal range according to previous studies [16,40], but normal and pathological voices in the AVFAD did not present statistically significant differences. For both genders, the results support previously published scientific evidence [16,18,35], suggesting that the f0 parameter does not allow one to distinguish between a normal and a pathological voice.
According to the literature [8], the voices of speakers with organic disorders of the larynx have higher Jitter and Shimmer and a lower HNR relative to the voices of normal speakers. Thus, the results of this study (presented in Tables 1 and 2) are supported by the literature [32], since for both genders, normal participants had significantly lower values of Jitter and Shimmer. According to [14], Jitter and Shimmer could be used as acoustic parameters to differentially diagnose vocal pathology.
For the HNR parameter, as expected, normal participants had significantly higher values than pathological participants. The results of this study were below the threshold (30 dB) established by Deliyski et al. [9], since the recording conditions (quiet room / audiology booth) used in their study were not reproducible in a hospital environment.
The results of multiple comparisons between normality and the six different vocal pathologies, for the male gender (as shown in Table 3), suggest statistically significant differences between the normal group and the Reinke's Edema group, and the normal group and the Reflux group, for Shimmer and HNR parameters. For the other groups, no statistically significant differences have been found. However, one should consider that the male pathological groups' sample size was too reduced to draw meaningful conclusions.
On the other hand, for the female gender, with a much larger sample size, the results showed statistically significant differences between normality and the six pathologies in relation to the Shimmer, Jitter, and HNR parameters. When the Nodules and Reinke's Edema groups were compared to the normal group, significant differences in the f0 parameter were found as well. Accordingly, this study suggests that, in females, the studied vocal pathologies strongly affect the acoustic parameters Jitter, Shimmer, and HNR. The first two are increased, and HNR is decreased.
Concerning the four demographic characteristics under study (age group, smoking habits, BMI and voice usage), only age group and smoking habits seem to interfere with voice quality. However, this was not the main goal of the present study and requires further investigation.
The radial graphs, generated (with an Excel 2013 spreadsheet tool distributed with the AVFAD) from all the acoustic parameters automatically extracted with Praat, were used to establish a threshold between normality and vocal pathology and are easy to interpret. They may be a good tool for helping voice clinicians establish a diagnosis and communicate results to other health professionals, clients and their families. They allow a quick visualization of the acoustic speech signal parameters and a check of whether these are within the normal range. They are also a functional way of monitoring progress in therapy.

Conclusions
The empirical work developed in this project produced new insights into voice disorder assessment and provides clinicians with a critical resource for voice assessment. The voice pathology database provides a new and valuable tool for voice clinicians and for speech research. This database will enable the interpretation of automatically extracted descriptors of the speech signal. The evaluation of some specific parameters such as f0, Jitter, Shimmer and HNR has established differences of speech signal behaviors.
Acoustic databases help professionals design studies about the different pathologies, compare those pathologies with normal patterns and, furthermore, establish a threshold between normal and pathological voices. The AVFAD provide important tools for the study of voice, as they can objectively complement a differential diagnosis and help select an adequate intervention strategy.
The main conclusion drawn from this study was the existence of statistically significant differences between normal and pathological voice groups. Additionally, there is some evidence suggesting an association between age group, smoking habits, and the acoustic parameters, which should be further studied.
The radial graphs, drawn for each participant, allow a multidimensional acoustic analysis of patients and a comparison with a normal (reference) range, giving clinicians information on an individual's vocal quality and allowing them to immediately detect changes in parameters.
One of the limitations of this study was the recording conditions. Although not optimal, they represent the real clinical setting where clinicians usually work and record their voice samples. For the male gender, the sample size of some pathologies is limited and does not allow a reliable analysis of the results. We suggest that this sample be increased in order to run a new statistical analysis and confirm the results.
In the future, considering the potential of the AVFAD, the authors suggest their use in new voice studies that take advantage of their complexity. For example, the other vocal tasks recorded could be analyzed. A detailed analysis of participants' clinical and demographic data is also recommended. These studies will represent a very important contribution to improving the knowledge about the acoustical characteristics of voice. Further work on improving the performance of the assessment methods used by voice clinicians can be based on AVFAD samples and thus clinically applied.