This chapter studies the relationship between the reading of music and of verbal texts and seeks to define an ecological music reading task that allows comparison of the musical and verbal domains. Participants were preservice music students who performed different music reading tasks that were correlated with a verbal text comprehension test. A Principal Component Analysis (PCA) was performed, explaining 91.5% of the variance. Two axes were defined: one related to reading comprehension and the other to music performance variables. The relationship between the selected variables in the factorial plane, particularly the strong association between sight-reading and literal comprehension, suggests that sight-reading is a relevant factor with regard to the study of the musical and verbal domains.
- music reading
- reading comprehension
- principal component analysis
Much of the current research emphasizes the importance of studying music and verbal language together in order to reach more definitive conclusions regarding brain function and learning [1, 2]. In fact, the literature on verbal text reading provides a fairly complete theoretical framework that can be applied to reading sheet music. We therefore attempt to contribute to the growth of multidisciplinary and inclusive initiatives for understanding the mechanisms of musical language acquisition in adulthood, whether in school or university education.
The objective of this chapter is thus to explore the relationship between music and verbal language through written supports (the score and the verbal text). We propose to explore different tasks in the musical field (e.g., sight-reading) in order to look for correlates with verbal comprehension tasks and the underlying cognitive processes.
1.1 Music reading and sight-reading
Music reading is characterized by a vertical component (simultaneous reading of the notes) and also implies expressive timing (e.g., changes in the speed of execution) as part of the character of the composition. In music, the sequential units are formed by series of chords, which correspond to groups of notes that sound simultaneously or consecutively [3, 4]. Music reading is associated with the ability to develop an auditory representation that can be activated when a musical excerpt is presented. In fact, there are currently two types of hypotheses underlying the main studies on intermodality: the first corresponds to perceptual processes (cross-modal), and the second to conceptual processes (intermodal integration). Less expert musicians would mainly use their ability to recode information, as shown by Drai-Zerbib and Baccino, while experts would be able to elaborate an amodal representation based on their previous knowledge. This amodal representation is part of an action-monitoring system engaged while reading music.
The overall success of a musical performance depends largely on the musician’s first contact with the score. This first encounter with the score is called sight-reading and represents the ability to perform written music for the first time, without previous practice. During sight-reading, musicians must read fluently and progressively (or even immediately) adjust their tempo as indicated by the composer. In addition to the speed of execution, it is important not to make reading errors. Sight-reading is considered a fundamental skill for Western musicians and is a musical skill that can be developed with practice. In fact, Mishra’s meta-analysis shows that sight-reading is tightly correlated with constructs such as improvisational skill, ear-training ability, technical ability, and music knowledge. These specific practice-related skills are developed thanks to a strong action-perception association between the musical stimuli and the corresponding motor programming. There is evidence that specific psychomotor movement-speed skills can be a dominant predictor when musical complexity increases sharply.
1.2 Evidence for comparing music and language domains
Although comparative studies of music and text reading have increased steadily in recent years, particularly thanks to the incorporation of new techniques such as eye-tracking and brain imaging, making comparisons between musical and verbal processing is still challenging [12, 13, 14].
Cortical stimulation studies have shown the existence of common brain areas for reading texts and scores. Brain imaging studies by other authors demonstrate the existence of specific areas for the processing of visuospatial information in score reading. More recently, Bouhali et al. point out the influence of domain-specific expertise (word or music reading) on the specialization of the ventral occipitotemporal cortex, and an enlarged overlap of functional activations for language and music in musicians.
It has also been suggested that in simultaneous tasks music and language share common resources in terms of syntactic integration processes or, on the other hand, that they respond to the thesis of a modular brain organization [18, 19]. Moreover, in terms of transfer, the relationship between music and language could be explained as a result of more developed temporal processing in individuals with greater musical training.
As far as information processing is concerned, the perceptual elements of music (discrete) and of verbal language respond to certain hierarchical principles (syntax), which allow structural elements to be combined into sequences [21, 22, 23, 24, 25]. These basic musical or linguistic events (notes, phonemes) are combined into more complex units (chords, words), which then form sentences (musical, linguistic). From this perspective, Patel proposes the syntactic integration resource hypothesis, which postulates that music and language share a limited pool of resources that in turn activate syntactic representations separately. This implies that syntactic integration could be required when there is a concurrent process in the opposite domain. More cognitive resources would be needed to connect the various components of a sentence (Dependency Locality Theory). Consequently, in verbal text reading the subject-verb distances determine the resources involved, and in music reading, the tonal distances.
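The distance-based cost at the heart of the Dependency Locality Theory can be illustrated with a minimal sketch (ours, not the chapter's): the further a verb stands from its subject, the more intervening material must be held in memory and integrated.

```python
# Illustrative sketch of Dependency Locality Theory's distance-based cost:
# integration cost grows with the number of tokens intervening between a
# head (e.g., a verb) and its dependent (e.g., its subject).
def dependency_distance(head_index: int, dependent_index: int) -> int:
    """Number of tokens intervening between a head and its dependent."""
    lo, hi = sorted((head_index, dependent_index))
    return hi - lo - 1

# Subject "musician" and verb "plays" in a short vs. a long sentence.
short = "the musician plays".split()                    # subject 1, verb 2
long_ = "the musician who won the prize plays".split()  # subject 1, verb 6
cost_short = dependency_distance(2, 1)   # 0 intervening tokens
cost_long = dependency_distance(6, 1)    # 4 intervening tokens
```

On this toy measure, the relative-clause sentence incurs a higher integration cost at the verb, which is the intuition the chapter borrows when comparing subject-verb distances with tonal distances.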
1.3 Comprehension in music and language
In verbal comprehension it is possible to distinguish the processes of decoding, literal comprehension, inferential comprehension and metacomprehension. Inferences, understood as an integrated conception, constitute procedures that scaffold the architecture of what is understood and also constitute representations of new information not made explicit in the text itself. According to Nadal et al., language has certain elements that allow minimizing the cognitive costs associated with inferential computations. Indeed, some of these elements that guide reading (discourse markers) contain “a set of syntactic-semantic instructions that determine both its position within a statement and the informative articulation of the elements under its scope at the sentence and textual level” (, p. 198). The same occurs with punctuation marks, since they delimit processing units, which minimizes the reader’s effort in comprehension. Finally, reading texts entails the simultaneous implementation of a series of activities whose objective is the elaboration of a coherent mental representation of the meanings of the text.
The macrostructure theory [34, 35] proposes the existence of two levels of discourse representation: the microstructure and the macrostructure. The microstructure corresponds to the linear semantic structure of a text (sub-processes: recognition of the words and the inferences or bridges necessary for linking the propositions, among others). The macrostructure, for its part, is processed from the microstructure through a selective reduction of information (sub-processes: hierarchization of the ideas of the text, application of inferences based on previous knowledge). Macroprocessing activities allow the construction of the situation model (the mental model constructed from the text). Unlike the microstructure, whose construction is relatively automatic, the construction of the macrostructure is rather conscious and aims at the establishment, codification and local coherence of the propositions. In synthesis, the levels of representation include the representation of the surface form of the text, after which a text base is generated (the meaning of the sentences) and finally the situation model.
In line with this evidence, Cara and Gómez show that musicians can demonstrate similar abilities in the reading of verbal texts and scores (during silent reading of music and text). The authors analyze the information integration mechanisms of functional piano students by studying the duration of ocular fixations and the number of regressive ocular fixations. They demonstrate that there are different patterns of reading verbal texts and scores according to the inter- and intra-sentence levels of information processing and with regard to the styles and types of text. Eye movements account for different strategies for processing verbal and musical information: differences have been observed in the number, duration and type of trajectory of eye movements. All this indicates that, despite the similarity in the global understanding of music and texts, the underlying processes are different, which does not rule out the possibility that both domains share resources.
The present study aims to expand the results of the aforementioned study, this time within a population of university students with at least 3 years of experience in the systematic practice of music reading. Previous studies show that university students have deficiencies in selecting and hierarchizing relevant information, in capturing the author’s communicative intentionality, and in building the situation model of the texts. A preliminary analysis has been reported elsewhere.
With regard to music comprehension, and considering the nexus between comprehension processes and meaning, the bibliography consulted is mainly divided into two positions: the first considers that music has only a self-significance, so that musical comprehension could be approached only through musical form [41, 42]. The second position states that music and language respond to common cognitive abilities and that it would therefore be possible to speak of musical comprehension. However, this implies a broader view of meaning [25, 43, 44].
The literature on the process of music comprehension from a written support describes different levels of processing: notational, syntactic, analytical-structural and referential. At this last level, conceptual relationships are established between the syntactic or notational information of the score and other elements associated with the piece, such as certain extra-musical elements.
Given these antecedents, this project was necessary to explore whether it is possible to link the cognitive processes involved in music reading and in the reading comprehension of verbal text. Given that, to our knowledge, there is no standardized test of comprehension of musical texts, and that the scientific evidence for a close relationship between the processes of verbal and musical comprehension is still under construction, it seemed pertinent to define an ecological task (in the musical field) that enables distinguishing certain common domains of comparison. To our knowledge, no previous studies have investigated the association between sight-reading and reading comprehension of written texts among preservice music teachers. From this perspective, and based on the bibliography consulted, we believe that sight-reading could represent a good compromise: an ecological task that at the same time yields outputs of the musical domain. The theories of syntactic processing (Dependency Locality Theory) and of tonal processing (Tonal Pitch Space Theory) can be cited as a relevant framework for comparing musical and language processing at the syntactic level.
2.1 Participants and stimuli
Fifteen Music Pedagogy students from the Music Institute of the Pontificia Universidad Católica de Valparaíso participated in the experiment. The average age of the participating students was 23 years (
2.2.1 Reading comprehension task
A comprehension test that had been used previously in academic and technical-professional courses was applied . The text was of an argumentative nature (332 words, 6 paragraphs) written in participants’ native language, Spanish. Participants had to read the text, construct a coherent mental representation, and then answer eight questions. The test offered two indicators: Literal Comprehension (4 questions) and Inferential Comprehension (4 questions).
2.2.2 Articulation task
The test collected information on musical decision-making (detecting the stylistic features of a score). The participants observed a score containing only the pitch of the notes and the rhythmic figurations, into which they had to incorporate all the requested elements (i.e., add articulations and identify the meter and the style). Two extracts were presented, one from Mozart and the other from Schoenberg (see Figure 1). The points awarded to participants were calculated by adding up the total number of correct features reported. To be considered valid, each incorporated articulation had to match the articulation present in the original score. These included dynamics, slurs, and tempi.
2.2.3 Sight-reading task
The sight-reading task consisted of Beethoven’s Lied “Urians Reise um die Welt”. To calculate the test points, three types of errors were considered: omissions (deletions), substitutions and additions. Music flow (stops) and code changes (see below) were also included (see Figure 2).
It should be noted that at the PUCV Music Institute, music theory and reading classes are taught using the Kodály method, which is based on a note-phoneme conversion system. The principle starts from the basic structure of the scales, in which the tone and semitone relationships are maintained regardless of the initial pitch. Thus, for example, the C major scale shares the same number of notes, and the same relationships between them, as the D major scale. Therefore, a student taught with the Kodály method would be asked to perform a D = C conversion, singing the note D on the staff as C, E as D, F# as E, G as F, and so on until the tonic is reached.
Code errors are emissions of an incorrect phoneme, that is, one that does not correspond to the same degree of the corresponding scale. For example, if a score is sung in A major, the note C# (III degree with respect to A) is expected to be sung as E (III degree with respect to C). A code error appears when this note is sung as any other phoneme of the major scale.
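The degree-preserving conversion and the resulting definition of a code error can be sketched as follows. This is our own minimal illustration, not the scoring tool used in the study, and only the keys needed for the examples are listed.

```python
# Sketch of the Kodaly-style degree-preserving note-name conversion and
# code-error detection (illustrative only; not the study's scoring tool).
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]

# Major scales spelled with sharps (only the keys used in the examples).
MAJOR_SCALES = {
    "C": ["C", "D", "E", "F", "G", "A", "B"],
    "D": ["D", "E", "F#", "G", "A", "B", "C#"],
    "A": ["A", "B", "C#", "D", "E", "F#", "G#"],
}

def convert_to_c(note: str, key: str) -> str:
    """C-major phoneme for `note`, preserving its scale degree in `key`."""
    degree = MAJOR_SCALES[key].index(note)   # 0-based degree within the key
    return C_MAJOR[degree]

def is_code_error(sung: str, written: str, key: str) -> bool:
    """A code error: the sung phoneme does not match the expected degree."""
    return sung != convert_to_c(written, key)
```

For instance, `convert_to_c("F#", "D")` returns `"E"`, matching the D = C example above, and `convert_to_c("C#", "A")` returns `"E"`, matching the A major example: both notes sit on the third degree of their scale.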
2.2.4 Cross-modal task
This task was adapted from the material used by Drai-Zerbib and Baccino . Participants had to listen to a melody and then decide if the score presented subsequently corresponded to what was heard (note modification condition, original stave vs. modified stave). Thus, the points awarded in the cross-modal task represented the student’s ability to identify whether there was a difference between a short audio and its corresponding notational representation (the score).
In the first part of the experiment, participants took the reading comprehension test, which consisted of reading a text (maximum 10 minutes of reading time) and then answering eight questions. The text was taken away from the participant once they were ready to answer the questions.
The articulation task was then applied: the participant had to add the articulations to a score that had been modified (slurs, accents and dynamics removed). The test was carried out on a sheet with the modified score, on which the participant had to add the requested information, in a counterbalanced order according to the two styles evaluated (classical vs. contemporary). Participants had 20 minutes to perform the test, 10 minutes per score.
The cross-modal integration task was performed in an insulated room with a computer. Participants listened to the melodic sequences through Sennheiser HD 203 headphones. Immediately after listening to the audio, they viewed the score and had to decide whether what they had heard matched what they saw: if it matched, they pressed the left mouse button; if not, the right button. If the participant decided that the audio matched the image, a new audio (the next question) started immediately. If the participant decided that the audio did not match the image, they were directed to a new screen showing 4 buttons (each corresponding to 1 of the 4 bars), on which they had to indicate with the mouse the measure in which they identified the error (or wrongly matched note). Once this was done, the participant continued with the next test item. This cycle was carried out similarly for all 48 musical stimuli.
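The trial flow just described can be sketched as a small decision routine. This is a hedged illustration of the logic only, not the experiment software actually used, and the callback names are hypothetical placeholders.

```python
# Sketch of one cross-modal trial (hypothetical callback names; not the
# actual experiment software used in the study).
def run_trial(play_audio, show_score, get_mouse_click, ask_measure):
    play_audio()                    # participant listens to the melody
    show_score()                    # then views the notated version
    answer = get_mouse_click()      # "left" = match, "right" = mismatch
    if answer == "left":
        return {"match": True}      # the next audio starts immediately
    # Mismatch: the participant localizes the error among the 4 measures.
    return {"match": False, "measure": ask_measure(options=[1, 2, 3, 4])}
```

In the study this routine would be repeated for each of the 48 stimuli, with the recorded dictionary serving as the trial's response.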
The sight-reading task was administered individually. The experimenter showed the score to the participant, who had a maximum of 20 seconds to preview it before starting the solfège. The score was read from beginning to end, without repeating any passage. The audio of the test was recorded using a Tascam DR-40 digital recorder. The errors, stops and code errors were subsequently counted by an advanced music student.
One student was eliminated from the sample because they exceeded the reading time of the verbal comprehension test and presented outliers in the music theory tests. The dependent variables correspond to the total score obtained in each test. Descriptive statistics are shown in Table 1.
Table 1. Descriptive statistics for the sight-reading task (Beethoven) and the reading comprehension task (Ginseng).
3.1 Principal components analysis (PCA)
In order to obtain more information about the relationships among the variables studied, a PCA was performed. The variables included explain 91.2% of the variance (see Figure 3).
A total of six active variables and two supplementary variables were selected. Inferential comprehension (Inf-C) and deletions (Del) were included as supplementary variables; in fact, these variables do not contribute to explaining a large percentage of the variance. However, we included inferential comprehension because we did not want to isolate the two reading comprehension indicators. In the case of deletions, the variable correlates significantly with literal comprehension, similarly to the other error types (additions and substitutions); for this reason, it seemed pertinent to include it as a supplementary variable. Other variables, such as the Schoenberg articulation task score, code errors and the cross-modal task scores, were not included in the PCA, as they do not correlate with any other variable analyzed in the present study. The correlation matrix is presented below (see Table 2).
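The design of this analysis (six active variables shaping the axes, two supplementary variables projected onto the plane afterwards) can be sketched with synthetic data. The values below are random placeholders, not the study's data; only the structure of the computation is illustrative.

```python
# Sketch of a PCA with active and supplementary variables, on synthetic
# placeholder data (n = 14, as in the final sample). NumPy only.
import numpy as np

rng = np.random.default_rng(0)
n = 14
active = rng.normal(size=(n, 6))         # placeholder active variables
supplementary = rng.normal(size=(n, 2))  # placeholders for Inf-C, Del

# Standardize, then diagonalize the correlation matrix of active variables.
Z = (active - active.mean(axis=0)) / active.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)            # eigenvalues, ascending order
order = np.argsort(eigval)[::-1][:2]          # keep the two largest axes
explained = eigval[order].sum() / eigval.sum()
scores = Z @ eigvec[:, order]                 # individuals on the plane

# Supplementary variables do not shape the axes: they are placed on the
# factorial plane via their correlation with the component scores.
coords = np.array([[np.corrcoef(supplementary[:, j], scores[:, k])[0, 1]
                    for k in range(2)]
                   for j in range(2)])
```

Projecting supplementary variables this way is what allows inferential comprehension and deletions to appear on the plane without influencing the definition of the two axes.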
PCA results can be interpreted in terms of two axes: one related to music performance, particularly to sight-reading, and the other related to reading comprehension of verbal texts. The Mozart articulation task seems to be independent of both axes, yet associated with a relevant aspect of performance (substitutions). Inferential comprehension, despite not correlating with the ensemble of variables, lies closer to the reading comprehension axis, and the same occurs with deletions and the music performance axis.
As can be seen in Table 2, there are significant correlations between literal comprehension and the musical tasks, particularly the sight-reading task and music flow (i.e., stoppings). It is also important to highlight that the existence of correlations between literal comprehension and the different types of errors suggests that the underlying cognitive processes could be common.
To explore the relationship between music and language processing, different music reading tasks and a reading comprehension test were applied to preservice music teachers with at least three years of music reading experience. A principal components analysis was performed in order to compare the different variables in the factorial plane. The interpretation of the PCA results indicates the existence of two axes, one representing each domain; in the case of music, the variables relate particularly to sight-reading performance.
In fact, in the correlation matrix (see Table 2) we can observe significant correlations between errors, stoppings, articulations and the literal comprehension indicator. It is important to clarify that the correlation is negative since a lower number of errors (
We observed a strong association between stoppings and additions as well as between deletions (included as supplementary variable in the PCA) and literal comprehension.
This is not easy to interpret; in fact, studies that comparatively address the neuronal activity linked to the generation of inferences and its different indicators are relatively scarce [48, 49]. However, there is evidence that the superior temporal gyrus, as well as the inferior frontal gyrus, is involved in generating inferences during the comprehension of short stories presented in the auditory modality. The same activation of the superior temporal gyrus has been observed in musical perception tasks, particularly those that require not analytical strategies but rather visual imagery. The mentioned association could also be a type of far transfer; however, this is difficult to confirm, since we lack sufficient evidence. If this is the case, it is unclear why only literal comprehension has a greater representativeness in the factorial plane. A possible explanation is that, in practice, literal comprehension tends to score higher than inferential comprehension in reading comprehension evaluations.
Concerning the correlation between the Mozart articulation task and substitutions, it is important to note that these variables come from different musical tasks (substitutions are related to the Beethoven sight-reading task). In any case, it is possible to suggest a stylistic relationship between the two pieces, since the Beethoven excerpt is a work from his first compositional period, closer to the classical style. This is reinforced by the fact that the articulation task on the Schoenberg piece is not associated with the variables mentioned previously. Moreover, previous findings show that substitutions in sight-reading allow expert musicians to continue the musical flow (sacrificing accuracy) when cognitive load increases. To achieve this, musicians must have some knowledge of the musical style [53, 54] and generate related musical expectations. It should also be noted that, in the context of the present study, the preservice music teaching curriculum, particularly the acquisition and learning of musical language (e.g., music reading), is oriented towards classic-romantic music; contemporary music skills are developed mostly through analytical and theoretical work. Therefore, participants cannot mobilize stylistic knowledge of contemporary music that they simply do not have.
On the other hand, the fact that substitutions lie further away in the factorial plane from the other two types of errors (additions and deletions) is probably due to the observed variability (see Table 1), which could, to some extent, explain the correlation with the Mozart task. Moreover, the independence of the Mozart task from both axes in the factorial plane suggests that the nature of the process involved is different, as is the type of task. The articulation task involves a series of complex processes in which stylistic musical background is essential but not sufficient: in order to retrieve all the missing features of the modified score, it is necessary to elaborate a complete mental representation of the musical piece. In fact, the artistic idea of a musical piece depends to some extent on the extraction of musical structure features. Future research may address this issue by investigating how musical expectations influence the performance of musical tasks of different cognitive hierarchy.
Another interesting result is the absence of association between the cross-modal task and sight-reading. The literature consulted [6, 47] points out a link between expertise and the high-level knowledge structures in memory that support cross-modal integration capacities, or cross-modal competences related to music reading tasks. It should be noted that the music students who participated in the present study do not have such expertise (at least 12 years); they do not even reach the level of the less expert group (between 5 and 8 years of academic practice in music). Therefore, it is possible that the absence of association between the cross-modal task and sight-reading is due to the participants’ lack of musical experience. This interpretation is supported by the descriptive statistics in Table 1.
Limitations of the present study are linked with the exploratory approach. Music reading tasks could be refined, the sample of musicians could be enlarged, and qualitative information could be included in order to better understand the link between their musical backgrounds and performance of the different tasks.
In summary, the results of this study allow us to suggest that the sight-reading task is pertinent for making comparisons with tasks of the verbal domain, particularly with regard to literal and inferential comprehension. However, depending on the approach to musical practice, the specificity of the sight-reading task (vocal or instrumental), the skill level, and the professional orientation of the task can generate variations in music information processing mechanisms. This suggests, on the one hand, that the results obtained with this sight-reading task may not generalize to any other sight-reading task. On the other hand, and according to the theory of shared resources between the two domains, it is possible that this specificity is reflected in the same way in the verbal domain. Indeed, the correlations observed between the sight-reading task and the verbal comprehension task suggest the existence of shared resources between both domains.
This research was made possible thanks to the financing of the Fund for the Promotion of National Music, No. 424305. Special thanks to Cristian Vargas, research assistant from the Music Institute at PUCV and Dušica Mitrović for her valuable edits of the manuscript.