Real-Time Tele-Auscultation Consultation Services over the Internet: Effects of the Internet Quality of Service

Real-time tele-auscultation over the Internet is an effective medical service that increases the accessibility of healthcare services in remote areas. However, the quality of the auscultation sounds transmitted over the Internet is the most critical issue, especially in real-time service. Packet loss and packet delay variations are the main factors, and little is known about how they affect auscultation sounds transmitted over the Internet. In this work, we investigate the effects of packet loss and packet delay variations on auscultation sounds, in particular heart and lung sounds, in real-time services over the Internet. We have found that both sounds are more sensitive to packet delay variations than to packet loss. Lung sounds are more sensitive than heart sounds due to their timing interpretation. Some level of packet loss can be tolerated, e.g., 10% for heart sounds and 2% for lung sounds. A packet delay variation bound of 50 msec is recommended. In addition, we have developed a real-time tele-auscultation prototype that tries to minimize the packet delay variation. We have found that real-time visualization of the auscultation waveform can raise the physician's confidence level in sound interpretation. Some techniques for quality of service improvement are suggested, e.g., noise reduction and user interface (UI).


Introduction
Quality of healthcare services in rural areas is a critical issue. Most developing countries are actively improving the quality of healthcare services through short- and long-term policies; increasing healthcare staff and implementing new technologies are two widely used examples [1][2][3].
Generally, people who live in rural areas receive healthcare services at a primary care unit as their first choice. However, a rural healthcare unit has limitations in infrastructure, healthcare staff, and advanced medical equipment. Patients are therefore referred to a secondary or tertiary care unit, which leads to travel costs and lost time. Effective care at the primary care units is one of the significant keys to improving the quality of healthcare services in rural areas. Telemedicine is another: telemedicine applications can enhance the accessibility of healthcare services through collaboration between the primary care unit and the cooperating unit [4][5][6][7][8].
Interactive consulting over the Internet is a cost-effective way for a physician to consult a specialist. Real-time applications such as Skype [9][10][11][12], Google Hangout [13,14], FaceTime [15,16], and WebRTC [17][18][19] can give participants an experience close to face-to-face communication. However, real-time consulting between physician and specialist requires specific information and equipment that depend on the consultation topic, such as a stethoscope for listening to body sounds in the auscultation process. Moreover, when data are transmitted over the Internet, loss and delay can occur, and real-time applications are especially sensitive to them. Packet loss [20] and packet delay variations [21] are the two main factors that reduce sound quality in real-time communication, e.g., body sounds from the stethoscope.
In our work, we investigate the effects of packet loss and delay on e-stethoscope sounds in a real-time tele-auscultation system over the Internet.
Significance: Little is known about the effects of packet loss and delay on a real-time tele-auscultation system over the Internet. This work investigates these effects on lung and heart sounds in depth, to identify the impacts and factors that influence the quality of tele-auscultation services. We discuss which effects occur, their causes, and their results by varying the level of packet loss, the delay variation, and the types of lung and heart sounds. These sounds are significantly different from human conversation sounds. We suggest levels of packet loss and packet delay variation that meet a given confidence level of sound interpretation, which can increase the success rate and lead to effective outcomes.
The rest of the chapter is organized as follows: Section 2 gives an overview of tele-auscultation, including the differences between traditional auscultation and tele-auscultation, system compositions, and types of services. Section 3 presents our prototype system design and development. Section 4 shows the analysis of the effects of packet loss and packet delay variations on heart and lung sounds over the Internet service. Finally, we conclude our work in Section 5.

Overview of tele-auscultation
Tele-auscultation is a system for providing auscultation remotely to another place. The main challenge for this service is to find a suitable mechanism for transmitting the auscultation sounds over the Internet effectively with an acceptable sound quality, and to find other related supporting systems.

Differences between traditional auscultation and tele-auscultation
Auscultation is the medical method of listening to sounds inside the patient's body in order to detect and identify abnormal sounds [22,23]. Tele-auscultation provides a medical examination method similar to the traditional one, but the key step beyond traditional auscultation is the mechanism for transmitting the auscultation sounds over a long distance. Internet technology and the electronic stethoscope are the key drivers of tele-auscultation service. Moreover, the experience of using tele-auscultation may differ from the traditional one: it is not face-to-face, and the body sound sent from the remote site may be delayed. Users should be aware of this when using the service. Table 1 summarizes the differences between the traditional stethoscope and the e-stethoscope.

Compositions of tele-auscultation system
In our study, tele-auscultation can be summarized in three main components: stethoscope, client application, and application server. In terms of processes, they are: capturing, processing, transmission, and display. An overview of the tele-auscultation structure is shown in Figure 1.
In Figure 1, the first component is the stethoscope. It is an instrument for capturing sounds in the body and is widely used in the auscultation process [24,25]. The e-stethoscope (electronic stethoscope) is the new generation of auscultation device. It can improve the quality of auscultation sounds [26], as summarized in Table 1, and it is the source that transmits the sounds from the body to the next component. The next component, the client application, is software that captures the sounds sent by the e-stethoscope. There are two techniques of sound capturing: the modern one receives the sound directly from the e-stethoscope via the transmission module embedded in the device [27][28][29][30], and the older one receives it from the stethoscope's ear tip via an audio input device such as a microphone [31]. Moreover, the client application is the main processing function for the auscultation sounds: improving sound quality, volume control, encoding, filtering, waveform rendering, transmitting and receiving sound between cooperating components, and sound play-out. The last component, the application server, is the center for managing sessions, signaling, and user accounts, and for forwarding the sounds (in the client-server service model). It should be noted that, apart from the client-server model, a peer-to-peer service model may be considered. In this latter case the server may have fewer service requirements, e.g., it is needed for communication establishment but not for media streaming between client and server.

Types of tele-auscultation services
Synchronous and asynchronous (store-and-forward) are the two main communication types that characterize tele-auscultation services. Synchronous service is an interactive communication between participants [27][28][29][30][31], and the auscultation sound must be sent and played out immediately during the live session. Asynchronous service, in contrast, first stores the auscultation sound in the middleware, and participants then request and receive the data later [28,29]. With today's high-speed Internet, asynchronous service may incur only a small time delay when sending stored auscultation sounds to the remote site. However, since synchronous service is a live session, the physician can immediately request different auscultation sounds (from different body positions) from a patient or healthcare staff to improve the level of healthcare service. For example, the physician may ask the healthcare staff to move the stethoscope up, down, left, or right from its current position, to follow up on a sound interpretation or to capture a better sound when the sound is unclear. This makes the service quality of real-time tele-auscultation much better than that of the asynchronous one. However, the service requires a certain level of QoS, e.g., high-speed link capacity. In conclusion, the two service types have different significant impacts on tele-auscultation services, and the choice depends on the purpose of use and the practice scenarios of the healthcare services.

Communication models
Client/server and peer-to-peer are the two widely used communication models that describe the characteristic of systems for sharing the media between source unit and destination unit.

eHealth -Making Health Care Smarter
The peer-to-peer model has no central server for managing the media stream. Each node communicates directly with the other. This avoids the processing time at the central server as well as part of the communication link delay. Conversely, the client/server model has a central server that manages almost everything, e.g., all information must be passed to the server first. However, for time-based information sharing, the client/server model is useful. For tele-auscultation services, both models are used: the client/server based model [29,30] and the peer-to-peer based model [28].

Design and development of a real-time tele-auscultation
We design and develop a real-time tele-auscultation application covering both communication models: peer-to-peer and client/server. The system consists of two main components. The first component is the client application, which includes the stethoscope controller, session controller, real-time audio waveform, and audio player. The other component is the application server, which comprises account management and session management. The server is used for user authentication and session initiation between two client applications. Further details are as follows:

Electronic stethoscope
In the prototype demonstration, the 3M™ Littmann® Electronic Stethoscope Model 3200 [32] is utilized. The e-stethoscope provides digital audio in linear pulse-code modulation (LPCM) format, with a sampling rate of 4000 Hz and 16 bits per sample.
As 4000 samples per second is not a standard voice sampling rate, we need to up-sample to 8000 samples per second before putting the audio in Real-time Transport Protocol (RTP) [34]. As a result, the sound quality is improved, but the bandwidth is doubled.
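The up-sampling step can be illustrated with a minimal sketch. The chapter does not say which interpolation method the prototype uses, so the linear interpolation below, and the function name `upsample_2x`, are assumptions for illustration only; a production resampler would normally apply a proper low-pass interpolation filter.

```python
def upsample_2x(samples):
    """Double the sampling rate (e.g., 4000 Hz -> 8000 Hz) of 16-bit LPCM samples.

    Assumption: linear interpolation; the source does not state the method.
    Each original sample is followed by the midpoint to the next sample.
    """
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        # For the last sample there is no successor, so repeat it.
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) // 2)
    return out


frame_4k = [0, 100, 200, 100]      # four samples at 4000 Hz
frame_8k = upsample_2x(frame_4k)   # eight samples at 8000 Hz
```

The output has twice as many samples for the same duration, which is why the bandwidth doubles.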

Auscultation's sound capturing
The stethoscope controller connects to the e-stethoscope through the Bluetooth stack to capture the auscultation sound and to handle optional messages between the e-stethoscope and other components. In our experiment, we captured the auscultation sound every 50 msec, which created 400 bytes of audio data per packet. These audio packets are transmitted over the Internet to the destination node for play-out.
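As a quick check of the figures above, the 400-byte packet size follows from the e-stethoscope's native format (4000 Hz, 16 bits per sample) and the 50 msec capture interval. The constant names are illustrative, and the bit rate below counts audio payload only, without RTP/UDP/IP headers.

```python
SAMPLE_RATE_HZ = 4000      # native e-stethoscope sampling rate
BYTES_PER_SAMPLE = 2       # 16 bits per sample
PACKET_INTERVAL_MS = 50    # capture interval per packet

samples_per_packet = SAMPLE_RATE_HZ * PACKET_INTERVAL_MS // 1000
payload_bytes = samples_per_packet * BYTES_PER_SAMPLE
packets_per_second = 1000 // PACKET_INTERVAL_MS
payload_bitrate_bps = payload_bytes * 8 * packets_per_second

print(samples_per_packet, payload_bytes, packets_per_second, payload_bitrate_bps)
```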

Session and audio transmission
The handshaking mechanism is the key signaling step in peer-to-peer networks. Our prototype uses both the Session Initiation Protocol (SIP) [33] and non-SIP (Web-based) signaling as the initial protocol for establishing a real-time session between participants. After the connection is completed, the auscultation sounds are conveyed in RTP packets. There is no need for the central server to participate in the media session.
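To make the RTP step concrete, here is a minimal sketch of wrapping one 50 msec audio frame in the 12-byte fixed RTP header. The chapter states only that RTP is used; the dynamic payload type 96, the SSRC value, and the function name `build_rtp_packet` are assumptions and not details of the prototype.

```python
import struct


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=96, marker=False):
    """Prefix an audio payload with the 12-byte fixed RTP header.

    Assumptions: no padding, no header extension, no CSRC entries,
    and a dynamic payload type (96) chosen only for illustration.
    """
    byte0 = 2 << 6                                   # version 2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                                    # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,                                # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                      # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                           # 32-bit source identifier
    )
    return header + payload


# One 400-byte frame of silence; the SSRC here is an arbitrary example value.
packet = build_rtp_packet(seq=1, timestamp=400, ssrc=0x1234, payload=bytes(400))
```

For consecutive packets the sequence number increases by one, and the timestamp increases by the number of samples per packet at whatever clock rate the session negotiates.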

Effects of packet loss and packet delay variations
This section presents the study and analysis of how the quality of auscultation sounds is affected by packet loss and packet delay variations in real-time communication over the Internet.
In our work, we studied the characteristics of two kinds of body sound examination: cardiac auscultation and lung auscultation. Cardiac auscultation is a method for screening heart sounds and heart murmurs [35]. In the cardiac auscultation process, patient positions (left lateral recumbent, sitting, and supine) and body locations (aortic area, pulmonic area, tricuspid area, and mitral area) can affect the quality of heart sounds. Particular positions and locations are important for listening to specific heart sounds or heart murmurs [36][37][38]. Heart murmurs are the critical sounds related to valvular heart disease. Intensity (grade), pitch, timing, and location describe the characteristics of each murmur [36][37][38].
Lung auscultation is a method for screening abnormal sounds over the lung areas, including the front and back of the patient's body [39]. Listening to the sounds of inspiration and expiration, together with comparing the intensity and pitch of each breath, is the fundamental process for diagnosing lung sounds [39,40].
As noted above, a real-time auscultation service over the Internet should be feasible and reliable. Packet loss and packet delay variations are the critical factors that significantly damage heart and lung sound quality.

The packet loss and packet delay variation generator
We developed software that can generate different levels of packet loss and packet delay variation patterns, in order to study and analyze their effects on heart and lung sound quality. The software components are described in Figure 2.
• Sender: It sends the auscultation sound with a packet size of 400 bytes and a packet interval of 50 msec. Three original heart sounds [41] and five original lung sounds [42] are observed, as shown in Table 2.
The waveforms of each sound are shown in Figures 3 and 4.
• Controller: It is the component that generates the patterns of packet loss and packet delay variations. The loss patterns were generated based on the Gilbert-Elliott model [43][44][45] with loss values of 2, 5, 10, and 20%. The packet delay variation patterns were generated based on the Poisson distribution [46] with time delays of 50, 60, and 70 msec.
• Receiver: It is the component that converts the received packets to the play-out format (LPCM, 4000 Hz, 16 bits, one mono channel), with a controlled jitter buffer.
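The controller's two pattern generators can be sketched as follows. This is not the authors' software: it assumes the simplified two-state form of the Gilbert-Elliott model (every packet is lost in the bad state, none in the good state), a hypothetical recovery probability `p_recover`, and it reads the 50, 60, and 70 msec values as the mean of the Poisson distribution.

```python
import math
import random


def gilbert_elliott_losses(n_packets, target_loss, p_recover=0.5, rng=None):
    """Return a list of booleans (True = packet lost) with bursty losses.

    Assumption: simplified Gilbert-Elliott chain. The good-to-bad
    probability is chosen so the long-run loss rate equals target_loss:
    loss = p_bad / (p_bad + p_recover).
    """
    rng = rng or random.Random()
    p_enter_bad = target_loss * p_recover / (1.0 - target_loss)
    bad = False
    lost = []
    for _ in range(n_packets):
        if bad:
            bad = rng.random() >= p_recover   # stay bad unless we recover
        else:
            bad = rng.random() < p_enter_bad  # occasionally enter a loss burst
        lost.append(bad)
    return lost


def poisson_delay_ms(mean_ms, rng):
    """Draw one Poisson-distributed delay in msec (Knuth's method)."""
    limit = math.exp(-mean_ms)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(2024)                       # fixed seed for repeatable runs
losses = gilbert_elliott_losses(1200, 0.10, rng=rng)   # one minute at 20 packets/s
delays = [poisson_delay_ms(50, rng) for _ in losses]
```

A lower `p_recover` gives longer loss bursts at the same average loss rate, which is the property that distinguishes this model from independent random loss.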
In this experiment, we measured the packet loss and delay variations of different network services. The Ethernet network was the Intranet at our experiment site, which has a large bandwidth; the 3G/4G network was a service provided by local mobile operators; and the ADSL in the remote area was an Internet connection provided by a local service provider. We collected test data for a week at different times. Table 3 shows the average packet loss and delays when data were sent across the different network services. The Intranet gave the lowest packet loss and delays, while ADSL in the remote area gave higher packet loss and larger delays.
Table 2 (partially recovered). Columns: heart sounds, lung sounds. First heart sound: normal heart sound (75 bpm).

Distortions of heart and lung sounds
Signal distortion is an alteration of a pulse of the heart sound or a breath of the lung sound. In our experimental method, we count the number of distorted pulses and breaths over a duration of one minute. Sample pulses of a heart sound and breaths of a lung sound are shown in Figures 5 and 6, respectively.

Packet loss
In the packet loss experiment, heart and lung sounds with 2, 5, 10, and 20% of packet loss are used.
Each experiment was replicated five times, and the signal distortions of the three heart sounds and five lung sounds were obtained by comparing the pulses and breaths of each sound with its original sound. The results are as follows. Figure 7 shows the results for pan-systolic murmur with packet loss varied over 2, 5, 10, and 20%. Losses of shape and position occur randomly from time to time, and a higher packet loss value causes more damage to shape and position. We tested all the other heart and lung sounds listed in Table 2. Figure 8 shows the damaged positions of pan-systolic murmur in five runs of the experiment with 20% packet loss applied. Again, the damaged positions vary randomly from run to run, which makes the sound hard to interpret at the receiver node.
Table 3. Packet loss and delays between different network connection services.

Over five replications with 2, 5, 10, and 20% of packet loss, the distortions are summarized in Table 4. The figures in the table are the percentages of distorted beats for each sound.
From Table 4, increasing packet loss is accompanied by short-range distortions of heart sounds, whereas for lung sounds the distortion fluctuates and is long-range. For all heart sounds at all packet loss levels, the percentage of distortion is less than 50%. For lung sounds, the percentage of distortion is less than 50% at 2 and 5% of packet loss; however, when packet loss is 10 and 20%, the distortion exceeds 70%.

Packet delay variations
In the packet delay variation experiment, the heart and lung sounds are investigated at three levels of average packet delay variation: 50, 60, and 70 msec. Each level was tested five times. Figure 9 shows sample results. Over the five replications, the distortions of the three heart sounds and five lung sounds (listed in Table 2), obtained by comparing the pulses and breaths of each sound with its original sound, are shown in Table 5.
From Table 5, a packet delay variation of 50 msec produces short-range distortion of both heart and lung sounds, whereas 60 to 70 msec produces high distortion of both. A small increase in delay variation, e.g., from 50 to 60 msec, significantly impacts the sound distortion.

Evaluation of sound quality by assessor: packet loss
Ten medical professionals (medical doctors and nurses) with at least 2 years of auscultation experience participated in this evaluation. All tests were blind: each person listened to the three heart sounds and five lung sounds at packet loss of 2, 5, 10, and 20% without knowing which sound he or she was listening to. The listening results are given in Table 6. Most assessors could detect normal heart, pan-systolic murmur, early systolic murmur, and normal vesicular sounds when a small packet loss was applied, e.g., less than 10%. More than 90% could correctly detect all heart sounds at 2 to 20% of packet loss. The percentage of correct detection of lung sounds depends on the type of lung sound and the level of packet loss.
Table 4. Percent of distortions among various heart and lung sounds on each level of packet loss.

Analysis of the effects of packet loss and packet delay variations
From the experiment of packet loss and packet delay variations, we analyzed the results as follows:

Sound missing and splitting
The auscultation method requires continuous sound in each cycle in order to recognize the rhythm, pitch, and intensity. Packet loss and packet delay variations destroy the continuity of the sound randomly, in different patterns. Packet loss causes sound to be missing at some positions, as shown in Figure 10, and packet delay variation causes the sound to split at some positions, as shown in Figure 11.
Sound missing has two patterns: (1) a whole pulse or breath is missing, and (2) some parts of a pulse or breath are missing, as shown in Figure 12. A missing pulse or breath is caused by burst loss in the transmission process, which is narrow-range damage to heart and lung sounds. The other pattern, in which some parts of pulses or breaths are missing, is caused by loss with no certain pattern in the transmission process, and is wide-range damage to heart and lung sounds. An increase in the packet loss level does not determine the pattern of sound discontinuity; it only increases the amount of missing sound.

Pulse transformation
Transformation is an effect caused by packet loss. When sound is missing at some positions, such as within the murmur shape, the pulse may be transformed into another type. Pulse transformation does not happen on every pulse. Some examples of pulse transformation are analyzed as follows. An early systolic murmur transforms to a normal heart sound when the murmur shape is missing but S1 and S2 remain, as shown in Figure 13(a). Transformation of early systolic murmur to mid-systolic murmur results from the beginning part of the murmur shape being missing, as shown in Figure 13(b). Transformation of pan-systolic murmur to normal heart sound results from the murmur shape being missing, as shown in Figure 13(c). Transformation of pan-systolic murmur to late systolic murmur results from the first half of the murmur shape being missing, as shown in Figure 13(d). Transformation of pan-systolic murmur to early systolic murmur results from the second half of the murmur shape being missing, as shown in Figure 13(e). Transformation of pan-systolic murmur to mid-systolic murmur results from the beginning part and the tail of the murmur shape being missing, as shown in Figure 13(f).
Figure 13. Samples of pulse transformations. (a) Early systolic murmur to heart normal (1) early systolic murmur (2) heart normal, (b) early systolic murmur to mid-systolic murmur (1) early systolic murmur (2) mid-systolic murmur, (c) pan-systolic murmur to heart normal (1) pan-systolic murmur (2) heart normal, (d) pan-systolic murmur to late systolic murmur (1) pan-systolic murmur (2) late systolic murmur, (e) pan-systolic murmur to early systolic murmur (1) pan-systolic murmur (2) early systolic murmur, (f) pan-systolic murmur to mid-systolic murmur (1) pan-systolic murmur (2) mid-systolic murmur.

Design for improving the quality of service
Voice quality factors have been known for a long time, e.g., from ITU guidelines and standards [47]. In telemedicine applications, however, some techniques need to be re-applied for this particular situation. The following design and implementation points should be considered:
• Packet loss concealment: ITU provides the PLC (packet loss concealment) technique for digital voice communications. However, waveform substitution (one PLC technique) may not be appropriate. Zero insertion or silence insertion (another PLC technique) is more appropriate.
• Jitter buffer: a jitter adaptation technique is deployed to reduce the effect of delay fluctuation due to the late or early arrival of voice packets (Figure 14). This helps the receiving end hear the sound at a more comfortable level. However, because the session is real-time, the delay absorbed by the buffer should be limited; a small jitter buffer, e.g., a few hundred milliseconds, can be used. In our experiment, 500 msec of buffering (or 10 packets) seems to be good enough. Varying this figure is a traffic engineering choice.
• Noise removal: as mentioned, the device is operated remotely, and we noticed that moving the stethoscope creates a lot of noise. This is uncomfortable for the remote side. We have applied a two-stage noise filtering technique. The first stage looks at noise within a single voice packet (50 msec time interval), while the second stage evaluates the average noise energy over three consecutive packets. This helps the doctor at the remote side work conveniently and comfortably.
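The first two points above can be combined in a small receiver-side sketch: a fixed-depth jitter buffer that plays silence (zero insertion) when a packet is missing at its play-out time. The class name `JitterBuffer` and its methods are hypothetical, the buffer is fixed rather than adaptive, and sequence-number wrap-around is ignored for brevity, so this is an illustration of the idea and not the prototype's implementation.

```python
PACKET_BYTES = 400      # one 50 msec packet, as in the experiment
BUFFER_PACKETS = 10     # 10 packets = 500 msec of buffering


class JitterBuffer:
    """Fixed-depth play-out buffer with zero insertion for missing packets."""

    def __init__(self, depth=BUFFER_PACKETS, packet_bytes=PACKET_BYTES):
        self.depth = depth
        self.silence = bytes(packet_bytes)   # all-zero payload
        self.pending = {}                    # sequence number -> payload
        self.next_seq = None                 # next sequence number to play
        self.started = False

    def push(self, seq, payload):
        """Store an arriving packet; packets that missed their slot are dropped."""
        if self.next_seq is None:
            self.next_seq = seq
        if seq >= self.next_seq:
            self.pending[seq] = payload
        if len(self.pending) >= self.depth:
            self.started = True              # enough audio buffered to start

    def pop(self):
        """Return the payload for the next 50 msec play-out tick."""
        if not self.started:
            return self.silence
        # Zero insertion: a lost or late packet is replaced by silence
        # rather than by a copy of a neighbouring waveform.
        payload = self.pending.pop(self.next_seq, self.silence)
        self.next_seq += 1
        return payload
```

A deeper buffer absorbs more delay variation at the cost of more end-to-end delay, which is the trade-off described in the jitter buffer point.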
We have shown above that packet loss and delay affect the quality of hearing; as a result, a physician may hesitate when determining symptoms. According to our prototype testing, most physicians at the remote side are happy with the e-stethoscope signal shown on the screen. This raises their confidence in interpretation once they are familiar with it. We tested a UI with a packet loss and delay indicator, as shown in Figure 15, to raise the doctor's awareness during the interactive operation. The levels of packet loss and delay can be easily noticed: for example, green means there is no packet loss and delay (or very little, e.g., 1%), yellow means there is some packet loss and delay (e.g., 5%), and red means there is high packet loss and delay (e.g., more than 10%). We have concluded that this UI design raises awareness and supports the doctor's confidence level. Moreover, the packet loss and delay indicator levels can be adjusted according to the doctor's experience.
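The indicator logic can be sketched in a few lines. The text gives example values only (about 1% for green, about 5% for yellow, more than 10% for red), so the exact boundaries below and the function name `loss_indicator` are assumptions; they are exposed as parameters because the levels are meant to be adjustable.

```python
def loss_indicator(loss_percent, green_max=1.0, yellow_max=10.0):
    """Map a measured packet loss percentage to an indicator colour.

    Assumed boundaries: up to green_max is green, up to yellow_max is
    yellow, anything above is red. Both limits are adjustable.
    """
    if loss_percent <= green_max:
        return "green"
    if loss_percent <= yellow_max:
        return "yellow"
    return "red"


for measured in (0.0, 5.0, 15.0):
    print(measured, loss_indicator(measured))
```

A stricter profile could be configured, for example `loss_indicator(3.0, green_max=0.5, yellow_max=2.0)`, which reports red at 3% loss.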

Conclusion
This work studies the effects of packet loss and packet delay variations on real-time tele-auscultation services over the Internet. We categorize the communication models of tele-auscultation services as asynchronous and synchronous, and as client/server and peer-to-peer. The important components of tele-auscultation are described and demonstrated with prototype software. We then study the characteristics of two body sounds, cardiac auscultation and lung auscultation, when these sounds are transmitted over the Internet in real-time applications. From our experimental results, verified by medical professional staff, we have found that both sounds are more sensitive to packet delay variations than to packet loss. Lung sound is more sensitive than heart sound due to its timing interpretation, which is needed to recognize the rhythm, pitch, and intensity. Some level of packet loss can be tolerated for both sounds, e.g., 10% for heart sounds and 2% for lung sounds. However, a packet delay variation bound of 50 msec is recommended. Based on our analysis, sound missing and splitting, and pulse transformation, are the two factors that affect the sound quality. Pulse transformation may lead to misinterpretation of abnormal sounds. We have also found that normal sounds are identified more accurately than abnormal sounds. Beyond the prototype software itself, we have concluded that real-time visualization of the auscultation waveform can raise the physician's confidence level in sound interpretation. Moreover, showing the ratio of packet loss and delay variations in a clear icon will raise awareness and increase the success rate and effective outcomes.