The purpose of this chapter is to explore issues in developing the conversational dialog of robots for nursing, especially for long-term care, and to forecast the introduction of humanoid nursing partner robots (HNRs) into clinical practice. To meet the performance required of HNRs, anthropomorphic robots must have high-quality conversational dialog functions. As for hardware, giving the robot an independent range of action and sufficient degrees of freedom reduces the burden that human-robot communication places on the people involved, thereby relieving nurses and professional caregivers. Furthermore, it is critical to develop friendlier robots by equipping them with non-verbal emotive expressions that older people can perceive. If these functions are combined, anthropomorphic intelligent robots may serve as instructors, particularly for the rehabilitation and recreation activities of older people. In this way, HNRs will play a more active role than ever before in the healthcare and welfare fields.
- humanoid nursing partner robots (HNRs)
- long-term care
Meeting the healthcare demands of the growing older adult population is a significant concern in Japan and in other developed countries. The concern is compounded by the decreasing number of healthcare workers, who are themselves getting older, and by the resulting high turnover rates [2, 3, 4]. In this situation, it is appropriate to consider the use of healthcare robots, which are increasingly recognized as a potential solution for meeting the care demands of older persons as well as of patients with mental illness.
Several questions follow: “What are the prominent areas of concern in supporting older persons who use healthcare robots?” and “What are the barriers to introducing Humanoid Nursing partner Robots (HNRs) into hospitals or elderly care institutions?” Another question is, “Which type of robot is needed for nursing care, anthropomorphic or non-anthropomorphic?”
Different technological requirements determine whether anthropomorphic or non-anthropomorphic robots are needed. For example, if the task is measuring blood pressure and body temperature, an anthropomorphic machine may not be necessary; this information is currently detected and retrieved by digital hand-held devices, not necessarily by robots. However, anthropomorphic robots may be necessary when conversation is expected, particularly a dialog with older persons while taking blood pressure and other vital signs, much as a human nurse does today. In addition, the following questions arise: “What nursing care tasks can be programmed specifically for anthropomorphic nursing partner robots?” and “What are the core competencies that only nurses and professional caregivers can perform?” Reflecting on these questions, it is essential to establish a field of Robot Nursing science, developed by nurse scientists from the perspective of a unique ontology of nursing and designed on a foundation of robotics engineering, computer science, and nursing science. The expectation is to develop a knowledge base for robot nursing science as a foundation for nursing practice that uniquely embraces the realities of anthropomorphic robots, particularly the demand for precise conversational capabilities. Answering the questions posed above may further the development of robot nursing science and its practice.
The aim of this chapter is to explore the issues concerning the development of dialog robots for nursing, especially for long-term care, and the prospects for introducing HNRs into nursing practice.
2. Development of robots that can hold compassionate conversations
2.1 Concerns regarding human-robot conversation capabilities
A key issue in humanoid robot verbalization is that the robot’s voice should have appropriate intonation, speech speed, and voice range so that older persons can hear it easily. If a cute-looking robot speaks in a low-pitched voice similar to that of an adult male, the user may find it creepy. Therefore, humanoid robots need voices that are easy for older persons to hear and that convey a sense of familiarity.
The challenges to developing robot nursing science are real. They highlight the need to promote research aimed at systematizing technological competencies, ethical thinking, safety measures, and the outcomes of using robots in nursing settings. As new devices and technologies developed by engineers are introduced and used in nursing care, robot nursing science can develop only with an ontology of nursing at its core. Healthcare robots are increasingly perceived as partners in nursing practice. Human caring, expressed in human-to-human relationships and in relationships with nonhumans, lies at the heart of a futuristic vision of healthcare with humanoid robots as main protagonists.
2.2 Development of caring dialogical database of humanoid nursing partner robots and older persons
Full and effective use of robots by nurses and healthcare providers would lead to a better understanding of patients and their needs. Thus, a “Caring Dialog Database” for HNRs needs to be developed to enhance the robots’ capability to know the patient/client and to share the expressions of human-robot interactions in esthetic ways. Furthermore, it is important to develop dialog patterns that allow humanoid robots to empathize with an older person. The ability to empathize and to communicate accurate empathy is likely to enhance the older person’s feeling of being cared for through HNR actions such as: 1) Listening attentively to and accepting older persons; 2) Knowing older persons intentionally; and 3) Establishing appropriate caring dialog.
2.3 Robotics and artificial intelligence
Robotics and Artificial Intelligence (AI), together with Natural Language Processing (NLP), will become a predominant aspect of healthcare and welfare settings, particularly among older persons. Human caring has been based on human-to-human relationships. In the nonhuman-to-human relationship of HNRs, however, ethical concerns and human safety must be considered. Because nursing and its underlying beliefs, values, and assumptions may be redefined, it is pertinent to understand the implications of AI and its role in HNRs in healthcare.
For emotion recognition and non-verbal output, the required functions include: 1) Recognizing users’ facial expressions; 2) Matching the expressions with information in the emotion database; 3) Selecting appropriate expressions from the emotion database; and 4) Conveying emotional expressions through particular motions, for example, flashing lights or moving the upper limbs and head.
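The four functions above can be read as a lookup pipeline. The Python sketch below is a minimal illustration under stated assumptions, not the authors’ implementation. It assumes that recognition (function 1) has already produced a score for each facial-expression label. The tables `EXPRESSION_TO_EMOTION` and `EMOTION_TO_ROBOT_EXPRESSION`, their labels, and the LED and motion names are hypothetical stand-ins for an emotion database and a robot’s motion repertoire.

```python
from dataclasses import dataclass

# Hypothetical emotion database: facial-expression label -> emotion category.
EXPRESSION_TO_EMOTION = {
    "smile": "joy",
    "frown": "sadness",
    "furrowed_brow": "anger",
    "wide_eyes": "surprise",
}

# Hypothetical robot repertoire: the expression shown in reply to each emotion.
EMOTION_TO_ROBOT_EXPRESSION = {
    "joy": {"led_color": "yellow", "head": "nod", "arms": "raise_both"},
    "sadness": {"led_color": "soft_blue", "head": "tilt", "arms": "open_slowly"},
    "anger": {"led_color": "white", "head": "lower", "arms": "rest"},
    "surprise": {"led_color": "green", "head": "lift", "arms": "rest"},
}
NEUTRAL = {"led_color": "white", "head": "still", "arms": "rest"}


@dataclass
class MotionCommand:
    led_color: str  # flashing-light color
    head: str       # head movement
    arms: str       # upper-limb movement


def choose_nonverbal_response(expression_scores, min_confidence=0.6):
    """Match, select, and convey (functions 2-4).

    expression_scores: non-empty dict of facial-expression label -> score,
    assumed to come from a facial expression recognizer (function 1).
    """
    label, score = max(expression_scores.items(), key=lambda item: item[1])
    emotion = EXPRESSION_TO_EMOTION.get(label)          # 2) match with database
    if emotion is None or score < min_confidence:
        return MotionCommand(**NEUTRAL)                 # unsure: stay neutral
    expression = EMOTION_TO_ROBOT_EXPRESSION[emotion]   # 3) select expression
    return MotionCommand(**expression)                  # 4) command for motion
```

If the top score falls below `min_confidence`, the function returns a neutral posture. This is one way to avoid showing a confidently wrong emotion to an older person.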
Furthermore, for recognizing and verbally responding to the voices of other persons, e.g., older persons, the robot’s functions can include: 1) Recognition of the subject’s voice; 2) Text conversion by NLP; 3) Matching with the NLP database for an appropriate response; 4) Speech synthesis; and 5) Vocalization.
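The five voice stages above can be wired together as a chain of functions. This is a hedged sketch under several assumptions. Speech recognition, speech synthesis, and playback are passed in as callables, because the real engines depend on the robot platform. `RESPONSE_DB` and its keyword matching are hypothetical simplifications of an NLP response database.

```python
import re
from typing import Callable, List

# Hypothetical response database: keyword -> response text.
RESPONSE_DB = {
    "sushi": "I am a robot, but I would like to try sushi. What does it taste like?",
    "garden": "I would love to hear about your garden. What is growing now?",
}
FALLBACK = "I see. Could you tell me a little more?"


def to_tokens(text: str) -> List[str]:
    """2) Text conversion: normalize recognized text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def match_response(tokens: List[str]) -> str:
    """3) Matching: return the first database entry whose keyword appears."""
    for token in tokens:
        if token in RESPONSE_DB:
            return RESPONSE_DB[token]
    return FALLBACK


def respond(audio: bytes,
            recognize: Callable[[bytes], str],
            synthesize: Callable[[str], bytes],
            play: Callable[[bytes], None]) -> str:
    text = recognize(audio)         # 1) voice recognition (platform engine)
    tokens = to_tokens(text)        # 2) text conversion
    reply = match_response(tokens)  # 3) matching with the response database
    waveform = synthesize(reply)    # 4) speech synthesis (platform engine)
    play(waveform)                  # 5) vocalization through the speaker
    return reply
```

Because the engines are injected, the same dialog logic can be exercised with text standing in for audio before it is connected to a real recognizer and synthesizer.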
In conversations with older persons, the robot is expected to provide accurate empathic responses appropriate to the situation. If an older person says, “I want to eat sushi!”, and the humanoid robot responds, “I cannot eat because I am a robot”, the older person is unlikely to engage with the robot, because the response does not demonstrate empathic understanding. However, if the robot responds, “I am a robot, but I would like to try eating sushi. Tell me, what does it taste like?”, the older person is likely to engage, because the answer conveys understanding and empathy. If HNRs such as the Pepper robot (Figure 1) have this empathic response competency, older persons may attain well-being by understanding the content of their dialogs and conversations with them.
3. Required conversation functions for humanoid nursing partner robots
This section discusses the requirements for HNRs to hold a two-way conversation (dialog) with the user. HNRs should comprehend the content of the user’s remarks and their intention, including emotions. Using information from speech, paralanguage, and appearance, such as facial expressions and gestures, HNRs can present a listening posture to the user, with an appropriate ‘line of sight’ and nods that respond appropriately to the user’s remarks (Figure 2). These functions might encourage the user to speak actively with HNRs. Furthermore, when HNRs return appropriate responses based on the user’s remarks and on what they understand from non-verbal information, the user can feel that the HNR is listening to and understanding the conversation and may feel satisfied with the interactive dialog.
One example of an appropriate response is for the HNR to repeat a keyword the user used in the remark or to offer a topic related to that keyword. The voice and movements of the HNR during its response also affect the impression the dialog makes (Figure 3).
For example, when a user talks about a sad topic or shows a sad expression, the HNR needs to respond with sentences and bodily behavior that convey compassion, comfort, and encouragement. It is also important that the robot’s voice carries a tone of artificial compassion that matches the response sentence and accurately delivers the robot’s intention to the user [13, 14]. A response that matches the user’s emotions may express artificial sympathy and thereby enhance the empathy the user perceives.
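One way to make the robot’s tone follow the user’s emotion is to wrap each response sentence in W3C SSML `<prosody>` markup, which many speech synthesizers accept. Whether a particular robot’s engine supports SSML is an assumption. The baseline rate and pitch for older listeners and the per-emotion adjustments below are illustrative values, not validated settings.

```python
from xml.sax.saxutils import escape

# Assumed baseline for older listeners: slower and lower than the engine default.
BASE_RATE_PERCENT = 85    # speaking rate as a percentage of the default
BASE_PITCH_PERCENT = -10  # relative pitch change

# Assumed adjustments on top of the baseline, keyed by the user's emotion.
EMOTION_PROSODY = {
    "sadness": {"rate": -10, "pitch": -5, "volume": "soft"},
    "joy": {"rate": 5, "pitch": 5, "volume": "medium"},
    "neutral": {"rate": 0, "pitch": 0, "volume": "medium"},
}


def to_ssml(sentence: str, user_emotion: str) -> str:
    """Wrap a response sentence in SSML prosody matched to the user's emotion."""
    adj = EMOTION_PROSODY.get(user_emotion, EMOTION_PROSODY["neutral"])
    rate = BASE_RATE_PERCENT + adj["rate"]
    pitch = BASE_PITCH_PERCENT + adj["pitch"]
    volume = adj["volume"]
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<prosody rate="{rate}%" pitch="{pitch:+d}%" volume="{volume}">'
        f"{escape(sentence)}"
        "</prosody></speak>"
    )
```

Every sentence keeps a slower, lower baseline, reflecting the hearing concerns raised for older persons. The per-emotion table only nudges that baseline.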
In current facial expression recognition technology, reading facial expressions more accurately by recognizing the human face in three dimensions (3D) is being studied [15, 16]. Research is also being conducted on robots that assess human emotions not only from movements of the eyes and mouth but also from movements of the facial muscles and upper body. As for the response ability of humanoid robots, research is at the stage of creating appropriate response sentences, examining response vocalization intervals that do not cause discomfort, and implementing, verifying, and improving these on robots [18, 19, 20, 21].
An application for interacting with older persons requires a function that lets HNRs speak in accordance with the user’s remarks, rather than having the user converse in accordance with the robot’s remarks. Failing that, a way must be devised to give the user the feeling of having a dialog by making the user feel that the HNR understands the user’s remarks and intentions. As reception functions, the application needs not only technology that accurately reads the content of remarks from the user’s voice but also technology that reads the user’s situation from information other than voice, such as facial expressions and gestures. As transmission functions, it needs expressive functions that produce response sentences, vocalization, and actions matching the user’s remarks and emotions.
4. How to develop a robot that can express verbal and non-verbal expressions
For a robot to convey verbal and non-verbal expressions like those of a human, it needs receptors equivalent to those of a human. These include sensory receptors for sight [22, 23], smell [24, 25], and touch. The information detected by these receptors is entered into the system, and machine learning is performed on this input to prepare verbal and non-verbal output [27, 28]. Additionally, the robot must be able to perform the same movements and expressions as humans do in its output [29, 30]. Depending on how closely the robot’s expression resembles human behavior, however, it may fall into the uncanny valley, where human beings find it creepy, which in turn constrains the responses it can make.
4.1 Anthropomorphic form necessary in human-robot conversation
An anthropomorphic form is necessary when an HNR is expected to talk with older persons or take vital signs as human nurses do. The physiognomy of an HNR is a major factor in how efficiently human beings respond in transactive human-robot engagements. Rather than the “Uncanny Valley” dominating robot communication, it is the interactive ‘fit’ or congruence between humans and HNRs that people may appreciate more when an HNR’s appearance is appropriate. Appropriate responses by an HNR depend largely on accurate and appropriate conversational capabilities, which can easily be influenced by artificial affective communication.
In a pleasant conversation with HNRs, human beings feel a sense of affinity (Shinwa-kan) with them, such as appreciating their cuteness and having fun. However, human beings have also been disappointed, especially when robots have poor conversational competencies. People may feel fear, misunderstanding, and confusion depending on how far the content of the robot’s language deviates from what they expect.
4.2 Roles and functions of humanoid nursing partner robots
Nevertheless, as companions in patient care, HNRs should assume multiple roles, including serving as healthcare assistants that help complete tasks. HNRs need the ability to express artificial emotions through linguistically appropriate and accurate communication, including nonverbal expressions through autonomous bodily movements. It is also critical that HNRs look familiar, relatable, and non-intimidating, and that their appearance does not cause emotional unease or discomfort such as fear, anxiety, or suspicion, since a human-like appearance can lead to resistance.
One of the essential attributes proposed for HNRs is Artificial Affective Compassion (AAC). Because AAC (Figure 4) accentuates the significance of language in human-robot interaction, an HNR’s value depends not only on its physiognomy but also on its ability to communicate using AI for NLP. Transactive engagements with human beings may become more valuable and meaningful for healthcare practice when the robot communicates with artificial affection, uses appropriate phonology, mimics human interaction through human features, and reflects the social and cultural nuances of the communicative situation.
5. An example of conversation with older persons and Pepper
Issues in conversation with Pepper include expressions such as robot gaze, eye blink synchrony, eye contact, and speech. One issue between older persons and Pepper running the conversation application “Kenkou-oukoku TALK for Pepper” is that its vocalization has little intonation. This makes it difficult for older persons to tell whether Pepper’s sentence is a question or a statement, and to recognize where Pepper’s sentences end. Pepper’s voice is also high-pitched and difficult to hear. In addition, Pepper’s sensors may fail to register the correct meaning of a sentence because older persons speak softly or use a dialect. If Pepper cannot recognize the content of the conversation, it may interrupt or suddenly change the topic, which may offend older persons. As a result, there are many situations in which the contents of the dialog do not match. With its current performance, Pepper changed the topic while the user was still thinking about the answer to Pepper’s question. Another operational issue is Pepper’s line of sight. If Pepper’s line of sight drifts away from the person it is talking to while its dialog program is running, Pepper proceeds with the conversation while recognizing other objects around it and loses eye contact with the user. Figure 5 presents the robot’s line of sight.
An intermediary plays an important role in supporting conversation between older persons and Pepper. In current conversations with Pepper, users must adapt themselves to Pepper’s utterances: older persons are expected to listen to Pepper rather than do most of the talking, and they need the cognitive ability to respond while talking with Pepper. Training to respond quickly and accurately to Pepper’s questions may therefore be useful as rehabilitation for cognitive function.
Furthermore, to use the current Pepper conversation application for the cognitive rehabilitation of older persons, researchers propose a method in which older persons play the role of listeners. This role might be useful for training, as older persons concentrate on listening to the speaker’s utterances, understand the content of the conversation, and convey their personal feelings to the other person. If, after the conversation with Pepper is over, the intermediary instructs older persons to recall its content, this process may help maintain and confirm memory and train information-processing functions in older people.
To improve the Pepper application, Pepper should not conduct a one-way conversation in which older persons must adapt to its utterances. The conversation performance of the application also needs improvement so that older persons can enjoy talking with the robot for a long time. Specifically, the following need to be improved: (1) The timing of responses; (2) Talk content that matches the situation; (3) Appropriate reactions to the user’s speech; (4) The ability to make proper eye contact with the user; and (5) The ability to react to users with non-verbal expressions. These functions are expected to make mutual conversation possible.
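Improvement (1), response timing, can be illustrated with a simple end-of-turn rule driven by voice-activity detection. The 100 ms frame size, the 1.2 s end-of-turn silence, and the 8 s thinking timeout in this sketch are assumed values. If the user stays silent for a long time after a question, the rule re-prompts gently instead of changing the topic. This addresses the case described above, in which Pepper changed the topic while the user was still thinking.

```python
from enum import Enum
from typing import List

FRAME_MS = 100  # assumed duration of one voice-activity frame


class Action(Enum):
    WAIT = "wait"
    RESPOND = "respond"        # user finished a turn; reply to it
    GENTLE_PROMPT = "prompt"   # long silence; rephrase the same question


def next_action(vad_frames: List[bool],
                end_of_turn_ms: int = 1200,
                thinking_timeout_ms: int = 8000) -> Action:
    """Decide what the robot should do after asking a question.

    vad_frames holds one boolean per frame since the robot stopped talking:
    True if the user was speaking in that frame.
    """
    trailing_silence = 0
    for active in reversed(vad_frames):
        if active:
            break
        trailing_silence += FRAME_MS
    user_spoke = any(vad_frames)
    if user_spoke and trailing_silence >= end_of_turn_ms:
        return Action.RESPOND
    if not user_spoke and trailing_silence >= thinking_timeout_ms:
        return Action.GENTLE_PROMPT
    return Action.WAIT
```

The thresholds would need tuning with older users, whose pauses may be longer than those assumed by general-purpose dialog systems.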
To solve the line-of-sight problem, robots need to produce verbal and non-verbal expressions at the same level as human beings. At present, robots are considered to be merely displaying artificial verbal and non-verbal expressions learned through machine learning. Producing such expressions by incorporating artificial thinking, mind, and compassion requires advanced intelligence. Therefore, it is necessary to give the computer an artificial self.
Demands for quality nursing care and household responsibilities may be met through the anticipated automation and robotization of work activities enabled by AI and other technological advances. AI has become the latest “buzzword” in industry today. To date, no AI machine is able to ‘learn’ collective tacit knowledge. AI applies supervised learning and needs a great deal of data to do so, whereas humans learn in a ‘self-supervised’ way: they observe the world and figure out how it works. Humans need less data because they can understand facts and interpret them using metaphors, and they can transfer their abilities from one brain pathway to another. These are skills that AI will need if it is to progress to human-level intelligence.
6. Dialog systems
Dialog systems can be classified as task-oriented or non-task-oriented. A task-oriented dialog system conducts the dialog needed to fulfill the user’s demands, whereas a non-task-oriented dialog system aims to continue the dialog itself. To keep a conversation going, a system must therefore be able to handle non-task-oriented dialog. Currently, a robot converses with one person at a time. To develop high-performance humanoid robots in the future, however, it is desirable that one robot be able to hold a dialog with three people.
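The distinction between task-oriented and non-task-oriented dialog suggests a router. The router sends recognized requests to task handlers and everything else to a chat module. The intents and regular expressions below are hypothetical examples; a deployed system would typically use a trained intent classifier.

```python
import re
from typing import Tuple

# Hypothetical task intents for a care setting and their trigger patterns.
TASK_PATTERNS = {
    "call_staff": re.compile(r"\b(call|get) (the )?(nurse|caregiver|staff)\b"),
    "tell_time": re.compile(r"\bwhat time\b"),
}


def route(utterance: str) -> Tuple[str, str]:
    """Return ("task", intent) for a recognized request, else ("chat", "")."""
    text = utterance.lower()
    for intent, pattern in TASK_PATTERNS.items():
        if pattern.search(text):
            return "task", intent   # hand off to the task-oriented module
    return "chat", ""               # keep the conversation going
```

Task patterns are checked first, so an explicit request such as calling a nurse is never absorbed into small talk.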
Regarding dialog initiative, in the case of HNRs, the nurse and caregiver hold the initiative. Dialog systems are also distinguished by whether or not they have a physical body: Siri, for example, has no body, whereas AI speakers and communication robots do. In this book, HNRs are assumed to have a physical body and to run onboard AI for dialog processing. Regarding the modality of learning, a voice dialog robot needs to learn from multiple kinds of information (multimodal). For example, a dialog between a skilled nurse and a care recipient is recorded, and the AI is trained on it together with the motions and biological data acquired at the same time by video and sensors, as multimodal information.
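Learning from multimodal recordings first requires aligning the separately recorded streams in time. This sketch makes several assumptions:

- utterances carry start and end times;
- heart-rate samples and annotated gestures carry timestamps on the same clock;
- the field names and the choice of averaging heart rate over each utterance are illustrative only.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Sequence


@dataclass
class Utterance:
    speaker: str
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class AlignedRecord:
    utterance: Utterance
    mean_heart_rate: Optional[float]
    gestures: List[str] = field(default_factory=list)


def in_window(times: Sequence[float], values: Sequence, start: float, end: float) -> list:
    """Values whose timestamps lie in [start, end]; times must be sorted."""
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    return list(values[lo:hi])


def align(utterances, hr_times, hr_values, gesture_times, gesture_labels):
    """Attach the sensor and video data recorded during each utterance."""
    records = []
    for u in utterances:
        heart_rates = in_window(hr_times, hr_values, u.start, u.end)
        gestures = in_window(gesture_times, gesture_labels, u.start, u.end)
        hr = mean(heart_rates) if heart_rates else None
        records.append(AlignedRecord(u, hr, gestures))
    return records
```

Each aligned record could then serve as one multimodal training example.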
The following are the steps involved in the HNR’s generation of a response sentence containing emotions in response to the patient’s speech. These steps are illustrated in Figure 6.
(1) When the patient speaks, the HNR recognizes the speech and converts it into linguistic information.
(2) The HNR acquires multimodal information (vocal tone, facial expression, and language) from the patient via its mounted sensors and uses machine-learning algorithms, such as neural networks, to estimate the patient’s emotion.
(3) Latent Dirichlet Allocation (LDA) is applied to the patient’s speech to detect its topics. The number of topics must be determined in advance; here it is set to approximately 1000, and appropriate speech topics are obtained by clustering the topics produced by LDA according to their similarity.
(4) Based on the speech topic thus obtained and the semantic features of the speech content, sentences that agree with the speech topic and are similar in content to the speech are retrieved from a response database built in advance. The most appropriate of these is selected as the response.
(5) Based on the response sentence thus obtained and the emotion estimated in (2), artificial emotions are synthesized using an emotion expression dictionary. Expressions in the dictionary that are semantically similar to those in the response sentence are found and replaced with synonymous emotional expressions that accord with the patient’s emotions. For example, if the patient has expressed a feeling of “sadness”, the expressions in the response sentence are replaced with expressions that suit “sadness”.
(6) Lastly, the final sentence is generated. The response sentence synthesized with artificial emotions is corrected if it contains unnatural elements or does not match the context of the preceding sentence. In addition, sentences that invite a reply are added so that the conversation with the patient does not stop.
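The retrieval, emotion-synthesis, and sentence-generation steps above can be sketched as follows. This is a simplified illustration, not the system of Figure 6, and it makes several assumptions:

- the topic is given, as if by the LDA step;
- cosine similarity over word counts stands in for the semantic features;
- the response database, emotion expression dictionary, and follow-up prompts are tiny hypothetical tables;
- the correction of unnatural elements is omitted.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical response database built in advance: (topic, sentence).
RESPONSE_DB = [
    ("family", "I see. You miss your grandson."),
    ("family", "I see. Your daughter is busy with work."),
    ("food", "I see. Sushi is a special meal."),
]

# Hypothetical emotion expression dictionary: neutral expression -> variants.
EMOTION_EXPRESSIONS = {
    "I see.": {"sadness": "I'm sorry to hear that.", "joy": "How wonderful!"},
}

# Hypothetical prompts appended so that the conversation does not stop.
FOLLOW_UPS = {"family": "Would you tell me more about your family?",
              "food": "What do you like to eat?"}


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(speech: str, topic: str) -> Optional[str]:
    """Among sentences with the same topic, pick the one most similar to the speech."""
    query = bag_of_words(speech)
    candidates = [sentence for t, sentence in RESPONSE_DB if t == topic]
    if not candidates:
        return None
    return max(candidates, key=lambda s: cosine(query, bag_of_words(s)))


def add_emotion(sentence: str, emotion: str) -> str:
    """Replace neutral expressions with variants suited to the patient's emotion."""
    for neutral, variants in EMOTION_EXPRESSIONS.items():
        if neutral in sentence and emotion in variants:
            sentence = sentence.replace(neutral, variants[emotion])
    return sentence


def generate(speech: str, topic: str, emotion: str) -> str:
    sentence = retrieve(speech, topic) or "I see."
    sentence = add_emotion(sentence, emotion)
    follow_up = FOLLOW_UPS.get(topic, "Could you tell me more?")
    return f"{sentence} {follow_up}"
```

Suppose a sad patient says, “My grandson has not visited for a long time.” The sketch then returns “I’m sorry to hear that. You miss your grandson. Would you tell me more about your family?”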
This system requires speech act type estimation, topic estimation, concept extraction, and frame processing. Few studies have applied machine-learning-based interactive processing at a practical level in fields such as long-term care. A likely cause is the difficulty of responding to situations that do not occur in ordinary conversation, of acquiring utterances, and of predicting emotions in special dialog situations such as long-term care.
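For topic estimation, the LDA step described above can be illustrated with a compact collapsed Gibbs sampler. This is a teaching sketch, not the authors’ setup:

- the hyperparameters (`alpha`, `beta`, iteration count) are assumed values;
- it uses a handful of topics rather than the roughly 1000 mentioned above;
- the clustering of similar topics is not shown.

Production work would normally rely on an established library.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[Dict[str, int]]]:
    """Collapsed Gibbs sampling for LDA.

    Returns per-document topic proportions (theta) and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Give every word token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta).
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j].get(w, 0) + beta)
                           / (topic_total[j] + vocab_size * beta)
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for d, doc in enumerate(docs)]
    return theta, [dict(counts) for counts in topic_word]
```

Each row of `theta` is one utterance’s topic mixture. The speech topic could be taken as the highest-weight entry.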
To incorporate the tacit knowledge of nursing and long-term care into AI as explicit knowledge, we will record multimodal dialog data at nursing and long-term care sites. A multimodal nursing/long-term care corpus will be constructed by having skilled nurses label the data. Based on this corpus, we will develop a machine-learning method for predicting care labels from multimodal information. It is important to evaluate and tune the model with the goal of high prediction accuracy. Adapting current mainstream methods to dialog processing with older persons in long-term care may be an important point for future development. If the current mainstream methods are simply adapted so that the care recipient holds a natural dialog with the system, the following problems are expected:
- In a general dialog system, the next utterance cannot be made without a response from the other party, so in some situations the dialog may not continue.
- General knowledge and common sense obtained from dictionary data such as Wikipedia may differ from the knowledge and common sense of the care recipient.
- Because long-term care dialog is sometimes conducted on the basis of tacit knowledge, much of the relevant information cannot be seen in the collected corpus, which makes it difficult for the machine-learning model to respond.
- Depending on the care recipient’s situation, emotions may be difficult to predict even with multimodal information.
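Predicting care labels from multimodal information, as planned above, can be illustrated with the simplest possible classifier. In this sketch each interaction is a hypothetical numeric feature vector, for example vocal pitch variability, smile intensity, and heart-rate increase. The care labels are made up, and a nearest-centroid rule stands in for whatever model would actually be trained. `accuracy` shows how prediction accuracy on held-out data could be checked during tuning.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(features: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Average the multimodal feature vectors assigned to each care label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Vector) -> str:
    """Assign the care label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def accuracy(centroids: Dict[str, List[float]], features: List[Vector],
             labels: List[str]) -> float:
    """Fraction of held-out examples whose predicted label matches the nurse's label."""
    correct = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return correct / len(labels)
```

A nearest-centroid rule is used only because it is transparent. Any model evaluated this way would still face the corpus limitations listed above.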
We also believe it is necessary to collect and analyze an audio-visual corpus of caregiving dialogs in order to build a language model suited to caregiving dialog with older persons for use in speech recognition. Not only the language model but also the acoustic model must be tailored to the nursing care site and to the speech of older persons.
As HNRs learn to perform nursing functions such as ambulation support, vital sign measurement, medication administration, and infectious disease protocols, the role of nurses in care delivery will change.
Hamstra argued:
It is not yet obvious whether robotics can parallel these characteristics, as research on these topics is ongoing. It is also important to remember that, as of 2020, high-ability humanoid robots that can function much like a human being had not been developed. Such robots are still at the developmental stage and cannot yet be used as autonomous functional work robots. Therefore, intermediaries such as healthcare providers currently play critical roles in the transactive relations between robots and older persons.
‘Transactive’ is a term that focuses on the transactional nature of things. As an active process, it illuminates the main feature of human-to-human and human-to-intelligent-machine relationships, namely that they are always transactions. The term thus describes the relationship between HNRs and human persons. Figure 7 shows a transactive relationship among older persons, an occupational therapist acting as intermediary, and Pepper.
7. Prospects for the introduction of humanoid nursing partner robots in clinical nursing practice
Since the AI and robots used for nursing and long-term care are diverse, it is important to clarify what AI is, what robots are used for in nursing and long-term care, how to use them, and how to apply them to nursing care. Research on this topic is therefore important.
In addition, we will search for solutions in clinical nursing settings and clarify the performance and functions required of AI. From the perspective of nursing and medical care, it is important to establish academic disciplines that explore these questions in collaboration with AI and robot developers from faculties of engineering. For example, such work would consider how many and what kinds of nursing tasks a healthcare robot can perform. In the current situation, various robots without human shapes have already “invaded” the medical world. These include robots that improve efficiency and accuracy, such as surgical robots and robots that support dispensing operations, and robots that assist caregivers, for example with transfer and bathing.
In the future, one can imagine a robot that scans the course of blood vessels and performs injections using programmed injection technique. Such a robot might secure blood vessels for intravenous injection more safely than nurses and doctors. However, how can safety be ensured if the robot breaks down or goes out of control? Like a human practitioner, the robot must be able to judge whether to stop attempting a puncture.
Will humans take on the role of monitoring robots in the relationship between robots and humans (nurses)? Will nurses be given the added role of monitoring robots to keep patients safe? Alternatively, just as computer engineers have been stationed in hospitals since computers were introduced, will robot-dedicated engineers be stationed there as well?
The purpose of this chapter was to explore issues in developing the conversational dialog of HNRs for nursing, especially for long-term care, and to forecast the introduction of these robots into clinical practice. The major issue is HNR verbalization, including inappropriate intonation, voice range, and speech speed. These issues are obstacles to introducing HNRs into clinical practice. For robots to meet the demands and situations of nursing and healthcare, the conversation functions and performance of HNRs must be improved, including the ability to produce appropriate verbal and non-verbal expressions. To meet this challenge, collaboration between nursing researchers and AI and machine developers is recommended.
This work was partially supported by JSPS KAKENHI Grant Number JP17H01609. We would like to express our sincere appreciation to all the patients and participants who contributed to this article. With many thanks to Dr. Savina Schoenhofer for reviewing this chapter prior to its publication.
Conflict of interest
The authors have no conflicts of interest directly relevant to the content of this article.