VIGOR: A Versatile, Individualized and Generative ORchestrator to Motivate the Movement of the People with Limited Mobility

Physical inactivity is a major national concern, particularly among individuals with chronic conditions and/or disabilities. There is an urgent need to devise practical and innovative fitness methods, designed and grounded in physical, psychological, and social considerations, that will effectively promote physical fitness participation among individuals of all age groups with chronic health condition(s) and/or disabilities. This research is dedicated to developing a Versatile, Individualized, and Generative ORchestrator (VIGOR) to motivate the movement of people with limited mobility. Tai-Chi is a traditional mind–body wellness and healing art, and its clinical benefits have been well documented. This work presents a Tai-Chi-based VIGOR under development. Through Helping, Pushing, and Coaching (HPC) functions that follow Tai-Chi kinematics, the VIGOR system is designed to make engagement in physical activity an affordable, individually engaging, and enjoyable experience for individuals who live with limited mobility due to disease or injury. VIGOR consists of the following major modules: (1) seamless human-machine interaction based on the acquisition, transmission, and reconstruction of 4D data (XYZ plus somatosensory) using affordable I/O instruments such as Kinect, sensors, tactile actuators, and active orthoses/exoskeletons; (2) processing and normalization of kinetic data; (3) identification and grading of kinetics in real time; (4) adaptive virtual limb generation and its reconstruction in virtual reality (VR) or on an active orthosis/exoskeleton; and (5) individualized physical activity choreography (i.e., creative movement design). 
Aiming to develop a deep-learning-enabled rehabilitation and fitness modality by infusing domain knowledge (physical therapy, medical anthropology, psychology, electrical engineering, biomechanics, and athletic aesthetics) into deep neural networks, this work is transformative in that the technology can be applied to the broad research areas of intelligent systems, human-computer interaction, and cyber-physical human systems. The resulting VIGOR has significant potential as both a rehabilitative and a fitness modality and can be adapted to other movement modalities (e.g., yoga and balance exercise) and chronic medical conditions (e.g., fibromyalgia, multiple sclerosis, Parkinson's disease).


Motivation
Physical inactivity, particularly among aging adults and home-bound individuals with chronic conditions and/or disabilities, is a major national concern in the United States [1]. Regular physical activity, defined as 150 minutes of moderate physical activity per week [2], supports improved health and decreases the risk of obesity and chronic disease for people of all ages and abilities. Physical exercise also has important benefits for individuals with chronic health conditions such as arthritis [3]; depression [4,5]; stroke [6]; lower-limb disabilities [7]; fibromyalgia [8,9]; cardiopulmonary difficulties [10,11]; multiple sclerosis [12]; Parkinson's disease [13]; and vestibular disorder [14]. In addition to physical benefits, engagement in physical activity provides psychological benefits for these individuals [15,16]. Despite this evidence, less than half of all adults get the recommended amount of physical activity on a regular basis [17]. This issue became even more serious during the Coronavirus disease (COVID-19) pandemic [18]. The associated economic impact of physical inactivity is significant: annual health-care expenses are estimated at $860 billion for community-dwelling adults 50 years or older [2], with additional workforce impacts [19]. These costs are all the more striking given that 80 percent of chronic conditions can be prevented or managed with regular physical activity [2]. Therefore, there is an urgent need to develop practical, innovative exercise methods that engage individuals of all ages, including those with chronic health condition(s) and/or disabilities, increase regular physical activity levels, and translate into improved health with optimal functional ability and participation.
As noted above, typical physical activities may not always be feasible for individuals who suffer from disabilities or diseases, and may increase the risk of new and exacerbated chronic health conditions, compounded by advanced age. There is a critical need to tailor physical activity to an individual, based on their underlying capability, health risks, and movement goals. For example, different individuals may wish to strengthen different muscle groups, or have specific movement goals directed by a physical or occupational therapist.
In order to achieve those goals, we propose a Versatile, Individualized, and Generative ORchestrator (VIGOR) to motivate the movement of people, particularly those with limited mobility [21]. To Help, Push, and Coach (HPC) users with various chronic health conditions to participate in restorative physical activities in the most effective way, the VIGOR system is designed to adapt to each user, ensuring an individualized experience that accounts for the personal, environmental, and social/cultural characteristics of the user [22]. Figure 1 compares VIGOR with its competitors. The proposed VIGOR is unique in that it can provide a fully personalized user experience. Industry software products that use virtual technology to encourage engagement in physical activity [23][24][25] include SaeboVR (www.saebo.com/saebovr), Nintendo Wii, and Verapy Therapy VAST (vast.rehab). Similar software in academia includes OpenSim (opensim.stanford.edu) and QuaterNet (Facebook AI Research). Unlike those products, VIGOR integrates Tai-Chi, the traditional mind-body wellness and healing art [26,27], with a series of data-driven computing technologies that will provide customized restorative physical activities for individuals with a broad range of chronic conditions and functional abilities. Our premise is that a user-friendly movement HPC system that can be conveniently used in sitting or standing positions will empower individuals to increase their regular physical activity levels and thus improve their health, functional ability, and participation in activities of everyday life. In this way, VIGOR emerges as an innovative, individualized, and generative fitness modality that demonstrates the connection of data, systems, and people for potential clinical benefits [21,28].
In this research, we propose developing VIGOR within the context of Tai-Chi, a traditional mind-body wellness and healing art [26][27][28]. While our methods and framework can be applied to multiple exercise approaches, Tai-Chi is ideally suited to people with limited mobility, such as older adults and people with disabilities. Tai-Chi has documented benefits in improving balance as well as muscle strength, coordination, and endurance in multiple populations [26]. In addition, the low-impact nature of Tai-Chi is ideal for elderly individuals or groups with neuromusculoskeletal impairments. This exercise carries a low risk of musculoskeletal injury and joint damage while providing the many benefits of exercise.
While Tai-Chi is proven to have many health benefits, the underlying biomechanics of different choreography tailored to individual patient capabilities are difficult to identify. Knowing the "right" strategy for an individual from a kinematic trajectory alone is difficult without understanding underlying physiology. Biomechanical models can be used to determine the kinetics resulting in a desired kinematic trajectory [29][30][31][32], and then to coach the patient to activate the correct muscles to work toward their movement goals. Joint kinetics are more directly mapped to underlying muscular strength and capability compared to joint kinematics [32]. Thus, the incorporation of underlying biomechanics is critical for personalization of training sessions and mobility targets.

Rationale for the VIGOR system to address aging and chronic disability
Tai-Chi is characterized by low-impact, flowing, and circular movements [13,27]. The practice of these movements requires coordination and synchronization of a calm yet alert mind and a relaxed body [15,16,22]. It has enormous potential for improving physical and psychological functionality for users in both clinical and non-clinical settings by allowing flowing movements that offer body and mind benefits to users [28,33,34]. Enabled by deep learning technology, the proposed Tai-Chi-based VIGOR offers several unique advantages as an individualized, effective, sustainable, and restorative fitness modality for users with movement-based chronic health conditions. The integration of Tai-Chi with four-dimensional (4D: the sensory data include X-Y-Z plus a somatosensory signal [35,36]) virtual-reality technology is both innovative and feasible in that: (1) Complex human movement can be deconstructed into primitive components/modes, and deep learning methods [37] can be employed to accurately model the spatially and temporally dependent kinetic behavior as well as to reconstruct incomplete or distorted joint movement caused by chronic health condition(s) [38]; (2) 4D kinetic behavior can be captured and reconstructed through modern sensors, actuators, and VR/AR technologies to generate seamless human-machine interaction; (3) Despite its significant storage and computational complexity, real-time kinetic analytics is feasible on a cutting-edge big-data engine and high-performance computing platform.
Figure 1: Motivation and rationale for the proposed VIGOR system: a comparison between existing systems and VIGOR (online video [20]).

VIGOR's infrastructure
VIGOR aims to provide users with an intelligent, four-dimensional (4D), partial-control, virtual-reality, and active-orthosis-enabled generative modality. Here, partial control (e.g., a virtual limb) means that VIGOR can be driven by only part of the inputs; in other words, VIGOR can tolerate and compensate for missing input when one or more input channels are disabled. Figure 2 shows the infrastructure of the VIGOR system. A deep-learning-based virtual coach, trained on a Tai-Chi master's kinetic data, is the core module of VIGOR. By combining experience (obtained via deep learning) with related knowledge such as biomechanics and medical pathology, VIGOR measures a user's movements, evaluates their performance in comparison to the Tai-Chi master, and offers real-time visual and tactile feedback to the user. Far more than an on-site real-time Tai-Chi instructor, VIGOR also adapts the master's movements to accommodate a wide range of mobility restrictions and improvements over time. The kinetic data of the Tai-Chi master and users are captured by different sensors, such as Microsoft Kinect and somatosensory sensors [39]. The fusion, transmission, storage, retrieval, management, and analytics [40] of sensory data are computationally and storage intensive. In VIGOR, an edge-computing-enabled network is exploited to connect the user with the virtual coach server. An edge server is employed to store and process the large volume of sensory data in real time [41]. Integrated with TensorFlow, a deep learning library, VIGOR measures and predicts the kinetic behavior of its users.
The system also provides the user with a multi-fold and panoramic 4D experience that includes visual and somatosensory information as well as direct physical support. 3D reconstruction and visualization with Unity3D allow users to place themselves in a variety of simulated spaces, with a personalized virtual Tai-Chi coach walking them through Tai-Chi motions in a 3D world, supported by a soft-actuator-based wearable device.
VIGOR is developed following the "5S criteria": (1) Substantiation (or personalization): VIGOR provides users with personalized service according to their health conditions and clinical requirements; (2) Simplicity: even untrained or uneducated users can use VIGOR freely; (3) Skimpiness: only commodity hardware and software are used in VIGOR so that most people can afford it; (4) Scalability: VIGOR can satisfy the requirements of an increasing number of users; (5) Speed: VIGOR responds in real time to meet users' requirements.

Research objectives and function modules of VIGOR
The major objective of VIGOR is to develop a state-of-the-art deep learning system to help, push, and coach people, particularly those with mobility disabilities, so that they can engage in physical activities.
• For people who are not able to move due to aging, disability, or health issues, a Helper is needed to support their movement, virtually or physically. This is a network completion problem, which infers missing vertices (i.e., dysfunctional joints) and edges (i.e., dysfunctional muscles/bones). Section 4 addresses the solution to this problem.
• For people who are reluctant to move, a Pusher is needed to stimulate them through specific external audio/video/tactile stimuli (e.g., VR/AR, actuators). The reconstruction of physical stimuli is addressed in Subsection 2.2.
• For people who do not know how to move, a Coach is needed to recognize/score their motion and send them real-time feedback/instruction (Subsections 3.2 and 3.3), or to design individualized and optimized exercise according to their health condition or medical requirements. These two problems are motion recognition (Section 3) and motion generation (Section 5), respectively.
However, purely data-driven machine learning approaches often ignore the fundamental laws of biomechanics and the clinical regulations for human motion, and thus may result in ill-posed problems. Additionally, deeper and wider deep neural networks (DNNs) often require large sets of labeled data for effective training and suffer from extremely high computational complexity, which prevents them from being deployed in real-time systems. As a result, there is a need to incorporate domain knowledge into DNNs [42,43]. As one of the major contributions of this project, domain knowledge will be infused into DNNs through data augmentation, customized loss functions, or knowledge blocks embedded into the NN as independent modules (e.g., a dynamics-guided discriminator in the motion choreography module).
Enabled by deep neural networks and multimodal human-machine-interaction techniques, the VIGOR system consists of the following function modules:
• Real-time 4D human-machine interaction based on robust data acquisition, transmission, and reconstruction methods (Section 2). It is challenging to integrate, represent, and analyze heterogeneous 4D temporal data in proper data formats that work across various affordable hardware instruments. In this task, a uniform data format characterizing human kinetics across heterogeneous hardware platforms is studied. In addition, to facilitate the interaction between the user and VIGOR, two-way communications are investigated.
• Identification of a user's kinetic movement (Section 3). To help, push, and coach (HPC) users (including people with mobility disabilities) in real time, VIGOR needs to identify a user's kinetic behavior and respond with prompt instructions. Major research challenges are (1) the normalization of the kinetics of users (including people with limited mobility), (2) the formulation of the Tai-Chi movement philosophy using neural networks, and (3) metrics for movement grading. The technical contributions of VIGOR include: (1) normalizing sensory data spatially, temporally, and kinetically, and removing occlusion using spherical interpolation and Kalman filtering algorithms; (2) deriving reference Tai-Chi kinetic patterns using temporal neural networks such as long short-term memory (LSTM); (3) grading users' kinetic behavior using entropy; and (4) enriching kinetic data using inverse dynamics theory.
• Adaptive virtual limb generation (Section 4). To motivate a user who has had a limb amputated to move, VIGOR generates a virtual limb that gives the user the pleasurable sensation that the limb is still there. A major challenge is generating adaptive motion of the virtual limb based on the observed kinetic behavior of functional body parts. In this task, deep neural network regression is designed for real-time virtual limb generation, and a time-series prediction model [44] is then used to improve the consistency of the generated kinetic sequence. A hierarchical visible autoencoder is developed and evaluated for adaptive virtual limb generation according to the kinetic behavior of functional body parts, as measured by heterogeneous kinetic sensors. The virtual limbs can be reconstructed on VR/AR platforms and active orthoses [45].
• Creating individualized movement choreography (Section 5). A unique feature of VIGOR is its ability to create customized movement choreography for individual users based on their observed health conditions. One of the most challenging issues in deep-learning-enabled choreography is how to balance training reliability against the creativity of the neural network. Considering the complex coordination of body actions in human motion, visible deep neural networks that integrate biomechanics into DNNs are developed to generate Tai-Chi choreography. Specifically, knowledge-guided neural network architectures based on LSTM, generative adversarial networks (GANs) [46], and their combinations with multiple data modalities are designed to create customized movement choreography for individual users based on their health conditions and clinical rehabilitation requirements. New training methods based on the polynomial-based Hessian-free Newton-Raphson optimizer [47] are also developed.
Each research objective along with the specific challenges and tasks will be described in more detail in Sections 2-5 individually.

Real-time 4D human-machine interaction
The challenge of Objective 1 is to provide a real-time (prompt HPC feedback) and scalable (multi-user) human-machine interaction environment based on affordable hardware instruments with heterogeneous modalities. To address this challenge, real-time 4D data acquisition and two-way communication are investigated. Figure 3 shows the basic input and output equipment of VIGOR. A Microsoft Kinect and a foot pressure sensor are used as input equipment to acquire kinetic data (or 4D sensory data) of a VIGOR user. Virtual reality goggles, such as the Oculus Rift or HTC Vive, tactile actuators, and active orthoses are used as output equipment that work together to deliver 4D feedback to the user.

Acquisition and processing of kinematic data
The Microsoft Kinect collects the kinematic data of the Tai-Chi master (for training purposes) and the user. Through Kinect, we can obtain each joint's transient position $\langle x, y, z\rangle_k^t$ and the corresponding quaternion rotation [48] $q_k^t = \left(\cos\frac{\theta}{2}, \vec{v}\sin\frac{\theta}{2}\right)$, where $\theta$ is the rotation angle around the unit axis $\vec{v}$, $t$ is the time, and $k$ is the joint identifier. Quaternions [48] represent the rotation of a rigid body in 3D space using four parameters.
Quaternions are superior to many traditional rotation formulations because they completely avoid gimbal lock [49]. In VIGOR, quaternions are used for 4D reconstruction on the Unity3D platform and for acquisition of the kinetic signal. On the other hand, because a quaternion is specified with reference to an arbitrary axis vector, it is not a good choice for rotation recognition. Therefore, for gesture recognition VIGOR adopts Euler angles $\langle \alpha, \beta, \gamma\rangle$, which represent the angles of rotation around the Z, X, and Y axes, respectively (denoted as $\langle$yaw, pitch, roll$\rangle$ in some literature).
Besides position and rotation, each joint record includes its tracking status (0: invisible; 1: inferred; 2: observable) and, potentially, forces $f_k^t$ and moments. Tracking status indicates whether or not the joint is observable by the sensor. The forces and moments are derived by inverse dynamics analysis. Due to measurement error or unavoidable occlusion, a joint is not always observable or trackable by the kinetic sensor. Spherical linear interpolation (SLERP) [50] and Kalman filtering techniques (discussed in Section 3.1) are employed to compensate for the missing data. As illustrated in our preliminary online video [20], SLERP can effectively handle short-term missed-tracking joints (namely, tracking status = 0 or 1).
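The quaternion convention and SLERP gap filling described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not VIGOR's implementation: quaternions are (w, x, y, z) tuples, frames with tracking status 0 or 1 are filled by interpolating between the nearest observed frames (status 2), and `max_gap`, the hypothetical threshold separating short-term from long-term occlusion, is not a value given in the text.

```python
import math
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)


def quat_from_axis_angle(axis: Sequence[float], theta: float) -> Quat:
    """Unit quaternion (cos(theta/2), v sin(theta/2)) for a rotation by theta about axis v."""
    vx, vy, vz = axis
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    s = math.sin(theta / 2.0) / norm  # dividing by the norm also normalizes the axis
    return (math.cos(theta / 2.0), vx * s, vy * s, vz * s)


def slerp(q0: Quat, q1: Quat, u: float) -> Quat:
    """Spherical linear interpolation between unit quaternions, u in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip q1 to follow the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation avoids dividing by ~0
        q = [a + u * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    omega = math.acos(dot)
    s0 = math.sin((1.0 - u) * omega) / math.sin(omega)
    s1 = math.sin(u * omega) / math.sin(omega)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def fill_short_gaps(track: List[Quat], status: List[int], max_gap: int = 5) -> List[Quat]:
    """Fill frames whose tracking status is 0 or 1 by SLERP between the nearest
    observed frames (status 2). Gaps longer than max_gap, or gaps touching either
    end of the track, are left unchanged for a longer-term method such as a Kalman filter."""
    out = list(track)
    n = len(track)
    t = 0
    while t < n:
        if status[t] == 2:
            t += 1
            continue
        start = t
        while t < n and status[t] != 2:
            t += 1
        end = t  # first observed frame after the gap (or n)
        if start == 0 or end == n or end - start > max_gap:
            continue
        q_prev, q_next = track[start - 1], track[end]
        span = end - start + 1
        for k in range(start, end):
            out[k] = slerp(q_prev, q_next, (k - start + 1) / span)
    return out
```

For example, a single missing frame between the identity rotation and a 90° turn about Z is filled with the 45° rotation about Z.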

Acquisition of tactile data
Besides Kinect, other acquisition instruments such as accelerometers, orientation sensors, and strain gauges [39] are also considered for the VIGOR system. As indicated above, a foot pressure sensor is used to obtain the ground reaction force $F^t$ for inverse dynamics analysis. Furthermore, electromyography (EMG) [39] is selectively employed to evaluate and record the electrical activity produced by skeletal muscles. The EMG signal is characterized by a frequency range from several hertz to over 1 kHz and by amplitudes ranging from fractions of a microvolt to a few thousand microvolts. Electromyographic signals can be analyzed to detect muscle activation levels or to analyze the biomechanics of users' movement. To acquire high-quality EMG signals from a localized muscle region, the following practices can be applied: identification of each user's localized muscle region; noise reduction and grounding practices (to eliminate extraneous electrical noise); electrode site preparation and placement (to minimize the detection of irrelevant bioelectrical signals); and appropriate differential signal preamplification and preliminary signal conditioning (to further enhance the signal-to-noise ratio). EMG signals can also be classified to detect limb movements. Our active/powered orthosis system, which supports users' movement, has EMG and inertial measurement unit (IMU) sensors. These sensors monitor body movement and muscle activity and send the measurement data to the server.
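Detecting muscle activation from EMG is commonly done by computing a moving root-mean-square (RMS) envelope and comparing it with a threshold derived from a resting baseline. The sketch below illustrates that common approach, not VIGOR's EMG pipeline; the window length, the baseline length, and the multiplier `k = 3` are assumptions, and the band-pass filtering a real EMG front end would apply first is omitted.

```python
import math
from typing import List


def emg_rms_envelope(signal: List[float], window: int) -> List[float]:
    """Moving RMS envelope of a raw EMG signal after removing its DC offset.
    Uses a centered window of `window` samples, truncated at the signal edges."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    half = window // 2
    envelope = []
    for i in range(len(centered)):
        seg = centered[max(0, i - half): min(len(centered), i + half + 1)]
        envelope.append(math.sqrt(sum(v * v for v in seg) / len(seg)))
    return envelope


def detect_activation(envelope: List[float], baseline_samples: int, k: float = 3.0) -> List[bool]:
    """Flag samples whose envelope exceeds mean + k * std of a resting baseline
    (the first `baseline_samples` samples are assumed to be recorded at rest)."""
    base = envelope[:baseline_samples]
    mu = sum(base) / len(base)
    sd = math.sqrt(sum((b - mu) ** 2 for b in base) / len(base))
    threshold = mu + k * sd
    return [e > threshold for e in envelope]
```

The resulting on/off mask could then be aligned with the Kinect frames to check whether the intended muscle group is active during a movement.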

Reconstruction of 4D data
4D kinetic feedback/instruction is reconstructed through virtual reality, tactile actuators, and the motor system that drives the active orthosis: (1) VR/AR facilities, which can visualize the kinetics of the human body in quaternion format [48,49] (as accepted by the Unity3D VR/AR SDK); (2) tactile actuators, through which VIGOR can directly guide users with somatosensory feedback; candidates include eccentric rotating mass (ERM) motors, linear resonant actuators (LRA), piezo actuators, and electro-active polymers (EAP), which offer high-fidelity sensations and excellent durability; and (3) active orthoses [51], which provide users with direct physical support through functional electrical stimulation (FES) [51] or robotic exoskeletons [45].

Real-time, two-way communication
Two-way communication is of key importance in the proposed system, since information needs to be exchanged in real time. The challenges of the communication protocol for the proposed VIGOR include: (1) Real-time communication: information in the VIGOR system needs to be conveyed in real time. If there is a significant delay in the communications, synchronization between the Tai-Chi master and the user will be lost, and the user will experience a disturbed rhythm. (2) High communication throughput: when there are many users, all the corresponding multimodal sensory data and feedback information need to be conveyed over the network, incurring a substantial communication bandwidth requirement. (3) Two-way communications: the communications are between the virtual Tai-Chi master and users with mutual interactions; therefore, designing the two directions as separate one-way links could be sub-optimal. (4) Dynamics awareness: the communications may be optimized jointly with the physical dynamics (namely, the motions) of the virtual Tai-Chi master and users.
To address the above challenges, VIGOR can first be modeled as a cyber-physical system (CPS) [53,54], and the bandwidth required for controlling the physical dynamics can then be analyzed. Finally, the detailed communication protocol can be designed and evaluated with the whole system.

Deployment of VIGOR on affordable hardware using edge computing
Edge computing enables real-time knowledge generation and application to occur at the source of the data, close to the user's device [55,56], which makes it particularly suitable for the proposed latency-sensitive system. An edge server can serve multiple users through interaction with their devices. There are communication and computing trade-offs between the edge server and each user device: data can either be processed locally at the user device or be transmitted to and processed at the edge server. Different strategies incur different communication costs, resulting in different delay performance. To provide the best quality of experience for users, the following tasks are involved: (1) Identification and modularization of computing tasks: the computing tasks of data preprocessing, kinetic movement recognition, and individualization of movement choreography need to be identified, and the corresponding computing overheads (CPU cycles, memory) need to be determined.
(2) Design, prototyping, and enhancement of offloading schemes: based on the results of bandwidth and delay analysis as well as the delay performance requirements, computation offloading schemes need to be developed to determine which computing tasks should be performed locally at the user device and which should be offloaded to the edge server. As shown in Figure 4, an illustrative concept demonstration of edge-computing-enabled VIGOR is given in our online video [52].
Figure 4: Edge-computing-enabled VIGOR deployed on commodity hardware (demo in online video [52]).
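A common first-order model for such an offloading decision compares the local execution time (cycles divided by device clock rate) with the offloaded time (upload, plus edge execution, plus download). The sketch below assumes a single user, no queueing at the edge server, and fixed link rates; the class and function names, task labels, cycle counts, and rates are hypothetical placeholders, not measured VIGOR values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ComputeTask:
    name: str           # hypothetical task label, e.g. "kinetic_recognition"
    cycles: float       # CPU cycles needed per invocation
    input_bits: float   # data uploaded if the task is offloaded
    output_bits: float  # result downloaded after offloading


def local_delay(task: ComputeTask, f_local_hz: float) -> float:
    """Seconds to run the task on the user device."""
    return task.cycles / f_local_hz


def offload_delay(task: ComputeTask, f_edge_hz: float,
                  uplink_bps: float, downlink_bps: float) -> float:
    """Seconds to upload the input, run on the edge server, and download the result."""
    return (task.input_bits / uplink_bps
            + task.cycles / f_edge_hz
            + task.output_bits / downlink_bps)


def choose_placement(task: ComputeTask, f_local_hz: float, f_edge_hz: float,
                     uplink_bps: float, downlink_bps: float) -> Tuple[str, float]:
    """Return ("local" | "edge", expected delay in seconds), picking the faster option."""
    t_local = local_delay(task, f_local_hz)
    t_edge = offload_delay(task, f_edge_hz, uplink_bps, downlink_bps)
    return ("edge", t_edge) if t_edge < t_local else ("local", t_local)
```

Under these assumptions a compute-heavy recognition task tends to be offloaded, while a light preprocessing step whose upload time dominates stays on the device; a real scheme would also account for edge-server load across multiple users.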

Identification and scoring of user's kinetic movement
To help, push, and coach (HPC) users with movement disabilities in real time, VIGOR features: (1) an enriched dataset obtained by introducing kinetic data (specified as time series [57][58][59]), derived from the measured kinematic data, into the neural network; and (2) compensation for any missing kinetic data caused by users' disabilities. Identification of a user's kinetic behavior during movement mainly involves the following research tasks:

Preprocessing pipeline for kinetic movement identification
Data preprocessing operations play an indispensable role in VIGOR because:
• Input data are heterogeneous. For example, different users have different body sizes; sensors may have various viewing angles; users may not always be located in a fixed position; and two time series may not be synchronized. As a result, scaling, rotation, translation, and dynamic time warping (DTW) are needed to normalize the original input data.
• The input data set may be incomplete. For example, occlusion inevitably leads to missing data; musculoskeletal forces and moments exerted over the joints by muscles cannot be directly obtained from the sensors [60]; and some input channels are not enabled for users with mobility-based chronic conditions (i.e., partial control). In the implementation of VIGOR, Kalman filtering, inverse dynamics, and time-series prediction are employed to handle the incomplete data [35,61].
• The measurement-induced noise is significant. In the preprocessed data, $\langle x, y, z\rangle$ denotes a joint's position; $f^t$ denotes a joint's applied force, derived from inverse dynamics; and $\langle \alpha, \beta, \gamma\rangle$ denotes a joint's rotation as Euler angles in a normalized Joint Coordinate System (JCS). The main implementation techniques of the preprocessing pipeline include data fusion, inverse dynamics analysis, spatial normalization, Kalman filtering, and reconstruction of disabled input channels. The kinetic data are stored in JSON format.
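Of the normalization steps listed above, dynamic time warping is the least self-explanatory. The following is a minimal textbook O(nm) DTW in pure Python, assuming each frame is a flat vector of already spatially normalized joint coordinates and using Euclidean distance as the frame cost; it is an illustration rather than VIGOR's implementation, and a production version would likely add a warping-window constraint for speed.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw(a: List[Frame], b: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping. Returns (total alignment cost, warping path),
    where the path is a list of (index_in_a, index_in_b) pairs from start to end."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from (n, m) to (0, 0), preferring the diagonal step on ties
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return D[n][m], path
```

The warping path can be used to resample the user's sequence onto the reference (Tai-Chi master) timeline before frame-by-frame comparison.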

Formulating musculoskeletal kinetic features
Inverse dynamics analysis (IDA), which is derived from the Newton-Euler equations [60,64-68], aims to calculate unknown kinetic information (the net joint forces and moments) from measured kinematic information (e.g., positions, velocities, and accelerations of joints) and measured kinetic information (e.g., ground reaction forces). As illustrated in Figure 5, given joint locations $\langle x, y, z\rangle_i$, where $i$ denotes the identity of a joint, and the ground reaction force $F$, the joint force $f_i$ and other musculoskeletal kinetic features can be computed via IDA.
As illustrated in Figure 5, VIGOR employs inverse dynamics to compute internal joint forces and moments with given ground reaction forces. Inverse dynamics is implemented by dividing the human body into multiple connected rigid bodies [69,70], which correspond to relevant anatomical segments such as the thigh, calf, foot, arm, etc. The model's anthropometric properties (e.g., the mass and moment of inertia) are derived from statistical analysis. In addition, it is assumed that each joint is rotationally frictionless. The proposed methods in Figure 5 can be customized to investigate the biomechanical response of human motion by considering different health issues such as cerebral palsy, poliomyelitis, spinal cord injury, and muscular dystrophy [67].
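For a single rigid segment, the Newton-Euler step behind this analysis can be sketched in 2D (sagittal plane). The sketch assumes a planar model, a known center-of-mass acceleration and angular acceleration (e.g., from differentiated Kinect trajectories), and segment mass and inertia taken from anthropometric tables; the function and parameter names are hypothetical, and this is not VIGOR's 3D model.

```python
from dataclasses import dataclass
from typing import Tuple

Vec2 = Tuple[float, float]


def cross2(r: Vec2, f: Vec2) -> float:
    """z-component of the planar cross product r x f."""
    return r[0] * f[1] - r[1] * f[0]


@dataclass
class Segment:
    mass: float     # kg, e.g. from anthropometric tables
    inertia: float  # kg*m^2 about the center of mass (out-of-plane axis)


def proximal_joint_load(seg: Segment, a_com: Vec2, alpha: float,
                        f_distal: Vec2, m_distal: float,
                        r_prox: Vec2, r_dist: Vec2,
                        g: Vec2 = (0.0, -9.81)) -> Tuple[Vec2, float]:
    """Planar Newton-Euler step for one rigid segment.

    f_distal, m_distal: force and moment acting on the segment at its distal end
    r_prox, r_dist: vectors from the center of mass to the proximal/distal points
    Returns the force and moment acting on the segment at its proximal joint."""
    # Newton: F_prox + F_dist + m*g = m*a  (joints assumed frictionless)
    f_prox = (seg.mass * a_com[0] - f_distal[0] - seg.mass * g[0],
              seg.mass * a_com[1] - f_distal[1] - seg.mass * g[1])
    # Euler about the center of mass: M_prox + M_dist + r_p x F_prox + r_d x F_dist = I*alpha
    m_prox = (seg.inertia * alpha - m_distal
              - cross2(r_prox, f_prox) - cross2(r_dist, f_distal))
    return f_prox, m_prox
```

Applied to the foot with the measured ground reaction force as its distal load, this yields the ankle force and moment; negating that result and passing it as the distal load of the shank continues the recursion up the kinematic chain.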

Spatial normalization
As addressed in Section 2, we can acquire joint positions and rotations, which are [...] according to joint rotation [49], and polishing the kinetic curve using a Savitzky-Golay filter. Our preliminary experimental results demonstrate that these normalization techniques can greatly improve data quality (less noise and smoother kinetic curves) and thus achieve higher recognition accuracy [36,41,61].
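The translation and scaling part of spatial normalization, together with Savitzky-Golay smoothing, can be sketched as below. The root and reference joint names are hypothetical (loosely modelled on Kinect joint names), the rotation-normalization step is omitted, and the 5-point quadratic Savitzky-Golay kernel leaves the two samples at each end unsmoothed; `scipy.signal.savgol_filter` is a library alternative that also handles the edges.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def normalize_frame(joints: Dict[str, Vec3], root: str, ref_a: str, ref_b: str,
                    ref_length: float = 1.0) -> Dict[str, Vec3]:
    """Translate so the `root` joint sits at the origin, then scale so the distance
    between joints `ref_a` and `ref_b` equals `ref_length` (removes body-size effects)."""
    ox, oy, oz = joints[root]
    scale = ref_length / math.dist(joints[ref_a], joints[ref_b])
    return {name: ((x - ox) * scale, (y - oy) * scale, (z - oz) * scale)
            for name, (x, y, z) in joints.items()}


# Savitzky-Golay convolution coefficients: window 5, polynomial order 2, normalizer 35
SG5_QUAD = (-3.0, 12.0, 17.0, 12.0, -3.0)


def savgol5(series: List[float]) -> List[float]:
    """Smooth one coordinate's trajectory; the first and last two samples are kept as-is."""
    out = list(series)
    for i in range(2, len(series) - 2):
        out[i] = sum(c * series[i + j - 2] for j, c in enumerate(SG5_QUAD)) / 35.0
    return out
```

Because a Savitzky-Golay filter fits a local polynomial, it preserves quadratic trends (such as smooth accelerations) exactly while attenuating frame-to-frame jitter.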

Recovering occlusion-induced missing data
During sensory data acquisition, unavoidable occlusion may introduce missing data or lost tracking. VIGOR employs spherical linear interpolation (SLERP) to fix issues caused by short-term occlusion [35] and employs a Kalman filter [72][73][74] to recover missing information (including both position and rotation) caused by long-term occlusion. A preliminary comparison between raw and preprocessed physical rehabilitation kinematic data is available in our online videos [62,63].
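Kalman-based recovery for one joint coordinate can be sketched as follows, assuming a constant-velocity motion model at 30 frames per second and hand-picked noise levels `q` and `r`; these values and the per-coordinate treatment are assumptions rather than VIGOR's settings. Occluded frames are marked `None` and receive only the prediction step. Rotations would need a quaternion-aware filter rather than this scalar one.

```python
from typing import List, Optional


def kalman_fill(zs: List[Optional[float]], dt: float = 1.0 / 30.0,
                q: float = 1.0, r: float = 1e-4) -> List[float]:
    """Constant-velocity Kalman filter for one joint coordinate.

    zs: measurement per frame, None where the joint was occluded.
    q: white-acceleration noise intensity; r: measurement noise variance.
    Returns a filtered position for every frame (pure predictions across gaps)."""
    if zs[0] is None:
        raise ValueError("the first frame must be observed")
    p, v = zs[0], 0.0               # state: position, velocity
    P00, P01, P11 = r, 0.0, 1.0     # symmetric 2x2 state covariance
    out = [p]
    for z in zs[1:]:
        # Predict with F = [[1, dt], [0, 1]] and white-acceleration process noise
        p = p + dt * v
        P00, P01, P11 = (P00 + 2 * dt * P01 + dt * dt * P11 + q * dt ** 4 / 4,
                         P01 + dt * P11 + q * dt ** 3 / 2,
                         P11 + q * dt * dt)
        if z is not None:
            # Update with H = [1, 0] (only position is measured)
            s = P00 + r
            k0, k1 = P00 / s, P01 / s
            innovation = z - p
            p, v = p + k0 * innovation, v + k1 * innovation
            P00, P01, P11 = (1 - k0) * P00, (1 - k0) * P01, P11 - k1 * P01
        out.append(p)
    return out
```

During a gap the estimate simply continues along the last estimated velocity, which is reasonable for the slow, flowing movements of Tai-Chi but degrades for very long occlusions.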

Normalization of the kinetics of users with limited mobility
In order to recognize the kinetic movement of users with disabilities, VIGOR normalizes their kinetic data by compensating for the missing data incurred by disabled input channels: when several input channels are disabled, the VIGOR model can reconstruct the void input channels by taking advantage of the correlation among all inputs. Compensation normalizes the input data so that VIGOR can achieve a higher recognition rate; its psychological and physiological benefits to users are also under investigation. Figure 6 demonstrates the application of deep neural networks [37,75] to compensating for the missing channels introduced by limited mobility. In our preliminary work, multilayer perceptron (MLP), temporal convolutional neural network (tCNN) [46], and autoencoder methods were employed to reconstruct disabled legs, and the resulting recognition accuracy improved [21].

Entropy-oriented scoring of human motion
The proposed research employs entropy [76] to grade a user's movement behavior, which is defined as a time series of joint kinetic features such as positions and rotations. The distance/dissimilarity between two time series can be measured in the time domain or the frequency domain [58,59]. In the time domain, Approximate Entropy (ApEn) and Sample Entropy (SampEn) [76,77] can be employed to quantify the regularity and predictability of the normalized Euclidean distance between the time series of the user's data and the reference data.
As preliminary work, Figure 7 compares the entropy values of an advanced Tai-Chi user and a beginner. The whole Tai-Chi set is divided into multiple subsequences (or clips), each consisting of 25 to 100 frames, and the comparison is made clip by clip. In Figure 7, each subsequence consists of 25 frames. It is observed that the advanced Tai-Chi user has smaller entropy than the beginner. Besides the overall entropy of a user, VIGOR also provides the entropy of each joint so that the virtual Tai-Chi coach can provide accurate instruction to users.
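Clip-wise SampEn scoring can be sketched as follows, using the standard definition SampEn(m, r) = -ln(A/B), where B and A count pairs of length-m and length-(m+1) templates within Chebyshev distance r, excluding self-matches. The sketch assumes frames are flat coordinate vectors already normalized against the reference, the common defaults m = 2 and r = 0.2 times the clip's standard deviation, and non-overlapping 25-frame clips as in Figure 7; the function names are hypothetical and this is not VIGOR's scoring code.

```python
import math
import statistics
from typing import List, Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """SampEn(m, r) = -ln(A / B); returns inf when no matches are found."""
    n = len(x)

    def matches(length: int) -> int:
        # Same number of templates (n - m) for both lengths, per the standard definition
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined for this clip
    return -math.log(a / b)


def clip_entropies(user: List[Sequence[float]], ref: List[Sequence[float]],
                   clip_len: int = 25, m: int = 2, r_factor: float = 0.2) -> List[float]:
    """Per-clip SampEn of the frame-wise Euclidean distance between user and reference."""
    d = [math.dist(u, v) for u, v in zip(user, ref)]
    scores = []
    for start in range(0, len(d) - clip_len + 1, clip_len):
        clip = d[start:start + clip_len]
        r = r_factor * statistics.pstdev(clip)
        scores.append(sample_entropy(clip, m, r))
    return scores
```

Lower values indicate that the user's deviation from the reference evolves more regularly, consistent with the observation that advanced practitioners score lower than beginners. Applying the same function to each joint's distance series gives the per-joint entropies used for targeted instruction.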
Entropy or cross-entropy analysis can also be performed on the frequency-domain representation of the time series, obtained by the discrete Fourier transform (DFT) or the discrete wavelet transform (DWT) [58,59]. A hybrid metric that combines time-domain and frequency-domain information may be considered as well.
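One frequency-domain option is spectral entropy: the Shannon entropy of the normalized DFT power spectrum. The sketch below is a generic version of that measure, assumed here for illustration rather than taken from VIGOR; a hybrid metric could, for example, combine it with a time-domain entropy score.

```python
import numpy as np


def spectral_entropy(x, normalize=True):
    """Shannon entropy of the normalized power spectrum of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0                         # constant signal: no spectral spread
    p = power / total
    p = p[p > 0]                           # 0 * log(0) is taken as 0
    h = -np.sum(p * np.log(p))
    if normalize:
        h /= np.log(len(power))            # scale to [0, 1]
    return float(h)


n = np.arange(256)
tone = np.sin(2 * np.pi * 5 * n / 256)     # energy concentrated in one frequency bin
noise = np.random.default_rng(0).normal(size=256)  # energy spread across all bins
```

A movement dominated by a single rhythm concentrates its power in few bins and scores near 0, while jittery motion spreads power across the spectrum and scores near 1.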
In this work, the recognition accuracy of the aforementioned classifiers was determined on three benchmark datasets: Dataset I, the UTD Multimodal Human Action Dataset (UTD MHAD [85]); Dataset II, UTKinect-Action3D [86]; and Dataset III, Tai-Chi Yang-Style 24 movement [20,41] (an in-house Kinect skeletal dataset collected for Tai-Chi training). The experimental results showed that SVM and LSTM-RNN outperform the other classifiers; in particular, LSTM-RNN achieves superior recognition accuracy when training data are limited (e.g., 200 training samples). However, LSTM-RNN suffers from slow runtime performance [35]. Scalable algorithms for temporal neural networks such as LSTM-RNN and temporal convolutional networks (tCNN) need to be developed [46].

In this work, a musculoskeletal biomechanics guided loss function is used to formulate the objective of the kinetics classifier:

$$L(\theta) = \ell\big(y, f(X, \theta)\big) + \varrho\, R(\theta) \tag{1}$$

where $y$ is the pre-determined movement identity; $f(X, \theta)$ is the predicted movement identity of the kinetics sequence $X = \{\langle x_t^k, y_t^k, z_t^k, \phi_t^k \rangle\}_{t=t_0}^{t_m}$ (as defined in Figure 5, $t$ is the time step ranging from $t_0$ through $t_m$, and $k$ is the joint identity); $\ell(\cdot,\cdot)$ is the classification loss; $\theta \in \mathbb{R}^n$ denotes the parameters (weights and biases) of the neural network; $R(\theta): \mathbb{R}^n \to \mathbb{R}$ is the regularizer, whose importance is controlled by the regularization strength $\varrho \in \mathbb{R}$; and $L(\theta): \mathbb{R}^n \to \mathbb{R}$ is the regularized loss. The objective is minimized with a batch optimizer.
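As a minimal sketch of the regularized classifier loss described above (a data term plus ϱR(θ)) minimized by a batch optimizer, the Python below assumes a softmax classifier on flattened kinetic features, cross-entropy as the data term, an L2 penalty as R(θ), and full-batch gradient descent. These choices and the synthetic three-class data are illustrative assumptions that the text does not specify.

```python
import numpy as np


def regularized_loss(W, b, X, y, rho):
    """Cross-entropy data term plus rho * R(theta), with R = 0.5 * ||W||^2 (assumed)."""
    logits = X @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    data_term = -log_probs[np.arange(len(y)), y].mean()
    return data_term + rho * 0.5 * np.sum(W ** 2), log_probs


def batch_train(X, y, n_classes, rho=1e-3, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent: every update uses the whole training set."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (X.shape[1], n_classes))
    b = np.zeros(n_classes)
    history = []
    for _ in range(epochs):
        loss, log_probs = regularized_loss(W, b, X, y, rho)
        history.append(float(loss))
        grad_logits = np.exp(log_probs)              # softmax probabilities
        grad_logits[np.arange(len(y)), y] -= 1.0     # minus one-hot targets
        grad_logits /= len(y)
        W -= lr * (X.T @ grad_logits + rho * W)      # data gradient + regularizer gradient
        b -= lr * grad_logits.sum(axis=0)
    return W, b, history


# Synthetic stand-in for flattened kinetic sequences of three movement identities.
rng = np.random.default_rng(0)
class_means = 4.0 * np.eye(3, 10)
labels = rng.integers(0, 3, 300)
features = class_means[labels] + rng.normal(size=(300, 10))
W, b, history = batch_train(features, labels, n_classes=3)
accuracy = np.mean(np.argmax(features @ W + b, axis=1) == labels)
```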

Reconstruction of 4D instruction/feedback for users
VIGOR can also be regarded as a real-time coaching system that helps users improve their physical rehabilitation movements for optimal clinical effect. Based on the grading and recognition results discussed above, VTCS generates real-time 4D instructions or guidance for users on a virtual reality (VR) or augmented reality (AR) platform, as shown in the online videos from our preliminary work [20,87,88].

Adaptive virtual limb generation
To relieve the physical and psychological suffering of people with limited mobility, VIGOR develops an adaptive (versatile across various types of disability) and full-body-driven (all measurable body parts are used to formulate virtual limbs) virtual limb generation system. The related technical contributions include: (1) according to a specified kinetic script (e.g., dancing, running) and the user's physical condition, a hierarchical network is extracted from the human musculoskeletal network, which is composed of body components (e.g., muscles, bones, and joints) that are biomechanically, functionally, or neurally correlated with each other and exhibit mostly non-divergent kinetic behaviors; and (2) the generated limb can be reconstructed on VR/AR systems, tactile actuator systems, and motor systems.

Pipeline of adaptive virtual limb generation
The proposed work employs deep learning techniques such as autoencoders to generate virtual limbs [89] from the observed kinetic behavior of other body parts, based on the following hypotheses: (1) the human body consists of multiple components, such as muscles, bones, and joints, that are correlated with each other mechanically, neurally, and/or functionally; and (2) deep learning techniques such as autoencoders can capture the kinetic patterns of human movement. Figure 8(a) shows the flowchart of adaptive virtual limb generation, which consists of the following critical steps: (1) formulating the human musculoskeletal network [91] according to the functional, mechanical, and neural correlations among body components (muscles, joints, and bones); (2) deriving a hierarchical network (organized as a forest) from the human musculoskeletal network according to the user's physical status, where the virtual limbs form the leaves of a hierarchical tree; (3) building a visible autoencoder neural network from the hierarchical network so that the kinetic behavior of the virtual limbs can be constructed from the kinetic behavior of the user's functional body parts, as measured by heterogeneous sensors; (4) training this visible autoencoder on a specific movement script such as walking, jogging, dancing, or any other physical activity; and (5) rendering the kinematic behavior of the virtual limbs through VR/AR, tactile actuators, and active orthoses, which can directly stimulate users. Figure 8(b) shows a screenshot of virtual limb generation.
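Step (2) above, deriving a hierarchy in which the virtual limbs sit at the leaves, can be illustrated with a joint graph alone (muscles and bones omitted). The sketch below builds a tree by breadth-first search from a root joint, assigns each virtual joint the nearest functional ancestor that would drive it, and checks that no functional joint hangs below a virtual one. The joint names and adjacency are a hypothetical Kinect-style skeleton, not the musculoskeletal network of [91].

```python
from collections import deque

# Hypothetical joint adjacency (undirected, listed in both directions).
SKELETON = {
    "spine_base": ["spine_mid", "hip_l", "hip_r"],
    "spine_mid": ["spine_base", "neck", "shoulder_l", "shoulder_r"],
    "neck": ["spine_mid", "head"],
    "head": ["neck"],
    "shoulder_l": ["spine_mid", "elbow_l"],
    "elbow_l": ["shoulder_l", "wrist_l"],
    "wrist_l": ["elbow_l"],
    "shoulder_r": ["spine_mid", "elbow_r"],
    "elbow_r": ["shoulder_r", "wrist_r"],
    "wrist_r": ["elbow_r"],
    "hip_l": ["spine_base", "knee_l"],
    "knee_l": ["hip_l", "ankle_l"],
    "ankle_l": ["knee_l"],
    "hip_r": ["spine_base", "knee_r"],
    "knee_r": ["hip_r", "ankle_r"],
    "ankle_r": ["knee_r"],
}


def build_hierarchy(adjacency, root):
    """Breadth-first spanning tree of the joint graph; returns parent map and visit order."""
    parent = {root: None}
    order = [root]
    queue = deque([root])
    while queue:
        joint = queue.popleft()
        for neighbour in adjacency[joint]:
            if neighbour not in parent:
                parent[neighbour] = joint
                order.append(neighbour)
                queue.append(neighbour)
    return parent, order


def assign_drivers(parent, virtual):
    """Map each virtual joint to its nearest functional (measured) ancestor."""
    drivers = {}
    for joint in virtual:
        ancestor = parent[joint]
        while ancestor is not None and ancestor in virtual:
            ancestor = parent[ancestor]
        drivers[joint] = ancestor
    return drivers


def virtual_on_leaf_side(parent, virtual):
    """True if no functional joint hangs below a virtual joint in the tree."""
    return all(p not in virtual for j, p in parent.items() if j not in virtual)


parent, order = build_hierarchy(SKELETON, "spine_base")
legs = {"hip_l", "knee_l", "ankle_l", "hip_r", "knee_r", "ankle_r"}
drivers = assign_drivers(parent, legs)
```

For a user whose legs are virtual, every leg joint is driven from the measured trunk, and the leg subtrees occupy the leaf side of the hierarchy as step (2) requires.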

Multivariate time series-based kinetics generation of Virtual Limbs
Adaptive and full-body-driven virtual limb generation can (1) engage individuals with various forms of limited mobility in regular physical activity, (2) accelerate patients' rehabilitation, and (3) relieve users' phantom limb pain. Virtual limb generation is a generative time-series problem. Figure 9 shows the pipeline for generating the kinetic sequence (a multivariate time series) of the virtual limbs and correcting it.
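$Y^{\text{measured}} = \{\langle x_t^k, y_t^k, z_t^k, \phi_t^k \rangle\}_{t=t_0}^{t_m}$ denotes the measured kinetic sequence of the functional body parts. As defined in Figure 5, $t$ is the time step ranging from $t_0$ through $t_m$, and $k$ is the identity of the joints related to the functional body parts.
$Y^{\text{virtual}} = \{\langle x_t^j, y_t^j, z_t^j, \phi_t^j \rangle\}_{t=t_0}^{t_m}$ denotes the generated kinetic sequence of the virtual limbs, where $t$ is the time step ranging from $t_0$ through $t_m$, and $j$ is the identity of the joints related to the virtual limbs.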
As illustrated in Figure 6, we can generate the kinetics of a Tai-Chi practitioner in a wheelchair from the movement of his/her arms, which are functional and healthy. This work employs a deep neural network to generate $Y^{\text{virtual}}_t$ from $Y^{\text{measured}}_t$:

$$Y^{\text{virtual}}_t = f\big(Y^{\text{measured}}_t, \theta\big) \tag{2}$$

where $f(\cdot, \theta)$ is the output of the deep neural network.

Loss function for the generation of virtual limbs' kinetics
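In this work, a musculoskeletal biomechanics guided loss function is used to formulate the objective of the generated virtual limb kinetics:

$$L(\theta) = \ell\big(Y^{\text{virtual}}, f(Y^{\text{measured}}, \theta)\big) + \varrho\, R(\theta) + \gamma\, L_{\text{biomechanics}}\big(f(Y^{\text{measured}}, \theta)\big) \tag{3}$$

In Eq. (3), $(Y^{\text{measured}}, Y^{\text{virtual}})$ denotes labelled training data, where $Y^{\text{virtual}}$ is the expected kinetics of the virtual limbs; $\ell(\cdot,\cdot)$ is the data-misfit term; $\theta \in \mathbb{R}^n$ denotes the parameters (weights and biases) of the neural network; $R(\theta): \mathbb{R}^n \to \mathbb{R}$ is the regularizer, whose importance is controlled by the regularization strength $\varrho \in \mathbb{R}$; $L_{\text{biomechanics}}$ denotes the biomechanical violation of the generated kinetics, with weight $\gamma \in \mathbb{R}$, and this work uses the kinetic imbalance of the human body to measure $L_{\text{biomechanics}}$; and $L(\theta): \mathbb{R}^n \to \mathbb{R}$ is the regularized loss.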
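The structure of this loss can be sketched in Python. The text says only that kinetic imbalance measures the biomechanics term; the sketch assumes, hypothetically, that imbalance is the mean squared horizontal distance between a mass-weighted joint centroid and a fixed support point, with mean squared error as the data term and an L2 penalty as R(θ). The segment masses, the y-up axis convention, and the weights ϱ (`rho`) and γ (`gamma`) are placeholders.

```python
import numpy as np


def biomechanics_penalty(pred, masses, support_xz):
    """Hypothetical kinetic-imbalance term.

    pred: (frames, joints, 3) joint positions with y as the vertical axis.
    Returns the mean squared horizontal (x, z) offset of the mass-weighted
    centroid from the support point.
    """
    com = np.einsum("fjc,j->fc", pred, masses) / masses.sum()
    offset = com[:, [0, 2]] - support_xz
    return float(np.mean(np.sum(offset ** 2, axis=1)))


def virtual_limb_loss(pred, target, theta, masses, support_xz, rho=1e-3, gamma=0.1):
    """Data term + rho * R(theta) + gamma * biomechanics term (all choices assumed)."""
    data_term = float(np.mean((pred - target) ** 2))
    reg_term = float(np.sum(theta ** 2))
    bio_term = biomechanics_penalty(pred, masses, support_xz)
    return data_term + rho * reg_term + gamma * bio_term


target = np.zeros((50, 4, 3))
masses = np.array([10.0, 5.0, 5.0, 2.0])
theta = np.ones(10)
support = np.zeros(2)
pred_vertical = target + np.array([0.0, 0.1, 0.0])  # same error size, stays balanced
pred_sideways = target + np.array([0.1, 0.0, 0.0])  # same error size, shifts the centroid
```

Both predictions have identical data error, so any difference in the total loss comes from the biomechanics term alone.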

Correction of generated kinetics using time-series prediction model
The generated kinetic sequence of the virtual limbs does not behave smoothly. This work corrects $Y^{\text{virtual}}_t$ using the Auto-Regressive Integrated Moving Average (ARIMA) time-series prediction model [44]. An ARIMA model is fitted to time-series data for pattern recognition and forecasting. The AR part of ARIMA indicates that the evolving variable of interest is regressed on its own prior (lagged) values. The MA part indicates that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The I (for "integrated") indicates that the data values have been replaced with the differences between their values and the previous values. ARIMA is defined as:

$$Y'_t = c + a_1 Y'_{t-1} + \cdots + a_p Y'_{t-p} + b_1 \varepsilon_{t-1} + \cdots + b_q \varepsilon_{t-q} + \varepsilon_t \tag{4}$$

where $Y'_t$ is the differenced series of $Y^{\text{virtual}}_t$ (it may have been differenced more than once), $a_i$ and $b_i$ are the AR and MA coefficients, and $\varepsilon_t$ is the error term. The "predictors" on the right-hand side include both lagged values of $Y'_t$ and lagged errors. Eq. (4) is called an ARIMA(p, d, q) model, where p is the order of the autoregressive part, d is the degree of first differencing involved, and q is the order of the moving-average part.
Any time series may be decomposed into the following components: base level, trend, seasonality, and error. The coefficients of the ARIMA model are determined with the help of the autocorrelation of the series, i.e., the correlation of the series with its own previous values [44].
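A simplified special case of this correction step is sketched below: an ARIMA(p, 1, 0) model (first differencing and autoregression, no moving-average terms) fitted by ordinary least squares, whose one-step-ahead predictions replace the jittery samples. Dropping the MA part and using least squares instead of maximum likelihood are simplifying assumptions; a full ARIMA(p, d, q) fit would normally rely on a statistics library.

```python
import numpy as np


def fit_arima_p_1_0(series, p):
    """Least-squares fit of ARIMA(p, 1, 0): AR(p) with intercept on the first differences."""
    diffs = np.diff(series)  # d = 1
    # Each row holds [d_{t-1}, ..., d_{t-p}] for target d_t.
    lags = np.array([diffs[t - p:t][::-1] for t in range(p, len(diffs))])
    design = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(design, diffs[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]


def one_step_correction(series, coef):
    """Replace each sample after the first p + 1 by its one-step-ahead prediction."""
    series = np.asarray(series, dtype=float)
    p = len(coef) - 1
    diffs = np.diff(series)
    corrected = series.copy()
    for t in range(p, len(diffs)):
        predicted_diff = coef[0] + coef[1:] @ diffs[t - p:t][::-1]
        corrected[t + 1] = series[t] + predicted_diff  # undo the differencing
    return corrected


rng = np.random.default_rng(0)
t = np.arange(500)
clean = np.sin(2 * np.pi * t / 200)              # stand-in for one joint coordinate
noisy = clean + rng.normal(0.0, 0.3, t.size)     # jittery generated kinetics
coef = fit_arima_p_1_0(noisy, p=8)
corrected = one_step_correction(noisy, coef)
```

On a slowly varying signal with jitter, the fitted AR coefficients on the differences act roughly like an average over recent samples, so the one-step-ahead predictions should track the clean signal more closely than the raw samples do.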

Formulating the kinetics of virtual limbs using the measured kinetics of functional body parts
As described in Eqs. (2) and (4), the generation of virtual limb kinetics consists of two steps: (1) creating preliminary kinetics of the virtual limbs from the measured kinetics of the functional body parts; and (2) correcting the preliminary kinetics using time-series prediction models such as ARIMA. This subsection focuses on Step (1), which poses more technical challenges.

Configuration of network architecture according to human anatomy
Any system can be regarded as a hierarchical structure (i.e., system → subsystem → sub-subsystem → ...). As illustrated in Figure 10(a), the human body system can always be divided into mechanically correlated sub-components. Inspired by Bayesian networks, we propose a visible and hierarchical neural network (VHNN), which is derived from human anatomy, to accurately formulate such a system. As illustrated in Figure 10(b), a sample VHNN, directly derived from the human body system, is employed to specify the musculoskeletal kinematics. The VHNN can be employed in virtual limb generation, 4D kinetic behavior recognition, and individualized Tai-Chi choreography (discussed in the remaining sections). Preliminary experimental results demonstrate that VHNN outperforms a classical neural network in terms of training speed and stability.

Example: generating virtual legs based on arm movement using VHNN
A neural network is trained to generate the kinetic status of the hips, knees, and feet from the kinetic status of the shoulders, elbows, and arms captured by 4D sensors [90]. As illustrated in Figure 11(a)-(d), four network architectures are investigated in this research: (a) multilayer perceptron (MLP); (b) denoising autoencoder (a classical autoencoder architecture); (c) visible and hierarchical neural network with two subsystems (VHNN2); and (d) VHNN with four subsystems (VHNN4). VHNN splits the input tensor and feeds the split tensors into multiple smaller, parallelized autoencoders, so the data for each joint can be processed in parallel by its own autoencoder. These parallelized autoencoder pipelines are simplified stacked autoencoders, which allows specific key tasks to be optimized individually rather than as one large task. A video playlist of virtual leg generation based on VHNN can be found at [92].
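The split-and-parallelize idea can be sketched in NumPy: each subsystem owns a slice of the input columns and a slice of the output columns, and a small one-hidden-layer encoder-decoder maps one to the other independently of the rest. This is a hypothetical stand-in for VHNN2/VHNN4, not the architecture of Figure 11; the column groups, the hidden size, the tanh bottleneck, the plain gradient-descent training, and the synthetic "arm to leg" targets are all assumptions.

```python
import numpy as np


class SubNet:
    """One-hidden-layer (tanh) encoder-decoder for a single subsystem."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.normal(0.0, 0.3, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.3, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        self.x = x
        self.h = np.tanh(x @ self.W1 + self.b1)  # bottleneck code
        return self.h @ self.W2 + self.b2

    def backward_step(self, y_pred, y, lr):
        g = 2.0 * (y_pred - y) / y.size               # d(MSE)/d(y_pred)
        g_h = (g @ self.W2.T) * (1.0 - self.h ** 2)   # back through tanh
        self.W2 -= lr * (self.h.T @ g)
        self.b2 -= lr * g.sum(axis=0)
        self.W1 -= lr * (self.x.T @ g_h)
        self.b1 -= lr * g_h.sum(axis=0)


class SplitNetwork:
    """Routes column groups of the input to independent sub-networks."""

    def __init__(self, groups, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.groups = groups  # list of (input_columns, output_columns)
        self.nets = [SubNet(len(i), n_hidden, len(o), rng) for i, o in groups]
        self.n_out = sum(len(o) for _, o in groups)

    def predict(self, X):
        Y = np.zeros((len(X), self.n_out))
        for net, (cols_in, cols_out) in zip(self.nets, self.groups):
            Y[:, cols_out] = net.forward(X[:, cols_in])
        return Y

    def fit(self, X, Y, lr=0.2, epochs=2000):
        history = []
        for _ in range(epochs):
            for net, (cols_in, cols_out) in zip(self.nets, self.groups):
                pred = net.forward(X[:, cols_in])
                net.backward_step(pred, Y[:, cols_out], lr)
            history.append(float(np.mean((self.predict(X) - Y) ** 2)))
        return history


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))  # columns 0-1: "left arm", columns 2-3: "right arm"
Y = np.column_stack([np.tanh(X[:, 0] - X[:, 1]), np.tanh(0.5 * X[:, 0]),
                     np.tanh(X[:, 2] + X[:, 3]), np.tanh(-X[:, 3])])
model = SplitNetwork([([0, 1], [0, 1]), ([2, 3], [2, 3])])
history = model.fit(X, Y)
```

Because no parameters are shared across groups, the per-group updates are independent and could run on separate threads or devices.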
As illustrated in Figure 9, the generated kinetics of virtual limbs can be corrected using time-series models such as ARIMA.
As shown in Table 1, the proposed VPNN architecture achieves overall better results than previous work. Its training time is shorter than that of previous autoencoder architectures because simpler autoencoders are trained in parallel, which eases optimization: each autoencoder is trained on specific gestures within the whole movement. In addition, it does not exhibit the data-hungry tendencies of state-of-the-art models, so it can be trained on small amounts of data.
VPNN-AE-2 shows a lower ground-truth error than VPNN-AE-4 because the training data contain none of the anomalies that real-time data can exhibit. While VPNN-AE-2 with single correlation works better when tested against ground-truth data, VPNN-AE-4 with double correlation works better in real time, since the patient may not follow the Tai-Chi movements correctly. The added architectural complexity of VPNN-AE-4 increases output noise, which worsens its ground-truth error but provides better tolerance of patient errors. To reduce this additional noise, VPNN-AE-4 could be improved through larger training datasets, more sophisticated pre- and post-processing of data, and an improved NN architecture.

Construction of virtual limb using active orthosis
To provide users with physical support, the generated virtual limb can be reconstructed on a motor system that drives hip-knee-ankle-foot orthoses (HKAFOs) [97,98]. Paralysis of the hip abductor muscles is one of the most common reasons for prescribing HKAFOs. HKAFOs can incorporate flexion-extension and abduction-adduction control and have free or locking joints [99]. Unlike passive and semi-active orthoses, active HKAFOs have built-in power supplies, one or more actuators for moving the joints, and sensors for acquiring feedback data [97].
The designed active orthosis is shown in Figure 12(A). The knee and ankle are considered rigid, but locking mechanisms located at the hip and knee joints allow these parts to move whenever the user desires, so the limb is protected from harm in the event of any adverse motion. The active orthosis is actuated at the hip and performs only flexion and extension motions. The HKAFO has two mechanical structures: (1) a gear and T-type deflector reducer mechanism that transmits the torque generated by an actuator to the hip joints; and (2) a pulley and four-bar mechanism that transfers the generated torque to the knee joints. Because the mechanical system allows the motor to move in both directions and provides power saving, it aims to minimize battery consumption, which has been a major problem in such devices. The control circuit is illustrated in Figure 12(B). The patient's intention to perform a flexion or extension motion is detected by both EMG and accelerometer sensors. To determine the final position of the patient after a movement, physical feedback from the mechanical system is used. Adding an ankle joint to the HKAFO for real-time virtual limb reconstruction may also be considered.

The EMG signals may be preprocessed to remove unwanted interference; the most common sources of interference are power-line harmonics and motion artifacts from electrode movement. Because a myoelectric signal is a time sequence with a variable number of elements, it is impractical to classify directly; the signal sequence is therefore mapped to feature vectors, which are then classified to detect the corresponding movement. Deep neural networks, fuzzy logic, finite state machines, support vector machines, and other methods may be adopted as classifiers. In this work, a finite state machine (FSM) was chosen as the classifier. The FSM consists of a state set, inputs, outputs, an event set, and state transition functions. Each state characterizes one possible behavior of the system, and transitions between states depend on the input variables and the present state of the system. The EMG signals and the accelerometer data collected from both legs are classified using the FSM, and the classification result drives the actuator input in three situations: the patient stops, moves the right leg, or moves the left leg.
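The EMG pipeline above (map variable-length windows to fixed-length features, then let a finite state machine decide the actuator command) can be sketched as follows. The four time-domain features, the thresholds, and the rule that a direct right-to-left switch must pass through STOP are hypothetical choices for illustration, not the controller of Figure 12(B).

```python
import numpy as np


def emg_features(window):
    """Map a variable-length EMG window to a fixed-length feature vector.

    Features: mean absolute value, root mean square, waveform length, zero crossings.
    """
    w = np.asarray(window, dtype=float)
    mav = np.mean(np.abs(w))
    rms = np.sqrt(np.mean(w ** 2))
    wl = np.sum(np.abs(np.diff(w)))
    zc = np.sum(np.signbit(w[:-1]) != np.signbit(w[1:]))
    return np.array([mav, rms, wl, zc], dtype=float)


# Hypothetical transition table: (present state, event) -> next state.
TRANSITIONS = {
    ("STOP", "none"): "STOP", ("STOP", "right"): "RIGHT", ("STOP", "left"): "LEFT",
    ("RIGHT", "right"): "RIGHT", ("RIGHT", "none"): "STOP", ("RIGHT", "left"): "STOP",
    ("LEFT", "left"): "LEFT", ("LEFT", "none"): "STOP", ("LEFT", "right"): "STOP",
}


def detect_event(right_emg, left_emg, right_acc, left_acc, emg_thr=0.1, acc_thr=0.2):
    """Combine EMG activity (RMS feature) and accelerometer magnitude for each leg."""
    right = emg_features(right_emg)[1] > emg_thr and right_acc > acc_thr
    left = emg_features(left_emg)[1] > emg_thr and left_acc > acc_thr
    if right and not left:
        return "right"
    if left and not right:
        return "left"
    return "none"  # no intention, or conflicting intentions on both legs


def run_fsm(events, state="STOP"):
    """Step the state machine; each next state depends on the event and the present state."""
    trace = []
    for event in events:
        state = TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


burst = 0.5 * np.sin(np.linspace(0, 20 * np.pi, 200))   # active muscle
quiet = 0.01 * np.sin(np.linspace(0, 20 * np.pi, 200))  # resting muscle
```

Routing a right-to-left request through STOP is one way the present state can change the response to the same input; the actual transition rules would come from the clinical design.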

Individualized movement choreography
Different users have different health statuses and clinical requirements. VIGOR employs a generative deep neural network architecture to design novel, individualized Tai-Chi movements [26] that benefit users in the most effective way [100][101][102]. The most challenging issue in deep-learning-enabled choreography is how to balance the training reliability and the creativity of the neural network. In this work, we propose the following techniques: (1) a visible neural network, which incorporates biomechanics into the neural network, is employed to formulate the generated movement; (2) only mechanical properties such as joint/muscle forces and moments are used to evaluate the generated movement; and (3) a second-order optimizer is used to speed up the training of the neural network.

Tai-Chi choreography based on LSTM-RNN
In this work, the Long Short-Term Memory (LSTM) type of RNN [103,104] is employed to design individualized Tai-Chi choreography [26]. The Human3.6M dataset (high-quality 3D joint positions and rotations at 50 FPS) and our in-house dataset (acquired by Microsoft Kinect V2, including joints' XYZ coordinates and quaternions at 24-30 FPS) are used as training data. The Tai-Chi movement is created clip by clip (subsequence by subsequence) according to users' health conditions and their clinical rehabilitation requirements [21]. Figure 13 shows the framework of LSTM-based Tai-Chi choreography design. A Tai-Chi movement (sequence) is partitioned into multiple subsequences (clips). A seed subsequence, which can be generated randomly, is fed into the trained model. The output is regarded as the succeeding subsequence and is fed back into the model to produce the following subsequence; as a result, a creative Tai-Chi sequence is created clip by clip. Four-thread visible and hierarchical autoencoders [106] are used to reduce the dimensionality of the problem. The resulting individualized Tai-Chi choreography [100][101][102] is integrated into the VR or AR environment [88], from which users can learn. Online video [105] shows a sample Tai-Chi choreography. Compared to other deep-learning-enabled choreography projects [107], the proposed method may train faster and be more problem-oriented because (1) the geometric configuration of human anatomy is preserved by employing joint coordinate systems such as Euler angles [36,41], and (2) human biomechanics are preserved by introducing kinetic features [41,108].
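The clip-by-clip feedback loop of Figure 13 can be sketched with a tiny LSTM forward pass in NumPy. The weights are random (untrained), so the output is not meaningful choreography; the sketch only shows how a seed clip is encoded frame by frame and how each generated clip is fed back in as the next input. The class and function names, the sizes, and the linear read-out to a full clip are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyLSTM:
    """Untrained single-layer LSTM that reads a clip and emits the next clip."""

    def __init__(self, n_feat, n_hidden, clip_len, seed=0):
        rng = np.random.default_rng(seed)
        self.n_feat, self.n_hidden, self.clip_len = n_feat, n_hidden, clip_len
        # Stacked weights for the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_feat + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        # Linear read-out from the final hidden state to a whole clip.
        self.W_out = rng.normal(0.0, 0.1, (clip_len * n_feat, n_hidden))

    def next_clip(self, clip):
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for frame in clip:  # encode the clip frame by frame
            z = self.W @ np.concatenate([frame, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return (self.W_out @ h).reshape(self.clip_len, self.n_feat)


def choreograph(model, seed_clip, n_clips):
    """Generate n_clips new clips, feeding each output back in as the next input."""
    clips = [seed_clip]
    for _ in range(n_clips):
        clips.append(model.next_clip(clips[-1]))
    return np.concatenate(clips, axis=0)


rng = np.random.default_rng(0)
seed_clip = rng.normal(size=(25, 6))  # 25 frames x 6 hypothetical joint features
model = TinyLSTM(n_feat=6, n_hidden=16, clip_len=25)
sequence = choreograph(model, seed_clip, n_clips=3)
```

Because every clip is conditioned only on the previous one, errors compound over long sequences, which is the accumulated-error limitation that motivates the GAN approach.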

Movement choreography based on visible GAN
LSTM-based choreography suffers from relatively large accumulated error and lacks a global view of the Tai-Chi choreography. As an effective deep generative model, Generative Adversarial Networks (GANs) learn to model the distribution of high-dimensional data (images, texts, audio, etc.), with or without supervision, and have been gaining considerable attention in many fields [109][110][111]. In VIGOR, GANs may be used to generate novel Tai-Chi movements by modeling a given distribution.
As illustrated in this work, conventional GANs such as DCGAN [46] suffer from frequent mode collapse during training, particularly on the generator side. The discriminator often improves too quickly for the generator to catch up, so the learning rates must be regulated or one of the two networks must be trained for multiple epochs. To balance the training of the generator and discriminator for decent output, this work investigates the following strategies: (1) applying the Wasserstein distance to formulate the loss function [46,112]; and (2) applying a visible neural network that incorporates biomechanics theory (inverse dynamics and transient dynamics simulation of the human body [60,68]) in the formulation of the generator and discriminator. The neural network is personalized using boundary and initial conditions of human dynamics. Figure 14 shows the pipeline of the GAN-enabled human movement choreography system. A generator G generates kinematic data from a latent vector, and a discriminator D estimates the probability that a sample came from the training data rather than from G. Fed with a latent vector, which is randomly generated at first and derived from the transient dynamics simulation of the human body thereafter, the generator produces a series of personalized and creative Tai-Chi kinetic subsequences to fool the discriminator. The discriminator is trained to distinguish between "real" Tai-Chi kinetic subsequences (from the training set) and "fake" Tai-Chi subsequences produced by the generator. Because the generator is fed with deterministic simulated data, an equilibrium of the "adversarial game" between the generator and discriminator can be reached much more easily.
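Strategy (1) can be illustrated with a deliberately tiny Wasserstein GAN: a linear critic kept Lipschitz by weight clipping, and a generator that only learns a mean offset. With a linear critic the game can only match first moments, so this toy shows the loss signs, the several critic steps per generator step, and the clipping, not a model of Tai-Chi kinetics. All names and hyperparameters are assumptions.

```python
import numpy as np


def train_toy_wgan(real_mean, steps=300, n_critic=5, clip=0.01,
                   lr_critic=0.05, lr_gen=5.0, batch=64, seed=0):
    """Toy WGAN: linear critic D(x) = x @ w, generator G(z) = 0.5 * z + mu."""
    rng = np.random.default_rng(seed)
    real_mean = np.asarray(real_mean, dtype=float)
    dim = real_mean.size
    w = np.zeros(dim)   # critic weights
    mu = np.zeros(dim)  # generator offset (its only trainable parameter)
    for _ in range(steps):
        for _ in range(n_critic):  # several critic updates per generator update
            real = rng.normal(real_mean, 0.5, (batch, dim))
            fake = 0.5 * rng.normal(0.0, 1.0, (batch, dim)) + mu
            # Ascend the Wasserstein estimate E[D(real)] - E[D(fake)] with respect to w.
            w += lr_critic * (real.mean(axis=0) - fake.mean(axis=0))
            w = np.clip(w, -clip, clip)  # weight clipping keeps D Lipschitz
        # Generator loss is -E[D(G(z))]; its gradient with respect to mu is -w.
        mu -= lr_gen * (-w)
    return mu, w


mu, w = train_toy_wgan([3.0, -1.0])  # hypothetical 2-D "real" feature means
```

The large generator step size only compensates for the tiny clipped critic weights; in a real WGAN both networks would be deep and trained with an adaptive optimizer.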
In this work, a musculoskeletal biomechanics guided loss function is used to formulate the objective of the discriminator:

$$L(\theta) = \ell\big(Y, f(X, \theta)\big) + \varrho\, R(\theta) + \gamma\, L_{\text{biomechanics}}\big(f(X, \theta)\big) + \eta\, L_{\text{aesthetics}}\big(f(X, \theta)\big) \tag{5}$$

where $\{X, Y\}$ denotes labelled training data; $f(X, \theta)$ is the predicted output of the neural network; $\ell(\cdot,\cdot)$ is the data-misfit term; $\theta \in \mathbb{R}^n$ denotes the parameters (weights and biases) of the neural network; $R(\theta): \mathbb{R}^n \to \mathbb{R}$ is the regularizer, whose importance is controlled by the regularization strength $\varrho \in \mathbb{R}$; as in Eq. (3), $L_{\text{biomechanics}}(f(X, \theta))$ denotes the biomechanical violation of the choreography, with weight $\gamma \in \mathbb{R}$; and $L_{\text{aesthetics}}(f(X, \theta))$ denotes the violation of athletic elegance by the designed choreography, with weight $\eta \in \mathbb{R}$. Figure 14 also illustrates that the generated kinetics need to be made temporally consistent using time-series prediction models such as ARIMA (Eq. (4)), LSTM, and the Fast Fourier Transform (FFT).
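Of the temporal-consistency options listed above, FFT-based smoothing is the simplest to sketch: transform each channel of a generated sequence along time, zero the high-frequency bins, and transform back. The cutoff `keep_bins` is an assumed tuning parameter, and hard truncation is only one possible filter.

```python
import numpy as np


def fft_lowpass(sequence, keep_bins):
    """Zero all DFT bins from `keep_bins` upward along the time axis (axis 0)."""
    sequence = np.asarray(sequence, dtype=float)
    spectrum = np.fft.rfft(sequence, axis=0)
    spectrum[keep_bins:] = 0.0
    return np.fft.irfft(spectrum, n=sequence.shape[0], axis=0)


n = np.arange(256)
slow = np.sin(2 * np.pi * 3 * n / 256)           # intended slow movement
jitter = 0.3 * np.sin(2 * np.pi * 60 * n / 256)  # frame-to-frame jitter
generated = np.column_stack([slow + jitter, 2 * slow])  # frames x channels
smoothed = fft_lowpass(generated, keep_bins=10)
```

Hard truncation can ring near abrupt changes, so a tapered window or the ARIMA correction may suit real kinetics better.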

Polynomial-based Hessian-free Newton-Raphson optimizer
Many deep-learning-enabled applications suffer from training data scarcity, and various strategies have been investigated to overcome this limitation. Besides the visible neural network, a polynomial-based Hessian-free Newton-Raphson algorithm (poly-HFNR) [69,113] is proposed to address data scarcity by improving NN learning efficiency. The advantages of poly-HFNR optimizers include: (1) fewer training epochs than first-order optimizers such as stochastic gradient descent (SGD); (2) lower computation and storage complexity (O(N), where N is the number of degrees of freedom of the neural network) than typical implementations of Newton-Raphson-based algorithms; (3) tolerance of non-convexity; and (4) avoidance of both the explicit formulation of the Hessian matrix and the iterative/direct solution of Newton's equations during the training of the neural network.
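The core idea behind a Neumann-series Hessian-free step can be sketched as follows: approximate the Newton direction H⁻¹g by the truncated polynomial α Σₖ (I − αH)ᵏ g, evaluating each Hessian-vector product from two gradient calls so that memory stays O(N). This is a generic illustration under stated assumptions (H positive definite, 0 < α < 2/λ_max), not the authors' poly-HFNR; the step size α and the number of terms are hypothetical.

```python
import numpy as np


def hessian_vector_product(grad_fn, theta, v, eps=1e-5):
    """H @ v from a central difference of two gradient evaluations; H is never formed."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)


def neumann_newton_step(grad_fn, theta, alpha, n_terms):
    """One Newton-like step with H^{-1} g ~ alpha * sum_{k=0..n_terms} (I - alpha H)^k g.

    Assumes H is positive definite and 0 < alpha < 2 / lambda_max(H) so the series converges.
    """
    g = grad_fn(theta)
    term = g.copy()        # (I - alpha H)^k g, starting at k = 0
    direction = g.copy()   # running polynomial sum
    for _ in range(n_terms):
        term = term - alpha * hessian_vector_product(grad_fn, theta, term)
        direction += term
    return theta - alpha * direction


# Quadratic test problem f(theta) = 0.5 theta^T A theta - b^T theta, minimizer A^{-1} b.
A = np.diag([1.0, 2.0, 5.0, 10.0])
b = np.ones(4)
grad = lambda th: A @ th - b
theta = np.zeros(4)
for _ in range(5):
    theta = neumann_newton_step(grad, theta, alpha=0.09, n_terms=20)
```

Only vectors of length N are stored, and each step costs n_terms + 1 gradient-sized operations; the polynomial degree trades accuracy of the Newton direction against cost per step.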
Poly-HFNR based on the Neumann series (Neumann-poly-HFNR) and poly-HFNR based on generalized least-squares polynomials (GLS-poly-HFNR) [47,69,113,114] have been developed and critically assessed on benchmark problems such as iris classification, airfoil recognition, yacht dynamics simulation, and Pima Indians diabetes. Both implementations demonstrate reliable, super-linear convergence. The experimental results illustrate that (1) in terms of storage and computation complexity, poly-HFNR is comparable to SGD; and (2) in terms of convergence performance, poly-HFNR is fully comparable to quasi-Newton methods. Our future work will focus on (a) evaluating poly-HFNR on various large-scale benchmark problems; (b) improving the convergence of poly-HFNR from a super-linear to a quadratic rate; and (c) developing a CUDA version of poly-HFNR and integrating it into popular deep learning frameworks such as PyTorch, TensorFlow, and Caffe.

Conclusion
This work presents the VIGOR system, which has strong potential for broad significance to the physical and psychological health of people with limited mobility. VIGOR is expected to (1) produce an affordable and user-friendly platform that promotes regular physical activity via seamless interaction between the user and the Tai-Chi model/master; (2) cultivate and enhance interdisciplinary research by integrating expertise in physical therapy, psychology, computer science, electrical engineering, and structural mechanics; and (3) adapt to other movement modalities (e.g., yoga).
The major research elements include: (1) seamless real-time 4D human-machine interaction based on affordable input/output hardware such as Kinect sensors, foot-pressure sensors, actuators, assistive devices/exoskeletons, and VR goggles; (2) kinetic movement grading and identification; (3) adaptive virtual limb generation over VR/AR and assistive devices/exoskeletons; and (4) individualized movement choreography (i.e., creative movement design). As the major research contributions of this work, a visible and hierarchical neural network (VHNN) architecture is proposed to recognize and predict human kinetics efficiently, and a polynomial-based Hessian-free Newton-Raphson algorithm is proposed for efficient optimization. Both techniques play significant roles in small-data problems.
As part of our future work, the clinical effect of the VIGOR system will be assessed. Specifically, we plan to evaluate both the user experience and the feasibility of VIGOR by conducting several phases of a human-subject study with healthy and mobility-limited adult subjects. In every phase, subjects will be surveyed and interviewed following exposure to VIGOR. The clinical data will be analyzed using the Auto-Regressive Integrated Moving Average (ARIMA) model [44].