For trans-humeral amputees, daily living tasks requiring bimanual coordination, such as lifting a box, are the most difficult and therefore the most urgent for a trans-humeral prosthesis to support. However, studies on trans-humeral prosthetic control have not taken into account the states of the target objects, such as their size, relative pose, and position, which are important for any real reaching and manipulation task. In our previous study of a box-lifting task, we investigated the possibility of using around-shoulder EMG (electromyogram) to identify target-reaching-positions for boxes with different configurations (relative pose and position). However, with the around-shoulder EMG alone, the system cannot guide the prosthesis to hold or grasp target objects precisely and quickly enough. The purpose of this study is to explore the possibility of using both image information from an action camera and around-shoulder EMG to identify target-reaching-positions for various box configurations more accurately and more rapidly. Multinomial logistic regression was employed both to integrate the two information sources and to identify the target-reaching-position. In a set of experiments, an average classification rate of 75.1% was achieved across the box configurations.
- trans-humeral prosthesis
- bimanual coordination
- reaching motion
- target objects information
- logistic regression
Forearm prostheses [1, 2] controlled by users' bio-signals have been the main focus so far, while fewer studies have been reported on prostheses for higher-level amputees, because fewer residual upper-limb functions are available while more DoFs (degrees of freedom) have to be controlled.
To solve this problem, several approaches have been proposed. The iEEG (intracranial electroencephalogram), obtained from electrodes implanted in the brain, has been used to control trans-humeral prostheses. Kuiken et al. reported efforts to control trans-humeral prostheses using EMG obtained through TMR (targeted muscle reinnervation). These methods achieve an intuitive user-prosthesis interface because the bio-signals carry more direct information about the intended motions. However, the drawbacks are clear: they are invasive and require surgery, which is costly and may impose physical and mental burdens on patients.
In [6, 7], EMG (electromyogram) signals from the around-shoulder area (ASA) were used, and in a further study, ASA EMG was combined with additional motion-related EEG. In both cases, machine learning methods were employed to exploit the limited information.
Bimanual coordination between the healthy arm and its prosthetic counterpart, in bimanual tasks such as holding a bottle with one hand while opening its lid with the other, operating a steering wheel, and lifting a box, has been proposed as one solution [9, 10, 11, 12, 13, 14, 15]. There are two reasons. First, the need for trans-humeral prostheses arises mostly from bimanual coordination: many daily living tasks require the limbs of both sides to cooperate, while most amputees can use their healthy side to complete tasks that do not. Second, more information for controlling a trans-humeral prosthesis can be acquired from both coordinating sides, since the required behavior of the prosthesis can be estimated not only from the residual stump but also from the motion and motor behavior of the healthy side.
However, in the studies of bimanual coordination mentioned above, the states of the target objects, such as their relative position, size, and pose, which are important for any real manipulation and reaching task, have not been taken into consideration. In a typical bimanual coordination task, lifting a box with two hands, the position that the trans-humeral prosthesis has to reach varies depending on the state of the box. For this reason, the states of the target objects must be considered when identifying the target-reaching-position of the healthy arm in order to realize bimanual coordination for users of trans-humeral prostheses.
Similarly, bimanual coordination has been addressed in robotics. In a study on bimanual box grasping by a humanoid robot, the concept of grasping stability was used to deal with the different states of the box.
In our previous study, we explored the possibility of identifying target-reaching-positions with respect to various box configurations (box size and relative pose) and investigated features that generalize well to unknown data, i.e., features that enable the classifiers to be trained with fewer box configurations. However, it became clear that, with the ASA EMG alone, the system cannot guide the prosthesis to hold or grasp target objects precisely and quickly enough for daily living activities.
This study has two related purposes, pursued through experiments and analyses of the bimanual box-lifting task. The first is to explore the possibility of identifying the target-reaching-positions with respect to various box configurations using two signal sources: bio-signals detected from the around-shoulder area and images from an action camera. Here, a box configuration specifies the pose and the position of a box relative to the user. Bio-signals are taken only from the around-shoulder area because sensors at distal sites are more likely to be affected by external perturbation, and because an around-shoulder sensor configuration could also be applied to the amputated side. The camera is attached near the shoulder because, in that location, it does not restrict the use of either arm in a practical wearable setting, and its positional relation to the trans-humeral prosthesis is straightforward. Classifiers are trained to identify the intended target-reaching-positions for different box configurations.
The second is to explore the optimal way to integrate the information from the two signal sources so as to realize fast and accurate target-reaching-position identification. Only with fast and accurate estimation is there sufficient time to control the trans-humeral prosthesis to match the healthy upper limb.
2. Feature selection and classification for target-reaching-positions
2.1 Feature selection
A total of 398 features (8 EMG channels × 10 time steps of WL + 28 WL ratios × 10 time steps + 8 total sums of WL + 28 ratios of the total sums of WL + box pose + box position, where WL is the waveform length defined in Section 3.4) were calculated from the measured data. Using all of these features for classification may cause training problems, such as flattening or over-fitting. In this study, the Akaike information criterion (AIC) was therefore used for feature selection:

$$\mathrm{AIC} = 2k - 2\ln L$$

Here, k is the number of parameters in the statistical model, and L is the maximized value of the likelihood function for it.
A stepwise method that incrementally adds and removes candidate variables was used to select features. The selection ends when adding the next feature no longer decreases the AIC. The initial feature was chosen by the ratio of between-class variance to within-class variance: the feature with the largest ratio was adopted as the starting point.
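The selection procedure described above can be sketched as follows. This is a minimal illustration and not the implementation used in the study: it performs only the forward (adding) step, fits the multinomial model with a plain gradient-ascent softmax regression, and assumes standardized features. The function names (`aic`, `variance_ratio`, `fit_log_likelihood`, `forward_select`) and the optimizer settings are hypothetical.

```python
import numpy as np

def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L), with ln(L) passed in directly."""
    return 2.0 * n_params - 2.0 * log_likelihood

def variance_ratio(x, y):
    """Between-class variance divided by within-class variance for one feature."""
    grand_mean = x.mean()
    between = sum(np.sum(y == c) * (x[y == c].mean() - grand_mean) ** 2
                  for c in np.unique(y))
    within = sum(np.sum((x[y == c] - x[y == c].mean()) ** 2)
                 for c in np.unique(y))
    return between / within

def _softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract row maximum for stability
    P = np.exp(Z)
    return P / P.sum(axis=1, keepdims=True)

def fit_log_likelihood(X, y, n_iter=500, lr=0.1):
    """Fit a softmax regression (assumed optimizer) and return (ln L, k)."""
    classes = np.unique(y)
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])                  # intercept column
    Y = (y[:, None] == classes[None, :]).astype(float)    # one-hot labels
    W = np.zeros((p + 1, len(classes)))
    for _ in range(n_iter):
        W += lr * Xb.T @ (Y - _softmax(Xb @ W)) / n       # gradient ascent step
    log_lik = np.sum(Y * np.log(_softmax(Xb @ W) + 1e-12))
    # Identifiable parameters when one category is the reference category.
    n_params = (p + 1) * (len(classes) - 1)
    return log_lik, n_params

def forward_select(X, y):
    """Start from the feature with the largest variance ratio, then add the
    feature that lowers AIC the most until AIC stops decreasing."""
    n_features = X.shape[1]
    ratios = [variance_ratio(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(ratios))]
    best = aic(*fit_log_likelihood(X[:, selected], y))
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [aic(*fit_log_likelihood(X[:, selected + [j]], y))
                  for j in candidates]
        j_best = int(np.argmin(scores))
        if scores[j_best] >= best:        # AIC did not decrease: stop
            break
        selected.append(candidates[j_best])
        best = scores[j_best]
    return selected, best
```

Calling `forward_select(X, y)` with a standardized feature matrix `X` of shape (trials, features) and integer labels `y` returns the indices of the selected features in selection order together with the final AIC.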
2.2 Evaluation of the features and classification of the target-reaching-positions in the multinomial logistic regression
Multinomial logistic regression analysis was employed as the classifier. The method, also called the multinomial logit model, is one of several natural extensions of the binary logit model. It models the relative probability of being in one category versus a reference category, k, using a linear combination of predictor variables. Consequently, the probability of each outcome is expressed as a nonlinear function of the p predictor variables.
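The multinomial logit model can be expressed as the following equations:

$$\ln\left(\frac{\pi_j}{\pi_k}\right) = \alpha_j + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p, \quad j = 1, 2, \ldots, k-1$$

$$\pi_j = \frac{\exp\left(\alpha_j + \sum_{m=1}^{p} \beta_{jm} x_m\right)}{1 + \sum_{l=1}^{k-1} \exp\left(\alpha_l + \sum_{m=1}^{p} \beta_{lm} x_m\right)}, \qquad \pi_k = \frac{1}{1 + \sum_{l=1}^{k-1} \exp\left(\alpha_l + \sum_{m=1}^{p} \beta_{lm} x_m\right)}$$

where πj = P(y = j) (j = 1, 2, …, k) is the probability of an outcome being in category j, k is the number of response categories, p is the number of predictor variables x1, …, xp, and αj and βj1, …, βjp are the coefficients. A total of k − 1 equations are solved simultaneously to estimate the coefficients. The coefficients in the model express the effects of the predictor variables on the relative risk, or the log odds, of being in category j versus the reference category, here k. When the model is used for classification, the probability of each label is obtained from the above equations and the measured features. The label with the highest probability is the classification result.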
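The way class probabilities follow from the log odds against the reference category can be illustrated with a short sketch. The function names and the array layout (`alpha` with k − 1 intercepts, `beta` with k − 1 rows of p coefficients) are assumptions made for this example, and the coefficient values in the usage note are hypothetical.

```python
import numpy as np

def multinomial_logit_probabilities(x, alpha, beta):
    """Return the k class probabilities for one feature vector x.

    x: shape (p,), alpha: shape (k-1,), beta: shape (k-1, p).
    The last entry of the result belongs to the reference category k.
    """
    eta = alpha + beta @ x            # log odds of categories 1..k-1 versus k
    eta = np.append(eta, 0.0)         # the reference category has log odds 0
    eta = eta - eta.max()             # shift for numerical stability
    expo = np.exp(eta)
    return expo / expo.sum()

def classify(x, alpha, beta):
    """Pick the 1-based label with the highest probability."""
    probs = multinomial_logit_probabilities(x, alpha, beta)
    return int(np.argmax(probs)) + 1, probs
```

For example, with three categories, intercepts `alpha = [ln 2, 0]`, and all slope coefficients zero, the odds of category 1 versus category 3 are 2:1, so the probabilities are 0.5, 0.25, and 0.25 and category 1 is chosen.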
In the feature selection by AIC, a feature is selected according to its compatibility with the previously selected features. Therefore, the features selected earlier are not guaranteed to be the best. On the other hand, the coefficient of a feature in the logistic regression, together with the p value of that coefficient, indicates how the feature affects the classification. The feature with the smallest p value affects the classification the most. AIC was used for feature selection because the logistic regression cannot directly handle a large number of predictor variables, i.e., features. For these reasons, we performed feature selection using AIC and feature evaluation using logistic regression.
Regarding classification methods, SVM and neural networks [1, 2] are widely used for bio-signals. In this research, however, not only classification but also information integration based on feature selection and evaluation is required, which is difficult for both SVM and neural networks. In contrast, multinomial logistic regression can perform the dual role of classification and feature evaluation. In addition, since its classification results come with probabilities, the ambiguity of each classification can also be evaluated. Furthermore, multinomial logistic regression uses only k − 1 (k: number of categories) weighted sums for classification, so its computational cost is expected to be lower than that of SVM and neural networks.
The difficulty of this research lies in two facts. First, reaching motions to the same relative position on the box under different box configurations (relative pose and position) should be assigned to the same class. Second, in some cases, as the box position changes, the label that should be assigned is completely different even though the actual target-reaching-position is almost the same. For example, the back of a box placed at a certain position and the front of another box displaced by one box width are planes at the same position. With EMG alone, the reaching motions to both planes would be identified as the same, although they should be assigned to different classes. Therefore, it is necessary to introduce the box configuration information in some form and to investigate how to integrate the two types of signals.
We compared two datasets. Dataset 1 used EMG only; dataset 2 used EMG and the box configuration (relative pose and position). The classification was performed in two steps. In step 1, trials were classified as reaching to the upper side of the box (RP1, 2, 3) or the bottom side of the box (RP4, 5, 6). In step 2, trials classified as the upper side in step 1 were classified among RP1, RP2, and RP3, and trials classified as the bottom side were classified among RP4, RP5, and RP6. A trial was counted as correctly classified only when the result was correct in both steps. The classification rate was calculated by leave-one-out cross validation.
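The two-step scheme can be sketched as below. The three classifiers are passed in as plain callables, which is an assumption made for illustration; in the study each of them would be a multinomial logistic regression trained on the selected features.

```python
def two_step_classify(x, side_clf, upper_clf, bottom_clf):
    """Step 1 decides upper versus bottom; step 2 picks the reaching position.

    side_clf returns "upper" or "bottom"; upper_clf returns 1, 2, or 3;
    bottom_clf returns 4, 5, or 6. All three callables are hypothetical.
    """
    side = side_clf(x)
    return upper_clf(x) if side == "upper" else bottom_clf(x)

def classification_rate(predictions, labels):
    """Percentage of trials whose final label matches the true label."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```

Because a wrong decision in step 1 sends the trial to the wrong group of reaching positions, the final label can match the true label only when both steps are correct, which is the counting rule described above.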
Feature extraction and feature selection were performed every 0.1 s from the start of the motion. Feature selection was performed for each subject and each classifier (upper side versus bottom side of the box; RP1, RP2, and RP3; RP4, RP5, and RP6), and the selected features were not unified among subjects. The multinomial logistic regression was then constructed using the features selected from the data up to a specified elapsed time step, and the change in classification rate over time was investigated for each dataset.
3. Measurement experiment
Three healthy male subjects, aged 23, with no known history of neurological abnormalities or musculoskeletal disorders, participated in the experiments. They were informed about the experimental procedures and asked to provide signed consent.
3.2 Experiment procedure
The subjects were required to stand comfortably in front of a table. Before starting a new trial, they were asked to rest the palm of their dominant hand naturally open. They were instructed to move their dominant hand towards one of the six target-reaching-positions on the side of a box, for the purpose of lifting it up (Figure 1), after pushing a button to denote a new trial.
The size of the box used in the experiment was 260 × 310 × 165 mm (length × width × height). The box was placed in one of four different poses and one of three different positions relative to the subject, as shown in Figures 2 and 3, respectively. The subjects were asked to reach five times for each combination of reaching position and box configuration, giving a total of 360 trials (6 reaching positions × 4 box poses × 3 box positions × 5 repetitions). The subjects were required to complete each reaching motion within 1.0 s, following the tempo of a metronome. They could rest for a few seconds between trials. Muscle activity and skin surface undulation during the motions were recorded with the sensors and devices described in the next subsection.
In the experiment, eight EMG sensors (Trigno, DELSYS) were used to measure the muscle activity. The sensor signals were recorded using a PowerLab 16/35 (ADInstruments) at a sampling frequency of 400 Hz. Generally, the sampling frequency used for recording muscle activity is 1 kHz or more, but because no frequency-domain features are used in the classification, as shown in the next subsection, a lower sampling frequency was used.
The eight EMG sensors were placed on the skin surface over eight muscles around the shoulder, selected according to the shoulder anatomy: Latissimus dorsi, Deltoid middle strand, Deltoid front strand, Deltoid rear strand, Triceps brachii, Middle part of trapezius, Descending part of trapezius, and Pectoralis major, as shown in Figure 4.
The action camera was attached near the shoulder, and images were acquired during the reaching motion measurements. In this study, however, no information was extracted from the images by image processing. Although the action camera and the image-processing algorithms have been determined, the focus here is the integration of information from different signal sources, so the relative pose and position of the box were used directly.
3.4 Feature extraction
The EMG signals were processed by a 1 Hz high-pass filter.
The features were based on the waveform length (WL) of the filtered signals. WL is a measure of the complexity of the EMG signal, defined as the cumulative length of the waveform over the time segment. The following features were calculated:
- The WL in each segment delimited by the time step (0.1 s), and the ratio of the WL of every pair of EMG channels in that segment
- The total sum of WL up to a specified elapsed time, and the ratio of the total WL of every pair of channels over that interval
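The WL features listed above can be computed along the following lines. This is a sketch under stated assumptions: the input is an already filtered array of shape (channels, samples), the ratio for a channel pair (i, j) is taken as WL of channel i divided by WL of channel j, the differences across segment boundaries are ignored, and a small `eps` guards against division by zero. None of these details are specified in the text, and the function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def waveform_length(segment):
    """Sum of absolute differences between successive samples, per channel."""
    return np.sum(np.abs(np.diff(segment, axis=1)), axis=1)

def pairwise_ratios(wl, eps=1e-12):
    """Ratio wl[i] / wl[j] for every channel pair i < j (assumed direction)."""
    return [wl[i] / (wl[j] + eps) for i, j in combinations(range(len(wl)), 2)]

def extract_wl_features(emg, fs=400, step_s=0.1, n_steps=10):
    """Build the EMG feature vector from emg of shape (n_channels, n_samples)."""
    step = int(round(fs * step_s))                       # 40 samples at 400 Hz
    per_step = [waveform_length(emg[:, t * step:(t + 1) * step])
                for t in range(n_steps)]
    total = np.sum(per_step, axis=0)                     # total WL per channel
    features = []
    for wl in per_step:                                  # WL in each 0.1 s segment
        features.extend(wl)
    for wl in per_step:                                  # pairwise ratios per segment
        features.extend(pairwise_ratios(wl))
    features.extend(total)                               # total sums of WL
    features.extend(pairwise_ratios(total))              # ratios of the total sums
    return np.array(features)
```

With 8 channels and 10 time steps this yields 80 + 280 + 8 + 28 = 396 EMG features; adding the box pose and box position gives the 398 features mentioned in Section 2.1.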
Regarding the relative pose and position information of the box, the angle of the reaching side (as shown in Figure 2), and the distance between the subject and the box (as shown in Figure 3) were used, respectively. To simulate the error possibly caused by image processing, and investigate the tolerance of the classification to configuration deviation, a simulated error was added to the configuration information.
In the analysis of the classification rate (Sections 4.1 and 4.2) and the evaluation of features (Section 4.3), random values between −0.5 and +0.5, generated using MATLAB, were added to the box pose and box position. Here, 0.5 means 50% of the angular interval between different poses (45° × 0.5 = 22.5°; see Figure 2) or of the distance interval between different relative positions (15 cm × 0.5 = 7.5 cm; see Figure 3). In the analysis of the effect of the simulated error (Section 4.4), four levels of simulated error were added: 0, ±0.5, ±0.75, and ±1.0. That is, the maximum position error is ±15 cm (±1.0) and the maximum pose error is ±45° (±1.0), so the largest error applied equals the interval between adjacent box poses or positions.
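The simulated error can be reproduced with a few lines of code. The sketch below assumes a uniform distribution over the stated range, which the text does not specify, and uses NumPy instead of MATLAB. The function and constant names are hypothetical.

```python
import numpy as np

POSE_INTERVAL_DEG = 45.0       # angular interval between box poses
POSITION_INTERVAL_CM = 15.0    # distance interval between box positions

def add_simulated_error(codes, level, rng):
    """Add a random value in [-level, +level] to each pose or position code."""
    codes = np.asarray(codes, dtype=float)
    return codes + rng.uniform(-level, level, size=codes.shape)

def physical_bounds(level):
    """Convert an error level to (maximum pose error in degrees,
    maximum position error in centimetres)."""
    return level * POSE_INTERVAL_DEG, level * POSITION_INTERVAL_CM
```

For instance, `physical_bounds(0.5)` gives 22.5° and 7.5 cm, and `physical_bounds(1.0)` gives 45° and 15 cm, matching the levels used in Section 4.4.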
The box pose and box position information were introduced as categorical variables. For the box pose, P1 and P2 in Figure 2 were set to 1, and P3 and P4 were set to 2. For the box position, L1, L2, and L3 in Figure 3 were set to 1, 2, and 3, respectively. All features were standardized using the z-score for the evaluation of the selected features. For a random variable X with mean μ and standard deviation σ, the z-score of a value x is

$$z = \frac{x - \mu}{\sigma}$$
4. Results and discussion
4.1 Comparison using classification rates
Figures 5–7 show the classification rate at each elapsed time step of the reaching motion for each subject. RPi (i = 1–6) in each figure represents a reaching position; the meaning of the digit i can be found in Figure 1. At the end of the reaching motion, the classification rate averaged over all subjects was 60.0% with EMG only and 75.1% with EMG plus box configuration information. The classification rate was thus clearly improved by integrating the box configuration information with the ASA muscle activities.
In Figures 5–7, the legend markers RP 123, RP 456, RP upper_and_bottom represent the result of classifying relative position 1, 2, 3, relative position 4, 5, 6, and relative position upper row and bottom row, respectively.
As seen from Figures 5(a), 6(a), and 7(a), when only EMG was used as the features, the classification rates of RP 123, RP 456, and RP upper and bottom at the elapsed time step of 0.5 s were 55.4, 59.6, and 84.3%, respectively, on average for all subjects. At the end of the reaching motion, they were 68.9, 62.2, and 91.5%, respectively.
In contrast, when the EMG and the box configuration (relative pose and position) information were used as the features, the classification rates of RP 123, RP 456, and RP upper and bottom at the elapsed time step of 0.5 s were 76.9, 74.4, and 84.5%, respectively, on average for all subjects. At the end of the reaching motion, they were 83.5, 82.2, and 90.9%, respectively.
These results show that introducing the state of the box brought no clear increase in the classification rate for the box top and bottom. On the other hand, the box configuration is effective for identifying the depth of the reaching motion (RP 123 and RP 456), where increases of about 15 to 21 percentage points were observed.
Although the classification rates are not as high as those in the studies for recognizing the motions of hands and fingers [1, 2, 4], considering the disadvantages brought by the boxes with different configurations, and limited EMG measurement sites, the results are acceptable. Moreover, the results are comparable to those in the research on complex motions [26, 27], in which the classification rate reported was around 70% too. So in the following analysis, 70% is used as the threshold for investigating the real-time characteristics.
Table 1 shows the timing at which the classification rate exceeded 70%, and the classification rate at 0.5 s, for each subject. As seen from the table, when only EMG was used as the features, the classification rate did not exceed 70% for any subject. When the EMG and box configuration were used as the features, subjects A, B, and C reached 70% at 0.4, 0.9, and 0.8 s, respectively. The classification rate at 0.5 s with only EMG was 51.7, 46.4, and 46.1% for subjects A, B, and C, respectively. In contrast, with the EMG and box configuration, subjects A, B, and C achieved 71.7, 53.6, and 65.8%, respectively. By introducing the box configuration information as features, the classification rates of subjects A, B, and C increased by 20.2, 7.2, and 19.7 percentage points, respectively. These results show that the box configuration information enables more accurate and faster classification.
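|Subject||Timing exceeding 70% [s], only EMG||Timing exceeding 70% [s], EMG and box configuration||Classification rate at 0.5 s [%], only EMG||Classification rate at 0.5 s [%], EMG and box configuration|
|A||Not reached||0.4||51.7||71.7|
|B||Not reached||0.9||46.4||53.6|
|C||Not reached||0.8||46.1||65.8|
Table 1. The timing at which the classification rate exceeded 70% and the classification rate at 0.5 s for each dataset.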
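The timing reported in Table 1 can be read off a per-time-step classification rate curve as sketched below. The function name is hypothetical, and the example curve in the usage note is made-up illustrative data, not a measured result.

```python
def first_time_exceeding(rates, threshold=70.0, step_s=0.1):
    """Return the first elapsed time [s] at which the rate exceeds the threshold.

    rates: classification rates [%] at 0.1 s, 0.2 s, ... from motion onset.
    Returns None when the threshold is never exceeded.
    """
    for i, rate in enumerate(rates, start=1):
        if rate > threshold:
            return round(i * step_s, 1)
    return None
```

For a hypothetical curve `[40.0, 50.0, 60.0, 71.0, 72.0]` the function returns 0.4, and for a curve that never passes 70% it returns `None`, which corresponds to the "Not reached" case for the EMG-only dataset.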
4.2 Comparison using classification probabilities
Figure 8 shows the probabilities obtained by the logistic regression at the end of the reaching motion for subject A. In the figure, (a) shows the case using only EMG as the features, and (b) shows the case using both EMG and box configuration (pose, position) information as the features. The reaching position with the highest resultant probability was taken as the classification result.
From Figure 8(1, 2), it can be seen that in the classification of box upper and bottom, high probabilities were achieved even when only EMG was used as the features. From Figure 8(3–8), when only the EMG was used as the features, the probabilities were low even if the classification results were correct (a), but when both EMG and box configuration were used, the probabilities showed a clear difference for classification, which means that ambiguity decreases by introducing the box configuration information as features.
4.3 Evaluation of selected features
Table 2 shows the features selected using AIC, the coefficient of each feature in the logistic regression, and its p value for the classification of the upper and bottom side reaching positions. Table 3 has a similar layout, showing the features selected using AIC, the coefficient of each feature in the logistic regression, and its p value for the classification of RP1/2/3 and RP4/5/6, respectively. In Tables 2 and 3, the selected features are listed in the order of selection.
|Selected feature||Coefficient||p value||Selected feature||Coefficient||p value|
|137, vdWL||1.23||0.001||127, vdWL||1.55||0.003|
|89, rdWL||−0.88||0.024||93, rdWL||−2.91||1.23E-7|
|99, rdWL||2.21||1.90E-8||53, rdWL||−0.84||0.004|
|152, rtWL||1.15||0.001||87, rdWL||1.12||0.001|
|117, vdWL||1.38||9.48E-6||180, vtWL||1.78||4.16E-4|
|84, rdWL||1.10||0.001||176, vtWL||−1.26||2.69E-4|
|32, rdWL||−1.55||5.64E-7||21, rdWL||0.89||0.006|
|65, rdWL||−1.58||6.11E-5||61, rdWL||0.70||0.002|
|85, rdWL||−2.11||2.70E-6||111, rdWL||0.99||0.18|
|Selected feature||Coefficient||p value||Coefficient||p value|
|π4 versus π6||π5 versus π6|
|(a) In classification of RP1/2/3|
|π4 versus π6||π5 versus π6|
|(b) In classification of RP4/5/6|
From Table 2, it is clear that, for the classification of RP upper and bottom side, the box configuration information (both the pose and position), was not selected by the AIC selection process. As can be seen from Table 3, for the classification of RP1/2/3 and RP4/5/6, the box pose and position were selected. Moreover, the p value of the box position is the smallest, which means the box position is the most contributing feature for the classification.
4.4 Influence of the simulated errors for box configuration information
If the box configuration information is calculated by processing the images from the action camera, errors occur due to noise, measurement error, and other system errors. Therefore, in this research, the tolerable range of the error was investigated by adding simulated error to the true box configuration.
Figure 9 shows the influence of the simulated error level of the box configuration information on the classification rate of each subject. As seen from the figure, the classification rate decreased as the error level increased for all subjects. For subject A, a classification rate exceeding 70% was obtained even at the highest error level, 1.0. For subject B, the classification rate fell below 70% at a simulated error level of 0.75 or more. For subject C, the classification rate fell below 70% at a simulated error level of 1.0. From these results, the error level should be kept to 0.5 or less (position: 7.5 cm or less; pose: 22.5° or less).
In this research, we employed multinomial logistic regression both to integrate information from two signal sources, images and around-shoulder EMG, and to identify the target-reaching-position for 12 box configurations (4 poses × 3 positions).
A high classification rate was achieved using both information sources. It was found that the box configuration information contributes to the classification of the depth of the reaching motion. Moreover, since the timing at which the classification rate exceeds 70% differs greatly among subjects, the optimal classification timing may depend on the individual. Furthermore, the classification rate decreased as the error level increased for all subjects.
In the experiment, we changed the box position only in the depth direction relative to the subject. Lateral changes of the box position relative to the subject should be investigated in the near future. Moreover, the box configuration information calculated from real images captured by the action camera should be studied and compared with the results of this study, since the error caused by image acquisition and processing, as well as the real computational cost, will affect the information integration. Finally, the system should be validated with data from amputee subjects.
This work was supported by JSPS Grant-in-Aid for Scientific Research (B) 17H02129.