Motor execution detection based on autonomic nervous system responses

Triggered assistance has been shown to be a successful robotic strategy for provoking motor plasticity, probably because it requires neurologic patients’ active participation to initiate a movement involving their impaired limb. Triggered assistance, however, requires sufficient residual motor control to activate the trigger and, thus, is not applicable to individuals with severe neurologic injuries. In these situations, brain and body–computer interfaces have emerged as promising solutions to control robotic devices. In this paper, we investigate the feasibility of a body–machine interface to detect motion execution by monitoring only the autonomic nervous system (ANS) response. Four physiological signals were measured (blood pressure, breathing rate, skin conductance response and heart rate) during an isometric pinching task and used to train a classifier based on hidden Markov models. We performed an experiment with six healthy subjects to test the effectiveness of the classifier in detecting rest and active pinching periods. The results showed that movement execution can be accurately classified based only on peripheral autonomic signals, with an accuracy of 84.5%, sensitivity of 83.8% and specificity of 85.2%. These results encourage further research on the use of the ANS response in body–machine interfaces.


Introduction
There is increasing interest in using robotic devices to assist individuals who have suffered neurologic injuries such as stroke and spinal cord injury (Marchal-Crespo and Reinkensmeyer 2009). Neurologic patients' active participation is thought to be essential for provoking motor plasticity (Lotze et al 2003, Perez et al 2004), and by assisting the movement that participants cannot achieve by themselves, active assist exercise provides novel somatosensory stimulation that can help induce brain plasticity (Rossini and Dal Forno 2004). Triggered assistance allows the participant to attempt a movement without any robotic assistance, only initiating the assistance when some performance variable (e.g. force generated by the participant, limb velocity, or muscle activity measured with surface EMG) reaches a threshold.
Triggered robotic assistance, however, requires sufficient residual motor ability or remaining muscle activity to activate the trigger, and hence is not applicable to individuals who have no functional motor ability left as a result of a severe neurologic injury. In these situations, brain-computer interfaces (BCI) have emerged as promising solutions (del R Millan et al 2010). Brain-computer interfaces could be used to control robotic devices to move the impaired limb when an intention to move is detected from cortical activity. Intention to move is defined as a supraspinal command that results in a physiological change, and eventually in a movement. Electroencephalography (EEG) and, more recently, functional near-infrared spectroscopy (fNIRS) are the most widely used non-invasive techniques employed in BCIs. However, connecting sensors to the patient's scalp is burdensome, and the relatively long training period required for the user to produce classifiable brain signals can be time consuming and frustrating. Additionally, the system performance can be severely affected by the interference caused by sensor location and, in the case of fNIRS, hair color and thickness. All these challenges can lead to user frustration and, ultimately, rehabilitation withdrawal (Coyle et al 2004, van Gerven et al 2009). More recently, studies have introduced the concept of body-machine interfaces (BMI), where physiological signals can be self-controlled and used to detect functional intent (see Blain et al (2008) for a review). Responses of the autonomic nervous system (ANS), such as cardiorespiratory and electrodermal responses, can be measured with economical off-the-shelf instrumentation and are relatively fast to set up. Physiological signals such as skin conductance response, heart rate, respiration rate and skin temperature have been shown to have the potential of serving as inputs for the development of BMIs (Blain et al 2008).
However, these previous studies in BMIs are mainly based on self-paced physiological signal changes, and thus the approach still requires the subject to perform a training phase to learn how to successfully control his/her physiological signals. Nevertheless, recent studies in psychophysiology showed that non-self-paced physiological signals can also provide a suitable method to estimate a person's emotion and frustration level (Kim and Andre 2008, Scheirer et al 2002), mental workload (Wilson and Russell 2003, Collet et al 2009) and activity engagement (Kushki et al 2012) without his/her active participation.
Physiological measurements have also been employed to increase the performance of BCIs. These so-called hybrid BCIs use brain recording technologies in conjunction with physiological signals (e.g. heart rate and blood pressure) to improve the classification performance (for a review, see Pfurtscheller et al (2010)). Most of the work on hybrid BCIs has made use of self-paced physiological signals, whereas only a few studies have employed non-self-paced physiological data. An example of a non-self-paced hybrid BCI that outperformed classic BCIs included the respiration rate, heart rate, skin temperature and skin conductance response in a BCI based on music imagery (Falk et al 2011). We recently conducted an experiment, the results of which showed that the addition of blood pressure, respiration rate, heart rate, and skin conductance response significantly improved the accuracy of detecting motor execution of an fNIRS-based BCI (Zimmermann et al 2012). Interestingly, while hybrid BCIs have been proposed as an alternative to classic BCIs to improve accuracy, physiological signals have never been employed as stand-alone signals to detect motion execution. This paper suggests a paradigm shift in the use of ANS responses in BCIs: the physiological signals are treated as the sole source of information.
This paper investigates the feasibility of a BMI to detect motor execution, monitoring only changes in peripheral autonomic signals, without direct measurement of force, EMG activity and brain activation. The motivation behind our approach is to provide an interface for severely affected neurological patients who cannot rely on their neural circuitry to trigger assistance or control a robotic device. We hypothesize that such a BMI can achieve a performance in detecting motor execution similar to that of BCIs directly based on signals from the central nervous system. This technology could not only improve robot-assisted rehabilitation, but also assist during activities of daily living: a mobile robot or a wearable exoskeleton in a home environment could provide support during any task based on the subject's motion intention.
We performed an experiment with six healthy subjects. Four physiological signals were acquired (mean blood pressure, breathing rate, skin conductance response and heart rate) during an isometric pinching task. The physiological signals were used to train and evaluate an individually optimized classifier, based on hidden Markov models (HMMs), to detect rest and active pinching periods. The rationale behind an individually optimized classifier, rather than one that generalizes to a wide range of users, is to study the feasibility of a classifier that could ultimately tune its parameters to different subjects, e.g. subjects with neurological injuries such as stroke or spinal cord injury (SCI).

Measurements of physiological responses
Based on previous research in the fields of BMI and psychophysiology (Blain et al 2008, Koenig et al 2011), four peripheral autonomic signals were recorded online: electrocardiogram (ECG), respiration, blood pressure, and skin conductance response (SCR). All physiological signals were acquired at 600 Hz using a biosignal amplifier (g.USBamp, g.tec, Austria, figure 1).
2.1.1. ECG. ECG was measured using the g®.GAMMAsys active electrode system from g.tec. The electrodes (g®.GAMMAclip, g.tec, Austria) were placed using sticky patches, with the ground on the left shoulder, reference over the left clavicle, channel 1 over the right ribs and channel 2 over the left ribs. The skin area where electrodes were placed was previously cleaned, although no further skin preparation was required (e.g. shaving).
The raw ECG signal was filtered with a fourth-order Butterworth bandpass filter with the frequency band 0.01-40 Hz. The heart rate (HR) was calculated online by detecting the R-wave peaks of the QRS complex using an adaptive threshold algorithm similar to the one described by Christov (2004). The HR was simultaneously calculated using a similar adaptive threshold on the raw blood pressure signal and compared to the HR calculated from the ECG in order to increase the robustness of the HR detection. Time and frequency domain measures of heart rate variability were discarded as possible features, since the minimum time interval required to measure cardiovascular variability is typically 5 min (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology 1996).
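The peak-based HR computation can be illustrated with a minimal sketch. This is not the Christov (2004) detector used in the study; it is a deliberately simplified stand-in in which the threshold follows a fraction of the most recent peak amplitude and a refractory period suppresses double detections. The function names, the fraction `frac`, the refractory period and the synthetic spike train are all assumptions made for illustration.

```python
def detect_peaks(signal, fs, frac=0.6, refractory_s=0.25):
    """Return indices of local maxima that exceed an adaptive threshold.

    Simplified illustration only: the threshold is a fixed fraction of the
    last accepted peak (initialised from the first 2 s of data).
    """
    threshold = frac * max(signal[: int(2 * fs)])
    peaks = []
    last_peak = None
    for n in range(1, len(signal) - 1):
        is_local_max = signal[n - 1] < signal[n] >= signal[n + 1]
        far_enough = last_peak is None or (n - last_peak) > refractory_s * fs
        if is_local_max and signal[n] > threshold and far_enough:
            peaks.append(n)
            last_peak = n
            threshold = frac * signal[n]  # adapt to the latest peak amplitude
    return peaks


def heart_rate_bpm(peaks, fs):
    """Instantaneous heart rate (beats per minute) from successive peak indices."""
    return [60.0 * fs / (b - a) for a, b in zip(peaks, peaks[1:])]


# Synthetic spike train: one unit spike per second, sampled at 100 Hz (assumed values).
fs = 100
ecg_like = [1.0 if n % fs == fs // 2 else 0.0 for n in range(10 * fs)]
peaks = detect_peaks(ecg_like, fs)
hr = heart_rate_bpm(peaks, fs)
```

Running the same two functions on the blood pressure waveform and comparing the resulting HR series beat by beat would be one way to implement the cross-check described above.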

Figure 1. Measurement setup (panel labels: skin conductance response, respiration rate, ECG). Four physiological signals were acquired: blood pressure, respiration rate, skin conductance response and electrocardiogram.

Respiration rate.
The respiration signal was acquired using a thermistor respiration flow sensor (SleepSense®, Scientific Laboratory Products, USA) placed at the entrance of the nostrils. The sensor was fixed on the skin using hypoallergenic adhesive tape. The raw respiration signal, measured as the difference of temperature between inhaled and exhaled air, was filtered with an eighth-order Butterworth bandpass filter with the frequency band 0.1-2.1 Hz. The breathing rate (BR) was calculated using an adaptive threshold algorithm, similar to the one employed with ECG. Time and frequency domain measures of breathing variability were not considered due to the short recording periods. Other respiration-related measurements, such as breathing amplitude, were excluded after preliminary testing, since no changes between rest and active periods were observed.
2.1.3. Blood pressure. The raw blood pressure was measured with a continuous non-invasive arterial pressure system (CNAP™ Monitor 500, CNSystems, Austria). An inflatable cuff was placed around the left upper arm, and two size-adjustable finger cuffs were attached to the proximal phalanges of the left index and middle fingers. Subjects were requested to position the left arm on the chest over the heart. The arm cuff was employed only for a couple of minutes during the system initialization for scaling purposes. During the experiments, only the finger cuffs were used.
The raw blood pressure signal was detrended by subtracting a best-fit line (in the least-squares sense) from the raw signal in order to remove any possible signal drifts during sessions. It was further low-pass filtered with a first-order Butterworth filter with a cutoff frequency of 0.1 Hz, leaving only the low and very low frequency spectra of the signal. The low and very low spectra (called mean blood pressure, BP, in subsequent sections) were selected after comparing which cardiovascular features showed the most significant changes between rest and activation periods. Thus, diastolic, systolic and raw blood pressure signals, although initially considered, were excluded after preliminary testing.
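The two processing steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the low-pass filter is a causal first-order Butterworth section obtained with the bilinear transform and frequency pre-warping, whereas the source does not say whether filtering was causal or zero-phase. The function names and the synthetic test signals are hypothetical.

```python
import math


def detrend_linear(x):
    """Subtract the least-squares best-fit line from the sequence x."""
    n = len(x)
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = sxy / sxx
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def lowpass_first_order(x, cutoff_hz, fs):
    """Causal first-order Butterworth low-pass (bilinear transform, pre-warped)."""
    k = math.tan(math.pi * cutoff_hz / fs)
    b = k / (1.0 + k)            # feed-forward coefficient (applied to x[n] and x[n-1])
    a = (1.0 - k) / (1.0 + k)    # feedback coefficient (applied to y[n-1])
    y, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = b * (v + prev_x) + a * prev_y
        prev_x = v
        y.append(prev_y)
    return y


# A pure linear drift (assumed example) is removed completely by the detrending step.
drift = [100.0 + 0.01 * n for n in range(3000)]
detrended = detrend_linear(drift)

# The filter has unit gain at DC: a constant input settles to the same constant.
step = lowpass_first_order([1.0] * 20000, cutoff_hz=0.1, fs=600.0)
```

With a 0.1 Hz cutoff the filter's time constant is roughly 1.6 s, which is why only the slow component of the pressure waveform survives.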
2.1.4. Skin conductance response. Skin conductance was measured by attaching two electrodes (g®.GSRsensor, g.tec, Austria) with Velcro® rings to the distal phalanges of the left index and middle fingers. Skin conductance is characterized by a slowly changing background level (tonic), and a rapid time-varying (phasic) response (Malmivuo and Plonsey 1995). The tonic level is related to the general activity of the perspiratory glands influenced by external temperature. The phasic response is called SCR and is usually related to the automatic response to stimuli. The raw skin conductance signal was filtered with an eighth-order Butterworth low-pass filter with a cutoff frequency of 30 Hz. The SCR signal was linearly detrended over each rest-activity period to remove the tonic level and was further normalized.

Experimental protocol
All experiments were approved by the institutional ethics committee of the ETH Zurich (application number EK 2010-N-49), and participants provided informed consent. Six healthy male subjects aged 20-30 years were recruited from among ETH Zurich students and staff. Inclusion criteria were no history of neurological disorders or orthopedic problems affecting the right upper extremity.
The measurements were conducted in a silent, dark room. Subjects were requested to lie supine on a comfortable padded table. The task consisted of isometrically tracking a provided pinching force with the right index finger and thumb. Isometric pinching was chosen partly for convenience (i.e. it is a simple task that minimizes subject movement), but also because it allows for a systematic assessment of the subject's performance. The force applied by the subject during pinching was measured with a one-axis thick-film force sensor (CentoNewton 100N, LPM-EPFL, Switzerland) attached to the distal phalanges with Velcro® rings (figure 2(a)). Subjects were instructed to remain as motionless as possible during the experiment. The protocol was implemented in Simulink®. The force sensor was connected to the computer via a USB data acquisition card (NI USB-6008, National Instruments Inc., USA).
The experimental protocol was described in detail by Zimmermann et al (2011). Here, only a brief summary is given for completeness. fNIRS was used to simultaneously record brain activity in motor areas; however, these signals are not used in the present analysis and are beyond the scope of this paper. Three visual commands were presented to subjects using video goggles (z920HR-VGA, Zetronix Corp., USA): (1) rest: the word 'rest' was displayed on the screen (figure 2(b)), and subjects were instructed to remain as relaxed as possible; (2) preparation: the message 'get ready' was displayed and subjects were instructed to be aware that they would be asked to move in a few seconds. Subjects were not instructed to imagine the movement or to try to move as quickly as possible; (3) activity: the word 'squeeze' was displayed (figure 2(c)). Subjects were requested to try to match their applied pinching force (visually represented by a dynamic horizontal white bar, figure 2(c)) with a reference force (rendered with a horizontal green bar under the reference bar). In order to prevent subjects from learning the reference force, which would reduce their concentration level, a complex reference force profile between 1 and 4 N was generated from a truncated Fourier series with frequencies 0.5, 1.0 and 1.1 Hz. The force level and duration of the activity periods were small enough to avoid fatigue.
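A reference profile of this kind can be sketched as a sum of three sinusoids rescaled into the 1-4 N range. Only the three frequencies and the force range come from the protocol description; the amplitudes, phases and the rescaling rule below are assumptions chosen for illustration, and the actual coefficients used in the experiment are not given here.

```python
import math

FREQS_HZ = (0.5, 1.0, 1.1)     # frequencies stated in the protocol
AMPLITUDES = (1.0, 0.6, 0.4)   # assumed for illustration
PHASES = (0.0, 1.3, 2.1)       # assumed for illustration (radians)
F_MIN, F_MAX = 1.0, 4.0        # force range in newtons, from the protocol


def reference_force(t):
    """Reference pinching force (N) at time t (s) from a truncated Fourier series."""
    raw = sum(
        a * math.sin(2.0 * math.pi * f * t + p)
        for f, a, p in zip(FREQS_HZ, AMPLITUDES, PHASES)
    )
    bound = sum(AMPLITUDES)  # |raw| can never exceed the sum of amplitudes
    # Map [-bound, +bound] linearly onto [F_MIN, F_MAX].
    return F_MIN + (F_MAX - F_MIN) * (raw + bound) / (2.0 * bound)


# One 20 s activity period sampled at 100 Hz (sampling rate assumed).
profile = [reference_force(n / 100.0) for n in range(2000)]
```

Because 1.0 Hz and 1.1 Hz are close together, their sum beats slowly (once every 10 s), which makes the profile hard to memorize within a 20 s activity period.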
The protocol consisted of a random presentation of four different sequences of visual commands.
• S1 and S2: a rest command was followed by a preparation command (of 10 or 5 s), and then by an activity command that lasted for 20 s.
• S3: a rest command was followed by a preparation command of 10 s, and followed again by a rest command.
• S4: a rest command was followed by an activity command that lasted for 20 s.
The duration of the rest commands was randomized (from 15 to 24 s) in order to reduce learning effects that may decrease attention and to prevent the autonomic system from synchronizing with the activity periods. The experimental protocol consisted of a total of 10 trials per sequence, presented in random order. The experiment was divided into two sessions of 20 trials each. Each session began with a baseline of 180 s and finished with a baseline of 120 s. The total time required to finish a session was approximately 20 min. Participants paused for 10 min between sessions.
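As a rough plausibility check of the session length, the schedule can be simulated. Several details below are assumptions not stated in the text: that S1 uses the 10 s preparation and S2 the 5 s one, that each session contains five trials of each sequence, that rest durations are whole seconds, and that the closing rest of S3 is counted as the rest command of the next trial.

```python
import random

PREP_S = {"S1": 10, "S2": 5, "S3": 10, "S4": 0}        # S1/S2 assignment assumed
ACTIVITY_S = {"S1": 20, "S2": 20, "S3": 0, "S4": 20}


def make_session(rng, trials_per_sequence=5):
    """Return a shuffled list of (sequence, rest_s, prep_s, activity_s) tuples."""
    order = [seq for seq in PREP_S for _ in range(trials_per_sequence)]
    rng.shuffle(order)
    return [(seq, rng.randint(15, 24), PREP_S[seq], ACTIVITY_S[seq]) for seq in order]


def session_duration_s(trials, baseline_start_s=180, baseline_end_s=120):
    """Total session time in seconds, including both baselines."""
    return baseline_start_s + sum(r + p + a for _, r, p, a in trials) + baseline_end_s


rng = random.Random(0)
session = make_session(rng)
duration_min = session_duration_s(session) / 60.0
```

Under these assumptions a session lasts between 1025 s and 1205 s (about 17 to 20 min), which is consistent with the stated duration of approximately 20 min.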

Classifier
In a preliminary study (Zimmermann et al 2011), we found that none of the physiological signals showed a significant change between the rest and the preparation periods. Thus, only periods of activity and the rest periods that preceded them were considered in the classifier, independently of the sequences they were part of (i.e. 10 trials for each of the sequences S1, S2 and S4, and thus a total of 30 rest periods followed by 30 activity periods).
2.3.1. Data pre-processing. The four physiological signals were further decimated to 5 Hz in order to reduce the computational time required to train and evaluate the classifier. The training and testing data sets were generated as vectors of physiological values at each sample time. A 15 s window was selected, corresponding to the shortest rest command possible. Optimizing the window length for each subject would increase considerably the training time of the classifier, and thus, the same conservative window length was fixed for all subjects. Responses of the ANS are rather slow (figure 3), and thus choosing the last 15 s of the rest periods, and the first 15 s of the active periods did not seem reasonable. Different physiological signals have different latency responses (i.e. the SCR generally shows a faster response than other systemic changes, figure 3), and thus, different post-stimulus times could ideally be used for each signal to detect the active periods. However, in order to reduce the computational time that optimizing the latency time for each subject and for each physiological signal would require, we fixed the latency time to 5 s, and thus, the 15 s windows were shifted ahead by 5 s.
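The window arithmetic can be made concrete with a short sketch. It follows one reading of the description above: the rest window is the last 15 s before activity onset moved forward by 5 s, and the active window is the 15 s that follow it. The block-averaging decimator is a simple stand-in chosen for illustration (the decimation method actually used is not specified), and all function names are hypothetical.

```python
FS_RAW = 600     # Hz, acquisition rate
FS_DEC = 5       # Hz, rate after decimation
WINDOW_S = 15    # s, window length
LATENCY_S = 5    # s, shift applied to both windows


def decimate_by_averaging(x, factor):
    """Reduce the sample rate by averaging non-overlapping blocks of `factor` samples."""
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x) - factor + 1, factor)]


def trial_windows(onset_s):
    """Return (rest, active) index ranges at FS_DEC for an activity onset in seconds."""
    onset = int(round(onset_s * FS_DEC))
    shift = LATENCY_S * FS_DEC
    length = WINDOW_S * FS_DEC
    rest = range(onset - length + shift, onset + shift)
    active = range(onset + shift, onset + shift + length)
    return rest, active


factor = FS_RAW // FS_DEC                      # 120 raw samples per decimated sample
decimated = decimate_by_averaging(list(range(1200)), factor)
rest, active = trial_windows(onset_s=20.0)     # onset time assumed for the example
```

Each window therefore holds 75 observation vectors per trial, and the two windows of a trial are contiguous.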
A fourfold cross-validation was used to randomly distribute all pairs of associated rest-pinching trials into training and test data sets (i.e. 25 training trials and 5 testing trials).

Hidden Markov models.
HMMs are well known in temporal pattern recognition applications such as speech and gesture recognition. The main argument for HMMs over other classification techniques (e.g. support vector machines, linear discriminant analysis) is their ability to classify time-sequential data, such as the time-varying physiological signals presented. Here, only a brief introduction to HMMs is given. The reader is referred to Rabiner (1989) for a detailed tutorial.
A HMM is a finite-state machine containing $N$ unobservable (hidden) states ($S = \{S_1, S_2, \ldots, S_N\}$). The probability of transition to other states only depends on the current state and is defined by a transition probability matrix $A = [a_{ij}]$, $i, j = 1, \ldots, N$, where $a_{ij}$ is the probability of moving from state $S_i$ to state $S_j$. At every time sample, a HMM emits an observation vector of $F$ features, $O_t = \{O_1, O_2, \ldots, O_F\}$, that depends only on the current state. Each state has an associated observation probability distribution $B$ which determines the probability of generating an observation at a certain time step. The probability of starting in a specific state is modeled by the initial state distribution $\pi$.
A HMM is completely characterized by defining the number of states N, the initial and transition state probabilities (π and A), and the observation probability distributions B at each state (denoted in short as λ(π, A, B)). In this paper, a left-right Markov model topology was chosen that allowed transitions only from each state to itself and to the state to its right (figure 4). The observation probability distributions were chosen to be mixtures of M Gaussians with full covariance matrices in order to account for possible observation correlations. To reduce the chance of overfitting the classifier, only HMMs with a maximum of five states and two mixtures were considered. The number of observation features was set to four: the signals described in section 2.1 (figure 3).
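For concreteness, the structure implied by this topology can be written out for a four-state model. The block below is a sketch in standard HMM notation rather than a formula taken from the paper: the mixture weights $c_{jm}$, means $\mu_{jm}$ and covariance matrices $\Sigma_{jm}$ are symbols introduced here for illustration, and the last state is assumed to be absorbing.

```latex
% Left-right transition matrix for N = 4: each state can only persist
% or move one state to the right, so each row has at most two non-zero entries.
A =
\begin{pmatrix}
a_{11} & a_{12} & 0      & 0      \\
0      & a_{22} & a_{23} & 0      \\
0      & 0      & a_{33} & a_{34} \\
0      & 0      & 0      & 1
\end{pmatrix},
\qquad a_{ii} + a_{i,i+1} = 1 \quad (i = 1, 2, 3).

% Observation density of state S_j as a mixture of M full-covariance Gaussians
% over the F = 4 features.
b_j(O_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}\!\left(O_t;\ \mu_{jm},\ \Sigma_{jm}\right),
\qquad \sum_{m=1}^{M} c_{jm} = 1 .
```

The zero entries remain zero during Baum-Welch re-estimation, so the topology chosen at initialization is preserved throughout training.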
The initial transition matrix $A_0$ and initial state probability $\pi_0$ were estimated by uniformly distributed random numbers. The observation probability distribution $B_0$ was initialized using k-means clustering on training observations. In order to find the optimal model parameters $\lambda(\pi, A, B)$ given a fixed number of states and mixtures, the initial probability parameters were adjusted using the Baum-Welch algorithm (Rabiner 1989) on the training data. The freely distributed HMM toolbox for Matlab by Murphy (1998) was used.

Figure 4. Illustration of two four-state left-right HMMs for rest and active conditions. The four observations employed in this study were: heart rate (HR), respiratory frequency (RF), mean blood pressure (BP) and skin conductance response (SCR). The observation probability distributions at each state are represented as mixtures of two Gaussians.
Given a specific number of states and mixtures and a sequence of test observations, the likelihood that the observed sequence was produced by any of the two HMM models (one for rest, and a second one for active) was computed, using the forward-backward algorithm (Rabiner 1989). Subsequently, each of the testing trials was classified into one of the two models, by selecting the model with the highest likelihood. Based on the different number of hidden states N = {1, 2, 3, 4, 5} and different number of mixtures M = {1, 2}, a total of ten models were trained for each of the rest and active classes.
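The decision rule described above (score the test sequence under both models and pick the larger likelihood) can be sketched in plain Python. To keep the example short it departs from the study in clearly simplified ways: observations are one-dimensional, each state emits from a single Gaussian rather than a full-covariance mixture, only the forward pass is computed, and the model parameters are hand-picked rather than trained with Baum-Welch. All names and numbers are hypothetical.

```python
import math


def left_right_transitions(n_states, stay=0.8):
    """Transition matrix allowing only self-transitions and moves one state right."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0                 # last state is absorbing (assumption)
        else:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
    return A


def safe_log(p):
    return math.log(p) if p > 0.0 else -math.inf


def log_sum_exp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood(obs, pi, A, means, variances):
    """Forward algorithm in the log domain: log P(obs | model)."""
    n = len(pi)
    alpha = [safe_log(pi[i]) + log_gauss(obs[0], means[i], variances[i]) for i in range(n)]
    for x in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood, plus all scores."""
    scores = {name: log_likelihood(obs, **params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Hand-picked toy models: 'rest' stays flat, 'active' ramps up across its states.
models = {
    "rest": dict(pi=[1.0, 0.0, 0.0], A=left_right_transitions(3),
                 means=[0.0, 0.0, 0.0], variances=[0.05] * 3),
    "active": dict(pi=[1.0, 0.0, 0.0], A=left_right_transitions(3),
                   means=[0.0, 0.5, 1.0], variances=[0.05] * 3),
}
rising_label, _ = classify([0.0, 0.1, 0.4, 0.6, 0.9, 1.0, 1.1], models)
flat_label, _ = classify([0.0, 0.05, -0.1, 0.0, 0.1, -0.05, 0.0], models)
```

Working in the log domain avoids numerical underflow, which would otherwise occur quickly for the 75-sample windows used in this study.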

Evaluation
The metrics used to quantify the classifier performance were accuracy, sensitivity and specificity:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP},$$

where TP is the number of true positives (correctly detected active periods), TN is the number of true negatives (correctly detected rest periods), FP is the number of false positives (rest periods classified erroneously as active) and FN is the number of false negatives (active periods classified as rest). The performance metrics were calculated for each k-fold partition and averaged over one complete cross-validation run. A well-known problem with HMM training is that it is not guaranteed to converge to a global maximum. Changing the initial model parameters $\lambda(\pi_0, A_0, B_0)$ results in a different optimized trained model. In order to reduce the performance variability due to random initial model parameters, we ran the evaluation procedure a total of 7 times. The mean and the standard deviation (SD) of the performance metrics across these runs were then computed.

The chance level in a two-class BCI is not exactly 50%, but 50% with a confidence interval at a certain level (95%) that depends on the number of training trials (Muller-Putz et al 2008). The calculation of the confidence interval was performed using a binomial distribution of p = 0.5, considering 30 trials per class (two sessions), and 15 trials per class (one session). This yielded upper confidence limits of 64.1% when considering the two sessions and 66.5% for one session. Thus, the obtained performance metrics were considered above the chance level when their means were significantly higher than the corresponding upper confidence limits. The significance level was set to p = 0.05.
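The three metrics and their averaging over folds can be written as a short sketch. The confusion counts in the example are made up purely to exercise the functions and are not results from the study; the function names are hypothetical.

```python
def performance(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # fraction of active periods detected
        "specificity": tn / (tn + fp),   # fraction of rest periods detected
    }


def mean_over_folds(fold_counts):
    """Average each metric over the folds of one cross-validation run."""
    per_fold = [performance(*counts) for counts in fold_counts]
    return {key: sum(f[key] for f in per_fold) / len(per_fold) for key in per_fold[0]}


# Two hypothetical folds, each given as (TP, TN, FP, FN).
run_metrics = mean_over_folds([(4, 5, 0, 1), (5, 4, 1, 0)])
```

Repeating this for each of the seven runs and taking the mean and SD of the per-run values gives numbers of the kind reported as mean ± SD.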

Results
Due to technical problems, the physiological signals of subject 6 were recorded only during the first session. Figure 5 reports the per-participant sensitivity, specificity and accuracy values obtained using the four features with the combination of number of states N and number of mixtures M that yielded the maximum accuracy. The optimum number of states and mixtures per subject, and the mean and SD of the sensitivity, specificity and accuracy for each subject are reported in table 1. All subjects performed significantly above the a priori set 64.1% chance threshold (66.5% for subject 6).
The average classifier accuracy over the six participants was 84.5%. The optimization of the HMM parameters (N and M) for each user required a calibration session with known active and rest intervals from the fourfold cross-validation training data set. The calibration phase, which iterates over every combination of number of states and mixtures, takes a relatively long time. In practical BCI situations, however, it would be desirable to reduce the time required for calibration, while still obtaining good accuracies. In order to reduce the calibration computation time during normal BCI investigations, it seems reasonable to use a fixed HMM structure. The effect that fixing the number of states and mixtures had on the overall accuracy was studied. It was found that the number of states and mixtures that maximizes the overall accuracy (N = 3 and M = 2) resulted in only a slight reduction of 0.6 percentage points in the average classifier accuracy (83.9%). The mean and SD of the sensitivity, specificity and accuracy for each subject using a fixed HMM structure are reported in table 1. The accuracies of all subjects remained above the chance level.

Table 1. Classification sensitivity, specificity and accuracy (mean ± SD, in %) across seven complete cross-validation runs for each subject, for the best combination of number of states and number of mixtures, and for a fixed HMM structure (N = 3, M = 2). The optimum number of states N and mixtures M for each subject in the personalized HMM are also reported.

          Personalized HMM                                 Fixed HMM (N = 3, M = 2)
Subject   Sensitivity  Specificity  Accuracy     N   M     Sensitivity  Specificity  Accuracy
s1        75.5 ± 7.2   78.8 ± 9.9   77.1 ± 5.7   3   2     75.5 ± 7.2   78.8 ± 9.9   77.1 ± 5.7
s2        75.6 ± 4.0   81.2 ± 8.5   78.4 ± 5.6   5   2     76.7 ± 5.5   79.5 ± 7.5   78.1 ± 5.0
s3        99.0 ± 3.7   92.2 ± 2.5   95.6 ± 1.6   2   2     98.0 ± 2.8   92.9 ± 3.0   95.4 ± 2.4
s4        78.4 ± 1.7   79.7 ± 7.8   79.1 ± 3.5   2   1     77.6 ± 4.7   78.3 ± 5.7   77.9 ± 2.8
s5        79.0 ± 4.3   87.5 ± 4.7   83.3 ± 1.7   3   2     79.0 ± 4.3   87.5 ± 4.7   83.3 ± 1.7
s6        95.2 ± 3.3   91.7 ± 2.9   93.5 ± 2.7   4   2     92.6 ± 2.4   90.5 ± 5.6   91.5 ± 3.0
Average   83.8 ± 3.7   85.2 ± 6.1   84.5 ± 3.5   -   -     83.2 ± 4.5   84.6 ± 6.1   83.9 ± 3.4

Some subjects showed higher accuracy levels than others (i.e. subjects 3, 5 and 6; Mann-Whitney test, p = 0.05), probably due to the intersubject differences in the ANS responses. In order to investigate the changes in the different physiological signals, the mean of the last 5 s of the rest periods was compared to the mean value from 5 s during the activity periods. Because different physiological signals have different latency responses (see, e.g., figure 3), different times after the onset of the activity period were used for each signal (3 s post-stimulus for the SCR and 5 s for all the others) (Zimmermann et al 2011).
Paired t-tests were used to evaluate the presence of a significant change in each physiological signal for each subject. The significance level was set to 5%. The resulting p-values are listed in table 2.
A significant correlation between the accuracy and the number of significant features was found: subjects with a larger number of significant features (e.g. s3 with four significant features, table 2) achieved higher accuracy (Pearson's correlation, $R^2 = 0.78$, p = 0.02). In order to investigate the effect that non-significant features had on the classifier accuracy, we performed feature reduction (i.e. selection of a subset of features for the classification). There exist several methods for feature reduction. A common technique is analysis of variance (ANOVA), a statistical method used to rank individual features according to how significantly they differ between two classes. Then, only the n most significant features are used in the classifier (Wagner et al 2005). ANOVA was chosen to reduce the feature dimensionality because the final extracted features are a subset of the original features, while other popular methods (e.g. principal component analysis) create new transformed features. Furthermore, ANOVA is computationally less expensive than recursive feature reduction algorithms (e.g. sequential forward selection).
Based on the values reported in table 2, the features were ranked by p-value, from most to least significant (order shown in brackets). For each subject, the feature with the lowest remaining p-value was iteratively added to the feature list, and the classifier performance was recalculated. The classification accuracies (mean ± SD) across the seven complete cross-validation runs for each subject, for the best combination of number of states and number of mixtures, and for different numbers of features are reported in figure 6. Even with only one feature, all subjects performed significantly better than chance. Some subjects showed a slight decrease in performance as fewer features were used, while some showed the opposite tendency. Although the overall performance across subjects decreased when a limited number of features was used, the difference between using only one feature (80.9%) and using all four features (84.5%) was not significant (paired t-test, p > 0.05).
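The ranking and the incremental feature sets can be sketched as follows. The sketch ranks features by the absolute paired t statistic instead of the p-value; when every feature has the same number of trials the two orderings coincide, because the p-value of a two-sided paired t-test decreases monotonically with |t| at fixed degrees of freedom. The per-trial values below are invented for illustration and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(rest, active):
    """Paired t statistic for per-trial rest vs. active means of one feature."""
    diffs = [a - r for r, a in zip(rest, active)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def rank_features(rest_means, active_means):
    """Feature names ordered from most to least significant (largest |t| first)."""
    t_abs = {name: abs(paired_t(rest_means[name], active_means[name]))
             for name in rest_means}
    return sorted(t_abs, key=t_abs.get, reverse=True)


def nested_subsets(ranking):
    """Feature sets of growing size: best feature, best two, and so on."""
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


# Invented per-trial means for one subject (five trials per feature).
rest_means = {
    "HR": [60, 62, 61, 59, 60], "BR": [14, 15, 14, 16, 15],
    "BP": [80, 82, 81, 79, 80], "SCR": [0.10, 0.20, 0.10, 0.15, 0.10],
}
active_means = {
    "HR": [63, 64, 65, 61, 62], "BR": [14, 16, 13, 16, 15],
    "BP": [81, 82, 82, 80, 80], "SCR": [0.60, 0.80, 0.50, 0.70, 0.65],
}
ranking = rank_features(rest_means, active_means)
subsets = nested_subsets(ranking)
```

Each subset would then be passed to the classifier training and evaluation procedure in turn to obtain accuracy as a function of the number of features.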

Discussion
The goal of this study was to investigate the feasibility of detecting motor execution (specifically, isometric pinching in index finger/thumb opposition) with a BMI based only on measurements of physiological signals from the ANS. Four physiological features were measured (mean blood pressure, breathing rate, skin conductance response and heart rate) during an isometric pinching task. The acquired physiological signals were used to train a classifier based on a dual HMM. We hypothesized that activity in cortical areas can be detected by monitoring changes of the ANS, instead of measuring directly at the supraspinal level. We performed an experiment with six healthy subjects, the results of which showed that motor execution can be accurately classified based only on peripheral physiological signals.
We hypothesized that such a BMI based on ANS could achieve a performance in detecting motor execution similar to that of a BCI directly based on signals from the central nervous system. This study showed that motor execution can be accurately classified based only on peripheral physiological signals with an accuracy of 84.5%. These results are in line with recent BCI studies that employed EEG to detect movement intention (Boye et al 2008) and motor imagery (Tsui et al 2009). Few non-invasive BCIs, mainly based on fMRI techniques, have shown higher accuracy levels (Lee et al (2010) achieved an accuracy above 90%). However, the infrastructural needs, electromagnetic compatibility limitations and high associated costs make fMRI-based BCIs inappropriate for standard robotic rehabilitation. On the other hand, our results slightly outperformed fNIRS-based BCIs employed to classify mental tasks such as music imagery and mental arithmetic (Falk et al 2011).
In this study, fNIRS was also employed to simultaneously record brain activity in motor areas (contralateral primary motor cortex and ventral premotor cortex). The brain hemodynamics recorded with fNIRS were employed to train a similar dual HMM classifier (Zimmermann et al 2012). The results showed that the classifier based only on the signals measured from the central nervous system with fNIRS achieved an average accuracy of 79.4%, i.e. a slightly lower performance than the classifier based on the ANS response presented here. On the other hand, when the four physiological features described in this paper (mean blood pressure, breathing rate, skin conductance response and heart rate) were added as auxiliary observations into the HMM, the classification accuracy increased significantly to 88.5%. This is in line with recent studies on hybrid BCIs that used brain imaging methods in conjunction with self-paced physiological signals to improve the classification performance (for a review see Pfurtscheller et al (2010)). While physiological measures have been successfully employed to improve the accuracy in hybrid BCIs (Falk et al 2011, Zimmermann et al 2012), the ANS responses have never been employed as the sole information source to detect motion execution. This paper aims at filling this gap and investigates the feasibility of a BMI to detect motor execution, monitoring only changes in peripheral autonomic signals, reaching accuracy levels similar to those of hybrid BCIs. Although recent studies have already introduced the concept of BMIs, where physiological signals can be self-controlled and used to detect functional intent (Blain et al 2008), these previous studies are fundamentally based on self-paced physiological signal changes, and thus subjects must be active agents in the changes of their ANS. In our approach, subjects were not requested to change their normal physiological signal responses based on the protocol stimuli.
Most of the studies that worked with similar non-self-paced biosignal decoders are found in the field of psychophysiology, where the goal is to estimate subjects' emotions (Kim and Andre 2008), mental workload (Wilson and Russell 2003, Collet et al 2009) and activity engagement (Kushki et al 2012), instead of functional intention.
A recently completed study investigated the use of non-self-paced peripheral autonomic signals to detect music imagery (Falk et al 2010). Although the goal was to decode music imagery instead of motor execution, there are some relevant similarities between these two studies. First, both studies use only physiological autonomic signals (although they used skin temperature, while here mean blood pressure was used). Second, both studies optimized the number of states and mixtures of a dual HMM classifier and achieved similar accuracy levels (93% in Falk et al (2010), and 84.5% here). The lower accuracy level achieved in our work may be due to the fixed observation window length: they optimized the window lengths per subject, while we fixed them for all subjects in order to reduce the time required to train the classifier.
HMMs are well known in temporal pattern recognition applications. However, despite their higher ability to classify time-sequential data, compared to discriminative approaches (Sitaram et al 2007, Obermaier et al 2001), they have barely been used in physiology classification (Kulic and Croft 2007, Falk et al 2010). In this paper, we showed that HMMs are a valuable tool to classify motor execution based on time-varying physiological signals. There are, however, some issues with HMMs that must be considered. The per-subject optimization of the number of states and mixtures requires a large computation time. Here, we studied the effect that fixing the number of states and mixtures for all subjects had on the overall classifier accuracy and found that the optimal fixed model resulted in just a slight reduction of the average classifier accuracy (83.9%). Thus, in order to reduce calibration computation time during normal BCI investigations, it seems reasonable to try to find a reliable fixed HMM structure in future experiments. Some subjects performed significantly better than others. We also noted that the subset of physiological signals with significant changes was different for each participant. We found a significant correlation between the classifier accuracy and the number of significant features in each subject. We performed feature reduction using statistical tools to test how the reduction of observations affected the classifier performance. We did not find an increase in the accuracy in subjects with a reduced number of significant features. Thus, it was not the inclusion of nonsignificant features that decreased the classifier performance. Interestingly, we also did not find a clear decrease in accuracy when we reduced the number of significant physiological signals used in the classifier. The effect of removing features was dependent on each subject's specific ANS responses.
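The decision rule of a dual HMM classifier can be illustrated with a minimal sketch: one model is fitted to rest windows, another to active windows, and a new observation window is assigned to the model under which it is more likely. The code below is not the implementation used in this study. It assumes a single diagonal-covariance Gaussian per state (rather than the Gaussian mixtures optimized here), strictly positive start and transition probabilities, and hand-picked toy parameters (`REST_HMM`, `ACTIVE_HMM`) in place of trained models; all function and variable names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at observation vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_likelihood(obs, hmm):
    """Forward algorithm in the log domain: returns log P(obs | hmm).

    Assumes all start and transition probabilities are strictly positive.
    """
    n = len(hmm["start"])
    alpha = [math.log(hmm["start"][s])
             + log_gauss(obs[0], hmm["means"][s], hmm["vars"][s])
             for s in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[r] + math.log(hmm["trans"][r][s])
                            for r in range(n)])
                 + log_gauss(x, hmm["means"][s], hmm["vars"][s])
                 for s in range(n)]
    return logsumexp(alpha)


def classify(obs, rest_hmm, active_hmm):
    """Label a window with the model that explains it better."""
    if log_likelihood(obs, active_hmm) > log_likelihood(obs, rest_hmm):
        return "active"
    return "rest"


# Toy two-state models over one hypothetical normalized feature (illustrative only).
REST_HMM = {"start": [0.5, 0.5], "trans": [[0.9, 0.1], [0.1, 0.9]],
            "means": [[0.0], [0.2]], "vars": [[1.0], [1.0]]}
ACTIVE_HMM = {"start": [0.5, 0.5], "trans": [[0.9, 0.1], [0.1, 0.9]],
              "means": [[2.0], [3.0]], "vars": [[1.0], [1.0]]}

print(classify([[2.1], [2.8], [3.2]], REST_HMM, ACTIVE_HMM))
print(classify([[0.1], [-0.2], [0.3]], REST_HMM, ACTIVE_HMM))
```

In such a scheme, optimizing the structure per subject means refitting and re-evaluating both models for every candidate number of states and mixtures, which is why a fixed structure shortens calibration.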
This finding contradicts recent studies that found a clear monotonic increase in classification performance as more physiological signals were added to the decoder (Kushki et al 2012). A possible explanation is that different autonomic systems may react in different ways while performing a movement, compared to a more homogeneous response to music imagery and activity engagement (Kushki et al 2012). Furthermore, the study reported by Kushki et al (2012) was performed with individuals with cerebral palsy and muscular dystrophy who presented some physiological differences due to their disabilities (i.e. features related to respiration and the cardiovascular system may have been affected in some subjects). An a priori detection of the optimal number of features based on training data, as suggested by Kushki et al (2012), could improve the classifier performance for each subject. As an indicative value, selecting the optimum number of features based on all trial data resulted in an overall performance of 91.6%. However, such an optimization process could also increase the time required to train the classifier.
A major challenge in our research is the comparatively long time period needed before sufficient information is available to make a decision. As expressed by Blain et al (2008), while some EEG-based BCIs have achieved information transfer rates of up to 27.15 decisions per minute, to date BMIs that use only peripheral autonomic signals require at least 30 s to make an accurate detection. In this study, a very conservative observation window length was fixed to 15 s (chosen based on the minimum rest period length). Furthermore, a shift of 5 s was applied to account for physiological signal latencies. This led to a maximum detection delay of 20 s. Although 20 s may be seen as an unreasonable delay for BCI applications, for severely disabled individuals who rely on access technologies to move and communicate, speed may not be critical. In a survey of 17 patients in the final stage of ALS who were extensively informed about the possibilities and advantages of an invasive electrode-based BCI, only one agreed to implantation. Patients refused the surgical procedure and preferred the slow non-invasive system. They argued that time is no issue if one is completely paralyzed (Birbaumer 2006). Priority will be given in further research steps to shortening this relatively long time delay. A possibility could be to select the window lengths optimally for each subject in order to reduce the delay in subjects with faster ANS responses.
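The timing arithmetic above can be made explicit with a short sketch. It assumes that the 5 s shift delays the start of the 15 s observation window relative to the onset of a rest or active period, and that decisions are taken back to back without overlapping windows; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def observation_window(period_start_s, window_s=15.0, shift_s=5.0):
    """Return (start, end) in seconds of the window used to classify a period.

    Assumption: the window opens shift_s after the period onset to allow
    for the latency of the physiological responses.
    """
    start_s = period_start_s + shift_s
    return start_s, start_s + window_s


def max_detection_delay(window_s=15.0, shift_s=5.0):
    """Worst-case time from period onset until a decision is available."""
    return window_s + shift_s


def max_decisions_per_minute(window_s=15.0, shift_s=5.0):
    """Upper bound on the decision rate for back-to-back, non-overlapping decisions."""
    return 60.0 / max_detection_delay(window_s, shift_s)


print(observation_window(0.0))
print(max_detection_delay())
print(max_decisions_per_minute())
```

Under these assumptions, shortening the window for subjects with faster ANS responses reduces the delay one-for-one, whereas the shift is bounded below by the signal latency itself.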
The study reported here was conducted in healthy subjects without neurological lesions. We chose to first study healthy subjects in order to evaluate the normative responses of the non-injured ANS during motion execution. Results from this study provide an important starting point and a framework for comparison for future studies with subjects with neurological injury. As presented in this paper, physiological signals vary significantly between subjects. Neurological injuries, such as stroke or spinal cord injury, may affect the autonomic system, which may introduce further variations in the peripheral signals. For example, traumatic brain injury survivors are known to show abnormalities in the autonomic system (hypofunction or hyperfunction) and show asymmetric sweating with cold hemiplegic limbs that can affect the SCR signal (Korpelainen et al 1993, 1999). Patients with complete spinal cord injury showed no changes in electrodermal activity below the level of injury (Cariga et al 2002). SCR was shown to be significantly different in patients with multiple sclerosis (Yokota et al 1991). On the other hand, some recent studies have shown the feasibility of using some of the physiological signals presented here (i.e. heart rate, SCR and breathing rate) in stroke rehabilitation (Koenig et al 2011) and with individuals with severe physical disabilities, such as cerebral palsy and muscular dystrophy (Kushki et al 2012). Future work with subjects with neurological injuries will focus on determining if the injured ANS can be consistently employed to control a body-computer interface. We speculate that good classifier accuracy could still be achieved if an analysis of the patients' physiological signals is performed prior to the training of the classifier. Weak or absent physiological responses can be discarded by means of feature reduction algorithms (Kushki et al 2012), similar to the statistical approach used in this paper.
The experimental design also suffers from some limitations. It is well known that attention and mental load significantly affect ANS responses. It is therefore possible that the differences between 'activity' and 'rest' periods reported here are associated with mental load, instead of motor execution. A well-designed control task is needed (e.g. mental arithmetic, counting backwards) to conclude that what is classified is in fact motor execution. Moreover, the proposed method was designed to detect motion execution using data from healthy participants who were actively pinching. Motor imagery has been proposed as a strategy to detect motion intention in BCI studies (Falk et al 2011, Tsui et al 2009). However, we chose to first study isometric pinching partly for convenience (i.e. it is a simple, well-controlled task that minimizes subject movement), but also because motor imagery does not allow for a systematic assessment of subjects' performance (i.e. motor imagery ability strongly varies among subjects (Sharma et al 2006)). It is important to establish the normative mechanisms of the ANS during motion execution, thereby providing a framework for comparison for future studies with motor imagery. Future work will focus on testing with a larger group of subjects to determine if motor imagery yields similar results.
Finally, though physiological signals are easy to measure, they are also affected by different environmental disturbances (e.g. auditory or visual stimuli, external temperature) and by the amount of physical activity. In this study, all these disturbances were minimized by conducting the physiological measurements in a silent, dark room while subjects lay supine. However, such a setup is not realistic in a standard therapeutic environment. Ideally, the use of a wide range of different physiological features could account for these undesirable disturbances. Although physiological signals are prone to habituation, no signal degradation was observed during the experiment described here. A possible explanation is that the random presentation of sequences and the complex reference force profile constantly engaged subjects' active participation. In order to reduce the negative effects of physiological signal habituation, future experimental protocols will be designed to actively engage the subject in an assist-as-needed manner (Zimmerli et al 2012). The equipment employed to measure biosignals in this study was selected for convenience (it already existed in our laboratories), but other compact, wireless and easy-to-use solutions exist on the market (e.g. Bluetooth heart rate monitors).

Conclusion and outlook
This study showed the feasibility of a BMI to detect motor execution by monitoring only changes of the ANS. Motor execution was accurately classified using a dual HMM classifier based only on peripheral physiological signals with an accuracy level of 84.5%. These results encourage further research on the use of the autonomic system in BMIs for the treatment of severely impaired neurologic patients.
The long-term goal of this project is to develop novel human-oriented strategies that enhance the interaction between the robotic system and the user and to incorporate them into robotic systems (e.g. for upper extremity neuro-rehabilitative training). In particular, the robotic system should estimate intention in a continuous manner so that it can optimally assist a human in the anticipated reaching, grasping or manipulation movement. With this approach, participants will control their own movements, while the robotic device will compensate for weakness. The use of physiological signals and binary classifiers may not be enough to achieve the ultimate goal of a continuous decoder. Hence, we plan to use sensor fusion, such that the most likely motor intention can be extracted from a pool of different information sources. These sources include not only physiological recordings, but also more sophisticated context analysis (task knowledge and motion history information), gaze and head movement recordings, and recordings of dynamic and kinematic movement components. As an ultimate goal, we plan to incorporate the brain into the loop, integrating measurements of cortical activation acquired through fNIRS.