Building a Better Exoskeleton Controller

With the successful execution of the Extended Kalman Filter (EKF) exoskeleton project, we now had a controller that could adapt the assistance from bilateral ankle exoskeletons in real-time across different terrains and speeds, while keeping synchrony with the user’s gait using the phase variable. This got us a good portion of the way to overcoming a long-standing obstacle to widespread exoskeleton adoption: the lack of a controller that can work outside the limited confines of the laboratory and in real-world scenarios. With this contribution, I was basically ready to graduate from the University of Michigan with my PhD, having done my part to push the exo field forward.

But I wanted to do better! Despite the success of the EKF controller, I noticed it had two primary limitations, and thus the potential for improvement.

  1. Live by the continuous gait model, die by the continuous gait model: At the heart of the previous EKF controller lay a continuous gait model that mapped gait states to kinematics, such as foot angles. In real-time, the controller inverted measurements of these kinematics, which were readily available from the sensors on the ExoBoot, to estimate the likeliest gait state that produced those measurements. This gait model was regressed using explicit basis functions and an open-source dataset of 10 subjects, which worked well for continuous speed/incline walking, where it was pretty easy to find basic linear relationships between gait state and kinematics that were in turn easily captured using (Fourier/Bezier) basis functions. However, this explicit modelling approach couldn’t handle tasks that were harder to model, such as the kinematics of stairs locomotion. In particular, non-steady-state locomotion, such as transitions between tasks, would be extremely difficult to encode using this gait model approach, which limited the tasks the EKF controller could function over.

  2. It’s (nothing) personal: The gait model in the EKF was regressed using least-squares over that 10-subject dataset, which functionally generated a model that encoded the average kinematics of walking. Only issue? Some individuals walked so differently from this average that it led to systematic biases in the gait state estimates from the controller. Additionally, the literature has repeatedly found that individuals show significant differences in their gait kinematics when walking with exoskeleton assistance, which likewise biased the state estimates, as the average gait model expected normative walking.

The successor to the EKF exoskeleton controller would thus need to overcome these limitations in order to truly deliver a practical controller for real-world use.

To address these limitations, I turned to machine learning as a possible solution.

Using machine learning, we could reframe the gait state estimation challenge as a supervised learning task: train a neural network to predict the gait state when given a set of kinematic measurements. The primary advantage of this approach is that it bypasses the need for an explicit gait model entirely, as the network can instead learn a direct relationship between kinematics and gait state without any kinematic relationships specified a priori. This meant that we could extend the gait state, and thus the possible use cases, to encompass stairs locomotion, and critically, also gracefully handle transitions between discrete locomotion tasks, since the network could learn these transitions on its own during training; this provides a potential solution to the first limitation of the EKF. Additionally, neural networks can be pretrained on large corpora of data and then finetuned to a more specific use case; analogously, it would be possible to train a general gait state predictor on vast quantities of open-source kinematics data, and then finetune that network using a far smaller personal sample of an individual user’s gait to learn their specific walking patterns, thus solving the second challenge via personalization through finetuning.

For the machine learning architecture, I chose the Transformer architecture, which has seen widespread usage and popularity over the last few years. I personally had prior experience with Transformers from a side project in which I built a chatbot, so I felt pretty comfortable with the theory. As a high-level summary, a Transformer contains two components: an encoder and a decoder. The encoder takes as input a sequence of elements and encodes each element into a high-dimensional latent space. This latent space is what encodes the network’s knowledge of the fundamental task at hand, in this case walking. The decoder then converts this latent representation back to a human-interpretable sequence. Key to this process is the Transformer’s attention mechanism, which is in brief a specialized weighting applied by the encoder and decoder that selectively weights elements of the input sequence depending on how relevant they are to the output; in effect, the Transformer pays attention primarily to important elements. To illustrate via analogy, in the field of Natural Language Processing (NLP), a Transformer can be used to translate a sentence of words from one language to another. The encoder embeds each word in the original language into a latent space that captures the fundamental semantic meaning of those words, while applying attention to the most important words in the original sentence. The decoder then translates the latent space representation to the target language, with the attention mechanism having eased the task by identifying which words were most important to the translation.

For the gait state estimation challenge, I posited that a Transformer could essentially “translate” a time series buffer of kinematics into a succinct gait state representation.

The latent space of this Transformer could thus be said to encode the fundamental nature of walking itself, with both gait state and kinematics just being expressions of this walking in different semantic spaces. Kinematic relationships important to the gait state (for example, foot angle being highly informative of incline during early stance) could naturally emerge as the Transformer learned to pay more attention to those kinematic measurements when performing its translation. In real-time, we could update this buffer with new measurements and predict new gait states accordingly.

I opted to pre-train this Transformer at the core of this new controller on data from three open-source datasets: the 10-subject dataset used in the EKF experiment, another 10-subject dataset collected by my friend and colleague Emma Reznick in the LocoLab, and a 22-subject dataset collected by researchers in the EPIC lab at Georgia Tech. Each dataset contained kinematics data for a variety of speeds, inclines, and stairs locomotion trials, along with corresponding gait state labels; the total number of samples was roughly 16 million, so plenty of data with which to train a Transformer.

This Transformer would take as input the global foot angle, the global shank angle, those angles’ velocities, and the upward and forward accelerations of the heel. The gait state, extended relative to the EKF, now comprised phase, speed, incline, and a stairs variable (is_stairs) that classified stairs locomotion (-1 for stair descent, 1 for stair ascent, and 0 for non-stairs walking).
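To make this concrete, here is a minimal PyTorch sketch of such a network. The six input channels and the 32-dimensional latent space come from the description here; the window length, head/layer counts, encoder-only structure with a linear readout, and the sin/cos phase output are my illustrative assumptions rather than the actual architecture.

```python
# A minimal sketch of a gait Transformer (hypothetical sizes; the real
# network also used a decoder, per the summary above).
import torch
import torch.nn as nn

class GaitTransformer(nn.Module):
    def __init__(self, n_feats=6, d_model=32, n_heads=4, n_layers=2, window=150):
        super().__init__()
        self.embed = nn.Linear(n_feats, d_model)               # lift kinematics into the latent space
        self.pos = nn.Parameter(torch.zeros(window, d_model))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # 5 outputs (assumed): sin(phase), cos(phase), speed, incline, is_stairs
        self.head = nn.Linear(d_model, 5)

    def forward(self, x):                  # x: (batch, window, n_feats)
        z = self.encoder(self.embed(x) + self.pos)
        return self.head(z[:, -1])         # gait state at the newest sample

model = GaitTransformer()
buffer = torch.randn(8, 150, 6)            # 8 buffers of 150 samples x 6 channels
print(model(buffer).shape)                 # torch.Size([8, 5])
```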

To train the Transformer, I signed up for a Google Vertex AI account and leveraged the compute offered. The Transformer itself was implemented in PyTorch. Training followed a modified train-test split: of the 42 subjects, 11 were selected as the validation set, with the remaining 31 used to train the network to translate kinematics to gait state. This was done to evaluate the network’s performance on unseen subjects. Training itself was standard and informed by my past experience, featuring mainstays such as the AdamW optimizer and learning rate scheduling based on the loss values. Gait state labels and kinematic measurements were normalized to the range (0,1) to improve convergence. Further, the latent space was set at a higher dimensionality (dim=32) than the gait state vector, which was done to allow the Transformer to learn the biases and slight rotations that would show up for the same measurements across the datasets, a trick suggested by these guys.
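For flavor, here is a hedged sketch of that training recipe with a throwaway stand-in model; the batch size, epoch count, and scheduler settings are illustrative guesses, not the actual values.

```python
# A hedged sketch of the training loop: AdamW, loss-driven LR scheduling,
# and (0, 1)-normalized inputs/labels (here, synthetic stand-ins).
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.rand(256, 150, 6)   # kinematics buffers, pre-normalized to (0, 1)
Y = torch.rand(256, 5)        # gait-state labels, pre-normalized to (0, 1)
loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(150 * 6, 5))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
# drop the learning rate when the loss plateaus
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=5)
loss_fn = torch.nn.MSELoss()

for epoch in range(10):
    total = 0.0
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        total += loss.item()
    sched.step(total / len(loader))
```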

In silico, the pretrained Transformer, evaluated on the validation subjects, achieved 1.6% phase RMSE, 0.07 m/s speed RMSE, 0.7 degrees incline RMSE, and 95%+ accuracy in classifying stairs samples. Pretty good! We were able to outdo the EKF’s in silico performance while also classifying stairs, which boded well for this approach.

Sample results from the simulated validation of the Transformer. Ground truth in blue, predicted in red.

To get a better idea of whether the Transformer could be successfully finetuned, I leveraged the data we had already collected from the previous 10 subjects who participated in the EKF study, as we had their labeled kinematics+gait state data on hand. We could then finetune the pretrained network on each person’s data and determine if estimation improved, and by how much.

To guard against potential ‘catastrophic forgetting’ (as they say in the literature), i.e. the network losing its general competence as it overfits to the personalized data, I developed a finetuning scheme that used two established techniques from deep learning: 1) Elastic Weight Consolidation (EWC), which identifies the weights in a network most important for the original task, and adds an additional regularization loss that penalizes changing those weights, and 2) rehearsal, which simply interleaves samples from the original pretraining set into the finetuning set such that the network maintains a baseline level of performance. A simplified sketch of the scheme follows below.
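Here is a minimal sketch of how EWC and rehearsal can combine into one finetuning loss; lambda_ewc, the Fisher estimate, and the rehearsal mixing are simplified illustrations, not the exact scheme used.

```python
# A simplified sketch of EWC + rehearsal; the Fisher estimate uses the
# common squared-gradient proxy for per-weight importance.
import torch

def fisher_diagonal(model, loader, loss_fn):
    """Per-weight importance on the pretraining task, via mean squared gradients."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, fisher, theta_star):
    """Penalize moving important weights away from their pretrained values."""
    return sum((fisher[n] * (p - theta_star[n]) ** 2).sum()
               for n, p in model.named_parameters())

# Usage during finetuning (hypothetical names):
#   theta_star = {n: p.detach().clone() for n, p in model.named_parameters()}
#   loss = task_loss(model(x_personal), y_personal)              # personal data
#   loss += task_loss(model(x_rehearsed), y_rehearsed)           # rehearsal batch
#   loss += lambda_ewc * ewc_penalty(model, fisher, theta_star)  # EWC term
```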

By combining these two approaches and leveraging the EKF data, we developed a finetuning process that demonstrably improved estimation performance without overfitting to the personal data, thus verifying that personalization to improve gait state estimation was possible using deep learning.

Sample results from the simulated finetuning of the Transformer. Ground truth in blue, predicted from finetuned in red, predicted using the general model in black. As expected, finetuning dramatically increases performance.

OK, so we validated that the Transformer could address the limitations of the EKF (explicit models, lack of personalization). Should be straightforward to slap it on the ExoBoot, right? However, I did have some reservations about using the Transformer. I view machine learning models, in particular deep learning networks, as ‘black boxes’ that produce a set of outputs via some hard-to-decipher internal process. Indeed, that’s the whole point of the network: you don’t have to know the internal process. Consequently, you don’t have nearly as much intuition or explainability for why a network produced the output it did for a given input, as opposed to an explicit model-based approach, where by definition you know exactly why the model behaved the way it did. Similarly, there aren’t as many guarantees that the network will perform well on future data, since future data is not guaranteed to look at all like past data, which could lead to fragile systems. In our case, since we’re attaching this controller to a squishy yet valuable human, I wanted the controller to have guarantees of robustness and safety beyond those offered by the Transformer, to prevent any injury.

My solution was straightforward: Combine the Transformer with the EKF.

We do this by treating the neural network predictions as a set of direct measurements of the gait state. Through this combination, we get the best of both worlds: the Transformer handles difficult-to-model tasks and affords us personalization, while the EKF grants us access to the robust and well-studied Bayesian filter framework, which is easy to tune and implement.

The overall architecture of the controller I developed is shown above. On the left is the primary gait transformer (green box), which provides the first estimates of the extended gait state. In practice, the Transformer estimates were pretty reliable, and I likened them to a feedforward estimate of the gait state that acted as the operating point, while the EKF acted as a stabilizer around that operating point.

Speaking of, moving right through the controller, we enter the EKF itself (solid blue), which has two main steps.

The first is the process/prediction step (dashed blue), which governs how we expect the state to naturally evolve across time steps. We modeled the gait phase as increasing via numerical integration of the phase rate over the time step, while the other task variables (speed, incline, stairs) stayed constant. This evolution is tunable via a set of process gains, which govern the bandwidth and aggressiveness of the filter.
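Concretely, the process step could look something like the sketch below; the 4-state layout, the externally supplied phase rate, and the loop rate are my assumptions, not the exact implementation.

```python
# A sketch of the EKF process/prediction step under the stated model:
# phase integrates its rate over the time step; speed, incline, and the
# stairs variable are held constant.
import numpy as np

DT = 1.0 / 180.0   # assumed controller loop period (actual rate not stated)

def process_step(x, P, phase_rate, Q):
    """x = [phase, speed, incline, is_stairs]; P = covariance; Q = process gains."""
    F = np.eye(4)                                 # task variables held constant
    x_pred = F @ x
    x_pred[0] = (x[0] + phase_rate * DT) % 1.0    # integrate phase, wrap each stride
    P_pred = F @ P @ F.T + Q                      # inject tunable process noise
    return x_pred, P_pred
```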

The second, more interesting step, is the measurement/update step (dotted blue), where the EKF uses measurements of the gait state and kinematics to update its estimate of the gait state. Here, we actually use the sensor measurements in combination with a continuous gait model (green) that maps gait state to kinematics.
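A hedged sketch of this update follows, with the Transformer's gait-state prediction stacked on top of the raw kinematics as one measurement vector; here h, H_fn, and R are stand-ins for the gait model, its Jacobian, and the noise covariance.

```python
# A sketch of the measurement/update step. z stacks [Transformer gait-state
# estimate; measured kinematics]; the Transformer rows act as direct (identity)
# measurements of the state, the kinematics rows go through the gait model.
import numpy as np

def measurement_update(x_pred, P_pred, z, h, H_fn, R):
    H = H_fn(x_pred)                      # stacked Jacobian: [I; d(gait model)/dx]
    y = z - h(x_pred)                     # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new
```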

In the original EKF, the gait model was expressed using continuous Fourier and Bezier basis functions. For this new controller, owing to the gait state now containing stairs, we represented the gait model using a standard multilayer perceptron neural network, which took as input the extended gait state and produced estimates of foot and shank angles, their velocities, and the accelerations of the heel. These quantities were chosen because we could measure them using the ExoBoot, and I had already developed the hardware infrastructure/codebase to interface with them during the past EKF project. Training these gait model networks was even easier than training the primary Transformer, and they ended up being pretty lightweight.

The continuous gait model that maps gait state to kinematics in the EKF. Since the gait state is now extended, these models are now neural networks. The left column depicts the kinematics during incline locomotion, while the right column depicts the kinematics during stair ascent. The modeled relations of foot angle (top row), heel forward acceleration (middle row), and heel upward acceleration (bottom row) as functions of phase, speed, and incline are shown.
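For reference, such a lightweight gait-model MLP might look like the sketch below; the layer sizes are illustrative guesses, and in practice phase might enter as sin/cos components rather than a raw value.

```python
# A minimal sketch of the gait-model MLP: extended gait state in,
# modeled kinematics out (hypothetical layer widths).
import torch.nn as nn

gait_model = nn.Sequential(
    nn.Linear(4, 64),    # extended gait state: phase, speed, incline, is_stairs
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 6),    # foot/shank angles, their velocities, heel accelerations
)
```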

Similarly, the heteroscedastic model (red box above) that dynamically changed the trust in the kinematics measurements as a function of phase was also now expressed as a standard multilayer perceptron. Training this one was significantly trickier, as the network needed to learn to predict covariances (i.e. the expected spread of the differences between the measurements and the gait model’s predictions), which spanned a far greater range of possible values (roughly from 0 to 1e5) and thus made normalization far more difficult. I solved this by training the network to predict the log of the covariance, which made normalization straightforward (effectively normalizing by the covariance’s order of magnitude); the regular covariance was then recovered by exponentiating the prediction.
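The log-covariance trick boils down to a pair of transforms like these; the numeric range is taken from the text above, everything else is illustrative.

```python
# Regress log(sigma^2) so the targets span a narrow, easy-to-normalize range,
# then exponentiate at runtime to recover the covariance for the EKF.
import torch

def to_log_target(cov):
    # covariances spanning many orders of magnitude (up to ~1e5)
    # compress into a small, well-behaved log range
    return torch.log(cov)

def to_covariance(log_cov_pred):
    # invert the transform when filling the EKF's measurement noise matrix
    return torch.exp(log_cov_pred)
```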

It was noteworthy that the expected relationships, such as foot angle being informative of incline, also emerged naturally in the heteroscedastic covariance model during training, which was neat to confirm.

Finally, the gait state estimate from the Kalman Filter was used as input to a predefined biomimetic torque profile (black box on the right of the figure above), which was chosen to demonstrate proof-of-concept that the torque could adapt in real time to the changing gait state.

The heteroscedastic model that changed trust in the kinematics as a function of phase.

With the controller outlined, we were ready to implement on hardware. Just like with the EKF, the primary computer running the controller was a Raspberry Pi 4, which interfaced with the IMUs and the Dephy ExoBoot. The new neural networks, though, were too computationally intensive to run on the Pi, so we hosted them on a Jetson Nano and performed all network evaluations there. The Jetson communicated with the Pi via an Ethernet cable in real-time.
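The actual wire protocol isn't described here, but conceptually the link could look like this hypothetical sketch, assuming six packed floats sent out and five returned over a plain TCP socket; the address, port, and framing are all my inventions for illustration.

```python
# A hypothetical sketch of the Pi <-> Jetson link over Ethernet.
import socket
import struct

JETSON_ADDR = ("192.168.0.2", 5005)   # hypothetical address and port

def query_gait_state(sock, kinematics):
    """Send the latest six kinematic channels; receive the network's prediction."""
    sock.sendall(struct.pack("<6f", *kinematics))
    reply = sock.recv(5 * 4)            # five float32 values (ignoring partial reads)
    return struct.unpack("<5f", reply)

# On the Pi, per control loop iteration (hypothetical usage):
#   sock = socket.create_connection(JETSON_ADDR)
#   state = query_gait_state(sock, [foot, shank, dfoot, dshank, acc_fwd, acc_up])
```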

To evaluate the controller’s performance, I had fourteen subjects walk through our lab’s gait circuit platform, which had different sections, such as a ramp and a staircase, that would test the controller’s capacity to adapt the gait state in real time.

On the left is a plot showing a sample trial from a participant. Here, the participant ascended the ramp, paused for a bit, walked forward on the flat portion, paused again, and then descended the stairs. Blue denotes the ground truth states (recorded using Vicon mocap), and orange denotes the controller’s estimates.

Pretty good! The controller clearly kept synchrony with the user, as the phase estimate was on point. The speed estimate started and stopped in step with the subject’s pauses. The incline estimate clearly responded to the change in ramp when the subject ascended, and similarly, the controller correctly picked up that the subject was descending stairs.

The overall phase root mean squared error (RMSE) averaged over the fourteen subjects was 5.1%, which was good, and competitive with the state of the art. The RMSE for the speed was 0.16 m/s, the RMSE for the incline was 1.5 degrees, and the classification accuracy for stairs locomotion was 95%. Overall, a pretty successful extension to the EKF. This controller handles even more tasks than the EKF, including task transitions, and allows for simple personalization to account for individuals’ specific gaits.

One really cool result I wanted to highlight was this visualization of the latent space of the Transformer (left). I fed some kinematics data through the primary Transformer and recorded the latent space embeddings for each input window of kinematics samples. Then, I used t-SNE to reduce the 32-dimensional latent space to two dimensions and plotted the results. Each dot in the figure is thus a set of kinematic measurements at a single time index. For interpretability, I highlighted each type of locomotion in a different color.
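For the curious, the visualization boils down to something like this sketch, with random stand-ins for the real embeddings and locomotion-mode labels.

```python
# A sketch of the latent-space visualization: reduce 32-d embeddings to 2-d
# with t-SNE and color each point by locomotion mode.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

latents = np.random.randn(2000, 32)      # one 32-d embedding per input window
labels = np.random.randint(0, 3, 2000)   # locomotion mode per window (stand-in)

xy = TSNE(n_components=2, perplexity=30).fit_transform(latents)
plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=4, cmap="tab10")
plt.title("t-SNE of the gait Transformer's latent space")
plt.show()
```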

The first thing that stands out is the presence of these circular patterns, which actually makes a lot of sense given the cyclic nature of walking! This is especially evident considering the trigonometric embedding of phase into sine and cosine components that we used, and it was cool to see that the Transformer learned the cyclic nature of walking. Additionally, each of these circular patterns of different radii has a predominantly different color, indicating that the Transformer was able to separate and classify different kinds of gait (e.g. incline walking vs. stairs ascent), which was also a testament to its effectiveness.
