Abstract
This letter presents approaches that reduce the computational demand of including second-order dynamics sensitivity information into optimization algorithms for robots in contact with the environment. A full second-order differential dynamic programming (DDP) algorithm is presented where all the necessary dynamics partial derivatives are computed with the same complexity as DDP’s first-order counterpart, the iterative linear quadratic regulator (iLQR). Compared to linearized models used in iLQR, DDP more accurately represents the dynamics locally, but it is not often used since the second-order partials of the dynamics are tensorial and expensive to compute. This work illustrates how to avoid the need for computing the derivative tensor by instead leveraging reverse-mode accumulation of derivatives, extending previous work for unconstrained systems. We exploit the structure of the contact-constrained dynamics in this process. The performance of the proposed approaches is benchmarked with a simulated model of the MIT Mini Cheetah executing a bounding gait.
1 Introduction
In comparison to their biological counterparts, legged robots are yet to be as mobile and dexterous. Animals devise and execute fast and fluid movements on the fly despite having numerous degrees-of-freedom (DOF). This capability is most beneficial in new, unknown, and uneven terrain—where the fast and fluid movements are tailored to the novel context. The scientific gap for legged robots to achieve this goal is, in part, due to challenges from the robot’s highly nonlinear dynamics and the curse of dimensionality from many DOFs. Toward endowing robotic platforms with the ability to traverse over ground like their biological counterparts, current approaches often cast the robotic locomotion problem as a trajectory optimization (TO) problem and solve for the optimal control tape. The resulting control inputs ideally allow the robot to achieve some desired motion while maintaining energetic efficiency and respecting several constraints.
However, the high-dimensional state space and highly coupled nonlinear dynamics of robots often render the solution of a whole-body optimal control problem (OCP) impractical. Conventionally, the solutions to these OCPs rely on model reduction to limit the dimensionality and nonlinearity of the problem. For example, Refs. [1,2] use simple dynamical models (i.e., template models [3]) that encode the salient features of the full dynamical model. The authors [1,2,4] use the spring-loaded inverted pendulum (SLIP) template model to optimize the gait of a bipedal robot, an articulated robotic leg, and a humanoid, respectively. The work of Ref. [5] uses the evolution of the angular momentum and the center of mass position, typically known as the centroidal dynamics, for trajectory generation for a humanoid. The work of Ref. [6] uses a single-rigid-body approximation of the full model that ignores the leg dynamics while reducing the OCP to a convex problem. Finally, Ref. [7] uses the zero-moment point to generate reference trajectories for a quadruped. While template models significantly simplify the optimization problem, they do not reason about all pertinent features of the robotic platform. The effect is that the resulting motions cannot reach the performance limits of the full system as compared to motions generated via whole-body motion planning.
Whole-body motion planning considers the full dynamics model of the robot and solves a finite-horizon TO problem. The computational burden to solve this problem is significantly greater than when adopting a template model. With the advent of more computational power, however, the transition to whole-body TO is steadily gaining popularity. For example, the authors in Ref. [8] used a whole-body trajectory optimizer based on differential dynamic programming (DDP) to control a 22DOF humanoid robot. Rather than optimizing all control variables at once, DDP exploits the temporal sparsity of the OCP and solves a sequence of smaller optimization problems at each point along the horizon. Further, DDP enforces trajectory continuation between time points, making it a shooting method. The effect is that trajectories are dynamically valid at any iteration of the DDP optimization process, which encourages their usability online before the optimizer converges. In addition to a feed-forward control tape, DDP also outputs a locally optimal feedback policy, which can be used to handle disturbances [9]. As originally described [10], DDP does not handle constraints. However, several recent modifications to DDP have been proposed that address control limits [11] and general state-control constraints [12,13]. Further, the works of Refs. [12,14–16] have extended the algorithm from smooth dynamical systems to hybrid systems that have sequences of smooth modes with reset maps determining transitions between them. In this work, we consider the extension of DDP to systems undergoing rigid contacts with the environment in the same fashion as Ref. [14], but with the mode timings fixed. Beyond addressing hybrid dynamics that arise due to different contact configurations, these methods can also be applied to adopt coarsening modeling abstractions across the planning horizon [17], which can reduce the computation demand of whole-body planning.
To perform TO, DDP considers second-order approximations of the dynamics model. However, in practice (e.g., Refs. [8,15,18]), many researchers have opted to use a first-order dynamics approximation due to its faster evaluation time, giving rise to the iterative linear quadratic regulator (iLQR). While the second-order dynamics information retains higher fidelity to the full model locally, it is represented by a rank three tensor and is expensive to compute. Our previous work [19] avoided the vector-tensor operation in DDP by presenting an algorithm to calculate the necessary second-order partials efficiently. Reverse-mode accumulation of derivatives was leveraged to achieve that effect for smooth dynamical systems. In this work, we extend [19] to constrained rigid-body systems with a focus on systems in rigid contact with the environment. Despite this application focus, much of our development readily applies to other constraints.
1.1 Specific Contributions.
This letter presents several advances that accelerate DDP for use with systems in rigid contact with the environment, as in legged robots. We (a) expose the structure of the second-order dynamics partials needed in DDP, paving the way toward an efficient computation strategy that avoids calculating the full second-order dynamics derivative tensor. The structural form uncovered motivates us to (b) extend the recursive-Newton–Euler algorithm (RNEA) to support our computation of contact-constrained dynamics partials. The resulting algorithm is called the modified RNEA for contacts (mRNEAc). Finally, we (c) use reverse-mode automatic differentiation (AD) tools with mRNEAc to accelerate the computation of second-order partials needed in DDP as compared to conventional approaches. The final outcome is that the proposed methods allow for the computation of the needed second-order dynamics information in the same computational complexity as in iLQR.
The remainder of this letter is organized as follows. Section 2 provides rigid-body dynamics preliminaries and discusses a DDP algorithm for hybrid systems with fixed mode sequences. Sections 3 and 4 detail the proposed approach for reducing the computational demand of including second-order dynamics information in DDP with contact-constrained systems. Section 5 presents results and an application to optimizing a quadruped bounding gait. Finally, a discussion is provided in Sec. 6, with Sec. 7 concluding and offering future directions.
2 Background: Trajectory Optimization
2.1 Dynamics Preliminaries.
2.2 Hybrid Systems DDP Algorithm.
For the impact event (6d), we consider the quadratic value function approximation across the impact using similar formulas as in Ref. [14] that represent a natural generalization of Eq. (8). The ground penetration constraint (6d) is considered using an augmented Lagrangian approach from Refs. [14,25], whereas other constraints such as torque limits, friction cone constraints, etc. (6f) use a relaxed barrier (ReB) method [9,14,26]. We refer the interested reader to Ref. [14] for detailed derivations and implementation details of how these methods work together.
3 Reducing Complexity in Hybrid Systems DDP
Within DDP/iLQR, the computation of the dynamics partials is the most time intensive operations. We present an approach for reducing the computational complexity of evaluating the tensorial parts of the Q − coefficients in Eq. (8). We first review conventional methods of computing partials for DDP, and then set the stage for our contributions of computing them efficiently.
3.1 Conventional Partials.
Remark 1 As formulated, Eq. (12) involves several matrix-tensor, tensor-vector, and tensor transpose operations. Their implementation details are left out in this letter, though we acknowledge that most operations require careful convention choices for the chain rule operations to provide correct results.
For implementation, AD tools can be used twice on to achieve the desired results. In using AD tools twice, computing is an operation to obtain the partials of the n entries of with respect to the n entries of the z and the n entries of the y vectors. This approach is illustrated in Fig. 1. Within the DDP context, the second-order partials will be contracted with the fixed vector (see Eq. (8)) from the left. The tensor-contraction operation is also an operation. Since this AD approach requires forming a tensor operator, whenever we use this method, the resulting approach is denoted as tensor DDP. We aim to reduce the complexity of these second-order partials by exploiting their structure.
3.2 Proposed Refactoring for Efficient Computation.
4 Efficient Partials Via Modifications to the RNEA
Modified RNEA Algorithm for Contacts (mRNEAc)
Require:
1:
2: for to do
3:
4:
5:
6:
7:
8:
9: end for
10: return
Additionally, this algorithm enables the use of reverse-mode AD [28] tools. Reverse AD provides a general-purpose approach to compute the gradient of any scalar-valued function in the same complexity as the evaluation of the function itself. This result is also known as the “cheap gradient principle,” where the computational cost of evaluating reverse AD is bounded above by a small constant (at most five) times the cost of evaluating the function itself [28].
The result is that by using the mRNEAc with reverse AD, we can efficiently compute all the terms in Eq. (15) without resorting to tensor operations, and in lower computational complexity than tensor DDP. As a key outcome, the term T1 can be calculated in complexity. First, reverse-mode AD can be used to compute in . Then, AD can be used again for the partials of that result, setting the complexity at . The term T2 can also be calculated in complexity, but by running mRNEAc n times with the n columns of as input. Taking the partials of that result using AD sets the complexity of this operation at . This approach is illustrated in Fig. 2.
The effect is that all the second-order partials are calculated efficiently in and without requiring any tensor operations. Overall, this approach reduces the computational complexity of taking the second-order partials needed in DDP as compared to applying tensor contraction. This development represents the third technical contribution in this letter, with this approach for DDP termed mRNEAc DDP.
5 Results
5.1 Dynamics Partials.
First, we evaluate the proposed methods on an underactuated n-link pendubot model (Fig. 3) whose last link is pinned to the ground through a revolute joint. The n-link pendubot model allows us to test the scalability of the proposed methods with an increasing number of DOFs. Our analysis was carried out using matlab with CasADi [27], which allows for rapid and efficient testing of AD approaches. We first compare the computation time of the different methods for evaluating partials for iLQR/DDP.
The computation time for evaluating the dynamics partials is shown in Fig. 4, where each point represents the average time over ten calls. Figure 4 is presented on a log-log scale, and as such, any curve of the form y = C xp will appear as a line log(y) = log(C) + p log(x) whose slope corresponds to the polynomial power. As such, the slope of the plotted lines represents the empirically observed polynomial order of the algorithm. The evaluation of the second-order partials using Tensor DDP took the longest time. The computation of the second-order partials for mRNEAc DDP (slope of 2.28) showed an order reduction compared to Tensor DDP (slope of 3.13). As shown, the time to compute the second-order partials for mRNEAc DDP was a comparable order of magnitude as the first-order partials of iLQR (slope of 2.21). Here, the computational cost of calculating the second-order partials for mRNEAc DDP was at most 1.39 times the cost of evaluating first-order partials for iLQR. We note, however, that compared to first-order partials, second-order partials involve proportionally more mathematical operations. As such, the computation of the second-order partials for mRNEAc DDP takes slightly more time compared to first-order partials for iLQR. The computation time benefits for DDP were especially more apparent for systems with a higher number of DOFs, indicating the potential of the proposed methods to include second-order information into TO algorithms for legged robots or other high-DOF systems.
5.2 Trajectory Optimization.
The resulting motion from TO via iLQR/DDP is illustrated in Fig. 5. The robot starts in the back-stance mode and runs forward at a speed of 0.5 m/s. The initial guess trajectory for this motion was constructed via a heuristic controller that implements proportional-derivative (PD) control in the flight mode to achieve a pre-defined configuration. In the stance mode, the heuristic controller calculates stance leg forces following a SLIP model and converts the obtained ground reaction forces to joint torques. The initial states assume the robot to be moving forward at 0.5 m/s. The hybrid systems iLQR/DDP algorithm [14] considers ground penetration, torque limits, non-negative normal ground reaction force, and friction cone constraints. The resulting motion respects those constraints, as shown in Fig. 6, with snapshots of the final motion included in Fig. 5.
Further, we consider the timing demands of the iLQR/DDP approaches. We consider 50 different initial trajectories, where the iLQR and the DDP variants were optimized for 50 iterations. As shown in Fig. 7, iLQR on average took the least time as compared to the DDP variants, whereas tensor DDP took the most time. As compared to iLQR, on average, mRNEAc DDP took approximately 3.32 times more computation time than iLQR, whereas tensor DDP took 28.3 times more computation time. The evaluation of the dynamics partials (see Fig. 4) via the full mRNEAc DDP took comparatively more time over iLQR since DDP has to first compute the first-order partials and then the second-order partials. Further, DDP often requires a regularization scheme [8] and may occasionally repeat the backward pass to ensure effective cost reduction, which requires additional time.
6 Discussion
The proposed DDP approach provides a reduction in the computational complexity and computation time as compared to the conventional DDP approach. The computational complexity of our method for computing second-order derivatives in DDP was the same as computing first-order derivatives for iLQR. Further, full second-order DDP offers quadratic convergence from near-optimal trajectories whereas iLQR loses that quadratic convergence property. DDP also retains better local fidelity to the dynamics model and better captures the nonlinear effects of the model. As such, for a robot facing modeling inaccuracy (e.g., slight terrain variation) in its contacts with the environment, the increased local fidelity of the robot model could help prevent falls. Moreover, in unstructured environments, when terrain information is known, reasoning about the full model with our methods could help the robot more quickly tailor its trajectories to the environment thanks to DDP’s quadratic convergence.
7 Conclusions and Future Directions
This paper presented several advances that accelerate the inclusion of second-order information into DDP for application to systems making rigid contact with the environment. This work leveraged reverse-mode AD tools and a refactoring of the RNEA known as mRNEAc to efficiently compute second-order partials while avoiding a costly vector-tensor contraction operation. The result presented is a DDP approach that computes second-order partials in the same complexity as first-order partials in iLQR. As compared to conventional DDP, the proposed DDP greatly reduces the computation overhead.
Several opportunities motivate the next steps of this work. First, for conciseness sake, Forward Euler was used as the integration scheme, though we also see an opportunity to extend this work to implicit [30,31] and predictor-corrector integration schemes to further improve numerical integration stability. Second, this work focused on the technical derivations of our contributions and we see value in validating this work on legged robot hardware, which would represent the first use case of full DDP on a legged robot. Third, while this work considers a single-shooting DDP framework, several methods in the literature [16,32] have shown the potential of a multi-shooting framework to reduce TO’s sensitivity to initial conditions. We see the potential to use the same approaches as proposed herein within a multi-shooting framework. Finally, we note that DDP has in theory been shown to exhibit quadratic convergence from near-optimal trajectories [33]. As such, we consider an opportunity for TO to exploit that convergence property with robots operating in uncertain and varied terrains. We also consider theoretical underpinnings on whether that property holds for hybrid systems DDP, as the original DDP algorithm [10] currently only has theoretical convergence guarantees for smooth dynamics.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.
Footnote
There are a few subtleties to make use of λ and π in the algorithm, which are discussed in Appendix B.
Appendix A: All Second-Order Partials
This appendix provides the definitions of the partials where z and y can be q, , or τ. The derivation of has been provided in Eqs. (15), (17), and (18). Derivations for the other second-order partials follow similarly. We include the results of all nonzero second-order partials below.
Appendix B: Details of π and λ Transformations
In this appendix, we discuss how to make proper use of λ and π within the mRNEAc algorithm. For simplicity, we consider the case of a single point-contact constraint, while the development cleanly generalizes to other contact scenarios (e.g., line contact, planar contact, etc.). In Eq. (14), we had that . Looking at T1 in Eq. (15), we observe that (where nc = 3 for point contacts) pre-multiplies the Cartesian contact acceleration . Further, we note that the calculations within the RNEA determine spatial (i.e., 6D) velocities vi and accelerations of each body i [21], where is the angular acceleration of a frame attached to the body, and the rate of change in the body-frame velocity of its origin.