Abstract

This letter presents approaches that reduce the computational demand of including second-order dynamics sensitivity information into optimization algorithms for robots in contact with the environment. A full second-order differential dynamic programming (DDP) algorithm is presented where all the necessary dynamics partial derivatives are computed with the same complexity as DDP’s first-order counterpart, the iterative linear quadratic regulator (iLQR). Compared to linearized models used in iLQR, DDP more accurately represents the dynamics locally, but it is not often used since the second-order partials of the dynamics are tensorial and expensive to compute. This work illustrates how to avoid the need for computing the derivative tensor by instead leveraging reverse-mode accumulation of derivatives, extending previous work for unconstrained systems. We exploit the structure of the contact-constrained dynamics in this process. The performance of the proposed approaches is benchmarked with a simulated model of the MIT Mini Cheetah executing a bounding gait.

1 Introduction

In comparison to their biological counterparts, legged robots are yet to be as mobile and dexterous. Animals devise and execute fast and fluid movements on the fly despite having numerous degrees-of-freedom (DOF). This capability is most beneficial in new, unknown, and uneven terrain—where the fast and fluid movements are tailored to the novel context. The scientific gap for legged robots to achieve this goal is, in part, due to challenges from the robot’s highly nonlinear dynamics and the curse of dimensionality from many DOFs. Toward endowing robotic platforms with the ability to traverse over ground like their biological counterparts, current approaches often cast the robotic locomotion problem as a trajectory optimization (TO) problem and solve for the optimal control tape. The resulting control inputs ideally allow the robot to achieve some desired motion while maintaining energetic efficiency and respecting several constraints.

However, the high-dimensional state space and highly coupled nonlinear dynamics of robots often render the solution of a whole-body optimal control problem (OCP) impractical. Conventionally, the solutions to these OCPs rely on model reduction to limit the dimensionality and nonlinearity of the problem. For example, Refs. [1,2] use simple dynamical models (i.e., template models [3]) that encode the salient features of the full dynamical model. The authors [1,2,4] use the spring-loaded inverted pendulum (SLIP) template model to optimize the gait of a bipedal robot, an articulated robotic leg, and a humanoid, respectively. The work of Ref. [5] uses the evolution of the angular momentum and the center of mass position, typically known as the centroidal dynamics, for trajectory generation for a humanoid. The work of Ref. [6] uses a single-rigid-body approximation of the full model that ignores the leg dynamics while reducing the OCP to a convex problem. Finally, Ref. [7] uses the zero-moment point to generate reference trajectories for a quadruped. While template models significantly simplify the optimization problem, they do not reason about all pertinent features of the robotic platform. The effect is that the resulting motions cannot reach the performance limits of the full system as compared to motions generated via whole-body motion planning.

Whole-body motion planning considers the full dynamics model of the robot and solves a finite-horizon TO problem. The computational burden to solve this problem is significantly greater than when adopting a template model. With the advent of more computational power, however, the transition to whole-body TO is steadily gaining popularity. For example, the authors in Ref. [8] used a whole-body trajectory optimizer based on differential dynamic programming (DDP) to control a 22DOF humanoid robot. Rather than optimizing all control variables at once, DDP exploits the temporal sparsity of the OCP and solves a sequence of smaller optimization problems at each point along the horizon. Further, DDP enforces trajectory continuation between time points, making it a shooting method. The effect is that trajectories are dynamically valid at any iteration of the DDP optimization process, which encourages their usability online before the optimizer converges. In addition to a feed-forward control tape, DDP also outputs a locally optimal feedback policy, which can be used to handle disturbances [9]. As originally described [10], DDP does not handle constraints. However, several recent modifications to DDP have been proposed that address control limits [11] and general state-control constraints [12,13]. Further, the works of Refs. [12,1416] have extended the algorithm from smooth dynamical systems to hybrid systems that have sequences of smooth modes with reset maps determining transitions between them. In this work, we consider the extension of DDP to systems undergoing rigid contacts with the environment in the same fashion as Ref. [14], but with the mode timings fixed. Beyond addressing hybrid dynamics that arise due to different contact configurations, these methods can also be applied to adopt coarsening modeling abstractions across the planning horizon [17], which can reduce the computation demand of whole-body planning.

To perform TO, DDP considers second-order approximations of the dynamics model. However, in practice (e.g., Refs. [8,15,18]), many researchers have opted to use a first-order dynamics approximation due to its faster evaluation time, giving rise to the iterative linear quadratic regulator (iLQR). While the second-order dynamics information retains higher fidelity to the full model locally, it is represented by a rank three tensor and is expensive to compute. Our previous work [19] avoided the vector-tensor operation in DDP by presenting an algorithm to calculate the necessary second-order partials efficiently. Reverse-mode accumulation of derivatives was leveraged to achieve that effect for smooth dynamical systems. In this work, we extend [19] to constrained rigid-body systems with a focus on systems in rigid contact with the environment. Despite this application focus, much of our development readily applies to other constraints.

1.1 Specific Contributions.

This letter presents several advances that accelerate DDP for use with systems in rigid contact with the environment, as in legged robots. We (a) expose the structure of the second-order dynamics partials needed in DDP, paving the way toward an efficient computation strategy that avoids calculating the full second-order dynamics derivative tensor. The structural form uncovered motivates us to (b) extend the recursive-Newton–Euler algorithm (RNEA) to support our computation of contact-constrained dynamics partials. The resulting algorithm is called the modified RNEA for contacts (mRNEAc). Finally, we (c) use reverse-mode automatic differentiation (AD) tools with mRNEAc to accelerate the computation of second-order partials needed in DDP as compared to conventional approaches. The final outcome is that the proposed methods allow for the computation of the needed second-order dynamics information in the same computational complexity as in iLQR.

The remainder of this letter is organized as follows. Section 2 provides rigid-body dynamics preliminaries and discusses a DDP algorithm for hybrid systems with fixed mode sequences. Sections 3 and 4 detail the proposed approach for reducing the computational demand of including second-order dynamics information in DDP with contact-constrained systems. Section 5 presents results and an application to optimizing a quadruped bounding gait. Finally, a discussion is provided in Sec. 6, with Sec. 7 concluding and offering future directions.

2 Background: Trajectory Optimization

This work considers the efficient solution of a finite-horizon OCP for a hybrid system model with a fixed mode sequence. Section 2.1 reviews the formulation for unconstrained and constrained rigid-body dynamics. Finally, in Sec. 2.2, we introduce our DDP formulation for hybrid dynamics.

2.1 Dynamics Preliminaries.

The inverse dynamics of a fully-actuated and unconstrained rigid-body system can be computed with the RNEA [20,21] as
where H, q, τ, and ag denote the inertia matrix, generalized coordinates, joint torques, and gravitational vector, respectively. The term h is the joint-space bias vector, and it groups the Coriolis and centrifugal vector Cq˙ and gravitational components τg together. The RNEA can be evaluated with O(n) complexity, where n is the number of DOFs in the model. The RNEA is a two-pass algorithm where the forward pass propagates the velocity and acceleration terms, and the backward pass determines forces at the joints.
Consider a model experiencing contacts with the environment, which impose constraints on the free dynamics. Given the contact status (also called the contact mode herein), the dynamics can be determined via
(1)
where S denotes the selector matrix that picks out the actuated joints. The terms JcRnc×n and λRnc represent the contact Jacobian and the corresponding contact forces, where nc = 3 for a point contact. The matrix K on the left is typically known as the Karush–Kuhn–Tucker (KKT) matrix [22]. We group the joint acceleration and contact forces as ν=[q¨λ] and group the right-hand side of Eq. (1) as a vector Ψ such that the solution for Eq. (1) can be given as
(2)
where F represents the realized continuous joint acceleration function (i.e., forward dynamics) and g is the realized contact forces function. In flight (i.e., no contacts, nc = 0), Eq. (2) resolves to the unconstrained dynamics Fq¨=H1(Sτh), with g=.
For impact events, the dynamics take the same form as Eq. (1) with the exception that velocities change instantaneously at each impact event such that
(3)
where q˙+ and q˙ denote the pre- and post- impact velocities, respectively. The term, λ^, denotes the impulsive force that acts upon contact and e denotes the coefficient of restitution. We presume perfect inelastic collision such that e = 0. Since actuators cannot generate impulsive torques, control inputs do not appear in the impact equation.

2.2 Hybrid Systems DDP Algorithm.

Consider a model with state x=[qq˙], control input u = τ, and ground reaction forces λ. For each contact mode, the continuous state and control trajectories can be discretized using a numerical integration scheme, with forward Euler assumed here
(4)
where h is the integration step size.
DDP/iLQR is used to solve an OCP with a fixed sequence of m modes such that the total cost J(U) is a summation across each mode, given as
(5)
Here, i is the running cost, U = [U1, …, Um] is the control sequence over the horizon, Ui is the control sequence in mode i. The term Φi is the terminal cost at the end of mode i, which runs from index Ni to Ni+1, with Ni and Ni+ denoting the instants before and after the transition to mode i. To obtain an optimal trajectory, Eq. (5) is minimized across modes as
(6a)
(6b)
(6c)
(6d)
(6e)
(6f)
where fi and gi denote the dynamics and ground reaction force functions in each mode. Equation (6b) represents the dynamics constraints, (6c) the ground reaction force constraint, and (6d) the impact reset map with fImp the reset map equation. Equation (6e) is a switching constraint requiring the foot to be on the contact surface at any instant of impact. Finally, we group all other constraints, such as torque limits, non-negative normal ground reaction force, and friction constraints, into (6f).
By solving Eq. (5), DDP/iLQR provides the optimal control tape as
where the superscript implies an optimal quantity. Rather than optimizing over the entire control tape, DDP/iLQR solves (5) by optimizing a control policy at each time point, and working backwards in time using Bellman’s principle [23]. This process recursively provides a value function approximation as
(7)
whereVN(xN)=Φ(xN)
where the function, Qk(xk, uk), captures the cost to go when starting in state xk at time k, taking action uk, and then acting optimally thereafter. Consider the differential change to Qk in (7) around a nominal state-control pair x¯k and u¯k with δQk(δxk,δuk)=Qk(x¯k+δxk,u¯k+δuk)Qk(x¯k,u¯k), the second-order approximation of the Q function leads to
(8)
The prime in Eq. (8) denotes the next time-step, i.e., Vxx′ = Vxx(k + 1), whereas the subscripts indicate partial derivatives. Throughout this text, [[]]r denotes a rank r tensor, with a rank three tensor assumed when r is omitted. When the [[]] terms are ignored above, i.e., when second-order partials of the dynamics are ignored, the resulting algorithm is known as iLQR [8,22,24]. In DDP, [][[]] implies a tensor contraction with a vector, which returns a matrix. Later in this development, we develop a framework that directly computes this tensor-vector contraction without ever forming the tensor.
Minimizing (7) over δuk attains the incremental control
(9)
(10)
The resulting control from Eq. (10) is used to form the quadratic approximation of the value function as
where ΔV (·) is the expected cost reduction. This process is repeated until a value function approximation is obtained at time k = 0, constituting the backward sweep of DDP. Following this backward sweep, a forward sweep proceeds by simulating the system forward under the incremental control policy (10), resulting in a new state-control trajectory [10]. The sweeps are repeated to convergence.

For the impact event (6d), we consider the quadratic value function approximation across the impact using similar formulas as in Ref. [14] that represent a natural generalization of Eq. (8). The ground penetration constraint (6d) is considered using an augmented Lagrangian approach from Refs. [14,25], whereas other constraints such as torque limits, friction cone constraints, etc. (6f) use a relaxed barrier (ReB) method [9,14,26]. We refer the interested reader to Ref. [14] for detailed derivations and implementation details of how these methods work together.

3 Reducing Complexity in Hybrid Systems DDP

Within DDP/iLQR, the computation of the dynamics partials is the most time intensive operations. We present an approach for reducing the computational complexity of evaluating the tensorial parts of the Q − coefficients in Eq. (8). We first review conventional methods of computing partials for DDP, and then set the stage for our contributions of computing them efficiently.

3.1 Conventional Partials.

Within DDP, we need to account for both the first- and second-order partial derivatives of Eq. (2). Let z and y be vectors representing q,q˙, or τ such that the first-order partials of the dynamics, Kν=Ψ with respect to z are given by
(11)
When only the first-order partials (11) are included within the DDP algorithm, the resulting approach is known as iLQR. While we note that the functional form of Eq. (11) will prove important later on in this development, in implementation, AD tools (e.g., Ref. [27]) can be used once on any algorithm that computes ν=K1Ψ, to achieve the desired result νz. For a fixed number of contacts, this operation can be implemented to have O(n2) complexity. This complexity is the lowest possible since the operation provides the partials of the n entries of ν with respect to the n entries of the z vector. This approach is illustrated in Fig. 1.
Fig. 1
(a) The dynamics function, (b) conventional iLQR, and (c) conventional DDP approach.
Fig. 1
(a) The dynamics function, (b) conventional iLQR, and (c) conventional DDP approach.
Close modal
The second-order partials of Kν=Ψ, for any combination of z and y, are given as
(12)
Note that A=0 when yq and B=0 when zq. The result of Eq. (12) is a rank three tensor. For example, [[2Ψzy]] considers the n2 partials of the n entries of Ψ due to the n entries of z and y. The term [[2Kzy]]4 is a rank four tensor contracted by a vector ν to attain a rank three tensor.

Remark 1 As formulated, Eq. (12) involves several matrix-tensor, tensor-vector, and tensor transpose operations. Their implementation details are left out in this letter, though we acknowledge that most operations require careful convention choices for the chain rule operations to provide correct results.

For implementation, AD tools can be used twice on νK1Ψ to achieve the desired results. In using AD tools twice, computing [[2νzy]] is an O(n3) operation to obtain the partials of the n entries of ν with respect to the n entries of the z and the n entries of the y vectors. This approach is illustrated in Fig. 1. Within the DDP context, the second-order partials will be contracted with the fixed vector γ[Vxλ] (see Eq. (8)) from the left. The tensor-contraction operation is also an O(n3) operation. Since this AD approach requires forming a tensor operator, whenever we use this method, the resulting approach is denoted as tensor DDP. We aim to reduce the O(n3) complexity of these second-order partials by exploiting their structure.

3.2 Proposed Refactoring for Efficient Computation.

Alternatively, consider the contraction of the second-order partials (12) with a fixed vector γ, as needed for DDP
(13)
Here, as we do not need partials of γK1, we consider it as a fixed vector (i.e., its partials with respect to z or y are zero). We define ξ ≜ − K−1γ. For the remainder of this development, we focus on the second-order partials with respect to z = y = q. We note that for these q partials, B=A and we rewrite Eq. (12) as
(14)
where ξ contracts each rank three tensor term to a matrix.

Rather than lumping several quantities into the K,ν, and Ψ terms in Eq. (14), we separate the functions into their constituent terms and partition ξ=[μπ], where μRn and πRnc. This enables us to rewrite Eq. (14) as

(15)
This new structural form for the second-order derivatives of contact-constrained dynamics represents the first contribution of this letter. The corresponding second-order partials with respect to q,q˙, and τ take the same general form as (15) and their details are included in Appendix  A.

4 Efficient Partials Via Modifications to the RNEA

The structural form of T1 in Eq. (15) points toward the development of a variant of the RNEA that we call the modified RNEA for contacts (mRNEAc), which provides
(16)
where ac=Jcq¨+J˙cq˙ gives the contact acceleration. Both inputs μ and π are considered as fixed vectors (i.e., such that taking their partials with respect to q is zero). This definition (16) is motivated by the fact that then
(17)
Further, consider quantities q¨q and λq that are provided as fixed matrices. With these fixed matrices as inputs to the mRNEAc, and taking the partials of the resulting output, we also obtain that
(18)
Note that the outputs of the mRNEAc require similar components as the RNEA, but with the inclusion of the effects of contacts forces, Jcλ, and the contact acceleration, J˙cq˙+Jcq¨. In Ref. [19], a modified RNEA (mRNEA) was introduced that outputs μτ. The mRNEAc is an extension of that algorithm to include contact effects, and is presented in Algorithm 1 using Featherstone’s rigid-body dynamics notation [21].2 The mRNEAc is an O(n) one-pass algorithm and thus has a simpler computation graph compared to RNEA. This development represents the second technical contribution in this letter.

Modified RNEA Algorithm for Contacts (mRNEAc)

Algorithm 1

Require: model,q,q˙,q¨,ag,λ,μ,π

1: v0=0,a0=0,ag0=ag,w0=0,s=0

2: fori=1 to Nbodiesdo

3:  vi=iXp(i)vp(i)+Siq˙i

4:  wi=iXp(i)wp(i)+Siμi

5:  agi=iXp(i)agp(i)

6:  ai=iXp(i)ap(i)+(vi×)Siq˙i+Siq¨i

7:  Fi=Ii(aiagi)+(vi×*)Iiviλi

8:  s+=wiFiπiai

9: end for

10: return[s=μTτπac]

Additionally, this O(n) algorithm enables the use of reverse-mode AD [28] tools. Reverse AD provides a general-purpose approach to compute the gradient of any scalar-valued function in the same complexity as the evaluation of the function itself. This result is also known as the “cheap gradient principle,” where the computational cost of evaluating reverse AD is bounded above by a small constant (at most five) times the cost of evaluating the function itself [28].

The result is that by using the mRNEAc with reverse AD, we can efficiently compute all the terms in Eq. (15) without resorting to tensor operations, and in lower computational complexity than tensor DDP. As a key outcome, the term T1 can be calculated in O(n2) complexity. First, reverse-mode AD can be used to compute qmRNEAc() in O(n). Then, AD can be used again for the partials of that result, setting the complexity at O(n2). The term T2 can also be calculated in O(n2) complexity, but by running mRNEAc n times with the n columns of q¨q as input. Taking the partials of that result using AD sets the complexity of this operation at O(n2). This approach is illustrated in Fig. 2.

Fig. 2
(a) The dynamics function and (b) the proposed mRNEAc DDP approach for the second-order partials of γ⊤∂2ν∂q2. Details of the other second-order partials are included in Appendix A.
Fig. 2
(a) The dynamics function and (b) the proposed mRNEAc DDP approach for the second-order partials of γ⊤∂2ν∂q2. Details of the other second-order partials are included in Appendix A.
Close modal

The effect is that all the second-order partials are calculated efficiently in O(n2) and without requiring any tensor operations. Overall, this approach reduces the computational complexity of taking the second-order partials needed in DDP as compared to applying tensor contraction. This development represents the third technical contribution in this letter, with this approach for DDP termed mRNEAc DDP.

5 Results

5.1 Dynamics Partials.

First, we evaluate the proposed methods on an underactuated n-link pendubot model (Fig. 3) whose last link is pinned to the ground through a revolute joint. The n-link pendubot model allows us to test the scalability of the proposed methods with an increasing number of DOFs. Our analysis was carried out using matlab with CasADi [27], which allows for rapid and efficient testing of AD approaches. We first compare the computation time of the different methods for evaluating partials for iLQR/DDP.

Fig. 3
(a) The pinned pendubot model and (b) the planar quadruped model.
Fig. 3
(a) The pinned pendubot model and (b) the planar quadruped model.
Close modal

The computation time for evaluating the dynamics partials is shown in Fig. 4, where each point represents the average time over ten calls. Figure 4 is presented on a log-log scale, and as such, any curve of the form y = C xp will appear as a line log(y) = log(C) + p log(x) whose slope corresponds to the polynomial power. As such, the slope of the plotted lines represents the empirically observed polynomial order of the algorithm. The evaluation of the second-order partials using Tensor DDP took the longest time. The computation of the second-order partials for mRNEAc DDP (slope of 2.28) showed an order reduction compared to Tensor DDP (slope of 3.13). As shown, the time to compute the second-order partials for mRNEAc DDP was a comparable order of magnitude as the first-order partials of iLQR (slope of 2.21). Here, the computational cost of calculating the second-order partials for mRNEAc DDP was at most 1.39 times the cost of evaluating first-order partials for iLQR. We note, however, that compared to first-order partials, second-order partials involve proportionally more mathematical operations. As such, the computation of the second-order partials for mRNEAc DDP takes slightly more time compared to first-order partials for iLQR. The computation time benefits for DDP were especially more apparent for systems with a higher number of DOFs, indicating the potential of the proposed methods to include second-order information into TO algorithms for legged robots or other high-DOF systems.

Fig. 4
The computation time of the dynamics partials using the proposed methods for the pinned pendubot model with varying DoFs. The slope of each plot is also illustrated.
Fig. 4
The computation time of the dynamics partials using the proposed methods for the pinned pendubot model with varying DoFs. The slope of each plot is also illustrated.
Close modal

5.2 Trajectory Optimization.

Further, we consider TO for a bounding gait with a planar quadruped—a 2D model of the MIT Mini Cheetah [29]. The model of the system (see Fig. 3) has five links (torso, and two links per leg) and has n = 7 DOFs. The bounding sequence is encoded via a design of running and terminal costs in each mode that aim to track a reference configuration, and are given as
Here, xref,i refers to a pre-defined reference configuration in mode i and λref,i refers to reference ground reaction forces. The terms Q, R, and S are weighting matrices for the state, control, and contact forces, respectively, in each mode. As such, this formulation aims to attain a pre-defined configuration while reducing energy consumption and attaining the requisite ground reaction forces. The state weighting matrix, Qf,i, in the terminal cost penalizes deviations away from the desired final state in each mode.

The resulting motion from TO via iLQR/DDP is illustrated in Fig. 5. The robot starts in the back-stance mode and runs forward at a speed of 0.5 m/s. The initial guess trajectory for this motion was constructed via a heuristic controller that implements proportional-derivative (PD) control in the flight mode to achieve a pre-defined configuration. In the stance mode, the heuristic controller calculates stance leg forces following a SLIP model and converts the obtained ground reaction forces to joint torques. The initial states assume the robot to be moving forward at 0.5 m/s. The hybrid systems iLQR/DDP algorithm [14] considers ground penetration, torque limits, non-negative normal ground reaction force, and friction cone constraints. The resulting motion respects those constraints, as shown in Fig. 6, with snapshots of the final motion included in Fig. 5.

Fig. 5
Snapshots of the resulting optimized motion by iLQR/DDP
Fig. 5
Snapshots of the resulting optimized motion by iLQR/DDP
Close modal
Fig. 6
(a) Ground reaction forces satisfying the friction cone constraint for both iLQR/DDP solutions and (b) joint torque control satisfying the control limits. DDP and iLQR converged to the same solution.
Fig. 6
(a) Ground reaction forces satisfying the friction cone constraint for both iLQR/DDP solutions and (b) joint torque control satisfying the control limits. DDP and iLQR converged to the same solution.
Close modal

Further, we consider the timing demands of the iLQR/DDP approaches. We consider 50 different initial trajectories, where the iLQR and the DDP variants were optimized for 50 iterations. As shown in Fig. 7, iLQR on average took the least time as compared to the DDP variants, whereas tensor DDP took the most time. As compared to iLQR, on average, mRNEAc DDP took approximately 3.32 times more computation time than iLQR, whereas tensor DDP took 28.3 times more computation time. The evaluation of the dynamics partials (see Fig. 4) via the full mRNEAc DDP took comparatively more time over iLQR since DDP has to first compute the first-order partials and then the second-order partials. Further, DDP often requires a regularization scheme [8] and may occasionally repeat the backward pass to ensure effective cost reduction, which requires additional time.

Fig. 7
Timing comparison of the proposed methods as compared to the conventional methods
Fig. 7
Timing comparison of the proposed methods as compared to the conventional methods
Close modal

6 Discussion

The proposed DDP approach provides a reduction in the computational complexity and computation time as compared to the conventional DDP approach. The computational complexity of our method for computing second-order derivatives in DDP was the same as computing first-order derivatives for iLQR. Further, full second-order DDP offers quadratic convergence from near-optimal trajectories whereas iLQR loses that quadratic convergence property. DDP also retains better local fidelity to the dynamics model and better captures the nonlinear effects of the model. As such, for a robot facing modeling inaccuracy (e.g., slight terrain variation) in its contacts with the environment, the increased local fidelity of the robot model could help prevent falls. Moreover, in unstructured environments, when terrain information is known, reasoning about the full model with our methods could help the robot more quickly tailor its trajectories to the environment thanks to DDP’s quadratic convergence.

7 Conclusions and Future Directions

This paper presented several advances that accelerate the inclusion of second-order information into DDP for application to systems making rigid contact with the environment. This work leveraged reverse-mode AD tools and a refactoring of the RNEA known as mRNEAc to efficiently compute second-order partials while avoiding a costly vector-tensor contraction operation. The result presented is a DDP approach that computes second-order partials in the same complexity as first-order partials in iLQR. As compared to conventional DDP, the proposed DDP greatly reduces the computation overhead.

Several opportunities motivate the next steps of this work. First, for conciseness sake, Forward Euler was used as the integration scheme, though we also see an opportunity to extend this work to implicit [30,31] and predictor-corrector integration schemes to further improve numerical integration stability. Second, this work focused on the technical derivations of our contributions and we see value in validating this work on legged robot hardware, which would represent the first use case of full DDP on a legged robot. Third, while this work considers a single-shooting DDP framework, several methods in the literature [16,32] have shown the potential of a multi-shooting framework to reduce TO’s sensitivity to initial conditions. We see the potential to use the same approaches as proposed herein within a multi-shooting framework. Finally, we note that DDP has in theory been shown to exhibit quadratic convergence from near-optimal trajectories [33]. As such, we consider an opportunity for TO to exploit that convergence property with robots operating in uncertain and varied terrains. We also consider theoretical underpinnings on whether that property holds for hybrid systems DDP, as the original DDP algorithm [10] currently only has theoretical convergence guarantees for smooth dynamics.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

Footnote

2

There are a few subtleties to make use of λ and π in the algorithm, which are discussed in Appendix  B.

Appendix A: All Second-Order Partials

This appendix provides the definitions of the partials 2νzy where z and y can be q, q˙, or τ. The derivation of 2νq2 has been provided in Eqs. (15), (17), and (18). Derivations for the other second-order partials follow similarly. We include the results of all nonzero second-order partials below.

Second-order partials with respect to q,q˙:
Second-order partials with respect to q˙,q˙:
Second-order partials with respect to q, τ:

Appendix B: Details of π and λ Transformations

In this appendix, we discuss how to make proper use of λ and π within the mRNEAc algorithm. For simplicity, we consider the case of a single point-contact constraint, while the development cleanly generalizes to other contact scenarios (e.g., line contact, planar contact, etc.). In Eq. (14), we had that ξ=[μπ]. Looking at T1 in Eq. (15), we observe that πRnc (where nc = 3 for point contacts) pre-multiplies the Cartesian contact acceleration ac=J˙cq˙+Jcq¨. Further, we note that the calculations within the RNEA determine spatial (i.e., 6D) velocities vi and accelerations ai=[ω˙i,v˙i] of each body i [21], where ω˙i is the angular acceleration of a frame attached to the body, and v˙i the rate of change in the body-frame velocity of its origin.

We can use this information to calculate the acceleration of any contact points. For example, assuming zero contact velocity, the acceleration of contact point c on body i can be calculated by ac=v˙i+ω˙i×pc/i. We can thus define a matrix ZRnc×6 such that ac=Zai. If we define the spatial vector πi=ZπR6, we have that
where πi is used in the mRNEAc. Likewise, we can convert contact forces λRnc to a spatial (6D) force/moment pair via defining λi=ZλR6, which is also used in the mRENAc. This construction generalizes readily to multiple contacts, where bodies not in contact are assigned πi = 0 and λi = 0 within the mRNEAc. We note that the above transformations can be most easily obtained by placing the body-fixed reference frame at the contact point.

References

1.
Apgar
,
T.
,
Clary
,
P.
,
Green
,
K.
,
Fern
,
A.
, and
Hurst
,
J. W.
,
2018
, “
Fast Online Trajectory Optimization for the Bipedal Robot Cassie
,”
Robotics: Science and Systems
,
Vol. 101
, pp.
14
.
2.
Wensing
,
P. M.
, and
Orin
,
D. E.
,
2013
, “
High-Speed Humanoid Running Through Control With a 3D-SLIP Model
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
,
Tokyo, Japan
,
Nov. 3–7
, pp.
5134
5140
.
3.
Koditschek
,
D. E.
, and
Full
,
R. J.
,
1999
, “
Templates and Anchors: Neuromechanical Hypotheses of Legged Locomotion on Land
,”
J. Exp. Biol.
,
202
(
23
), pp.
3325
3332
.
4.
Hutter
,
M.
,
Remy
,
C. D.
,
Höpflinger
,
M. A.
, and
Siegwart
,
R.
,
2010
, “
SLIP Running With an Articulated Robotic Leg
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
,
Taipei, Taiwan
,
Oct. 18–22
, pp.
4934
4939
.
5.
Dai
,
H.
,
Valenzuela
,
A.
, and
Tedrake
,
R.
,
2014
, “
Whole-Body Motion Planning With Centroidal Dynamics and Full Kinematics
,”
IEEE-RAS International Conference on Humanoid Robots
,
Madrid, Spain
,
Nov. 18–20
, pp.
295
302
.
6.
Bledt
,
G.
,
Powell
,
M. J.
,
Katz
,
B.
,
Di Carlo
,
J.
,
Wensing
,
P. M.
, and
Kim
,
S.
,
2018
, “
MIT Cheetah 3: Design and Control of a Robust, Dynamic Quadruped Robot
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
,
Madrid, Spain
,
Oct. 1–5
, pp.
2245
2252
.
7.
Bellicoso
,
C. D.
,
Jenelten
,
F.
,
Gehring
,
C.
, and
Hutter
,
M.
,
2018
, “
Dynamic Locomotion Through Online Nonlinear Motion Optimization for Quadrupedal Robots
,”
IEEE Robot. Autom. Lett.
,
3
(
3
), pp.
2261
2268
.
8.
Tassa
,
Y.
,
Erez
,
T.
, and
Todorov
,
E.
,
2012
, “
Synthesis and Stabilization of Complex Behaviors Through Online Trajectory Optimization
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
,
Vilamoura-Algarve, Portugal
,
Oct. 7–12
, pp.
4906
4913
.
9.
Grandia
,
R.
,
Farshidian
,
F.
,
Ranftl
,
R.
, and
Hutter
,
M.
,
2019
, “
Feedback MPC for Torque-Controlled Legged Robots
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
,
Macau, China
,
Nov. 3–8
, pp.
4730
4737
.
10.
Mayne
,
D.
,
1966
, “
A Second-Order Gradient Method for Determining Optimal Trajectories of Non-Linear Discrete-Time Systems
,”
Int. J. Control.
,
3
(
1
), pp.
85
95
.
11.
Tassa
,
Y.
,
Mansard
,
N.
, and
Todorov
,
E.
,
2014
, “
Control-Limited Differential Dynamic Programming
,”
IEEE International Conference on Robotics and Automation
,
Hong Kong, China
,
May 31–June 7
, pp.
1168
1175
.
12.
Lantoine
,
G.
, and
Russell
,
R. P.
,
2012
, “
A Hybrid Differential Dynamic Programming Algorithm for Constrained Optimal Control Problems. Part 1: Theory
,”
J. Optim. Theory Appl.
,
154
(
2
), pp.
382
417
.
13.
Howell
,
T.
,
Jackson
,
B.
, and
Manchester
,
Z.
,
2019
, “
Altro: A Fast Solver for Constrained Trajectory Optimization
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
,
Macau, China
,
Nov. 3–8
, pp.
7674
7679
.
14.
Li
,
H.
, and
Wensing
,
P. M.
,
2020
, “
Hybrid Systems Differential Dynamic Programming for Whole-Body Motion Planning of Legged Robots
,”
IEEE Robot. Autom. Lett.
,
5
(
4
), pp.
5448
5455
.
15.
Kong
,
N. J.
,
Council
,
G.
, and
Johnson
,
A. M.
,
2021
, “
iLQR for Piecewise-Smooth Hybrid Dynamical Systems
,”
IEEE Conference on Decision and Control
,
Austin, TX
,
Dec. 14–17
, pp.
5374
5381
.
16.
Tang
,
Y.
,
Chu
,
X.
, and
Au
,
K.
,
2021
, “
HM-DDP: A Hybrid Multiple-Shooting Differential Dynamic Programming Method for Constrained Trajectory Optimization
,”
arXiv preprint
. https://arxiv.org/abs/2109.07131
17.
Li
,
H.
,
Frei
,
R. J.
, and
Wensing
,
P. M.
,
2021
, “
Model Hierarchy Predictive Control of Robotic Systems
,”
IEEE Robot. Autom. Lett.
,
6
(
2
), pp.
3373
3380
.
18.
Li
,
W.
, and
Todorov
,
E.
,
2004
, “
Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems
,”
ICINCO
(
1
), pp.
222
229
.
19.
Nganga
,
J.
, and
Wensing
,
P. M.
,
2021
, “
Accelerating Second-Order Differential Dynamic Programming for Rigid-Body Systems
,”
IEEE Robot. Autom. Lett.
,
6
(
4
), pp.
7659
7666
.
20.
Orin
,
D. E.
,
McGhee
,
R.
,
Vukobratović
,
M.
, and
Hartoch
,
G.
,
1979
, “
Kinematic and Kinetic Analysis of Open-Chain Linkages Utilizing Newton-Euler Methods
,”
Math Biosci.
,
43
(
1–2
), pp.
107
130
.
21.
Featherstone
,
R.
,
2014
,
Rigid Body Dynamics Algorithms
,
Springer
,
New York
.
22.
Budhiraja
,
R.
,
2019
, “
Multi-Body Locomotion: Problem Structure and Efficient Resolution
,” Ph.D. thesis,
Institut national des sciences appliquées de Toulouse
,
Toulouse, France
.
23.
Denardo
,
E. V.
,
2012
,
Dynamic Programming: Models and Applications
,
Courier Corporation
,
Englewood Cliffs, NJ
, pp.
1
227
.
24.
Budhiraja
,
R.
,
Carpentier
,
J.
,
Mastalli
,
C.
, and
Mansard
,
N.
,
2018
, “
Differential Dynamic Programming for Multi-Phase Rigid Contact Dynamics
,”
IEEE-RAS International Conference on Humanoid Robots
,
Beijing, China
,
Nov. 6–9
, pp.
1
9
.
25.
Farshidian
,
F.
,
Neunert
,
M.
,
Winkler
,
A. W.
,
Rey
,
G.
, and
Buchli
,
J.
,
2017
, “
An Efficient Optimal Planning and Control Framework for Quadrupedal Locomotion
,”
IEEE International Conference on Robotics and Automation
,
Singapore
,
May 29–June 3
, pp.
93
100
.
26.
Hauser
,
J.
, and
Saccon
,
A.
,
2006
, “
A Barrier Function Method for the Optimization of Trajectory Functionals With Constraints
,”
IEEE Conference on Decision and Control
,
San Diego, CA
,
Dec. 13–15
, pp.
864
869
.
27.
Andersson
,
J. A.
,
Gillis
,
J.
,
Horn
,
G.
,
Rawlings
,
J. B.
, and
Diehl
,
M.
,
2019
, “
CasADi: A Software Framework for Nonlinear Optimization and Optimal Control
,”
Math. Program. Comput.
,
11
(
1
), pp.
1
36
.
28.
Grievank
,
A.
,
2000
,
Principles and Techniques of Algorithmic Differentiation: Evaluating Derivatives
,
SIAM
,
Philadelphia
.
29.
Katz
,
B.
,
Di Carlo
,
J.
, and
Kim
,
S.
,
2018
, “
Mini Cheetah: A Platform for Pushing the Limits of Dynamic Quadruped Control
,”
IEEE International Conference on Robotics and Automation
,
Madrid, Spain
,
Oct. 1–5
, pp.
6295
6301
.
30.
Chatzinikolaidis
,
I.
, and
Li
,
Z.
,
2021
, “
Trajectory Optimization of Contact-Rich Motions Using Implicit Differential Dynamic Programming
,”
IEEE Robot. Autom. Lett.
,
6
(
2
), pp.
2626
2633
.
31.
Jallet
,
W.
,
Mansard
,
N.
, and
Carpentier
,
J.
,
2022
, “
Implicit Differential Dynamic Programming
,”
International Conference on Robotics and Automation
,
Philadelphia, PA
,
May 23–27
, pp.
1455
1461
.
32.
Pellegrini
,
E.
, and
Russell
,
R. P.
,
2017
, “
Applications of the Multiple-Shooting Differential Dynamic Programming Algorithm With Path and Terminal Constraints
,”
AAS/AIAA Astrodynamics Specialist Conference
,
Stevenson, WA
,
Aug. 20–27
.
33.
Liao
,
L. Z.
, and
Shoemaker
,
C. A.
,
1991
, “
Convergence in Unconstrained Discrete-Time Differential Dynamic Programming
,”
IEEE. Trans. Autom. Contr.
,
36
(
6
), pp.
692
706
.