四轴飞行器算法研究下载_在线阅读_6

is_932206

暂无简介

四轴飞行器算法研究 Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode vs. Reinforcement Learning∗ Steven L. Waslander†, Gabriel M. Hoffmann†,‡ Ph.D. Candidate Aeronautics and Astronautics Stanford University {stevenw, gabeh}@stanford.edu Jung Soon Jang Research A...

Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode vs. Reinforcement Learning∗ Steven L. Waslander†, Gabriel M. Hoffmann†,‡ Ph.D. Candidate Aeronautics and Astronautics Stanford University {stevenw, gabeh}@stanford.edu Jung Soon Jang Research Associate Aeronautics and Astronautics Stanford University jsjang@stanford.edu Claire J. Tomlin Associate Professor Aeronautics and Astronautics Stanford University tomlin@stanford.edu Abstract—The Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is a multi-vehicle testbed currently comprised of two quadrotors, also called X4-ﬂyers, with capacity for eight. This paper presents a comparison of control design techniques, speciﬁcally for outdoor altitude control, in and above ground effect, that accommodate the unique dynamics of the aircraft. Due to the complex airﬂow in- duced by the four interacting rotors, classical linear techniques failed to provide sufﬁcient stability. Integral Sliding Mode and Reinforcement Learning control are presented as two design techniques for accommodating the nonlinear disturbances. The methods both result in greatly improved performance over classical control techniques. I. INTRODUCTION As ﬁrst introduced by the authors in [1], the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Con- trol (STARMAC) is an aerial platform intended to validate novel multi-vehicle control techniques and present real-world problems for further investigation. The base vehicle for STARMAC is a four rotor aircraft with ﬁxed pitch blades, referred to as a quadrotor, or an X4-ﬂyer. They are capable of 15 minute outdoor ﬂights in a 100m square area [1]. Fig. 1. One of the STARMAC quadrotors in action. There have been numerous projects involving quadrotors to date, with the ﬁrst known hover occurring in October, 1922 [2]. Recent interest in the quadrotor concept has been sparked by commercial remote control versions, such as the ∗Research supported by ONR under the MURI contract N00014-02-1- 0720 called “CoMotion: Computational Methods for Collaborative Motion”, by the NASA Joint University Program under grant NAG 2-1564, and by NASA grant NCC 2-5536. † These authors contributed equally to this work. ‡Funding provided by National Defense Science and Engineering Grant. DraganFlyer IV [3]. Many groups [4]–[7] have seen signif- icant success in developing autonomous quadrotor vehicles. To date, however, STARMAC is the only operational multi- vehicle quadrotor platform capable of autonomous outdoor ﬂight, without tethers or motion guides. The ﬁrst major milestone for STARMAC was autonomous hover control, with closed loop control of attitude, altitude and position. Using inertial sensing, the attitude of the aircraft is simple to control, by applying small variations in the relative speeds of the blades. In fact, standard integral LQR techniques were applied to provide reliable attitude stability and tracking for the vehicle. Position control was also achieved with an integral LQR, with careful design in order to ensure spectral separation of the successive loops. Unfortunately, altitude control proves less straightforward. There are many factors that affect the altitude loop specif- ically that do not amend themselves to classical control techniques. Foremost is the highly nonlinear and destabilizing effect of four rotor downwashes interacting. In our experi- ence, this effect becomes critical when motion is not damped by motion guides or tethers. Empirical observation during manual ﬂight revealed a noticeable loss in thrust upon descent through the highly turbulent ﬂow ﬁeld. Similar aerodynamic phenomenon for helicopters have been studied extensively [8], but not for the quadrotor, due to its relative obscurity and complexity. Other factors that introduce disturbances into the altitude control loop include blade ﬂex, ground effect and battery discharge dynamics. Although these effects are also present in generating attitude controlling moments, the differential nature of the control input eliminates much of the absolute thrust disturbances that complicate altitude control. Additional complications arise from the limited choice in low cost, high resolution altitude sensors. An ultrasonic ranging device [9] was used, which suffers from non- Gaussian noise—false echoes and dropouts. The resulting raw data stream includes spikes and echoes that are difﬁcult to mitigate, and most successfully handled by rejection of infeasible measurements prior to Kalman ﬁltering. In order to accommodate this combination of noise and disturbances, two distinct approaches are adopted. Integral Sliding Mode (ISM) control [10]–[12] takes the approach that the disturbances cannot be modeled, and instead designs a control law that is guaranteed to be robust to disturbances as long as they do not exceed a certain magnitude. Model-based reinforcement learning [13] creates a dynamic model based on recorded inputs and responses, without any knowledge of the underlying dynamics, and then seeks an optimal control law using an optimization technique based on the learned model. This paper presents an exposition of both methods and contrasts the techniques from both a design and implementation point of view. II. SYSTEM DESCRIPTION STARMAC consists of a ﬂeet of quadrotors and a ground station. The system communicates over a Bluetooth Class 1 network. The core of the aircraft are microcontroller circuit boards designed and assembled at Stanford, for this project. The microcontrollers run real-time control code, interface with sensors and the ground station, and supervise the system. The aircraft are capable of sensing position, attitude, and proximity to the ground. The differential GPS receiver is the Trimble Lassen LP, operating on the L1 band, providing 1 Hz updates. The IMU is the MicroStrain 3DM-G, a low cost, light weight IMU that delivers 76 Hz attitude, attitude rate, and acceleration readings. The distance from the ground is found using ultrasonic ranging at 12 Hz. The ground station consists of a laptop computer, to interface with the aircraft, and a GPS receiver, to provide differential corrections. It also has a battery charger, and joysticks for control-augmented manual ﬂight, when desired. III. QUADROTOR DYNAMICS The derivation of the nonlinear dynamics is performed in North-East-Down (NED) inertial and body ﬁxed coordinates. Let {eN, eE, eD} denote the inertial axes, and {xB,yB, zB} denote the body axes, as deﬁned in Figure 2. Euler angles of the body axes are {φ, θ, ψ} with respect to the eN, eE and eD axes, respectively, and are referred to as roll, pitch and yaw. Let r be deﬁned as the position vector from the inertial origin to the vehicle center of gravity (CG), and let ωB be deﬁned as the angular velocity in the body frame. The current velocity direction is referred to as ev in inertial coordinates. Fig. 2. Free body diagram of a quadrotor aircraft. The rotors, numbered 1− 4, are mounted outboard on the xB, yB, −xB and −yB axes, respectively, with position vectors ri with respect to the CG. Each rotor produces an aerodynamic torque, Qi, and thrust, Ti, both parallel to the rotor’s axis of rotation, and both used for vehicle control. Here, Ti ≈ ui kt1+0.1s , where ui is the voltage applied to the motors, as determined from a load cell test. In ﬂight, Ti can vary greatly from this approximation. The torques, Qi, are proportional to the rotor thrust, and are given by Qi = krTi. Rotors 1 and 3 rotate in the opposite direction as rotors 2 and 4, so that counteracting aerodynamic torques can be used independently for yaw control. Horizontal velocity results in a moment on the rotors, Ri, about −ev, and a drag force, Di, in the direction, −ev. The body drag force is deﬁned as DB , vehicle mass is m, acceleration due to gravity is g, and the inertia matrix is I ∈ R3×3. A free body diagram is depicted in Figure 2. The total force, F, and moment, M, can be summed as, F = −DBev + mgeD + 4∑ i=1 (−TizB −Diev) (1) M = 4∑ i=1 (QizB −Riev −Di(ri × ev) + Ti(ri × zB)) (2) The full nonlinear dynamics can be described as, mr¨ = F Iω˙B + ωB × IωB = M (3) where the total angular momentum of the rotors is assumed to be near zero, because they are counter-rotating. Near hover conditions, the contributions by rolling moment and drag can be neglected in Equations (1) and (2). Deﬁne the total thrust as T = ∑4 i=1 Ti. The translational motion is deﬁned by, mr¨ = F = −Rψ ·Rθ ·RφTzB + mgeD (4) where Rφ, Rθ, and Rψ are the rotation matrices for roll, pitch, and yaw, respectively. Applying the small angle ap- proximation to the rotation matrices, m ⎡ ⎣ r¨xr¨y r¨z ⎤ ⎦ = ⎡ ⎣ 1 ψ θψ 1 φ θ −φ 1 ⎤ ⎦ ⎡ ⎣ 00 −T ⎤ ⎦+ ⎡ ⎣ 00 mg ⎤ ⎦ (5) Finally, assuming total thrust approximately counteracts grav- ity, T ≈ T¯ = mg, except in the eD axis, m ⎡ ⎣ r¨xr¨y r¨z ⎤ ⎦ = ⎡ ⎣ 00 mg ⎤ ⎦+ ⎡ ⎣ 0 −T¯ 0T¯ 0 0 0 0 1 ⎤ ⎦ ⎡ ⎣ φθ T ⎤ ⎦ (6) For small angular velocities, the Euler angle accelerations are determined from Equation (3) by dropping the second order term, ω × Iω, and expanding the thrust into its four constituents. The angular equations become, ⎡ ⎣ Ixφ¨Iy θ¨ Izψ¨ ⎤ ⎦ = ⎡ ⎣ 0 l 0 −ll 0 −l 0 Kr −Kr Kr −Kr ⎤ ⎦ ⎡ ⎢⎢⎣ T1 T2 T3 T4 ⎤ ⎥⎥⎦ (7) where the moment arm length l = ||ri× zB|| is identical for all rotors due to symmetry. The resulting linear models can now be used for control design. IV. ESTIMATION AND CONTROL DESIGN Applying the concept of spectral separation, inner loop control of attitude and altitude is performed by commanding motor voltages, and outer loop position control is performed by commanding attitude requests for the inner loop. Accurate attitude control of the plant in Equation (7) is achieved with an integral LQR controller design to account for thrust biases. Position estimation is performed using a navigation ﬁlter that combines horizontal position and velocity information from GPS, vertical position and estimated velocity information from the ultrasonic ranger, and acceleration and angular rates from the IMU in a Kalman ﬁlter that includes bias estimates. Integral LQR techniques are applied to the horizontal com- ponents of the linear position plant described in Equation (6). The resulting hover performance is shown in Figure 6. As described above, altitude control suffers exceedingly from unmodeled dynamics. In fact, manual command of the throttle for altitude control remains a challenge for the authors to this day. Additional complications arise from the ultrasonic ranging sensor, which has frequent erroneous readings, as seen in Figure 3. To alleviate the effect of this noise, rejection of infeasible measurements is used to remove much of the non-Gaussian noise component. This is followed by altitude and altitude rate estimation by Kalman ﬁltering, which adds lag to the estimate. This section proceeds with a derivation of two control techniques that can be used to overcome the unmodeled dynamics and the remaining noise. 180 190 200 210 220 0 50 100 150 Time [s] So ni c Ra ng e [cm ] Fig. 3. Characteristic unprocessed ultrasonic ranging data, displaying spikes, false echoes and dropouts. Powered ﬂight commences at 185 seconds. A. Integral Sliding Mode Control A linear approximation to the altitude error dynamics of a quadrotor aircraft in hover is given by, x˙1 = x2 x˙2 = u + ξ(g, x) (8) where {x1, x2} = {(rz,des−rz), (r˙z,des−r˙z)} are the altitude error states, u = ∑4 i=1 ui is the control input, and ξ(·) is a bounded model of disturbances and dynamic uncertainty. It is assumed that ξ(·) satisﬁes ‖ξ‖ ≤ γ, where γ is the upper bounded norm of ξ(·). In early attempts to stabilize this system, it was observed that LQR control was not able to address the instability and performance degradation due to ξ(g, x). Sliding Mode Control (SMC) was adapted to provide a systematic approach to the problem of maintaining stability and consistent perfor- mance in the face of modeling imprecision and disturbances. However, until the system dynamics reach the sliding mani- fold, such nice properties of SMC are not assured. In order to provide robust control throughout the ﬂight envelope, the Integral Sliding Mode (ISM) technique is applied. The ISM control is designed in two parts. First, a standard successive loop closure is applied to the linear plant. Second, integral sliding mode techniques are applied to guarantee disturbance rejection. Let u = up + ud up = −Kpx1 −Kdx2 (9) where Kp and Kd are proportional and derivative loop gains that stabilize the linear dynamics without disturbances. For disturbance rejection, a sliding surface, s, is designed, s = s0(x1, x2) + z s0 = α(x2 + kx1) (10) such that state trajectories are forced towards the manifold s = 0. Here, s0 is a conventional sliding mode design, z is an additional term that enables integral control to be included, and α, k ∈ R are positive constants. Based on the following Lyapunov function candidate, V = 12s 2 , the control component, ud, can be determined such that V˙ < 0, guranteeing convergence to the sliding manifold. V˙ = ss˙ = s [ α(x˙2 + kx˙1) + z˙ ] = s [ α(up + ud + ξ(g, x) + kx2) + z˙ ] < 0 (11) The above condition holds if z˙ = −α(up + kx2) and ud can be guaranteed to satisfy, s [ ud + ξ(g, x) ] < 0, α > 0 (12) Since the disturbances, ξ(g, x), are bounded by γ, deﬁne ud to be ud = −λs with λ ∈ R. Equation (11) becomes, V˙ = s [ α(−λs + ξ(g, x) ] ≤ α [ − λ|s|2 + γ|s| ] < 0 (13) and it can be seen that λ|s| − γ > 0. As a result, for up and ud as above, the sliding mode condition holds when, |s| > γ λ (14) With the input derived above, the dynamics are guaranteed to evolve such that s decays to within the boundary layer, γ λ , of the sliding manifold. Additionally, the system does not suffer from input chatter as conventional sliding mode controllers do, as the control law does not include a switching function along the sliding mode. V. REINFORCEMENT LEARNING CONTROL An alternate approach is to implement a reinforcement learning controller. Much work has been done on continuous state-action space reinforcement learning methods [13], [14]. For this work, a nonlinear, nonparametric model of the system is ﬁrst constructed using ﬂight data, approximating the system as a stochastic Markov process [15], [16]. Then a model-based reinforcement learning algorithm uses the model in policy-iteration to search for an optimal control policy that can be implemented on the embedded microprocessors. In order to model the aircraft dynamics as a stochas- tic Markov process, a Locally Weighted Linear Regression (LWLR) approach is used to map the current state, S(t) ∈ R ns , and input, u(t) ∈ Rnu , onto the subsequent state esti- mate, Sˆ(t+1). In this application, S = [ rz r˙z r¨z V ], where V is the battery level. In the altitude loop, the input, u ∈ R, is the total motor power, u. The subsequent state mapping is the summation of the traditional LWLR estimate, using the current state and input, with the random vector, v ∈ Rns , representing unmodeled noise. The value for v is drawn from the distribution of output error as determined by using a maximum likelihood estimate [16] of the Gaussian noise in the LWLR estimate. Although the true distribution is not perfectly Gaussian, this model is found to be adequate. The LWLR method [17] is well suited to this problem, as it ﬁts a non-parametric curve to the local structure of the data. The scheme extends least squares by assigning weights to each training data point according to its proximity to the input value, for which the output is to be computed. The technique requires a sizable set of training data in order to reﬂect the full dynamics of the system, which is captured from ﬂights ﬂown under both automatic and manually controlled thrust, with the attitude states under automatic control. For m training data points, the input training samples are stored in X ∈ R(m)×(ns+nu+1), and the outputs correspond- ing to those inputs are stored in Y ∈ Rm×ns . These matrices are deﬁned as X = ⎡ ⎢⎣ 1 S(t1) T u(t1) T . . . . . . . . . 1 S(tm) T u(tm) T ⎤ ⎥⎦ , Y = ⎡ ⎢⎣ S(t1 + 1) T . . . S(tm + 1) T ⎤ ⎥⎦ (15) The column of ones in X enables the inclusion of a constant offset in the solution, as used in linear regression. The diagonal weighting matrix W ∈ Rm×m, which acts on X , has one diagonal entry for each training data point. That entry gives more weight to training data points that are close to the S(t) and u(t) for which Sˆ(t + 1) is to be computed. The distance measure used in this work is Wi,i = exp ( −||x(i) − x|| 2τ2 ) (16) where x(i) is the ith row of X , x is the vector [ 1 S(t) T u(t) T ], and ﬁt parameter τ is used to adjust the range of inﬂuence of training points. The value for τ can be tuned by cross validation to prevent over- or under-ﬁtting the data. Note that it may be necessary to scale the columns before taking the Euclidean norm to prevent undue inﬂuence of one state on the W matrix. The subsequent state estimate is computed by summing the LWLR estimate with v, Sˆ(t + 1) = ( X T WX ) −1 X T W T x + v (17) Because W is a continuous function of x and X , as x is varied, the resulting estimate is a continuous non-parametric curve capturing the local structure of the data. The matrix computations, in code, exploit the large diagonal matrix W ; as each Wi,i is computed, it is multiplied by row x(i), and stored in WX . The matrix being inverted is poorly conditioned, because weakly related data points have little inﬂuence, so their contribution cannot be accurately numerically inverted. To more accurately compute the numerical inversion, one can perform a singular value decomposition, (XTWX) = UΣV T . Then, numerical error during inversion can be avoided by using the n singular values σi with values of σmaxσi < Cmax, where the value of Cmax is chosen by cross validation. In this work, Cmax ≈ 10 was found to minimize numerical error, and was typically satisﬁed by n = 1. The inverse can be directly computed using the n upper singular values in the diagonal matrix Σn ∈ Rn×n, and the corresponding singular vectors, in Un ∈ Rm×n and Vn ∈ Rm×n. Thus, the stochastic Markov model becomes Sˆ(t + 1) = VnΣ −1 n U T nX T W T x + v (18) Next, model-based reinforcement learning is implemented, incorporating the stochastic Markov model, to design a controller. A quadratic reward function is used, R(S,Sref ) = −c1(rz − rz,ref ) 2 − c2r˙ 2 z (19) where R : R2ns → R, c1 > 0 and c2 > 0 are constants giving reward for accurate tracking and good damping re- spectively, and Sref = [ rz,ref r˙z,ref r¨z,ref Vref ] is the reference state desired for the system. The control policy maps the observed state S onto the input command u. In this work, the state space has the constraint of rz ≥ 0, and the input command has the constraint of 0 ≤ u ≤ umax. The control policy is chosen to be π(S,w) = w1 + w2(rz − rz,ref ) + w3r˙z + w4r¨z (20) where w ∈ Rnc is the vector of policy coefﬁcients w1, . . . , wnc . Linear functions were sufﬁcient to achieve good stability and performance. Additional terms, such as battery level and integral of altitude error, could be included to make the policy more resilient to differing ﬂight conditions. Policy iteration is performed as explained in Algorithm 1. The algorithm aims to ﬁnd the value of w that yields the greatest total reward Rtotal, as determined by simulating the system over a ﬁnite horizon from a set of random initial conditions, and summing the values of R(S,Sref ) at each state encountered. Algorithm 1 Model-Based Reinforcement Learning 1: Generate set S0 of random initial states 2: Generate set T of random reference trajectories 3: Initialize w to reasonable values 4: Rbest ← −∞, wbest ← w 5: repeat 6: Rtotal ← 0 7: for s0 ∈ S0, t ∈ T do 8: S(0) ← s0 9: for t = 0 to tmax − 1 do

本文档为【四轴飞行器算法研究】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

四轴飞行器算法研究

热门搜索

历史搜索