Neurocomputing 72 (2009) 3624–3630

1. Introduction

Procedures that synthesize behaviors rely on the availability of specifications in terms of the workspace, which may include a complete path or simply the initial and final positions of the controlled element. These specifications may be derived from direct human imposition or from path-planning methods, often offering kinematic solutions to the problem of achieving the desired motor behavior. Subsequently, robot control methods are required to compute torque-level actions that drive the mechanism according to the given specifications.

As an inheritance from experience in the control of robot manipulators, where the element that defines the workspace is the end-effector of a joint chain with a limited motion domain, most approaches to robot control consider a workspace reference as input data, computed off-line or generated within the methodology of the relevant approach, starting from purely kinematic information about the solution to the task. When workspace references are followed, the torque control actions are computed with the purpose of achieving desired joint accelerations, which implies compensation of the rigid-body dynamics, canceling the effects of gravity and other nonlinearities.

Some of the approaches described in [9] consider the kinematics and rigid-body dynamics of the robot in the generation of the workspace reference, but this is usually done to establish sufficient conditions for the accomplishment of the behavior, rather than to take advantage of the kinematics and rigid-body dynamics. An example of this is evidenced in the gait control system used in the ASIMO humanoid (which has one of the best humanoid gaits developed so far), where control forces are computed to maintain balance stability during gait execution, i.e. the effects of gravity are canceled while suitable accelerations are imposed to accomplish the motion; consequently, the energy consumption is more than 15 times (scaled) the amount required during human gait [18]. However, it has been demonstrated that during human gait the dynamic effects of gravity are not only not always canceled, but are actually exploited [14]. It seems that the current strategies for carrying out a given motor behavior are well suited to obtaining a particular solution of the problem; thus, the space of behavior solutions is narrowed by the approach used rather than by the capacities of the robot.

However, some results obtained from new perspectives show evidence of alternative solutions, ones that favor the execution of the motion and expand the capacities of the robot. For instance, results in [19] show that the given factory-maximum payload of an industrial manipulator (a 6-DOF PUMA-762) can be greatly increased by exploring new zones of the solution space with suitable control policies. The approach used was the formulation of a parameterized optimal control problem, where the body dynamics and time ranges were stated as restrictions. Torque-level actions were found such that the payload lifted by the manipulator was much greater (six times) than the load reachable by the default aggregation of path planning, workspace reference and torque control. Surprisingly, contrary to standard procedures, the resulting trajectories included singularities, letting the robot rest the payload against its structure on its way to the goal. Along the same lines, a similar result was later presented in [15], where a simple manipulator (2D, 3-DOF) accomplished a weightlifting behavior, avoiding workspace restrictions in the formulation. Besides maximizing the payload lifted, the results included quite different workspace trajectories that accomplished the same behavior.

The key attribute of both approaches was the direct connection between the desired behavior and the torque commands: the workspace requirements were almost null, leaving the system free to be modulated in order to fulfill the behavior, i.e. lift a defined weight. Both approaches use optimization as the main route; nevertheless, the analytical solution in [19] implies a detailed formulation of the problem and its restrictions, which is perfectly viable for manipulators in structured environments, but this is not the case for AMRs. On the other hand, the solution given in [15] is not analytical but numerical; it searches the solution space using a learning algorithm, i.e. a numerical optimization of policy parameters by means of iterative evaluation of experiences. Nevertheless, its control framework, based on the coordination of lower-level PID controllers, cannot be directly extrapolated to more complex problems.

Recently, the attention given to the use of learning as a paradigm to exploit the capacities of robots has been growing. The latest publications on the learning of motions by robots [7] revolve around early results on imitation [6], where the initial solution in the workspace is directly guided by a human, and afterwards the robot joints are controlled by parameterized policies that are intended to accomplish the behavior. The functions used as control policies are dynamical systems (DSs), and the optimal parameters of the policies are found using reinforcement learning (RL) algorithms [17]. Extensive work on RL algorithms suitable for computing robot control policies has been presented in [13].

In the methodology presented in this paper, we assume the availability of kinematic information equivalent to the initial and final states of the desired behavior. In contrast to the imitation approach, a reference in the workspace is not specified. Our control framework, based on local control policies at the joint-acceleration level, attracts the actuated DOFs to the desired final configuration; meanwhile, the resulting final states of the unactuated DOFs are viewed as a consequence of the actuated acceleration profiles. DSs are used as acceleration controllers, providing the system with these attractor properties. Additionally, the control policies are parameterized around imposed simple primitives, which may be deformed by means of changes in the parameters in order to obtain complex accelerations.

Subsequently, we present an example that provides a qualitative description of the type of problems that this paper addresses. The standing-up behavior illustrates those motor behaviors of underactuated systems in which dynamic balance is compromised. Fig. 1 shows the initial and final states for this behavior. Note that the behavior is enclosed by a motion where the initial and final velocities are equal to zero. The robot starts in a lying-down posture and should stand up, ending as shown in Fig. 1b. However, gravity and other nonlinearities can influence the behavior in such a way that the robot ends up in a different state (see Fig. 1c). The achievement of the desired values for the actuated DOFs is not enough for the desired behavior to result.

Fig. 1. (a) Initial state of the robot for the standing-up behavior. (b) Desired final configuration. (c) Undesired final configuration, where the motor behavior has failed.

$u = M(q)\,\ddot{q} + N(\dot{q}, q)$   (4)
where $u \in \mathbb{R}^n$ represents the vector of joint torques, $M \in \mathbb{R}^{n \times n}$ is the mass matrix that describes the inertial properties of the robot, and $N \in \mathbb{R}^n$ contains nonlinear terms such as Coriolis forces and gravity. The accelerations of the actuated DOFs, $\ddot{q}_b \in \mathbb{R}^b$, and of the unactuated DOFs, $\ddot{q}_c \in \mathbb{R}^c$, may be considered separately without loss of generality [2].

We present the basic definitions and a formal formulation of the problem in Section 2; the methodology for computing controllers is described in Section 3. A demonstration motor behavior in a simulated humanoid is synthesized by applying this methodology in Section 4. Finally, the conclusions are gathered together in Section 5.

2. Controlling robot motor behaviors

The configuration of a robot specifies the location of all parts of the mechanism with respect to a reference frame, and is given by a vector of independent position variables (generalized coordinates) $q \in \mathbb{R}^n$; the number of generalized coordinates defines the number of degrees of freedom of the mechanism. The joint space is the set of possible values of the robot's joints, $\theta \in \mathbb{R}^b$, with $b \le n$. The state of the robot is given by the positions and velocities of the generalized coordinates, i.e. $z = [q \;\; \dot{q}] \in \mathbb{R}^{2n}$. An element controlled by a robot also has a configuration that determines its position and orientation. This configuration defines the workspace, denoted by $x \in \mathbb{R}^m$. The geometric relation that maps the generalized coordinates to the configuration of the controlled element is known as the forward kinematics:

$x = f_{kinem}(q)$   (1)

In those cases where the number of DOFs of a robot is greater than that of the controlled element, i.e. $n > m$, the system is called redundant, owing to the existence of more DOFs than are required for the control of the operational space.
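As an illustration of the forward-kinematics map (1), consider a planar two-link arm. The arm, its link lengths and the joint values below are hypothetical examples chosen for this sketch, not a model taken from the paper:

```python
import numpy as np

def f_kinem(q, l1=1.0, l2=0.8):
    """Forward kinematics x = f_kinem(q), Eq. (1), for a planar
    two-link arm. Link lengths l1, l2 are illustrative values."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

# n = 2 generalized coordinates, m = 2 workspace coordinates: not
# redundant. A third joint would give n = 3 > m = 2, i.e. redundancy.
print(f_kinem(np.array([0.0, 0.0])))   # arm stretched along x: [1.8 0. ]
```
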
The relation between the velocities and accelerations in the operational space and those in the configuration space is obtained from the first and second derivatives of (1):

$\dot{x} = J(q)\,\dot{q}, \qquad \ddot{x} = J(q)\,\ddot{q} + \dot{J}(q)\,\dot{q}$   (2)

where $J(q)$ is the Jacobian matrix of $f_{kinem}(q)$. Now that the elements that describe the motion of a robot have been established, we focus on the forces that generate the motion. The relation between the control vector (the applied forces) $u \in \mathbb{R}^b$ and the resulting accelerations $\ddot{q} \in \mathbb{R}^n$ is given by the robot dynamics, and may be written in the constrained form

$\ddot{q} = f_1(q, \dot{q}, t) + f_2(q, \dot{q}, t)\, u$   (3)

The system is called underactuated when the control vector is not able to command an instantaneous acceleration in an arbitrary direction of $q$. In the case of an AMR, the assumption that all joints are actuated is coherent, but, because the robot is not secured to the ground, it can move in directions other than those allowed by its actuators, and therefore it has more DOFs than joints, i.e. $\mathrm{rank}[f_2] = b < n$. The unactuated DOFs may be represented as virtual joints that correspond to the motion of a reference frame at the robot base with respect to the inertial frame [16]; here these virtual joints are denoted by $q_c \in \mathbb{R}^c$, where $c = n - b$. Provided that articulated robots are rigid bodies, their dynamics can be described as second-order systems in which the torque commands (control actions) interact with the rest of the forces acting on the mechanism; a well-known model describing this interaction is given in (4). We can expand the dynamics of the system in (4) as

$\begin{bmatrix} u_b \\ 0 \end{bmatrix} = \begin{bmatrix} M_{bb} & M_{cb} \\ M_{bc} & M_{cc} \end{bmatrix} \begin{bmatrix} \ddot{q}_b \\ \ddot{q}_c \end{bmatrix} + \begin{bmatrix} N_b \\ N_c \end{bmatrix}$   (5)

where $u_b \in \mathbb{R}^b$ is the vector of actuation torques, and $[N_b \;\; N_c]^T$ gathers together the projections of the other forces (gravitational forces, Coriolis forces and ground reactions) onto the corresponding subspaces.
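The velocity relation $\dot{x} = J(q)\dot{q}$ in (2) can be checked numerically against a finite difference of $f_{kinem}$. This sketch reuses the illustrative planar two-link arm; all numbers are example values, not data from the paper:

```python
import numpy as np

def f_kinem(q):
    # Planar two-link arm with illustrative link lengths 1.0 and 0.8
    return np.array([np.cos(q[0]) + 0.8 * np.cos(q[0] + q[1]),
                     np.sin(q[0]) + 0.8 * np.sin(q[0] + q[1])])

def jacobian(q):
    # Analytic Jacobian J(q) of f_kinem
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-s1 - 0.8 * s12, -0.8 * s12],
                     [ c1 + 0.8 * c12,  0.8 * c12]])

q = np.array([0.3, 0.5])
q_dot = np.array([0.1, -0.2])
eps = 1e-6
x_dot_jac = jacobian(q) @ q_dot                           # Eq. (2)
x_dot_fd = (f_kinem(q + eps * q_dot) - f_kinem(q)) / eps  # finite diff.
```

The two results agree to first order in `eps`, confirming that the analytic Jacobian matches the map (1).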
The inertia matrix $M$ is divided into four blocks ($M_{bb}, M_{cb}, M_{bc}, M_{cc}$), consequently relating causes and effects for the actuated and unactuated accelerations.

We denote an element of the set of robot postures by $\mathcal{P}$, i.e. $\mathcal{P} \in \{\text{sitting}, \text{standing}, \text{arm raised}, \ldots\}$. Let us assume that a certain function $P$ maps robot configurations $q \in \mathbb{R}^n$ to the corresponding robot postures, i.e. $P: q \to \mathcal{P}$, and let us denote by $\mathcal{Q}_\mathcal{P}$ the set of all configurations that represent the posture $\mathcal{P}$, i.e. $\mathcal{Q}_\mathcal{P} = \{q \mid P(q) = \mathcal{P}\}$. We define as a motor behavior the process of taking the robot from an initial posture $\mathcal{Q}_0$ to a final posture $\mathcal{Q}_F$ within a delimited amount of time $0 < t < t_f$. It is assumed that representative elements of the initial and final posture sets are known, i.e. $q_0 \in \mathcal{Q}_0$, $q_f \in \mathcal{Q}_F$, including actuated and unactuated DOFs, i.e. $q_0 = (q_{0b}, q_{0c})$, $q_f = (q_{fb}, q_{fc})$. It is assumed that all joints are actuated, i.e. $q_b = \theta$.

We control the actuated joints at the acceleration level using dynamical systems as policies, which allows on-line modification of the acceleration commands. Provided the DSs have attractor properties, the policies are designed to attract each joint to its corresponding final state $q_{fb} = \theta^g$ while following a smooth trajectory. In order to select an appropriate policy, we first define the following term as a local primitive:

$p_i(t) = a_i \left[ -\dot{y}_i(t) + b_i \left( y_i^g - y_i(t) \right) \right], \quad i \in \{1, \ldots, b\}$   (6)

This represents the state of the velocity and position error of joint $i$ at instant $t$. The parameters $a_i, b_i$ locally encode the properties of the primitive. We define as a local control policy the combination of the local primitives involved:

$\ddot{q}^{\,d}_{b,i}(t) = \sum_{j=1}^{b} \omega_{ij} \cdot p_j(t)$   (7)

where the $\omega_{ij}$ are scaling factors for the influence of the primitives $p_j$ on the desired accelerations $\ddot{q}^{\,d}_{b,i}$ of the individual joints. In this paper, we assume that there exists a lower-level controller that converts the desired accelerations into suitable torque commands (e.g. by the computed-torque method).
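A minimal sketch of the local primitives (6) and their combination into a local control policy (7). The number of joints, goals and gains below are illustrative values, not parameters from the paper:

```python
import numpy as np

def primitives(y, y_dot, y_goal, a, b):
    # Eq. (6): p_i(t) = a_i * [-y_dot_i(t) + b_i * (y_i^g - y_i(t))]
    return a * (-y_dot + b * (y_goal - y))

def policy(y, y_dot, y_goal, a, b, W):
    # Eq. (7): desired accelerations mix the primitives through W
    return W @ primitives(y, y_dot, y_goal, a, b)

b_dofs = 3
y = np.zeros(b_dofs)                    # joint positions
y_dot = np.zeros(b_dofs)                # joint velocities
y_goal = np.array([1.0, -0.5, 0.2])     # per-joint goals
a = np.full(b_dofs, 4.0)                # illustrative gains
b = np.full(b_dofs, 1.0)

# With W = I each joint only sees its own primitive (the basic
# policy of Eq. (8) below); off-diagonal weights couple the joints.
acc_basic = policy(y, y_dot, y_goal, a, b, np.eye(b_dofs))
```
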
Therefore, we assume that the actual accelerations of the actuated DOFs are given by (7). Before continuing, and for clarification purposes, we define as a basic policy the case where the scaling factors are given by

$\omega_{i,j} = \begin{cases} 0, & i \neq j \\ 1, & i = j \end{cases}$   (8)

When a basic policy is used, the position and velocity errors of the joints behave as a simple damped harmonic system with an attractor at $(y_i^g, 0)$. The dynamics of this system are modulated when the non-diagonal weighting factors are nonzero; this allows the generation of complex forms in the actual profiles followed by the joints. The whole-body policy, generically denoted by $\ddot{q}_b = h(q, \dot{q})$, may be presented explicitly, dropping the time-dependency notation, in a single expression:

$\ddot{q}_b = W \cdot A \begin{bmatrix} \dot{q}_b \\ q_b \end{bmatrix}$   (9)

where the matrix $A \in \mathbb{R}^{b \times 2b}$ is formed by the set of $(a, b)$-factors from the local primitives represented in (6), and the matrix $W \in \mathbb{R}^{b \times b}$ gathers together the scaling factors of the policy.

Given that the objective of the task is to take the robot's states from an initial to a final value, the Euclidean distance from the resulting final states to the desired states serves as the performance measure. The stochastic approximation of the gradient is given by the expected value of a special term:

$G_{APP}(W_r) = E\left[ \frac{b^2 \cdot R(W_r + \mu\, n_j)}{\mu}\; n_j \right]$   (17)

where $G_{APP}(W_r)$ represents the estimated value of the gradient at the point $W_r$, $n_j$ is a uniformly random unit vector in the search space, and $\mu$ is a size factor; the term $b^2$ is the size of the vector of parameters. It has been demonstrated in [4] that, with a suitable selection of these parameters, the gradient approximation leads to convergence at least to a local minimum. In order to evaluate (17), attempts to perform the task, also known as roll-outs, must take place (with $0 < j \le J$). The performance of a single roll-out is obtained using an acceleration policy with perturbed parameters, i.e. $R(W_r + \mu\, n_j)$. In practice, a single roll-out may be enough for an unbiased gradient estimate [4]. Here we present the corresponding algorithm.
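The damped-harmonic behavior of one joint under the basic policy (8) can be sketched with a simple Euler integration. The gains and step size are illustrative choices, not values from the paper:

```python
import numpy as np

# One joint under the basic policy: y_ddot = a * (-y_dot + b * (y_goal - y)),
# a damped harmonic system with attractor (y_goal, 0).
a, b = 4.0, 1.0          # critically damped: both characteristic roots at -2
y, y_dot = 0.0, 0.0
y_goal = 1.0
dt = 0.01
for _ in range(2000):    # 20 s of simulated time, explicit Euler
    y_ddot = a * (-y_dot + b * (y_goal - y))
    y_dot += dt * y_ddot
    y += dt * y_dot
# By now the joint has settled at the attractor: y ~= y_goal, y_dot ~= 0
```
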
The distance from the resulting final states $q(t_f)$ to the desired states $q_f$ is a valid representation of the quality of the controllers:

$R_1(W) = \left( [q_f - q(t_f)]^T\, [q_f - q(t_f)] \right)^{1/2}$   (14)

Moreover, as the unactuated DOFs are a virtual concept used for representation purposes, an equivalent measure of the distance function is obtained if the actual and desired values of the workspace vector are measured:

$R(W) = \left( [x_f - x(t_f)]^T\, [x_f - x(t_f)] \right)^{1/2}$   (15)

This implies that the workspace variables $x$ must be observable. If we aim to minimize this function, the computation of the optimal parameters may be performed by iteratively taking steepest-descent steps on the distance function, i.e.

$W_{r+1} = W_r - \gamma\, \nabla_W R \,\big|_{W = W_r}$   (16)

where $r$ denotes the iteration number and $0 < \gamma < 1$ is a learning rate. The gradient of $R$ with respect to $W$ evaluated at $W_r$ is written $\nabla_W R \,|_{W_r}$. However, because an analytical computation of this gradient is not viable in the current framework, we approximate the gradient from data collected from the robot's experience, i.e. we use an instance of the Policy Gradient Reinforcement Learning (PGRL) algorithms (see e.g. [13]).

With the intention of analyzing the consequences of the joint policies for the whole-body behavior, using (5) we write the resulting effects on the unactuated DOFs, generically denoted by $\ddot{q}_c = g(\ddot{q}_b)$, explicitly as

$\ddot{q}_c = M_{cc}^{-1} \left( -M_{bc}\, \ddot{q}_b - N_c \right)$   (10)

Note that the external forces $N_c$ interact with the forces related to $\ddot{q}_b$ and together they modify the motor behavior, i.e. they affect (2) and generate workspace trajectories from the internal dynamics of the robot. The problem of completing a motor behavior can be solved if a function $h^*$ is found such that

$\ddot{q}_c^{\,*} = g \circ h^*(q, \dot{q})$   (11)

accomplishing

$\int_0^{t_f} \!\! \int_0^{t_f} g \circ h^*(q, \dot{q})\, dt\, dt \approx q_{fc}$   (12)

The parameters $\omega_{i,j}$ in (7) must be such that the above condition is satisfied.
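Eq. (10) can be sketched numerically: given actuated accelerations $\ddot{q}_b$, solve the unactuated block-row of (5) for $\ddot{q}_c$. The mass matrix and force vector below are random positive-definite stand-ins, not a real robot model:

```python
import numpy as np

rng = np.random.default_rng(0)
b_dim, c_dim = 4, 3                       # actuated / unactuated DOFs
n = b_dim + c_dim
G = rng.standard_normal((n, n))
M = G @ G.T + n * np.eye(n)               # symmetric positive-definite mass matrix
M_bc = M[b_dim:, :b_dim]                  # coupling block (c x b)
M_cc = M[b_dim:, b_dim:]                  # unactuated inertia block (c x c)
N_c = rng.standard_normal(c_dim)          # gravity/Coriolis projection (stand-in)

q_ddot_b = rng.standard_normal(b_dim)     # accelerations commanded by the policy
q_ddot_c = np.linalg.solve(M_cc, -M_bc @ q_ddot_b - N_c)   # Eq. (10)

# The solution satisfies the unactuated block-row of (5):
# 0 = M_bc q_ddot_b + M_cc q_ddot_c + N_c
residual = M_bc @ q_ddot_b + M_cc @ q_ddot_c + N_c
```
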
An optimization framework is presented in the following section in order to establish a numerical methodology that finds the optimal parameters of the acceleration policies and therefore solves for the motor behavior.

3. Learning control policies

The problem of finding the parameters of the acceleration policies that fulfill the motor behavior has the necessary elements to be formulated as a parametric optimization problem. We evaluate the quality of a given set of parameters through an objective function, measuring the performance of the robot as it attempts to accomplish the task:

$R(W) = \Phi(q_b, \dot{q}_b, \dot{q}_c, q_c, \ddot{q}_b)$   (13)

where $R \in \mathbb{R}$ is a scalar that represents how the state and actions of the robot proceed for some particular parameters of the policy, and $\Phi$ is a scalar-valued function of the resulting trajectory. Without loss of generality, we may specify a concrete type of relation for (13).

The stochastic approximation of the gradient proposed in [4] is given by the expected value in (17), where $\langle \cdot \rangle$ denotes the average of the gradient estimators.

Fig. 2. Fujitsu Hoap-2 simulated humanoid used as a test bed for motion behavior synthesis.

4. Results

A simple motor behavior, named humanoid equilibrium, was tested in a simulated humanoid (Fujitsu Hoap-2). Starting from a standing-up position, the goal of this task is to reach an equilibrium posture where the humanoid stands using just one leg. The initial and final configurations of the task are shown in Fig. 2. The humanoid uses four joints during the motion: the left ankle and the right ankle, hip and knee, i.e. $b = 4$ (see Fig. 3 for details). However, owing to the dynamic balance condition, the size of the vector of generalized coordinates needed to determine its posture and orientation (assuming no absolute displacement) is $n = 7$, i.e. $q \in \mathbb{R}^7$.
input: $\mu$, $b$, $x_f$
1  repeat
2    select $n_j \in \mathbb{R}^{b^2}$ with $\|n_j\| = 1$, $n_j \sim U(0,1)$
3    perform a roll-out with the perturbed parameters $W + \mu\, n_j$
4    sense the configuration of the controlled element, $x(t_f)$
5    compute the performance $R(W + \mu\, n_j) = \left( [x_f - x(t_f)]^T [x_f - x(t_f)] \right)^{1/2}$
6    estimate the gradient vector $G_{APP}(W_r) = \left\langle \left( b^2 \cdot R(W_r + \mu\, n_j)/\mu \right) n_j \right\rangle$
7  until the gradient estimate $G_{APP}(W_r)$ has converged
return: the gradient estimate $G_{APP}(W_r)$

(a) Initial posture; (b) final posture.

As mentioned in the description of the methodology in Section 2, a goal is manually established for each actuated DOF, $q_{f,b}$. The parameters of the acceleration primitives of joint $i$, i.e. $a_i, b_i$, are imposed so as to generate standard exponential trajectories, such as the one shown (dotted line) in Fig. 4. When the basic policy defined in (8) is used to generate the accelerations, despite the fact that the joints arrive at their desired positions, the unactuated DOFs finish in a completely different state, i.e. the robot falls down.
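The roll-out loop above can be sketched on a toy performance function. Here $R$ is an analytic stand-in (the distance to a hypothetical optimum $W^*$) rather than a measured robot roll-out, and a baseline $R(W)$ is subtracted from each perturbed evaluation (a common variance-reduction refinement not spelled out in the excerpt):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4                     # stands in for the b^2 policy parameters
W_star = np.ones(dim)       # hypothetical optimum of the toy task
R = lambda W: float(np.linalg.norm(W - W_star))   # stand-in for a roll-out

W = np.zeros(dim)
mu, gamma = 0.1, 0.02       # perturbation size and learning rate, cf. (16)
for _ in range(300):
    grad = np.zeros(dim)
    n_rollouts = 10         # roll-outs averaged per gradient estimate
    for _ in range(n_rollouts):
        n_j = rng.standard_normal(dim)
        n_j /= np.linalg.norm(n_j)                # unit search direction
        # perturbed roll-out, with R(W) subtracted as a baseline
        grad += dim * (R(W + mu * n_j) - R(W)) / mu * n_j
    grad /= n_rollouts
    W -= gamma * grad       # steepest-descent update as in (16)
# W drifts toward W_star, the optimum of the toy performance function
```

The same loop structure applies when $R$ comes from executing the behavior on the robot; only the evaluation of $R$ changes.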