Abstract
In this paper, the visual servoing problem is addressed by coupling nonlinear control theory with a convenient representation of the visual information used by the robot. The visual representation, which is based on a linear camera model, is extremely compact to comply with active vision requirements. The devised control law is proven to ensure global asymptotic stability in the Lyapunov sense, assuming exact model and state measurements. It is also shown that, in the presence of bounded uncertainties, the closed-loop behavior is characterized by a global attractor. The well-known pose ambiguity arising from the use of linear camera models is resolved at the control level by choosing a hybrid visual state vector including both image space (2D) information and 3D object parameters. A method for on-line visual state estimation that avoids camera calibration is presented. Simulation and real-time experiments validate the theoretical framework in terms of both system convergence and control robustness. ©1999 Elsevier Science B.V. All rights reserved.
1. Introduction
Vision provides a powerful set of sensory processes for a robot moving in unstructured environments, since it permits noncontact measurement of the external world and increases task accuracy [7]. Early vision-based robotic systems operated in an open-loop fashion using an approach called static look-then-move [11]. Accuracy can be greatly improved by closing the position loop with visual feedback. The main objective of this approach, referred to as visual servoing [10], is to control the position of the robot with respect to a target object or a set of target features. Fig. 1 shows the general blocks of a visual servoing system. The system accepts two inputs: a description of the task to be performed (reference input), and object motion (disturbance input). Visual analysis provides a description of the visual environment as related to the current camera position; the description is processed by the controller block, which produces camera motion.
Visual servoing systems can be classified as “proper” visual servos or dynamic look-and-move systems [11]. In the first case, the joint torques are the output of the vision-based controller, while in the second case the output of the controller is a reference twist screw fed to an inner position/velocity loop.
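The two-loop structure of a dynamic look-and-move system can be illustrated with a minimal one-degree-of-freedom simulation. This is only a sketch under stated assumptions: the function name `look_and_move`, the proportional outer law, the first-order inner velocity loop, and all numerical values are hypothetical and are not taken from the system described in this paper.

```python
def look_and_move(e0, gain=1.0, vision_dt=0.04, inner_dt=0.001, tau=0.01, steps=200):
    """Simulate a 1-DOF dynamic look-and-move loop (illustrative assumptions only).

    Outer loop (vision rate): the visual controller outputs a velocity
    reference v_ref = -gain * e, held constant until the next image.
    Inner loop (servo rate): the actual velocity v tracks v_ref as a
    first-order system with time constant tau.
    """
    e, v = e0, 0.0
    inner_steps = round(vision_dt / inner_dt)
    history = [e]
    for _ in range(steps):
        v_ref = -gain * e                       # output of the visual controller
        for _ in range(inner_steps):            # inner velocity loop
            v += (v_ref - v) * inner_dt / tau   # first-order velocity tracking
            e += v * inner_dt                   # error evolves with actual velocity
        history.append(e)
    return history


# Hypothetical initial positioning error of 0.5 (arbitrary units).
errors = look_and_move(e0=0.5)
```

In a “proper” visual servo the inner loop would be absent and the visual controller would command joint torques directly; the sketch only covers the look-and-move case.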
244 F. Conticelli et al. / Robotics and Autonomous Systems 29 (1999) 243–256
Fig. 1. The basic blocks of the visual servoing system.
Concerning the control aspects of visual servoing, two main paradigms can be outlined: position-based and image-based servoing. In the first one, the error is defined in the three-dimensional (3D) space based on image feature extraction and relative position estimation [15,20]. In this way, robot tasks are easily specified in the robot workspace, but the estimated quantities used in the feedback are heavily affected by camera calibration errors. In image-based servoing instead, any visual task is described in the image plane as a desired evolution of the object appearance towards a goal appearance, since the error is computed in terms of image features [1,8,9,17,18].
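To make the distinction concrete, the following minimal sketch computes both kinds of error for a small set of point features. It assumes a normalized pinhole projection (unit focal length) and reduces the position-based error to a translation of the feature centroid; the function names and the numerical data are hypothetical, and the code only illustrates where each error is defined, not the control law developed in this paper.

```python
def project(point):
    """Normalized perspective projection of a 3D point (X, Y, Z), with Z > 0."""
    x, y, z = point
    return (x / z, y / z)


def image_error(current_pts, goal_pts):
    """Image-based error: stacked 2D feature differences s - s*."""
    err = []
    for p, g in zip(current_pts, goal_pts):
        (u, v), (ug, vg) = project(p), project(g)
        err.extend([u - ug, v - vg])
    return err


def position_error(current_pts, goal_pts):
    """Position-based error, simplified to the 3D centroid offset.

    In practice the 3D points are not measured directly: they must be
    estimated from the image, which is where calibration errors enter.
    """
    n = len(current_pts)
    c = [sum(p[i] for p in current_pts) / n for i in range(3)]
    g = [sum(p[i] for p in goal_pts) / n for i in range(3)]
    return [ci - gi for ci, gi in zip(c, g)]


# Hypothetical target: three points, currently displaced by 0.1 along X.
goal = [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0), (0.0, 0.2, 1.0)]
current = [(x + 0.1, y, z) for (x, y, z) in goal]

e_img = image_error(current, goal)      # six image-plane components
e_pos = position_error(current, goal)   # three workspace components
```

The image-based error has two components per feature and is obtained from image measurements alone, whereas the position-based error lives in the workspace and depends on 3D estimates.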
The problem of camera positioning with respect to a planar object is also solved in [21] by including 3D variables in the state space representation. In that case, the estimation of the rotation matrix and the distance ratio is based on the homography between the feature points extracted from two images of the same target object. Notice that the stable control law proposed in [21] requires the estimation of the axis and angle of rotation at each iteration, and that robustness with respect to these estimated quantities is not formally analyzed.
In this paper, we propose a hybrid affine visual servoing approach, in which image space (2D) information and 3D object parameters are used together in the control design of a dynamic look-and-move system. A linear model of camera–object interaction (affine object shape transformations) is assumed, thus reducing the size of the visual representation according to active vision requirements [3]. We demonstrate that the partial reintroduction of 3D information in the visual state representation makes it possible to disambiguate image-based