Abstract
The scope of this chapter is the development of an aerial manipulator platform using an octarotor drone with an attached manipulator. An on-board spherical camera provides visual information about the drone’s surroundings, while a Pan-Tilt-Zoom camera system is used to track targets. A powerful computer with a GPU offers significant on-board computational power for the visual servoing of the aerial manipulator system. This vision system, along with the Inertial Measurement Unit based controller, provides exemplary guidance in confined and outdoor spaces. Coupled with the manipulator’s force sensing capabilities, the system can interact with the environment. This aerial manipulation system is modular in terms of attaching various payloads depending on the application (e.g., environmental sensing, facade cleaning, aerial netting for evader-drone geofencing, and others). Experimental studies using a motion capture system are offered to validate the system’s efficiency.
Keywords
- aerial manipulation
- visual localization
1. Introduction
The introduction of drones has revolutionized many sectors, including but not limited to cinematography [1], search and rescue [2, 3], maintenance [4], surveillance [5, 6], delivery of goods and transportation [7, 8].
The main components of a drone are its propelling system and its Flight Control Unit (FCU). The propelling system provides the necessary thrust to change the attitude of the drone, described by its pitch, roll and yaw angles, and thus its three-dimensional motion. The dominant propelling system is currently composed of propellers, each driven by a brushless motor and Electronic Speed Controller (ESC) combination. The FCU is the “brain” of the drone, since it issues the control commands to the ESCs for changing the attitude and the pose of the drone. It usually contains GPS receiver(s), accelerometer(s), gyroscope(s), magnetometer(s) and barometer(s), coupled with environment-sensing devices like laser scanners, to extract the current pose of the drone. The output of an FCU is computed by taking into account the current pose and the desired reference.
Multi-rotor drones have been very popular among researchers and are typically named by their rotor count (tricopters, quadcopters, hexacopters, and octacopters). A drone’s thrust increases with the number of rotors, allowing higher payloads to be lifted at the expense of a reduced flight time; for this reason, power tethering systems are usually sought [9].
The majority of off-the-shelf drones have a 1-2 kg payload capability, with very few drones being capable of lifting an order of magnitude more [10]. This is primarily due to the necessary tuning of the FCU, the advanced ESCs required, and the need to abide by the regulations imposed by each country’s regulatory authority.
To address these challenges, this chapter presents a drone that, depending on its mission, can be modular in terms of software and hardware while lifting a high payload. The drone can operate either indoors or outdoors, has navigation and mapping capabilities, and can interact with the environment through an attached robot manipulator.
In Section 2 the mechatronic design of the drone is presented, while in Section 3 the drone’s software for localization is explained and evaluated. The drone’s ability to perform either in a collaborating or an adversarial environment using computer vision is discussed in Section 4. The aerial manipulation concept is addressed in Section 5, followed by concluding remarks in Section 6.
2. Drone’s mechatronic design
2.1 Drone electric power units
The developed octarotor drone has a take-off weight of 40 kg and a 30 min flight time. The drone’s frame was designed and fabricated in collaboration with Vulcan UAV©. The authors’ input on this aspect relates both to extending Vulcan's bare-bone design to accommodate payload carriage and to fabricating the final prototype and mounting all the additional modalities mentioned in the sequel. The backbone structure consists of three ø25 mm, 1200 mm long aluminum tubes in a triangular cross-sectional configuration. Four 575 mm long rectangular aluminum arms are attached at each end of this structure and each carries two motors in a coaxial configuration. The arms are fixed to the main frame using a 5 mm thick carbon plate. The resulting “H-frame” configuration can be visualized in Figure 1.

Figure 1.
Drone’s backbone structure.
Although the lower motor in each coaxial pair provides 25% less thrust [11], it offers some redundancy against single motor failure. The selected 135 KV KDE© brushless motors, coupled with ø71.12 cm custom-designed carbon propellers, collectively provide 37.2 kg of thrust at 50% throttle input. The extra thrust can be used for rapid maneuvering of the drone and for exerting forces through the aerial manipulator presented in Section 5.
Power is provided by a 12S 22 Ah LiPo battery pair connected in parallel to the Power Distribution Board (PDB). At 50% thrust with full payload while hovering, the octarotor’s motors sink 11.7 A each, resulting in a flight time of approximately 30 min.
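This figure is consistent with a simple capacity-over-current estimate, assuming the full 44 Ah of the parallel battery pair is usable and neglecting the draw of peripherals:

```latex
t_{\mathrm{hover}} \approx \frac{2 \times 22\ \mathrm{Ah}}{8 \times 11.7\ \mathrm{A}} \approx 0.47\ \mathrm{h} \approx 28\ \mathrm{min}
```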
Two ø12 mm carbon rods are fixed at the underside of the main frame tubes for payload carriage. The maximum payload weight is 30 kg, and the payload can be easily dismantled from the main frame using quick-release clamps. Similarly, the retractable landing gear assembly is attached to the main frame tubes with these clamps for enhanced modularity, as shown in Figure 2. The gear can retract within a

Figure 2.
Landing gear detail (left) and payload assembly with battery holder (right).
Additional power for peripherals and sensing modalities can be supplied through a dedicated 750 W buck converter, mounted on the payload carrier assembly, as shown in Figure 3. The converter is contained within a custom 3D printed case and standard Unmanned Aerial Vehicle (UAV)

Figure 3.
Enhanced power distribution board (left) and i7-minicomputer (right).
2.2 Flight command unit and related software
The Pixhawk Cube FCU [13] was selected, featuring triple-redundant dampened Inertial Measurement Units (IMUs), a modular design, and industry-standard I/O connectors. Additional telemetry and R/C circuits are deployed to enable monitoring and intervention and to comply with flying regulations.
A high-processing-power 8th-generation Intel NUC i7 computing unit with 32 GB RAM and a 1 TB SSD, shown in Figure 3, was mounted symmetrically to the buck converter on the underside of the main frame. This 90 W computing unit allows for the online computation of demanding tasks, such as the visual object tracking methods of Section 4, as well as the easy development of autonomous flying applications.
On the software side, the ArduCopter flight stack [15] was selected to run on the FCU. Pose estimation is carried out through a sophisticated Extended Kalman Filter (EKF) at 400 Hz. The Intel NUC companion computer is serially connected to the FCU at a baud rate of 1 Mbps, and the communication packets follow the MAVLink protocol. The NUC runs Ubuntu Linux 16.04, and all applications are developed through the Robot Operating System (ROS) and MAVROS [16] middleware with a 50 Hz refresh rate.
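As an illustration of this interface (a minimal sketch only, not the chapter’s actual on-board application), a ROS node can read the FCU’s pose estimate and stream position setpoints through the standard MAVROS topics at the 50 Hz rate mentioned above; the hover height used here is a placeholder.

```python
#!/usr/bin/env python
# Minimal MAVROS sketch (illustrative only): read the FCU's EKF pose and
# publish position setpoints at 50 Hz, matching the refresh rate used above.
import rospy
from geometry_msgs.msg import PoseStamped

current_pose = PoseStamped()

def pose_cb(msg):
    """Store the latest EKF pose streamed by MAVROS."""
    global current_pose
    current_pose = msg

if __name__ == "__main__":
    rospy.init_node("offboard_sketch")
    rospy.Subscriber("/mavros/local_position/pose", PoseStamped, pose_cb)
    setpoint_pub = rospy.Publisher("/mavros/setpoint_position/local",
                                   PoseStamped, queue_size=10)
    rate = rospy.Rate(50)                 # 50 Hz, as in the text
    target = PoseStamped()
    target.pose.position.z = 1.5          # hypothetical 1.5 m hover height
    while not rospy.is_shutdown():
        target.header.stamp = rospy.Time.now()
        setpoint_pub.publish(target)
        rate.sleep()
```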
The developed drone without any payload can be visualized in Figure 4.

Figure 4.
Drone prototype.
3. Drone localization
3.1 Drone outdoor localization using RTK GNSS
The RTK enhancement of GPS is used for outdoor localization purposes. This is due to its more precise positioning [17], obtained by correcting the GPS satellite measurements using feedback from an additional stationary GPS module. The disadvantage of such systems is that their use entails a significant pre-flight setup time, with the achieved (cm-range) accuracy improving as this setup time increases.
Although the internal loop of the flight controller operates at 400 Hz, the GPS receiver streams data at a lower rate of 5 Hz. In popular flight software such as ArduPilot, this rate difference needs to be taken into consideration by the underlying EKFs running on the FCU. A typical comparison of the achieved accuracy using a drone in a hovering state can be seen in Figure 5.

Figure 5.
Drone’s EKF 3D-position output with (red) and without (blue) RTK correction.
The drone was flown in a hovering position with the RTK GPS module injecting measurements into the flight controller, and the output of the FCU’s EKF was compared with and without the injected RTK measurements. The red line represents the EKF’s output based solely on the GPS signal, whilst the blue line indicates the same output when RTK correction (using a 30 min warm-up period) is injected into the FCU.
The standard deviation was computed to be 0.74 m, 0.47 m and 0.27 m for
3.2 Drone indoor localization
During indoor navigation: a) the lack of GPS guidance, b) pressure changes affecting the barometric sensor, and c) power lines affecting compass accuracy can severely degrade the output of an FCU. With only the accelerometers and gyroscopes being unaffected, the injection of an external feedback source into the FCU is considered essential. Such feedback is usually based on visual techniques, such as those presented in [18, 19].
For experimentation purposes, the utilized Motion Capture System (MoCaS) [20] injects measurements into the ArduCopter flight stack. The system comprises 24 Vicon cameras uniformly distributed within an orthogonal space of
The ROS software utilized at the MoCaS operates at 25 Hz and can efficiently stream the measurements wirelessly to the drone’s FCU. The latency time
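A minimal sketch of such an injection is shown below, assuming the MoCaS pose is available on a ROS topic (the topic name /vicon/drone/pose is hypothetical); MAVROS forwards messages published on /mavros/vision_pose/pose to the FCU as external vision measurements.

```python
#!/usr/bin/env python
# Sketch: relay motion-capture poses to the FCU as external vision feedback.
# /vicon/drone/pose is a hypothetical MoCaS topic; /mavros/vision_pose/pose is
# the standard MAVROS entry point for external position estimates.
import rospy
from geometry_msgs.msg import PoseStamped

def relay(msg, pub):
    """Re-stamp and forward the MoCaS pose to MAVROS."""
    msg.header.stamp = rospy.Time.now()
    pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("mocas_relay")
    pub = rospy.Publisher("/mavros/vision_pose/pose", PoseStamped, queue_size=10)
    rospy.Subscriber("/vicon/drone/pose", PoseStamped, relay, callback_args=pub)
    rospy.spin()
```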
Because of the MoCaS’s efficiency, its weighting in the EKF is ten times larger than that of the GPS when flying outdoors. Subsequently, the efficiency of the implementation is assessed by comparing the EKF’s position output with and without the MoCaS’s feedback injection. In Figure 6 the drone’s position error (in each axis) between the aforementioned two quantities is visualized, where the red, green and blue lines represent the error along the

Figure 6.
Drone’s EKF position error when MoCaS’ measurements are not injected.
Real time pose tracking is satisfactorily achieved and minor differences are attributed to the EKF’s weighting of the accelerometer and gyroscope measurements during calculations.
4. Drone awareness of surrounding environment
An important parameter in aerial navigation is awareness of the surrounding environment, including operation in close proximity to cooperating or evasive drones [21, 22], where potential contacts must be avoided.
High-accuracy awareness may not be feasible [23] and can become prohibitive in indoor environments; visual sensors along with Lidars can assist in this respect. A spherical camera provides an all-around visualization of the surroundings and can detect neighboring targets. A Pan-Tilt-Zoom (PTZ) camera with a limited Field of View (FoV) can then provide a more accurate description of this target. The suggested approach relies on the detection of moving objects. Correlation techniques and/or deep learning Visual Object Tracking (VOT) methods [24] are employed for this purpose.
4.1 Environment awareness using a spherical camera
Rather than using several cameras with a limited FoV to observe the surrounding space, a 360° FoV camera [25] is used. The spherical camera records images in a “spherical format,” which comprises two wide-angle frames stitched together to form a virtual sphere [25]. The image can be rectified to the classic distortionless rectilinear format of a pinhole camera [26]. However, due to the nature of the “spherical format,” it is preferable to split the image into smaller segments and rectify each one to achieve results closer to the pinhole camera model. Instead of splitting the image into equal-sized square segments [27], each image is split into tiles based on orientation-independent circles. With every tile having a different a-priori known calibration, the rectification can be carried out for each one independently, without high computational cost. By applying this solution and rectifying the image in Figure 7 for a selection of N = 12 tiles, the partitions shown in Figure 8 are obtained.
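As an illustration of this tile-wise rectification (a minimal sketch under simplified assumptions; the authors’ exact tiling scheme and per-tile calibrations are not reproduced here), an equirectangular frame can be remapped into a pinhole-style view along a chosen direction:

```python
# Minimal sketch: rectify one viewing direction of an equirectangular
# ("spherical format") frame into a pinhole-camera style tile via an OpenCV remap.
import cv2
import numpy as np

def equirect_to_pinhole(equi, yaw_deg, pitch_deg, fov_deg=60.0, out_size=400):
    """Render a rectilinear tile looking along (yaw, pitch) from an equirectangular image."""
    H, W = equi.shape[:2]
    f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2.0)   # pinhole focal length [px]
    # Pixel grid of the virtual pinhole camera, centered at the principal point.
    u, v = np.meshgrid(np.arange(out_size) - out_size / 2.0,
                       np.arange(out_size) - out_size / 2.0)
    # Ray directions in the camera frame (z forward, x right, y down).
    d = np.stack([u, v, np.full_like(u, f)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Rotate the rays by the requested pan (yaw) and tilt (pitch).
    cy, sy = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    cp, sp = np.cos(np.radians(pitch_deg)), np.sin(np.radians(pitch_deg))
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    d = d @ (Ry @ Rx).T
    # Convert directions to longitude/latitude and then to equirectangular pixels.
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    map_x = ((lon / np.pi + 1.0) * 0.5 * W).astype(np.float32)
    map_y = ((lat / (np.pi / 2.0) + 1.0) * 0.5 * H).astype(np.float32)
    return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

# Example: split the sphere into N = 12 tiles around the horizon.
# frame = cv2.imread("spherical_frame.jpg")
# tiles = [equirect_to_pinhole(frame, yaw, 0.0) for yaw in np.arange(0, 360, 30)]
```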

Figure 7.
Spherical flat image.

Figure 8.
Rectified N = 12 partitions for a single “spherical” frame.
For the case of collaborating drones, it is assumed that each one carries passive markers for visual recognition. Subsequently, the rectified images are processed for the identification of these markers [28, 29, 30, 31], thus estimating the neighboring drone’s pose. For improved pose extraction, a multi-marker rhombicuboctahedron arrangement [32] is assumed to be present on each target.
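For illustration, a single-marker pose estimate from a rectified tile could be obtained as sketched below, assuming ArUco-style fiducial markers and the classic OpenCV ArUco module (opencv-contrib); the dictionary, marker size, and per-tile calibration values are placeholders rather than the chapter’s actual parameters.

```python
# Sketch: detect fiducial markers in a rectified tile and estimate their pose.
# Dictionary, marker size and camera calibration below are illustrative placeholders.
import cv2
import numpy as np

camera_matrix = np.array([[400.0, 0.0, 200.0],
                          [0.0, 400.0, 200.0],
                          [0.0, 0.0, 1.0]])   # per-tile pinhole intrinsics (placeholder)
dist_coeffs = np.zeros(5)                      # rectified tile -> no distortion assumed
marker_length = 0.08                           # marker side length [m] (placeholder)

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def detect_marker_poses(tile_bgr):
    """Return (id, rvec, tvec) tuples for every marker found in the tile."""
    gray = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is None:
        return []
    res = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_length, camera_matrix, dist_coeffs)
    rvecs, tvecs = res[0], res[1]
    return [(int(i), r, t) for i, r, t in zip(ids.flatten(), rvecs, tvecs)]
```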
The experimental setup for the evaluation consists of the spherical camera mounted on a 2.7 m protruding stick, which is in turn mounted to the underside of the octarotor using the generic mount base discussed in Section 5. A rhombicuboctahedron arrangement with markers on its faces is attached to a DJI Mavic drone. Both UAVs were located within the MoCaS test volume, as shown in Figure 9. The quadrotor drone was flown along a randomized trajectory in the vicinity of the octarotor.

Figure 9.
Experimental setup for 360° camera relative localization.
In Figure 10 the relative 3D flight path between the drones is presented. Both the MoCaS recordings and the visual measurements are shown; for the cases where the markers were detected, the relative accuracy of the visual measurements was 2.2 cm.

Figure 10.
3D-relative path inferred through the MoCaS and the visual method between two collaborating drones.
4.2 Visual object tracking using a pan-tilt-zoom camera
Having identified the adversary or collaborating drone, a PTZ-camera is utilized to track its motion. This VOT problem is challenging when the drone is occluded; thus, Long Term Efficient (LTE) algorithms are sought for moving objects. Despite the development of Short Term Efficient (STE) algorithms [33], using either correlation or deep learning methods, an initial bounding box containing the target is required. In the authors’ case, the developed VOT algorithm employs two methods relying on a comparison: (a) between the points transformed based on the PTZ-parameters and those tracked using optical flow, and (b) between the homography-transformed points and the optical flow.
The first method is based on the known PTZ motion and the IMU’s accelerometer and gyroscope measurements (Figure 11), in order to estimate the motion of the pixels caused by the motion of the camera relative to the surroundings [34]. An IMU with triple accelerometers, gyroscopes, and magnetometers is attached to the PTZ-camera, as shown in Figure 11. While the enhancement provided by the PTZ-camera allows for efficient VOT, the need to control its parameters (pan, tilt, and zoom) while placed on a floating base and in the presence of several occlusions needs to be addressed.

Figure 11.
PTZ-camera for visual object tracking.
The objective is to provide the bounding box

Figure 12.
Sample drone tracking setup.
A GPU-based background subtraction technique eliminates the background pixels, leaving only the moving object pixels. The bounding box encapsulates all pixels of the moving drone; the pan and tilt angles are adjusted to position the centroid of the moving bounding box at the image’s center, while the zoom is adjusted to enlarge this box. The communication between the i7-minicomputer and the PTZ-camera is shown in Figure 13, while the VOT algorithm is shown in Figure 14.
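The control side of this loop can be sketched as follows (the gains, the choice of background subtractor, and the send_ptz_command helper are illustrative placeholders; the chapter’s GPU implementation and PTZ protocol are not reproduced): the moving-object mask yields a bounding box whose centroid offset from the image center drives the pan and tilt corrections.

```python
# Sketch: center a moving object in a PTZ camera image via background
# subtraction. Gains and the PTZ command interface are illustrative only.
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
K_PAN, K_TILT = 0.05, 0.05        # proportional gains [deg per pixel], placeholders

def send_ptz_command(d_pan_deg, d_tilt_deg):
    """Placeholder for the camera-specific pan/tilt interface."""
    pass

def track_step(frame_bgr):
    """One tracking iteration: mask moving pixels, box them, re-center the camera."""
    mask = subtractor.apply(frame_bgr)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    ys, xs = np.nonzero(mask)
    if xs.size < 50:                              # nothing moving -> no correction
        return None
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    cx, cy = 0.5 * (x0 + x1), 0.5 * (y0 + y1)     # bounding-box centroid
    h, w = frame_bgr.shape[:2]
    send_ptz_command(K_PAN * (cx - w / 2.0), K_TILT * (cy - h / 2.0))
    return (x0, y0, x1 - x0, y1 - y0)             # bounding box (x, y, w, h)
```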

Figure 13.
PTZ-camera hardware tracking and control schematic.

Figure 14.
Pan-tilt-zoom/IMU and optical flow VOT algorithm.
The feature points are recognized in each frame and the transformation matrix between successive frames follows [35]; the formulas provide the transformation based on the PTZ-parameters, and an augmentation is needed to account for the camera’s translation, as provided by the on-board accelerometers. The pixels that correspond to static background objects will follow the motion predicted from the camera motion and coincide with the positions predicted by an optical flow based estimation, while the rest will be classified as belonging to moving objects of interest (Figure 15). The computation of the optical flow parallels that of the Lucas-Kanade method, using a pyramidal scheme with variable image resolutions [36]. The basic optical flow premise is to locate, in the current frame captured by the camera, an image feature identified in the previous frame.
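A minimal sketch of this classification step is given below, using the pyramidal Lucas-Kanade optical flow as in [36]; the function predicting each feature’s position from the PTZ parameters and IMU data is left as a stub, since its exact form follows [35] and the on-board calibration, and the pixel tolerance is a placeholder.

```python
# Sketch: classify tracked features as background or moving-object points by
# comparing optical-flow estimates with the positions predicted from the known
# camera motion. predict_from_camera_motion() is a stub standing in for the
# PTZ/IMU-based prediction of [35].
import cv2
import numpy as np

LK_PARAMS = dict(winSize=(21, 21), maxLevel=3)    # pyramidal Lucas-Kanade settings

def predict_from_camera_motion(points, ptz_state, imu_sample):
    """Stub: map previous-frame points to the current frame using PTZ/IMU data."""
    return points                                  # identity placeholder

def split_static_moving(prev_gray, curr_gray, prev_pts, ptz_state, imu_sample, tol=3.0):
    """Return (static_pts, moving_pts) in the current frame; prev_pts is float32 (N,1,2)."""
    flow_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   prev_pts, None, **LK_PARAMS)
    pred_pts = predict_from_camera_motion(prev_pts, ptz_state, imu_sample)
    ok = status.ravel() == 1
    err = np.linalg.norm(flow_pts - pred_pts, axis=2).ravel()
    static = flow_pts[ok & (err <= tol)].reshape(-1, 2)   # follows camera-induced motion
    moving = flow_pts[ok & (err > tol)].reshape(-1, 2)    # deviates -> object of interest
    return static, moving
```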

Figure 15.
Background/foreground estimation using Homography-based VOT.
The second method relies only on visual feedback and homography calculations [37] between two successive frames, and requires neither the PTZ nor the IMU measurements, as shown in Figure 16. Initially, a set of “strong image features” is identified in the previous camera frame, and an optical flow technique is used to estimate the positions of these features in the current frame. The method involves the discovery of special image areas with specific characteristics.

Figure 16.
Homography-based VOT.
The algorithm used for finding the strong-corner image features relies on the GPU-enhanced “goodFeaturesToTrack” [38]. Under the assumption that the majority of the pixels belong to the background, a homography is calculated that transforms the feature positions from the previous to the current frame; these correspond to the background pixels. The previous frame’s features are then transformed using the homography to obtain their positions in the current frame. Herein, it is assumed that the background points transformed with the homography will coincide with those estimated by the optical flow, while the moving objects’ features estimated by the optical flow will diverge from the homography-transformed pixels.
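A minimal sketch of this homography-based separation is given below; the feature count, RANSAC threshold, and divergence tolerance are illustrative, and the CPU goodFeaturesToTrack is used here in place of the GPU-enhanced version [38].

```python
# Sketch: homography-based background/foreground separation between two frames.
# Strong corners are tracked with optical flow; a RANSAC homography explains the
# background motion, and features diverging from it are flagged as moving.
import cv2
import numpy as np

def moving_features(prev_gray, curr_gray, tol=3.0):
    """Return current-frame feature positions that do not follow the background homography."""
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    ok = status.ravel() == 1
    prev_pts, curr_pts = prev_pts[ok], curr_pts[ok]
    # Homography dominated by the (majority) background points.
    H, _ = cv2.findHomography(prev_pts, curr_pts, cv2.RANSAC, 3.0)
    if H is None:
        return np.empty((0, 2), dtype=np.float32)
    warped = cv2.perspectiveTransform(prev_pts, H)           # expected background motion
    err = np.linalg.norm(curr_pts - warped, axis=2).ravel()
    return curr_pts[err > tol].reshape(-1, 2)                # divergent -> moving objects
```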
One downside of the technique is that it cannot identify the tracked object when the latter remains static and blends with the background. In this case, a fast correlation-based STE-tracker relying on the MOSSE algorithm [39] is also used in order to estimate the drone’s position until new measurements of a moving drone are available. Several more robust but slower tracking algorithms were evaluated, including KCF [40], CSRT [41], MIL [42], MedianFlow [43], and TLD [44]; the MOSSE algorithm was selected because of its fast implementation (600 Frames per Second (FpS)). A Kalman prediction scheme [45], using a 2D constant-acceleration model for the estimated tracking window, was used to fuse the predicted bounding box with the one obtained from MOSSE in the presence of noisy measurements of the moving object’s center.
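The fallback tracker and the prediction scheme could be combined as sketched below; note that OpenCV versions differ on whether MOSSE lives under cv2 or cv2.legacy, the Kalman filter here tracks only the box center, and the noise covariances are placeholders.

```python
# Sketch: MOSSE short-term tracker combined with a constant-acceleration Kalman
# filter on the bounding-box center. Noise covariances are placeholders, and the
# tracker must be initialized with tracker.init(frame, bbox) before use.
import cv2
import numpy as np

def make_mosse():
    """Create a MOSSE tracker across OpenCV versions."""
    if hasattr(cv2, "legacy"):
        return cv2.legacy.TrackerMOSSE_create()
    return cv2.TrackerMOSSE_create()

def make_kf(dt=1.0 / 30.0):
    """2D constant-acceleration Kalman filter; state = [x, y, vx, vy, ax, ay]."""
    kf = cv2.KalmanFilter(6, 2)
    kf.transitionMatrix = np.array([
        [1, 0, dt, 0, 0.5 * dt * dt, 0],
        [0, 1, 0, dt, 0, 0.5 * dt * dt],
        [0, 0, 1, 0, dt, 0],
        [0, 0, 0, 1, 0, dt],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1]], dtype=np.float32)
    kf.measurementMatrix = np.eye(2, 6, dtype=np.float32)     # only (x, y) is measured
    kf.processNoiseCov = np.eye(6, dtype=np.float32) * 1e-3
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 5.0
    return kf

def track_step(tracker, kf, frame):
    """Predict the box center; correct it when MOSSE reports a valid box."""
    predicted = kf.predict()[:2].ravel()
    ok, box = tracker.update(frame)
    if ok:
        x, y, w, h = box
        center = np.array([[x + w / 2.0], [y + h / 2.0]], dtype=np.float32)
        kf.correct(center)
        return (x, y, w, h)
    return (predicted[0], predicted[1], None, None)           # fall back to prediction
```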
5. Aerial manipulation
A seven Degree-of-Freedom (DoF) robotic arm has been attached for exerting forces on surfaces in aerial manipulation tasks, such as grinding, cleaning or physical-contact-based inspection [46]. The Kinova Gen 2 Assistive 7-DoF robot [47] was attached through a custom mount. This manipulator is characterized by a 2:1 weight-to-payload ratio, with the available payload at the end-effector being 1.2 kg, grasped by the 3-finger gripper. Torque sensing is provided at each joint, and these measurements, along with the joint angles, are communicated to the main computer at 100 Hz under the ROS middleware.
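For illustration, this 100 Hz joint feedback could be consumed on the companion computer as sketched below; the topic name and torque threshold are hypothetical placeholders, but the message type is the standard sensor_msgs/JointState, whose effort field carries the joint torques.

```python
#!/usr/bin/env python
# Sketch: monitor the manipulator's joint angles and torques under ROS.
# The topic name below is a placeholder for the arm driver's JointState output.
import rospy
from sensor_msgs.msg import JointState

TORQUE_LIMIT = 5.0   # [Nm], illustrative per-joint threshold

def joint_cb(msg):
    """Warn if any reported joint torque exceeds the illustrative limit."""
    for name, angle, torque in zip(msg.name, msg.position, msg.effort):
        if abs(torque) > TORQUE_LIMIT:
            rospy.logwarn("High torque on %s: %.2f Nm (angle %.2f rad)",
                          name, torque, angle)

if __name__ == "__main__":
    rospy.init_node("arm_torque_monitor")
    rospy.Subscriber("/arm_driver/out/joint_state", JointState, joint_cb)
    rospy.spin()
```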
For mounting the robot to the drone’s payload attachment rods, a generic payload mount base was designed and manufactured. The base is firmly mounted to the drone’s payload-carrying rods utilizing quick-attachment clamps. The construction material was selected to be T-6065 aluminum, and the base features four 10 mm openings for attaching the payload. A second rigid base was similarly designed for attaching the robot’s base to the generic payload mount base using 10 mm hex bolts. An exploded view of the entire mounting configuration can be visualized in Figure 17. The aerial manipulator is shown in Figure 18.

Figure 17.
Universal mount of robotic manipulators on aerial platform.

Figure 18.
Aerial manipulation system with PTZ-camera.
The indoor position-hold scheme of Section 3.2 was expanded [48, 49] so as to utilize the manipulator in a surface ultrasound scanning scenario. The surface is placed at
The described scheme is aimed at future use in Abu Dhabi airport’s Midfield Terminal [50], for scanning the integrity of critical structures such as facades and rooftops. Figure 19 presents the hovering pose of the physical prototype while scanning the surface, whilst a full concept video, including moments from the experiment, is available through the link given in [51].

Figure 19.
Surface ultra-sound scanning utilizing aerial manipulation.
6. Conclusions/discussion
In this chapter the mechatronic aspects (hardware and software) of a heavy-lift drone are presented. This drone can operate either indoors or outdoors in an autonomous manner. Equipped with spherical and PTZ cameras, the drone has environment perception capabilities and can collaborate with other drones. A robot manipulator is attached to the drone for physical interaction purposes. The ability to carry out the aforementioned tasks in an accurate and modular manner demonstrates the suitability of the system for future robotic aerial applications of increased complexity. However, many challenges are yet to be examined. The authors aim to focus future research on autonomous navigation in confined environments as well as aerial manipulation with high interaction forces [52].
In aerial manipulation, the challenge lies in the forces at the tip of a stiff 7-DoF manipulator being directly transferred to the main UAS frame. Additionally, their orientation can vary, depending on the pose of the manipulator. Thus, the ability of the aerial manipulator to robustly maintain its position and attitude while performing the task is mandatory. Compared with the experiments presented in this chapter, the forces induced by such operations are estimated to be in the range of 10 to 100 N. Subsequently, although the existing position controller of the ArduCopter flight stack is able to maintain a proper pose while ultrasound scanning inclined areas, advanced control techniques [49] will be utilized in the sequel. The authors intend to test the efficiency of the built-in attitude controller of the ArduCopter flight stack, as well as to exploit the adaptive backstepping control strategies in [48, 49] and other (model predictive) control techniques. The implementation of such controllers relies on the ability to directly and independently control the angular velocity of the drone’s motors at rates greater than or equal to 1 kHz.
Conflict of interest
The authors declare no conflict of interest.
Nomenclature
- UAV: unmanned aerial vehicle
- R/C: remote control
- PDB: power distribution board
- IMU: inertial measurement unit
- FCU: flight control unit
- GNSS: global navigation satellite system
- GPS: global positioning system
- RTK: real-time kinematic
- EKF: extended Kalman filter
- ROS: robot operating system
- DoF: degree-of-freedom
- FoV: field-of-view
- PTZ: pan-tilt-zoom
- VOT: visual object tracking