How Stereo Depth Perception Enables Advanced Robotic Capabilities

March 25, 2025
A detailed look at how the stereo depth perception and 3D data provided by stereo cameras enable the functions of autonomous mobile and pick-and-place robots.

3D sensors are a fundamental technology for measuring depth. These sensors can be found in several common 3D vision technologies such as stereo cameras, LiDAR, time-of-flight cameras and laser triangulation.

A manufacturer’s selection of 3D technology depends on the specific application and requirements, as each technology delivers specific advantages. For example, LiDAR and laser triangulation technologies are not suitable for ruggedized applications due to moving parts like rotating mirrors.

Stereo cameras are a better fit for outdoor applications because they are not affected by sunlight interference. Plus, the cost of stereo cameras is typically lower than the other 3D sensor options. Stereo cameras compute 3D data from images, which requires higher computational power compared to the other technologies mentioned above. However, some stereo cameras offer onboard processing to offload this work from the host. Stereo cameras can also provide color images and color point clouds, whereas the other common 3D vision technologies require a separate color camera.
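The computation behind "3D data from images" is triangulation: for a rectified stereo pair, depth is focal length times baseline divided by pixel disparity. A minimal sketch, with an assumed focal length and baseline that are illustrative rather than the specs of any particular camera:

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
# The focal length (in pixels) and baseline (in meters) are
# illustrative assumptions, not the specs of any real camera.

def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_px * baseline_m / disparity_px

# A 20-pixel disparity with an 800 px focal length and 10 cm baseline:
print(disparity_to_depth(20.0, 800.0, 0.10))  # 4.0 (meters)
```

Running this per pixel over a full disparity image is what produces the 3D point cloud, which is why onboard processing helps so much at high resolutions.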

With any of these vision sensors, there is typically a trade-off between range and accuracy. For example, a long-range sensor has lower accuracy, while a short-range sensor has higher accuracy. LiDAR offers the longest range, followed by stereo cameras and then time-of-flight. Laser triangulation has the shortest range but higher accuracy.
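For a stereo camera, the range-accuracy trade-off can be made concrete: differentiating the depth equation Z = f·B/d shows that depth error grows roughly with the square of the distance. A sketch under assumed, illustrative parameters:

```python
# Stereo depth error grows quadratically with range:
# dZ ≈ Z**2 / (f * B) * dd, where f is focal length (px),
# B is baseline (m) and dd is the matching error (px).
# All numbers below are illustrative assumptions.

def depth_error_m(z_m, focal_px=800.0, baseline_m=0.10, disp_err_px=0.25):
    return (z_m ** 2) / (focal_px * baseline_m) * disp_err_px

for z in (1.0, 2.0, 4.0):
    print(f"{z} m -> about {depth_error_m(z) * 100:.2f} cm error")
```

Doubling the distance quadruples the error, which is why the longer-range tasks below tolerate coarser depth while close-range inspection demands the highest accuracy.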

Longer range capabilities are needed for autonomous navigation and obstacle avoidance, while medium range is needed for pick-and-place functions. Closer range is needed for object identification and inspection.

Stereo camera industrial applications

Stereo cameras are suitable for most warehouse robotic applications because they offer flexible range with sufficient accuracy. These cameras are relatively low-cost, easily ruggedized and offer the color images needed for object recognition.

The two most common industrial applications for stereo cameras are autonomous mobile robots (AMRs) and pick-and-place robots.

AMRs use stereo cameras to perform SLAM (simultaneous localization and mapping) by building a map of the environment and localizing themselves in the map at the same time. They plan routes to given destinations, detect obstacles (objects/people) and navigate around them.

Following are the standard stereo camera feature/characteristic requirements for AMR applications:

  • High frame rate
  • Low latency
  • Robust and reliable
  • Calibration retention
  • Wide field-of-view
  • Longer working distance
  • High dynamic range for indoor and outdoor use

The key components for pick-and-place robotic applications include a vision system to perceive the environment, a control system to process the data for decision making and a robot arm with a gripper or suction to manipulate the objects. Pick-and-place robots can be used for a variety of applications such as assembly, palletization, depalletization and bin picking.

Using bin picking as an example, the objective is to remove randomly placed objects from a container. In this application, the vision system is used to recognize and locate an object and then compute its orientation so the gripper can grasp it properly. The control system then determines the robot trajectory, avoiding obstacles along the way. Finally, the robot picks up the object and places it at the destination.
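The bin-picking flow above can be sketched as a loop over detect, plan and execute. Everything here is a hypothetical stand-in, not a real robot or vision API; a common simplification, used below, is to grasp the topmost point in the cloud first:

```python
# Hedged sketch of the bin-picking flow: locate an object in the
# point cloud, compute a grasp, then plan a trajectory. All names
# and types are hypothetical stand-ins, not a real robot API.
from dataclasses import dataclass

@dataclass
class Detection:
    position: tuple     # (x, y, z) in meters, camera frame
    orientation: float  # grasp angle in radians

def recognize_object(point_cloud):
    # Stand-in detector: grasp the highest point in the bin, since
    # the topmost object is usually the least occluded.
    top = max(point_cloud, key=lambda p: p[2])
    return Detection(position=top, orientation=0.0)

def plan_pick(point_cloud, destination):
    det = recognize_object(point_cloud)       # 1. locate + orient
    trajectory = [det.position, destination]  # 2. plan (straight line here)
    return trajectory                         # 3. execution would follow

cloud = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.42), (0.20, 0.10, 0.35)]
print(plan_pick(cloud, (0.50, 0.50, 0.10)))
```

A real system would replace the stand-in detector with a trained recognition model and the straight-line plan with collision-aware motion planning.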

The standard stereo camera feature/characteristic requirements for pick-and-place robot applications are:

  • High accuracy
  • Low latency
  • Robust and reliable
  • Calibration retention
  • Capable of withstanding dusty/humid industrial environments
  • Flexible field-of-view and working distance to handle different object sizes

Addressing 3D point cloud, latency and deployment issues 

Because the quality of a 3D point cloud depends on the image sensor data, a typical machine vision challenge involves having sufficient lighting to avoid long exposure times, which may cause image blur.

Robot performance and decision making depend on the quality of the obtained 3D point cloud.

The 3D point cloud can be improved in several ways:

  • Higher sensor and stereo resolution produce more 3D points.
  • Accuracy of the 3D points improves with a wider baseline, higher resolution and a narrower field-of-view.
  • Denser and cleaner point clouds are obtained with a better stereo algorithm, though there is typically a trade-off between quality and speed.
  • For low-texture scenes, point cloud density is increased by using a pattern projector.
  • Noise in the point cloud is reduced with post-processing such as a median filter, speckle filter or temporal filter.
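As a concrete example of the post-processing step, here is a minimal median filter over a single row of depth values; real pipelines filter the full 2D depth map, often onboard the camera:

```python
# Minimal 1D median filter: speckle outliers are replaced by the
# median of their neighborhood. Real pipelines run this in 2D.

def median_filter_1d(depths, window=3):
    half = window // 2
    out = []
    for i in range(len(depths)):
        lo, hi = max(0, i - half), min(len(depths), i + half + 1)
        neighborhood = sorted(depths[lo:hi])
        out.append(neighborhood[len(neighborhood) // 2])
    return out

row = [2.0, 2.1, 9.9, 2.0, 2.2]   # 9.9 m is a speckle outlier
print(median_filter_1d(row))       # the outlier is suppressed
```

The median is preferred over a mean here because a single bad match would drag an average far off, while the median ignores it entirely.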

Latency is the delay between an image being captured by the sensor in the camera and the transfer of the 3D data to the host. The benefits of low latency are faster decision-making and more responsive interaction with the environment. Receiving the 3D data faster also allows more time for any subsequent AI processing.

The key factors that help reduce latency are:

  • A more streamlined camera architecture processes pixels in a pipeline, so each module can start as soon as pixels arrive rather than waiting for the previous module to finish the whole image.
  • Faster stereo processing generates the disparity image sooner.
  • Higher transmission bandwidth cuts the time needed to move the data from the camera to the host.
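The bandwidth factor is easy to quantify with a back-of-envelope calculation (time = bits / bandwidth). The image size and link speeds below are assumptions chosen only to show the scale of the difference:

```python
# Back-of-envelope transmission latency: time = bits / bandwidth.
# Image size and link speeds are illustrative assumptions.

def transfer_ms(width, height, bytes_per_px, bandwidth_gbps):
    bits = width * height * bytes_per_px * 8
    return bits / (bandwidth_gbps * 1e9) * 1e3

# 1920x1080 16-bit disparity image:
print(round(transfer_ms(1920, 1080, 2, 0.32), 1))  # ~104 ms on a ~0.32 Gbps link
print(round(transfer_ms(1920, 1080, 2, 10.0), 2))  # ~3.3 ms on a 10 Gbps link
```

At video frame rates, the slower link alone would consume several frame periods per image, which is why transmission bandwidth matters as much as processing speed.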

After a system is deployed for use in production, it is crucial to continue monitoring performance over time. Here are some of the practical issues that can occur after deployment:

  • If the camera works intermittently or drops its connection frequently, which could be caused by an unstable interface connection, consider a more stable industrial interface such as Ethernet rather than USB.
  • If the camera fails due to shock and vibration on the system, select a camera with high reliability, robustness and an IP rating.
  • If robot performance degrades over time, the stereo camera may need recalibration.

Calibration retention is also critical for a stereo camera. A stereo camera that is not properly calibrated undermines decision-making for the application.

Moreover, as calibration error grows, stereo accuracy worsens. It is important to select a stereo camera that retains good calibration over time; otherwise, it will need frequent recalibration, which is not practical after deployment in the field.

Stephen Se is senior engineering manager at Teledyne Vision Solutions.
