How Stereo Depth Perception Enables Advanced Robotic Capabilities

March 25, 2025
A detailed look at how the stereo depth perception and 3D data provided by stereo cameras enable the functions of autonomous mobile and pick-and-place robots.

3D sensors are a fundamental technology for measuring depth. These sensors can be found in several common 3D vision technologies such as stereo cameras, LiDAR, time-of-flight cameras and laser triangulation.

A manufacturer’s selection of 3D technology depends on the specific application and requirements, as each technology delivers specific advantages. For example, LiDAR and laser triangulation technologies are not suitable for ruggedized applications due to moving parts like rotating mirrors.

Stereo cameras are a better fit for outdoor applications because they are not affected by sunlight interference. Plus, the cost of stereo cameras is typically lower than the other 3D sensor options. Stereo cameras compute 3D data from images, which requires higher computational power compared to the other technologies mentioned above. However, some stereo cameras offer onboard processing to offload this work from the host. Stereo cameras can also provide color images and color point clouds, whereas the other common 3D vision technologies require a separate color camera.
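The computation behind "3D data from images" is triangulation: for a rectified stereo pair, depth is focal length times baseline divided by pixel disparity. A minimal sketch, with an assumed focal length and baseline that are illustrative rather than the specs of any particular camera:

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
# The focal length (in pixels) and baseline (in meters) are
# illustrative assumptions, not the specs of any real camera.

def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_px * baseline_m / disparity_px

# A 20-pixel disparity with an 800 px focal length and 10 cm baseline:
print(disparity_to_depth(20.0, 800.0, 0.10))  # 4.0 (meters)
```

Running this per pixel over a full disparity image is what produces the 3D point cloud, which is why onboard processing helps so much at high resolutions.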

With any of these vision sensors, there is typically a trade-off between range and accuracy. For example, a long-range sensor has lower accuracy, while a short-range sensor has higher accuracy. LiDAR offers the longest range, followed by stereo cameras and then time-of-flight. Laser triangulation has the shortest range but higher accuracy.
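For a stereo camera, the range-accuracy trade-off can be made concrete: differentiating the depth equation Z = f·B/d shows that depth error grows roughly with the square of the distance. A sketch under assumed, illustrative parameters:

```python
# Stereo depth error grows quadratically with range:
# dZ ≈ Z**2 / (f * B) * dd, where f is focal length (px),
# B is baseline (m) and dd is the matching error (px).
# All numbers below are illustrative assumptions.

def depth_error_m(z_m, focal_px=800.0, baseline_m=0.10, disp_err_px=0.25):
    return (z_m ** 2) / (focal_px * baseline_m) * disp_err_px

for z in (1.0, 2.0, 4.0):
    print(f"{z} m -> about {depth_error_m(z) * 100:.2f} cm error")
```

Doubling the distance quadruples the error, which is why the longer-range tasks below tolerate coarser depth while close-range inspection demands the highest accuracy.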

Longer range capabilities are needed for autonomous navigation and obstacle avoidance, while medium range is needed for pick-and-place functions. Closer range is needed for object identification and inspection.

Stereo camera industrial applications

Stereo cameras are suitable for most warehouse robotic applications because they offer flexible range with sufficient accuracy. These cameras are relatively low-cost, easily ruggedized and offer the color images needed for object recognition.

The two most common industrial applications for stereo cameras are autonomous mobile robots (AMRs) and pick-and-place robots.

AMRs use stereo cameras to perform SLAM (simultaneous localization and mapping) by building a map of the environment and localizing themselves in the map at the same time. They plan routes to given destinations, detect obstacles (objects/people) and navigate around them.

Following are the standard stereo camera feature/characteristic requirements for AMR applications:

  • High frame rate
  • Low latency
  • Robust and reliable
  • Calibration retention
  • Wide field-of-view
  • Longer working distance
  • High dynamic range for indoor and outdoor use

The key components for pick-and-place robotic applications include a vision system to perceive the environment, a control system to process the data for decision making and a robot arm with a gripper or suction to manipulate the objects. Pick-and-place robots can be used for a variety of applications such as assembly, palletization, depalletization and bin picking.

Using bin picking as an example, the objective is to remove randomly placed objects from a container. In this application, the vision system is used to recognize and locate an object and then compute its orientation so the gripper can grasp it properly. The control system then determines the robot trajectory, avoiding obstacles along the way. Finally, the robot picks up the object and places it at the destination.
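The bin-picking flow above can be sketched as a loop over detect, plan and execute. Everything here is a hypothetical stand-in, not a real robot or vision API; a common simplification, used below, is to grasp the topmost point in the cloud first:

```python
# Hedged sketch of the bin-picking flow: locate an object in the
# point cloud, compute a grasp, then plan a trajectory. All names
# and types are hypothetical stand-ins, not a real robot API.
from dataclasses import dataclass

@dataclass
class Detection:
    position: tuple     # (x, y, z) in meters, camera frame
    orientation: float  # grasp angle in radians

def recognize_object(point_cloud):
    # Stand-in detector: grasp the highest point in the bin, since
    # the topmost object is usually the least occluded.
    top = max(point_cloud, key=lambda p: p[2])
    return Detection(position=top, orientation=0.0)

def plan_pick(point_cloud, destination):
    det = recognize_object(point_cloud)       # 1. locate + orient
    trajectory = [det.position, destination]  # 2. plan (straight line here)
    return trajectory                         # 3. execution would follow

cloud = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.42), (0.20, 0.10, 0.35)]
print(plan_pick(cloud, (0.50, 0.50, 0.10)))
```

A real system would replace the stand-in detector with a trained recognition model and the straight-line plan with collision-aware motion planning.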

The standard stereo camera feature/characteristic requirements for pick-and-place robot applications are:

  • High accuracy
  • Low latency
  • Robust and reliable
  • Calibration retention
  • Capable of withstanding dusty/humid industrial environments
  • Flexible field-of-view and working distance to handle different object sizes

Addressing 3D point cloud, latency and deployment issues 

Because the quality of a 3D point cloud depends on the image sensor data, a typical machine vision challenge involves having sufficient lighting to avoid long exposure times, which may cause image blur.

Robot performance and decision making depend on the quality of the obtained 3D point cloud.

The 3D point cloud can be improved in several ways:

  • Higher sensor and stereo resolution produce more 3D points.
  • Accuracy of the 3D points improves with a wider baseline, higher resolution and a narrower field-of-view.
  • Denser and cleaner point clouds are obtained with a better stereo algorithm, though there is typically a trade-off between quality and speed.
  • For low-texture scenes, point cloud density is increased by using a pattern projector.
  • Noise in the point cloud is reduced with post-processing such as a median filter, speckle filter or temporal filter.
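As a concrete example of the post-processing step, here is a minimal median filter over a single row of depth values; real pipelines filter the full 2D depth map, often onboard the camera:

```python
# Minimal 1D median filter: speckle outliers are replaced by the
# median of their neighborhood. Real pipelines run this in 2D.

def median_filter_1d(depths, window=3):
    half = window // 2
    out = []
    for i in range(len(depths)):
        lo, hi = max(0, i - half), min(len(depths), i + half + 1)
        neighborhood = sorted(depths[lo:hi])
        out.append(neighborhood[len(neighborhood) // 2])
    return out

row = [2.0, 2.1, 9.9, 2.0, 2.2]   # 9.9 m is a speckle outlier
print(median_filter_1d(row))       # the outlier is suppressed
```

The median is preferred over a mean here because a single bad match would drag an average far off, while the median ignores it entirely.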

Latency is the delay between an image being captured by the sensor in the camera and the transfer of the 3D data to the host. The benefits of low latency are faster decision-making and more responsive interaction with the environment. Receiving the 3D data faster also allows more time for any subsequent AI processing.

The key factors that help reduce latency are:

  • A more streamlined camera architecture processes pixels in a pipeline, so each module can start as soon as pixels arrive rather than waiting for the previous module to finish the whole image.
  • Faster stereo processing generates the disparity image sooner.
  • Higher transmission bandwidth cuts the time needed to move the data from the camera to the host.
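The bandwidth factor is easy to quantify with a back-of-envelope calculation (time = bits / bandwidth). The image size and link speeds below are assumptions chosen only to show the scale of the difference:

```python
# Back-of-envelope transmission latency: time = bits / bandwidth.
# Image size and link speeds are illustrative assumptions.

def transfer_ms(width, height, bytes_per_px, bandwidth_gbps):
    bits = width * height * bytes_per_px * 8
    return bits / (bandwidth_gbps * 1e9) * 1e3

# 1920x1080 16-bit disparity image:
print(round(transfer_ms(1920, 1080, 2, 0.32), 1))  # ~104 ms on a ~0.32 Gbps link
print(round(transfer_ms(1920, 1080, 2, 10.0), 2))  # ~3.3 ms on a 10 Gbps link
```

At video frame rates, the slower link alone would consume several frame periods per image, which is why transmission bandwidth matters as much as processing speed.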

After a system is deployed for use in production, it is crucial to continue monitoring performance over time. Here are some of the practical issues that can occur after deployment:

  • If the camera works intermittently or drops its connection frequently, which could be caused by an unstable interface connection, consider a more stable industrial interface such as Ethernet rather than USB.
  • If the camera fails due to shock and vibration on the system, select a camera with high reliability, robustness and an IP rating.
  • If robot performance degrades over time, the stereo camera may need recalibration.

Calibration retention is also critical for a stereo camera. A stereo camera that is not properly calibrated undermines decision-making for the application.

Moreover, as calibration error grows, stereo accuracy worsens. It is important to select a stereo camera that retains good calibration over time; otherwise, it will need frequent recalibration, which is not practical after deployment in the field.

Stephen Se is senior engineering manager at Teledyne Vision Solutions.
