Camera Only
Overview
The Camera Only perception system provides vision-based cone detection using YOLO object detection models combined with Intel RealSense depth cameras. This module processes RGB-D (color + depth) images to detect cones and determine their 3D positions relative to the vehicle.
Historical Note: The system originally used a single camera setup, but the field of view (FOV) was not large enough to adequately cover the track width and detect cones on both sides. The dual camera configuration significantly improves lateral coverage and detection reliability.
System Architecture
The dual camera system uses two Intel RealSense depth cameras:
Left Camera (D435i): Positioned on the left side of the vehicle, mounted upside down
Right Camera (D435): Positioned on the right side of the vehicle
Each camera independently:
Captures RGB color images
Captures aligned depth images
Runs YOLO-based cone detection on color images
Uses depth data to compute 3D cone positions
Publishes detected cones with color classification
Dual Camera Node
The DualCamera class manages both cameras and performs synchronized processing.
Key Features
Independent Processing: Each camera processes its data stream independently
YOLO Segmentation: Uses YOLO model with instance segmentation for precise cone detection
Depth Integration: Combines segmentation masks with depth data for 3D localization
Color Classification: Detects and classifies blue and yellow cones
Confidence Filtering: Only publishes detections above configurable confidence threshold
Visualization Support: Optional depth overlay and visualization outputs
ROS Topics
Subscribed Topics:

| Topic | Type | Description |
|---|---|---|
| /camera/realsense_imu/color/image_raw | sensor_msgs/Image | Left camera RGB image |
| /camera/realsense_imu/aligned_depth_to_color/image_raw | sensor_msgs/Image | Left camera depth image aligned to RGB |
| /camera/realsense2/color/image_raw | sensor_msgs/Image | Right camera RGB image |
| /camera/realsense2/aligned_depth_to_color/image_raw | sensor_msgs/Image | Right camera depth image aligned to RGB |
Published Topics:

| Topic | Type | Description |
|---|---|---|
| /realsense/d435i/cones | ConesCartesian | Left camera cone detections |
| /realsense/d435/cones | ConesCartesian | Right camera cone detections |
| /perception_cones_viz | PointCloud | Visualization of detected cones |
Processing Pipeline
For each camera, the processing pipeline follows these steps:
Image Reception
Color and depth images are received via ROS subscriptions. Processing only occurs when both images are available.
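A minimal sketch of this gating, assuming plain rclpy subscriptions and cv_bridge; the class and callback names here are illustrative, not the actual DualCamera implementation:

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class CameraReceiver(Node):
    def __init__(self):
        super().__init__('camera_receiver')
        self.bridge = CvBridge()
        self.color = None
        self.depth = None
        self.create_subscription(
            Image, '/camera/realsense2/color/image_raw', self.on_color, 10)
        self.create_subscription(
            Image, '/camera/realsense2/aligned_depth_to_color/image_raw',
            self.on_depth, 10)

    def on_color(self, msg):
        self.color = self.bridge.imgmsg_to_cv2(msg, 'bgr8')
        self.try_process()

    def on_depth(self, msg):
        self.depth = self.bridge.imgmsg_to_cv2(msg, 'passthrough')
        self.try_process()

    def try_process(self):
        # Process only when both images are available
        if self.color is None or self.depth is None:
            return
        ...  # run detection and depth extraction here
```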
YOLO Detection
The color image is passed through a YOLO model to detect cone instances with segmentation masks.
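A hedged sketch of this step using the ultralytics API; the weights filename and class indices are assumptions about the team's trained model:

```python
from ultralytics import YOLO

model = YOLO('cone_seg.pt')  # assumed segmentation weights file

# Run inference; conf mirrors the yolo_minimum_confidence setting
results = model(color_image, conf=0.5, verbose=False)[0]

if results.masks is not None:
    for mask, box in zip(results.masks.data, results.boxes):
        cls = int(box.cls)       # YOLO class index
        score = float(box.conf)  # detection confidence
        # Binary mask; may need resizing to the original image shape
        binary_mask = (mask.cpu().numpy() > 0.5).astype('uint8')
```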
Depth Extraction
```python
# For each detected cone segmentation mask:

# 1. Extract depth values (meters) at the mask pixels
depths = depth_image[mask == 1]

# 2. Back-project the mask pixels (u, v) to 3D camera coordinates
#    using the inverse intrinsic matrix inv_K
pixels = np.vstack((u, v, ones))           # homogeneous pixel coords (3, N)
camera_coords = (inv_K @ pixels) * depths  # rays scaled by depth

# 3. Take the per-axis median as the cone position (robust to outliers)
cone_position = np.median(camera_coords, axis=1)
```
Coordinate Transform
Camera coordinates are converted to the vehicle frame (see the sketch after this list):
X (forward): Camera Z-axis
Y (left): Camera X-axis
Z (up): Camera Y-axis (not published)
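A minimal sketch of this remap, using the cone_position computed in the depth-extraction step above:

```python
# Remap camera axes to the vehicle frame per the convention above
x_vehicle = cone_position[2]  # camera Z -> vehicle X (forward)
y_vehicle = cone_position[0]  # camera X -> vehicle Y (left)
# cone_position[1] (camera Y) maps to vehicle Z and is not published
```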
Color Classification
YOLO class outputs are mapped to FEB color codes (see the sketch after this list):
Blue cones (class 8) → color = 2
Yellow cones (class 1) → color = 1
Other cones → color = 0
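As a sketch, this mapping can live in a small lookup table (the dictionary name is illustrative):

```python
# YOLO class index -> FEB color code, per the list above
YOLO_TO_FEB_COLOR = {8: 2,  # blue cone
                     1: 1}  # yellow cone

color = YOLO_TO_FEB_COLOR.get(cls, 0)  # any other class -> 0
```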
Publishing
Cone detections are published as ConesCartesian messages stamped with a timestamp and evaluation time.
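A heavily hedged sketch of the publish step; the import path and the ConesCartesian field names below are assumptions, not the real message definition:

```python
from feb_msgs.msg import ConesCartesian  # assumed import path

msg = ConesCartesian()
# Assumed fields: parallel arrays of positions and color codes
msg.x = [float(p[0]) for p in cone_positions]  # forward (m)
msg.y = [float(p[1]) for p in cone_positions]  # left (m)
msg.color = colors                             # FEB color codes
msg.header.stamp = node.get_clock().now().to_msg()
node.cones_pub.publish(msg)  # hypothetical publisher handle
```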
Configuration Parameters
From CameraOnlySettings:
| Parameter | Description |
|---|---|
| yolo_minimum_confidence | Minimum YOLO confidence threshold for detections (e.g., 0.5) |
| publish_visual | Enable/disable visualization publishing |
| dual_camera | Enable dual camera mode (vs. single camera) |
From EarlyStageFusionSettings:
| Parameter | Description |
|---|---|
| left_intrinsic | 3x3 camera intrinsic matrix for the left camera |
| right_intrinsic | 3x3 camera intrinsic matrix for the right camera |
Camera Calibration
Both cameras require intrinsic calibration matrices for accurate 3D reconstruction. The intrinsic matrix K relates pixel coordinates to camera rays:

$$
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

Where:
(fx, fy): Focal lengths in pixels
(cx, cy): Principal point (optical center)
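For example, a pixel (u, v) with metric depth d back-projects through K as follows; the focal lengths and principal point below are placeholder values, not the actual calibration:

```python
import numpy as np

K = np.array([[615.0,   0.0, 320.0],   # fx, cx (placeholder values)
              [  0.0, 615.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])

u, v = 350, 260  # pixel coordinates
d = 4.2          # depth at (u, v) in meters

ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
point_camera = ray * d  # 3D point in camera coordinates (meters)
```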
Image Processing Notes
Depth Units:
RealSense depth images are provided in millimeters and must be converted to meters:
```python
depth_image = depth_image.astype(np.float32) / 1000.0
```
Image Flipping:
The left camera images are flipped 180 degrees due to physical mounting orientation:
```python
img = cv2.flip(img, -1)                  # Flip both axes (180° rotation)
depth_image = cv2.flip(depth_image, -1)
```
Depth Alignment:
Depth images are pre-aligned to the color image frame by the RealSense driver, ensuring pixel correspondence.
Important
To enable proper depth-to-color alignment, you must launch the cameras using the perception.py or camera.py launch files from the sn4_bringup package. These launch files configure the RealSense driver with the align_depth.enable: True parameter, which ensures the depth image is spatially aligned to the color image frame. Without this alignment, depth values will not correspond to the correct pixel locations in the RGB image.
```bash
# Correct: launches cameras with alignment enabled
ros2 launch sn4_bringup camera.py

# Or launch the full perception system
ros2 launch sn4_bringup perception.py
```
Visualization and Debugging
The module includes several visualization tools:
Depth Overlay:
Shows segmented cone regions overlaid on the original image with color-coded depth information (green intensity indicates depth).
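A rough sketch of one way to build such an overlay with OpenCV; the actual module's rendering may differ:

```python
import cv2
import numpy as np

def depth_overlay(color_img, depth_m, mask, max_range_m=15.0):
    overlay = color_img.copy()
    # Normalize depth to 0-255 green intensity, clipped to the usable range
    green = np.clip(depth_m / max_range_m, 0.0, 1.0) * 255.0
    overlay[mask == 1, 1] = green[mask == 1].astype(np.uint8)
    # Blend the tinted regions back onto the original image
    return cv2.addWeighted(color_img, 0.5, overlay, 0.5, 0.0)
```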
Interactive Depth Viewer:
Matplotlib-based interactive viewer allows hovering over depth images to inspect exact depth values at each pixel.
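A minimal matplotlib sketch with the same hover behavior; this is not the module's actual viewer, just one way to get per-pixel readouts:

```python
import matplotlib.pyplot as plt

def show_depth(depth_m):
    fig, ax = plt.subplots()
    ax.imshow(depth_m, cmap='viridis')

    def format_coord(x, y):
        u, v = int(x + 0.5), int(y + 0.5)
        if 0 <= v < depth_m.shape[0] and 0 <= u < depth_m.shape[1]:
            return f'u={u}, v={v}, depth={depth_m[v, u]:.3f} m'
        return f'u={u}, v={v}'

    # Matplotlib renders this string in the status bar on hover
    ax.format_coord = format_coord
    plt.show()
```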
Usage Example
```python
import rclpy
from camera_only.dual_camera import DualCamera

rclpy.init()
node = DualCamera()  # Starts both camera pipelines
rclpy.spin(node)     # Process callbacks until shutdown
node.destroy_node()
rclpy.shutdown()
```
Performance Considerations
Processing Rate: Limited by YOLO inference time (~30-60 FPS depending on hardware)
Latency: Includes image acquisition, YOLO inference, and depth processing
Accuracy: Dependent on depth quality, lighting conditions, and cone visibility
Range: Effective up to ~15 meters depending on cone size and lighting
Advantages of Dual Camera System
Increased FOV Coverage:
Single camera FOV: ~69° horizontal (D435)
Dual camera coverage: ~120-140° effective horizontal coverage
Eliminates blind spots on track edges
Improved Reliability:
Redundancy if one camera fails or is occluded
Better cone detection at different lateral positions
Complementary viewpoints reduce false negatives
Challenges:
Increased computational load (2x processing)
Camera synchronization and calibration requirements
More complex system integration