Camera Only

Overview

The Camera Only perception system provides vision-based cone detection using YOLO object detection models combined with Intel RealSense depth cameras. This module processes RGB-D (color + depth) images to detect cones and determine their 3D positions relative to the vehicle.

Historical Note: The system originally used a single camera setup, but the field of view (FOV) was not large enough to adequately cover the track width and detect cones on both sides. The dual camera configuration significantly improves lateral coverage and detection reliability.

System Architecture

The dual camera system uses two Intel RealSense depth cameras:

  • Left Camera (D435i): Positioned on the left side of the vehicle, mounted upside down

  • Right Camera (D435): Positioned on the right side of the vehicle

Each camera independently:

  1. Captures RGB color images

  2. Captures aligned depth images

  3. Runs YOLO-based cone detection on color images

  4. Uses depth data to compute 3D cone positions

  5. Publishes detected cones with color classification

Dual Camera Node

The DualCamera class manages both cameras, running the same detection pipeline independently on each camera's paired color and depth images.

Key Features

  • Independent Processing: Each camera processes its data stream independently

  • YOLO Segmentation: Uses a YOLO model with instance segmentation for precise cone detection

  • Depth Integration: Combines segmentation masks with depth data for 3D localization

  • Color Classification: Detects and classifies blue and yellow cones

  • Confidence Filtering: Only publishes detections above a configurable confidence threshold

  • Visualization Support: Optional depth overlay and visualization outputs

ROS Topics

Subscribed Topics:

Topic                                                    Type               Description
/camera/realsense_imu/color/image_raw                    sensor_msgs/Image  Left camera RGB image
/camera/realsense_imu/aligned_depth_to_color/image_raw   sensor_msgs/Image  Left camera depth image aligned to RGB
/camera/realsense2/color/image_raw                       sensor_msgs/Image  Right camera RGB image
/camera/realsense2/aligned_depth_to_color/image_raw      sensor_msgs/Image  Right camera depth image aligned to RGB

Published Topics:

Topic                   Type            Description
/realsense/d435i/cones  ConesCartesian  Left camera cone detections
/realsense/d435/cones   ConesCartesian  Right camera cone detections
/perception_cones_viz   PointCloud      Visualization of detected cones
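
For reference, a minimal sketch of how these subscriptions might be wired in rclpy (the class name and queue depth are illustrative; the callback names follow the API reference below):

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image

class DualCameraWiring(Node):
    """Subscription wiring only; the real DualCamera also runs YOLO and publishes cones."""

    def __init__(self):
        super().__init__('dual_camera_wiring')
        self.create_subscription(Image, '/camera/realsense_imu/color/image_raw',
                                 self.realsense_left_callback, 10)
        self.create_subscription(Image, '/camera/realsense_imu/aligned_depth_to_color/image_raw',
                                 self.realsense_left_depth_callback, 10)
        self.create_subscription(Image, '/camera/realsense2/color/image_raw',
                                 self.realsense_right_callback, 10)
        self.create_subscription(Image, '/camera/realsense2/aligned_depth_to_color/image_raw',
                                 self.realsense_right_depth_callback, 10)

    def realsense_left_callback(self, msg): ...         # cache color; process when depth arrives
    def realsense_left_depth_callback(self, msg): ...
    def realsense_right_callback(self, msg): ...
    def realsense_right_depth_callback(self, msg): ...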

Processing Pipeline

For each camera, the processing pipeline follows these steps:

  1. Image Reception

    Color and depth images are received via ROS subscriptions. Processing only occurs when both images are available.

  2. YOLO Detection

    The color image is passed through a YOLO model to detect cone instances with segmentation masks.

  3. Depth Extraction

    # Requires: mask (HxW, 1 inside the cone), depth_image (HxW, in meters),
    # and inv_K, the inverse of the camera intrinsic matrix K (see Camera Calibration).
    import numpy as np

    # 1. Extract pixel coordinates and depth values within the mask
    v, u = np.nonzero(mask)              # row (v) and column (u) indices of mask pixels
    depths = depth_image[v, u]

    # 2. Back-project pixels to 3D camera coordinates
    pixels = np.vstack((u, v, np.ones_like(u)))
    camera_coords = (inv_K @ pixels) * depths

    # 3. Compute a robust position as the per-axis median
    cone_position = np.median(camera_coords, axis=1)
    
  4. Coordinate Transform

    Camera coordinates are converted to the vehicle frame (see the sketch following this list):

    • X (forward): Camera Z-axis

    • Y (left): Camera X-axis

    • Z (up): Camera Y-axis (not published)

  5. Color Classification

    YOLO class outputs are mapped to FEB color codes (see the sketch following this list):

    • Blue cones (class 8) → color = 2

    • Yellow cones (class 1) → color = 1

    • Other cones → color = 0

  6. Publishing

    Cone detections are published as ConesCartesian messages with timestamp and evaluation time.
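
Steps 4 and 5 reduce to a fixed axis remapping plus a small lookup table. A hedged sketch (helper names are illustrative, not the actual implementation):

def camera_to_vehicle(x_cam, y_cam, z_cam):
    """Axis remapping from step 4: camera Z -> forward (X), camera X -> left (Y)."""
    return z_cam, x_cam  # the up component (camera Y) is not published

# Step 5: FEB color codes keyed by YOLO class ID (IDs depend on the trained model)
YOLO_CLASS_TO_FEB_COLOR = {8: 2,   # blue cone
                           1: 1}   # yellow cone

def feb_color(yolo_class):
    return YOLO_CLASS_TO_FEB_COLOR.get(int(yolo_class), 0)  # other cones -> 0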

Configuration Parameters

From CameraOnlySettings:

Parameter                Description
yolo_minimum_confidence  Minimum YOLO confidence threshold for detections (e.g., 0.5)
publish_visual           Enable/disable visualization publishing
dual_camera              Enable dual camera mode (vs. single camera)

From EarlyStageFusionSettings:

Parameter        Description
left_intrinsic   3x3 camera intrinsic matrix for the left camera
right_intrinsic  3x3 camera intrinsic matrix for the right camera
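
For illustration only (the actual settings classes may differ in structure and defaults), these parameters could be modeled as dataclasses:

from dataclasses import dataclass, field
import numpy as np

@dataclass
class CameraOnlySettingsSketch:
    yolo_minimum_confidence: float = 0.5  # detections below this are discarded
    publish_visual: bool = False          # toggle visualization topics
    dual_camera: bool = True              # process both cameras instead of one

@dataclass
class EarlyStageFusionSettingsSketch:
    left_intrinsic: np.ndarray = field(default_factory=lambda: np.eye(3))   # 3x3 K, left camera
    right_intrinsic: np.ndarray = field(default_factory=lambda: np.eye(3))  # 3x3 K, right camera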

Camera Calibration

Both cameras require intrinsic calibration matrices for accurate 3D reconstruction. The intrinsic matrix K relates pixel coordinates to camera rays:

\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} X/Z \\ Y/Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\]

Where:

  • (fx, fy): Focal lengths in pixels

  • (cx, cy): Principal point (optical center)
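
Inverting K recovers the back-projection used in step 3 of the pipeline. A quick numeric sketch (the intrinsic values are placeholders, not a real calibration):

import numpy as np

fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
inv_K = np.linalg.inv(K)

# Back-project pixel (u, v) at depth Z (meters): [X, Y, Z] = Z * K^-1 [u, v, 1]
u, v, Z = 400.0, 250.0, 3.2
X, Y, _ = (inv_K @ np.array([u, v, 1.0])) * Z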

Image Processing Notes

Depth Units:

RealSense depth images are provided in millimeters and must be converted to meters:

depth_image = depth_image.astype(np.float32) / 1000.0

Image Flipping:

The left camera images are flipped 180 degrees due to physical mounting orientation:

img = cv2.flip(img, -1)  # Flip both axes
depth_image = cv2.flip(depth_image, -1)

Depth Alignment:

Depth images are pre-aligned to the color image frame by the RealSense driver, ensuring pixel correspondence.
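
In a callback, the aligned pair can be converted to NumPy arrays with cv_bridge; because the driver aligns them, the two arrays share the same pixel grid (a sketch, assuming 16-bit depth in millimeters):

from cv_bridge import CvBridge

bridge = CvBridge()
color = bridge.imgmsg_to_cv2(color_msg, desired_encoding='bgr8')
depth = bridge.imgmsg_to_cv2(depth_msg, desired_encoding='passthrough')  # uint16, mm
assert depth.shape[:2] == color.shape[:2]  # aligned: per-pixel correspondence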

Important

To enable proper depth-to-color alignment, you must launch the cameras using the perception.py or camera.py launch files from the sn4_bringup package. These launch files configure the RealSense driver with the align_depth.enable: True parameter, which ensures the depth image is spatially aligned to match the color image frame. Without this alignment, depth values will not correspond to the correct pixel locations in the RGB image.

# Correct: Launches cameras with alignment enabled
ros2 launch sn4_bringup camera.py

# Or launch full perception system
ros2 launch sn4_bringup perception.py
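
If you launch the stock RealSense driver directly instead, the equivalent flag can be passed on the command line (assuming the standard realsense2_camera package and its rs_launch.py):

# Alternative: enable alignment when launching the driver directly
ros2 launch realsense2_camera rs_launch.py align_depth.enable:=true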

Visualization and Debugging

The module includes several visualization tools:

Depth Overlay:

Shows segmented cone regions overlaid on the original image with color-coded depth information (green intensity indicates depth).
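
A minimal sketch of how such an overlay could be produced with OpenCV (the actual visualize_depth_overlay may differ; max_depth_m is an illustrative normalization constant):

import cv2
import numpy as np

def depth_overlay_sketch(img, depth_image, mask, max_depth_m=15.0):
    """Tint masked pixels green, with brightness proportional to depth."""
    overlay = img.copy()
    norm = np.clip(depth_image / max_depth_m, 0.0, 1.0)              # depth -> [0, 1]
    overlay[mask == 1, 1] = (norm[mask == 1] * 255).astype(np.uint8)  # green channel
    return cv2.addWeighted(img, 0.5, overlay, 0.5, 0)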

Interactive Depth Viewer:

Matplotlib-based interactive viewer allows hovering over depth images to inspect exact depth values at each pixel.
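
One way to build such a viewer with Matplotlib's format_coord hook (a sketch; display_depth's actual implementation may differ):

import matplotlib.pyplot as plt

def display_depth_sketch(depth_image):
    """Show a depth image; the status bar reports the value under the cursor."""
    fig, ax = plt.subplots()
    im = ax.imshow(depth_image, cmap='viridis')
    fig.colorbar(im, ax=ax, label='depth (m)')
    h, w = depth_image.shape
    ax.format_coord = lambda x, y: (
        f'u={int(x)}, v={int(y)}, depth={depth_image[int(y), int(x)]:.3f} m'
        if 0 <= int(y) < h and 0 <= int(x) < w else '')
    plt.show()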

Usage Example

import rclpy
from camera_only.dual_camera import DualCamera

rclpy.init()
node = DualCamera()
try:
    rclpy.spin(node)
except KeyboardInterrupt:
    pass
finally:
    node.destroy_node()  # ensure cleanup even when spin is interrupted
    rclpy.shutdown()

Performance Considerations

  • Processing Rate: Bounded by YOLO inference; roughly 30-60 FPS depending on hardware

  • Latency: Includes image acquisition, YOLO inference, and depth processing

  • Accuracy: Dependent on depth quality, lighting conditions, and cone visibility

  • Range: Effective up to ~15 meters depending on cone size and lighting

Advantages of Dual Camera System

Increased FOV Coverage:

  • Single camera FOV: ~69° horizontal (D435)

  • Dual camera coverage: ~120-140° effective horizontal coverage

  • Eliminates blind spots on track edges

Improved Reliability:

  • Redundancy if one camera fails or is occluded

  • Better cone detection at different lateral positions

  • Complementary viewpoints reduce false negatives

Challenges:

  • Increased computational load (2x processing)

  • Camera synchronization and calibration requirements

  • More complex system integration

API Reference

class perception.camera_only.camera_only.dual_camera.DualCamera[source]

Bases: rclpy.node.Node

realsense_right_callback(msg)[source]
realsense_right_depth_callback(msg)[source]
realsense_left_callback(msg)[source]
realsense_left_depth_callback(msg)[source]
process(left)[source]
Parameters

left (bool) – If True, process the left camera's images; otherwise the right camera's.

visualize_depth_overlay(img, depth_image, mask)[source]
display_depth(depth_image)[source]

Display a raw depth image with hover-to-inspect values in meters.

Parameters

depth_image (np.ndarray) – Depth image in mm or meters.

class perception.camera_only.camera_only.model_operations.ModelOperations[source]

Bases: object

predict(img_msg)[source]