Camera Only

Overview

The Camera Only perception system provides vision-based cone detection using YOLO object detection models combined with Intel RealSense depth cameras. This module processes RGB-D (color + depth) images to detect cones and determine their 3D positions relative to the vehicle.

Historical Note: The system originally used a single camera setup, but the field of view (FOV) was not large enough to adequately cover the track width and detect cones on both sides. The dual camera configuration significantly improves lateral coverage and detection reliability.

System Architecture

The dual camera system uses two Intel RealSense depth cameras:

  • Left Camera (D435i): Positioned on the left side of the vehicle, mounted upside down

  • Right Camera (D435): Positioned on the right side of the vehicle

Each camera independently:

  1. Captures RGB color images

  2. Captures aligned depth images

  3. Runs YOLO-based cone detection on color images

  4. Uses depth data to compute 3D cone positions

  5. Publishes detected cones with color classification

Dual Camera Node

The DualCamera class manages both cameras, running the same detection pipeline independently on each camera's paired color and depth images.

Key Features

  • Independent Processing: Each camera processes its data stream independently

  • YOLO Segmentation: Uses a YOLO model with instance segmentation for precise cone detection

  • Depth Integration: Combines segmentation masks with depth data for 3D localization

  • Color Classification: Detects and classifies blue and yellow cones

  • Confidence Filtering: Only publishes detections above a configurable confidence threshold

  • Visualization Support: Optional depth overlay and visualization outputs

ROS Topics

Subscribed Topics:

Topic                                                    Type               Description
/camera/realsense_imu/color/image_raw                    sensor_msgs/Image  Left camera RGB image
/camera/realsense_imu/aligned_depth_to_color/image_raw   sensor_msgs/Image  Left camera depth image aligned to RGB
/camera/realsense2/color/image_raw                       sensor_msgs/Image  Right camera RGB image
/camera/realsense2/aligned_depth_to_color/image_raw      sensor_msgs/Image  Right camera depth image aligned to RGB

Published Topics:

Topic                   Type            Description
/realsense/d435i/cones  ConesCartesian  Left camera cone detections
/realsense/d435/cones   ConesCartesian  Right camera cone detections
/perception_cones_viz   PointCloud      Visualization of detected cones
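
For reference, a minimal sketch of how these subscriptions might be wired in rclpy (the class name and queue depth are illustrative; the callback names follow the API reference below):

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image

class DualCameraWiring(Node):
    """Subscription wiring only; the real DualCamera also runs YOLO and publishes cones."""

    def __init__(self):
        super().__init__('dual_camera_wiring')
        self.create_subscription(Image, '/camera/realsense_imu/color/image_raw',
                                 self.realsense_left_callback, 10)
        self.create_subscription(Image, '/camera/realsense_imu/aligned_depth_to_color/image_raw',
                                 self.realsense_left_depth_callback, 10)
        self.create_subscription(Image, '/camera/realsense2/color/image_raw',
                                 self.realsense_right_callback, 10)
        self.create_subscription(Image, '/camera/realsense2/aligned_depth_to_color/image_raw',
                                 self.realsense_right_depth_callback, 10)

    def realsense_left_callback(self, msg): ...         # cache color; process when depth arrives
    def realsense_left_depth_callback(self, msg): ...
    def realsense_right_callback(self, msg): ...
    def realsense_right_depth_callback(self, msg): ...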

Processing Pipeline

For each camera, the processing pipeline follows these steps:

  1. Image Reception

    Color and depth images are received via ROS subscriptions. Processing only occurs when both images are available.

  2. YOLO Detection

    The color image is passed through a YOLO model to detect cone instances with segmentation masks.

  3. Depth Extraction

    # Requires: mask (HxW, 1 inside the cone), depth_image (HxW, in meters),
    # and inv_K, the inverse of the camera intrinsic matrix K (see Camera Calibration).
    import numpy as np

    # 1. Extract pixel coordinates and depth values within the mask
    v, u = np.nonzero(mask)              # row (v) and column (u) indices of mask pixels
    depths = depth_image[v, u]

    # 2. Back-project pixels to 3D camera coordinates
    pixels = np.vstack((u, v, np.ones_like(u)))
    camera_coords = (inv_K @ pixels) * depths

    # 3. Compute a robust position as the per-axis median
    cone_position = np.median(camera_coords, axis=1)
    
  4. Coordinate Transform

    Camera coordinates are converted to the vehicle frame (see the sketch following this list):

    • X (forward): Camera Z-axis

    • Y (left): Camera X-axis

    • Z (up): Camera Y-axis (not published)

  5. Color Classification

    YOLO class outputs are mapped to FEB color codes (see the sketch following this list):

    • Blue cones (class 8) → color = 2

    • Yellow cones (class 1) → color = 1

    • Other cones → color = 0

  6. Publishing

    Cone detections are published as ConesCartesian messages with timestamp and evaluation time.
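
Steps 4 and 5 reduce to a fixed axis remapping plus a small lookup table. A hedged sketch (helper names are illustrative, not the actual implementation):

def camera_to_vehicle(x_cam, y_cam, z_cam):
    """Axis remapping from step 4: camera Z -> forward (X), camera X -> left (Y)."""
    return z_cam, x_cam  # the up component (camera Y) is not published

# Step 5: FEB color codes keyed by YOLO class ID (IDs depend on the trained model)
YOLO_CLASS_TO_FEB_COLOR = {8: 2,   # blue cone
                           1: 1}   # yellow cone

def feb_color(yolo_class):
    return YOLO_CLASS_TO_FEB_COLOR.get(int(yolo_class), 0)  # other cones -> 0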

Configuration Parameters

From CameraOnlySettings:

Parameter                Description
yolo_minimum_confidence  Minimum YOLO confidence threshold for detections (e.g., 0.5)
publish_visual           Enable/disable visualization publishing
dual_camera              Enable dual camera mode (vs. single camera)

From EarlyStageFusionSettings:

Parameter        Description
left_intrinsic   3x3 camera intrinsic matrix for the left camera
right_intrinsic  3x3 camera intrinsic matrix for the right camera
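
For illustration only (the actual settings classes may differ in structure and defaults), these parameters could be modeled as dataclasses:

from dataclasses import dataclass, field
import numpy as np

@dataclass
class CameraOnlySettingsSketch:
    yolo_minimum_confidence: float = 0.5  # detections below this are discarded
    publish_visual: bool = False          # toggle visualization topics
    dual_camera: bool = True              # process both cameras instead of one

@dataclass
class EarlyStageFusionSettingsSketch:
    left_intrinsic: np.ndarray = field(default_factory=lambda: np.eye(3))   # 3x3 K, left camera
    right_intrinsic: np.ndarray = field(default_factory=lambda: np.eye(3))  # 3x3 K, right camera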

Camera Calibration

Both cameras require intrinsic calibration matrices for accurate 3D reconstruction. The intrinsic matrix K relates pixel coordinates to camera rays:

\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} X/Z \\ Y/Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\]

Where:

  • (fx, fy): Focal lengths in pixels

  • (cx, cy): Principal point (optical center)
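
Inverting K recovers the back-projection used in step 3 of the pipeline. A quick numeric sketch (the intrinsic values are placeholders, not a real calibration):

import numpy as np

fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
inv_K = np.linalg.inv(K)

# Back-project pixel (u, v) at depth Z (meters): [X, Y, Z] = Z * K^-1 [u, v, 1]
u, v, Z = 400.0, 250.0, 3.2
X, Y, _ = (inv_K @ np.array([u, v, 1.0])) * Z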

Image Processing Notes

Depth Units:

RealSense depth images are provided in millimeters and must be converted to meters:

depth_image = depth_image.astype(np.float32) / 1000.0

Image Flipping:

The left camera images are flipped 180 degrees due to physical mounting orientation:

img = cv2.flip(img, -1)  # Flip both axes
depth_image = cv2.flip(depth_image, -1)

Depth Alignment:

Depth images are pre-aligned to the color image frame by the RealSense driver, ensuring pixel correspondence.
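
In a callback, the aligned pair can be converted to NumPy arrays with cv_bridge; because the driver aligns them, the two arrays share the same pixel grid (a sketch, assuming 16-bit depth in millimeters):

from cv_bridge import CvBridge

bridge = CvBridge()
color = bridge.imgmsg_to_cv2(color_msg, desired_encoding='bgr8')
depth = bridge.imgmsg_to_cv2(depth_msg, desired_encoding='passthrough')  # uint16, mm
assert depth.shape[:2] == color.shape[:2]  # aligned: per-pixel correspondence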

Important

To enable proper depth-to-color alignment, you must launch the cameras using the perception.py or camera.py launch files from the sn4_bringup package. These launch files configure the RealSense driver with the align_depth.enable: True parameter, which ensures the depth image is spatially aligned to match the color image frame. Without this alignment, depth values will not correspond to the correct pixel locations in the RGB image.

# Correct: Launches cameras with alignment enabled
ros2 launch sn4_bringup camera.py

# Or launch full perception system
ros2 launch sn4_bringup perception.py
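
If you launch the stock RealSense driver directly instead, the equivalent flag can be passed on the command line (assuming the standard realsense2_camera package and its rs_launch.py):

# Alternative: enable alignment when launching the driver directly
ros2 launch realsense2_camera rs_launch.py align_depth.enable:=true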

Visualization and Debugging

The module includes several visualization tools:

Depth Overlay:

Shows segmented cone regions overlaid on the original image with color-coded depth information (green intensity indicates depth).
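
A minimal sketch of how such an overlay could be produced with OpenCV (the actual visualize_depth_overlay may differ; max_depth_m is an illustrative normalization constant):

import cv2
import numpy as np

def depth_overlay_sketch(img, depth_image, mask, max_depth_m=15.0):
    """Tint masked pixels green, with brightness proportional to depth."""
    overlay = img.copy()
    norm = np.clip(depth_image / max_depth_m, 0.0, 1.0)              # depth -> [0, 1]
    overlay[mask == 1, 1] = (norm[mask == 1] * 255).astype(np.uint8)  # green channel
    return cv2.addWeighted(img, 0.5, overlay, 0.5, 0)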

Interactive Depth Viewer:

Matplotlib-based interactive viewer allows hovering over depth images to inspect exact depth values at each pixel.
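
One way to build such a viewer with Matplotlib's format_coord hook (a sketch; display_depth's actual implementation may differ):

import matplotlib.pyplot as plt

def display_depth_sketch(depth_image):
    """Show a depth image; the status bar reports the value under the cursor."""
    fig, ax = plt.subplots()
    im = ax.imshow(depth_image, cmap='viridis')
    fig.colorbar(im, ax=ax, label='depth (m)')
    h, w = depth_image.shape
    ax.format_coord = lambda x, y: (
        f'u={int(x)}, v={int(y)}, depth={depth_image[int(y), int(x)]:.3f} m'
        if 0 <= int(y) < h and 0 <= int(x) < w else '')
    plt.show()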

Usage Example

import rclpy
from camera_only.dual_camera import DualCamera

rclpy.init()
node = DualCamera()
try:
    rclpy.spin(node)
except KeyboardInterrupt:
    pass
finally:
    node.destroy_node()  # ensure cleanup even when spin is interrupted
    rclpy.shutdown()

Performance Considerations

  • Processing Rate: Bounded by YOLO inference; roughly 30-60 FPS depending on hardware

  • Latency: Includes image acquisition, YOLO inference, and depth processing

  • Accuracy: Dependent on depth quality, lighting conditions, and cone visibility

  • Range: Effective up to ~15 meters depending on cone size and lighting

Advantages of Dual Camera System

Increased FOV Coverage:

  • Single camera FOV: ~69° horizontal (D435)

  • Dual camera coverage: ~120-140° effective horizontal coverage

  • Eliminates blind spots on track edges

Improved Reliability:

  • Redundancy if one camera fails or is occluded

  • Better cone detection at different lateral positions

  • Complementary viewpoints reduce false negatives

Challenges:

  • Increased computational load (2x processing)

  • Camera synchronization and calibration requirements

  • More complex system integration

API Reference

class perception.camera_only.camera_only.dual_camera.DualCamera[source]

Bases: rclpy.node.Node

realsense_right_callback(msg)[source]
realsense_right_depth_callback(msg)[source]
realsense_left_callback(msg)[source]
realsense_left_depth_callback(msg)[source]
process(left)[source]
Parameters

left (bool) – If True, process the left camera's images; otherwise the right camera's.

visualize_depth_overlay(img, depth_image, mask)[source]
display_depth(depth_image)[source]

Display a raw depth image with hover-to-inspect values in meters.

Parameters

depth_image (np.ndarray) – Depth image in mm or meters.

class perception.camera_only.camera_only.model_operations.ModelOperations[source]

Bases: object

predict(img_msg)[source]