SLAM Architecture Overview

This page provides a detailed comparison of our two GraphSLAM implementations and explains when to use each approach.

Developer Note: The Late Stage Fusion (MWEKF) implementation was developed as an advanced robotics project to maximize sensor utilization and state estimation accuracy. This approach is preferred because it fully leverages the camera’s higher update rate (30 Hz vs. LiDAR’s 10 Hz), incorporates MPC control outputs directly into the dynamics model, and provides a flexible framework for adding future sensors (GPS, steering encoders, etc.) to the Extended Kalman Filter.

Fusion Strategy Comparison

Early Stage Fusion vs Late Stage Fusion

| Aspect           | Early Stage Fusion                     | Late Stage Fusion (MWEKF)         |
|------------------|----------------------------------------|-----------------------------------|
| Input            | Single fused cone message              | Separate camera & LiDAR messages  |
| Update Rate      | Limited by slower sensor (LiDAR)       | Full rate of both sensors         |
| Architecture     | GraphSLAM only                         | MWEKF + GraphSLAM                 |
| State Estimation | IMU + wheelspeeds between updates      | Continuous EKF with MPC dynamics  |
| Complexity       | Simpler, fewer components              | More complex, two-stage filtering |
| Latency          | Moderate (depends on solve frequency)  | Low (EKF runs at sensor rates)    |

When to Use Each Approach

Early Stage Fusion is recommended when:

  • Simpler system architecture is preferred

  • Computational resources are limited

  • Perception module already provides well-fused observations

  • Lower maintenance overhead is desired

  • Real-time performance is less critical

Late Stage Fusion (MWEKF) is recommended when:

  • Maximum sensor utilization is required

  • High-frequency state estimates are needed (for fast control loops)

  • MPC controller is available and providing control inputs

  • Camera and LiDAR operate at different rates

  • More accurate state estimation is critical

  • Computational resources are available

Preferred Approach: Late Stage Fusion (MWEKF) is generally preferred as it fully utilizes both sensor update rates instead of downsampling to match the slower sensor.

GraphSLAM Backend

Both implementations share the same core GraphSLAM optimization backend, which represents the SLAM problem as a factor graph:

State Variables:

  • Robot poses: \(\mathbf{x}_t = [x_t, y_t]\) at each timestep \(t\)

  • Landmark positions: \(\mathbf{l}_i = [x_i, y_i]\) for each cone \(i\)

Factor Graph Constraints:

  1. Odometry factors: \(\mathbf{x}_t - \mathbf{x}_{t-1} = \Delta\mathbf{x}_t\)

    • Weight: dx_weight (default: 2.0)

    • Represents relative motion between timesteps

  2. Observation factors: \(\mathbf{l}_i - \mathbf{x}_t = \mathbf{z}_{i,t}\)

    • Weight: z_weight (default: 1.0)

    • Represents observed cone position relative to robot

Optimization:

The system is solved as a sparse least squares problem:

\[\min_{\mathbf{x}, \mathbf{l}} \|A[\mathbf{x}; \mathbf{l}] - \mathbf{b}\|^2\]

Where:

  • \(A\) is a sparse constraint matrix (built in LIL format, converted to CSR for solving)

  • \(\mathbf{b}\) contains the constraint values

  • Solution uses scipy.sparse.linalg.spsolve with normal equations: \(A^T A \mathbf{x} = A^T \mathbf{b}\)

Performance: Typical solve time is ~630 microseconds with optimized sparse matrix operations.
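
The following is a minimal Python sketch of this build-and-solve step, using the dx_weight/z_weight defaults and the LIL-to-CSR conversion described above. The factor layout and function signature are illustrative, not the project's actual code; unlike the interleaved ordering shown in the next section, this sketch stores all poses before all landmarks in the state vector.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def solve_graph(odometry, observations, dx_weight=2.0, z_weight=1.0):
    """odometry: list of (dx, dy) between consecutive poses.
    observations: list of (pose_index, landmark_index, zx, zy)."""
    n_poses = len(odometry) + 1
    n_landmarks = 1 + max(obs[1] for obs in observations)
    n_rows = 2 * (1 + len(odometry) + len(observations))  # prior + factors
    A = lil_matrix((n_rows, 2 * (n_poses + n_landmarks)))
    b = np.zeros(n_rows)

    # Anchor the first pose at the origin (rows 1-2 of the examples below).
    A[0, 0] = A[1, 1] = dx_weight
    row = 2

    # Odometry factors: x_t - x_{t-1} = dx_t, weighted by dx_weight.
    for t, step in enumerate(odometry):
        for k in range(2):
            A[row, 2 * t + k] = -dx_weight
            A[row, 2 * (t + 1) + k] = dx_weight
            b[row] = dx_weight * step[k]
            row += 1

    # Observation factors: l_i - x_t = z_{i,t}, weighted by z_weight.
    for t, i, zx, zy in observations:
        for k, z in enumerate((zx, zy)):
            A[row, 2 * t + k] = -z_weight
            A[row, 2 * (n_poses + i) + k] = z_weight
            b[row] = z_weight * z
            row += 1

    # Normal equations A^T A x = A^T b, solved in CSR form.
    A = A.tocsr()
    return spsolve(A.T @ A, A.T @ b)
```

For the two-pose example worked through below, `solve_graph([(0, 2)], [(0, 0, -2, 3), (1, 0, -2, 1)])` returns `[0, 0, 0, 2, -2, 3]`: both poses, then the cone at (-2, 3).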

Matrix Accumulation Visualization

The GraphSLAM algorithm builds a sparse constraint matrix incrementally as the vehicle moves and observes landmarks. Here’s how the system matrix grows:

Initial Conditions and First Observation:

Starting from origin \(\mathbf{P}_0 = (0,0)\) with a cone observed at \((-2, 3)\) relative to the vehicle:

\[\begin{split}\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{bmatrix}}_{\text{Constraint Matrix } A} \underbrace{\begin{bmatrix} P_{x_0} \\ P_{y_0} \\ l_{1_x} \\ l_{1_y} \end{bmatrix}}_{\text{State Vector}} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ -2 \\ 3 \end{bmatrix}}_{\text{Measurements}}\end{split}\]

After Vehicle Motion:

Vehicle moves to \((0, 2)\) and observes the same cone from a new pose:

\[\begin{split}\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} P_{x_0} \\ P_{y_0} \\ l_{1_x} \\ l_{1_y} \\ P_{x_1} \\ P_{y_1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -2 \\ 3 \\ 0 \\ 2 \\ -2 \\ 1 \end{bmatrix}\end{split}\]

Where:

  • Rows 1-2: Initial pose constraint \(P_{x_0} = 0, P_{y_0} = 0\)

  • Rows 3-4: First landmark observation \(l_1 - P_0 = (-2, 3)\)

  • Rows 5-6: Odometry constraint \(P_1 - P_0 = (0, 2)\)

  • Rows 7-8: Second observation of same landmark \(l_1 - P_1 = (-2, 1)\)
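
As a sanity check, the 8×6 system above can be reproduced and solved densely with NumPy (the real solver stays sparse); this snippet is purely illustrative:

```python
import numpy as np

A = np.array([
    [ 1,  0, 0, 0,  0,  0],   # P_x0 = 0
    [ 0,  1, 0, 0,  0,  0],   # P_y0 = 0
    [-1,  0, 1, 0,  0,  0],   # l_1x - P_x0 = -2
    [ 0, -1, 0, 1,  0,  0],   # l_1y - P_y0 =  3
    [-1,  0, 0, 0,  1,  0],   # P_x1 - P_x0 =  0
    [ 0, -1, 0, 0,  0,  1],   # P_y1 - P_y0 =  2
    [ 0,  0, 1, 0, -1,  0],   # l_1x - P_x1 = -2
    [ 0,  0, 0, 1,  0, -1],   # l_1y - P_y1 =  1
], dtype=float)
b = np.array([0, 0, -2, 3, 0, 2, -2, 1], dtype=float)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # -> [0, 0, -2, 3, 0, 2]: both observations agree on l_1 = (-2, 3)
```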

General Structure:

As the vehicle continues to move and observe landmarks, the matrix grows in a specific pattern:

\[\begin{split}\begin{bmatrix} P_w & 0 & 0 & 0 & 0 & 0 & \cdots \\ 0 & P_w & 0 & 0 & 0 & 0 & \cdots \\ -P_w & 0 & L_w & 0 & 0 & 0 & \cdots \\ 0 & -P_w & 0 & L_w & 0 & 0 & \cdots \\ -P_w & 0 & 0 & 0 & P_w & 0 & \cdots \\ 0 & -P_w & 0 & 0 & 0 & P_w & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} P_{x_0} \\ P_{y_0} \\ l_{1_x} \\ l_{1_y} \\ P_{x_1} \\ P_{y_1} \\ \vdots \end{bmatrix}\end{split}\]

Where:

  • \(P_w\) = Position weight (dx_weight, default: 2.0)

  • \(L_w\) = Landmark observation weight (z_weight, default: 1.0)

Sparsity Pattern:

The constraint matrix is highly sparse because:

  1. Each pose only connects to adjacent poses (odometry)

  2. Each landmark only connects to poses where it was observed

  3. Most matrix entries remain zero, enabling efficient sparse solvers

Key Properties:

  • Block structure: Each pose and landmark occupies two state-vector entries (x and y), and each constraint contributes two rows to \(A\)

  • Incremental growth: Matrix expands as vehicle explores new areas

  • Loop closure: Re-observing landmarks creates additional constraints, improving accuracy

  • Sparse solution: Solved using \(A^T A x = A^T b\) with CSR format for efficiency

Data Association

Both implementations support two data association strategies:

Nearest Neighbor (Default)

  • Matches new observations to existing landmarks based on Euclidean distance

  • New landmark created if distance exceeds max_landmark_distance

  • Fast and works well when odometry is reasonably accurate

  • Used by default in early stage fusion
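
A minimal sketch of this matching rule, assuming observations and landmarks are (x, y) points in the map frame; the max_landmark_distance default shown is illustrative, the real value comes from configuration:

```python
import numpy as np

def associate(observations, landmarks, max_landmark_distance=1.0):
    """Return (matches, new_landmarks); matches maps obs index -> landmark index."""
    matches, new_landmarks = {}, []
    for j, obs in enumerate(observations):
        if len(landmarks) == 0:
            new_landmarks.append(obs)
            continue
        dists = np.linalg.norm(np.asarray(landmarks) - obs, axis=1)
        i = int(np.argmin(dists))
        if dists[i] <= max_landmark_distance:
            matches[j] = i             # existing landmark re-observed
        else:
            new_landmarks.append(obs)  # too far from everything: new cone
    return matches, new_landmarks
```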

ICP Refinement (Optional)

  • Applies Iterative Closest Point algorithm before nearest neighbor matching

  • Optimizes for translation and rotation to align new observations with existing map

  • Cost function clips per-point distances at dclip to ignore outliers (see the sketch after this list)

  • Helps with loop closure and odometry drift correction

  • Currently optional in both implementations (can be enabled in configuration)
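
A compact sketch of the idea, using scipy.optimize as a stand-in for whatever optimizer the implementation actually uses; the dclip default is illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

def refine_alignment(observations, landmarks, dclip=1.0):
    """Find (tx, ty, theta) aligning new observations with the existing map."""
    tree = cKDTree(landmarks)  # nearest map point for each observation

    def cost(params):
        tx, ty, theta = params
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        transformed = observations @ R.T + np.array([tx, ty])
        d, _ = tree.query(transformed)
        return np.sum(np.minimum(d, dclip) ** 2)  # clipped distances ignore outliers

    result = minimize(cost, x0=np.zeros(3), method="Nelder-Mead")
    return result.x  # apply before nearest-neighbor matching
```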

Coordinate Frame Transforms

The system handles multiple coordinate frames:

  1. Sensor Frames: Camera and LiDAR have their own coordinate systems

  2. Vehicle Frame: Center of Gravity (CoG) of the car

  3. Map Frame: Global coordinate system for SLAM

Important Transforms:

  • LiDAR to CoG: Forward offset of -0.9652 m (configured in early stage fusion)

  • Camera frames: Handled via rotate_and_translate_cones() in late stage fusion

  • All cone observations are transformed to the vehicle CoG before being added to the graph
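
A sketch of the sensor-to-CoG transform, using the LiDAR offset above as an example; the real rotate_and_translate_cones() signature and frame conventions may differ:

```python
import numpy as np

LIDAR_TO_COG = np.array([-0.9652, 0.0])  # forward offset from the config above (m)

def sensor_to_cog(cones, mount_yaw=0.0, offset=LIDAR_TO_COG):
    """cones: (N, 2) array in the sensor frame -> (N, 2) array in the CoG frame."""
    c, s = np.cos(mount_yaw), np.sin(mount_yaw)
    R = np.array([[c, -s], [s, c]])  # fixed sensor-to-vehicle mounting rotation
    return cones @ R.T + offset
```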

Lap Detection

Both implementations include automatic lap counting functionality:

Orange Cone Gate Detection:

  1. System identifies large orange cones (color code 0)

  2. Computes vertical gate boundaries when sufficient cones detected

  3. Monitors the vehicle trajectory for gate crossings using a line segment intersection test (sketched after this list)

  4. Increments the lap counter when a crossing is detected (with a minimum 20-30 second cooldown)
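
The gate-crossing check in step 3 reduces to a standard segment-intersection test between the pose-to-pose trajectory segment and the gate segment; here is a sketch with our own variable names:

```python
def _ccw(a, b, c):
    """Signed area of triangle (a, b, c): sign gives orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, g1, g2):
    """True if trajectory segment p1-p2 properly crosses gate segment g1-g2."""
    d1, d2 = _ccw(g1, g2, p1), _ccw(g1, g2, p2)
    d3, d4 = _ccw(p1, p2, g1), _ccw(p1, p2, g2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)  # endpoints on opposite sides, both ways
```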

Behavior Changes:

  • After First Lap: System publishes global map containing all discovered cones

  • Lap Count: Included in published /slam/state message for race management

  • Race End: Can trigger end-of-race signal after configured number of laps

State Message Format

The published state message (feb_msgs/State) contains:

State:
  x: float          # Global x position (meters)
  y: float          # Global y position (meters)
  velocity: float   # Forward velocity (m/s)
  heading: float    # Yaw angle (radians)
  lap_count: int    # Current lap number

This state is used by:

  • Local path planner for trajectory generation

  • Global path planner for long-term planning

  • Control module for feedback

  • Visualization and logging systems
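
A hypothetical consumer of this message, assuming a ROS 2/rclpy setup; feb_msgs/State and the /slam/state topic name are taken from this page:

```python
import rclpy
from rclpy.node import Node
from feb_msgs.msg import State

class StateListener(Node):
    def __init__(self):
        super().__init__("state_listener")
        # Subscribe to the SLAM state published at sensor/solve rate.
        self.create_subscription(State, "/slam/state", self.on_state, 10)

    def on_state(self, msg: State):
        # Feed pose, velocity, and lap count into planning or control here.
        self.get_logger().info(
            f"pose=({msg.x:.2f}, {msg.y:.2f}) v={msg.velocity:.2f} "
            f"yaw={msg.heading:.2f} lap={msg.lap_count}"
        )

def main():
    rclpy.init()
    rclpy.spin(StateListener())

if __name__ == "__main__":
    main()
```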