Motion Aware Perception

MAP (Motion Aware Perception) is a vision-based semantic segmentation system designed for autonomous driving and driver-assistance applications. The system processes road scene imagery captured from vehicle-mounted cameras and classifies each pixel into classes such as road surface, lane markings, vehicles, and pedestrians. It relies on deep learning architectures, primarily U-Net with skip connections, alongside a pretrained YOLOv26 model. A 12-week pipeline covers data acquisition from real-world and public datasets (Cityscapes), model training with Dice loss optimization, augmentation for robustness across diverse driving conditions, and quantization for low-latency edge deployment. Post-segmentation, extracted lane boundary data feeds into a real-time lane assistance module capable of detecting vehicle offset and triggering corrective alerts. The system demonstrates accurate scene understanding and lays groundwork for future integration with steering control, sensor fusion, and embedded automotive deployment.

GOOGLE MEET LINK: https://meet.google.com/gco-bxyo-usm

Mentees:

Raktim Phukan

Damien Joseph Pereira

Saicharanreddy

Sagar

Shankar Rajshekar D

Anirudh Pranesh

Jashin Bhattarai

Introduction

Modern autonomous and driver-assistance systems depend critically on a vehicle's ability to interpret its surroundings in real time. Semantic segmentation, the task of assigning a class label to every pixel in an image, provides a dense, structured understanding of the scene that downstream modules can directly act upon.

Using images from vehicle-mounted cameras, the system distinguishes road surfaces, lane markings, vehicles, pedestrians, and other classes at pixel-level precision. The core model is U-Net, an encoder-decoder architecture originally developed for biomedical imaging but highly effective in road scene parsing due to its skip connections, which preserve spatial detail lost during downsampling.

The project also explores U-Net++ variants and benchmarks them against YOLOv26 instance segmentation to evaluate speed-accuracy trade-offs. Beyond segmentation, MAP extends into practical driver assistance. Extracted lane boundary data is used to compute vehicle offset from the lane centre, enabling alert generation when deviation exceeds a safe threshold.

Objectives

The primary objective is to design and train a semantic segmentation model that accurately classifies road scene imagery into predefined classes across diverse driving conditions. This includes building a clean data pipeline, implementing U-Net from scratch in PyTorch, optimising with appropriate loss functions (cross-entropy and Dice loss), and evaluating performance using IoU and Dice score metrics.
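The two evaluation metrics named above can be illustrated with a short NumPy sketch. It is a minimal example rather than the project's evaluation code: the function names are hypothetical, and the ignore label of 255 is an assumption borrowed from the common Cityscapes training convention.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a (num_classes x num_classes) matrix; rows are ground truth, columns are predictions."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index  # drop pixels with no valid label (assumed convention)
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_and_dice(cm):
    """Per-class IoU and Dice from a confusion matrix; NaN marks classes that never occur."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as the class, but predicted otherwise
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice
```

Accumulating one confusion matrix over the whole validation set and computing the scores once at the end avoids averaging per-image scores, which can be dominated by images where a class barely appears. Mean IoU is then `np.nanmean(iou)`.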

The secondary objective is to extract lane boundaries from the predicted mask, compute lateral vehicle offset, and generate alerts when the vehicle drifts beyond a defined safe corridor or when another vehicle is in its path.
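The offset-and-alert part of this objective can be sketched in a few lines of plain Python. The sketch assumes a forward-facing camera mounted on the vehicle's centreline, so the image centre column stands in for the vehicle position. The function names and the threshold of 25% of the lane width are hypothetical, and the check for another vehicle in the path is not covered here.

```python
def lateral_offset_px(left_x, right_x, image_width):
    """Signed offset of the vehicle from the lane centre, in pixels.

    left_x and right_x are the lane boundary columns at a chosen image row.
    A positive result means the vehicle sits to the right of the lane centre.
    """
    lane_centre = (left_x + right_x) / 2.0
    vehicle_centre = image_width / 2.0  # assumption: camera is on the vehicle centreline
    return vehicle_centre - lane_centre

def should_alert(offset_px, left_x, right_x, max_fraction=0.25):
    """Return True when |offset| exceeds max_fraction of the lane width (hypothetical threshold)."""
    lane_width = right_x - left_x
    if lane_width <= 0:
        raise ValueError("right boundary must lie to the right of the left boundary")
    return abs(offset_px) > max_fraction * lane_width
```

Expressing the threshold as a fraction of the measured lane width keeps the rule independent of image resolution. Converting pixels to metres would require camera calibration, which is outside this sketch.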

System Architecture & Methodology

The pipeline is structured across five stages.

Data Acquisition

Road images are sourced from the Cityscapes dataset, which provides high-quality urban driving scenes with 19-class pixel-level annotations. Images span varied conditions, which helps the model generalise beyond a single environment. Data is split into training, validation, and hold-out test sets.

Preprocessing & Augmentation

Images are normalised and resized to a fixed resolution compatible with the U-Net input. To reduce overfitting and improve robustness under real-world driving conditions, augmentation is applied to the training data.
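A minimal sketch of this step is shown below, under several assumptions: the 256x512 target size, the ImageNet channel statistics, and the horizontal flip are illustrative choices, not settings taken from the project. The point it illustrates is that any geometric transform must be applied identically to the image and its mask, and that the mask must be resampled with nearest-neighbour indexing so class ids are never blended.

```python
import numpy as np

# ImageNet channel statistics, a common default (assumed; the project's values are not stated).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize for an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows[:, None], cols]

def preprocess(image, mask, out_h=256, out_w=512, rng=None, flip_prob=0.5):
    """Resize, optionally flip, and normalise one uint8 RGB image with its label mask."""
    image = resize_nearest(image, out_h, out_w)
    mask = resize_nearest(mask, out_h, out_w)
    # Apply the same horizontal flip to both arrays so labels stay aligned with pixels.
    if rng is not None and rng.random() < flip_prob:
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    image = (image.astype(np.float32) / 255.0 - MEAN) / STD
    chw = np.ascontiguousarray(image.transpose(2, 0, 1))  # channels-first for PyTorch
    return chw, np.ascontiguousarray(mask.astype(np.int64))
```

Passing `rng=None` disables augmentation, which is the usual setting for validation and test data. In practice the image would more likely be resized with bilinear interpolation (for example through OpenCV) while the mask keeps nearest-neighbour; the sketch uses one resize routine for both only to stay dependency-free.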

Model: U-Net

The encoder progressively downsamples the input via convolutional blocks followed by max pooling, producing increasingly abstract feature maps. The decoder mirrors this with transposed convolutions for upsampling, and crucially, skip connections concatenate encoder feature maps directly into corresponding decoder layers.
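The encoder, decoder, and skip connections described above can be sketched as a small two-level U-Net in PyTorch. This is an illustrative reduction, not the project's network: the class name `TinyUNet`, the depth, the base width of 16 channels, and the use of batch normalisation are all assumptions made to keep the example short.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net sketch; input height and width must be divisible by 4."""

    def __init__(self, in_ch=3, num_classes=19, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        # Transposed convolutions double the spatial size on the way back up.
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)  # input = upsampled + skip channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, num_classes, kernel_size=1)  # per-pixel class logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections: concatenate encoder features with the upsampled decoder features.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)
```

Each decoder block takes twice as many input channels as the upsampling layer produces, because the matching encoder feature map is concatenated along the channel axis. The output has the same height and width as the input, with one logit channel per class.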

Training

The model is trained end-to-end using a combined cross-entropy and Dice loss, which balances pixel-wise classification accuracy with overlap quality. The Adam optimiser with a learning rate scheduler is used.
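One way to combine the two loss terms is sketched below. The equal weighting, the ignore label of 255, the smoothing constant, and the choice of `ReduceLROnPlateau` as the scheduler are all assumptions; the text above specifies only that cross-entropy and Dice are combined and that Adam is used with some learning rate scheduler. The names `CEDiceLoss` and `make_optimiser` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEDiceLoss(nn.Module):
    """Weighted sum of cross-entropy and soft Dice loss for multi-class segmentation."""

    def __init__(self, num_classes, dice_weight=0.5, ignore_index=255, eps=1e-6):
        super().__init__()
        self.num_classes = num_classes
        self.dice_weight = dice_weight
        self.ignore_index = ignore_index
        self.eps = eps

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) integer class ids.
        ce = F.cross_entropy(logits, target, ignore_index=self.ignore_index)
        valid = target != self.ignore_index
        safe_target = torch.where(valid, target, torch.zeros_like(target))
        one_hot = F.one_hot(safe_target, self.num_classes).permute(0, 3, 1, 2).float()
        keep = valid.unsqueeze(1).float()  # zero out ignored pixels in both terms
        probs = logits.softmax(dim=1) * keep
        one_hot = one_hot * keep
        dims = (0, 2, 3)
        intersection = (probs * one_hot).sum(dims)
        denominator = probs.sum(dims) + one_hot.sum(dims)
        dice = (2 * intersection + self.eps) / (denominator + self.eps)
        return (1 - self.dice_weight) * ce + self.dice_weight * (1 - dice.mean())

def make_optimiser(model, lr=1e-3):
    """Adam plus a plateau-based scheduler (the scheduler type is an assumption)."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimiser, mode="min", factor=0.5, patience=3
    )
    return optimiser, scheduler
```

In a training loop, `scheduler.step(val_loss)` would be called once per epoch with the validation loss, so the learning rate is halved only when validation stops improving. Cross-entropy drives per-pixel correctness, while the Dice term counteracts class imbalance, which matters for thin classes such as lane markings.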

Feature Extraction

Post-segmentation, the lane marking class mask is isolated. Lane boundaries are fitted using polynomial curve fitting on the detected pixels.
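The fitting step can be sketched with NumPy as follows. The sketch fits x as a function of y, because lane boundaries run roughly vertically in a forward-facing image and would be poorly described as a function of x. The polynomial degree of 2, the minimum pixel count, and the split of the mask at the image midline into left and right boundaries are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def split_left_right(lane_mask):
    """Split a binary lane mask at the image midline (a simplifying assumption)."""
    mid = lane_mask.shape[1] // 2
    left = lane_mask.copy()
    left[:, mid:] = 0
    right = lane_mask.copy()
    right[:, :mid] = 0
    return left, right

def fit_lane_boundary(lane_mask, degree=2, min_pixels=50):
    """Fit x = f(y) to the nonzero pixels; return None if too few pixels were detected."""
    ys, xs = np.nonzero(lane_mask)
    if ys.size < min_pixels:
        return None
    return np.polyfit(ys, xs, degree)  # coefficients, highest power first

def boundary_x_at(coeffs, y):
    """Evaluate a fitted boundary at image row y."""
    return np.polyval(coeffs, y)
```

Evaluating both fitted boundaries at a row near the bottom of the image gives the left and right lane positions closest to the vehicle. Returning `None` for sparse detections lets the caller skip a frame rather than trust an unstable fit.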

Tools & Technologies

Python serves as the primary programming language. PyTorch is used for model definition, training, and inference. OpenCV handles image I/O, mask visualisation, and lane geometry computation. The Cityscapes dataset provides annotated training data. YOLOv26 is evaluated in the final phase as a comparative baseline for segmentation speed.

Results & Discussion

The trained U-Net demonstrates strong segmentation performance on road scenes, accurately delineating road surfaces, lane markings, vehicles, and pedestrians. Lane boundaries are cleanly extracted from predicted masks and remain stable across straight roads and moderate curves.

Conclusion

MAP successfully demonstrates an end-to-end semantic segmentation pipeline for road scenes, from raw camera input to structured scene understanding and lane-level driver assistance. Both the U-Net architecture and YOLOv26 prove effective at preserving spatial detail critical for lane and obstacle delineation. The project establishes a functional prototype that bridges deep learning perception with real-world driving-assistance logic and sets the stage for deployment on embedded automotive hardware.

Future Scope

Planned extensions include real-time inference optimisation for embedded platforms such as Raspberry Pi, integration with steering control for closed-loop lane centring, multi-sensor fusion combining camera output with LiDAR point clouds and RADAR for adverse weather robustness, and the addition of traffic sign recognition as a higher-level scene understanding module.

Virtual Expo 2026

Abstract