AI-Driven Structural Performance | 3D Computer Vision | Active

NeRF-UAVeL: Unified Attention-driven Volumetric Learning for Enhanced NeRF-based 3D Object Detection

Unified attention-driven volumetric learning framework for state-of-the-art NeRF-based 3D object detection

Key results on the 3D-FRONT dataset (R = recall, AP = average precision; the suffix gives the 3D IoU threshold, 0.25 or 0.50):
R25: 98.5%
R50: 77.2%
AP25: 87.3%
AP50: 66.6%
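R25/R50 and AP25/AP50 conventionally denote recall and average precision at 3D IoU thresholds of 0.25 and 0.50. To make the recall metric concrete, the sketch below computes IoU between axis-aligned 3D boxes and counts the fraction of ground-truth boxes matched by at least one prediction. It is a minimal illustration, not the project's evaluation code: the axis-aligned (x_min, y_min, z_min, x_max, y_max, z_max) box format and the helper names `box_volume`, `iou_3d` and `recall_at` are assumptions, and the actual protocol may use oriented boxes or cap the number of proposals.

```python
def box_volume(box):
    """Volume of an axis-aligned box (x0, y0, z0, x1, y1, z1)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0  # disjoint along this axis
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)


def recall_at(gt_boxes, pred_boxes, thresh):
    """Fraction of ground-truth boxes hit by at least one prediction with IoU >= thresh."""
    if not gt_boxes:
        return 0.0
    hits = sum(1 for g in gt_boxes if any(iou_3d(g, p) >= thresh for p in pred_boxes))
    return hits / len(gt_boxes)


gt = [(0, 0, 0, 2, 2, 2), (5, 5, 5, 6, 6, 6)]
pred = [(0, 0, 0, 2, 2, 1), (5.5, 5, 5, 6.5, 6, 6)]  # IoUs with gt: 0.5 and 1/3
print(recall_at(gt, pred, 0.25))  # 1.0
print(recall_at(gt, pred, 0.50))  # 0.5
```

AP additionally sorts predictions by confidence, assigns each ground-truth box to at most one prediction, and integrates precision over recall; those steps are omitted here.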

Overview

Three-dimensional (3D) object detection based on Neural Radiance Fields (NeRF) has emerged as a promising direction for detecting objects in complex environments reconstructed from posed RGB images. However, existing NeRF-based detectors often suffer from coarse feature encoding and limited attention to multi-scale volumetric structure, which leads to inaccurate localization and poor generalization in real-world scenes. NeRF-UAVeL addresses these challenges with a unified, attention-driven volumetric learning framework for detection that integrates four novel modules into a NeRF-derived 3D volumetric backbone: Multi-dimensional Volumetric Attention Pooling (MVAP), Tri-Scale Asymmetric Convolutional Aggregation (TACA), Dual-Domain Attention Fusion (DDAF), and Volumetric Cross-Window Attention Fusion (V-CWAF). Extensive experiments on the 3D-FRONT and ScanNet datasets show that NeRF-UAVeL outperforms both point cloud-based and multi-view-based methods, improving AP50 by 6.7% and R50 by 7.3% over the baseline on 3D-FRONT, with corresponding gains of 6.9% (AP50) and 2.9% (R50) on ScanNet.


[Figure: 3D object detection visualizations]


Novel Contributions

1. Multi-dimensional Volumetric Attention Pooling (MVAP)

Enhances spatial selectivity through adaptive attention-based pooling, enabling fine-grained volumetric feature representation in NeRF-derived volumes.
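The source does not specify how MVAP computes its attention weights, so the following is only a generic sketch of attention-based pooling on a NeRF-style feature volume of shape (C, D, H, W), written with NumPy. It assumes non-overlapping k×k×k windows and a single learned vector `w` (equivalent to a 1×1×1 convolution) that scores each voxel; the voxels in each window are then averaged with softmax weights. The function name `attention_pool_3d` and all shapes are hypothetical.

```python
import numpy as np


def attention_pool_3d(x, w, k=2):
    """Softmax-weighted pooling of a (C, D, H, W) volume over non-overlapping k^3 windows.

    w is a (C,) vector mapping each voxel's feature to one attention logit
    (equivalent to a 1x1x1 convolution with a single output channel).
    """
    c, d, h, wd = x.shape
    assert d % k == 0 and h % k == 0 and wd % k == 0, "spatial dims must be divisible by k"
    # Split each spatial axis into (blocks, k): (C, D/k, k, H/k, k, W/k, k)
    blocks = x.reshape(c, d // k, k, h // k, k, wd // k, k)
    # Move the within-window axes last and flatten them: (C, D/k, H/k, W/k, k^3)
    blocks = blocks.transpose(0, 1, 3, 5, 2, 4, 6).reshape(c, d // k, h // k, wd // k, k ** 3)
    # One logit per voxel, shared by all channels: (D/k, H/k, W/k, k^3)
    logits = np.einsum("c,cdhwn->dhwn", w, blocks)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax within each window
    # Attention-weighted sum inside each window: (C, D/k, H/k, W/k)
    return np.einsum("dhwn,cdhwn->cdhw", attn, blocks)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16, 16))  # toy feature volume
pooled = attention_pool_3d(feat, rng.standard_normal(8))
print(pooled.shape)  # (8, 8, 8, 8)
```

With `w` set to zero the weights are uniform and the operation reduces to average pooling; large logits push it toward max pooling, which is one way pooling can be made adaptive.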

2. Tri-Scale Asymmetric Convolutional Aggregation (TACA)

Captures volumetric features at three scales through asymmetric convolutional branches, making the detector more robust to structures of different sizes.
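One plausible reading of tri-scale asymmetric convolution, sketched here in NumPy under stated assumptions: each of three branches (kernel sizes 3, 5 and 7, assumed) factorizes a k×k×k depthwise kernel into k×1×1, 1×k×1 and 1×1×k kernels applied in sequence, and the branch outputs are summed with an identity path. The helper names (`depthwise_conv1d_along`, `make_taca_weights`, `tri_scale_asymmetric`), the depthwise design and the residual sum are illustrative choices, not the published TACA module.

```python
import numpy as np


def depthwise_conv1d_along(x, kernels, axis):
    """'Same'-padded depthwise 1D convolution of a (C, D, H, W) volume along one spatial axis.

    kernels has shape (C, k) with odd k; every channel has its own kernel. As in
    deep-learning frameworks, this computes cross-correlation (the kernel is not flipped).
    """
    x = np.asarray(x, dtype=float)
    k = kernels.shape[1]
    pad = k // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width)            # zero padding keeps the output size
    n = x.shape[axis]
    tap_shape = [1] * x.ndim
    tap_shape[0] = -1                    # broadcast one weight per channel
    out = np.zeros_like(x)
    for t in range(k):
        shifted = np.take(xp, np.arange(t, t + n), axis=axis)
        out += kernels[:, t].reshape(tap_shape) * shifted
    return out


def make_taca_weights(channels, scales=(3, 5, 7), seed=0):
    """Hypothetical random init: per scale, one (C, k) kernel for each of the D, H, W axes."""
    rng = np.random.default_rng(seed)
    return {k: [rng.standard_normal((channels, k)) / k for _ in range(3)] for k in scales}


def tri_scale_asymmetric(x, weights):
    """Sum of branches; each chains k x 1 x 1, 1 x k x 1 and 1 x 1 x k depthwise convs."""
    x = np.asarray(x, dtype=float)
    out = x.copy()                       # identity (residual) path, an assumption
    for k, axis_kernels in weights.items():
        y = x
        for axis, kern in zip((1, 2, 3), axis_kernels):
            y = depthwise_conv1d_along(y, kern, axis)
        out += y
    return out


feat = np.random.default_rng(0).standard_normal((8, 16, 16, 16))
out = tri_scale_asymmetric(feat, make_taca_weights(8))
print(out.shape)  # (8, 16, 16, 16)
```

The factorization is what keeps the larger scales affordable: chaining three 1D kernels costs 3k weights per channel instead of k³, i.e. 21 instead of 343 at k = 7.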

3. Dual-Domain Attention Fusion (DDAF)

Applies lightweight channel and spatial recalibration to emphasize informative features, drawing on both the spatial and frequency domains to improve localization accuracy.
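The channel and spatial recalibration mentioned here can be illustrated with two well-known gates, swapped in as stand-ins: a squeeze-and-excitation channel gate and a CBAM-style spatial gate built from channel-wise mean and max maps. The NumPy sketch below simplifies the spatial gate to a per-voxel linear combination instead of a larger convolution, applies the two gates sequentially, and does not reproduce any frequency-domain processing; the function names and weight shapes are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_recalibrate(x, w1, w2):
    """Squeeze-and-excitation style gate on a (C, D, H, W) volume.

    w1: (C // r, C) reduction weights, w2: (C, C // r) expansion weights.
    """
    s = x.mean(axis=(1, 2, 3))                 # squeeze: one statistic per channel
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excitation: per-channel gate in (0, 1)
    return x * g[:, None, None, None]


def spatial_recalibrate(x, a, b, bias=0.0):
    """Per-voxel gate from channel-mean and channel-max maps (a 1x1x1 simplification
    of the larger spatial kernel used in CBAM-style attention)."""
    avg_map = x.mean(axis=0)
    max_map = x.max(axis=0)
    g = sigmoid(a * avg_map + b * max_map + bias)  # (D, H, W) gate
    return x * g[None]


def dual_recalibrate(x, w1, w2, a, b):
    """Channel gate followed by spatial gate (the sequential order is an assumption)."""
    return spatial_recalibrate(channel_recalibrate(x, w1, w2), a, b)


rng = np.random.default_rng(0)
c, r = 16, 4                                   # r: channel reduction ratio (assumed)
feat = rng.standard_normal((c, 8, 8, 8))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
out = dual_recalibrate(feat, w1, w2, a=1.0, b=1.0)
print(out.shape)  # (16, 8, 8, 8)
```

Both gates produce values in (0, 1), so they can only attenuate features; a gate whose weights are all zero scales its input by exactly 0.5, which makes the behavior easy to check.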

4. Volumetric Cross-Window Attention Fusion (V-CWAF)

Injects cross-window attention with dual-stage channel recalibration to boost high-level semantic encoding for precise 3D bounding box predictions.
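Cross-window attention is commonly realized with Swin Transformer-style shifted windows; the NumPy sketch below uses that technique as a stand-in and should not be read as the actual V-CWAF design. Voxels of a (C, D, H, W) volume are grouped into non-overlapping m×m×m windows, single-head self-attention runs inside each window, and a second pass on a volume cyclically shifted by m // 2 lets information flow across the original window borders. The dual-stage channel recalibration is omitted, and the function names and projection matrices (`wq`, `wk`, `wv`) are hypothetical.

```python
import numpy as np


def window_partition(x, m):
    """(C, D, H, W) -> (num_windows, m^3, C) groups of voxel tokens."""
    c, d, h, w = x.shape
    t = x.reshape(c, d // m, m, h // m, m, w // m, m)
    t = t.transpose(1, 3, 5, 2, 4, 6, 0)          # (Dw, Hw, Ww, m, m, m, C)
    return t.reshape(-1, m ** 3, c)


def window_merge(tokens, shape, m):
    """Inverse of window_partition."""
    c, d, h, w = shape
    t = tokens.reshape(d // m, h // m, w // m, m, m, m, c)
    t = t.transpose(6, 0, 3, 1, 4, 2, 5)          # (C, Dw, m, Hw, m, Ww, m)
    return t.reshape(c, d, h, w)


def window_attention(x, wq, wk, wv, m):
    """Single-head self-attention among the m^3 voxels of each window, plus a residual."""
    tokens = window_partition(x, m)                   # (N, m^3, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv   # (C, C) projections
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)          # softmax over keys
    return x + window_merge(attn @ v, x.shape, m)


def cross_window_block(x, params, m):
    """Regular windows, then windows shifted by m // 2 so information crosses borders
    (cyclic roll, no attention mask, for brevity)."""
    y = window_attention(x, *params[0], m)
    s = m // 2
    y = np.roll(y, shift=(-s, -s, -s), axis=(1, 2, 3))
    y = window_attention(y, *params[1], m)
    return np.roll(y, shift=(s, s, s), axis=(1, 2, 3))   # undo the shift


rng = np.random.default_rng(0)
c, m = 8, 4
feat = rng.standard_normal((c, 8, 8, 8))
params = [tuple(rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3)) for _ in range(2)]
print(cross_window_block(feat, params, m).shape)  # (8, 8, 8, 8)
```

The window size m must divide D, H and W. Because the cyclic roll places voxels from opposite faces of the volume in the same window, Swin masks those pairs out of the attention; the mask is left out here for brevity.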


Technology Stack

NeRF | PyTorch | 3D Object Detection | Attention Mechanisms | Volumetric Learning | Python

Related Publications

NeRF-UAVeL: Unified Attention-driven Volumetric Learning for Enhanced NeRF-based 3D Object Detection

Goshu, H.L., Wakjira, T.G., Atlaw, M.M., Chan, K.C., Lai, S., Lam, K.M.

Neurocomputing

Under review, 2026

Interested in This Research?

For code access, collaboration opportunities, or questions about this project, please contact the PI directly.
