NeRF-UAVeL: Unified Attention-driven Volumetric Learning for Enhanced NeRF-based 3D Object Detection
Unified attention-driven volumetric learning framework for state-of-the-art NeRF-based 3D object detection
Overview
Three-dimensional (3D) object detection based on Neural Radiance Fields (NeRF) has emerged as a promising direction for reconstructing complex environments from posed RGB images. However, existing NeRF-based detectors often suffer from coarse feature encoding and pay limited attention to multi-scale volumetric structure, which leads to inaccurate localization and poor generalization in real-world scenarios. NeRF-UAVeL addresses these challenges with a unified, attention-driven volumetric learning framework for detection that integrates four novel modules into a NeRF-derived 3D volumetric backbone: Multi-dimensional Volumetric Attention Pooling (MVAP), Tri-Scale Asymmetric Convolutional Aggregation (TACA), Dual-Domain Attention Fusion (DDAF), and Volumetric Cross-Window Attention Fusion (V-CWAF). Extensive experiments on the 3D-FRONT and ScanNet datasets show that NeRF-UAVeL outperforms both point cloud-based and multi-view-based methods: over the baseline on 3D-FRONT it improves AP50 by 6.7% and R50 by 7.3%, and on ScanNet it gains 6.9% in AP50 and 2.9% in R50.
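To make the reported metrics concrete, the sketch below assumes, as is conventional in 3D detection, that AP50 and R50 are average precision and recall at an IoU threshold of 0.5. It uses axis-aligned boxes for simplicity, which is a simplification and not necessarily the box parameterization used in the paper; the function names iou_3d and recall_at_iou are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)


def recall_at_iou(gt_boxes, pred_boxes, thresh=0.5):
    """Fraction of ground-truth boxes covered by some prediction with IoU >= thresh."""
    if not gt_boxes:
        return 0.0
    hits = sum(
        1 for g in gt_boxes if any(iou_3d(g, p) >= thresh for p in pred_boxes)
    )
    return hits / len(gt_boxes)
```

Under this reading, a higher R50 means more ground-truth objects are matched by at least one predicted box that overlaps them by half or more.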
[Figure: 3D object detection visualizations]
Novel Contributions
Multi-dimensional Volumetric Attention Pooling (MVAP)
Enhances spatial selectivity through adaptive attention-based pooling, enabling fine-grained volumetric feature representation in NeRF-derived volumes.
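As a rough illustration of attention-based pooling on a feature volume, the sketch below downsamples a (C, D, H, W) array by taking a softmax-weighted average inside each k x k x k window. It is a minimal sketch under stated assumptions, not the MVAP module itself: the scoring rule (channel-mean activation) stands in for whatever learned attention the paper uses, the "multi-dimensional" aspect is not reproduced, and attention_pool3d is a hypothetical name.

```python
import numpy as np


def attention_pool3d(x, k=2):
    """Softmax-weighted pooling of x with shape (C, D, H, W); D, H, W divisible by k."""
    c, d, h, w = x.shape
    # Gather the k^3 voxels of every window on the last axis.
    win = x.reshape(c, d // k, k, h // k, k, w // k, k)
    win = win.transpose(0, 1, 3, 5, 2, 4, 6)
    win = win.reshape(c, d // k, h // k, w // k, k ** 3)
    # Assumed scoring: channel-mean activation per voxel (a learned scorer would go here).
    score = win.mean(axis=0, keepdims=True)
    score = score - score.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(score)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted average: voxels with higher scores contribute more than in mean pooling.
    return (win * weights).sum(axis=-1)
```

Compared with plain average pooling, the weights let salient voxels dominate each window; compared with max pooling, the remaining voxels still contribute.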
Tri-Scale Asymmetric Convolutional Aggregation (TACA)
Captures volumetric features at three scales through asymmetric convolutional branches, giving the detector a robust view of multi-scale structure.
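One way to picture this idea is shown below, reading "asymmetric" as convolutions factorized into one-dimensional passes along depth, height, and width, and "tri-scale" as three parallel branches with different kernel lengths. Both readings are assumptions, as are the kernel sizes (3, 5, 7), the fixed averaging kernels standing in for learned weights, the simple mean used to merge branches, and all function names.

```python
import numpy as np


def conv_along_axis(x, kernel, axis):
    """Zero-padded 1D convolution of x with kernel along one axis (length preserved)."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, x
    )


def asymmetric_branch(x, k):
    """Three 1D passes (k x 1 x 1, 1 x k x 1, 1 x 1 x k) over x with shape (C, D, H, W)."""
    kernel = np.full(k, 1.0 / k)  # placeholder for learned weights
    for axis in (1, 2, 3):
        x = conv_along_axis(x, kernel, axis)
    return x


def tri_scale_aggregate(x, scales=(3, 5, 7)):
    """Average three branches with different receptive fields.

    Each spatial dimension of x must be at least max(scales) so that
    np.convolve(..., mode="same") keeps the original length.
    """
    return sum(asymmetric_branch(x, k) for k in scales) / len(scales)
```

Factorizing a k x k x k kernel into three 1D passes reduces the per-voxel cost from k^3 to 3k multiplications, which is one common motivation for asymmetric kernels on volumetric data.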
Dual-Domain Attention Fusion (DDAF)
Applies lightweight channel and spatial recalibration for refined feature emphasis, improving localization accuracy across spatial and frequency domains.
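Channel and spatial recalibration can be sketched as two gates applied in sequence, in the spirit of squeeze-and-excitation and CBAM-style attention. This is an illustrative assumption about the general pattern, not the confirmed DDAF design: the weights w1 and w2, the ordering of the gates, the spatial descriptor, and the function names are all hypothetical, and no frequency-domain processing is shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_gate(x, w1, w2):
    """Rescale each channel of x (C, D, H, W) by a gate in (0, 1).

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for some reduction r.
    """
    squeezed = x.mean(axis=(1, 2, 3))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # bottleneck with ReLU
    gate = sigmoid(w2 @ hidden)                # per-channel gate -> (C,)
    return x * gate[:, None, None, None]


def spatial_gate(x):
    """Rescale each voxel by a gate built from channel-mean and channel-max maps."""
    descriptor = 0.5 * (x.mean(axis=0) + x.max(axis=0))  # (D, H, W)
    return x * sigmoid(descriptor)[None]


def dual_recalibrate(x, w1, w2):
    """Channel recalibration followed by spatial recalibration."""
    return spatial_gate(channel_gate(x, w1, w2))
```

Both gates only rescale existing features, so the block adds few parameters, which is consistent with the "lightweight" description.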
Volumetric Cross-Window Attention Fusion (V-CWAF)
Injects cross-window attention with dual-stage channel recalibration to boost high-level semantic encoding for precise 3D bounding box predictions.
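The sketch below shows one possible form of cross-window attention on a volume: queries come from non-overlapping windows, while keys and values come from the same volume cyclically shifted by half a window, so each window attends to voxels that lie in neighboring windows. The shifting scheme, the identity projections, the residual connection, and all names are assumptions for illustration; the dual-stage channel recalibration mentioned above is omitted.

```python
import numpy as np


def window_partition(x, ws):
    """(C, D, H, W) -> (num_windows, ws**3, C); D, H, W must be divisible by ws."""
    c, d, h, w = x.shape
    x = x.reshape(c, d // ws, ws, h // ws, ws, w // ws, ws)
    return x.transpose(1, 3, 5, 2, 4, 6, 0).reshape(-1, ws ** 3, c)


def window_merge(tokens, shape, ws):
    """Inverse of window_partition for a volume of the given (C, D, H, W) shape."""
    c, d, h, w = shape
    t = tokens.reshape(d // ws, h // ws, w // ws, ws, ws, ws, c)
    return t.transpose(6, 0, 3, 1, 4, 2, 5).reshape(c, d, h, w)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_window_attention(x, ws=2):
    """Windows of x attend to windows of a half-window-shifted copy of x."""
    c = x.shape[0]
    q = window_partition(x, ws)
    shifted = np.roll(x, shift=(ws // 2,) * 3, axis=(1, 2, 3))
    kv = window_partition(shifted, ws)  # keys/values straddle original window borders
    attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(c))
    out = attn @ kv
    return x + window_merge(out, x.shape, ws)  # residual connection
```

Restricting attention to windows keeps the cost linear in the number of voxels, while the shift lets information cross window boundaries.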
Related Publications
NeRF-UAVeL: Unified Attention-driven Volumetric Learning for Enhanced NeRF-based 3D Object Detection
Goshu, H.L., Wakjira, T.G., Atlaw, M.M., Chan, K.C., Lai, S., Lam, K.M.
Neurocomputing
Interested in This Research?
For code access, collaboration opportunities, or questions about this project, please contact the PI directly.