Hoang M. Truong

Trương Minh Hoàng 🇻🇳

Hi! I’m a recent graduate from the University of Science, Viet Nam National University Ho Chi Minh City (VNU-HCM).

I am interested in how machines can understand and interact with the real world. My research focuses on vision-language models, egocentric (first-person) video understanding, and robotics foundation models, with the goal of building scalable and robust AI systems capable of cross-modal understanding and real-world interaction.

Publications

Semantic Alignment in Hyperbolic Space for Open-Vocabulary Semantic Segmentation

Hoang M. Truong, Hai Nguyen-Truong, Dang Huynh

CVPR Workshop 2026

Summary: We tackle semantic misalignment in open-vocabulary semantic segmentation by proposing (1) HyRo, a hyperbolic rotation module that refines angular relationships in the Poincaré ball while preserving hierarchical structure, and (2) a hyperbolic fine-tuning framework that decouples semantic alignment (angle) from hierarchical alignment (radius), enabling more accurate pixel-level predictions and state-of-the-art performance.

Paper Website GitHub

Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling

Hoang M. Truong, Vinh-Thuan Ly, Thuan-Phat Nguyen, Huy G. Tran, Tram T. Doan

CVPR Workshop 2025

Summary: We improve event-based eye tracking for AR/VR, where abrupt eye movements and noise degrade accuracy, by proposing (1) a robust augmentation pipeline that includes temporal shift, spatial flip, and event deletion, and (2) KnightPupil, a hybrid model that combines EfficientNet-B3, a BiGRU, and an LTV-SSM to handle sparse, noisy inputs.

Paper arXiv

TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource Constraints

Vinh-Thuan Ly, Hoang M. Truong, Xuan-Huong Nguyen

ICCV Workshop 2025

Summary: We introduce TinyGiantVLM, a lightweight vision-language model for warehouse-scale spatial reasoning. The framework combines RGB and depth data through a Mixture-of-Experts module to handle high-modality inputs and diverse question types, demonstrating that compact models can match larger systems on spatial reasoning tasks.

Website arXiv

Honors and Awards

2025, Jensen Huang Scholarship, NVIDIA
2025, Odon Vallet Scholarship
2025, Mathematics Development Scholarship, Vietnam Institute for Advanced Study in Mathematics (VIASM)
2025, AmCham Scholarship, American Chamber of Commerce in Vietnam