Hoang M. Truong

Trương Minh Hoàng 🇻🇳

Hi! I’m a recent graduate from the University of Science, Viet Nam National University Ho Chi Minh City (VNU-HCM).

I am interested in how machines can understand and interact with the real world. My research focuses on vision-language models, egocentric (first-person) video understanding, and robotics foundation models, with the goal of building scalable and robust AI systems capable of cross-modal understanding and real-world interaction.

Publications

Semantic Alignment in Hyperbolic Space for Open-Vocabulary Semantic Segmentation

Hoang M. Truong, Hai Nguyen-Truong, Dang Huynh

CVPR Workshop 2026

Summary: We tackle semantic misalignment in open-vocabulary semantic segmentation by proposing (1) HyRo, a hyperbolic rotation module that refines angular relationships in the Poincaré ball while preserving hierarchical structure, and (2) a hyperbolic fine-tuning framework that decouples semantic alignment (angle) from hierarchical alignment (radius), enabling more accurate pixel-level predictions and state-of-the-art performance.

Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling

Hoang M. Truong, Vinh-Thuan Ly, Thuan-Phat Nguyen, Huy G. Tran, Tram T. Doan

CVPR Workshop 2025

Summary: We improve event-based eye tracking for AR/VR under abrupt eye movements and noise by proposing (1) a robust augmentation pipeline that includes temporal shift, spatial flip, and event deletion, and (2) KnightPupil, a hybrid model that combines EfficientNet-B3, a BiGRU, and an LTV-SSM to handle sparse, noisy inputs.

TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource Constraints

Vinh-Thuan Ly, Hoang M. Truong, Xuan-Huong Nguyen

ICCV Workshop 2025

Summary: We introduce TinyGiantVLM, a lightweight vision-language model for warehouse-scale spatial reasoning. Our framework combines RGB and depth data through a Mixture-of-Experts module to handle high-modality inputs and diverse question types, demonstrating that compact models can match larger systems on spatial reasoning tasks.

Honors and Awards

  • 2025, Jensen Huang Scholarship, NVIDIA
  • 2025, Odon Vallet Scholarship
  • 2025, Mathematics Development Scholarship, Vietnam Institute for Advanced Study in Mathematics (VIASM)
  • 2025, AmCham Scholarship, American Chamber of Commerce in Vietnam