Hyperbolic Multimodal Learning for Video Understanding

Project description

Vision-Language Models (VLMs) have revolutionized multimodal AI by learning joint representations of images and text. However, conventional approaches using Euclidean geometry struggle to capture the compositional and hierarchical nature of real-world data. Visual scenes, language, and consequently video exhibit inherent hierarchical structures, from objects within scenes to actions within videos to micro-movements within actions.

Hyperbolic geometry offers a natural mathematical framework for hierarchical data, with exponentially expanding volume that enables efficient embedding of tree-like structures. Recent work has demonstrated improved semantic entailment and image-text alignment using hyperbolic embeddings.
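To make the "exponentially expanding volume" concrete, here is a minimal sketch that assumes the unit Poincaré ball model with curvature -1. That model is chosen here only for illustration; the project may well use a different one, such as the Lorentz model. The function name `poincare_distance` and the sample points are hypothetical. The sketch compares the distance between two "sibling" points at the same radius with the length of the path that passes through the origin.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumes curvature -1; u and v are equal-length sequences of floats.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Closed form: d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)

def sibling_ratio(r):
    """Ratio of the direct distance between two points at radius r
    to the length of the detour through the origin (the 'root')."""
    origin = (0.0, 0.0)
    a, b = (r, 0.0), (0.0, r)
    via_root = poincare_distance(origin, a) + poincare_distance(origin, b)
    return poincare_distance(a, b) / via_root

if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        print(f"r={r}: direct / via-root = {sibling_ratio(r):.3f}")
```

As the points move toward the boundary, the ratio approaches 1: going through the origin becomes almost as short as the direct route. That is the behaviour of distances between leaves of a tree, and it is why tree-like hierarchies can be embedded in hyperbolic space with low distortion.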

The goal of this PhD project is to investigate the geometric foundations of multimodal representation learning, focusing on extending hyperbolic VLMs to temporal reasoning and exploring the limits of hierarchy as an organizing principle for video understanding. Key research questions include: Can hyperbolic geometry capture the full complexity of spatio-temporal-semantic relationships in video? Do temporal dynamics require alternative geometric frameworks for periodic and cyclical patterns? How can geometric properties be leveraged to manipulate inter-modal and intra-modal relationships?

Scientific work

Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU
Àlex Pujol Vidal, Kamal Nasrollahi, Thomas B. Moeslund, Sergio Escalera
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 2025.

Funding

This project is funded by Milestone Systems A/S as part of the Milestone Research Programme.

Contact

PhD Fellow: Alex Pujol Vidal
Email: alexpv@create.aau.dk

Supervisor: Thomas B. Moeslund
Email: tbm@create.aau.dk

Co-Supervisor: Kamal Nasrollahi
Email: kn@create.aau.dk

Co-Supervisor: Sergio Escalera
Email: seg@create.aau.dk