Building Robust Vision-Language Models for Multimodal Agents

Wednesday, November 12, 2025, 10:30 to 11:15 a.m.

Venue
Terminal de croisières de Québec
84 Rue Dalhousie, Québec, QC G1K 4C4
Room
Salle RDVIAQ

Vision-language models (VLMs) have emerged as a core component of AI agents, enabling them to connect natural language instructions with the visual world of documents, interfaces, and environments. Yet their ability to navigate and understand complex visual structures remains brittle. In this talk, I will present a research trajectory that combines (1) the construction of targeted benchmarks that reveal the limitations of current VLMs with (2) the development of architectural innovations that illustrate how inductive biases can be introduced to address these gaps. Taken together, these efforts highlight how benchmarks and architectures co-evolve, and move us toward a new generation of VLMs capable of both understanding and acting in complex multimodal environments.

Perouz Taslakian
Research Lead, Low Data Learning Program
ServiceNow Research
Perouz Taslakian is an AI Research Scientist and Research Lead at ServiceNow Research, where she heads the Multimodal Foundation Models Program. Her research focuses on efficient inference for large language models and on vision-language models, with the goal of advancing the reasoning capabilities of AI systems. She earned her PhD in Computer Science from McGill University, where she is now an Adjunct Professor contributing to both research and the training of highly qualified personnel. Previously, she served as a professor and as chair of the BSc in Computational Sciences program at the American University of Armenia. Perouz has authored numerous publications and patents in AI and machine learning. Her leadership emphasizes mentoring the next generation of AI researchers and fostering academic-industry partnerships that advance both fundamental science and real-world impact.