Seminar by Erkut Erdem on April 8th at 13:00, Seminar Room Z022, METU Research Park

Title: Bridging Vision and Language, and Beyond: A Unified Journey in Multimodal Generative AI

Abstract:

Over the past 10 years, our research has taken a unified approach to computer vision and natural language processing, developing methods to better process, understand, and manipulate visual data. In this talk, I will present a personal and comprehensive survey of our contributions to the field, which is now often recognized as Generative AI. In particular, I will discuss our past and recent efforts in modeling and benchmarking that not only integrate vision and language but also extend to additional modalities. I will highlight how a unified, multimodal perspective can drive progress in the field.

Bio:

Erkut Erdem is a Professor in the Department of Computer Engineering at Hacettepe University, Ankara, and is co-affiliated with the KUIS AI Center. He earned his Ph.D. from Middle East Technical University (METU) and did his postdoctoral studies at TELECOM ParisTech in France. He is one of the founding members of the Hacettepe University Computer Vision Laboratory (HUCVL). His research focuses on developing powerful methods to understand and manipulate visual data, with a special emphasis on using additional modalities, such as language, as complementary tools. In recognition of his contributions, he received the Outstanding Young Scientist Award (GEBIP) from the Turkish Academy of Sciences in 2018 and was recently awarded funding from the TÜBİTAK 2247-A National Outstanding Researchers Program. He also serves as an Associate Editor for the IEEE Transactions on Multimedia.