Seminar by Çağrı Toraman on May 5th at 13:30, Seminar Room Z022, METU Research Park

Title: Bridging the Language Gap: Challenges and Strategies for Turkish Large Language Models

Abstract:

Developing effective large language models for low-resource languages such as Turkish is challenging, primarily because both high-quality data and reliable adaptation methods are scarce. This presentation synthesizes findings from two recent studies focused on bridging this gap for Turkish. We first examine the critical need to evaluate the quality of benchmark datasets, revealing limitations that affect progress. We then explore practical strategies for adapting existing open-source generative large language models to improve their performance on Turkish. Drawing on insights from these case studies, the presentation will outline approaches to overcome resource constraints and advance the capabilities of Turkish large language models.

Bio:

Dr. Çağrı Toraman earned his BS, MS, and PhD degrees from the Department of Computer Engineering at Bilkent University in 2009, 2011, and 2017, respectively. He worked as a Software Engineer at HAVELSAN from 2017 to 2018 and was a Postdoctoral Research Scientist at the University of Central Florida, USA, from 2018 to 2019. Between 2020 and 2023, he served as the Natural Language Processing Team Leader at ASELSAN. Since 2024, Dr. Toraman has been an Assistant Professor in the Department of Computer Engineering at Middle East Technical University, where he directs the Applied Natural Language Processing Research Laboratory. His research interests include Natural Language Processing, Generative Artificial Intelligence, and Social Informatics.