The ability to interpret word meanings in context is a core yet underexplored challenge for Large Language Models (LLMs). While these models demonstrate remarkable linguistic fluency, the extent to which they genuinely grasp word semantics remains an open question. In this talk, we investigate the disambiguation capabilities of state-of-the-art instruction-tuned LLMs, benchmarking their performance against specialized systems designed for Word Sense Disambiguation (WSD). We also examine lexical ambiguity as a persistent challenge in Machine Translation (MT), particularly for rare or context-dependent word senses. Through an in-depth error analysis of both disambiguation and translation tasks, we reveal systematic weaknesses in LLMs, shedding light on the fundamental challenges they face in semantic interpretation. Furthermore, we show that standard evaluation metrics fall short in capturing disambiguation performance, reinforcing the need for more targeted evaluation frameworks. By presenting dedicated testbeds, we introduce more effective ways to assess lexical understanding both within and across languages. With this talk, we highlight the gap between the impressive fluency of LLMs and their actual semantic comprehension, raising important questions about their reliability in critical applications.