Metaphor Detection
Table 3: Results of metaphor detection.
Memes are an expressive medium that often conveys rich emotions and intentions. Recent studies have confirmed the critical role of metaphors in meme understanding. However, existing metaphor research relies heavily on manual annotations, and mainstream vision-language models (VLMs) still struggle to recognize and comprehend metaphors. To address these challenges, we introduce MetaGPT, the first vision-language model specifically designed for meme metaphor understanding. MetaGPT can identify and extract metaphors in memes and generate accurate meme interpretations. Furthermore, we construct MUnd, a dedicated dataset for meme understanding that comprises approximately 32,000 high-quality question-answer (QA) pairs across three core tasks: metaphor detection, metaphor domain extraction, and meme interpretation. Based on MUnd, we further propose an evaluation benchmark for meme understanding and conduct a comprehensive assessment of existing VLMs. Experimental results reveal that current models still face challenges in metaphor comprehension, while MetaGPT consistently outperforms them across all tasks, highlighting its potential for advancing meme understanding. Our code and appendix are available in the supplementary materials.
Table 1: Results of metaphor domain extraction. We use BERTScore F1 to compute the similarity between the predicted source-target domain pairs and the references, under different thresholds τ ∈ {0.5, 0.6, 0.7, 0.8}. A prediction is considered correct if the score exceeds the threshold. '-' denotes that the model fails to perform this task and achieves a score of zero.
Table 2: Performance comparisons on meme interpretation. The top-2 scores are marked in bold and underlined, respectively. △ indicates the performance gap between our method and the best baseline.
Table 4: Human evaluation on metaphor domain extraction.
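The thresholded scoring described in the Table 1 caption can be illustrated with a minimal sketch. It assumes the BERTScore F1 values for each predicted source-target pair have already been computed by some BERTScore implementation; the function name `accuracy_at_thresholds` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Sequence

# Thresholds tau as listed in the Table 1 caption.
THRESHOLDS = (0.5, 0.6, 0.7, 0.8)


def accuracy_at_thresholds(
    f1_scores: Sequence[float],
    thresholds: Iterable[float] = THRESHOLDS,
) -> Dict[float, float]:
    """Return the fraction of predictions whose F1 exceeds each threshold.

    f1_scores holds one precomputed BERTScore F1 value per prediction
    (hypothetical input; computing BERTScore itself is out of scope here).
    """
    if not f1_scores:
        raise ValueError("f1_scores must not be empty")
    total = len(f1_scores)
    # A prediction counts as correct only if its score strictly exceeds tau.
    return {tau: sum(s > tau for s in f1_scores) / total for tau in thresholds}


# Made-up scores for illustration only; these are not results from the paper.
example_scores = [0.45, 0.55, 0.65, 0.75, 0.85]
result = accuracy_at_thresholds(example_scores)
for tau, acc in result.items():
    print(f"tau={tau}: accuracy={acc:.2f}")
```

A model that fails to produce any usable output (the '-' case in the caption) would correspond to a score of zero for every example, which yields an accuracy of zero at all four thresholds.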