Enhanced Visual Word Sense Disambiguation for Italian

University of Bari
EVALITA 2026

Introduction

Enhanced-VWSD for Italian is a task of EVALITA 2026, based on the VWSD task proposed at SemEval 2023. The objective is to select, out of ten candidate images, the one that correctly represents the sense of a target word in an input sentence. The sentence consists of the word of interest (i.e., the target to disambiguate) and additional context words that support disambiguation.

We propose a new task that combines high-level and fine-grained semantics: the goal is not only to identify the broad sense of the target word, but also to accurately recognise its specific sense. To obtain hard negatives, we use co-hyponyms extracted from a semantic network: images related to the general sense of the target word, but representing a different specific sense. We therefore extend the task by mixing two types of distractor images: 1. images for other senses of the target word (as in the original VWSD challenge); 2. images that share the broad sense of the target word but depict a different specific sense.
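To make the hard-negative idea concrete, here is a minimal sketch of co-hyponym extraction over a toy taxonomy. The dictionary below is purely illustrative (the task actually draws on BabelNet, whose API is not shown here):

```python
# Toy semantic network: hypernym -> list of hyponyms (illustrative data only).
taxonomy = {
    "strumento musicale": ["chitarra", "violino", "pianoforte"],
    "sport": ["calcio", "tennis", "nuoto"],
}

def co_hyponyms(word, taxonomy):
    """Words sharing a hypernym with `word` (candidate hard negatives)."""
    return [
        sibling
        for hypernym, hyponyms in taxonomy.items()
        if word in hyponyms
        for sibling in hyponyms
        if sibling != word
    ]

print(co_hyponyms("chitarra", taxonomy))  # ['violino', 'pianoforte']
```

Images depicting these siblings share the target's broad sense ("strumento musicale") while representing a different specific sense, which is exactly what makes them hard negatives.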


Example of instance for EVWSD-ITA (limited to seven images rather than ten for visualization purposes)

Data and Further Information

Task Details: Given a query consisting of three or more words, find the correct image described by the query among the ten candidates.

Query Details: To create the query, we combine: the lemma of the correct synset, one of the lemmas of the hypernym of the correct synset, and a word from the gloss of the correct synset. This process is done manually for the test set and automatically for the training set.
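The query-construction recipe above can be sketched as follows. The synset structure, field names, and content-word heuristic are all assumptions for illustration, not the organizers' actual pipeline or the BabelNet API:

```python
import random

def build_query(synset, rng=random.Random(0)):
    """Combine the synset lemma, one hypernym lemma, and one gloss word."""
    target = synset["lemma"]
    hypernym = rng.choice(synset["hypernym_lemmas"])
    # Crude content-word filter on the gloss: keep words longer than 3 chars.
    gloss_words = [w for w in synset["gloss"].split() if len(w) > 3]
    gloss_word = rng.choice(gloss_words)
    return f"{target} {hypernym} {gloss_word}"

# Illustrative synset (not real BabelNet data).
synset = {
    "lemma": "calcio",
    "hypernym_lemmas": ["sport", "gioco"],
    "gloss": "sport di squadra giocato con un pallone",
}
print(build_query(synset))
```

The result is a three-word query such as "calcio sport pallone", which is what the test-set annotators produce by hand.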

Images Details: All images will be resized to 336×336 pixels to strike a balance between effectiveness and efficiency.

Use of External Data: Usage of data from other sources is allowed.

Copyright: Our work leverages BabelNet, and the data is subject to its license. The BabelNet Non-Commercial License allows users to share and adapt the data, as long as they belong to a research institution.

Evaluation metrics: HIT@1 and MRR, with a separate leaderboard for each metric.
Given $r = [r_1, \ldots, r_n]$, where $n$ is the cardinality of the test set and $r_i$ is the rank assigned by the model to the correct image, MRR is defined as: $$ MRR = \frac{1}{n} \sum_{i=1}^{n}{\frac{1}{r_i}} $$ This metric evaluates the quality of the ranking: the closer $r_i$ is to 1 (i.e., the first position in the ranking), the better the result. HIT@1 is defined as: $$ HIT@1 = \frac{1}{n} \sum_{i=1}^{n}{I(r_i)} $$ where $I$ is a function that returns 1 if $r_i = 1$ (i.e., the correct image is ranked first) and 0 otherwise. This metric therefore assesses the model's ability to rank the correct image first.
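Both metrics are direct averages over the per-instance ranks, as the formulas above show. A minimal sketch (the rank list is illustrative):

```python
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/r_i over the test set."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_at_1(ranks):
    """Fraction of instances where the correct image is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)

# Example: ranks of the correct image on four test instances.
ranks = [1, 2, 1, 4]
print(mrr(ranks))       # (1 + 0.5 + 1 + 0.25) / 4 = 0.6875
print(hit_at_1(ranks))  # 2 / 4 = 0.5
```

Note that HIT@1 only rewards placing the correct image first, while MRR still gives partial credit when it is ranked second or third.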

Training data has been released! You can find it at the following link.

BibTeX

TBD

Acknowledgements

TBD