SemEval-2025 Task 1

AdMIRe - Advancing Multimodal Idiomaticity Representation

Development datasets should be used for model/system selection and to ensure that participants are able to make use of the CodaBench platform ahead of the evaluation phase.

Labels are not supplied for the development data here; output should be scored using the CodaBench platform. Labelled versions of the development datasets may be obtained from the ORDA repository. Note that the sentence_type and expected_order fields should not be consumed by systems; they are made available for analysis purposes only.
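One way to make sure a system never reads the sentence_type and expected_order fields is to drop them at load time. The following is a minimal sketch only: it assumes the data is tab-separated with a header row, and the column names compound and sentence, the sample row, and the helper load_system_inputs are hypothetical and used purely for illustration. See the Training Data page for the actual file layout.

```python
import csv
import io

# Fields that systems should not consume, as stated on this page.
HIDDEN_FIELDS = {"sentence_type", "expected_order"}


def load_system_inputs(tsv_text):
    """Parse tab-separated rows and drop the fields reserved for analysis.

    Assumption: the file is tab-separated with a header row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {key: value for key, value in row.items() if key not in HIDDEN_FIELDS}
        for row in reader
    ]


# Hypothetical sample: "compound" and "sentence" are illustrative column
# names, and the two reserved fields are left blank as in the development data.
SAMPLE = (
    "compound\tsentence\tsentence_type\texpected_order\n"
    "example compound\tAn example sentence.\t\t\n"
)

rows = load_system_inputs(SAMPLE)
```

Filtering at load time, rather than at each point of use, keeps the same code path safe to run on the labelled versions of the data.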

Subtask A - Static Images

English

English development data for Subtask A can be obtained here:

Subtask A Development Data - English

Portuguese

Portuguese development data for Subtask A can be obtained here:

Subtask A Development Data - Portuguese

Subtask B - Sequences

English development data for Subtask B can be obtained here:

Subtask B Development Data - English

Data Description

The development datasets contain the following files:

Subtask A

AdMIRe Subtask A Dev.zip

English development data for Subtask A.

15 items.

See the Training Data page for a detailed description. Note that the fields sentence_type and expected_order are blank in the development dataset, with the latter being the target output.

AdMIRe Subtask A PT Dev.zip

Brazilian Portuguese development data for Subtask A.

10 items.

See the Training Data page for a detailed description. Note that the fields sentence_type and expected_order are blank in the development dataset, with the latter being the target output.

Subtask B

AdMIRe Subtask B Dev.zip

English development data for Subtask B.

5 items.

See the Training Data page for a detailed description. Note that the fields sentence_type and expected_order are blank in the development dataset and are the target outputs for systems.