SemEval-2025 Task 1

AdMIRe - Advancing Multimodal Idiomaticity Representation

Development datasets should be used for model/system selection and to ensure that participants are able to make use of the CodaBench platform ahead of the evaluation phase.

Labels are not supplied for the development data here; output should be scored using the CodaBench platform. Labelled versions of the development datasets may be obtained from the ORDA repository. Note that the sentence_type and expected_order fields (labels) are made available for analysis purposes only and should not be consumed by systems.
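As a minimal sketch of keeping the label fields away from a system, the snippet below blanks sentence_type and expected_order in a record and returns the labels separately for analysis. It assumes each item has already been loaded as a Python dictionary keyed by field name. The helper name split_labels, the item_id field, and the example values are hypothetical and not part of the task data.

```python
# Label field names as given in the text above.
LABEL_FIELDS = ("sentence_type", "expected_order")


def split_labels(record):
    """Return (inputs, labels) for one item.

    inputs: a copy of the record with the label fields blanked,
            safe to pass to a system.
    labels: only the label fields, kept aside for analysis.
    """
    inputs = {
        key: ("" if key in LABEL_FIELDS else value)
        for key, value in record.items()
    }
    labels = {key: record[key] for key in LABEL_FIELDS if key in record}
    return inputs, labels


# Hypothetical record: 'item_id' and all values are placeholders.
example = {
    "item_id": "placeholder-1",
    "sentence_type": "placeholder-label",
    "expected_order": "placeholder-order",
}
system_inputs, analysis_labels = split_labels(example)
```

Separating the two dictionaries up front makes it harder for a labelled field to reach the model by accident, while the labels remain available for error analysis.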

Data Description

The development datasets contain the following files:

Subtask A

AdMIRe Subtask A Dev.zip

English development data for Subtask A.

15 items.

See the Training Data page for a detailed description. Note that the fields sentence_type and expected_order should be treated as blanks for development, with the latter being the target output.

AdMIRe Subtask A PT Dev.zip

Brazilian Portuguese development data for Subtask A.

10 items.

See the Training Data page for a detailed description. Note that the fields sentence_type and expected_order should be treated as blanks for development, with the latter being the target output.

Subtask B

AdMIRe Subtask B Dev.zip

English development data for Subtask B.

5 items.

See the Training Data page for a detailed description. Note that the fields sentence_type and expected_order should be treated as blanks for development, and are the target output for systems.