Publications

Listed below are references for publications related to the ReproHum project, either carried out as part of the project or recent related work on reproducibility by members of the project team:


A Metrological Perspective on Reproducibility in NLP

A Belz (2022)

Computational Linguistics 48 (4)


Quantified Reproducibility Assessment of NLP Results

A Belz, M Popović, S Mille (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics


Reproducing a Manual Evaluation of Simplicity in Text Simplification System Outputs

M Popović, R Huidrom, S Castilho, A Belz (2022)

International Natural Language Generation Conference


Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System

R Huidrom, O Dušek, Z Kasner, T Castro Ferreira, A Belz (2022)

International Natural Language Generation Conference


The Human Evaluation Datasheet: A Template for Recording Details of Human Evaluation Experiments in NLP

A Shimorina, A Belz (2022)

2nd Workshop on Human Evaluation of NLP Systems


The 2022 ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results

A Belz, A Shimorina, M Popović, E Reiter (2022)

International Natural Language Generation Conference


Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)

A Belz, M Popović, E Reiter, A Shimorina (2022)

Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)


The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results

A Belz, A Shimorina, S Agarwal, E Reiter (2021)

Proceedings of the 14th International Natural Language Generation Conference


A Reproduction Study of an Annotation-based Human Evaluation of MT Outputs

M Popović, A Belz (2021)

Proceedings of the 14th International Natural Language Generation Conference


Another PASS: A Reproduction Study of the Human Evaluation of a Football Report Generation System

S Mille, T Castro Ferreira, B Davis, A Belz (2021)

Proceedings of the 14th International Natural Language Generation Conference


A Systematic Review of Reproducibility Research in Natural Language Processing

A Belz, S Agarwal, A Shimorina, E Reiter (2021)

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)


Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

A Belz, S Agarwal, Y Graham, E Reiter, A Shimorina (2020)

Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)


Twenty Years of Confusion in Human Evaluation: NLG needs evaluation sheets and standardised definitions

D Howcroft, A Belz, D Gkatzia, S Hasan, S Mahamood, S Mille, M Clinciu, et al. (2020)

International Natural Language Generation Conference 2020 (INLG’20)


Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing

A Belz, S Mille, D Howcroft (2020)

International Natural Language Generation Conference 2020 (INLG’20)


ReproGen: Proposal for a Shared Task on Reproducibility of Human Evaluations in NLG

A Belz, S Agarwal, E Reiter, A Shimorina (2020)

International Natural Language Generation Conference 2020 (INLG’20)