Introduction:
This website presents the data used in the paper "Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models". It shows the coding errors made by different large language models (LLMs) on the HumanEval dataset. Each LLM contributes at most 164 incorrect code snippets, and each incorrect code snippet may contain multiple errors. For each error, we analyzed its semantic and syntactic characteristics.
Instructions for Use:
The website provides a data table with detailed information on each erroneous task of the different LLMs. In addition, the website uses visualizations to let users compare the proportions of different error types across models.
BibTeX
If you find our data useful, please cite the following paper.
@inproceedings{wang2025towards,
  title     = {Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models},
  author    = {Zhijie Wang and Zijie Zhou and Da Song and Yuheng Huang and Shengmai Chen and Lei Ma and Tianyi Zhang},
  booktitle = {Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE '25)},
  year      = {2025}
}
© University of Alberta, University of Illinois Urbana-Champaign, The University of Tokyo, Purdue University