Introduction:
This website presents the data used in the paper "Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models". It shows the coding errors made by different large language models (LLMs) on the HumanEval dataset. Each LLM contributes at most 164 incorrect code snippets, and each incorrect code snippet may contain multiple errors. For each error, we analyzed its semantic and syntactic characteristics.
Instructions for Use:
The website provides a data table with detailed information on each erroneous task of the different LLMs. In addition, the website uses visualizations to let users compare the proportions of different error types across models.
BibTeX
If you find our data useful, please cite the following paper.
@inproceedings{wang2025towards,
  title     = {Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models},
  author    = {Zhijie Wang and Zijie Zhou and Da Song and Yuheng Huang and Shengmai Chen and Lei Ma and Tianyi Zhang},
  booktitle = {Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE '25)},
  year      = {2025}
}
© University of Alberta, University of Illinois Urbana-Champaign, The University of Tokyo, Purdue University