Lingxiao Jiang

Harnessing Deep Learning Models for Automated Generation of Program Transformation Rules

Large code bases, such as the Linux kernel, have many branches and variants, which makes modifying code across all relevant locations complex. For example, a bug fix made to a source file in one branch may need to be adapted and merged into all other branches and versions of the code base; the idea behind the fix may be repetitive, but the actual code changes may differ in many ways that are specific to each branch or version. To simplify such code modifications at scale, this seminar delves into the potential and intricacies of using pre-trained code and language models to generate program transformation rules, and of using program transformation engines to automate code changes in large code bases, with a particular focus on Coccinelle semantic patches.

Our work evaluates the performance of three kinds of deep learning models against a traditional heuristic pattern-based rule generation method: generic language models (such as BART and T5), code-specific models (such as CodeBERT and CodeT5), and chatbot models (such as GPT-4). While our results indicate that deep learning models can outperform the traditional method in some scenarios, the traditional method continues to provide superior results overall, suggesting the need for further refinement of deep learning models. We also provide a few case analyses of the rules generated by the different methods, spotlighting the limitations of deep learning models and possible future improvements.
 


Biography

Lingxiao Jiang is an Associate Professor in the School of Computing and Information Systems (SCIS) at Singapore Management University (SMU) and a Deputy Director of the Centre for Research on Intelligent Software Engineering at SMU. He received his Ph.D. degree in Computer Science from the University of California, Davis in 2009, and a Master's and a Bachelor's degree from the School of Mathematical Sciences at Peking University in 2003. His research interests span software engineering, program analysis, automated testing and debugging, and, more recently, deep learning of code. He explores combinations of static and dynamic analysis with deep learning techniques across languages at various abstraction levels, aiming to provide practical techniques and tools that help developers increase productivity, enhance software reliability, and reduce development and maintenance costs.