We recently published a survey, Deep Learning-based Software Engineering: Progress, Challenges, and Opportunities, in SCIENCE CHINA Information Sciences. Although we did our best to collect the related papers that were publicly available before the survey was published, closely related papers have inevitably appeared since the cutoff date. To maintain an up-to-date list of papers on learning-based software engineering, we built this website to continuously collect relevant papers and track the latest developments in the field.
We warmly welcome authors of relevant papers to submit their work to this website, using any of the following methods:
- Post an Issue for the website.
- Submit a Pull Request to update the website.
- Send your work via email to yanjiejiang@pku.edu.cn
Topics:
- Requirements Engineering
- Code Generation
- Code Search
- Code Summarization
- Software Refactoring
- Code Clone Detection
- Software Defect Prediction
- Bug Finding
- Fault Localization
- Program Repair
- Bug Report Management
- Developer Collaboration
- Technical Debt
Requirements Engineering
- A large-scale survey on the usability of AI programming assistants: Successes and challenges (ICSE, 2024)
- Reqgen: Keywords-driven software requirements generation (Mathematics, 2023)
- On-demand security requirements synthesis with relational generative adversarial networks (ICSE, 2023)
- Ai-based question answering assistance for analyzing natural-language requirements (ICSE, 2023)
- Prcbert: Prompt learning for requirement classification using bert-based pretrained language models (ASE, 2023)
- A cross-level requirement trace link update model based on bidirectional encoder representations from transformers (Mathematics, 2023)
- A study about the knowledge and use of requirements engineering standards in industry (TSE, 2022)
- Detecting privacy requirements from user stories with NLP transfer learning models (Information and Software Technology, 2022)
- A software requirements ecosystem: Linking forum, issue tracker, and faqs for requirements management (TSE, 2022)
- Automated handling of anaphoric ambiguity in requirements: A multi-solution study (ICSE, 2022)
- Detecting coreferent entities in natural language requirements (Requirements Engineering, 2022)
- Information retrieval versus deep learning approaches for generating traceability links in bilingual projects (Empirical Software Engineering, 2022)
- An end-to-end deep learning system for requirements classification using recurrent neural networks (Information and Software Technology, 2022)
- Traceability transformed: Generating more accurate links with pre-trained Bert models (ICSE, 2021)
- A deep multitask learning approach for requirements discovery and annotation from open forum (ASE, 2021)
- Automating developer chat mining (ASE, 2021)
- Bidirectional language modeling: A systematic literature review (Scientific Programming, 2021)
- Classifying user requirements from online feedback in small dataset environments using deep learning (RE, 2021)
- Codebert: A pre-trained model for programming and natural languages (ACL, 2020)
- Caspar: Extracting and synthesizing user stories of problems from app reviews (ICSE, 2020)
- Automated extraction of requirement entities by leveraging lstm-crf and transfer learning (ICSME, 2020)
- Automating intention mining (TSE, 2020)
- Detection of hidden feature requests from massive chat messages via deep siamese network (ICSE, 2020)
- A deep context-wise method for coreference detection in natural language requirements (RE, 2020)
- Norbert: Transfer learning for requirements classification (RE, 2020)
- Predicting how to test requirements: An automated approach (RE, 2019)
- Extraction of system states from natural language requirements (RE, 2019)
- Automatic multi-class non-functional software requirements classification using neural networks (COMPSAC, 2019)
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining (Bioinformatics, 2019)
- Using the aman-da method to generate security requirements: A case study in the maritime domain (RE, 2018)
- Semantically enhanced software traceability using deep learning techniques (ICSE, 2017)
- Easy approach to requirements syntax (ears) (RE, 2009)
- Generating natural language specifications from UML class diagrams (RE, 2008)
- Deriving requirements from process models via the problem frames approach (Inf.Softw.Technol., 2005)
- Generating requirements from systems models using patterns: a case study (RE, 2005)
- Goal-oriented requirements engineering: a roundtrip from research to practice (RE, 2004)
- Automating software requirements generation from business process models (PRISE, 2004)
- Deriving tabular event-based specifications from goal-oriented requirements models (Requir.Eng., 2004)
- The automated extraction of requirements from UML models (RE, 2003)
- Deriving operational software specifications from system goals (SIGSOFT, 2002)
- Inferring declarative requirements specifications from operational scenarios (Software Eng., 1998)
- From organization models to system requirements: A ‘cooperating agents’ approach (CoopIS, 1995)
Code Generation
- Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates (arXiv, 2024)
- Measuring Code Efficiency Optimization Capabilities with ACEOB (arXiv, 2024)
- E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group (arXiv, 2024)
- IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion (ESEC/FSE, 2024)
- Knowledge-Aware Code Generation with Large Language Models (ICPC, 2024)
- KareCoder: A New Knowledge-Enriched Code Generation System (ICSE, 2024)
- Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs (LREC-COLING, 2024)
- RepoMasterEval: Evaluating Code Completion via Real-World Repositories (arXiv, 2024)
- TACO: Topics in Algorithmic COde generation dataset (arXiv, 2023)
- Exploitgen: Template-augmented exploit code generation based on codebert (Journal of Systems and Software, 2023)
- Skcoder: A sketch-based approach for automatic code generation (arXiv, 2023)
- Practitioners’ expectations on code completion (arXiv, 2023)
- Domain adaptive code completion via language models and decoupled domain databases (arXiv, 2023)
- Codemark: Imperceptible watermarking for code datasets against neural code completion models (arXiv, 2023)
- Learning deep semantics for test completion (arXiv, 2023)
- On the robustness of code generation techniques: An empirical study on GitHub copilot (arXiv, 2023)
- In ChatGPT we trust? Measuring and characterizing the reliability of ChatGPT (arXiv, 2023)
- Codefill: Multi-token code completion by jointly learning from structure and naming sequences (ICSE, 2022)
- Incorporating domain knowledge through task augmentation for front-end javascript code generation (ESEC/FSE, 2022)
- Coderl: Mastering code generation through pretrained models and deep reinforcement learning (NeurIPS, 2022)
- Lyra: A benchmark for turducken-style code generation (IJCAI, 2022)
- In-ide code generation from natural language: Promise and challenges (TOSEM, 2022)
- Compilable neural code generation with compiler feedback (ACL, 2022)
- Synchromesh: Reliable code generation from pre-trained language models (ICLR, 2022)
- Codexglue: A machine learning benchmark dataset for code understanding and generation (NeurIPS, 2021)
- Analysis of tree-structured architectures for code generation (ACL-IJCNLP, 2021)
- Embedding api dependency graph for neural code generation (Empirical Software Engineering, 2021)
- Exploring dynamic selection of branch expansion orders for code generation (ACL, 2021)
- Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation (ACL, 2021)
- Unified pre-training for program understanding and generation (ACL, 2021)
- Retrieval augmented code generation and summarization (ACL, 2021)
- Measuring coding challenge competence with APPS (NeurIPS, 2021)
- Code generation from natural language with less prior knowledge and more monolingual data (ACL, 2021)
- Improving tree-structured decoder training for code generation via mutual learning (AAAI, 2021)
- Multi-task learning based pre-trained language model for code completion (ASE, 2021)
- Leveraging code generation to improve code retrieval and summarization via dual learning (WWW, 2020)
- Pymt5: Multi-mode translation of natural language and python code with transformers (ACL, 2020)
- Treegen: A tree-based transformer architecture for code generation (AAAI, 2020)
- A grammar-based structural cnn decoder for code generation (AAAI, 2019)
- Coupling retrieval and meta-learning for context-dependent semantic parsing (ACL, 2019)
- Code generation as a dual task of code summarization (NeurIPS, 2019)
- Spoc: Search-based pseudocode to code (NeurIPS, 2019)
- Learning programmatic idioms for scalable semantic parsing (ACL, 2019)
- Coarse-to-fine decoding for neural semantic parsing (ACL, 2018)
- TRANX: A transition-based neural abstract syntax parser for semantic parsing and code generation (ACL, 2018)
- Semantic parsing with syntax- and table-aware SQL generation (ACL, 2018)
- Mapping language to code in programmatic context (ACL, 2018)
- Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task (EMNLP, 2018)
- Dlpaper2code: Auto-generation of code from deep learning research papers (AAAI, 2018)
- A retrieve-and-edit framework for predicting structured outputs (NeurIPS, 2018)
- A syntactic neural model for general-purpose code generation (ACL, 2017)
- Abstract syntax networks for code generation and semantic parsing (ACL, 2017)
- Latent predictor networks for code generation (ACL, 2016)
- Language to logical form with neural attention (ACL, 2016)
- Nl-based query refinement and contextualized code search results: A user study (CSMR-WCRE, 2014)
- Empirical evaluation of gated recurrent neural networks on sequence modeling (arXiv, 2014)
- SNIFF: A search engine for java using free-form queries (ETAPS, 2009)
- Source code retrieval for bug localization using latent dirichlet allocation (WCRE, 2008)
Code Search
- xcodeeval: A large scale multilingual multitask benchmark for code understanding, generation, translation and retrieval (CoRR, 2023)
- Xcos: Explainable code search based on query scoping and knowledge graph (TOSEM, 2023)
- On the importance of building high-quality training datasets for neural code search (ICSE, 2022)
- Accelerating code search with deep hashing and code classification (ACL, 2022)
- Unixcoder: Unified cross-modal pre-training for code representation (ACL, 2022)
- Coderetriever: Unimodal and bimodal contrastive learning (arXiv, 2022)
- Codexglue: A machine learning benchmark dataset for code understanding and generation (NeurIPS, 2021)
- Cascaded fast and slow models for efficient semantic code search (CoRR, 2021)
- Interactive cross-language code retrieval with auto-encoders (ASE, 2021)
- Cosqa: 20,000+ web queries for code search and question answering (ACL/IJCNLP, 2021)
- Cross-language code search using static and dynamic analyses (ESEC/FSE, 2021)
- Automated query reformulation for efficient search based on query logs from stack overflow (ICSE, 2021)
- Graphsearchnet: Enhancing gnns via capturing global dependency for semantic code search (CoRR, 2021)
- Graphcodebert: Pre-training code representations with data flow (ICLR, 2021)
- Neural code search revisited: Enhancing code snippet retrieval through natural language intent (CoRR, 2020)
- Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries (SANER, 2020)
- Codebert: A pre-trained model for programming and natural languages (ACL, 2020)
- Aroma: code recommendation via structural code search (Proc. ACM Program. Lang., 2019)
- Cross-language clone detection by learning over abstract syntax trees (MSR, 2019)
- Supporting code search with context-aware, analytics-driven, effective query reformulation (ICSE, 2019)
- Codesearchnet challenge: Evaluating the state of semantic code search (CoRR, 2019)
- ROSF: leveraging information retrieval and supervised learning for recommending code snippets (IEEE Trans. Serv. Comput., 2019)
- Multi-modal attention network learning for semantic source code retrieval (ASE, 2019)
- Neural code search evaluation dataset (CoRR, 2019)
- Neural query expansion for code search (MAPL@PLDI, 2019)
- Deep code search (ICSE, 2018)
- Retrieval on source code: a neural code search (MAPL@PLDI, 2018)
- Graph embedding based code search in software project (Internetware, 2018)
- Staqc: A systematically mined question-code dataset from stack overflow (WWW, 2018)
- Learning to mine aligned code and natural language pairs from stack overflow (MSR, 2018)
- Exploring API embedding for API usages and applications (ICSE, 2017)
- Query expansion via wordnet for effective code search (SANER, 2015)
- Codehow: Effective code search based on API understanding and extended boolean model (E) (ASE, 2015)
- Improving source code search with natural language phrasal representations of method signatures (ASE, 2011)
- Sourcerer: a search engine for open source code supporting structure-based search (OOPSLA, 2006)
Code Summarization
- Bcgen: a comment generation method for bytecode (ASEJ, 2023)
- Snippet comment generation based on code context expansion (TOSEM, 2023)
- Re_Trans: Combined retrieval and transformer model for source code summarization (Entropy, 2022)
- ATOM: commit message generation based on abstract syntax tree and hybrid ranking (TSE, 2022)
- M2TS: multi-scale multi-modal approach based on transformer for source code summarization (ICPC, 2022)
- Ast-trans: Code summarization with efficient tree-structured attention (ICSE, 2022)
- Source code summarization with structural relative position guided transformer (SANER, 2022)
- Gt-simnet: Improving code automatic summarization via multi-modal similarity networks (J. Syst. Softw., 2022)
- Keyword-guided abstractive code summarization via incorporating structural and contextual information (Inf. Softw. Technol., 2022)
- Modeling hierarchical syntax structure with triplet position for source code summarization (ACL, 2022)
- MMF3: neural code summarization based on multi-modal fine-grained feature fusion (ESEM, 2022)
- Gypsum: learning hybrid representations for code summarization (ICPC, 2022)
- Reinforcement-learning-guided source code summarization using hierarchical attention (TSE, 2022)
- Boosting code summarization by embedding code structures (ACL, 2022)
- Api2com: On the improvement of automatically generated code comments using API documentations (ICPC, 2021)
- Retrieval-augmented generation for code summarization via hybrid GNN (ICLR, 2021)
- Haconvgnn: Hierarchical attention based convolutional graph neural network for code documentation generation in jupyter notebooks (ACL, 2021)
- Ensemble models for neural source code summarization of subroutines (ICSME, 2021)
- Cosqa: 20,000+ web queries for code search and question answering (ACL/IJCNLP, 2021)
- Project-level encoding for neural source code summarization of subroutines (ICPC, 2021)
- Exploiting method names to improve code summarization: A deliberation multi-task learning approach (ICPC, 2021)
- Secnn: A semantic CNN parser for code comment generation (J. Syst. Softw., 2021)
- Improving code summarization with block-wise abstract syntax tree splitting (ICPC, 2021)
- CAST: enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees (ACL, 2021)
- Cocogum: Contextual code summarization with multi-relational gnn on umls (Microsoft, Tech. Rep., 2020)
- Trans^3: A transformer-based framework for unifying code summarization and code search (CoRR, 2020)
- Deep code comment generation with hybrid lexical and syntactical information (Empir. Softw. Eng., 2020)
- Towards automatically generating block comments for code snippets (Inf. Softw. Technol., 2020)
- Pymt5: Multi-mode translation of natural language and python code with transformers (ACL, 2020)
- Improved code summarization via a graph neural network (ICPC, 2020)
- A transformer-based approach for source code summarization (ACL, 2020)
- Fret: Functional reinforced transformer with BERT for code summarization (IEEE Access, 2020)
- Retrieval-based neural source code summarization (ICSE, 2020)
- Augmenting java method comments generation with context information based on neural networks (J. Syst. Softw., 2019)
- Structured neural summarization (ICLR, 2019)
- A neural model for generating natural language summaries of program subroutines (ICSE, 2019)
- Boosting neural commit message generation with code semantic analysis (ASE, 2019)
- Automatic generation of pull request descriptions (ASE, 2019)
- Code generation as a dual task of code summarization (Advances in neural information processing systems, 2019)
- Commit message generation for source code changes (IJCAI, 2019)
- Deep code comment generation (ICPC, 2018)
- Improving automatic source code summarization via deep reinforcement learning (ASE, 2018)
- Summarizing source code with transferred API knowledge (IJCAI, 2018)
- A neural framework for retrieval and summarization of source code (ASE, 2018)
- A parallel corpus of python functions and documentation strings for automated code documentation and code generation (IJCNLP, 2017)
- Automatically generating commit messages from diffs using neural machine translation (ASE, 2017)
- Towards automatic generation of short summaries of commits (ICPC, 2017)
- Summarizing source code using a neural attention model (ACL, 2016)
- A convolutional attention network for extreme summarization of source code (ICML, 2016)
Software Refactoring
- Just-in-time code duplicates extraction (Information and Software Technology, 2023)
- An automated approach to extracting local variables (ESEC/FSE, 2023)
- Deep learning based feature envy detection boosted by real-world examples (ESEC/FSE, 2023)
- Automated Software Entity Matching Between Successive Versions (ASE, 2023)
- Detecting and Refactoring Feature Envy Based on Graph Neural Network (ISSRE, 2022)
- Recommending move method refactoring opportunities using structural and semantic representations of code (ICSME, 2022)
- On the value of oversampling for deep learning in software defect prediction (TSE, 2022)
- How to improve deep learning for software analytics: (a case study with code smell detection) (MSR, 2022)
- RefactoringMiner 2.0 (TSE, 2022)
- Deep learning based code smell detection (TSE, 2021)
- A deep method renaming prediction and refinement approach for Java projects (QRS, 2021)
- Graph neural network to dilute outliers for refactoring monolith application (AAAI, 2021)
- RefDiff 2.0: A multi-language refactoring detection tool (TSE, 2021)
- Local and Global Feature Based Explainable Feature Envy Detection (COMPSAC, 2021)
- Recent advances in deep learning (International Journal of Machine Learning and Cybernetics, 2020)
- Recommendation of Move Method Refactoring Using Path-Based Representation of Code (ICSEW, 2020)
- Feature requests-based recommendation of software refactorings (Empirical Software Engineering, 2020)
- Mlcq: Industry-relevant code smell data set (EASE, 2020)
- Deep Learning Anti-patterns from Code Metrics History (ICSME, 2019)
- Code2vec: Learning distributed representations of code (POPL, 2019)
- Bert: Pre-training of deep bidirectional transformers for language understanding (arXiv, 2019)
- Learning to spot and refactor inconsistent method names (ICSE, 2019)
- On learning meaningful code changes via neural machine translation (ICSE, 2019)
- Deep learning based feature envy detection (ASE, 2018)
- A novel heuristic and tool to detect move method refactoring opportunities (Journal of Systems and Software, 2018)
- Semeval-2017 task 4: Sentiment analysis in twitter (arXiv, 2017)
- Deep learning (Nature, 2015)
- Identifying renaming opportunities by expanding conducted rename refactorings (TSE, 2015)
- Distributed representations of sentences and documents (PMLR, 2014)
- Ref-Finder: A refactoring reconstruction tool based on logic query templates (FSE, 2010)
Code Clone Detection
- CLCD-I: Cross-Language Clone Detection by Using Deep Learning with InferCode (Computers, 2023)
- Using a nearest-neighbour, bert-based approach for scalable clone detection (ICSME, 2022)
- Codebert for code clone detection: A replication study (IWSC, 2022)
- Modeling functional similarity in source code with graph-based siamese networks (TSE, 2022)
- Seed: Semantic graph based deep detection for type-4 clone (Reuse and Software Quality, 2022)
- A collaborative method for code clone detection using a deep learning model (Advances in Engineering Software, 2022)
- Learning Program Semantics with Code Representations: An Empirical Study (SANER, 2022)
- Bridging pre-trained models and downstream tasks for source code understanding (ICSE, 2022)
- Software system comparison with semantic source code embeddings (Empirical Software Engineering, 2022)
- Unified abstract syntax tree representation learning for cross-language program classification (ICPC, 2022)
- Assessing and Improving an Evaluation Dataset for Detecting Semantic Code Clones via Deep Learning (TOSEM, 2022)
- BigCloneBench Considered Harmful for Machine Learning (IWSC, 2022)
- FCCA: Hybrid Code Representation for Functional Clone Detection Using Attention Networks (IEEE Transactions on Reliability, 2021)
- Two-Pass Technique for Clone Detection and Type Classification Using Tree-Based Convolution Neural Network (Applied Sciences, 2021)
- Learn To Align: A Code Alignment Network For Code Clone Detection (APSEC, 2021)
- InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees (ICSE, 2021)
- Code Representation Based on Hybrid Graph Modelling (Neural Information Processing, 2021)
- Can Neural Clone Detection Generalize to Unseen Functionalities? (ASE, 2021)
- LVMapper: A Large-Variance Clone Detector Using Sequencing Alignment Approach (IEEE Access, 2020)
- Semantic Code Clone Detection Via Event Embedding Tree and GAT Network (QRS, 2020)
- A Deep Neural Network-Based Approach to Finding Similar Code Segments (IEICE Transactions on Information and Systems, 2020)
- SCDetector: software functional clone detection based on semantic tokens analysis (ASE, 2020)
- Sia-RAE: A Siamese Network based on Recursive AutoEncoder for Effective Clone Detection (APSEC, 2020)
- Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree (SANER, 2020)
- Functional code clone detection with syntax and semantics fusion learning (ISSTA, 2020)
- Review Sharing via Deep Semi-Supervised Code Clone Detection (IEEE Access, 2020)
- A Deep Learning Approach for a Source Code Detection Model Using Self-Attention (Complexity, 2020)
- SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge (IWSC, 2020)
- Neural detection of semantic code clones via tree-based convolution (ICPC, 2019)
- A novel neural source code representation based on abstract syntax tree (ICSE, 2019)
- Fast Code Clone Detection Based on Weighted Recursive Autoencoders (IEEE Access, 2019)
- From Local to Global Semantic Clone Detection (DSA, 2019)
- Find Me if You Can: Deep Software Clone Detection by Exploiting the Contest between the Plagiarist and the Detector (AAAI, 2019)
- Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection (SANER, 2019)
- Go-clone: graph-embedding based clone detector for Golang (ISSTA, 2019)
- Vulnerable Code Clone Detection for Operating System Through Correlation-Induced Learning (IEEE Transactions on Industrial Informatics, 2019)
- Capturing source code semantics via tree-based convolution over API-enhanced AST (Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019)
- TECCD: A Tree Embedding Approach for Code Clone Detection (ICSME, 2019)
- Cross-Language Clone Detection by Learning Over Abstract Syntax Trees (MSR, 2019)
- CLCDSA: Cross Language Code Clone Detection using Syntactical Features and API Documentation (ASE, 2019)
- Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification (SANER, 2019)
- Towards Automating Precision Studies of Clone Detectors (ICSE, 2019)
- SeSaMe: A Data Set of Semantically Similar Java Methods (MSR, 2019)
- DeepSim: deep learning code functional similarity (ESEC/FSE, 2018)
- Oreo: detection of clones in the twilight zone (ESEC/FSE, 2018)
- CCDLC Detection Framework-Combining Clustering with Deep Learning Classification for Semantic Clones (ICMLA, 2018)
- Positive and unlabeled learning for detecting software functional clones with adversarial training (IJCAI, 2018)
- Deep learning similarities from different representations of source code (MSR, 2018)
- CCAligner: a token based large-gap clone detector (ICSE, 2018)
- LICCA: A tool for cross-language clone detection (SANER, 2018)
- Clone-Slicer: Detecting Domain Specific Binary Code Clones through Program Slicing (FEAST, 2018)
- A deep learning approach to program similarity (MASES, 2018)
- Clone-hunter: accelerated bound checks elimination via binary code clone detection (MAPL, 2018)
- On the Use of Machine Learning Techniques Towards the Design of Cloud Based Automatic Code Clone Validation Tools (SCAM, 2018)
- CCLearner: A Deep Learning-Based Clone Detection Approach (ICSME, 2017)
- Fast and Flexible Large-Scale Clone Detection with CloneWorks (ICSE-C, 2017)
- Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code (IJCAI, 2017)
- VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery (IEEE Symposium on Security and Privacy (SP), 2017)
- Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection (CCS, 2017)
- Sourcerercc: Scaling code clone detection to big-code (ICSE, 2016)
- Deep learning code fragments for code clone detection (ASE, 2016)
- Semantic Clone Detection Using Machine Learning (ICMLA, 2016)
- Convolutional Neural Networks over Tree Structures for Programming Language Processing (AAAI, 2016)
- Mining revision histories to detect cross-language clones without intermediates (ASE, 2016)
- Scalable Graph-based Bug Search for Firmware Images (CCS, 2016)
- Towards a Big Data Curated Benchmark of Inter-project Code Clones (ICSME, 2014)
- Qualitas.class corpus: a compiled version of the qualitas corpus (ACM SIGSOFT Software Engineering Notes, 2013)
- NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization (ICPC, 2008)
- DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones (ICSE, 2007)
- Signature Verification using a “Siamese” Time Delay Neural Network (Proceedings of the 6th International Conference on Neural Information Processing Systems, 1993)
Software Defect Prediction
- DeepLineDP: Towards a Deep Learning Approach for Line-Level Defect Prediction (TSE, 2023)
- Defect Prediction via Tree-Based Encoding with Hybrid Granularity for Software Sustainability (IEEE Transactions on Sustainable Computing, 2023)
- Effort-Aware Just-in-Time Bug Prediction for Mobile Apps Via Cross-Triplet Deep Feature Embedding (IEEE Transactions on Reliability, 2022)
- ACGDP: An Augmented Code Graph-Based System for Software Defect Prediction (IEEE Transactions on Reliability, 2022)
- Software defect prediction employing BiLSTM and BERT-based semantic feature (Soft Computing, 2022)
- MPT-embedding: An unsupervised representation learning of code for software defect prediction (Journal of Software: Evolution and Process, 2021)
- GCN2defect: Graph Convolutional Networks for SMOTETomek-based Software Defect Prediction (ISSRE, 2021)
- Software Defect Prediction Based on Gated Hierarchical LSTMs (IEEE Transactions on Reliability, 2021)
- Joint feature representation learning and progressive distribution matching for cross-project defect prediction (Information and Software Technology, 2021)
- Software defect prediction based on stacked sparse denoising autoencoders and enhanced extreme learning machine (IET Software, 2021)
- Just-in-time software defect prediction using deep temporal convolutional networks (Neural Computing and Applications, 2021)
- Software visualization and deep transfer learning for effective software defect prediction (ICSE, 2020)
- Within-project and cross-project just-in-time defect prediction based on denoising autoencoder and convolutional neural network (IET Software, 2020)
- Deep Semantic Feature Learning for Software Defect Prediction (TSE, 2020)
- Software defect prediction via LSTM (IET Software, 2020)
- PathPair2Vec: An AST path pair-based code representation method for defect prediction (Journal of Computer Languages, 2020)
- SLDeep: Statement-level software defect prediction using deep-learning model on static code features (Expert Systems with Applications, 2020)
- How Well Do Change Sequences Predict Defects? Sequence Learning from Software Changes (TSE, 2020)
- Defect Prediction With Semantics and Context Features of Codes Based on Graph Representation Learning (IEEE Transactions on Reliability, 2020)
- Cross-project defect prediction via transferable deep learning-generated and handcrafted features (SEKE, 2019)
- DeepJIT: An End-to-End Deep Learning Framework for Just-in-Time Defect Prediction (MSR, 2019)
- Improving defect prediction with deep forest (Information and Software Technology, 2019)
- LDFR: Learning deep feature representation for software defect prediction (Journal of Systems and Software, 2019)
- Iterated feature selection algorithms with layered recurrent neural network for software fault prediction (Expert Systems with Applications, 2019)
- Lessons Learned from Using a Deep Tree-Based Model for Software Defect Prediction in Practice (MSR, 2019)
- Cross-project Defect Prediction via ASTToken2Vec and BLSTM-based Neural Network (IJCNN, 2019)
- Learning Semantic Features for Software Defect Prediction by Code Comments Embedding (ICDM, 2018)
- Connecting software metrics across versions to predict defects (SANER, 2018)
- Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning (Information and Software Technology, 2018)
- Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction (ICTAI, 2017)
- Software Defect Prediction via Convolutional Neural Network (QRS, 2017)
- Deep Learning for Just-in-Time Defect Prediction (QRS, 2015)
Bug Finding
- Software Testing With Large Language Models: Survey, Landscape, and Vision (TSE, 2024)
- Large Language Model guided Protocol Fuzzing (NDSS Symposium, 2024)
- UPBEAT: Test Input Checks of Q# Quantum Libraries (ISSTA, 2024)
- A Generative and Mutational Approach for Synthesizing Bug-Exposing Test Cases to Guide Compiler Fuzzing (ESEC/FSE, 2023)
- Large Language Models for Software Engineering: A Systematic Literature Review (TOSEM, 2023)
- Learning to Boost Disjunctive Static Bug-Finders (ICSE, 2023)
- The Hitchhiker’s Guide to Program Analysis: A Journey with Large Language Models (arXiv, 2023)
- A survey on neural-symbolic learning systems (Neural Networks, 2023)
- Revisiting Neural Program Smoothing for Fuzzing (ESEC/FSE, 2023)
- CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models (ICSE, 2023)
- Efficient Mutation Testing via Pre-Trained Language Models (arXiv, 2023)
- ChatUniTest: A Framework for LLM-Based Test Generation (arXiv, 2023)
- A3Test: Assertion-Augmented Automated Test case generation (Information and Software Technology, 2023)
- Neural-Based Test Oracle Generation: A Large-Scale Evaluation and Lessons Learned (ESEC/FSE, 2023)
- Towards More Realistic Evaluation for Neural Test Oracle Generation (ISSTA, 2023)
- No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation (arXiv, 2023)
- Learning Seed-Adaptive Mutation Strategies for Greybox Fuzzing (ICSE, 2023)
- Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models (ISSTA, 2023)
- Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT (arXiv, 2023)
- WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (arXiv, 2023)
- Fuzz4All: Universal Fuzzing with Large Language Models (arXiv, 2023)
- Learning Deep Semantics for Test Completion (ICSE, 2023)
- RegFuzz: A Linear Regression-Based Approach for Seed Scheduling in Directed Fuzzing (Electronics, 2023)
- Evaluating and Improving Hybrid Fuzzing (ICSE, 2023)
- Detecting JVM JIT Compiler Bugs via Exploring Two-Dimensional Input Spaces (ICSE, 2023)
- Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing (ICSE, 2023)
- Efficiency Matters: Speeding Up Automated Testing with GUI Rendering Inference (ICSE, 2023)
- Badge: Prioritizing UI Events with Hierarchical Multi-Armed Bandits for Automated UI Testing (ICSE, 2023)
- APICAD: Augmenting API Misuse Detection through Specifications from Code and Documents (ICSE, 2023)
- Enhancing REST API Testing with NLP Techniques (ISSTA, 2023)
- Adaptive REST API Testing with Reinforcement Learning (ASE, 2023)
- On the Structure of the Boolean Satisfiability Problem: A Survey (ACM Computing Surveys, 2023)
- Machine Learning Methods in Solving the Boolean Satisfiability Problem (Machine Intelligence Research, 2023)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models (NeurIPS, 2023)
- Ranking LLM-Generated Loop Invariants for Program Verification (arXiv, 2023)
- Do bugs lead to unnaturalness of source code? (ESEC/FSE, 2022)
- Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing (arXiv, 2022)
- Evaluating and improving neural program-smoothing-based fuzzing (ICSE, 2022)
- Generating accurate assert statements for unit test cases using pretrained transformers (AST, 2022)
- Call Me Maybe: Using NLP to Automatically Generate Unit Test Cases Respecting Temporal Constraints (ASE, 2022)
- TOGA: a neural method for test oracle generation (ICSE, 2022)
- Neuroevolution-Based Generation of Tests and Oracles for Games (ASE, 2022)
- Fuzzing: A Survey for Roadmap (ACM Computing Surveys, 2022)
- Effectively Generating Vulnerable Transaction Sequences in Smart Contracts with Reinforcement Learning-guided Fuzzing (ASE, 2022)
- Avgust: automating usage-based test generation from videos of app executions (ESEC/FSE, 2022)
- SymTuner: maximizing the power of symbolic execution by adaptively tuning external parameters (ICSE, 2022)
- HyperTree Proof Search for Neural Theorem Proving (NeurIPS, 2022)
- Autoformalization with Large Language Models (NeurIPS, 2022)
- Diversity-driven automated formal verification (ICSE, 2022)
- Self-Supervised Bug Detection and Repair (NeurIPS, 2021)
- A Survey on Machine Learning Techniques for Source Code Analysis (arXiv, 2021)
- Learning type annotation: is big data enough? (ESEC/FSE, 2021)
- Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection (arXiv, 2021)
- The Art, Science, and Engineering of Fuzzing: A Survey (TSE, 2021)
- Reinforcement Learning-based Hierarchical Seed Scheduling for Greybox Fuzzing (UC Riverside, 2021)
- Automated conformance testing for JavaScript engines via deep compiler fuzzing (PLDI, 2021)
- Graph-Based Fuzz Testing for Deep Learning Inference Engines (ICSE, 2021)
- Automatic Web Testing Using Curiosity-Driven Reinforcement Learning (ICSE, 2021)
- FIGCPS: Effective Failure-inducing Input Generation for Cyber-Physical Systems with Deep Reinforcement Learning (ASE, 2021)
- Deep GUI: Black-box GUI Input Generation with Deep Learning (ASE, 2021)
- Learning to Explore Paths for Symbolic Execution (CCS, 2021)
- Synthesize solving strategy for symbolic execution (ISSTA, 2021)
- Boosting symbolic execution via constraint solving time prediction (experience paper) (ISSTA, 2021)
- Have You been Properly Notified? Automatic Compliance Analysis of Privacy Policy Text with GDPR Article 13 (WWW, 2021)
- Safe systems programming in Rust (Communications of the ACM, 2021)
- Learning graph-based heuristics for pointer analysis without handcrafting application-specific features (OOPSLA, 2020)
- Learning fast and precise numerical analysis (PLDI, 2020)
- MTFuzz: fuzzing with a multi-task neural network (ESEC/FSE, 2020)
- Unit Test Case Generation with Transformers and Focal Context (arXiv, 2020)
- On learning meaningful assert statements for unit test cases (ICSE, 2020)
- Reinforcement learning based curiosity-driven testing of Android applications (ISSTA, 2020)
- Efficient multiplex symbolic execution with adaptive search strategy (ASE, 2020)
- Making symbolic execution promising by learning aggressive state-pruning strategy (ESEC/FSE, 2020)
- API-misuse detection driven by fine-grained API-constraint knowledge graph (ASE, 2020)
- RTFM! Automatic Assumption Discovery and Verification Derivation from Library Document for API Misuse Detection (CCS, 2020)
- Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer (USENIX, 2020)
- Zeror: speed up fuzzing with coverage-sensitive tracing and scheduling (ASE, 2020)
- FuzzGuard: Filtering out Unreachable Inputs in Directed Grey-box Fuzzing through Deep Learning (USENIX, 2020)
- Resource-Aware Program Analysis Via Online Abstraction Coarsening (ICSE, 2019)
- NL2Type: Inferring JavaScript Function Types from Natural Language Information (ICSE, 2019)
- NEUZZ: Efficient Fuzzing with Neural Program Smoothing (SP, 2019)
- Machine Learning Applied to Software Testing: A Systematic Mapping Study (IEEE Transactions on Reliability, 2019)
- Neufuzz: Efficient fuzzing with deep neural network (IEEE Access, 2019)
- Learning-Guided Network Fuzzing for Testing Cyber-Physical System Defences (ASE, 2019)
- Learning to Fuzz from Symbolic Execution with Application to Smart Contracts (CCS, 2019)
- A Survey of Symbolic Execution Techniques (ACM Computing Surveys, 2019)
- Concolic testing with adaptively changing search heuristics (ESEC/FSE, 2019)
- A Large-Scale Empirical Study on Code-Comment Inconsistencies (ICPC, 2019)
- Ares: Inferring Error Specifications through Static Analysis (ASE, 2019)
- DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing (AAAI, 2019)
- Full-Speed Fuzzing: Reducing Fuzzing Overhead through Coverage-Guided Tracing (SP, 2019)
- Fuzzing: a survey (Cybersecurity, 2018)
- Compiler fuzzing through deep learning (ISSTA, 2018)
- Automatically generating search heuristics for concolic testing (ICSE, 2018)
- Learning to Accelerate Symbolic Execution via Code Transformation (ECOOP, 2018)
- GSP: an automatic programming technique with gravitational search algorithm (Applied Intelligence, 2018)
- Fuzzing for software security testing and quality assurance (2018)
- Angora: Efficient Fuzzing by Principled Search (SP, 2018)
- Automatically generating features for learning program analysis heuristics for C-like languages (OOPSLA, 2017)
- Machine-Learning-Guided Selectively Unsound Static Analysis (ICSE, 2017)
- Data-driven context-sensitivity for points-to analysis (OOPSLA, 2017)
- SemFuzz: Semantics-based Automatic Generation of Proof-of-Concept Exploits (CCS, 2017)
- Learn&Fuzz: Machine learning for input fuzzing (ASE, 2017)
- On the naturalness of software (Communications of the ACM, 2016)
- Automated Analysis of Privacy Requirements for Mobile Apps (AAAI, 2016)
- ICON: Inferring Temporal Constraints from Natural Language API Descriptions (ICSME, 2016)
- APISan: Sanitizing API Usages through Semantic Cross-Checking (USENIX, 2016)
- APEx: automated inference of error specifications for C APIs (ASE, 2016)
- In defense of soundiness: A manifesto (Communications of the ACM, 2015)
- Learning to Execute (arXiv, 2014)
- Enhancing symbolic execution with veritesting (ICSE, 2014)
- Regression testing minimization, selection and prioritization: a survey (ISSTA, 2013)
- Distributed Representations of Words and Phrases and their Compositionality (NIPS, 2013)
- AddressSanitizer: A Fast Address Sanity Checker (USENIX, 2012)
- Software Abstractions: logic, language, and analysis (2012)
- @tComment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies (Software Testing, Verification and Validation, 2012)
- Expect the unexpected: error code mismatches between documentation and the real world (PASTE, 2010)
- Formal verification of a realistic compiler (Communications of the ACM, 2009)
- seL4: formal verification of an OS kernel (SOSP, 2009)
- ThreadSanitizer: data race detection in practice (WBIA, 2009)
- /*icomment: bugs or bad comments?*/ (SOSP, 2007)
- CP-Miner: finding copy-paste and related bugs in large-scale software code (TSE, 2006)
- PEP 8–style guide for python code (2001)
- Bugs as deviant behavior: a general approach to inferring errors in systems code (ACM SIGOPS Operating Systems Review, 2001)
- Java coding style guide (2000)
- A study of effective regression testing in practice (ISSRE, 1997)
- Structure and Interpretation of Computer Programs (1996)
- On the approximate realization of continuous mappings by neural networks (Neural Networks, 1989)
- Natural deduction as higher-order resolution (The Journal of Logic Programming, 1986)
- Lint, a C program checker (1977)
- Classes of Recursively Enumerable Sets and Their Decision Problems (Transactions of the American Mathematical Society, 1953)
Fault Localization
- Gnet4fl: effective fault localization via graph convolutional neural network (ASE, 2023)
- Context-aware neural fault localization (TSE, 2023)
- Automatic bug localization using a combination of deep learning and model transformation through node classification (SQJ, 2023)
- Gmbfl: Optimizing mutation-based fault localization via graph representation (ICSME, 2023)
- A light-weight data augmentation method for fault localization (IST, 2023)
- Mitigating the effect of class imbalance in fault localization using context-aware generative adversarial network (ICPC, 2023)
- Influential global and local contexts guided trace representation for fault localization (TOSEM, 2023)
- Fault localization to detect co-change fixing locations (ESEC, 2022)
- Context-based cluster fault localization (ICPC, 2022)
- Graph neural network based two-phase fault localization approach (APSI, 2022)
- Fast changeset-based bug localization with bert (ICSE, 2022)
- Bcl-fl: A data augmentation approach with between-class learning for fault localization (SANER, 2022)
- Learning to construct better mutation faults (ICASE, 2022)
- Improving fault localization using model-domain synthesized failing test generation (ICSME, 2022)
- A universal data augmentation approach for fault localization (ICSE, 2022)
- Boosting coverage-based fault localization via graph-based representation learning (ESEC, 2021)
- Agfl: a graph convolutional neural network-based method for fault localization (QRS, 2021)
- A study of effectiveness of deep learning in locating real faults (IST, 2021)
- Improving deep-learning-based fault localization with resampling (JSEP, 2021)
- Hierarchically localizing software faults using dnn (TR, 2020)
- Bugpecker: Locating faulty methods with deep learning on revision graphs (ASE, 2020)
- Learning a graph-based classifier for fault localization (SCIS, 2020)
- Deepfl: Integrating multiple fault diagnosis dimensions for deep fault localization (ISSTA, 2019)
- Cnn-fl: An effective approach for localizing faults using convolutional neural networks (SANER, 2019)
- Fault localization with code coverage representation learning (ICSE, 2019)
- Bears: An extensible java bug benchmark for automatic program repair studies (SANER, 2019)
- Deep learning-based fault localization with contextual information (TIS, 2017)
- Fault localization analysis based on deep neural network (MPE, 2016)
- Defects4j: A database of existing faults to enable controlled testing studies for java programs (ISSTA, 2014)
- The manybugs and introclass benchmarks for automated repair of c programs (TSE, 2015)
- Effective software fault localization using an rbf neural network (TR, 2012)
- Bp neural network-based effective fault localization (International Journal of Software Engineering and Knowledge Engineering, 2009)
- Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact (ESE, 2005)
Program Repair
- ThinkRepair: Self-Directed Automated Program Repair (ISSTA, 2024)
- MarsCode Agent: AI-native Automated Bug Fixing (arXiv, 2024)
- Synshine: Improved fixing of syntax errors (TSE, 2023)
- Learning approximate execution semantics from traces for binary function similarity (TSE, 2023)
- Program repair with repeated learning (TSE, 2023)
- Neural transfer learning for repairing security vulnerabilities in C code (TSE, 2023)
- Seqtrans: Automatic vulnerability fix via sequence to sequence learning (TSE, 2023)
- Seq2parse: neurosymbolic parse error repair (PACMPL, 2022)
- Transrepair: Context-aware program repair for compilation errors (ASE, 2022)
- DEAR: A novel deep learning-based approach for automated program repair (ICSE, 2022)
- M3V: multi-modal multi-view context embedding for repair operator prediction (CGO, 2022)
- Improving fault localization and program repair with deep semantic features and transferred knowledge (ICSE, 2022)
- Impact of defect instances for successful deep learning-based automatic program repair (ICSME, 2022)
- Deepdiagnosis: Automatically diagnosing faults and recommending actionable fixes in deep learning programs (ICSE, 2022)
- Bug-transformer: Automated program repair using attention-based deep neural network (CSC, 2022)
- Crex: Predicting patch correctness in automated repair of C programs through transfer learning of execution semantics (IST, 2022)
- CODIT: code editing with tree-based neural models (TSE, 2022)
- Neural program repair with execution-based backpropagation (ICSE, 2022)
- Automated classification of overfitting patches with statically extracted code features (TSE, 2022)
- Selfapr: Self-supervised program repair with test execution diagnostics (ASE, 2022)
- Less training, more repairing please: revisiting automated program repair via zero-shot learning (ESEC, 2022)
- An empirical study of deep transfer learning-based program repair for kotlin projects (ESEC, 2022)
- Predicting patch correctness based on the similarity of failing test cases (TOSEM, 2022)
- CIRCLE: continual repair across programming languages (ISSTA, 2022)
- Gui-guided test script repair for mobile apps (TSE, 2022)
- Automated patching for unreproducible builds (ICSE, 2022)
- Styler: learning formatting conventions to repair checkstyle violations (ESE, 2022)
- SPVF: security property assisted vulnerability fixing via attention-based models (ESE, 2022)
- Repairing security vulnerabilities using pre-trained programming language models (DSN-W, 2022)
- Bugbuilder: An automated approach to building bug repository (TSE, 2022)
- Vul4j: A dataset of reproducible java vulnerabilities geared towards the study of program repair techniques (MSR, 2022)
- Self-supervised bug detection and repair (NIPS, 2021)
- Samplefix: Learning to generate functionally diverse fixes (CCIS, 2021)
- Break-it-fix-it: Unsupervised learning for program repair (ICML, 2021)
- Learning lenient parsing & typing via indirect supervision (ESE, 2021)
- A robustly optimized BERT pre-training approach with post-training (CCL, 2021)
- Varfix: balancing edit expressiveness and search effectiveness in automated program repair (ESEC, 2021)
- Sequencer: Sequence-to-sequence learning for end-to-end program repair (TSE, 2021)
- CURE: code-aware neural machine translation for automatic program repair (ICSE, 2021)
- Grammar-based patches generation for automated program repair (ACL, 2021)
- Application of seq2seq models on code correction (Frontiers Artif. Intell, 2021)
- A bidirectional LSTM language model for code evaluation and repair (Symmetry, 2021)
- Tfix: Learning to fix coding errors with a text-to-text transformer (ICML, 2021)
- Grasp: Graph-to-sequence learning for automated program repair (QRS, 2021)
- Detecting and fixing nonidiomatic snippets in python source code with deep learning (ISA, 2021)
- Extracting concise bug-fixing patches from human-written patches in version control systems (ICSE, 2021)
- Crossvul: A cross-language vulnerability dataset with commit data (ESEC, 2021)
- C-3PR: A bot for fixing static analysis violations via pull requests (SANER, 2020)
- Dlfix: context-based code transformation learning for automated program repair (ICSE, 2020)
- GGF: A graph-based method for programming language syntax error correction (ICPC, 2020)
- Graph-based, self-supervised program repair from diagnostic feedback (ICML, 2020)
- Patching as translation: the data and the metaphor (ASE, 2020)
- Applying deep learning algorithm to automatic bug localization and repair (SAC, 2020)
- Coconut: combining context-aware neural translation models using ensemble for program repair (ISSTA, 2020)
- Evaluating representation learning of code changes for predicting patch correctness in program repair (ASE, 2020)
- Hoppity: Learning graph transformations to detect and fix bugs in programs (ICLR, 2020)
- On learning meaningful code changes via neural machine translation (ICSE, 2019)
- Deepdelta: learning to repair compilation errors (ESEC, 2019)
- Deep reinforcement learning for syntactic error repair in student programs (AAAI, 2019)
- Progress on software crash research (SCIENTIA SINICA Informationis, 2019)
- Harnessing evolution for multi-hunk program repair (ICSE, 2019)
- Tbar: revisiting template-based automated program repair (ISSTA, 2019)
- Sorting and transforming program repair ingredients via deep learning code similarities (SANER, 2019)
- An empirical investigation into learning bug-fixing patches in the wild via neural machine translation (ASE, 2019)
- An automatic semantic code repair service based on deep learning for programs with single error (SERVICES, 2019)
- History-driven build failure fixing: how far are we? (ISSTA, 2019)
- Neuro-symbolic program corrector for introductory programming assignments (ICSE, 2018)
- Compilation error repair: for the student programs, from the student programs (SEET, 2018)
- Syntax and sensibility: Using language models to detect and correct syntax errors (SANER, 2018)
- Bugs.jar: a large-scale, diverse dataset of real-world java bugs (MSR, 2018)
- Visual web test repair (ESEC, 2018)
- Hirebuild: an automatic approach to history-driven repair of build scripts (ICSE, 2018)
- Learning to repair software vulnerabilities with generative adversarial networks (NIPS, 2018)
- Blackbox, five years on: An evaluation of a large-scale programming data collection project (ICER, 2018)
- Do automated program repair techniques repair hard and important bugs? (ICSE, 2018)
- Deepfix: Fixing common C language errors by deep learning (AAAI, 2017)
- Automatically diagnosing and repairing error handling bugs in C (ESEC, 2017)
- Nopol: Automatic repair of conditional statement bugs in java programs (TSE, 2017)
- Seqgan: Sequence generative adversarial nets with policy gradient (AAAI, 2017)
- Automatic repair of real bugs in java: a large-scale experiment on the defects4j dataset (ESE, 2017)
- Vurle: Automatic vulnerability detection and repair by learning from examples (Computer Security - ESORICS, 2017)
- Angelix: scalable multiline program patch synthesis via symbolic analysis (ICSE, 2016)
- Prutor: A system for tutoring cs1 and collecting student programs for analysis (arXiv, 2016)
- Automatic patch generation by learning correct code (ACM SIGPLAN Notices, 2016)
- relifix: Automated repair of software regressions (ICSE, 2015)
- Blackbox: a large scale repository of novice programmers’ activity (SIGCSE, 2014)
- Defects4j: a database of existing faults to enable controlled testing studies for java programs (ISSTA, 2014)
- Semfix: program repair via semantic analysis (ICSE, 2013)
- Genprog: A generic method for automatic software repair (TSE, 2012)
- A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each (ICSE, 2012)
- Automatic error correction of java programs (FMICS, 2010)
- A Practical Method for LR and LL Syntactic Error Diagnosis and Recovery (TOPLAS, 1987)
- Locally least-cost error recovery in Earley’s algorithm (TOPLAS, 1981)
- Practical syntactic error recovery (Communications of the ACM, 1975)
- A minimum distance error-correcting parser for context-free languages (SIAM, 1972)
Bug Report Management
- Duplicate bug report detection: How far are we? (TOSEM, 2023)
- Leveraging multi-level embeddings for knowledge-aware bug report reformulation (JSS, 2023)
- Does deep learning improve the performance of duplicate bug report detection? (JSS, 2023)
- Deep learning and gradient-based extraction of bug report features related to bug fixing time (Front.Comput.Sci., 2023)
- Gen-FL: Quality prediction-based filter for automated issue title generation (JSS, 2023)
- bjXnet: an improved bug localization model based on code property graph and attention mechanism (Automated Software Engineering, 2023)
- Fast changeset-based bug localization with bert (ICSE, 2022)
- Automatic bug triaging via deep reinforcement learning (APPL.SCI, 2022)
- Modeling function-level interactions for file-level bug localization (ESE, 2022)
- Duplicate bug report detection by using sentence embedding and fine-tuning (ICSME, 2021)
- How to cherry pick the bug report for better summarization? (ESE, 2021)
- Automatically recommending components for issue reports using deep learning (ESE, 2021)
- Automatically matching bug reports with related app reviews (ICSE, 2021)
- Automating intention mining (TSE, 2020)
- Bugpecker: Locating faulty methods with deep learning on revision graphs (ASE, 2020)
- Duplicate bug report detection using dual-channel convolutional neural networks (ICPC, 2020)
- Hindbr: Heterogeneous information network based duplicate bug report prediction (ISSRE, 2020)
- Bugsum: Deep context understanding for bug report summarization (ICPC, 2020)
- Stay professional and efficient: automatically generate titles for your bug reports (ASE, 2020)
- Multi-dimension convolutional neural network for bug localization (TSC, 2020)
- A similarity integration method based information retrieval and word embedding in bug localization (QRS, 2020)
- Cooba: Cross-project bug localization via adversarial transfer learning (IJCAI, 2020)
- An empirical assessment of machine learning approaches for triaging reports of a Java static analysis tool (ICST, 2019)
- Deeptriage: Exploring the effectiveness of deep learning for bug triaging (CODS-COMAD, 2019)
- Bug report severity level prediction in open source software: A survey and research opportunities (IST, 2019)
- Improving bug localization with word embedding and enhanced convolutional neural networks (IST, 2019)
- Deep transfer bug localization (TSE, 2019)
- DeepLink: Recovering issue-commit links based on deep learning (JSS, 2019)
- Deeplink: A code knowledge graph based deep learning approach for issue-commit link recovery (SANER, 2019)
- Exploring word embedding techniques to improve sentiment analysis of software engineering texts (MSR, 2019)
- How practitioners perceive automated bug report management techniques (TSE, 2018)
- Unsupervised deep bug report summarization (ICPC, 2018)
- DWEN: deep word embedding network for duplicate bug report detection in software repositories (ICSE-Companion, 2018)
- Detecting duplicate bug reports with convolutional neural networks (APSEC, 2018)
- Bug localization by learning to rank and represent bug inducing changes (CIKM, 2018)
- An effective approach for routing the bug reports to the right fixers (APSI, 2018)
- Automatic approval prediction for software enhancement requests (Automated Software Engineering, 2018)
- Bridging semantic gaps between natural languages and APIs with word embedding (TSE, 2018)
- Superneurons: Dynamic GPU memory management for training deep neural networks (ACM SIGPLAN Not, 2018)
- Machine learning-based prototyping of graphical user interfaces for mobile apps (TSE, 2018)
- Improving automatic source code summarization via deep reinforcement learning (ASE, 2018)
- Towards accurate duplicate bug retrieval using deep learning techniques (ICSME, 2017)
- Parallel implementation of a bug report assignment recommender using deep learning (ICANN, 2017)
- Applying deep learning based automatic bug triager to industrial projects (ESEC, 2017)
- Learning to predict severity of software vulnerability using only vulnerability description (ICSME, 2017)
- Improving bug localization with an enhanced convolutional neural network (APSEC, 2017)
- Bug localization with combination of deep learning and information retrieval (ICPC, 2017)
- Easy over hard: A case study on deep learning (ESEC, 2017)
- Exploring API embedding for API usages and applications (ICSE, 2017)
- VDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design (MICRO, 2016)
- Combining deep learning with information retrieval to localize buggy files for bug reports (n) (ASE, 2015)
Developer Collaboration
- Using knowledge units of programming languages to recommend reviewers for pull requests: An empirical study (ESE, 2023)
- Competencies for code review (Proceedings of the ACM on Human-Computer Interaction, 2023)
- Code recommendation for open source software developers (WWW, 2023)
- Dual analysis for helping developers to find collaborators based on co-changed files: An empirical study (CAPSE, 2023)
- A collaboration-aware approach to profiling developer expertise with cross-community data (QRS, 2022)
- Dev2vec: Representing domain expertise of developers in an embedding space (IST, 2022)
- Context- and fairness-aware in-process crowdworker recommendation (TOSEM, 2022)
- Using large-scale heterogeneous graph representation learning for code review recommendations at microsoft (ICSE-SEIP, 2022)
- Modeling review history for reviewer recommendation: A hypergraph approach (ICSE, 2022)
- Recommending good first issues in github oss projects (ICSE, 2022)
- Supporting the task-driven skill identification in open source project issue tracking systems (SEN, 2022)
- Coopfinder: Finding collaborators based on co-changed files (VL/HCC, 2022)
- Mining the technical roles of github users (IST, 2021)
- Context-aware personalized crowdtesting task recommendation (TSE, 2021)
- Recommending participants for collaborative merge sessions (TSE, 2021)
- Representation of developer expertise in open source software (ICSE, 2020)
- Discovering software developer’s coding expertise through deep learning (IET SOFTWARE, 2020)
- Multi-objective code reviewer recommendations: balancing expertise, availability and collaborations (Automated Software Engineering, 2020)
- Does reviewer recommendation help developers? (TSE, 2020)
- Best answerers prediction with topic based gat in q&a sites (INTERNETWARE, 2020)
- World of code: An infrastructure for mining the universe of open source vcs data (MSR, 2019)
- Developer recommendation for topcoder through a meta-learning based policy model (ESE, 2019)
- Cross-domain developer recommendation algorithm based on feature matching (CSC, 2019)
- Towards a theory of software development expertise (ESEC, 2018)
- Towards quantifying the development value of code contributions (ESEC, 2018)
- Profiling developer expertise across software communities with heterogeneous information network analysis (INTERNETWARE, 2018)
- Personalized teammate recommendation for crowdsourced software developers (ASE, 2018)
- I know what you coded last summer: Mining candidate expertise from github repositories (CSCW, 2017)
- Github and stack overflow: Analyzing developer interests across multiple social collaborative platforms (Social Informatics, 2017)
- Recommending crowdsourced software developers in consideration of skill improvement (ASE, 2017)
- Who should comment on this pull request? analyzing attributes for more accurate commenter recommendation in pull-based development (IST, 2017)
- Cpdscorer: Modeling and evaluating developer programming ability across software communities (SEKE, 2016)
- Leveraging expertise and authority for pull-request reviewer recommendation in github (CSI-SE, 2016)
- Automatically recommending peer reviewers in modern code review (TSE, 2016)
- Automatically recommending code reviewers based on their expertise: An empirical comparison (ASE, 2016)
- From developer networks to verified communities: a fine-grained approach (ICSE, 2015)
- Tbil: A tagging-based approach to identity linkage across software communities (APSEC, 2015)
- Distributed representations of sentences and documents (ICML, 2014)
- Degree-of-knowledge: Modeling a developer’s knowledge of code (TOSEM, 2014)
- Hydra: Large-scale social identity linkage via heterogeneous behavior modeling (SIGMOD, 2014)
- Mining software repositories for accurate authorship (ICSM, 2013)
- Discovery of technical expertise from open source code repositories (WWW, 2013)
- Who’s who in gnome: Using lsa to merge software repository identities (ICSM, 2012)
- Who is going to mentor newcomers in open source projects? (FSE, 2012)
- Recommending people in developers’ collaboration network (WCRE, 2011)
- Developer fluency: Achieving true mastery in software projects (FSE, 2010)
- Expert recommendation with usage expertise (ICSM, 2009)
- Who should fix this bug? (ICSE, 2006)
- Expertise browser: A quantitative approach to identifying expertise (ICSE, 2002)
Technical Debt
Contributors
Template by Cheng Wen.