Research Statement
My research interests lie in tools and techniques for programming languages, driven by what can be discovered about a software project by studying its change history.
Dissertation Summary
Source code revision repositories hold a wealth of information that is useful not only for managing and building source code, but also as a detailed log of exactly how the source code has evolved during development. If a piece of the source code is refactored, evidence of this will be in the repository, along with code showing how the software is used before and after the refactoring. As bugs are fixed in the code, the fixes are added to the repository alongside the buggy code. As the code evolves and new rules develop for how internal functions should be used, those rules are implicitly documented in the source code, whether or not they are ever formally documented. My hypothesis was that by examining the changes made to the source code over time, interesting properties that describe the source code could be discovered.
Future Work
I would like to take the ideas used in the function relation miner and apply them to an object-oriented language like Java. An object-oriented language offers more knowledge of the context of a method call, since the analysis can examine the class and instance on which the method is called and the class and instance from which it is called.
Another interesting aspect of study is to determine how having access to this mined data affects how developers write code. I plan to investigate how mining source code change histories could benefit students as they implement a class project. This can be done in a number of ways. One is to provide the mined data to the students in real time, for example as a list of areas in the code that have frequently been changed, a list of objects or classes that are especially volatile, or a list of code snippets that the student has often added to the code.
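As a rough illustration of the first kind of list, the sketch below counts how often each file appears in a change history and reports the most frequently changed ones. It is a minimal sketch and not one of my existing tools: the function name most_changed, the representation of a commit as a list of changed paths, and the sample file names are all assumptions made for the example.

```python
from collections import Counter


def most_changed(commits, top_n=3):
    """Return the top_n most frequently changed paths.

    `commits` is an iterable in which each element is the collection of
    file paths touched by one commit (a hypothetical input format).
    """
    counts = Counter()
    for changed_paths in commits:
        # Count a path at most once per commit, even if it is listed twice.
        counts.update(set(changed_paths))
    return counts.most_common(top_n)


# Hypothetical change history: one inner list per commit.
history = [
    ["parser.c", "lexer.c"],
    ["parser.c"],
    ["parser.c", "util.c"],
    ["lexer.c"],
]
hotspots = most_changed(history, top_n=2)
print(hotspots)
```

The same counting could in principle be done per class or per method rather than per file, given a change history recorded at that granularity.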
Making data from my tools available to developers in near real time would require infrastructure to trigger my analyses each time a source code change is made. For inexpensive analyses, it may be useful to present their results to the developer before the change is accepted into the repository. This way, if a change is found to break a previously identified pattern, the developer may be given a chance to fix it before finalizing the commit. Additionally, the developer could inform the system that the violated pattern is not a correct pattern. Based on this feedback, the tool may reduce the ranking of such warnings or mark the pattern as incorrect.
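One way the feedback loop described above might work is sketched below. This is a hedged illustration under stated assumptions, not a description of an existing implementation: the class names Pattern and WarningRanker, the halving penalty, the rejection threshold, and the pattern names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    confidence: float      # ranking score; higher is reported first
    rejected: bool = False  # True once the pattern is marked incorrect


class WarningRanker:
    """Ranks warnings for violated patterns and learns from developer feedback."""

    def __init__(self, penalty=0.5, floor=0.1):
        self.penalty = penalty  # assumed multiplier applied per false positive
        self.floor = floor      # assumed score below which a pattern is dropped
        self.patterns = {}

    def add(self, name, confidence):
        self.patterns[name] = Pattern(name, confidence)

    def report_false_positive(self, name):
        # The developer says the violated pattern is not a correct pattern.
        pattern = self.patterns[name]
        pattern.confidence *= self.penalty
        if pattern.confidence < self.floor:
            pattern.rejected = True

    def warnings(self, violated):
        # Keep only known, non-rejected patterns, highest score first.
        active = [
            self.patterns[name]
            for name in violated
            if name in self.patterns and not self.patterns[name].rejected
        ]
        return sorted(active, key=lambda p: p.confidence, reverse=True)


ranker = WarningRanker()
ranker.add("lock_then_unlock", 0.9)  # hypothetical mined patterns
ranker.add("open_then_close", 0.6)

violated = ["lock_then_unlock", "open_then_close"]
ranker.report_false_positive("lock_then_unlock")
ranked_names = [p.name for p in ranker.warnings(violated)]
print(ranked_names)
```

A multiplicative penalty is only one possible choice; it lets a single mistaken report lower a warning's rank without discarding the pattern outright, while repeated reports eventually mark it as incorrect.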
Instructors can get a good feel for which parts of a project students are having trouble with from the questions students ask and the parts of their projects that fail. However, they may not know about a problem that a student struggled with but solved after six hours of programming one evening. I would like to explore providing real-time feedback to the instructor indicating which methods, objects, or APIs are being changed most frequently by the students. The interesting questions are whether this matches the instructor's expectations and whether it reflects the relative amount of time spent on a particular aspect in class.