Research Statement            
         
      My research interests lie in tools and techniques for programming languages, driven by what can be discovered about a software project by studying its change history.
      Dissertation Summary
      Source code revision repositories hold a wealth of information. They are useful not only for managing and building source code, but also as a detailed log of exactly how the source code has evolved during development. If a piece of the source code is refactored, evidence of this will be in the repository, along with the code showing how the software was used before and after the refactoring. As bugs are fixed, the fixes are added to the repository alongside the buggy code. As the code evolves and new rules develop for how internal functions should be used, those rules are implicitly documented in the source code, whether or not they are ever formally documented. My hypothesis was that by examining the changes made to the source code over time, interesting properties that describe the source code could be discovered.
        
      Future Work
        
      I would like to take the ideas used in the function relation miner and apply them to an object-oriented language such as Java. An object-oriented language provides more context for each method call: the class and instance on which the method is called, and the class and instance from which it is called.
      Another interesting area of study is how access to this mined data affects the way developers write code. I plan to investigate how mining source code change histories could benefit students as they implement a class project. This can be done in a number of ways. For example, the mined data could be provided to the students in real time as a list of areas in the code that have changed frequently, objects or classes that are especially volatile, or code snippets that the student has often added to the code.
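As a rough illustration of the "frequently changed areas" report described above, the following minimal sketch counts how many commits touched each file. It assumes the change history has already been extracted into a list of commits, each given as a list of changed paths; the function name `most_changed` and the sample file names are hypothetical and are not taken from my tools.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_changed(commits: Iterable[List[str]], top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the top_n paths ranked by the number of commits that touched them."""
    counts = Counter()
    for changed_paths in commits:
        # Count each path at most once per commit, even if it is listed twice.
        counts.update(set(changed_paths))
    return counts.most_common(top_n)


# Hypothetical history: one inner list of changed files per commit.
history = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java"],
    ["Parser.java", "Main.java"],
    ["Lexer.java"],
]

print(most_changed(history, top_n=2))
# prints [('Parser.java', 3), ('Lexer.java', 2)]
```

The same counting could be applied at the level of methods or classes once changes are mapped to those units; file-level counts are used here only because they need no parsing.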
      Making the data from my tools available to developers in near real time would require infrastructure that triggers my analyses whenever a source code change is made. For inexpensive analyses, it may be useful to present the results to the developer before the change is accepted into the repository. This way, if a change is found to break a previously identified pattern, the developer may be given a chance to fix it before finalizing the commit. Alternatively, the developer could inform the system that the violated pattern is not a correct pattern. Based on this feedback, the tool may reduce the ranking of such warnings or mark the pattern as incorrect.
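The feedback loop described above could be sketched as follows. This is only an illustration under stated assumptions: a mined pattern is modeled as a simple "calling `first` implies calling `second`" rule, and the `Pattern` class, the `check_change` and `record_feedback` functions, and the `decay` and `floor` parameters are all hypothetical names chosen for this example rather than part of an existing tool.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Pattern:
    """Hypothetical mined rule: code that calls `first` is expected to call `second`."""
    first: str
    second: str
    score: float = 1.0      # ranking weight; higher-scored warnings are shown first
    rejected: bool = False  # set once developer feedback marks the pattern incorrect


def check_change(calls: Set[str], patterns: List[Pattern]) -> List[Pattern]:
    """Return the active patterns that a change violates, highest score first."""
    violated = [
        p for p in patterns
        if not p.rejected and p.first in calls and p.second not in calls
    ]
    return sorted(violated, key=lambda p: p.score, reverse=True)


def record_feedback(pattern: Pattern, is_false_positive: bool,
                    decay: float = 0.5, floor: float = 0.1) -> None:
    """Lower a pattern's ranking when a developer reports its warning as wrong."""
    if is_false_positive:
        pattern.score *= decay
        if pattern.score < floor:
            # After repeated rejections, stop reporting the pattern entirely.
            pattern.rejected = True


patterns = [Pattern("lock", "unlock"), Pattern("open", "close", score=0.8)]

# Function calls found in the pending change (assumed to be extracted elsewhere).
warnings = check_change({"lock", "open"}, patterns)

# The developer reports the top-ranked warning as a false positive.
record_feedback(warnings[0], is_false_positive=True)
```

In this sketch a check of this kind would run before the commit is finalized, for instance from a repository hook, and the halving of the score is an arbitrary choice that a real system would need to tune.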
      Instructors can get a good sense of which parts of a project students are having trouble with from the questions students ask and the parts of their projects that fail. However, they may not know about a problem that a student struggled with but solved after six hours of programming one evening. I would like to explore providing real-time feedback to the instructor showing which methods, objects, or APIs are being changed most frequently by the students. The interesting questions are whether this matches the instructor's expectations, and whether it reflects the relative amount of time spent on a particular topic in class.
       