Grime-Injector

Modeling Modular Grime Growth




To model grime growth, we take a clean Java project and create a modified copy of it for each of the types of modular grime defined by Schanz and Izurieta [Object Oriented Design Pattern Decay: A Taxonomy]. The details of this modification process are described in the following sections. At a high level, we model each type of modular grime by injecting couplings between classes that are characteristic of that grime type. How these couplings are injected, and how they differ for each type of modular grime, is discussed in the Injection section.

A brief video walkthrough of the Grime-Injector in action can be viewed here. If you are interested in using the Grime-Injector for your own research, I welcome an email at melissa.r.dale@gmail.com, and I am happy to help you get it up and running. The tool is not actively maintained, and some changes may be needed before it will run on machines other than my own; I can help make them.

Construction

Javassist

Javassist is a class library that allows a developer to edit Java bytecode. Using Javassist, we developed a Java injector program that modifies a given class file’s bytecode. Because the modified class files must be decompiled before they can be analyzed, we first describe that process and then describe how the grime injector manipulates class files to represent grime growth.

Java source code is saved in a .java file. When that code is compiled, it is translated to bytecode for the Java virtual machine (JVM), and this bytecode is saved in a class (.class) file that the JVM executes.

To edit a specific class, Javassist searches the class path to locate the bytecode of that class. Once it finds the bytecode, the Javassist API can be used to modify the class. For example, to edit a class written in HelloWorld.java, you can call the get() method of Javassist’s ClassPool to locate HelloWorld.class. Once Javassist has a reference to the class, it is possible to modify its bytecode, including changing existing methods or adding new methods and variables.
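Javassist’s ClassPool performs the class-path lookup for you. As a rough plain-JDK analogue (not Javassist itself), the sketch below locates a class file by name through the system class loader and reads its raw bytecode; every valid class file begins with the magic number 0xCAFEBABE. ClassFileLookup and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public final class ClassFileLookup {

    /** Every valid class file begins with these four bytes. */
    static final int CLASS_FILE_MAGIC = 0xCAFEBABE;

    /** Reads the raw bytecode of a class by its binary name, e.g. "java.lang.String". */
    static byte[] readClassFile(String className) {
        // A class named a.b.C is stored as the resource a/b/C.class on the class path.
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the first four bytes as a big-endian int. */
    static int magic(byte[] bytecode) {
        return ((bytecode[0] & 0xFF) << 24)
                | ((bytecode[1] & 0xFF) << 16)
                | ((bytecode[2] & 0xFF) << 8)
                | (bytecode[3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] bytes = readClassFile("java.lang.String");
        System.out.printf("%d bytes, magic %X%n", bytes.length, magic(bytes));
    }
}
```

A bytecode library such as Javassist goes further by parsing these bytes into an editable model of the class (CtClass) and writing the modified class back out.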

The modified class file can then be decompiled back to a .java file. JAD (the original website is no longer maintained; a new web page can be found here) is a freeware Java decompiler that turns class files back into .java files, which can then be analyzed with tools such as SonarQube. JAD is discussed further in the JAD subsection.

The figure below diagrams this process. We start with a HelloWorld.java source file and compile it to bytecode (.class), which, if executed by the JVM, would print “Hello World” to the terminal. If we instead modify HelloWorld.class with the injector, we produce modified bytecode that can still be executed on the JVM but now prints “Hello Universe”. To analyze the source (.java) equivalent of the modified bytecode, we must run it through a decompiler, which produces the modHelloWorld.java file.

Java compile and decompile process


Input

To model grime growth, the user of the injector must provide the following information. These items may be specified through the injector GUI.

Pattern Class Names and Non-Pattern Class Names

The injector takes one ArrayList of strings naming the pattern classes and another naming the non-pattern classes. Once these lists are passed to the injector, Javassist uses the class names to locate the corresponding bytecode, producing an ArrayList of pattern class bytecode files and an ArrayList of non-pattern class bytecode files.
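A minimal sketch of splitting the requested names into those that can be located and those that cannot (the split the GUI reports in green and red). It assumes a set of known class names stands in for the real class-path lookup; ClassNameResolver and resolve are hypothetical names, not the injector’s API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public final class ClassNameResolver {

    /**
     * Splits requested names into located (key true) and missing (key false).
     * The available set is a stand-in for a real class-path lookup.
     */
    static Map<Boolean, List<String>> resolve(List<String> requested, Set<String> available) {
        return requested.stream()
                .collect(Collectors.partitioningBy(available::contains));
    }

    public static void main(String[] args) {
        Set<String> available = Set.of("Kirk", "Romulan"); // hypothetical class-path contents
        Map<Boolean, List<String>> nonPattern = resolve(List.of("Romulan", "Klingon"), available);
        System.out.println("found (green): " + nonPattern.get(true));   // [Romulan]
        System.out.println("missing (red): " + nonPattern.get(false));  // [Klingon]
    }
}
```

Only the located names would go on to have their bytecode loaded; missing names are simply ignored.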

Number of Grime Instances

The injector uses an array of integers to specify the number of grime instances to inject. The array has six elements, one for each type of modular grime (modular grime types are defined in the BACKGROUND section). For example, a user who wants 10 instances of each type of modular grime would pass in an array of 10s: [10,10,10,10,10,10]. Values are given in alphabetical order of the grime type acronyms (PEAG, PEEG, PIG, TEAG, TEEG, TIG), so a user who wanted to model only 10 instances of PEEG would pass in an array with 10 in the second position and 0 everywhere else: [0,10,0,0,0,0]. In the GUI, the user can either enter a number for each grime type explicitly or enter a single number to apply to every type; the GUI then passes the corresponding array to the injector.
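The sketch below builds that array, assuming the alphabetical order PEAG, PEEG, PIG, TEAG, TEEG, TIG, in which PEEG occupies the second slot (index 1). GrimeCounts and its methods are hypothetical illustrations, not the injector’s own code.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;

public final class GrimeCounts {

    /** Modular grime types, declared in alphabetical order so ordinal() matches the array index. */
    enum GrimeType { PEAG, PEEG, PIG, TEAG, TEEG, TIG }

    /** One number applied to every grime type, e.g. uniform(10) gives [10,10,10,10,10,10]. */
    static int[] uniform(int count) {
        int[] counts = new int[GrimeType.values().length];
        Arrays.fill(counts, count);
        return counts;
    }

    /** Explicit per-type numbers; types that are not mentioned default to 0. */
    static int[] fromMap(Map<GrimeType, Integer> requested) {
        int[] counts = new int[GrimeType.values().length];
        requested.forEach((type, n) -> counts[type.ordinal()] = n);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(uniform(10)));
        Map<GrimeType, Integer> onlyPeeg = new EnumMap<>(GrimeType.class);
        onlyPeeg.put(GrimeType.PEEG, 10);
        System.out.println(Arrays.toString(fromMap(onlyPeeg))); // [0, 10, 0, 0, 0, 0]
    }
}
```

Tying the index to the enum’s declaration order keeps the GUI and the injector agreeing on which slot belongs to which grime type.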

Number of Runs (Repeats)

This optional integer parameter specifies the number of times to repeat the injections. It is useful in experiments that need multiple sets of modified projects, for example to compute mean values or to test for statistically significant differences. The default value of this parameter is 1.

Number of Versions (Iterations)

The version option is intended to represent the growth of grime over successive iterations (releases) of the software. Injection begins by performing the requested number of injections and writing the injected bytecode to the appropriate directory (the directory structure is explained below). Before exiting, the injector feeds this output back into the injection process and injects over the previously injected code, thus compounding the grime. It repeats this for the specified number of versions before moving on to the next run. If no number of versions is specified, the default value is 1.
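A minimal sketch of the run/version loop, counting grime instances rather than editing bytecode: each run restarts from the clean project, while each version builds on the output of the previous one. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class RunVersionSchedule {

    /**
     * For each run, returns the total number of grime instances present after
     * each version, given a fixed number of injections per version.
     */
    static List<List<Integer>> cumulativeInstances(int runs, int versions, int perVersion) {
        List<List<Integer>> result = new ArrayList<>();
        for (int run = 1; run <= runs; run++) {
            int total = 0; // every run starts from the clean foundation
            List<Integer> perRun = new ArrayList<>();
            for (int version = 1; version <= versions; version++) {
                total += perVersion; // inject over the previous version's bytecode
                perRun.add(total);
            }
            result.add(perRun);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(cumulativeInstances(1, 1, 5)); // defaults: [[5]]
        System.out.println(cumulativeInstances(2, 3, 5)); // [[5, 10, 15], [5, 10, 15]]
    }
}
```

Runs are independent repetitions of the same experiment, whereas versions accumulate grime, which is why the counter resets only in the outer loop.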


Initialization

The injector performs a series of initialization steps. First, an integer variable is injected into every class file. This variable is injected so that, when performing temporary grime injections, the program can rely on a variable that is guaranteed to exist.

Because the grime injector cannot currently handle classes whose constructors all take parameters, it catches the exception that arises when it attempts to inject a persistent grime type into such a class and adds an empty (no-argument) constructor to the class. This works because Java allows constructors to be overloaded. For example, if a class Foo has only the constructor Foo(int bar), an exception is thrown when Javassist attempts to create an instance of Foo, because it does not have the required argument. To avoid this exception, a second constructor is added to Foo so that it can be instantiated simply by calling Foo().
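The constructor problem can be reproduced with plain Java reflection (not Javassist): a class whose only constructor is Foo(int bar) has no no-argument constructor to call, while adding an empty overload makes Foo() available. The class names here are illustrative only.

```java
import java.lang.reflect.Constructor;

public final class NoArgConstructorCheck {

    /** Original class: its only constructor needs an argument. */
    static class Foo {
        final int bar;

        Foo(int bar) {
            this.bar = bar;
        }
    }

    /** The same class after an empty constructor has been added as an overload. */
    static class FooWithEmptyConstructor {
        int bar;

        FooWithEmptyConstructor(int bar) {
            this.bar = bar;
        }

        FooWithEmptyConstructor() {
            // the added empty constructor
        }
    }

    /** True if the class can be created by calling its no-argument constructor. */
    static boolean canInstantiateWithoutArgs(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false; // NoSuchMethodException when there is no Foo()
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiateWithoutArgs(Foo.class));                     // false
        System.out.println(canInstantiateWithoutArgs(FooWithEmptyConstructor.class)); // true
    }
}
```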

Copies of the initialized pattern and non-pattern bytecode arrays are then made, one for each modular grime type being modeled (six when all types are requested). These identical copies serve as the clean foundations on which each type of modular grime is modeled. The injector has been designed so that a future option could overlay all the different types of grime on top of each other in a single program.
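A sketch of preparing the copies, assuming bytecode is held as byte arrays and that a copy is made only for grime types with a non-zero count (the behavior described in the example later in this section). Cloning each array keeps the copies independent, so injecting one grime type cannot leak into another. GrimeCopies and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public final class GrimeCopies {

    enum GrimeType { PEAG, PEEG, PIG, TEAG, TEEG, TIG }

    /** Deep-copies each class file so that editing one copy never touches another. */
    static List<byte[]> deepCopy(List<byte[]> classFiles) {
        List<byte[]> copy = new ArrayList<>();
        for (byte[] classFile : classFiles) {
            copy.add(classFile.clone());
        }
        return copy;
    }

    /** counts is indexed in GrimeType (alphabetical) order; zero means "not modeled". */
    static Map<GrimeType, List<byte[]>> prepare(List<byte[]> initialized, int[] counts) {
        Map<GrimeType, List<byte[]>> copies = new EnumMap<>(GrimeType.class);
        for (GrimeType type : GrimeType.values()) {
            if (counts[type.ordinal()] > 0) {
                copies.put(type, deepCopy(initialized));
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        List<byte[]> clean = List.of(new byte[] {1, 2, 3});
        System.out.println(prepare(clean, new int[] {0, 0, 0, 5, 0, 0}).keySet()); // [TEAG]
        System.out.println(prepare(clean, new int[] {1, 1, 1, 1, 1, 1}).size());    // 6
    }
}
```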


Injection

This process uses the grime taxonomy described in the background section: coupling strength (temporary or persistent), the scope of the grime (internal or external), and the direction of the grime (efferent or afferent). All types of grime are injected with the same method: couple(class to, class from, char strength).

The strength of the grime is handled through the char parameter of the couple method. If “t” or “T” is passed in, the coupling is temporary: a local variable whose type is the “from” class is injected into the “to” class. If “p” or “P” is passed in, the coupling is persistent: an attribute whose type is the “from” class is injected into the “to” class. The figure below depicts the strength relationship between the “from” and “to” classes.
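A sketch of how the strength character might be interpreted. Here couple(to, from, strength) only describes the coupling it would create instead of editing bytecode; the enum and method bodies are assumptions, not the injector’s implementation.

```java
public final class CouplingStrength {

    enum Strength { TEMPORARY, PERSISTENT }

    /** Accepts 't'/'T' (temporary) or 'p'/'P' (persistent). */
    static Strength parse(char c) {
        switch (Character.toLowerCase(c)) {
            case 't':
                return Strength.TEMPORARY;
            case 'p':
                return Strength.PERSISTENT;
            default:
                throw new IllegalArgumentException("strength must be t, T, p or P: " + c);
        }
    }

    /**
     * Hypothetical stand-in for couple(to, from, strength): describes the coupling
     * instead of rewriting the "to" class's bytecode.
     */
    static String couple(String to, String from, char strength) {
        if (parse(strength) == Strength.PERSISTENT) {
            return to + " gains an attribute of type " + from;   // class-level field
        }
        return to + " gains a local variable of type " + from;   // variable inside a method
    }

    public static void main(String[] args) {
        System.out.println(couple("Kirk", "Romulan", 'T')); // Kirk gains a local variable of type Romulan
        System.out.println(couple("Kirk", "Romulan", 'p')); // Kirk gains an attribute of type Romulan
    }
}
```

Rejecting any other character early makes a typo in the strength argument fail loudly instead of silently producing the wrong coupling.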

Strength of Coupling

Scope and direction are both determined by which classes are chosen as the “to” and “from” classes in the couple method. The coupling is performed by taking an instance of the “from” class and injecting it into the “to” class file, either as an attribute or as a local variable of the “from” class, depending on the strength passed to the couple method (as described above).

If the scope is internal, direction does not apply, because both the “to” and “from” classes are pattern classes. If the direction is afferent, a non-pattern class is randomly chosen as the “from” class and injected into a “to” class randomly selected from the pattern class array. If the direction is efferent, a pattern class is randomly chosen as the “from” class and injected (as an attribute or a local variable, depending on the strength) into a “to” class randomly selected from the non-pattern class array. The figure below displays the scope and direction relationships for each strength type.

Scope and Direction

Overview of Injection Process

The figure below depicts the coupling process for couple(to, from, strength) for each grime type.

Couple Method Overview

For each instance of grime, the couple method:

  1. Randomly selects a “to” class: from the pattern class array if the scope is internal or the direction is afferent, otherwise (direction efferent) from the non-pattern class array.
  2. Randomly selects a “from” class: from the non-pattern class array if the direction is afferent, otherwise from the pattern class array (direction efferent, or scope internal).
  3. If the strength is persistent, inserts an attribute whose type is the “from” class into the “to” class; if the strength is temporary, inserts a local variable of the “from” class into the “to” class.
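The selection steps above can be sketched as follows, with class names as plain strings and each grime type mapped to its strength, scope, and direction. The enum and method names are hypothetical, and this sketch does not prevent an internal coupling from choosing the same class as both “to” and “from”.

```java
import java.util.List;
import java.util.Random;

public final class CoupleSelection {

    enum Scope { INTERNAL, EXTERNAL }

    enum Direction { AFFERENT, EFFERENT, NONE }

    enum GrimeType {
        PEAG('p', Scope.EXTERNAL, Direction.AFFERENT),
        PEEG('p', Scope.EXTERNAL, Direction.EFFERENT),
        PIG('p', Scope.INTERNAL, Direction.NONE),
        TEAG('t', Scope.EXTERNAL, Direction.AFFERENT),
        TEEG('t', Scope.EXTERNAL, Direction.EFFERENT),
        TIG('t', Scope.INTERNAL, Direction.NONE);

        final char strength;
        final Scope scope;
        final Direction direction;

        GrimeType(char strength, Scope scope, Direction direction) {
            this.strength = strength;
            this.scope = scope;
            this.direction = direction;
        }
    }

    /** Returns {to, from} following steps 1 and 2. */
    static String[] select(GrimeType type, List<String> patterns, List<String> nonPatterns, Random rng) {
        boolean toIsPattern = type.scope == Scope.INTERNAL || type.direction == Direction.AFFERENT;
        boolean fromIsPattern = type.scope == Scope.INTERNAL || type.direction == Direction.EFFERENT;
        String to = pick(toIsPattern ? patterns : nonPatterns, rng);
        String from = pick(fromIsPattern ? patterns : nonPatterns, rng);
        return new String[] {to, from};
    }

    private static String pick(List<String> classes, Random rng) {
        return classes.get(rng.nextInt(classes.size()));
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("Kirk");
        List<String> nonPatterns = List.of("Romulan");
        Random rng = new Random(42);
        for (GrimeType type : GrimeType.values()) {
            String[] pair = select(type, patterns, nonPatterns, rng);
            System.out.printf("%s: couple(%s, %s, '%c')%n", type, pair[0], pair[1], type.strength);
        }
    }
}
```

With one pattern class (Kirk) and one non-pattern class (Romulan), TEAG yields couple(Kirk, Romulan, 't'), matching the Star Trek example below.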


Output

The grime injector provides a graphical user interface (GUI) in which the user specifies the desired details of the grime growth to model. The user enters the pattern and non-pattern class names, and the GUI confirms whether it can find each requested class by displaying its name in green if it was found and in red if it could not be located. The user can then enter a number for each type of grime, or a single number to use for every grime type. Lastly, the user specifies the number of runs and versions; if these fields are left blank, both default to 1.
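A small sketch of the defaulting rule, assuming GUI fields arrive as text: blank runs and versions fields become 1, and blank grime-count fields become 0 (as in the example later in this section). parseOrDefault is a hypothetical helper.

```java
public final class GuiFieldDefaults {

    /** Returns the default for a blank field, otherwise the parsed non-negative number. */
    static int parseOrDefault(String fieldText, int defaultValue) {
        if (fieldText == null || fieldText.isBlank()) {
            return defaultValue;
        }
        int value = Integer.parseInt(fieldText.trim());
        if (value < 0) {
            throw new IllegalArgumentException("expected a non-negative number, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        int runs = parseOrDefault("5", 1);     // 5
        int versions = parseOrDefault("", 1);  // 1 (blank field)
        int teeg = parseOrDefault("  ", 0);    // 0 (blank grime field)
        System.out.println(runs + " " + versions + " " + teeg); // 5 1 0
    }
}
```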

Once the user has specified all parameters, they click the “Inject” button and the injector launches. The bytecode is modified and written out according to the methodology described above. Once the bytecode has been manipulated, the JAD script is automatically launched to decompile the modified bytecode into .java files. A Results folder is then available in the top-level directory of the grime injector, ready to be used for analysis. The “Results” directory contains several layers of subdirectories based on the parameters passed to the injector.

The first level of subdirectories consists of the run directories: each time the injection is repeated (as specified by the number of runs), a separate directory is created for that run’s results. Within each run directory there are version subdirectories (if more than one version is specified). Finally, each modified copy of the project’s bytecode is written to the appropriate grime type directory, where it is ready to be decompiled by JAD. A SonarQube properties file is also generated automatically for each modified project, so that a script can launch SonarQube against all the results and collect every technical debt score (a full explanation of this process is given in the SonarQube subsection of the Methodologies section). A diagram of this directory hierarchy is displayed in the figure below.
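A sketch of the resulting paths and of a minimal per-project SonarQube configuration. The RunN directory names, the use of the grime type as the leaf directory name, and the property values are assumptions; sonar.projectKey and sonar.sources are standard SonarQube analysis parameters, which the scanner conventionally reads from a file named sonar-project.properties.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public final class ResultsLayout {

    /** One leaf directory per run, version (if more than one), and grime type. */
    static List<Path> projectDirs(Path results, int runs, int versions, List<String> grimeTypes) {
        List<Path> dirs = new ArrayList<>();
        for (int run = 1; run <= runs; run++) {
            Path runDir = results.resolve("Run" + run);
            for (int version = 1; version <= versions; version++) {
                // Version subdirectories exist only when more than one version is requested.
                Path versionDir = versions > 1 ? runDir.resolve("Version" + version) : runDir;
                for (String type : grimeTypes) {
                    dirs.add(versionDir.resolve(type));
                }
            }
        }
        return dirs;
    }

    /** Minimal SonarQube analysis settings for one modified project directory. */
    static String sonarProperties(Path projectDir) {
        // Flatten the path into a single token so it can serve as a unique project key.
        String key = projectDir.toString().replace(File.separatorChar, '_');
        return "sonar.projectKey=" + key + "\n"
                + "sonar.sources=.\n";
    }

    public static void main(String[] args) {
        List<Path> dirs = projectDirs(Path.of("Results"), 5, 3, List.of("TEAG"));
        System.out.println(dirs.size());  // 15
        System.out.println(dirs.get(0));  // Results/Run1/Version1/TEAG on Unix-like systems
        System.out.print(sonarProperties(dirs.get(0)));
    }
}
```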

Output


Example

Suppose a user wishes to model the growth of TEAG in a program based on the science fiction television series Star Trek. The user plans to model grime growth over 3 version releases, injecting 5 TEAG instances in each version, and then run SonarQube against the modified projects to see whether the reported technical debt score increases.

The user wants to repeat this experiment 5 times to obtain an average technical debt score, so the injection process will produce 5 modified projects. Each modified project starts from the same clean foundation and receives the same number of grime instances, but because the “to” and “from” classes are randomly selected for each grime instance, the 5 modified projects may differ from one another.

First the user places a copy of the StarTrek program into the injector’s “analyze_this” package in Eclipse, and then runs GUI.java to specify the details of their desired grime growth model.

The first step is setting up the array of pattern classes and the array of non-pattern classes. The user successfully enters Kirk and Romulan (the green font indicates that the injector located Kirk.java and Romulan.java), but when the user enters Klingon as a non-pattern class, the GUI echoes Klingon in a red font, indicating that it cannot locate Klingon.java and will ignore this entry. Next, the user enters 5 as the number of TEAG instances to inject per version, leaves the other grime fields blank (indicating 0), and enters 5 into the runs field and 3 into the versions field.

Once the fields are filled in, the user clicks the “Inject” button and the grime injector takes over. For simplicity, we illustrate the process using only one pattern class (Kirk.java) and one non-pattern class (Romulan.java). The injector first loads the Kirk.class file into the pattern class array and the Romulan.class file into the non-pattern class array.

Next, the injector will perform the initialization steps described in the Initialization subsection. Only one copy is created because the user has specified they are only interested in investigating TEAG. If the user had desired to investigate all types of modular grime, 6 copies would have been created.

For each instance of TEAG to be modeled, the injector randomly chooses a pattern class and a non-pattern class. In this case, the user has requested 5 instances of TEAG. Because the strength of TEAG is temporary, and there is only one pattern class (Kirk.class) and one non-pattern class (Romulan.class), the injector uses the local variable of the Romulan class that was created in the initialization steps and injects it into the Kirk class. This is performed 5 times, once for each TEAG instance specified by the user. To avoid name collisions, each injected variable is named v#grimed#, where the first # is the current version number and the second # is the grime instance number.

After the first round of injections, the following variables have been injected: v1grimed1, v1grimed2, v1grimed3, v1grimed4, and v1grimed5. Once injection for this version is complete and written to the Version1 directory of the Results directory, the modified bytecode is fed into the injector again, and 5 new TEAG couplings are injected on top of the previously injected code.
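A sketch of the v#grimed# naming scheme and of the names present in a project after each version of a run. The helper names are hypothetical; because the literal “grimed” separates the two numbers, names such as v1grimed11 and v11grimed1 cannot collide.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class GrimeVariableNames {

    /** First number: version; second number: grime instance within that version. */
    static String name(int version, int instance) {
        return "v" + version + "grimed" + instance;
    }

    /** Names injected during a single version. */
    static List<String> namesForVersion(int version, int instances) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= instances; i++) {
            names.add(name(version, i));
        }
        return names;
    }

    /** All names present after the last version of one run, since versions compound. */
    static Set<String> namesForRun(int versions, int instancesPerVersion) {
        Set<String> all = new LinkedHashSet<>();
        for (int v = 1; v <= versions; v++) {
            all.addAll(namesForVersion(v, instancesPerVersion));
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(namesForVersion(1, 5)); // [v1grimed1, v1grimed2, v1grimed3, v1grimed4, v1grimed5]
        System.out.println(namesForRun(3, 5).size()); // 15 distinct names
    }
}
```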

Table 3 shows all the variables created during this process for a single run. Every run produces the same variable names for each version; because each run starts from the clean foundation, there is no danger of collisions between variables of the same name.

Version Table

Once all the instances for each version have been injected, the injector reverts to the original unmodified bytecode and performs all of the above steps again for the next run. In this example this happens 5 times, since the user specified that the injection process should be repeated 5 times.

To perform analysis on the modified bytecode, the user will open the Results folder and see the following hierarchy:

Output