Chapter 27

Refactoring

In the second half of this book, as you have worked with longer pieces of code, there are a few things that you should have realized about working on bigger programs. For example, the first time that you write a piece of code, you probably do not do it in an optimal way. Also, when you first lay out how to put something together, you likely miss details that have to be added on. What is more, sticking pieces onto code in a haphazard way often leads to really brittle code that is hard to work with. If you were writing software in a professional environment, there would be another rule that could be added on here, the customer often does not really know what he/she wants.

When you put all of these together, it leads to the general rule that code needs to be dynamic and adaptable. You have to be willing and able to modify how things are done over time. It is also ideal for this to be done in a good way so that the results do not just pile mistaken approach upon mistaken approach in a crudely rigged solution to a final goal.

In this chapter, we will look at one of the primary tools in the toolkit of the software developer for effectively managing code over time as old approaches are replaced by newer ones and as features get piled up. This tool is called refactoring and it is the alternation of existing code without changing functionality. At first the idea of changing code without altering the functionality might seem odd, but there are good reasons for why it should be done.

27.1 Smells

Not all code is created equal. While there are effectively an infinite number of ways to build correct solutions to a problem, you have hopefully learned that many solutions that are technically correct, are less than ideal. In some situations this is obvious. In others, it depends on the application or even who you ask. Code is very much an art and, as such, beauty is often in the eye of the beholder. Even when code is not quite ideal, that does not mean it is worth the effort to rewrite to a better form. The technical term for code that has a property that makes it less than ideal is that it smells.

The term smell carries significant imagery. Smells can be strong or mild. Even if something smells bad, if it is really mild then you probably will not go to any lengths to correct it unless there is a good reason to. In addition, not all programmers will be equally sensitive to different smells. In his book Refactoring: Improving the Design of Existing Code, Martin Fowler, along with Kent Beck, list 22 different smells that can afflict code [5]. We will list them all here with brief explanations along with possible examples of where they have potentially popped up in code we have written.

Alternative Classes with Different Interfaces - When you have methods in different classes that do the same thing, but call it different names. This makes code very hard to work with. You should pick a name that fits and rename everything to match that.

Comments - Some comments are definitely good. Too many, especially when they do not really add extra information, can be smelly. Comments that are out of date with the code are also horribly smelly. Programmers will read comments to help them figure out what is happening in the code. If the comments have not been updated with the code, this can lead to significant confusion and waste a lot of time. You are better off with no comments than bad comments.

Data Class - These are classes that do nothing but store data and have no real functionality. Littering your code with individual files that contain these makes it very hard for people to navigate the code. You might consider having these nested inside of objects that have a meaningful relationship to the case class.

Data Clumps - Pieces of data that are almost always found together and should be united into a class. A simple example of this is information for a data point. Instead of keeping x, y, and z separate and passing them as three separate arguments into many different methods, use a class that can store the three values. One benefit of this is that calling code explicitly knows the data is a point, not three unrelated Doubles.

Divergent Change - When you change the same piece of code (often a class) at different times for different reasons. This can imply that code is taking on overly broad responsibilities in the code. Try to break it into smaller pieces that are more focused in their tasks.

Duplicated Code - The name says it all here. We have been talking about ways to avoid this smell since very early in the book. The reader should also by this point be well aware of the pitfalls of duplicate code, such as duplication of errors and the need to modify all copies for certain changes. Multiple approaches have been introduced to reduce code duplication, from function to collections and loops, then finally to abstraction and polymorphism, you should strive to write any particular piece of logic as few times as possible.

Feature Envy - This is when a method in one class makes lots of calls to another class, especially if it does so just to get data for doing a calculation. This likely implies that the method is in the wrong class and should be moved closer to the data it is using.

Inappropriate Intimacy - An object is supposed to encapsulate an idea. If you have two classes that are constantly referring to elements of one another, it implies that things have not been broken up appropriately. Consider moving things around so that each class stands more on its own.

Incomplete Library Class - This smell occurs when a library that you are using lacks functionality that you need. The significance of it being in a library is that you typically can not modify library code. In Scala, you can often get around this using implicit conversions. This is a more advanced technique that is discussed in appendix B.

Large Class - Classes that get too large become hard to manage. You want to have the functionality of any given class be specific to a particular task. If a class gets too large it is an indication that it should probably be broken into parts.

Lazy Class - If you have a class that does not do much of anything it is lazy and you should consider getting rid of it. At the very least, put classes like this inside of an appropriate other class or object so they do not take up their own file.

Long Method - As was discussed in the first half of this book, long functions are error prone. The same goes for methods. If a method starts to get too long it should be broken up into pieces.

Long Parameter List - Methods that have extremely long parameter lists are hard to use.1 Often, long parameter lists go along with the "Data Clumps" smell. The parameter list can be shortened by passing in a smaller number of objects containing the needed values.

Message Chains - This smell occurs when one class goes through many objects to get to what it needs. For example, val stu = school.teacher.course.student. The problem with this is that whatever class you write this in new depends on the exact name and general contract for the methods teacher, course, and student in three different classes. If any of those things change, this code will break. That is bad.

Middle Man - This smell arises when too many methods in one class do nothing but forward calls to another class. The one doing the forwarding is the "middle man." If this happens too much, and is not providing other benefits, you should consider going straight to the object that does the work.

Parallel Inheritance Hierarchies - This is a special case of "Shutgun Surgery." In this case, you have two inheritance hierarchies that mirror one another. When a class is added to one, it must be added to the other as well.

Primitive Obsession - In Java and many other languages, types like Int, Double, Char, and Boolean are called primitive types and they are treated differently than objects.2 Scala does not maintain this distinction, but this smell can still apply. Code has this smell when you use basic or built-in types too much instead of creating your own types. Before you learned about case classes, all of your data grouping was done with tuples. This made for smelly code because references to _1 and _2 tell you nothing about the meaning of the value. Similarly, an (Int, Int) could be a month/year combination or coordinates on a planar grid. Using types specific to the task not only makes code easier to read, it allows the type checker to help you determine when you have made an error by producing a syntax error on a type mismatch.

Refused Bequest - This smell occurs when a class inherits from another class or trait, but does not "want" some of the methods in that supertype. This is typically a sign of a poor design choice. The motivation behind inheritance should be to become a subtype of the supertype. That means that any method calls that could be made on an instance of the supertype should work on the subtype. If this is not the case, then inheritance is probably the wrong approach.

Shotgun Surgery - This is when a particular change in your code has a tendency to force you to make edits in several different places. This is smelly because it is too easy to forget one of those edits, which leads to bugs. Adding a new Drawable has something of a shotgun surgery feel to it. We could fix this by moving the makeDrawable method from the DrawTransform object into the Drawing object next to the makeOptionsFor method. Then adding a new drawable would only require adding code to those two methods that are adjacent to one another instead of altering code in two separate files.

Speculative Generality - We often want to have a lot of flexibility in our code, but code that is far more general than the usage can be artificially hard to understand, use, and maintain. This happens when a code writer expects to need certain flexibility that the final usage does not take advantage of.

Switch Statements - Scala does not have switch statements. The closest approximation is the match, which is far more powerful than a normal switch statement. This smell can still occur in Scala code if matches are used in particular ways. This smell occurs when you have multiple match statements/expressions that have exactly the same cases and have whenever a case has to be added to one, it must be added to all. This turns the adding of a case into a shotgun surgery. Generally, this should be dealt with making a trait that has one method for each of the different match occurrences and a different concrete subtype for each case. Each match in the code then becomes a call to the appropriate method of an object that keeps track of the state.

Temporary Field - This is when you have a field of a class that is only needed at certain times. This makes code hard to understand because the field will have a value at other times, but that value might not really have any meaning for the object.

The goal of refactoring, and what you should be thinking about when considering smells in code, is that you want to keep your code readable and maintainable as well as keeping it correct and sufficiently optimized for your needs. Some of these smells almost compete with one another, pushing you in opposite directions. Before you make changes, consider the motivation. Will the change improve the code and make it easier to work with? Would the side-effects of a cure for a particular smell be worse than the original smell? Just because you see something in your code that seems like it matches one of these smells does not mean the code is really smelly. That judgment requires really thinking about how the code behaves and how the programmer interacts with it.

27.2 Refactorings

Fowler's book on refactoring lists 72 different refactoring methods. Only a few will be discussed here. Not only should interested readers turn to the original book for a full description, many of the refactoring methods presented in the book involve concepts that are beyond the scope of this book or what the reader is expected to understand at this point in his/her Computer Science education.

27.2.1 Built-in Refactoring Methods

Many of the refactoring methods are algorithmic in nature. For this reason, Integrated Development Environments (IDEs), such as Eclipse, have a number of them written in. Using built-in refactoring tools can make things go much faster. They are also set up so that they have no impact on the behavior of the code. They change the structure without changing the functionality.

As of this writing, the following options are available under the refactoring menu when you right click on code in Eclipse under Scala.3

  • Rename - This can be applied to any programmer given name such as variable names, method names, or class names. Just as comments that no longer fit the code are smelly, using names in your programs that are hard to understand or, worst yet, are misleading, should have you detecting a rank stench. Of course, you can change the name of something on your own, but that means finding all the places it is used and changing them as well. For something like the name of a public method in a large project, this can mean tracking down invocations in many different files. This refactoring option does all that for you. So if you find that a name you chose early in the development of a project no longer fits, select this option and change it to something that does.
  • Inline Local - This will remove a local val and change all references to it to the value it is initialized to. This should only be done with immutable values as this operation can change behavior if the object is mutated in some of the usages. You would use this if you put something in a variable expecting it would be long or used many times and then come to find out neither is true.
  • Organize Imports - This very handy option, which has the keystroke Ctrl-Alt-O, will automatically add imports for any types that are lacking them. It also organizes the import statements at the top of the file. It should be noted that this uses Java-style imports that go at the top of the file and have one class per line. If you need an import to be local, you should add that yourself. The fact that it uses a longer format is not that important as you do not have to type it.
  • Extract Local - This is the inverse of Inline Local. It can be used if there is an expression in a method that either appears multiple times, or which would make more sense if a name were associated with it. This is sometimes also called Introduce Explaining Variable.
  • Extract Method - This remarkably convenient refactoring should be used to break up large methods or to extract code into a method when you find that you need to use it more than you originally believed. Simply highlight the code you want in the method and select this option. It will figure out the parameters that are needed and you just pick a name.

The built-in refactoring tools in IDEs like Eclipse can make certain types of changes remarkably easy. The most important aspect of this is that they reduce the barrier so that you really have no excuse to not use them. If you leave a poorly named variable or method in your code, or if you decide to live with a method that is 200 lines long, you are simply being lazy.

27.2.2 Introduce Null Object

Not all refactoring methods are quite so easy to perform. Some will require a bit of effort on your part, but when they are called for, they are definitely worth it. To see this, we can consider one of the exercises from chapter 25 where you were asked to write a doubly-linked list without using a sentinel. If you actually went through that exercise you found that the code wound up being a nest of conditionals set up to deal with the boundary cases where a prev or next reference was null. This type of special case coding to deal with null references is fairly common. So common, in fact, that there is a refactoring method specifically for dealing with it called, "Introduce Null Object." As the name implies, the goal is to introduce an object that is used in place of null references. This object should have methods on it that are basically stubs with limited functionality. You want to do this if you find that your code has many lines that look something like this.

if(ref==null) ...

The sentinel played the role of a Null Object in the doubly-linked list. An even better example is Nil for the immutable singly-linked list. You could theoretically make an immutable singly-linked list without making a Nil object. Instead, the lists would be terminated with null references as they were for our mutable singly-linked list. Such code tends to be far messier to write and to work with, making it harder to debug and to maintain. We will see another usage of a Null Object in chapter 29 when we make our immutable binary search tree.

27.2.3 Add and Remove Parameter

The code we wrote back in chapter 25 used the Null Objects initially, we did not have to refactor for it. Describing such a refactoring would be challenging. To make it more clear how this might work, consider the "Add Parameter" refactoring. The idea here is that you have written a method that takes a certain list of parameters. As an example, consider a method that logs employee pay. This might be part of some larger application used by a company. The current version looks like this.

def logHours(employeeID:ID, hoursWorked:Int, wagesPaid:Int) = {
 // Stuff to do logging
}

It works and everything is fine until it is determined that the logs need to include additional information to comply with some new government regulation. For example, maybe this is a large, multinational corporation and they have to also keep records of what location the employee was working at for those hours. Clearly this is going to require changes to thelogHours method. The idea of refactoring is that you break those changes into two parts. The first part is to perform the Add Parameter refactoring. That gives you this code.

def logHours(employeeID:ID, hoursWorked:Int, wagesPaid:Int,
  location:OfficeLocation) = {
 // Same stuff as before
} 

Note that the code in the method is unchanged. Adding this parameter forces changes at the call sites, but it does not actually change the behavior of the code. You should make this change, get the code to compile again, then run all existing tests and verify that things work. Once you have confirmed that the code is still functional, you can proceed to the second step where you actually alter the functionality of the method.

def logHours(employeeID:ID, hoursWorked:Int, wagesPaid:Int,
  location:OfficeLocation) = {
 // Modified stuff to include the location.
}

The whole goal of refactoring is to formalize your approach to change so that when code needs to change, you are more willing to do it and more confident that it will work. Changing existing code can be a very scary thing to do. Often students will wind up with large segments of their code commented out, but not deleted, because they are afraid to really commit to changes. Combining the techniques of refactoring and automated unit testing can help to give you the courage to make needed changes. Code should be dynamic and flexible. That only happens if programmers are willing to change things.

The counterpart to Add Parameter is "Remove Parameter". The motivation for this refactoring would be that alterations to a method or the code around it have made it so that parameters, which at one point provided needed information, are no longer being used. Perhaps alterations in the system have made it so that the value of the wages paid should be calculated using the employee information and how many hours were worked instead of being passed in. In that case, you might have code that still has wagesPaid as a parameter, but that parameter is never used. That is smelly as it is not only overhead for the code, it is likely to confuse people looking at it. Such a parameter should be removed to make the code easier to understand and work with.

27.2.4 Cures for Switch Statements

As was mentioned above, Scala does not have a switch statement in the traditional sense. However, the smell that is generally associated with switch statements can be produced in any conditional execution including not only match, but also if. Running with the example from above, consider that we need to calculate a number of different values associated with employees and the nature of the calculation depends on the type of employee under consideration. This could be done with code that looks something like this.

val monthlyPay = employeeType match {
 case Wage => ...
 case Salary => ...
 case Temp => ...
}

This same pattern might be repeated in different parts of the code for calculating benefits and leave time as well. The values Wage, Salary, and Temp could be vals defined in some object, or objects themselves. They could also be enums, a topic discussed in appendix B. This could also appear in code using if expressions.

val monthlyPay = if(employeeType == Wage) ...
 else if(employeeType == Salary) ...
 else ...
}

The key to this being a smell is that the pattern above is repeated in multiple locations. If a new category or employee were added, all of those would need to be modified. Missing one or more occurrences will lead to difficult to track errors.

There are two refactoring methods that can be employed to remove this smell. They are "Replace Conditional with Polymorphism" and "Replace Type Code with Subclasses". This would appear in the code as an abstract supertype that has methods for each of the different types of actions/calculations that are required. That might look like the following.

trait EmployeeType {
 def monthlyPay(hours:Int, wage:Int):Int
 def logBenefits(employee:Employee, hours:Int):Unit
 def leaveTime(hours:Int):Double
}

Using this, the match or if constructs shown above would be replaced by this simple line.

val monthlyPay = employeeType.monthlyPay(hours,wage)

This works because the employeeType will be a reference to an appropriate subtype of EmployeeType. That subtype will have the appropriate code in it for that type. The code might look like the following with proper implementations of all the methods. Then the parts of the code that give a value to employeeType would provide the appropriate subtype.

class WageEmployee extends EmployeeType {
 def monthlyPay(hours:Int, wage:Int):Int = ...
 def logBenefits(employee:Employee, hours:Int):Unit = ...
 def leaveTime(hours:Int):Double = ...
}


  
class SalaryEmployee extends EmployeeType {
 def monthlyPay(hours:Int, wage:Int):Int = ...
 def logBenefits(employee:Employee, hours:Int):Unit = ...
 def leaveTime(hours:Int):Double = ...
}


  
class TempEmployee extends EmployeeType {
 def monthlyPay(hours:Int, wage:Int):Int = ...
 def logBenefits(employee:Employee, hours:Int):Unit = ...
 def leaveTime(hours:Int):Double = ...
}

Note that if each of these had no state, fields that differ between different instances, they could be declared as objects instead of classes.

There are two main benefits to this change that improve upon the switch statement smell. The first is that this new structure makes it significantly easier to add new types of employees. If it is decided that any of these categories needs to be resolved into different classifications, that can be done easily. It also localizes code from any given type of employee and makes it much easier to find the code when changes need to be made.

27.2.5 Consolidate Conditional Expression

In the calculation of how much you should pay wage employees, it is likely that you will have to include expressions for normal overtime and perhaps even double overtime. These might look something like hours>=40 && hours<60 and hours>=60. The first problem that you notice with these expressions is that they include "magic numbers" in the form of 40 and 60. As a general rule, you do not want to have these magic numbers spread through your code. The other potential problem is that these expressions do not have obvious meaning and it is possible that the requirements might need to be changed in the future. All of these can be fixed, to one degree or another, by pulling the Boolean expression out into a method like this.

def overtime(hours:Int):Boolean = hours>=40 && hours<60

Then your condition in other parts of the code becomes overtime(hours). This is easy to read, highly informative, and makes it easy to change the nature of the condition. However, if the expression is not used multiple times, this approach is overkill and only adds to the complexity of the code.

27.2.6 Convert Procedural Design to Objects

In part I of this book, the focus was functional decomposition of problems. We did not get into constructing classes or proper object-oriented design. This was for educational purposes. Object-orientation introduces additional concepts that are not essential for solving small problems. This approach where data and functionality are viewed as being independent is called the procedural approach. Bigger programs, however, gain a lot from being built in a proper object-oriented manner. There is also a benefit to consistent design across a project. For this reason, if you have aspects of a large program that contains occasional segments that are more procedural, you should refactor them to use an object-oriented approach. This is done by grouping the functionality with the data that it should operate on and putting the two together into objects. This can be done in an incremental way by extracting methods from long procedures and moving them into the appropriate class, trait, or object.

27.2.7 Encapsulate Collection

Picture a class that has a data member that is a mutable collection. This might be something like an Array or a Buffer. For example, in our company you have a type for a Department that has a collection filled with the Employee type. Now imagine that the class has a method like the following.

 def employees:mutable.Buffer[Employee] = lEmployees

Note that this method is public and gives any outside code access to the local employee collection. This is a risky thing to do with a mutable collection. The reason being that any code can call this method and then start making whatever changes it desires to the employees in that department without the Department class having any control over it. That is not something one would expect to allow in real life and it is best avoided in code as well. Even if it seems like a reasonable thing to do when you write it, even if the collection is not that critical to the working of the class, giving up that much control is likely to cause problems down the line.

The alternative is to return an immutable copy from this method and then write other methods for adding or removing elements as needed. This could be implemented in the simplest terms with the following code.

 def employees:Seq[Employee] = lEmployees.toSeq
 def addEmployee(e:Employee) = lEmployee += e
 def removeEmployee(index:Int) = lEmployee.remove(index,1)

With just this code there is not much of a benefit. However, if at some point you decide that you need to check conditions on new employees or to log when employees are removed, that will be easy to do. Using the original implementation, that would have been nearly impossible.

The general rule of thumb here is that mutable data should not be publicly viewable and any changes to mutable data should go through methods that have a chance to limit access or do something in response to it. While the ability to define special member assignment methods in Scala makes it reasonable to have public var members, care should still be taken in regards to how mutable members are shared with outside code.

27.2.8 Push Down or Pull Up Field or Method

Image for a moment that the drawing program were originally constructed without writing a DrawLeaf class. In that situation, the types such as DrawRectangle and DrawEllipse would include all the methods currently located in DrawLeaf. This would be a significant amount of code duplication. Somewhere between the second and third time that the many methods of TreeNode were all implemented in exactly the same way, we should have started to look for a better way to lay things out. That better way would have been to create and insert the DrawLeaf type between Drawable and all of its subtypes other than DrawTransform. After that class was created, the common method and fields should have been pulled up into that class. This action would have utilized the "Pull Up Field" and "Pull Up Method" refactoring methods.

These methods provide a simple way of dealing with any duplication of code that you find between different types inheriting from a common subtype. If all the subtypes have the same implementation, it is clear that the shared method implementation should be moved up to the common supertype. If only some share those methods, you should consider making a new intermediate type and moving the methods into that type.

There are also times when you want to do the opposite. You originally feel that a method will be implemented the same way for an entire hierarchy, only to realize later that most subtypes wind up overriding that method. This starts to have the smell of a refused bequest. In that situation, you should perform "Push Down Field" or "Push Down Method" to minimize the odor. If there are some common implementations, introduced intermediate types to hold that code. If a large number of the subtypes are going to override a method, you should strongly consider having it be abstract in the highest level type that contains the method. The advantage of this is that because you know many implementations are going to be different, you force anyone who inherits from your type to do something with that method or member. This forces them to think about it and reduces the odds that they will accidentally inherit an inappropriate implementation that leads to challenging to find bugs.

27.2.9 Substitute Algorithm

The last refactoring to be considered in this chapter is "Substitute Algorithm." There are many different reasons why your first choice of an algorithm for some part of a problem might not wind up being ideal. A simple example that the author dealt with was a sort algorithm for a small number of data values. By default, the built-in sort from the libraries was selected. It was easy to use, and it was a well-tuned, general purpose merge sort algorithm. One of the language library writers had certainly worked hard to make it efficient. However, when the full application was completed, performance turned out to be quite poor. Profiling indicated that an inordinate amount of time was being spent in the sort. A simple insertion sort written with the specific comparison required for that code was put in place instead. This switch of algorithm did not alter the final results of the program at all. However, it led to a remarkable improvement in speed.

When you first solve a problem you are likely to use whatever is easiest to write correctly. That is exactly what you should do because most of the time that will work best and be the easiest to maintain. There are situations where your initial guesses about the nature of the data will be incorrect and different approaches turn out to be significantly faster. Optimizing for speed is one of the last things you should do in developing a program and even then it should only be done when you know that it is required. A well-known quote from Donald Knuth is that "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil". [7] When you do come to realize that an early choice of algorithm is slowing things down in an unacceptable way, do not be afraid to refactor the code and put in a different algorithm that is better suited for the specific task at hand.

27.3 End of Chapter Material

27.3.1 Summary of Concepts

  • Smell
    • Something in the code that makes it less than ideal.
    • Sometimes related to a specific piece of code, but often deals with structure and includes many distributed elements.
    • They often build up over time as code is modified.
    • Make code harder to work with and maintain.
  • Refactoring
    • The act of changing the structure of code or how it works without changing the end behavior.
    • Can be done to remove smells.
    • Also done to help change the structure of code to allow new features to be implemented.
    • Many IDEs include certain options for automatic refactoring.

27.3.2 Exercises

  1. Test each of the different built-in refactoring options in Eclipse.
  2. The website for the book has some examples of smelly code. Go through each one, figure out why it is smelly, and refactor it.

27.3.3 Projects

Instead of giving individual descriptions for each of the projects, this chapter has more of a general description. Go through the code you have written and find things that smell. When you find them, refactor them so that they do not smell.

Going forward, there will likely be projects that will force you to change some of the design aspects of your project. When those things come up, make sure that you refactor the code first, then add the new features. Strive to not mix the steps. Do the refactoring and verify that the code works as it did before, then add to it.

1Having default values and named parameters in Scala does minimize this problem a bit.

2Primitive types are built into the hardware of the machine so they execute much more quickly and take up less space than object. When you use an Int in Scala, the compiler will try to compile it to a Java primitive int for better performance.

3Given the rate at which work is being done on Scala and the Eclipse plug-in, it is inevitable that this list will likely be longer for you.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset