Category Archives for Programming Tips

Two is Too Many

There is a key rule that I personally operate by when I’m doing incremental development and design, which I call “two is too many.” It’s how I implement the “be only as generic as you need to be” rule from the Three Flaws of Software Design.

Essentially, I know exactly how generic my code needs to be by noticing that I’m tempted to cut and paste some code, and then instead of cutting and pasting it, designing a generic solution that meets just those two specific needs. I do this as soon as I’m tempted to have two implementations of something.

For example, let’s say I was designing an audio decoder, and at first I only supported WAV files. Then I wanted to add an MP3 parser to the code. There would definitely be common parts to the WAV and MP3 parsing code, and instead of copying and pasting any of it, I would immediately make a superclass or utility library that did only what I needed for those two implementations.

The key aspect of this is that I did it right away—I didn’t allow there to be two competing implementations; I immediately made one generic solution. The next important aspect of this is that I didn’t make it too generic—the solution only supports WAV and MP3 and doesn’t expect other formats in any way.
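
As a rough sketch of what that might look like in Python, here is a parent class that holds only what the two parsers share. The class names, the `matches` check, and the `describe` method are all invented for illustration; a real decoder would obviously do far more than look at the first few bytes.

```python
class AudioParser:
    """Shared behavior that both parsers need, and nothing more."""

    format_name = "unknown"

    def __init__(self, data: bytes):
        if not self.matches(data):
            raise ValueError(f"not a {self.format_name} file")
        self.data = data

    @classmethod
    def matches(cls, data: bytes) -> bool:
        raise NotImplementedError

    def describe(self) -> str:
        return f"{self.format_name} audio, {len(self.data)} bytes"


class WavParser(AudioParser):
    format_name = "WAV"

    @classmethod
    def matches(cls, data: bytes) -> bool:
        # WAV files are RIFF containers whose form type is "WAVE".
        return data[:4] == b"RIFF" and data[8:12] == b"WAVE"


class Mp3Parser(AudioParser):
    format_name = "MP3"

    @classmethod
    def matches(cls, data: bytes) -> bool:
        # Either an ID3v2 tag or an MPEG frame sync (eleven set bits).
        has_id3_tag = data[:3] == b"ID3"
        has_frame_sync = len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
        return has_id3_tag or has_frame_sync
```

Note what is missing: there is no plugin registry and no hook for a third format. That would only appear on the day a third format actually does.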

Another part of this rule is that a developer should ideally never have to modify one part of the code in a similar or identical way to how they just modified a different part of it. They should not have to “remember” to update Class A when they update Class B. They should not have to know that if Constant X changes, you have to update File Y. In other words, it’s not just two implementations that are bad, but also two locations. It isn’t always possible to implement systems this way, but it’s something to strive for.

If you find yourself in a situation where you have to have two locations for something, make sure that the system fails loudly and visibly when they are not “in sync.” Compilation should fail, a test that always gets run should fail, etc. It should be impossible to let them get out of sync.
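
One way to get that loud failure can be sketched in Python. It assumes a hypothetical `AudioFormat` enum and a separate `FILE_EXTENSIONS` table that both have to list every format. The check runs at import time and refuses to continue if the two disagree.

```python
from enum import Enum


class AudioFormat(Enum):
    WAV = "wav"
    MP3 = "mp3"


# A second location that must mention every AudioFormat member.
FILE_EXTENSIONS = {
    AudioFormat.WAV: ".wav",
    AudioFormat.MP3: ".mp3",
}


def check_in_sync(formats, table):
    """Raise immediately if the enum and the table list different formats."""
    missing = sorted(f.name for f in set(formats) - set(table))
    extra = sorted(f.name for f in set(table) - set(formats))
    if missing or extra:
        raise RuntimeError(
            f"FILE_EXTENSIONS is out of sync: missing={missing}, extra={extra}"
        )


# Runs whenever the module is imported, so nobody has to remember to call it.
check_in_sync(AudioFormat, FILE_EXTENSIONS)
```

The same check could live in a test that always runs instead; the point is only that forgetting to update the second location stops everything rather than passing silently.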

And of course, the simplest part of this rule is the classic “Don’t Repeat Yourself” principle—don’t have two constants that represent the same exact thing, don’t have two functions that do the same exact thing, etc.

There are likely other ways that this rule applies. The general idea is that when you want to have two implementations of a single concept, you should somehow make that into a single implementation instead.

When refactoring, this rule helps find things that could be improved and gives some guidance on how to go about it. When you see duplicate logic in the system, you should attempt to combine those two locations into one. Then if there is another location, combine that one into the new generic system, and proceed in that manner. That is, if there are many different implementations that need to be combined into one, you can do incremental refactoring by combining two implementations at a time, as long as combining them does actually make the system simpler (easier to understand and maintain). Sometimes you have to figure out the best order in which to combine them to make this most efficient, but if you can’t figure that out, don’t worry about it—just combine two at a time and usually you’ll wind up with a single good solution to all the problems.

It’s also important not to combine things when they shouldn’t be combined. There are times when combining two implementations into one would cause more complexity for the system as a whole or violate the Single Responsibility Principle. For example, if your system’s representations of a Car and a Person have some slightly similar code, don’t solve this “problem” by combining them into a single CarPerson class. That’s not likely to decrease complexity, because a CarPerson is actually two different things and should be represented by two separate classes.

This isn’t a hard and fast law of the universe. It’s more of a strong guideline that I use for making judgments about design as I develop incrementally. However, it’s quite useful in refactoring a legacy system, developing a new system, and just generally improving code simplicity.

-Max

Immutability, State, and Functions

Let’s start with the obligatory call to authority:

In functional programming, programs are executed by evaluating expressions, in contrast with imperative programming where programs are composed of statements which change global state when executed. Functional programming typically avoids using mutable state.

https://wiki.haskell.org/Functional_programming

Well, that seems pretty definitive. “Functional programming typically avoids using mutable state.” Seems pretty clear-cut.

But it’s wrong.

Explaining why I think that will involve a trip down the path I’ve been exploring over the last year or so, as I have tried to crystallize my thinking on the new styles of programming, and the role of transformation as both a top-down and bottom-up coding and design technique.

Let’s start by thinking about state.

Where Does a Program Keep Its State?

Programs run on computers, and at the lowest level their model of computation is tied to that of the machines on which they execute. Down at that low level, the state of a program is the state of the computer: the values in memory and the values in registers. Some of those registers are used internally by the processor for housekeeping. Perhaps the most important of these is the program counter (PC). You can think of the PC as a pointer to the next instruction to execute.

We can take this up a level. Here’s a simple program:

"Cat"
|> String.downcase # => "cat"
|> String.codepoints # => [ "c", "a", "t" ]
|> Enum.sort # => [ "a", "c", "t" ]

The |> notation is syntactic sugar for passing the result of a function as the first parameter of the next function. The preceding code is equivalent to

Enum.sort(String.codepoints(String.downcase("Cat")))

Thrilling stuff, eh?

Let’s imagine we’d just finished executing the first line. What is our state?

Somewhere in memory, there’s a data structure representing the string “Cat”. That’s the first part of our state. The second part is the value of the program counter. Logically, it’s pointing to the start of line 2.

Execute one more line. String.downcase is passed the string “Cat”. The result, another string containing “cat”, is stored in a different place in our computer. The PC now points to the start of line 3.

And so it goes. With each step, the state of the computer changes, meaning that the state of our program changes.

State is not immutable.

Is This Splitting Hairs?

Yes and no.

Yes, because no one would argue that the state of a computer is unchanged during the execution of a program.

No, because people still say that immutable state is a characteristic of functional programming. That’s wrong. Worse, that also leads us to model programming wrongly. And that’s what the rest of this post is about.

What Is Immutable?

Let’s get this out of the way first. In a functional program, values are immutable. Look at the following code.

person = get_user_details("Dave")
debug_dump(person)
do_something_with(person)
debug_dump(person)

Let’s assume that get_user_details returns some structured data, which we dump out to some log file on line two. In a language with immutable values, that data can never be changed. We know that nothing in the function do_something_with can change the data referenced by the person variable, and so the debugging we write on line 4 is guaranteed to be the same as that created on line 2.

If we wanted to change the information for Dave, we’d have to create a copy of Dave’s data:

person1 = change_subscription_status(person, :active)

Now we have the variable person bound to the initial value of the Dave person, and person1 references the version with a changed subscription status.

If you’ve been using languages with mutable data, at this point you’ll have intuitively created a mental picture where person and person1 reference different chunks of memory. And you might be thinking that this is remarkably inefficient. But in an immutable world, it needn’t be. Because the runtime knows that the original data will never be changed, it can reuse much of it in person1. In principle, you could have a runtime that represented new values as nothing more than a set of changes to be applied to the original.
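
The reuse can be sketched outside Elixir too. The following Python version is only an analogy and not a description of how any Elixir runtime stores data; the `Person` and `Address` types, their fields, and the sample values are invented for the illustration. Frozen dataclasses stand in for immutable values, and the "copy" shares the untouched parts of the original by reference.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Address:
    city: str


@dataclass(frozen=True)
class Person:
    name: str
    address: Address
    subscription: str


def change_subscription_status(person: Person, status: str) -> Person:
    # Builds a new Person; the original is left exactly as it was.
    return replace(person, subscription=status)


person = Person("Dave", Address("Dallas"), "inactive")
person1 = change_subscription_status(person, "active")

# The new value differs only in the changed field...
print(person.subscription, person1.subscription)
# ...and the unchanged address is the very same object, not a copy.
print(person1.address is person.address)
```

Sharing is only safe here because neither value can be modified afterwards, which is the point the paragraph above makes about the runtime.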

Anyway, back to state.

person = get_user_details("Dave")
do_something_with(person)
person1 = change_subscription_status(person, :active)
IO.inspect person1

Let’s represent the state using a tuple containing the pseudo program counter and the values bound to variables.

Line   person   person1
1
2      value1
3      value1
4      value1   value2

How to Handle Code Complexity in a Software Company

Here’s an obvious statement that has some subtle consequences:

Only an individual programmer can resolve code complexity.

That is, resolving code complexity requires the attention of an individual person on that code. They can certainly use appropriate tools to make the task easier, but ultimately it’s the application of human intelligence, attention, and work that simplifies code.

So what? Why does this matter? Well, to be clearer:

Resolving code complexity usually requires detailed work at the level of the individual contributor.

If a manager just says “simplify the code!” and leaves it at that, usually nothing happens, because (a) they’re not being specific enough, (b) they don’t necessarily have the knowledge required about each individual piece of code in order to be that specific, and (c) part of understanding the problem is actually going through the process of solving it, and the manager isn’t the person writing the solution.

The higher a manager’s level in the company, the more true this is. When a CTO, Vice President, or Engineering Director gives an instruction like “improve code quality” but doesn’t get much more specific than that, what tends to happen is that a lot of motion occurs in the company but the codebase doesn’t significantly improve.

It’s very tempting, if you’re a software engineering manager, to propose broad, sweeping solutions to problems that affect large areas. The problem with that approach to code complexity is that the problem is usually composed of many different small projects that require detailed work from individual programmers. So, if you try to handle everything with the same broad solution, that solution won’t fit most of the situations that need to be handled. Your attempt at a broad solution will actually backfire, with software engineers feeling like they did a lot of work but didn’t actually produce a maintainable, simple codebase. (This is a common pattern in software management, and it contributes to the mistaken belief that code complexity is inevitable and nothing can be done about it.)

So what can you do as a manager, if you have a complex codebase and want to resolve it? Well, the trick is to get the data from the individual contributors and then work with them to help them resolve the issues. The sequence goes roughly like this:

  1. Ask each member of your team to write down a list of what frustrates them about the code. The symptoms of code complexity are things like emotional reactions to code, confusions about code, feeling like a piece will break if you touch it, difficulties optimizing, etc. So you want the answers to questions like, “Is there a part of the system that makes you nervous when you modify it?” or “Is there some part of the codebase that frustrates you to work with?”

    Each individual software engineer should write their own list. I wouldn’t recommend implementing some system for collecting the lists. Just have people write down the issues for themselves in whatever way is easiest for them. Give them a few days to write this list; they might think of other things over time.

    The list doesn’t just have to be about your own codebase, but can be about any code that the developer has to work with or use.

    You’re looking for symptoms at this point, not causes. Developers can be as general or as specific as they want, for this list.

  2. Call a meeting with your team and have each person bring their list and a computer that they can use to access the codebase. The ideal size for a team meeting like this is about six or seven people, so you might want to break things down into sub-teams.

    In this meeting you want to go over the lists and get the name of a specific directory, file, class, method, or block of code to associate with each symptom. Even if somebody says something like, “The whole codebase has no unit tests,” then you might say, “Tell me about a specific time that that affected you,” and use the response to that to narrow down what files it’s most important to write unit tests for right away. You also want to be sure that you’re really getting a description of the problem, which might be something more like “It’s difficult to refactor the codebase because I don’t know if I’m breaking other people’s modules.” Then unit tests might be the solution, but you first want to narrow down specifically where the problem lies, as much as possible. (It’s true that almost all code should be unit tested, but if you don’t have any unit tests, you’ll need to start off with some doable task on the subject.)

    In general, the idea here is that only code can actually be fixed, so you have to know what piece of code is the problem. It might be true that there’s a broad problem, but that problem can be broken down into specific problems with specific pieces of code that are affected, one by one.

  3. Using the information from the meeting, file a bug describing the problem (not the solution, just the problem!) for each directory, file, class, etc. that was named. A bug could be as simple as “FrobberFactory is hard to understand.”

    If a solution was suggested during the meeting, you can note that in the bug, but the bug itself should primarily be about the problem.

  4. Now it’s time to prioritize. The first thing to do is to look at which issues affect the largest number of developers the most severely. Those are high priority issues. Usually this part of prioritization is done by somebody who has a broad view over developers in the team or company. Often, this is a manager.

    That said, sometimes issues have an order that they should be resolved in that is not directly related to their severity. For example, Issue X has to be resolved before Issue Y can be resolved, or resolving Issue A would make resolving Issue B easier. This means that Issue A and Issue X should be fixed first even if they’re not as severe as the issues that they block. Often, there’s a chain of issues like this and the trick is to find the issue at the bottom of the stack.

    Handling this part of prioritization incorrectly is one of the most common and major mistakes in software design. It may seem like a minor detail, but in fact it is critical to the success of efforts to resolve complexity. The essence of good software design in all situations is taking the right actions in the right sequence. Forcing developers to tackle issues out of sequence (without regard for which problems underlie which other problems) will cause code complexity.

    This part of prioritization is a technical task that is usually best done by the technical lead of the team. Sometimes this is a manager, but other times it’s a senior software engineer.

    Sometimes you don’t really know which issue to tackle first until you’re doing development on one piece of code and you discover that it would be easier to fix a different piece of code first. With that said, if you can determine the ordering up front, it’s good to do so. But if you find that you’d have to get into actually figuring out solutions in order to determine the ordering, just skip it for now.

    Whether you do it up front or during development, it’s important that individual programmers do realize when there is an underlying task to tackle before the one they have been assigned. They must be empowered to switch from their current task to the one that actually blocks them. There is a limit to this (for example, rewriting the whole system into another language just to fix one file is not a good use of time) but generally, “finding the issue at the bottom of the stack” is one of the most important tasks a developer has when doing these sorts of cleanups.

  5. Now you assign each bug to an individual contributor. This is a pretty standard managerial process, and while it definitely involves some detailed work and communication, I would imagine that most software engineering managers are already familiar with how to do it.

    One tricky piece here is that some of the bugs might be about code that isn’t maintained by your team. In that case you’ll have to work appropriately through the organization to get the appropriate team to take responsibility for the issue. It helps to have buy-in from a manager that you have in common with the other team, higher up the chain, here.

    In some organizations, if the other team’s problem is not too complex or detailed, it might also be possible for your team to just make the changes themselves. This is a judgment call that you can make based on what you think is best for overall productivity.

  6. Now that you have all of these bugs filed, you have to figure out when to address them. Generally, the right thing to do is to make sure that developers regularly fix some of the code quality issues that you filed along with their feature work.

    If your team makes plans for a period of time like a quarter or six weeks, you should include some of the code cleanups in every plan. The best way to do this is to have developers first do cleanups that would make their specific feature work easier, and then have them do that feature work. Usually this doesn’t even slow down their feature work overall. (That is, if this is done correctly, developers can usually accomplish the same amount of feature work in a quarter that they could even if they weren’t also doing code cleanups, providing evidence that the code cleanups are already improving productivity.)

    Don’t stop normal feature development entirely to just work on code quality. Instead, make sure that enough code quality work is being done continuously that the codebase’s quality is always improving overall rather than getting worse over time.
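
The ordering part of step 4 lends itself to a small sketch. The Python below is one possible way to do it, not a prescribed process: it assumes each filed bug has a numeric severity (higher is worse) and an optional set of bugs that block it, and it treats a blocker as being as urgent as the worst thing it unblocks. The function name, the scores, and "Issue Z" are made up for the example; `graphlib` requires Python 3.9 or later.

```python
import heapq
from graphlib import TopologicalSorter


def prioritize(severity, blocked_by):
    """Return issue names in an order that fixes blockers before what they block."""
    # Invert the relation: which issues does each issue unblock?
    unblocks = {name: set() for name in severity}
    for issue, blockers in blocked_by.items():
        for blocker in blockers:
            unblocks[blocker].add(issue)

    graph = {name: blocked_by.get(name, set()) for name in severity}
    # static_order() lists blockers first and raises CycleError on circular chains.
    topo = list(TopologicalSorter(graph).static_order())

    # An issue is as urgent as the most severe issue anywhere above it in the stack.
    urgency = {}
    for name in reversed(topo):
        urgency[name] = max([severity[name]] + [urgency[d] for d in unblocks[name]])

    # Repeatedly take the most urgent issue that has nothing left blocking it.
    waiting_on = {name: set(graph[name]) for name in severity}
    ready = [(-urgency[name], name) for name in severity if not waiting_on[name]]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for dependent in unblocks[name]:
            waiting_on[dependent].discard(name)
            if not waiting_on[dependent]:
                heapq.heappush(ready, (-urgency[dependent], dependent))
    return order


severity = {"Issue A": 2, "Issue B": 9, "Issue X": 1, "Issue Y": 8, "Issue Z": 5}
blocked_by = {"Issue B": {"Issue A"}, "Issue Y": {"Issue X"}}
order = prioritize(severity, blocked_by)
```

With these numbers, Issue A and Issue X are scheduled ahead of the unrelated Issue Z even though they are less severe on their own, because they sit at the bottom of the stack under B and Y. In practice the blocking relationships are often only discovered during development, so a list like this is a starting point rather than a plan to enforce.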

If you do those things, that should get you well on the road to an actually-improving codebase. There’s actually quite a bit to know about this process in general—perhaps enough for another entire book. However, the above plus some common sense and experience should be enough to make major improvements in the quality of your codebase, and perhaps even improve your life as a software engineer or manager, too.

-Max

P.S. If you do find yourself wanting more help on it, I’d be happy to come speak at your company. Just let me know.

The Secret of Fast Programming: Stop Thinking

When I talk to developers about code complexity, they often say that they want to write simple code, but deadline pressure or underlying issues mean that they just don’t have the time or knowledge necessary to both complete the task and refine it to simplicity.

Well, it’s certainly true that putting time pressure on developers tends to lead to them writing complex code. However, deadlines don’t have to lead to complexity. Instead of saying “This deadline prevents me from writing simple code,” one could equally say, “I am not a fast-enough programmer to make this simple.” That is, the faster you are as a programmer, the less your code quality has to be affected by deadlines.

Now, that’s nice to say, but how does one actually become faster? Is it a magic skill that people are born with? Do you become fast by being somehow “smarter” than other people?

No, it’s not magic or in-born at all. In fact, there is just one simple rule that, if followed, will eventually solve the problem entirely:

Any time you find yourself stopping to think, something is wrong.

Perhaps that sounds incredible, but it works remarkably well. Think about it—when you’re sitting in front of your editor but not coding very quickly, is it because you’re a slow typer? I doubt it—“having to type too much” is rarely a developer’s productivity problem. Instead, the pauses where you’re not typing are what make it slow. And what are developers usually doing during those pauses? Stopping to think—perhaps about the problem, perhaps about the tools, perhaps about email, whatever. But any time this happens, it indicates a problem.

The thinking is not the problem itself—it is a sign of some other problem. It could be one of many different issues:

Understanding

The most common reason developers stop to think is that they did not fully understand some word or symbol.

This happened to me just the other day. It was taking me hours to write what should have been a really simple service. I kept stopping to think about it, trying to work out how it should behave. Finally, I realized that I didn’t understand one of the input variables to the primary function. I knew the name of its type, but I had never gone and read the definition of the type—I didn’t really understand what that variable (a word or symbol) meant. As soon as I looked up the type’s code and docs, everything became clear and I wrote that service like a demon (pun partially intended).

This can happen in almost infinite ways. Many people dive into a programming language without learning what (, ), [, ], {, }, +, *, and % really mean in that language. Some developers don’t understand how the computer really works. Remember when I wrote The Singular Secret of the Rockstar Programmer? This is why! Because when you truly understand, you don’t have to stop to think. It’s also a major motivation behind my book—understanding that there are unshakable laws to software design can eliminate a lot of the “stopping to think” moments.

So if you find that you are stopping to think, don’t try to solve the problem in your mind—search outside of yourself for what you didn’t understand. Then go look at something that will help you understand it. This even applies to questions like “Will a user ever read this text?” You might not have a User Experience Research Department to really answer that question, but you can at least make a drawing, show it to somebody, and ask their opinion. Don’t just sit there and think—do something. Only action leads to understanding.

Drawing

Sometimes developers stop to think because they can’t hold enough concepts in their mind at once—lots of things are relating to each other in a complex way and they have to think through it. In this case, it’s almost always more efficient to write or draw something than it is to think about it. What you want is something you can look at, or somehow perceive outside of yourself. This is a form of understanding, but it’s special enough that I wanted to call it out on its own.

Starting

Sometimes the problem is “I have no idea what code to start writing.” The simplest solution here is to just start writing whatever code you know that you can write right now. Pick the part of the problem that you understand completely, and write the solution for that—even if it’s just one function, or an unimportant class.

Often, the simplest piece of code to start with is the “core” of the application. For example, if I was going to write a YouTube app, I would start with the video player. Think of it as an exercise in continuous delivery—write the code that would actually make a product first, no matter how silly or small that product is. A video player without any other UI is a product that does something useful (play video), even if it’s not a complete product yet.

If you’re not sure how to write even that core code yet, then just start with the code you are sure about. Generally I find that once a piece of the problem becomes solved, it’s much easier to solve the rest of it. Sometimes the problem unfolds in steps—you solve one part, which makes the solution of the next part obvious, and so forth. Whichever part doesn’t require much thinking to create, write that part now.

Skipping a Step

Another specialized understanding problem is when you’ve skipped some step in the proper sequence of development. For example, let’s say our Bike object depends on the Wheels, Pedals, and Frame objects. If you try to write the whole Bike object without writing the Wheels, Pedals, or Frame objects, you’re going to have to think a lot about those non-existent classes. On the other hand, if you write the Wheels class when there is no Bike class at all, you might have to think a lot about how the Wheels class is going to be used by the Bike class.

The right solution there would be to implement enough of the Bike class to get to the point where you need Wheels. Then write enough of the Wheels class to satisfy your immediate need in the Bike class. Then go back to the Bike class, and work on that until the next time you need one of the underlying pieces. Just like the “Starting” section, find the part of the problem that you can solve without thinking, and solve that immediately.
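
Here is roughly what the code might look like partway through that back-and-forth, as a Python sketch. The `ride` and `spin` methods and the wheel count are invented for the example; the point is only that Wheels contains exactly what Bike has needed so far, and Pedals and Frame do not exist yet.

```python
class Wheels:
    """Only what Bike has needed so far: something that can spin."""

    def __init__(self, count=2):
        self.count = count
        self.rotations = 0

    def spin(self, turns):
        self.rotations += turns


class Bike:
    def __init__(self):
        # Written first, up to the point where Wheels was needed.
        self.wheels = Wheels()

    def ride(self, pedal_strokes):
        # Pedals and Frame are not written yet; they get added when
        # Bike reaches a point where it cannot continue without them.
        self.wheels.spin(pedal_strokes)
        return self.wheels.rotations
```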

Don’t jump over steps in the development of your system and expect that you’ll be productive.

Physical Problems

If I haven’t eaten enough, I tend to get distracted and start to think because I’m hungry. It might not be thoughts about my stomach, but I wouldn’t be thinking if I were full—I’d be focused. This can also happen with sleep, illness, or any sort of body problem. It’s not as common as the “understanding” problem from above, so first always look for something you didn’t fully understand. If you’re really sure you understood everything, then physical problems could be a candidate.

Distractions

When a developer becomes distracted by something external, such as noise, it can take some thinking to remember where they were in their solution. The answer here is relatively simple—before you start to develop, make sure that you are in an environment that will not distract you, or make it impossible for distractions to interrupt you. Some people close the door to their office, some people put on headphones, some people put up a “do not disturb” sign—whatever it takes. You might have to work together with your manager or co-workers to create a truly distraction-free environment for development.

Self-doubt

Sometimes a developer sits and thinks because they feel unsure about themselves or their decisions. The solution to this is similar to the solution in the “Understanding” section—whatever you are uncertain about, learn more about it until you become certain enough to write code. If you just feel generally uncertain as a programmer, it might be that there are many things to learn more about, such as the fundamentals listed in Why Programmers Suck. Go through each piece you need to learn until you really understand it, then move on to the next piece, and so on. There will always be learning involved in the process of programming, but as you know more and more about it, you will become faster and faster and have to think less and less.

False Ideas

Many people have been told that thinking is what smart people do, so they stop to think in order to make intelligent decisions. However, this is a false idea. If thinking alone made you a genius, then everybody would be Einstein. Truly smart people learn, observe, decide, and act. They gain knowledge and then use that knowledge to address the problems in front of them. If you really want to be smart, use your intelligence to cause action in the physical universe. Don’t use it just to think great thoughts to yourself.

Caveat

All of the above is the secret to being a fast programmer when you are sitting and writing code. If you are caught up all day in reading email and going to meetings, then no programming happens whatsoever. That’s a different problem. Some aspects of it are similar (it’s a bit like the organization “stopping to think”), but it’s not the same.

Still, there are some analogous solutions you could try. Perhaps the organization does not fully understand you or your role, which is why they’re sending you so much email and putting you in so many meetings. Perhaps there’s something about the organization that you don’t fully understand, such as how to go to fewer meetings and get less email. :-) Maybe even some organizational difficulties can be resolved by adapting the solutions in this post to groups of people instead of individuals.

-Max

Make It Never Come Back

When solving a problem in a codebase, you’re not done when the symptoms stop. You’re done when the problem has disappeared and will never come back.

It’s very easy to stop solving a problem when it no longer has any visible symptoms. You’ve fixed the bug, nobody is complaining, and there seem to be other pressing issues. So why continue to do work on it? It’s fine for now, right?

No. Remember that what we care about the most in software is the future. The way that software companies get into unmanageable situations with their codebases is by not really handling problems until they are done.

This also explains why some organizations cannot get their tangled codebase back into a good state. They see one problem in the code, they tackle it until nobody’s complaining anymore, and then they move on to tackling the next symptom they see. They don’t put a framework in place to make sure the problem is never coming back. They don’t trace the problem to its source and then make it vanish. Thus their codebase never really becomes “healthy.”

This pattern of failing to fully handle problems is very common. As a result, many developers believe it is impossible for large software projects to stay well-designed–they say, “All software will eventually have to be thrown away and re-written.”

This is not true. I have spent most of my career either designing sustainable codebases from scratch or refactoring bad codebases into good ones. No matter how bad a codebase is, you can resolve its problems. However, you have to understand software design, you need enough manpower, and you have to handle problems until they will never come back.

In general, a good guideline for how resolved a problem has to be is:

A problem is resolved to the degree that no human being will ever have to pay attention to it again.

Accomplishing this in an absolute sense is impossible–you can’t predict the entire future, and so on–but that’s more of a philosophical objection than a practical one. In most practical circumstances you can effectively resolve a problem to the degree that nobody has to pay attention to it now and there’s no immediately-apparent reason they’d have to pay attention to it in the future either.

Example

Let’s say you have a web page and you write a “hit counter” for the site that tracks how many people have visited it. You discover a bug in the hit counter–it’s counting 1.5 times as many visits as it should be counting. You have a few options for how you could solve this:

You could ignore the problem.
The rationale here would be that your site isn’t very popular and so it doesn’t matter if your hit counter is lying. Also, it’s making your site look more successful than it is, which might help you.

The reason this is a bad solution is that there are many future scenarios in which this could again become a problem–particularly if your site becomes very successful. For example, a major news publication publishes your hit numbers–but they are false. This causes a scandal, your users lose trust in you (after all, you knew about the problem and didn’t solve it) and your site becomes unpopular again. One could easily imagine other ways this problem could come back to haunt you.

You could hack a quick solution.
When you display the hits, just divide them by 1.5 and the number is accurate. However, you didn’t investigate the underlying cause, which turns out to be that it counts 3x as many hits from 8:00 to 11:00 in the morning. Later your traffic pattern changes and your counter is completely wrong again. You might not even notice for a while because the hack will make it harder to debug.

Investigate and resolve the underlying cause.
You discover it’s counting 3x hits from 8:00 to 11:00. You discover this happens because your web server deletes many old files from the disk during that time, and that interferes with the hit counter for some reason.

At this point you have another opportunity to hack a solution–you could simply disable the deletion process or make it run less frequently. But that’s not really tracing down the underlying cause. What you want to know is, “Why does it miscount just because something else is happening on the machine?”

Investigating further, you discover that if you interrupt the program and then restart it, it will count the last visit again. The deletion process was using so many resources on the machine that it was interrupting the counter two times for every visit between 8:00 and 11:00. So it counted every visit three times during that period. But actually, the bug could have added an unbounded (or at least unpredictable) number of extra counts depending on the load on the machine.

You redesign the counter so that it counts reliably even when interrupted, and the problem disappears.
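As a rough illustration of what "counts reliably even when interrupted" could look like, here is a minimal Python sketch. It is not the actual redesign from the example, which isn't described in detail. It assumes every visit carries a unique ID (a hypothetical visit_id), and the names NaiveHitCounter, HitCounter, and replay_visits are invented for this illustration. The idea is to make counting idempotent, so that re-processing the last visit after a restart cannot change the total.

```python
class NaiveHitCounter:
    """The buggy design: every call adds one, even for a replayed visit."""

    def __init__(self):
        self.hits = 0

    def record(self, visit_id):
        self.hits += 1


class HitCounter:
    """One possible fix: remember visit IDs so a replay is a no-op.

    Assumption: each visit has a unique ID. A real counter would also
    need to persist this state; here it lives only in memory.
    """

    def __init__(self):
        self._seen_visit_ids = set()

    def record(self, visit_id):
        # Adding an ID that is already present leaves the set unchanged.
        self._seen_visit_ids.add(visit_id)

    @property
    def hits(self):
        return len(self._seen_visit_ids)


def replay_visits(counter, visit_ids, interruptions_per_visit=0):
    """Simulate the bug's trigger: each interruption and restart
    causes the last visit to be processed again."""
    for visit_id in visit_ids:
        counter.record(visit_id)
        for _ in range(interruptions_per_visit):
            counter.record(visit_id)


visits = ["v1", "v2", "v3", "v4"]

naive = NaiveHitCounter()
replay_visits(naive, visits, interruptions_per_visit=2)
print(naive.hits)  # 12: every visit counted three times

fixed = HitCounter()
replay_visits(fixed, visits, interruptions_per_visit=2)
print(fixed.hits)  # 4: replays no longer inflate the count
```

With this kind of design the result no longer depends on how often the machine interrupts the counter, which is the property that matters here.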

Obviously the right choice from that list is to investigate the underlying cause and resolve it. That causes the problem to vanish, and most developers would believe they are done there. However, there’s still more to do if you really want to be sure the problem will never again require human attention.

First off, somebody could come along and change the code of the hit counter, reverting it to a broken state in the future. Obviously the right solution for that is to add an automated test that verifies the correct functioning of the hit counter even when it is interrupted. Then you make sure that test runs continuously and alerts developers when it fails. Now you’re done, right?
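A test like that might look something like the following sketch, written with Python's standard unittest module. The HitCounter class here is a hypothetical stand-in for the real counter (it assumes each visit has a unique ID), and an interruption is simulated simply by recording the same visit twice, as a restart would.

```python
import unittest


class HitCounter:
    """Hypothetical counter under test: replayed visit IDs are ignored."""

    def __init__(self):
        self._seen_visit_ids = set()

    def record(self, visit_id):
        self._seen_visit_ids.add(visit_id)

    @property
    def hits(self):
        return len(self._seen_visit_ids)


class HitCounterInterruptionTest(unittest.TestCase):
    def test_replayed_visit_is_counted_once(self):
        counter = HitCounter()
        counter.record("visit-1")
        # Simulate an interruption: after a restart, the last visit
        # is processed a second time.
        counter.record("visit-1")
        self.assertEqual(counter.hits, 1)

    def test_distinct_visits_are_each_counted(self):
        counter = HitCounter()
        counter.record("visit-1")
        counter.record("visit-2")
        self.assertEqual(counter.hits, 2)
```

Saved in a test file, this can be run with python -m unittest, and the same command can be wired into whatever system runs your tests continuously.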

Nope. Even at this point, there are some future risks that have to be handled.

The next issue is that the test you’ve written has to be easy to maintain. If the test is hard to maintain–it changes a lot when developers change the code, the test code itself is cryptic, it would be easy for it to return a false positive if the code changes, etc.–then there’s a good chance the test will break or somebody will disable it in the future. Then the problem could again require human attention. So you have to ensure that you’ve written a maintainable test, and refactor the test if it’s not maintainable. This may lead you down another path of investigation into the test framework or the system under test, to figure out a refactoring that would make the test code simpler.

After this you have concerns like the continuous integration system (the test runner)–is it reliable? Could it fail in a way that would make your test require human attention? This could be another path of investigation.

All of these paths of investigation may turn up other problems that then have to be traced down to their sources, which may turn up more problems to trace down, and so on. You may find that you can discover (and possibly resolve) all your codebase’s major issues just by starting with a few symptoms and being very determined about tracing down underlying causes.

Does anybody really do this? Yes. It might seem difficult at first, but as you resolve more and more of these underlying issues, things really do start to get easier and you can move faster and faster with fewer and fewer problems.

Down the Rabbit Hole

Beyond all of this, if you really want to get adventurous, there’s one more question you can ask: why did the developer write buggy code in the first place? Why was it possible for a bug to ever exist? Is it a problem with the developer’s education? Was it something about their process? Should they be writing tests as they go? Was there some design problem in the system that made it hard to modify? Is the programming language too complex? Are the libraries they’re using not well-written? Is the operating system not behaving well? Was the documentation unclear?

Once you get your answer, you can ask what the underlying cause of that problem is, and continue asking that question until you’re satisfied. But beware: this can take you down a rabbit hole and into a place that changes your whole view of software development. In fact, theoretically this system is unlimited, and would eventually result in resolving the underlying problems of the entire software industry. How far you want to go is up to you.