Disclosure: This post contains affiliate links.
- Understand the internals of your OS
- Learn to use more advanced debugging tools
- Expose yourself to a greater variety of code
- Explain your code out loud
- Learn to identify code smell
Bill told his team that he would have the bug fixed by the end of the day. That was 9 hours ago. He can’t reproduce the crash reliably, and when it does happen, his code is not even in the call stack. “Surely the built-in socket class that keeps showing up in the stack trace is not broken,” he keeps telling himself. He pries his face off the keyboard and starts the tedious process of commenting out code, method by method, until the bug disappears. He doesn’t know what’s wrong with that 10-line method, but he rewrites it. Now everything works. He commits his fix with the message “Fixed strange crash caused by initialization method” and goes home without learning that his problem was heap corruption caused by code that interops with an unmanaged DLL. Had he known how to use WinDbg and PageHeap, he would have fixed the issue within a few hours.
The first debugging technique most of us learn is setting breakpoints in the IDE at suspicious locations and stepping through the code until we notice a problem. In my experience, a lot of developers, like Bill, never get any better than that. When they are unable to find the bug with breakpoints, they resort to removing code until the bug goes away. When they run out of time, they hack a workaround. This is unfortunate, because being unable to track down and understand problems in your own product is not a sign of professional craftsmanship. The more challenging bugs start out as mysteries that need solving. When a criminal investigator is solving a murder, she doesn’t have only one tool for doing so. She has interrogation skills, profiling skills, forensic science, and experience from past cases. She does not randomly pick up things at the scene hoping they will turn out to be evidence. She knows where to start looking. Just like a criminal investigator, a defect investigator should have tools at his disposal and skills that will guide him to an answer.
The purpose of this article is to point you in the direction of acquiring new tools and learning new skills, so that you can improve at tracking down mysterious bugs. Although some of my examples and suggestions assume you spend most of your time debugging on Windows, these concepts can be generalized to other systems.
1. Understand the internals of your OS
If you are developing code that runs on Windows and you don’t understand what a handle is or what the different types of handles are, then take some time and learn. You may think that because all of your code runs on the JVM or is managed by .NET that these things are irrelevant. This is not true. Don’t forget that at some point there is a running process making system calls as a result of your code.
A coworker approached me one day asking for advice about a memory leak he was trying to track down in some legacy .NET code. Whenever there was a network failure, the server application’s memory usage would explode until it finally crashed with an OutOfMemoryException. I noticed in Task Manager that the handle count was growing out of control, so I launched Process Explorer to take a deeper look. It turned out that the offending handles were all file handles named “\device\afd.” I knew from experience that these were objects related to the Winsock Ancillary Function Driver, so the application appeared not to be cleaning up sockets. He showed me the class that constructs the sockets, and we quickly realized that exceptions were handled incorrectly, causing a socket creation spiral. The mystery was solved in under 30 minutes thanks to a bit of knowledge of the internals of Windows.
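The original .NET code is not shown here, so the following is only a minimal Python sketch of the general failure pattern: a retry loop that creates a new socket on every attempt and abandons it when the connection fails. `FakeSocket`, `connect_leaky`, `connect_clean`, and the address are all hypothetical names invented for illustration. The fake class simply counts open handles, so the example runs without a network.

```python
class FakeSocket:
    """Hypothetical stand-in for an OS socket; counts open handles."""
    open_handles = 0

    def __init__(self):
        FakeSocket.open_handles += 1
        self.closed = False

    def connect(self, address):
        # Simulate a network outage: every connection attempt fails.
        raise ConnectionError(f"cannot reach {address}")

    def close(self):
        if not self.closed:
            self.closed = True
            FakeSocket.open_handles -= 1


def connect_leaky(address, attempts):
    """Retries on failure but never closes the sockets that failed."""
    for _ in range(attempts):
        sock = FakeSocket()
        try:
            sock.connect(address)
            return sock
        except ConnectionError:
            continue  # bug: 'sock' is abandoned while its handle stays open
    return None


def connect_clean(address, attempts):
    """Same retry loop, but a failed socket is closed before retrying."""
    for _ in range(attempts):
        sock = FakeSocket()
        try:
            sock.connect(address)
            return sock
        except ConnectionError:
            sock.close()
    return None


connect_leaky("10.0.0.1:9000", attempts=5)
leaked = FakeSocket.open_handles      # handles left open by the leaky loop

FakeSocket.open_handles = 0           # reset the counter for the second run
connect_clean("10.0.0.1:9000", attempts=5)
remaining = FakeSocket.open_handles   # handles left open by the fixed loop

print(leaked, remaining)  # prints "5 0"
```

In a real application the counter would be the process handle count you watch in a tool such as Process Explorer, and the fix would be whatever deterministic cleanup your platform offers for sockets.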
Having some knowledge about things such as the network stack, the security model, system calls, memory management, and kernel objects will give you a significant advantage when strange bugs occur because you will have an idea of where to start looking. Awareness of these things will also help prevent you from writing those bugs in the first place.
If you are looking for a crash course on Windows internals, I highly recommend the Pluralsight lecture series by Pavel Yosifovich. It is about 15 to 17 hours of lectures and examples, and it will introduce you to all kinds of useful tools you can then use to smash bugs. For me, this course was well worth the subscription. It guided me through topics I probably would not have found on my own and introduced them in an order that felt natural.
If you want an even deeper understanding, or you just prefer books over lectures, I recommend the Windows Internals series of books. Co-author Mark Russinovich is the author of many popular diagnostic tools that we still rely on heavily today.
2. Learn to use more advanced debugging tools
Imagine a world where users submit bug reports and then you get to go back in time and attach a debugger to the process on the production machine and solve the problem. Being able to perform a crash dump analysis is very close to that; yet, this is a skill that many developers never acquire. Instructing users to submit crash dumps with their bug reports is not impossible. If they can get the task manager open, they can handle it. In many cases you could even get away with some kind of global exception handler that prompts the user to select whether or not he wants the process to send a dump to the developers before terminating.
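As a rough illustration of the global exception handler idea, here is a minimal Python sketch. It is an analogy, not a real crash dump: it saves only the stack trace as text, whereas a Windows minidump captures the process memory. The names `write_crash_report` and `crash_hook` and the report file name are assumptions made up for this example. The consent prompt is left out to keep the sketch non-interactive.

```python
import sys
import tempfile
import traceback
from datetime import datetime, timezone
from pathlib import Path


def write_crash_report(exc_type, exc_value, exc_tb, directory=None):
    """Write the exception and its stack trace to a file; return the path."""
    directory = Path(directory or tempfile.gettempdir())
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    report_path = directory / f"crash-{stamp}.txt"
    lines = traceback.format_exception(exc_type, exc_value, exc_tb)
    report_path.write_text("".join(lines), encoding="utf-8")
    return report_path


def crash_hook(exc_type, exc_value, exc_tb):
    """Global handler: save a report, tell the user, then run the default hook."""
    path = write_crash_report(exc_type, exc_value, exc_tb)
    print(f"Crash report saved to {path}; please attach it to your bug report.",
          file=sys.stderr)
    sys.__excepthook__(exc_type, exc_value, exc_tb)


# Install the handler for every uncaught exception in the main thread.
sys.excepthook = crash_hook
```

Once the hook is installed, any uncaught exception leaves a report file behind before the process exits. A production version would ask the user for permission before sending anything to the developers.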
If you don’t know how to analyze a running process (or a memory dump of a process) well enough to locate memory leaks, performance bottlenecks, and access violations, it is time to get started. If you are a .NET developer, I highly recommend you go through Tess Ferrandez’s debugging labs.
If you develop for Windows, but not necessarily .NET, I still recommend you get familiar with the “Debugging Tools For Windows” that are included in the Windows SDK. Most importantly, familiarize yourself with WinDbg. It gives you the power to investigate just about anything you can think of in a running process or a crash dump. I also recommend learning how to use GFlags (with PageHeap) simply because I have fixed many access violation crashes with that technique.
3. Expose yourself to a greater variety of code
There is more to fixing bugs than tools. It also helps to have experience. Need practice? Pick an open source project you use, or find interesting, and start contributing bug fixes. Even if someone else gets to a solution before you do, you still learn something new.
One of the reasons this is so important is that, unless you always work in isolation, you will be in situations that force you to understand code you did not write. The better you get at this, the more efficient you will be at getting things done.
When debugging code, especially code written by someone else, try to avoid telling yourself “this is impossible, this makes no sense.” Instead, ask yourself if there is anything about the surrounding code you don’t fully understand and make the effort to learn it. For example, I was helping someone debug a problem and I noticed an incorrect usage of the C++ Standard Library. The person I was assisting had overlooked the bug because the call was to a standard method, and had decided it was not worth investigating. Don’t do that. Read the code. If it doesn’t make sense, take the effort to figure it out.
4. Explain your code out loud
I cannot remember how many times I have asked a colleague to help me fix a bug, only to find the cause myself while explaining the code to them. Since that person turned out not to be needed in the first place, we can save man-minutes by testing it out on an inanimate object (this technique is often described as “rubber ducking”). Out of all the debugging advice given in The Pragmatic Programmer (a nice read, despite its age), this is my favorite. If talking to a rubber duck is too embarrassing for you, you can start typing up your problem as if you were asking for help over email or on Stack Overflow. This takes longer than rubber ducking, but if it ends up not giving you any leads, you still have something useful to attach to the bug in your bug reporting system and a good presentation of the problem if you do end up having to call your colleague.
5. Learn to identify code smell
If you can imagine source code having an odor, you might have encountered modules reeking of Limburger cheese. However, not all troublesome code has such a strong scent. Sometimes it takes a trained nose to sniff it out. Code smell is a term that was introduced to me in Refactoring by Martin Fowler. It refers to code that needs to be fixed because it has grown too complicated, or too coupled, or for any other reason that makes it difficult to maintain. The book is a bit old and written with Java in mind, but I still encounter the code smells listed there (excessive usage of switch statements, inappropriate subclassing, etc.) very frequently during code reviews, and many of the prescribed solutions remain appropriate for any object-oriented language. For this reason, I tend to recommend this book to new developers struggling with code quality.
In order to be efficient at solving bugs, you have to know which parts of the code to spend most of your time studying and questioning. When in doubt, search for anti-patterns or code smells, because your bug is far more likely to be related to that code. Identifying a code smell is only the start; you have to learn how to fix it as well! When you find out the bug exists because of a missing conditional in a 100-case switch statement, you know that changing it to a 101-case switch statement is usually a bad idea. Good developers make fixes that help ensure a repeat bug will not occur in the future.
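One way to avoid growing the switch statement is sketched below in Python, with two cases standing in for the hundred. Note that this is a lighter-weight substitute and not the remedy prescribed in Refactoring, which favors replacing the conditional with polymorphism. The command names, handler functions, and `dispatch` helper are hypothetical and exist only for illustration.

```python
def handle_open(payload):
    return f"opened {payload}"


def handle_close(payload):
    return f"closed {payload}"


# Each case lives in one table entry; adding a case means adding a row,
# not editing a long conditional.
HANDLERS = {
    "open": handle_open,
    "close": handle_close,
}


def dispatch(command, payload):
    """Look up the handler for a command and fail loudly if none exists."""
    try:
        handler = HANDLERS[command]
    except KeyError:
        # A missing case becomes an explicit error instead of a silent
        # fall-through that surfaces later as a mysterious bug.
        raise ValueError(f"unsupported command: {command!r}") from None
    return handler(payload)


print(dispatch("open", "report.txt"))  # prints "opened report.txt"
```

The point of the design is that a forgotten case can no longer hide: an unknown command raises immediately at the dispatch point, which is much easier to trace than a branch that was quietly skipped.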
Although reading about good and bad coding practices is a good start, it will only get you so far. Attending code reviews with more experienced developers, or pair programming with a developer who has already mastered this technique, is a great way to improve. Treat each bug as a learning experience. When you find the solution, be sure to ask what changes could have been made to prevent the bug from occurring in the first place. You will start to notice recurring patterns. You will start to notice code smell.
Now get out there and fix all the things!
Curtis Humphrey says
Great post. Another couple of points to add: add tests, since they reduce the need to debug. Make more of your code pure functions, which makes it easier to test. Also, the sooner you see a bug after it was created, and the quicker the feedback loop between changing code and seeing whether it fixed the bug, the better.