I saw this question about technical debt on Hacker News. I wanted to write a reply there, but I realized what I wanted to say was long enough to be a blog post, so here it is.

In my experience, technical debt is the fault of just two groups of people: the managers and the developers, and between the two I might assign more blame to the latter than to the former. Managers don’t understand the problem and ultimately don’t care about it. Developers created the problem because they generally don’t know how to avoid it.

Managers

The manager pushes the team. It doesn’t matter if the push is too hard or the pace is unsustainable in the long run. What matters is always the here and now: the next release, the next deadline, the next demo. There’s an angry customer on the phone whose issue needs to be dealt with now. The sales team has promised a feature to a big customer that we need to deliver now. If we can just have the appearance of success at the next opportunity, we can figure out everything else later. And, of course, “later” quickly becomes “never”, because problems compound and the amount of time the developers quote for a course correction just keeps going up and up.

Maybe the manager gets lucky, and the team can keep muddling through. The building pile of problems isn’t an insurmountable one and a few plucky developers are able to fix a few things here and there en passant. The pace of things slows down and the amount of time quoted for even basic tasks goes up, but that’s just the way that things are and we probably just need to squeeze those lazy developers a little harder.

On the other hand, maybe the manager gets unlucky and the team completely implodes. That’s actually not such a bad place either, because the manager amends his resume with the line “helped a software development team achieve a fast pace of development and hit important milestones”, leaving off the part about the pace being unsustainable or the period of hitting milestones being finite and leading to catastrophe. A resume should tell the truth, but it doesn’t need to tell the whole truth. Now your manager has jumped ship on the strength of his updated resume, along with a lot of your development team who are frustrated by the mire and looking for greener pastures. But, look on the bright side, you did have a pretty good stretch of successes along the way, right?

Managers are often motivated in perverse ways. Hitting the development milestones or budget targets 10 times in a row is pretty good, but blowing past them once is a terrible disaster. Technical debt and long-term sustainability of the pace and the team probably don’t come up at all in the annual performance review. The company wants long-term success, so they create a roadmap with small milestones, and then managers prioritize the short-term milestones over the long-term strategic goals. Every day we sweep a little bit of dust under the rug, and eventually somebody trips on the lump of dust and breaks a leg.

Before you ask: no, I don’t know how to fix this problem. I’m not sure a fix is possible. The company can’t wait until the end of a 10-year roadmap to determine if they were on pace, or if management has been doing a good job.

Developers

Developers, from your most respected architects and tech leads down to your lowliest junior developers, created this problem. Managers don’t sit down at the keyboard and screw up the code directly; the developers have done that. The problem is that “rightness” is considered a separate process, a second step, beyond “doneness”. The code is declared “done” first, with the expectation that the developers can go back and make it “right” later. Later, in development as in management, never comes. There’s never a lull in the work. The ticket queue is never empty. There’s a constant stream of incoming bug fixes and new feature requests to contend with. The idea that a developer will have time to go back and correct a mistake later is a laughable one.

Developers appeal to the manager to “build some time into the schedule” to fix all those past mistakes, but that’s going to blow the release schedule and the manager can’t have that (see above). From the manager’s perspective, the developers screwed it up the first time and now promise that, cross our hearts and hope to die, they’ll get it right the second time. For realsies this time. And when things get extra bad the developers start clamoring for a “complete rewrite”, which sounds awfully expensive. What manager in her right mind will trust development of a completely new system to the same developers who so completely screwed up the first one?

The problem, in a nutshell, is that developers mark a feature as being “complete” when the code “just works” but before it’s considered to be “right”. The fact that the developers think they’re forced to do this because of time pressure from an oblivious and uncaring manager is inconsequential. As a developer myself, it’s my responsibility to provide the right solutions to the provided problems (in the domain where I work, it’s frequently expected that the solutions I provide will be software-based, but they don’t always need to be; that is a discussion for another time). If I abdicate part of this responsibility by allowing my own work not to be correct, I have failed the team and failed myself.

Technical debt doesn’t happen in discrete events; it’s a buildup of failures in the process over time. It’s choosing to ignore a problem when you see it, putting in a workaround instead of a proper fix, or (more devastatingly) failing to recognize problems when they occur. It’s about walking blindly into a bad situation, and then complaining that you don’t have the time to walk back out. Managers are not typically allies in this. The company wants them to be, but then provides motivations against it. The only way to make sure that things are done right is to actually do them right in the first place. Code doesn’t leave your workstation until it’s “correct”, whatever that means to you.

Avoidance

If developers don’t know how to avoid technical debt, it is a problem that can never be solved. The fix is going to create more problems. The big rewrite is going to turn out just as bad if not worse. The average developer created the problem of technical debt, and half of all developers are worse than average.

Developers really need to study: learn your principles, patterns and anti-patterns. Study of anti-patterns teaches you to recognize when code is not going well and a refactor is required. Study of patterns can teach you some goals to get things organized again. Study of development principles can provide good rules of thumb to follow to avoid the most serious problems. Learn what refactoring is and how to do it. It’s not just “generally making things better”, as some developers seem to think. There are specific transformations, often with descriptive names, which can be followed in sequence to produce better code without changing what the code does. “I’m just going to copy and paste some things around until it looks better to me” isn’t refactoring and isn’t improvement, and your manager is right not to allocate extra time for this.
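To make that concrete, here is a minimal sketch in Python of one such named transformation, usually called Extract Function (or Extract Method). The invoice example and every function name in it are made up for illustration. The only point is that the behavior stays the same while the structure improves.

```python
# Before: one function both computes the total and formats the message.
def invoice_summary_before(prices_in_cents, tax_percent):
    subtotal = 0
    for price in prices_in_cents:
        subtotal += price
    total = subtotal + subtotal * tax_percent // 100
    return f"Total due: {total} cents"


# After "Extract Function": the calculation gets its own named function,
# so it can be tested and reused on its own.
def total_with_tax(prices_in_cents, tax_percent):
    subtotal = sum(prices_in_cents)
    return subtotal + subtotal * tax_percent // 100


def invoice_summary_after(prices_in_cents, tax_percent):
    total = total_with_tax(prices_in_cents, tax_percent)
    return f"Total due: {total} cents"


# Same input, same output: that is what makes it a refactoring.
print(invoice_summary_before([1000, 250], 10))
print(invoice_summary_after([1000, 250], 10))
```

Both versions return the same string for the same input, and you can check that with a test before and after the change. That is the difference between a refactoring and shuffling code around until it looks nicer.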

With experience, a developer should learn to identify bad situations and be able to confidently improve them. With even more experience, a good developer should be able to anticipate bad situations and avoid them from the start. A “senior developer” who is racking up technical debt on a daily basis and claiming the only way out is a rewrite is a failure and a serious liability.

Managers need to listen when the developers are saying that the pace is unsustainable or when a small success may lead to a large failure later. I don’t know that this will ever become the norm, because the motivation framework isn’t there to support this behavior. “Skip-level” meetings between the development team and the manager’s manager may help to ensure that the long term isn’t being sacrificed for the short term. The manager also needs to be able to recognize real experience and real value. Maybe you need to get rid of developers who are causing more harm than benefit, or maybe you just need to keep your least capable developers away from the most complicated and mission-critical code.