Things We Still Do, Twenty Years Onward

Previously posted on DZone
Joel Spolsky’s once-prolific blogging output dried up years ago, but Things You Should Never Do, Part I is still a classic after 22 years. He wrote it as an outsider’s postmortem following the first beta release of version 6 of Netscape’s browser, three years after the previous major release, version 4. There never was a version 5. The team had decided on a full rewrite, and the resulting delay probably cost them their competitive advantage over Microsoft’s Internet Explorer.

“If Netscape actually had some adult supervision with software industry experience, they might not have shot themselves in the foot so badly,” he closes.

I like reading computing history and considering to what degree those articles are still relevant today. Here’s the gist of the original argument in my own words.

By stalling development on their current product, Netscape appeared to have closed for business, at least from the end user’s perspective. That was strategically disastrous. It made no business sense to abandon their flagship project like that. From a technical standpoint, it was even worse to throw away all working code and start from scratch. Most code is harder to read than to write, but you should still always prefer refactoring to rewriting. A full rewrite is no guarantee that you will not reintroduce the very bugs that were so painstakingly discovered and fixed in the old code base.

I cannot find fault with the commercial rationale for continuing to work on the old version while you build the new one. You cannot leave existing customers waiting for three years, even when adding features and performing fixes on the legacy product is a duplication of effort. It’s a pricey option for which you need deep pockets or other cash cows in your stable. The technical objection to a full rewrite, as opposed to careful refactoring, is more interesting and contentious.

Let’s concentrate on that.

Joel didn’t have access to Netscape’s source code. Neither do I, so we can only speculate whether it truly was irredeemable. But it probably was. Why throw away working software otherwise? It is important here to distinguish code that is unworkable for the developers from a user experience that is slowly degrading. I believe there was no indication of the latter. Netscape users didn’t notice the architectural train wrecks underneath while the developers cursed and suffered. If you want management buy-in for a rigorous clean-up, customer complaints carry more weight than developer grumbling. So, it appears the developers were vocal enough to get their way and optimistically thought they could do a good job in less time than it took to do it badly. They couldn’t.

Can a large body of source code be so unmaintainable as to warrant a full rewrite? They thought so at Netscape. I think so too. If you have been in software long enough, you will have experienced the projects I refer to. The time it takes to grasp a large code base well enough to safely extend or modify it is never linear in the number of lines of code. In poorly designed, untested, and undocumented projects it grows exponentially, especially after the original developers have left. You can only expect to bring everything up to snuff with renaming and reshuffling alone if the team collectively has enough grasp of the requirements and the business domain. Absent that, you may as well start over. A hopeless situation is when you understand fully what the code does but have no clue why.

Whether you try to improve on the old code base or write all new code, there’s a messiness threshold beyond which you are always forced back to the drawing board. You must first understand what to fix. During that process, it rarely makes sense to settle for a functionally identical copy. You will draw from lessons learned. You will improve awkward features, add useful new ones, and perhaps get rid of unused bloat.

Experience of the product in the field is valuable input towards incremental improvements, but these user findings don’t justify a rewrite. A prominent cause of code that does need one is the short-term, ship-asap mentality, which can only result in a death spiral of crushing technical debt. You have no one to blame but yourselves, and yet we can’t seem to prevent it. Maybe that is because it touches on many aspects of the human psyche that are not taught in CS programs.

There is, however, a legitimate cause for a rewrite that is not your fault and happens a lot: having bet on the wrong horse. Tools, frameworks, even entire languages become obsolete, and ever faster. Java Swing and Google Web Toolkit were once competing products for building complex user interfaces with Java. They are going, or have gone, the way of Betamax video: once good and mature technologies, but the world prefers other ways of doing it. You cannot refactor a GWT app to Angular or React. It must be a rewrite.

There is some solace. Porting a well-designed legacy system that is not riddled with bad habits to a different ecosystem is not the same as starting completely from scratch. Even when writing all-new source code is, strictly speaking, a rewrite, it needn’t be a full rethink. Most web frameworks, like those in the previous example, have a component-based architecture that separates control logic from view templates. You can probably retain parts of the conceptual model, package structure, and naming schemes.

Pessimists will say that if you messed up mightily before, you are bound to make the same mistakes next time around. I take a middle ground. There’s no guarantee that you won’t, but neither is it certain that you will. Just don’t have the same people at the helm who were responsible for running the ship aground, unless you can be sure that they have learned from their mistakes.

Even if you fear that history can only repeat itself, a rewrite still means greater job satisfaction and employee retention. That has to count for something. Good developers walk out if all they can do is stick duct tape on other people’s leaking contraptions. The industry is full of Frankenstein prototypes where 70% of the lines are still traceable to a single pioneer developer who either burned out, got bored, or was let go.

Let me close by saying something that all you clean code mavens will shudder at. In some markets, it makes eminent sense to amass market share on the cheap asap and to do it by cutting quality corners. The old sales maxim still holds. We can give you fast, cheap, and good: pick two. Doing it well means someone else may beat you to it, so move fast and make broken things. This first wave of caffeinated coders are your commando troops. Robert Cringely drew a fitting military analogy in Accidental Empires:

A start-up’s biggest advantage is speed […] [Commandos] work hard, fast and cheap, though often with a low level of professionalism, which is okay, too, because professionalism is expensive. […] Ideally, they do this by building a prototype of a product that is so creative, so exactly correct for its purpose that by its very existence it leads to the destruction of other products.

The duct tape wizards shouldn’t run the show for too long. Their mode of working is not sustainable. But if they hit the jackpot, then you should invest your winnings in a new team of responsible adults who take the time to do things well.