I agree that consistency is important — but what about when the existing codebase is already inconsistent? Even worse, what if the existing codebase is both inconsistent and the "right way to do things" is undocumented? That's much closer to what I've experienced when joining companies with lots of existing code.
In this scenario, I've found that the only productive way forward is to do the best job you can, in your own isolated code, and share loudly and frequently why you're doing things your new different way. Write your code to be re-used and shared. Write docs for it. Explain why it's the correct approach. Ask for feedback from the wider engineering org (although don't block on it if they're not directly involved with your work.) You'll quickly find out if other engineers agree that your approach is better. If it's actually better, others will start following your lead. If it's not, you'll be able to adjust.
Of course, when working in the existing code, try to be as locally consistent as possible with the surrounding code, even if it's terrible. I like to think of this as "getting in and out" as quickly as possible.
If you encounter particularly sticky/unhelpful/reticent team members, it can help to remind them that (a) the existing code is worse than what you're writing, (b) there is no documented pattern that you're breaking, (c) your work is an experiment and you will later revise it. Often asking them to simply document the convention that you are supposedly breaking is enough to get them to go away, since they won't bother to spend the effort.
Hopefully, you have a monorepo or something with similar effects, and a lack of fiefdoms. In that case, if the current way is undocumented and/or inconsistent, you make it better before or while adding in your new approach. If there are 4 ways to do the same thing and you really want to do it a different way, then replace one of those ways with your new one in the process of adding it. For extra credit, get to the point where you understand why you can't replace the other 3. (Or if you can, do it! Preferably in followups, to avoid bundling too much risk at once.)
A lot of inconsistency is the result of unwillingness to fix other people's stuff. If your way is better, trust people to see it when applied to their own code. They probably have similar grievances, but it has never been a priority to fix. If you're willing to spend the time and energy, there's a good chance they'll be willing to accept the results even if it does cause some churn and require new learning.
(Source: I have worked on Firefox for a decade now, which fits the criteria in the article, and sweeping changes that affect the entire codebase are relatively common. People here are more likely to encourage such thinking than to shoot it down because it is new or different than the status quo. You just can't be an ass about it and ignore legitimate objections. It is still a giant legacy codebase with ancient warts, but I mostly see technical or logistical obstacles to cleaning things up, not political ones.)
That's not how software engineering works in a business setting though? Not a single company I have been in has had the time to first fix the existing codebase before adding a new feature. The new feature is verbally guaranteed to the customers by project managers, and then it's on the dev to deliver within the deadline or you'll have much greater issues than an inconsistent codebase. I'd love to work in a fantasy company that allows for fixing legacy code, but that can take months to years with multi-million-line codebases.
> I'd love to work in a fantasy company that allows for fixing legacy code
You're not supposed to ask. It's like a structural engineer asking if it's okay to spend time doing a geological survey; it's not optional. Or a CFO asking if it's okay to pay down high interest debt. If you're the 'engineer', you decide the extent to which it's necessary.
No. You discuss it with your manager, and you do it at the appropriate time. Having created, refactored and deleted lots of technical debt over the past 25 years, trust me: you just don't get to go rogue because "you're the engineer". If you do that, it might turn into "you were the engineer".
What if you spend a week or month refactoring something that needs a quick fix now and is being deleted in 1-2 years? That's waste, and if you went rogue, it's your fault. Besides, you always create added QA burden with large refactoring (yes even if you have tests), and you should not do that without a discussion first--even if you're the founder.
Communicate with your manager and (if they agree) VP if needed, and do the right thing at the right time.
> No. You discuss it with your manager, and you do it at the appropriate time.
Sure, if you're not sure if it's the right thing to do, talk to your manager or TL. A good engineering manager can help. If your manager "would never allow" it, they're not a good manager. Even for jobs much more menial than engineering, a good manager recognizes that autonomy/trust are critical for satisfaction and growth.
If you're working someplace where you're "not allowed" to make the changes you "wish you could," you're doing yourself a disservice. Find someplace where you're not only "allowed," but expected to have (or develop) the judgement required to make these decisions.
To be clear: "the business" expects (and in the medium/long term requires) engineers to make these decisions themselves. That is the job.
> If you do that, it might turn into "you were the engineer".
The correct solution would of course rather be "you were the manager". :-(
It's why Scotty was always giving longer estimates than Kirk wanted, but Kirk was also able to require an emergency fix to save the ship.
The estimate was building in the time to get it done without breaking too much other stuff. For emergency things, Scotty would be dealing with that after the emergency.
If your captain is always requiring everything be done as an emergency with no recovery time, you've got bigger problems.
Scotty manages his tech debt
Scotty is a fictional character.
His followers though, are REAL. (Unless declared integer.)
That's also not applicable in a business setting. If you have a multi-million-line codebase, you simply can't refactor within a reasonable time. Also, refactoring can cause issues which then need further fixing and refactoring.
If I touch code that I am not supposed to touch or that does not relate to my direct task I will have HR talks. I'd be lucky to not get laid off for working on things that do not relate to my current task.
The Linux kernel is a multi-million line codebase, and refactoring still happens when needed. Let's not extrapolate from your limited data points as if they are representative for all codebases.
What field in software are you working in?
Came from ERP development and now I am in webdev.
Totally no. Structural engineers also have to consider real life constraints, including cost. We are talking about working with existing structures, it's too late for geological surveys.
That totally depends on the surrounding configuration. If land subsidence was as common as security patches, they would totally be doing monthly surveys.
Structural engineers also don't commonly change structural features after initial delivery; realistically, I would expect changing a two-lane bridge to a four-lane bridge to be more expensive than constructing a four-lane bridge where none exists.
And then at some point the codebase becomes so unusable that new features take too long and out of frustration management decides to hire 500 extra programmers to fix the situation, which makes the situation even more slow.
As I understand it, there is a balance between refactoring and adding new features. It’s up to the engineers to find a way to do both. Isn’t it also fair if engineers sometimes push back on management? Shouldn’t a civil engineer speak up if he/she thinks the bridge is going to collapse with the current design?
Often the problem with companies running the Feature Factory production treadmill too long is you have code supporting unused features and business logic, but nobody knows any more which features can be dropped or simplified (particularly after lots of employee churn and lack of documentation). So the problem is not so much technical debt, but product debt.
You can refactor, but you're also wasting time optimizing code you don't need. A better approach is to sit down with the rest of the company and start cutting away the bloat, and then refactor what's left.
I was involved with a big rewrite. Our manager had on his desk the old system with a sign "[manager's name]'s product owner". Nearly every time someone wanted to know how to do something, the answer was to load that old thing up and figure out what it did.
Eventually we did retire the old system - while the new code base is much cleaner I'm convinced it would have been cheaper to just clean that code up in place. It still wouldn't be as clean as the current one is - but the current one has been around long enough to get some cruft of its own. Much of the old cruft was in places nobody really touched anymore anyway, so there was no reason to care.
> while the new code base is much cleaner I'm convinced it would have been cheaper to just clean that code up in place
I saw one big rewrite from scratch. It was a multi-year disaster, but ended up working.
I was also told about an earlier big rewrite of a similar codebase which was a multi-year disaster that was eventually thrown away completely.
I did see one big rewrite that was successful, but in this case the new codebase very intentionally only supported a small subset of the original feature set, which wasn't huge to begin with.
All of this to say that I agree with you: starting from scratch is often tempting, but rarely smooth. If refactoring in place sounds challenging, you need to internalize that a full rewrite will be a few times harder, even if it doesn't look that way.
I stayed at a place that was decades old, in part to decipher how they’d managed to get away with not only terrible engineering discipline but two rewrites without going out of business. I figured it would be good for me to stick around at a place that was defying my predictions for once instead of fleeing at the first signs of smoke. I’ve hired onto places that failed before my old employer did at least twice and I feel a bit silly about that.
I wasted a lot of my time and came away barely the wiser, because the company is spiraling and has been for a while. Near as I can figure, the secret sauce was entirely outside of engineering. If I had to guess, they used to have amazing salespeople and whoever was responsible for that fact eventually left, and their replacement’s replacement couldn’t deliver. Last I heard they got bought by a competitor, and I wonder how much of my code is still serving customers.
> I saw one big rewrite from scratch. It was a multi-year disaster, but ended up working.
90% of large software system replacements/rewrites are disasters. The size and complexity of the task is rarely well understood.
The number of people that have the proper experience to guide something like that to success is relatively small because they happen relatively rarely.
> "I'm convinced it would have been cheaper to just clean that code up in place"
Generally agreed. I'm generally very bearish on large-scale rewrites for this reason + political/managerial reasons.
The trick with any organization that wants to remain employed is demonstrating progress. "Go away for 3 years while we completely overhaul this." is a recipe for getting shut down halfway through and reassigned... or worse.
A rewrite, however necessary, must always be structured as multiple individual replacements, each one delivering a tangible benefit to the company. The only way to stay alive in a long-term project is to get on a cadence of delivering visible benefit.
Importantly, doing this also improves your odds of the rewrite going well - forcing yourself to productionize the rewrite one part at a time validates that you're on the right track.
Part of our issue with the rewrite is we went from C++ to C++. For an embedded system in 2010 C++ was probably the right choice (Rust didn't exist, though D or Ada would have been options and we can debate better elsewhere). Previous rewrites went from 8 bit assembly to C++, which is the best reason to do a rewrite: you are actually using a different language that isn't compatible for an in place rewrite (D supports importing C++ and so could probably be done in place).
Rewrites are much like any act of self improvement. People think grand gestures and magical dates (like January 1 or hitting rock bottom) are the solution to turn your life around. But it’s little habits compounding that make or break you. And it’s new habits that kill old ones, not abstinence.
I worked with another contractor for a batshit team that was waiting for a rewrite. We bonded over how silly they were being. Yeah that’s great that you have a plan but we have to put up with your bullshit now. The one eyed man who was leading them kept pushing back on any attempts to improve the existing code, even widely accepted idioms to replace their jank. At some point I just had to ask him how he expected all of his coworkers to show up one day and start writing good code if he won’t let them do it now? He didn’t have an answer to that, and I’m not even sure the question landed. Pity.
The person who promised him the rewrite got promoted shortly before my contract was up. This promotion involved moving to a different office. I would bet good money that his replacement did not give that team their rewrite. They’re probably either still supporting that garbage or the team disappeared and someone else wrote a replacement.
That whole experience just reinforced my belief that the Ship of Theseus scenario is the only solution you can count on working. Good code takes discipline, and discipline means cleaning up after yourself. If you won’t do that, then the rewrite will fall apart too. Or flame out.
Whether you rewrite or refactor the code is not so much the point of my comment - it's more that you should first determine what you actually need, in consultation with the project stakeholders, get rid of whatever you don't need, and then you can decide whether you need to rewrite or refactor. Cutting away the bloat will give you a better perspective on that decision.
Personally, I would lean towards refactoring - a rewrite is the "declare bankruptcy" stage of technical debt and should only be considered in extremis. For example, the original codebase was written in ColdFusion and in 2025 you can't find any ColdFusion developers (or anyone in their right mind who wants to become a ColdFusion developer). But in any case, rewriting a trimmed down codebase is easier than trying to replicate features you don't need any more.
In the same way people go to their doctor or dentist or mechanic too late and prevention and sometimes even the best treatments are off the table, software developers (particularly in groups vs individually) love to let a problem fester until it’s nearly impossible to fix. I’m constantly working on problems that would have been much easier to address 2 years ago.
The issue is that management usually doesn't care. Personally I usually have about 3-4 days to implement something. If I can't deliver they will just look for more devs, yes. Quantity is what matters for management. New New New is what they want, who cares about the codebase (sarcasm). Management doesn't even know what a good codebase looks like. A bridge that is missing support probably wouldn't have been opened to the public in the first place. That's not true for codebases.
It depends.
Most shacks built in one's backyard do not pass any building codes. Or throwing a wooden plank over a stream somewhere.
Just like most software doesn't really risk anyone's life: the fact that your web site might go down for a bit is not at all like a bridge collapsing.
Companies do care about long term maintenance costs, and I've mostly been at companies really stressing over some quality metrics (1-2 code reviews per change, obligatory test coverage for any new code, small, iterative changes, CI & CD...), but admittedly, they have all been software shops (IOW, management understood software too).
That’s why consistent messaging matters. If everyone agrees to make features take as long as they take, then management can’t shop things around. Which they shouldn’t be doing but we all know That Guy.
When children do this it’s called Bidding. It’s supposed to be a developmental phase you train them out of. If Mom says no the answer is no. Asking Dad after Mom said no is a good way to get grounded.
> The issue is that management usually doesn't care.
Neither do customers.
The product is an asset. Code is a liability.
Can't be held accountable for work conditions engineers don't have power over. If I don't have time to write tests, I can't be blamed for not writing tests. Especially now with hallucinating bs AI there is a whole lot more output expected from devs.
Recently I got an email that some severe security defects were found in a project, so I felt compelled to check. A bot called “advanced security AI” by GitHub raised two concerns in total, both indeed marked as “high severity”:
— A minimal 30 LoC devserver function would serve a file from outside the current directory on developer’s machine, if said developer entered a crafty path in the browser. It suggested a fix that would almost double the linecount.
— A regex does not handle backslashes when parsing window.location.hostname (note: not pathname), in a function used to detect whether a link is internal (for statically generated site client-side routing purposes). The suggested fix added another regular expression in the mix and generally made that line, already suffering from poor legibility due to involving regular expressions in the first place, significantly more obscure to the human eye.
Here’s the fun thing: if I were concerned about my career and job security, I know I would implement every damn fix the bot suggested and would rate it as helpful. Even those that I suspect would hurt the project by making it less legible and more difficult to secure (and by developers spending time on things of secondary importance) while not addressing any actual attack vectors or those that are just wrong.
Security is no laughing matter, and who would want to risk looking careless about it in this age? Why would my manager believe that I, an ordinary engineer, know (or can learn) more about security than GitHub’s, Microsoft’s most sophisticated intelligence (for which the company pays, presumably, some good money)? Would I even believe that myself?
If all I wanted was to keep my job another year by showing increased output thanks to all the ML products purchased by the company, would I object to free code (especially if it is buggy)?
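For readers unfamiliar with the first finding: the class of fix being suggested is a containment check on the requested path. A minimal sketch of what that usually looks like is below. The function name `resolve_inside` and its parameters are hypothetical, and this is not the project's code or the bot's suggested patch, neither of which is shown in this thread.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(root: Path, requested: str) -> Optional[Path]:
    """Return the resolved path if it stays under root, otherwise None.

    Hypothetical helper: a dev server would call this before opening a file.
    """
    root = root.resolve()
    # Strip leading slashes so the request is joined relative to root
    # instead of replacing it.
    candidate = (root / requested.lstrip("/")).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a crafted
    # request like "../../etc/passwd" resolves to a path outside root.
    # Path.is_relative_to requires Python 3.9 or newer.
    return candidate if candidate.is_relative_to(root) else None
```

Whether a check like this is worth the extra lines in a local-only dev server is exactly the judgement call the comment is about.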
Don't check in any code, only prompts. The product is reconfabulated on every build.
There will be companies founded on executing this idea.
Well engineered code that closely models the business is an asset.
Only a small percentage of code that’s ever written meets those criteria.
That asset also requires you to have good relationships with people who know how to maintain it properly.
Sounds like a dogma that got us (as industry) into this mess.
I’d rather say it’s an observation of real behavior. Customers of yours don’t buy code (unless it is your product) - they buy solutions to their problems. Thus, management and sales want to sell solutions, because that gets you paid.
Engineering is fulfilling requirements within constraints. Good custom code might fit the bill. Bad might, too - unless it’s a part of requirements that it shouldn’t be bad. It usually isn’t.
> A bridge that is missing support probably wouldn't have been opened to the public in the first place.
That's not always been the case and came to be because people have died... Is anyone going to die if your codebase is an unmaintainable mess?
Companies die because nobody is willing to work on the code anymore.
If VCs ever came to expect less than 90% of their investments to essentially go to zero, maybe that would change. But they make enough money off of dumb luck not leading to fatal irreversible decisions often enough to keep them fat and happy.
That doesn't sound nearly as bad or serious as people dying.
One: Have you ever tried to take a narcissist's money or power away from them? You would think you were committing murder (and some of them will do so to 'defend' themselves).
Two: All the stuff we aren't working on because we're working on stupid shit in painful ways is substantial.
> Companies die because nobody is willing to work on the code anymore.
Can you name some examples? I never heard of this before.
Or it becomes so unusable that the customers become disenchanted and flee to competitors.
If management keeps making up deadlines without engineering input, then they get to apologize to the customer for being wrong. Being an adult means taking responsibility for your own actions. I can’t make a liar look good in perpetuity and it’s better to be a little wrong now than to hit the cliff and go from being on time to six months late practically overnight.
> As I understand, there is a balance between refactoring and adding new features.
True, but this has drifted from TFA's assertion about consistency.
As the thread has implied, it's already hard enough to find time to make small improvements. But once you do, get ready for them to be rejected in PR for nebulous "consistency" reasons.
They don't think they have the time, but that's because they view task completions as purely additive.
Imagine you're working on this or that feature, and find a stumbling block in the legacy codebase (e.g., a poorly thought out error handling strategy causing your small feature to have ripple effects you have to handle everywhere). IME, it's literally cheaper to fix the stumbling block and then implement the feature, especially when you factor in debugging down the line once some aspect of the kludgy alternative rears its ugly head. You're touching ten thousand lines of code anyway; you might as well choose to do it as a one-off cost instead of every time you have to modify that part of the system.
That's triply true if you get to delete a bunch of code in the process. The whole "problem" is that there exists code with undesirable properties, and if you can remove that problem then velocity will improve substantially. Just do it Ship of Theseus style, fixing the thing that would make your life easier before you build each feature. Time-accounting-wise, the business will just see you shipping features at the target rate, and your coworkers (and ideally a technical manager) will see the long-term value of your contributions.
I am a solo dev for my company, which is a semi-profitable startup. Before I was hired the codebase was built by a hobbyist and remote workers. I was hired due to a language barrier with the remote staff. I barely have 4 years of experience so I really really don't have the time to fix shit. Currently I have to ship recklessly without looking back. That's upcoming tech companies for ya, and it will get worse with AI.
> I barely have 4 years of experience so I really really dont have the time to fix shit.
I am confused. In my first few years of my career, I did lots of refactoring because it helped me to learn different codebases and learn the skill of refactoring. Your experience is somewhat unrelated to your ability to fix old code. Desire is required.
Expecting a developer with 4 YoE to be able to handle a messy legacy codebase by themselves is a suboptimal business decision I’d say.
New features are the right time to refactor. If you can't make the code not complete shit you don't have time to add the feature. Never refactor code to make it prettier or whatever, refactor it when it becomes not-fit-for-purpose for what you need to do. There's obviously exceptions (both ways) but those are exceptions not rules.
At least, that's what I teach our devs.
My company didn't even have time to keep the dependencies up to date so now we are stuck with Laravel 5 and Vue 2. Refactoring/Updating can be an incredible workload. Personally I'd say rewriting the whole thing would be more efficient but that's not my choice to make. If you have plenty of time for a task, I fully agree with you.
I was in an organisation that made decent money on a system built on Laravel 3, I think. The framework was written in an only static classes style, which they over ten years had run with while building the business so everything was static classes. Once you have a couple of million lines of that, rewrite is practically impossible because you need to take the team of two OK devs and a junior off firefighting and feature development for years and that will hurt reputation and cashflow badly.
My compromise was to start editing Laravel and implementing optimisations and caching, cutting half a second on every request within a month of starting, and then rewriting crude DIY arithmetic on UNIX epoch into standard library date/time/period functions and similar adjustments. I very openly pushed that we should delete at least two hundred thousand lines over a year which was received pretty poorly by management. When I left in anger due to a googler on the board fucking up the organisation with their annoying vision where this monster was to become "Cloud Native" on GCP credits he had, a plan as bad as a full rewrite, it only took a few months until someone finally convinced them to go through with deletions and cut LoC in half in about six months.
I don't think they do containers or automatic tests yet, probably never will, but as of yet the business survives.
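To illustrate the kind of cleanup meant by replacing DIY epoch arithmetic with standard library date functions, here is a small sketch. It is in Python rather than the PHP of the system described, and the "one month later" scenario and both function names are invented for illustration; the actual code is not shown in this thread.

```python
import calendar
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def add_month_diy(epoch_seconds: int) -> int:
    """Crude version: pretends every month is exactly 30 days long."""
    return epoch_seconds + 30 * SECONDS_PER_DAY


def add_month_calendar(epoch_seconds: int) -> int:
    """Calendar-aware version: same day next month, clamped to month length."""
    start = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Roll over into January of the next year when starting in December.
    year = start.year + start.month // 12
    month = start.month % 12 + 1
    # monthrange() returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    shifted = start.replace(year=year, month=month, day=min(start.day, last_day))
    return int(shifted.timestamp())
```

Starting from 15 January, the crude version lands on 14 February because January has 31 days, while the calendar-aware version lands on 15 February. Off-by-a-day drift of this sort is typical of hand-rolled epoch math.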
I usually am in favor of a complete rewrite. I'd also prefer to not grow projects into multi million line monoliths. Just make multiple smaller ones that can interact independently with each other. Much simpler structure. Also safer in the long run.
You actually believe that? Distributed systems are simpler than monolithic? In PHP?
This business wouldn't exist if they attempted to follow your advice, because they weren't able and anyway didn't have the money to hire that many developers. There were a couple of subsystems they tried to implement the way you suggest, e.g. one for running certain background jobs.
It was a database table with one row per type of job and a little metadata like job status and a copy of the input. They started jobs by sending an HTTP request. This was a constant source of manual handling, because things started jobs and then crashed and never reset the status and things like that. You could respond that they should have used a message queue instead and so on, but the thing is, they didn't know how to build reliable distributed systems. Few developers do.
It's still also often the right business choice, especially for small businesses which aren't making a profit yet.
I don't disagree with what you've written at all, but let me just say:
> Hopefully, you have a monorepo or something with similar effects, and a lack of fiefdoms
ah to be so lucky...
It's a bit of a non-issue in this context. If you don't have a mono-repo, you should maintain reasonable consistency within each repository (and hope they're consistent between each other, but that's probably less important here).
Great points, I'd just add:
> A lot of inconsistency is the result of unwillingness to fix other people's stuff
Agree, so we find it best to practice "no code ownership" or better yet "shared code ownership." So we try to think of it all as "our stuff" rather than "other people's stuff." Maybe you just joined the project, and are working around code that hasn't been touched in 5 years, but we're all responsible for improving the code and making it better as we go.
That requires a high trust environment; I don't know if it could work for Firefox where you may have some very part-time contributors. But having documented standards, plus clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
> That requires a high trust environment; I don't know if it could work for Firefox where you may have some very part-time contributors.
Ironically, that's why it works for Firefox. Contributors follow a power law. There are a lot of one-shot contributors. They'll be doing mostly spot fixes or improvements, and their code speaks for itself. Very little trust is needed. We aren't going to be accepting binary test blobs from them. There are relatively few external contributors who make frequent contributions, and they've built up trust over time -- not by reporting to the right manager or being a friend of the CTO, but through their contributions and discussions. Code reviews implicitly factor in the level of trust in the contributor. All in all, the open nature of Firefox causes it to be fundamentally built on trust, to a larger extent than seems possible in most proprietary software companies. (There, people are less likely to be malicious, but for large scale refactoring it's about trusting someone's technical direction. Having a culture where trust derives from contribution not position means it's reasonable to assume that trusted people have earned that trust for reasons relevant to the code you're looking at.)
There are people who, out of the blue, submit large changes with good code. We usually won't accept them. We [the pool of other contributors, paid or not] aren't someone's personal code maintenance team. Code is a liability.
> But having documented standards, plus clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
100% agree. It's totally worth it even if you disagree with the specific formatting decisions made.
> All in all, the open nature of Firefox causes it to be fundamentally built on trust, to a larger extent than seems possible in most proprietary software companies.
Nice! We're still small so we somehow can keep that level of trust, but I always worry about how things may change for the worse as we grow. Mimicking the open source model as much as we can, even within a small private company, has worked well for us so far.
> > ... clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
> 100% agree. It's totally worth it even if you disagree with the specific formatting decisions made.
So true! 5-6 years ago we had to make open source contributions to both clang-format and clang-tidy for several months to get them to support closer to our preferred style before we could get the "ok, close enough" buy-in across the company to implement automated formatting. (Mostly bug fixes for evidently rare flag combinations, but also a few small new features.)
In retrospect it was completely unnecessary - simply relying on automated formatting is sooo much better than any specifics of the formatting. I'm still glad we did though, as it made both tools better. We earned the maintainers' trust with a few early PRs, and remained active contributors for a while, but haven't contributed much lately.
(Posted on Firefox mobile... Thanks!)
I agree, but this presupposes a large comprehensive test suite giving you enough confidence to do such sweeping changes. I don't doubt Firefox has it, but most (even large, established projects) will not. A common case I've seen is that newer parts are relatively well covered, but older, less often touched parts don't have good coverage, which makes it risky to do such sweeping changes.
> Hopefully, you have a monorepo or something with similar effects, and a lack of fiefdoms. In that case, if the current way is undocumented and/or inconsistent, you make it better before or while adding in your new approach.
Unfortunately, this is how you often get even more inconsistent codebases that include multiple tenures' worth of different developers attempting to make it better and not finishing before they move on from the organization.
> In that case, if the current way is undocumented and/or inconsistent, you make it better before or while adding in your new approach.
Sometimes, but oftentimes that would involve touching code that you don't need to touch in order to get the current ticket done, which in turn involves more QA effort.
The need is quite widely interpretable though.
I worked on a Drupal site once where somebody had put business logic and database querying inside of template files.
Just because you can implement something without touching any other part of the codebase doesn’t mean that’s a good decision.
> If it's actually better, others will start following your lead.
Not really my experience in teams that create inconsistent, undocumented codebases... but you might get 1 or 2 converts.
It depends on the day but generally I believe that most engineers want to write good code, want to improve their own skills, and like learning and critiquing with other engineers. Sometimes a small catalyst is all it takes to dramatically improve things. Most of the times I've thought that individual contributors were the problem, the real issue was what the company's leaders were punishing/rewarding/demanding.
> I believe that most engineers want to write good code
But opinions on what makes code good differ a lot between software developers. This is exactly what leads to many of the inconsistencies in the code.
And that’s why you talk about it and agree on stuff. I call that being a professional.
Exactly this. I (relatively recently) joined a team with a handful of developers all sort of doing things their own way. No docs, no shared practices, just individuals doing their own thing. After I reviewed the code, submitted PRs with fixes, and put together docs for best practices, the entire team shifted their stance and started working closer together in terms of dev practices, coding styles, etc.
Not to say I got everyone to march to my drum -- the "best practices" was a shared effort. As you said, sometimes it just takes someone to call things out. We can do things better. Look at how things improve if you approach X problem in Y manner, or share Z code this way. Maybe the team was overwhelmed before and another voice is enough to tip the scales. If you don't try, you'll never know.
doing some recent contract work I discovered someone putting this into a PR (comments my own)
```
let susJsonString = '...' // we get this parseable json string from somwhere but of course it might not be parseable. so testing seems warranted...
try { // lets bust out a while loop!
while(typeof susJsonString === 'string') { susJsonString = JSON.parse(susJsonString) }
} catch { susJsonString = {} }
// also this was a typescript codebase but all the more reason to have a variable switch types! this dev undoubtedly puts typescript at the top of their resume
```
I suppose this works?! I haven't thought it through carefully; it's just deciding to put your shoes on backward and open doors while standing on your head. But I decided to just keep out of it and not get involved in the politics. I guess this is what getting old is like. Seriously, you just see younger people doing stuff that makes your jaw drop from the stupidity (or maybe it's just me), but you can't say anything because reasons. Copilot and AI-assisted coding only further muddy the waters, imo.
This is totally fine. If you're given shit data this seems like a reasonable way to try to parse it (I would personally bound the loop).
Typescript is not going to make it better.
The problem is whoever is producing the data.
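For illustration, a bounded version of that loop might look like the sketch below. The helper name `parseNestedJson` and the default `maxDepth` of 5 are assumptions made for this example, not anything from the PR being discussed. It keeps the input and the result in separate values, so no variable changes type along the way.

```typescript
// Hypothetical helper: unwraps a value that may have been JSON-serialized
// several times, giving up after maxDepth rounds of parsing.
function parseNestedJson(input: string, maxDepth: number = 5): unknown {
  let current: unknown = input;
  for (let depth = 0; depth < maxDepth; depth++) {
    if (typeof current !== "string") {
      return current; // reached a non-string value, so unwrapping is done
    }
    try {
      current = JSON.parse(current);
    } catch {
      // Invalid JSON at this layer: fall back to {} like the original snippet.
      return {};
    }
  }
  // Still a string after maxDepth rounds: treated as unparseable (an assumption).
  return typeof current === "string" ? {} : current;
}

console.log(parseNestedJson(JSON.stringify(JSON.stringify({ foo: 1 }))));
```

Returning `unknown` pushes the caller to validate the shape before using it, which is about as much as TypeScript can help with untrusted input.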
I think the complaint here is they have a string, which even has the word string in the variable name, and they turn it into an object at the end. Hence references to Typescript.
I suppose what is wanted is something like
let parsedJSON = {}
try {
  parsedJSON = JSON.parse(susJsonString)
} catch {
  // maybe register problem with parsing
}
That's quite different though. It looks to be dealing with the case that a serialised object gets serialised multiple times before it reaches that point of code, so it needs to keep deserialising until it gets a real object. E.g.:
JSON.parse(JSON.parse("\"{\\\"foo\\\": 1}\""))
I'd guess the problem is something upstream.
Hmm, yeah ok, didn't pick this out of the `let susJsonString = '...'` example,
but evidently it is not just that it is serialized multiple times, otherwise it shouldn't need the try catch (of course one problem with online discussion of code examples is you must always assume, contra obvious errors, that the code actually needs what it has)
Something upstream, sure, but often not something "fixable" either, given third parties and organizational headaches some places are prone to.
Yeah. I imagine that's a bandaid around having to consume a dodgy api that they didn't have access/permission to fix.
The blanket catch is odd though, as I'd have thought that it would still be outputting valid json (even if it has been serialized multiple times), and if you're getting invalid json you probably want to know about that.
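One way to keep the repeated parsing while still finding out about invalid JSON is to return the failure instead of swallowing it. This is only a sketch: `unwrapJson` and the `ParseResult` shape are hypothetical names invented for this example.

```typescript
// Hypothetical result type: either the unwrapped value or the parse error,
// plus how many serialization layers were successfully removed.
type ParseResult =
  | { ok: true; value: unknown; layers: number }
  | { ok: false; error: string; layers: number };

function unwrapJson(raw: string): ParseResult {
  let current: unknown = raw;
  let layers = 0;
  // Each successful parse of a string yields either a shorter string or a
  // non-string value, so this loop terminates.
  while (typeof current === "string") {
    try {
      current = JSON.parse(current);
      layers++;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, error: message, layers };
    }
  }
  return { ok: true, value: current, layers };
}

const result = unwrapJson(JSON.stringify(JSON.stringify({ foo: 1 })));
console.log(result.ok, result.layers); // true 2
```

The caller can then decide whether an empty string from the API deserves a log line, a metric, or a silent default, rather than having that decision hidden inside a bare `catch`.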
probably some times the api comes out with an empty string.
The code is either going to loop once and exit or loop forever, no?
Putting this in my web console:
let susJsonString = JSON.stringify(JSON.stringify(JSON.stringify({foo:1})))
console.log("initial:", susJsonString);
try {
  while(typeof susJsonString==='string') {
    susJsonString = JSON.parse(susJsonString);
    console.log("iteration:", typeof susJsonString, susJsonString);
  }
} catch {
  susJsonString = {};
}
I see:
initial: "\"{\\\"foo\\\":1}\""
iteration: string "{\"foo\":1}"
iteration: string {"foo":1}
iteration: object {foo: 1}
A comment explaining the sort of "sus" input it was designed to cope with may have been helpful.
It will stop when it gets something that's not a string due to
while(typeof susJsonString==='string') { susJsonString = JSON.parse(susJsonString);
as it'll keep reassigning and parsing until it gets a non-string back (or alternatively error out if the string is not valid json)
Now two of you are misunderstanding.
It’s applying the operation recursively.
Why the while loop?
Because some upstream idiot is calling JSON.stringify several times.
I've seen this happen when someone's not familiar with their api framework and instead of returning an object for the framework to serialize, they serialize it on their own and return a string. Which then gets serialized again by the framework.
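A minimal sketch of how that double serialization comes about. `frameworkRespond` is a hypothetical stand-in for whatever a real API framework does with a handler's return value; no particular framework is implied.

```typescript
// Hypothetical stand-in for a framework that serializes whatever a handler returns.
function frameworkRespond(handlerResult: unknown): string {
  return JSON.stringify(handlerResult);
}

const examplePayload = { foo: 1 };

// Intended use: return the object and let the framework serialize it once.
const correctBody = frameworkRespond(examplePayload);

// The mistake: the handler serializes on its own, then the framework
// serializes the resulting string a second time.
const doubleEncodedBody = frameworkRespond(JSON.stringify(examplePayload));

console.log(typeof JSON.parse(correctBody)); // object
console.log(typeof JSON.parse(doubleEncodedBody)); // string
console.log(typeof JSON.parse(JSON.parse(doubleEncodedBody))); // object
```

Every extra `JSON.stringify` on the producing side costs the consumer one extra `JSON.parse`, which is exactly what the `while` loop absorbs.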
You came in so confident it was wrong, but it turns out you don’t really know what it does.
Please take a lesson from this. Good code is not the one that follows all the rules you read online. Your coworker you dismissed understood the problem.
I didn't know that JSON.stringify could be called multiple times on the same object and then unpacked via repeated calls to JSON.parse. So I was wrong on that. I think definitely this warrants a comment in the code, at the least explaining why this was taking place. The likely reason for the nesting was I think calling an LLM for a valid json object and somewhere in that workflow the json object was getting stringified more than once. I suspect this is the fault of the codebase and not the LLM itself, but it was typical of these devs to not ever investigate the api to understand what it was returning and rather just apply bandaid after bandaid.
I reserve my general opinion on the quality of this coder's work, as evidenced by the quality of the app itself among other things. But I guess you'd have to just trust (or not trust) me on that.
So no lessons learned?
Did you reply to the wrong comment?
I think asking questions is ideal. Even when I'm 99% sure a line is blatantly wrong, I will ask something like, "What is this for?". Maybe I missed something - wouldn't be the first time.
Darepublic originally posted his coworker’s code to make fun of above.
Sure, but that does not imply they will follow whatever you found out to be the best for the piece of code you are working on right now.
>Not really my experience in teams that create inconsistent, undocumented codebases... but you might get 1 or 2 converts.
This has also been my experience. Usually there is a "Top" sticky/unhelpful/reticent person. They are not really a director or exec, but they often act like it and seem immune from any repercussions from the actual higher-ups. This person tends to attract "followers" who know they will keep their jobs if they follow the sticky person. There usually are a few up-and-coming people who want better and will kinda go along with you for their own skill-building benefit, but it's all very shaky and you can't count on them supporting you if resistance happens.
I've literally had the "I was here before you and will be after" speech from one of the "sticky's" before.
All these HN how to do better write ups seem to universally ignore the issues of power and politics dynamics and give "in a vacuum" advice. Recognizing a rock and a hard place and saving your sanity by not caring is a perfectly rational decision.
Well HN was created as a forum for discussing start up best practices, which is all about disrupting big companies weighed down by internal politics.
The linked article is about dealing with legacy codebases with millions of lines of code.
The response is accurate - anyone that's had to deal with a legacy code base has had to deal with the creators of said birds nest (who proudly strut around as though the trouble it causes to maintainability makes them "clever").
There are however some people who think they are sticky but aren’t really. Some but not all of them use Impostor Syndrome to keep their followers in line. You can recruit most easily from people they’ve left twisting in the wind when their suggestions and ideas turned out to not work, but only if you always deal with the poor consequences of your own decisions. People will follow ideas they don’t quite understand if they know they won’t be working alone at 7 pm on a Thursday fixing it.
These sorts of people will vote for you publicly. However, some of them will still take the path of least resistance when you aren’t looking.
It was sort of a nasty surprise when I figured out one day that there are people in this industry that will agree with high minded sentiments in public but not lift a finger to get there. I ended up in a group that had two or three of them. And one day due to a requirements process fuckup we had a couple weeks with nothing to do. They just did the Hands Are Tied thing I’d been seeing for over a year (yes we should do X but we have to do Y for reasons) and I saw red. Luckily I was on a conference call instead of sitting in front of them at that moment. But I’m sure they heard the anger lines over the phone.
If the boss doesn’t give you an assignment, you work on tech debt they haven’t previously insisted that you work on. Simple as that. At most places if my boss disappeared, I could keep busy for at least three months without any direction. And keep several other people busy as well. If you don’t know what to work on then I don’t know what’s wrong with you.
I tried my best to offer a pragmatic recommendation for dealing with those sorts of people. I'd love to know what you would recommend instead?
IME it's politics, so you need to find someone that the sticky person fears/respects, and get them onboard.
The only other way I have succeeded is to appeal to the sticky person's ego, make them think that it's their idea.
Note: I have also had to deal with
Sticky person: Do it this way
Me: But X
Sticky Person: No, do it the way I have decreed
[...]
Three hours later (literally)
Sticky Person: Do it X way
You are most likely giving the only real answer. However, I'd just say it's better to not care. Look how much energy and mental strife you are going through to line someone else's pockets against their will. And that's not even including the actual work, just the unnecessary work around the work. It's not worth it. If the tech is this dysfunctional then so is the rest of the organization; breaking your back to fix one support structure is just masochism.
This exactly. I worked at a place one time with a terrible code base. They based it on open source and slapped on additions with no style or documentation.
My first day, I couldn't even stand the code base up on my local dev environment, because there were so many hard-coded paths throughout the application, it broke (they were unwilling to fix this or have me fix it).
I tried to accept their way of coding and be part of the team, but it got too much for me. They were staunch SVN supporters. This isn't much of a problem, but we had constant branching problems that Git would have resolved.
As I got assigned work, I noticed I would have to fix more bugs and bad coding, before I could even start the new addition/feature. It was riddled with completely obvious security vulnerabilities that were never fixed. Keep in mind that this was the new product of the entire company with paying customers and real data.
The team lead was also very insecure. I couldn't even nicely mention or suggest fixes in code that he had written. The interesting thing is that he didn't even really have a professional coding background. He went straight from tech support to this job.
I lasted about a year. I got let go due to 'money issues'. Shortly before this, they wanted me to merge my code into my branch with the Jr. developer's code right before my vacation (literally the day before).
I merged it and pushed it up to the repo (as instructed) and the team lead sent me nasty emails throughout my vacation about how various parts of my code 'didn't work'. Not only were these parts the Jrs code, it wasn't ready for production.
The other thing to know about the team lead is that he was extremely passive aggressive and would never give me important project details unless I asked (I'm not talking details, just high-level, what needs to be completed).
We had a call where he told me I 'wasn't a senior developer'. I wanted to tell him to fuck off, but I needed the job. The company went out of business 2 months later.
I found out their entire business model relied only on Facebook Ads, and they got banned for violating their rules.
ahh, there's a lot of scenarios here.
in my scenario, those people were gone.
> In this scenario, I've found that the only productive way forward is to do the best job you can, in your own isolated code, and share loudly and frequently why you're doing things your new different way.
Now you have N+1 ways.
It can work if you manage to get a majority of a team to support your efforts, create good interfaces into the legacy code paths, and most importantly: write meaningful and useful integration tests against that interface.
Michael Feathers wrote a wonderful book about this called, Working Effectively with Legacy Code.
I think what the author is trying to say with consistency is to avoid adding even more paths, layers, and indirection in an already untested and difficult code base.
Work strategically, methodically, and communicate well as you say and it can be a real source of progress with an existing system.
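As a rough sketch of what "a good interface into the legacy code path plus tests against that interface" can look like: every name below (`legacyCalcTotal`, `OrderTotals`, `checkOrderTotals`) is hypothetical and invented for this example. The check is in the spirit of what Feathers calls characterization tests, which pin down current behavior before anything is rewritten.

```typescript
// Hypothetical legacy function with an awkward calling convention.
function legacyCalcTotal(args: string): string {
  const parts = args.split("|").map(Number);
  const quantity = parts[0] ?? 0;
  const unitCents = parts[1] ?? 0;
  return String(quantity * unitCents);
}

// Narrow interface that new code depends on instead of the legacy call.
interface OrderTotals {
  totalCents(quantity: number, unitPriceCents: number): number;
}

// Adapter that keeps the legacy behavior behind the interface.
const legacyOrderTotals: OrderTotals = {
  totalCents(quantity: number, unitPriceCents: number): number {
    return Number(legacyCalcTotal(`${quantity}|${unitPriceCents}`));
  },
};

// Check written against the interface, so it can be reused unchanged
// for any later replacement implementation.
function checkOrderTotals(impl: OrderTotals): boolean {
  return impl.totalCents(3, 250) === 750 && impl.totalCents(0, 999) === 0;
}

console.log(checkOrderTotals(legacyOrderTotals)); // true
```

Because the check only knows about `OrderTotals`, a new implementation can be introduced behind the same interface without adding yet another way of calling the legacy code.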
I’ll check out that book, thanks for the reference.
I rarely see large 10m+ LOC codebases with any sort of strong consistency. There are always flavors of implementations and patterns all over the place. Hell, it's common to see some functionality implemented multiple times in different places
And it's fine, right? Honestly I think people need to realize that part of being a good engineer is being able to deal with inconsistency. Maybe submodule A and submodule B do network requests slightly differently but if both ways are reasonable, working, and making the company money, it's probably not worth delaying product improvements in order to make things "more consistent."
On the other hand if no one in your company cares about consistency, at some point everything becomes so awful you basically won't be able to retain engineers or hire new ones, so this is a place where careful judgement is needed.
>and it's fine, right?
The hard part of being an engineer is realizing that sometimes even when something is horribly wrong people may not actually want it fixed. I've seen systems where actual monetary loss was happening but no one wanted it brought to light because "who gets blamed"
That’s always a strong signal to start polishing your resume. Layoffs are probably just around the corner.
That's crazy, is there no opportunity to get credit for preventing monetary loss?
I've been somewhere where, no, there were no rewards for fixing monetary loss. However, that's not really what I was getting at; I was more hinting at embezzlement going on at LOTS of places.
Yeah 100%. Honestly style / technique / language consistency are implementation details, it helps with engineer fungibility and ramp up, but it also works against engineers applying local judgement. This is something to briefly consider when starting new services/features, but definitely not something to optimize for in an existing system.
On the other hand, data and logic consistency can be really important, but you still have to pick your battles because it's all tradeoffs. I've done a lot of work in pricing over the decades, and it tends to be an area where the logic is complex and you need consistency across surfaces owned by many teams. At the same time, it interacts with local features, and you don't want to turn pricing libraries/services into god objects that bottleneck all kinds of tangentially related projects. It's a very tricky balance to get right. My general rule of thumb is to anchor on user impact as the first-order consideration. Developer experience is important as a second-order one, but many engineers will over-index on things they are deeply familiar with and not be objective in their evaluation of the impact / cost to other teams who pay an interaction cost but are not experts in the domain.
A common experience (mostly in the Pacific North West) I have had is to implement a feature in a straightforward manner that works with minimal code, for some backlog issue. Then I'm told the PR will be looked at.
A couple days later I am told this is not the way to do X. You must do it Y? Why Y? Because of historical battles won and lost why, not because of a specific characteristic. My PR doesn't work with Y and it would be more complicated...like who knows what multiplier of code to make it work. Well that makes it a harder task than your estimate, which is why nobody ever took it up before and was really excited about your low estimate.
How does Y work? Well it works specifically to prevent features like X. How am I supposed to know how to modify Y in a way that satisfies the invisible soft requirements? Someone more senior takes over my ticket, while I'm assigned unit tests. They end up writing a few hundred lines of code for Y2.0 then implement X with a copy paste of a few lines.
I must not be "a good fit". Welcome to the next 6-12 months of not caring about this job at all, while I find another job without my resume starting to look like patchwork.
Challenging people's egos by providing a simpler implementation for something someone says is very hard, has been effective at getting old stagnant issues completed. Unnaturally effective. Of course, those new "right way" features are just as ugly as any existing feature, ensuring the perpetuation of the code complexity. Continually writing themselves into corners they don't want to mess with.
This sounds like you are missing important context. Here is a similar conversation:
"Why do I have to use the system button class. I implemented my own and it works."
"Because when the OS updates with new behavior your button may break or not get new styling and functionality"
"But this works and meets the spec, that's 10x harder"
More like we have to use the god object to make all http calls for consistency in logging, despite this being a gcp pubsub.
Find the context you are missing. Even if it’s a bad reason, you will understand their perspective more and be better able to persuade them.
Hard for me to comment definitively here since I don't have the other side of the story, but I will say that I have seen teams operating based on all kinds of assumed constraints where we lose sight of the ultimate objective of building systems that serve human needs. I've definitely seen cases where the true cost of technical debt is over-represented due to a lack of trust between business stakeholders and engineering, and those kinds of scenarios could definitely lead to this kind of rote engineering policy detached from reality. Without knowledge of your specific company and team I can't offer any more specific advice other than to say that I think your viewpoint of the big picture sounds reasonable and would resonate in a healthy software company with competent leadership. Your current company may not be that, but rest assured that such companies do exist! Never lose that common sense grounding, as that way madness lies. Good luck in finding a place where your experience and pragmatism are valued and recognized!
I feel your pain. But I guess you need to work harder on detecting these kinds of work places upfront, instead of joining them one after another?
Generally, companies filter out candidates who request to look at any measurable amount of source code as part of the process. Larger companies leverage the 6-12 mo contractor to hire. You are still stuck there until you are not.
These topics are common knowledge, if you have interviewed in the last 5 to 10 years. I have been working for 25, so I find the blame trying to be redirected, by some, misguided.
Yes, you can't directly look at the source code (unless you pick companies that open source a lot). But I was more thinking of trying to develop some proxy metrics that you can measure; the most common being asking the right questions in the interview. But you can also try to look for other tells.
And to be practical, that's fine. In a big codebase it's more important to encourage consistent, well-defined, small interfaces, and a clean separation of concerns, than to try to get consistency in the lower-level implementation details. Other non-code concerns like coordinating releases and migration of shared services are also way more important than getting everyone to use the same string library.
(Of course, if you carry that principle to the extreme you end up with a lot of black-box networked microservices.)
> but what about when the existing codebase is already inconsistent?
Then you get people together to agree what consistent looks like.
I find the easiest way to do this is to borrow someone else's publicly documented coding conventions e.g. Company ABC.
Then anyone disagreeing isn't disagreeing with you, they're disagreeing with Company ABC, and they (and you) just have to suck it up.
From there on in, you add linting tools, PR checks etc for any new code that comes in.
If there's resistance to picking a style guide, autoformatting might be a viable start and will probably do quite a bit for shallow consistency, at the price of one large PR per file. Once one has worked with a forced style for a while it starts to feel weird to see breaches of it, and I think that might help soften people toward adopting a style guide for more subtle things like error handling or attitude to standardised protocols like HTTP.
My approach is what I call defensive programming, with a different meaning than the usual usage of the term. I assume that my coworkers are idiots that aren't going to read my documentation, so I make all public classes and methods etc. as idiot-proof as possible to use. Hasn't saved me from every issue caused by my teammates never reading my docs or asking me questions, but it's definitely prevented several.
> assume that my coworkers are idiots
I know (most?) people don't mean it literally when writing something like this but I still wonder why such self-evident ideas as "make things easy to use correctly and hard to use incorrectly" are framed in terms of "idiots who don't rtfm".
The best documentation is what wasn't written because it (actually!) wasn't needed. On the other hand, even if people aren't "idiots", they still make mistakes and take time to figure out (perhaps by reading tfm) how to do things and complete their tasks, all of which has a cost. Making this easier is a clear benefit.
The bigger the codebase, the more likely it is that we all eventually find ourselves in scenarios where we were the idiot. Making the APIs as foolproof as possible, utilizing tools like Semgrep, deprecating stuff in whatever way your language supports so that it shows up in the IDE… all of that should be utilized.
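In TypeScript, for instance, the language-supported route is the JSDoc `@deprecated` tag, which editors built on the TypeScript language service render as a strikethrough at call sites. A small sketch follows; `getUser` and `fetchUserById` are hypothetical names used only for illustration.

```typescript
// Hypothetical replacement API.
function fetchUserById(id: number): string {
  return `user-${id}`;
}

/**
 * @deprecated Use fetchUserById instead.
 */
function getUser(id: number): string {
  // Delegates to the replacement so both entry points behave identically
  // while callers migrate.
  return fetchUserById(id);
}

console.log(getUser(7)); // user-7
```

The old entry point keeps working, so nobody has to read the docs to avoid breaking anything, but every remaining caller gets a visible nudge in the editor.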
Sometimes people are too afraid of attempting to make it consistent.
I've done several migrations of things with dozens of unique bespoke usage patterns back to a nice consistent approach.
It sometimes takes a couple straight days of just raw focused code munging, and doesn't always end up being viable, but it's worth a shot for how much better a state it can leave things in.
Highly agree. I've done quite a few large refactors of unnecessarily complex systems that resulted in significant improvement - from lots of bugs to nearly no bugs, incomprehensible code to simple straight forward code, no tests to great test coverage.
I did have one bad experience where I ended up spending way too much time on a project like that, I think I made some mistakes with that one and got in a bit too deep. Luckily my team was very supportive and I was able to finish it and it's a lot better now than it was.
It is terrible to just do this on your own, particularly as the n00b.
If there are 5 different standards in the codebase, don't just invent your own better way of doing things. That is literally the xkcd/Standards problem. Go find one of the people who have worked there the longest and ask which of the 5 existing standards are most modern and should be copied.
And as you get more experience with the codebase you can suggest updates to the best standard and evolve it. The problem is that you then need to own updating that whole standard across the entire codebase. That's the hard part.
If you aren't experienced enough with the codebase to be aggressive about standardization, you shouldn't be creating some little playground of your own.
> If there are 5 different standards in the codebase, don't just invent your own better way of doing things. That is literally the xkcd/Standards problem. Go find one of the people who have worked there the longest and ask which of the 5 existing standards are most modern and should be copied.
I strongly disagree with you and believe you've missed the point of my comment. Think about this: why are there 5 different standards in the codebase, none of which meet your needs? Do you think any engineers on the team are aware of this situation? And how might you get more experience with the codebase without writing code that solves your problems?
“Time” is the answer almost always.
Standards evolve over time, as do the languages and frameworks. Old code is rarely rewritten, so you end up with layers of code like geological strata recording the history of the developer landscape.
There’s a more complicated aspect of “Conway’s law, but over time” that’s hard to explain in a comment. And anyway, Casey Muratori did it better: https://youtu.be/5IUj1EZwpJY?si=hnrKXeknMCe0UPv4
People who created 3 of the 5 no longer work at the company, test coverage is minimal or nonexistent and the system mostly does what it’s supposed to.
In this situation, ‘getting more experience in the code base’ is more or less synonymous with ‘getting paged on the weekend’.
> Do you think any engineers on the team are aware of this situation?
Yes, there probably are. If you haven't been working there for long enough to know who they are, then you shouldn't be YOLO'ing it.
The fact that it hasn't all been cleaned up yet is due to that being an order of magnitude harder than greenfielding your own standard. That doesn't mean that nobody is aware of it, or working on it.
I've absolutely worked for a decade on a codebase which had at least 5 different standards, and I was the one responsible for cleaning it all up, and we were understaffed so I could never finish it, but I could absolutely point you at the standard that I wanted you to follow. It also probably was somewhat deficient, but it was better than the other 4. It evolved over time, but we tried to clean it all up as we went along. Trying to ram another standard into the codebase without talking it over with me, was guaranteed to piss me off.
Consistency in huge legacy codebase is like virginity in a brothel: desired but non-existent.
Consistency is never the sole reason to change something, there's always consistency and at least one of the following:
- coherency
- type safety (shout-out to dynamically typed languages that adopted static typing)
- concurrency
- simplicity
- (and more)
> If it's actually better, others will start following your lead.
A lot of people don't want to improve the quality of their output, for various reasons... some are happy to have something "to pay the bills", some don't want to use a programming language to its full extent, some have a deeply rooted paradigm that worked for 10 years already ("static types won't change that"), others are scared of concurrency etc. For some people there's nothing to worry about when a server can be blocked by a single request for 60 secs.
> your work is an experiment and you will later revise it
I advise against this if you have not been allocated the time or budget to revise the code. For one thing, you're lying. For another thing, were you hired to be a part of the contributing team or hired to be part of a research team doing experiments in the contributing team's codebase and possibly deploying your experiment on their production systems?
I would immediately push back on any new guy who says this, no matter how confident he seems that his way is the right way.
Counter-thought:
We are making brand new things here, not working on an assembly line turning out the very same thing dozens to millions of times. We are paid to make new products that never existed before, and novelty in them is desired more often than not!
Those pretending to know exactly what they are doing are lying!
Of course we are speculating here about the size of novelty content to a differing extent, which is never 0% and never 100%, but something inbetween. But those pushing back on those at least trying to revise the work - putting emphasis on it -, deserve no-one coming to them to be pushed back (at least for the inability of allocating resources for this essential activity of development. Development!).
(tried to mimic the atmosphere of the message, sorry if failed)
> but what about when the existing codebase is already inconsistent
It depends.
If it's a self contained code base, automatic refactoring can safely and mechanically update code to be consistent (naming, and in some cases structurally).
If it's not self contained and you're shipping libraries, i.e. things depend on your code base, then it's more tricky, but not impossible.
"Lean on tooling" is the first thing you should do.
To me the (a), (b) and (c) tactics remind me of people who were very hard to work with. I believe a better approach is indeed like you already mentioned, explain and document, but, as an extra: also be open to comments on the docs and implementation. Often there's a reason that your particular approach was not used earlier, like explained in the article.
My last experience with this: in the section of code I had to navigate for my first ticket at startup X, we had some code that was querying the same tables multiple times unnecessarily. We were also using a highly bespoke (imo) code library with a relatively small following on github, but this library permeated the entire codebase and dictated the way things had to be done. I tried to just make my changes by touching the code as little as possible, while avoiding the most outstanding inefficiencies. I thought of my changes as a small oasis of sanity in a desert of madness. In that first PR the seniors tore me a new one. It turned out there was a v2 of "the right way of doing things" that I had to learn about and then write my code to conform to. v2 had its own issues, though it was perhaps not as bad as v1. Later on, when I became more influential, I was able to successfully advocate to management to change many things to my liking, including axing the beloved exotic library that distinguished our codebase. But the old guard remained highly resistant to changing the way our code was written, and switched their stance from 'this is great' to 'it sucks but it's too much effort not to keep with it'. I am left feeling that it was all not worth it, not just the struggle but whether the product was appreciably affected one way or another. Just another crappy war story of my blighted career.
This resonates hard. In particular, managing the old guard’s emotions is a defeating process. It is almost always easier to jump into a project and critique it than it is to start from scratch. New perspectives and better ideas should be welcome, but instead they can be shut down because folks take things personally.
My (rather unfortunate) conclusion is that when I encounter this behavior I move to another team to avoid it. If that’s not possible it’s honestly worth looking for another job.
I had this in 2018 with a company that were still using csh scripts in all of the development tooling.
Relevant xkcd: https://xkcd.com/927/
"Now we have five inconsistent coding conventions..."
> The other reason is that you cannot split up a large established codebase without first understanding it. I have seen large codebases successfully split up, but I have never seen that done by a team that wasn’t already fluent at shipping features inside the large codebase. You simply cannot redesign any non-trivial project (i.e. a project that makes real money) from first-principles.
This resonates. At one former company, there was a clear divide between the people working on the "legacy monolith" in PHP and the "scalable microservices" in Scala/Go. One new Scala team was tasked with extracting permissions management from the monolith into a separate service. Was estimated to take 6-9 months. 18 months later, project was canned without delivering anything. The team was starting from scratch and had no experience working with the current monolith permissions model and could not get it successfully integrated. Every time an integration was attempted they found a new edge case that was totally incompatible with the nice, "clean" model they had created with the new service.
I worked at a company that had a Rails monolith that underwent a similar scenario. A new director of engineering brought in a half dozen or so of his friends from his previous employer to write Scala. They formed up a clique and decided Things Were Going to Change. Some 18 months and 3 projects later, nothing they worked on was in production. Meanwhile the developer that was quietly doing ongoing maintenance on the monolith had gradually broken out some key performance-critical elements into Scala and migrated away from the Ruby code for those features. Not only had it gone into production, it made maintenance far easier.
> Meanwhile the developer that was quietly doing ongoing maintenance on the monolith had gradually broken out some key performance-critical elements into Scala and migrated away from the Ruby code for those features.
Yep and that's what I've seen be successful: someone who really knows the existing code inside and out, warts and all, needs to be a key leader for the part being broken out into a separate system. The hard part isn't building the new system in these projects, it's the integration into the existing system, which always requires a ton of refactoring work.
> needs to be a key leader for the part being broken out into a separate system
Indeed, the developer was one of the best programmers I've known and absolutely the key person on the system. The New Guys Clique were the sort of developers, you might know some, who come in, look at the existing systems, decide it's all wrong and terrible, and set out to Do It Right.
I've seen almost this exact scenario play out, although in my case it was just one person as opposed to a clique. He had just come from a much larger company in the same business, and almost right away he proposed that we should rearchitect a significant portion of our software to match the way things were done at his previous employer.
His proposed architecture wasn't without elegance, but it was also more complex and, more importantly, it didn't solve any problems that we actually had. So in the end it was more of an ideological thing.
He seemed to take it personally that we didn't take him up on his proposal, and he left a few months later (and went back to his previous employer, though to a different group). Don't think he was around for even a year. He wasn't young either; he was pretty experienced.
What was the language and proposed architecture? Nothing is quite as grating as being hired for work on an Erlang/Elixir code base and seeing basically no correct usage of the platform and it'd definitely result in an instant suggestion to start correctly leveraging Erlang, piece by piece. It's not even a question of architecture as much as it is completely ignoring what makes the platform good, IMO.
It was all c++. The existing architecture consisted of a number of different executables, which were mostly statically linked (except for stuff like the C and C++ runtime libs). His proposal was to factor out the business logic from each executable into a shared library, and then (if memory serves) we would only need a single executable that could work with any of these new shared libraries. Like a plugin architecture.
This would mean that when rolling out a new version, usually only the shared library would need to be replaced. So.. significantly more complexity, due to now having a more arms-length interface between the executable and the business logic in the shared libraries. And the payoff was.. when deploying, we would upgrade a shared library instead of an executable? Still doesn't seem worthwhile.
Even if you’re generally suspicious of so called best practice/ design patterns / Martin Fowlerisms.. this is a time for the strangler approach. (Parent and siblings are already talking about the same idea without naming it.)
Rewrites from scratch never work with sufficiently large systems, and anyone that’s been involved with these things should be savvy enough to recognize this. The only question is around the exact definition of sufficiently large for a given context.
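For anyone who hasn't seen the strangler approach spelled out, a minimal sketch follows. It assumes requests can be dispatched by route; `StranglerFacade`, the handlers, and the routes are all hypothetical names made up for the illustration, not anything from the systems described above.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def legacy_handler(request: dict) -> str:
    # Hypothetical stand-in for the existing monolith.
    return f"legacy:{request['route']}"


def new_permissions_handler(request: dict) -> str:
    # Hypothetical stand-in for one piece that has been extracted.
    return f"new:{request['route']}"


class StranglerFacade:
    """Sends migrated routes to new code and everything else to the legacy system."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, route: str, handler: Handler) -> None:
        # Move one route at a time; the rest of the system is untouched.
        self._migrated[route] = handler

    def rollback(self, route: str) -> None:
        # Falling back to legacy is just removing the entry.
        self._migrated.pop(route, None)

    def handle(self, request: dict) -> str:
        handler = self._migrated.get(request["route"], self._legacy)
        return handler(request)


facade = StranglerFacade(legacy_handler)
facade.migrate("/permissions", new_permissions_handler)

if __name__ == "__main__":
    print(facade.handle({"route": "/permissions"}))
    print(facade.handle({"route": "/billing"}))
```

The point of the shape is that the old system keeps serving production the whole time, and each migrated piece can go live (or be rolled back) on its own.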
A similar, more concrete approach is parallel implementations, as written about by John Carmack[0]. I suppose the main difference is that parallel implementation has you very explicitly and intentionally leave the "old stuff" around until you're ready to flip the switch. I've used this approach in large-scale refactorings very successfully.
One of the benefits is that you get to compare the new vs old implementation quickly and easily, and it lets you raise questions about the old implementation; every time I've done this I've found real, production-impacting bugs because the old system was exhibiting behaviors that didn't match the new system, and it turned out they weren't intentional!
[0] http://sevangelatos.com/john-carmack-on-parallel-implementat...
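A minimal sketch of the compare-old-against-new idea, under the assumption that both implementations are plain functions with the same signature. `run_side_by_side`, `old_total`, and `new_total` are hypothetical names for the example; the old result stays authoritative until you decide to flip the switch.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("parallel_run")


def run_side_by_side(old: Callable[..., Any], new: Callable[..., Any], *args: Any) -> Any:
    """Call both implementations, log any difference, and return the old result."""
    old_result = old(*args)
    try:
        new_result = new(*args)
    except Exception:
        # A crash in the new code must never affect production behavior.
        log.exception("new implementation raised for args=%r", args)
        return old_result
    if new_result != old_result:
        log.warning("mismatch for args=%r: old=%r new=%r", args, old_result, new_result)
    return old_result


def old_total(prices):
    # Hypothetical legacy behavior: truncates each price before summing.
    total = 0
    for price in prices:
        total += int(price)
    return total


def new_total(prices):
    # Hypothetical rewrite: sums first, then rounds.
    return round(sum(prices))


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(run_side_by_side(old_total, new_total, [1.5, 2.5]))
```

Every logged mismatch is a question to answer: either the new code is wrong, or the old behavior was never intentional.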
I second this approach. I've utilized it successfully in an ongoing migration. I also second the need to have engineering knowledge from the previous system available. I was lead for 5 years on a previous system before being tasked as solution architect of a new total rewrite to cloud native. The hardest part of such a migration is maintaining sponsor buy-in while you build the parallel run evaluation strangler fig integration with the old system and get some existing flow polished enough to take live. If you happen to have a rule or scripting system in place, piggyback off of it so you can do an incremental migration.
Also an issue is that the director attempted a full rewrite as a separate project.
You can do successful rewrites but your rewrite has to be usable in production within like a month.
If you don’t know how to achieve that, don’t even try.
The quiet developer was able to get their own rewrite done because they understood that.
Looks like the director of engineering showed some classic inexperience. You can tell when someone has done something before and when it’s their first time.
> a full rewrite as a separate project.
And it was never constrained to rewriting the existing system. The rewrite plan was motivated by the entirely reasonable desire to make further improvements possible; an additional mistake was the attempt to add major improvements as part of the rewrite. The new guys made their disdain for the existing system obvious, to the extent that their intent for the rewrite ballooned into a ground-up rebuild of everything.
Things You Should Never Do, Part I: https://www.joelonsoftware.com/2000/04/06/things-you-should-...
> You can do successful rewrites but your rewrite has to be usable in production within like a month.
I strongly disagree with this, and it reminds me of one of the worse Agile memes: "With every commit, the product must be production-ready." [0]
The rewrite has to be generally not behind schedule. Whatever that schedule is is up to the folks doing the work and the managers who approve doing the work.
[0] I've worked for an Agile shop for a long time, so please don't tell me that I'm "Doing Agile Wrong" or that I misunderstand Agile. "No True Scotsman" conversations about Agile are pretty boring and pointless, given Agile's nebulous definition.
OK so you're actually right, but the actual criteria of "whether you can do this" depends on a lot of factors from the project to the people.
But there's no way to really describe it. It's like explaining to someone how to parallel park or do a kickflip... you can only explain it so much.
I like to say "it should be usable in production soon" because it's generally a good approximation that takes into account what you might have to work with. It's an upgrade from advice like Joel's, which just says "IT NEVER WORKS"
> It's an upgrade from advice like Joel's who just say "IT NEVER WORKS"
What you originally said ("It must be usable in production within a month") is equivalent to "Just don't do it, because IT NEVER WORKS" for all but the smallest, simplest projects out there in Professional Programmer land. [0]
> But there's no way to really describe it.
There really is a way to describe it:
"The rewrite has to be generally not behind schedule. Whatever that schedule is is up to the folks doing the work and the managers who approve doing the work."
Establishing that schedule is the same sort of cost/benefit and expected-level-of-difficulty analysis that should be done before planning any nontrivial work in Professional Programmer land. All but the most green or most sheltered-from-Process programmers are at least aware of this analysis. Many of those who are aware of it have participated in it.
[0] Or the most well-designed projects, which have small, easily understood pieces with easily-comprehensible interactions with the rest of the system, that can be quickly and easily replaced with new pieces. There's not much out there like that... and I'd imagine the task of "making major changes to how many of those pieces interact with each other" wouldn't be usable in production in a month for most of those systems.
If the schedule is three years, and in the meantime the product being rewritten isn't getting maintenance, the company might as well go ahead and fold and save everyone pain and disappointment. https://www.joelonsoftware.com/2000/04/06/things-you-should-...
If the product is not getting needed maintenance, then yeah, maybe.
But, man, sometimes software is fit-for-purpose and can really just be left alone for extended periods. Other times, the users of that software upgrade on a hemi-annual or annual schedule (or even LESS frequently), so they'd never notice a three month delay in new releases.
> sometimes software is fit-for-purpose and can really just be left alone for extended periods
I suppose if the software maker is a functional monopoly in that space and the customers are cursed with vendor lock-in, sure. I can think of Major Software companies that barely maintain their products yet companies stick with the garbage because what choice do they have?
> I suppose if the software maker is a functional monopoly in that space and the customers are cursed with vendor lock-in, sure.
Given this statement, it might surprise you to learn that there are folks who aren't keen on software development for its own sake, and really rather dislike having to re-learn how to use their tools when those tools get significantly changed with no clear benefit to those users.
See also: the folks who are perfectly happily using "ancient" versions of -say- Photoshop, or WordPerfect, or Word or...
literally what I wanted to do as an opinionated junior
This is the way. You absolutely can turn shit codebases into something nicer and faster, and this is best done by people who know the system and, maybe even more important, know the operational context the system exists in.
I once came into an old codebase like this as a junior, thinking I can start again. And I was gently but firmly told by my boss that it wouldn't work, this software is crucial to operations that support all our revenue, and while it can be improved it has to keep working. And we improved the hell out of it.
Am I naive for thinking that nothing like that should take as long as 6-9 months in the happy case and that it's absurd for it to not succeed at all?
You know so little about the team, the organisation, the codebase, other concurrent obligations (e.g. prod support), and the way the project is run. The only way I can imagine one having confidence in a statement like "nothing should take that long" is naïveté.
Then maybe 18 months wasn't too long and they should have been given more time. But seriously?
It really depends. Honestly 6-9 months would have been an optimistic estimate even if it were 2-4 devs intimately familiar with the existing codebase. Permissions is a very cross-cutting concern, as you might imagine, and touched a huge amount of the monolith code. A big problem was that permissions checks weren't done in a consistent layer, instead scattered all over the place, and the team responsible for the rewrite, being new to the code, was starting from scratch and finding these landmines as they went. Scoping was also unclear and kept changing as the project went along, at first to pull in more scope that was peripherally related, then to push stuff out of scope as the project went off track. And during all these changes you have to keep the existing auth system working with zero downtime.
The devs were good developers, too! Two people on the team went off to Google after, so it's not like this was due to total incompetence or anything; more just overambition and lack of familiarity with working on legacy code.
Maybe. There's a lot of dragons hidden inside enterprise code. Only if you know all of them can you really succeed the first time around.
At a large enterprise, 6-9 months is blazingly fast.
Everything takes longer than you think and this sounds like it involves at least 2 teams (the PHP team and the Scala team). Every team you include increases the timeline factorially in the best case.
It takes a lot of time to have meetings with a dozen managers to argue over priority and whatever. Especially since their schedules are full of other arguments already.
Authorization and access control is an awfully difficult problem, as soon as you derive from user-defined ACLs on all persisted objects and user/group-based granting data. Each access can have an arbitrary rule that must be evaluated with all dependent data, which will end up being anything in the persisted data. How do you make this long rule list maintainable, without redundancy, ensuring that changing re-used rules won't introduce regressions on all call sites?
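One possible direction for the "without redundancy" part, sketched under heavy assumptions: each rule is a small named predicate defined once, and composite rules are built from those names, so a change to a shared rule happens in exactly one place. The `User` and `Document` types, the rule names, and the policies are all hypothetical, and this does nothing by itself about the regression problem; that still needs tests per rule and per call site.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Document:
    owner: str
    shared_with_groups: FrozenSet[str] = frozenset()
    archived: bool = False


Rule = Callable[[User, Document], bool]


# Primitive rules: each one is defined exactly once.
def is_owner(user: User, doc: Document) -> bool:
    return doc.owner == user.name


def in_shared_group(user: User, doc: Document) -> bool:
    return bool(user.groups & doc.shared_with_groups)


def not_archived(user: User, doc: Document) -> bool:
    return not doc.archived


# Combinators: build bigger rules out of named smaller ones.
def any_of(*rules: Rule) -> Rule:
    return lambda user, doc: any(rule(user, doc) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    return lambda user, doc: all(rule(user, doc) for rule in rules)


# Hypothetical policies; call sites reference these names only.
can_read = any_of(is_owner, in_shared_group)
can_edit = all_of(not_archived, can_read)

if __name__ == "__main__":
    alice = User("alice")
    bob = User("bob", frozenset({"eng"}))
    doc = Document(owner="alice", shared_with_groups=frozenset({"eng"}))
    print(can_read(bob, doc), can_edit(alice, doc))
```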
> Am I naive for thinking that nothing like that should take as long as 6-9 months in the happy case and that it's absurd for it to not succeed at all?
Bluntly, yes. And so is every other reply to you that says "no this isn't naive", or "there's no reason this project shouldn't have finished". All that means is that you've not seen a truly "enterprise" codebase that may be bringing in tons of business value, but whose internals are a true human centipede of bad practices and organic tendrils of doing things the wrong way.
> whose internals are a true human centipede of bad practices and organic tendrils of doing things the wrong way
Currently there. On one hand: lots of old code which looks horrible (the "just put comments there in case we need it later" pattern is everywhere). Hidden scripts and ETL tasks on forgotten servers, "API" (or more often files sent to some FTP) used by one or two clients, but it's been working for more than a decade so no changing that. On the other: it feels like doing archeology, learning why things are how they are (politics, priority changes over the years). And when you finally ship something helping the business with an easier to use UI you know the effort was not for nothing.
If you find me any resources to build access control on arbitrary (I mean it, arbitrary) rules the right way, I would be very very (very) glad.
I think that's just called "code".
Authz can make the most otherwise trivial features into a depressing journey in the valley of edge cases.
Only in the sense that you seem to not understand how terrible big companies, big teams and big code bases are for efficiency and productivity. It pushes the bar for reaching desired results way up, and the time it takes to get there even more. No one should ever want to be part of organizations, teams or code bases like this, for their own sake.
No, you’re not naive. If it’s done by one-two people that know what they’re doing, it should be done much faster.
If it’s a big new team that doesn’t know what they’re doing, working separately from existing codebase, with lots of meetings… I see no reason why it would finish at all.
>Am I naive for thinking
Yes.
In my experience with very large codebases, a common problem is devs trying to improve random things.
This is well intentioned. But in a large old codebase finding things to improve is trivial - there are thousands of them. Finding and judging which things to improve that will actually have a real positive impact is the real skill.
The terminal case of this is developers who, in the midst of another task, try to improve one little bit, but pulling on that thread leads to them attempting bigger and bigger fixes that are never completed.
Knowing what to fix and when to stop is invaluable.
Which can lead to trying to rewrite Netscape Navigator from scratch and killing the company:
https://www.joelonsoftware.com/2000/04/06/things-you-should-...
> common problem is devs trying to improve random things.
Been there, been guilty of that at the tail end of my working life. In my case, looking back, I think it was a sign of burnout and frustration at not being able to persuade people to make the larger changes that I felt were necessary.
Do you think boyscouting, "leave it better than you found it" is misguided as well?
I always took it as "leave it better than you found it" across the files that I've been working on (with some freedom as long I'm on schedule). My focus is to address the ticket I'm working on. Larger improvements and refactorings get ticketed separately (and yes, we do allocate time for them). In other words, I don't think it's misguided.
I do not believe in "boyscouting". I think if you want to leave it better, make a ticket and do it later. Tacking it on to your already planned work is outside the scope of your original intent. This will impact your team's ability to understand and review your changes. Your codebase is unlikely to be optimized for your whimsy. Worse though is when a reviewer suggests boyscouting.
I've seen too many needless errors after someone happened to "fix a tiny little thing" and then fail to deliver their original task and further distract others trying to resolve the mistake. I believe clear intention and communication are paramount. If I want to make something better, I prefer to file a ticket and do it with intention.
Boyscouting works because you don’t need to get permission to fix tech debt when it is bundled with something else. 98% of those tickets you file to fix warts will never be addressed because the business demands that time is spent on features that make money.
Isn't the point of OP of this thread that most of those wart-fixes are pointless?
I tend to agree, if you can't sell it as a ticket you probably shouldn't work on it. And "boyscout" PRs are a pain to review.
> Single-digit million lines of code (~5M, let’s say)
> Somewhere between 100 and 1000 engineers working on the same codebase
> The first working version of the codebase is at least ten years old
That's 5,000 to 50,000 lines of code per engineer. Not understaffed. A worse problem is when you have that much code, but fewer people. Too few people for there to be someone who understands each part, and the original authors are long gone. Doing anything requires reverse engineering something. Learning the code base is time-consuming. It may be a year before someone new is productive.
Such a job may be a bad career move. You can spend a decade learning a one-off system, gaining skills useless in any other environment. Then it's hard to change jobs. Your resume has none of the current buzzwords. This helps the employer to keep salaries down.
> A worse problem is when you have that much code, but fewer people. Too few people for there to be someone who understands each part, and the original authors are long gone.
Maybe.
I spent most of my career at a small mom and pop shop where we had single-digit MLOC spanning 20-25 years but only 15-20 engineers working at any given time. This wasn't a problem, though, because turn-over was extremely low (the range in engineer count was mostly due to internships), so many of the original code owners were still around, and we spent some effort to spread out code ownership such that virtually all parts were well understood by at least 2 people at any given moment.
If anything, I rather shudder at the thought of working somewhere that only has ~5M lines of code split up amongst 100 (and especially 1000) engineers over a span of 10 years. I can't imagine producing only 5-50 KLOC over that time, even despite often engaging in behind-the-scenes competition with colleagues over who could produce pull requests with the least (and especially negative) net LOC.
> Your resume has none of the current buzzwords.
That's one of my bigger pet peeves about software development, actually.
While you probably didn't mean it this way, over the years, I encountered a number of people who'd consistently attempt to insert fad technologies for the sake of doing so, rather than because they actually provided any kind of concrete benefit. Quite the contrary: they often complicated code without any benefit whatsoever. My more experienced colleagues snidely referred to it as resume-driven development.
I can't hate people doing this too much, though, because our industry incentivizes job hopping.
Being able to navigate and reverse engineer undocumented legacy code in a non-modern stack is a skill set in and of itself. Most people don't enjoy it in the slightest, so being one of the few devs who does means that I have been able to take on the gnarly legacy problems nobody else will touch. It might not build buzzwords on my resume, which does limit using this particular aspect of dev work to get an initial call back on jobs. But it absolutely exposes me to a variety of ways of doing things and expands my skills in new directions, and that expanded perspective does help in interviews.
You lost me on how this helps employers keep salaries down. My value is greater by being able to do such things, not less. If I can work on modern stacks, legacy stacks, enterprise platforms, and am willing to learn whatever weird old tech you have, that does not decrease my salary.
This. So much this.
> Being able to navigate and reverse engineer undocumented legacy code in a non-modern stack is a skill set in and of itself.
And I find that it's a pretty rare skill to find.
LOC is 'not a good' metric for 'you should be able to understand a codebase'. In either scenario, too many people or too few people, or (my favorite) 'not enough' (whatever that means). Mythical Man-Month comes to mind. What I think you're trying to get at is you need skill to reverse engineer software. And even if you have that skill it takes time (how much?). We work in a multifaceted industry and companies need to build today. On any given project, the probability is small that there is a dev who has the skill. We all know 'they can do it/they can learn on the job/they'll figure it out'. And then OP's observation comes to fruition.
I've worked in codebases like this and disagree. Consistency isn't the most important, making your little corner of the codebase nicer than the rest of it is fine, actually, and dependencies are great - especially as they're the easiest way to delete code (the article is right about the importance of that). What's sometimes called the "lava layer anti-pattern" is actually a perfectly good way of working, that tends to result in better systems than trying to maintain consistency. As Wall says, the three cardinal virtues of a programmer are laziness, impatience, and hubris; if you don't believe you can make this system better then why would you even be working on it?
Also if the system was actually capable of maintaining consistency then it would never have got that large in the first place. No-one's actual business problem takes 5M lines of code to describe, those 5M lines are mostly copy-paste "patterns" and repeated attempts to reimplement the same thing.
Pulling in lots of dependencies will eventually grind progress on features to a halt as you spend more and more time patching vulnerabilities and deploying the fixes. The probability of seeing new vulnerabilities is, I believe, pretty much linear in the number of dependencies you have.
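To put rough numbers on "pretty much linear", here is a back-of-the-envelope sketch. It assumes every dependency independently has the same probability `p` of getting a new vulnerability in some period, which is a simplification made up for this example and not a measured figure. Under that assumption the expected number of vulnerabilities is exactly linear in the dependency count, while the chance of seeing at least one is close to linear only while `p * n` is small.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent dependencies is hit."""
    return 1.0 - (1.0 - p) ** n


def expected_count(p: float, n: int) -> float:
    """Expected number of affected dependencies; exactly linear in n."""
    return p * n


if __name__ == "__main__":
    p = 0.01  # assumed per-dependency probability, purely illustrative
    for n in (1, 10, 50, 200):
        print(n, round(p_at_least_one(p, n), 3), round(expected_count(p, n), 2))
```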
As opposed to your in-house code which is vulnerability free?
The issue isn’t vulnerabilities, it’s dependency hell where all your packages are constantly fighting each other for specific versions. Although some languages handle this better than others.
In house code could very well have many fewer vulnerabilities, as you only write exactly the functionality you need, vs pulling a large dependency and only using a small percentage of the API.
> pulling a large dependency and only using a small percentage of the API.
This is normally a direct result of trying to limit the number of dependencies. People are much more able to use small, focused dependencies that solve specific problems well if you have a policy that permits large numbers of dependencies.
> No-one's actual business problem takes 5M lines of code to describe, those 5M lines are mostly copy-paste "patterns" and repeated attempts to reimplement the same thing.
I'm pretty sure this is trivially untrue. Any OS is probably more than 5M lines (Linux - 27.8M lines according to a random Google Search). Facebook is probably more lines of code. Etc.
> Any OS is probably more than 5M lines (Linux - 27.8M lines according to a random Google Search).
Linux is notoriously fragmented/duplicative, and an OS isn't the solution to anyone's actual business problem. A well-factored solution to a specific problem would be much smaller, compare e.g. QNX.
> Facebook is probably more lines of code.
IIRC Facebook is the last non-monorepo holdout among the giants, they genuinely split up their codebase and have parts that operate independently.
Does Facebook have more than 5M lines of code now? I'm sure they do. Does that actually result in a better product than when it was less than 5M lines of code? Ehhhh. Certainly if we're talking about where the revenue is being generated, as the article wants to, then I suspect at least 80% of it is generated by the <5M that were written first.
So I mean yeah, on some level solving the business problem can take as many lines as you want it to, because it's always possible to add some special case enhancement for some edge case that takes more lines. But if you just keep growing the codebase until it's unprofitable then that's not actually particularly valuable code and it's not very nice to work on either.
I'm fairly sure Word, Excel, Google Sheets, Youtube, Photoshop, etc. all have fairly high counts.
As do many tens of thousands of applications that are the backbone of services we all rely on. The systems that run banks, that run power plants, the routers that make up the backbone of the internet, etc.
Again, I agree with some of the spirit of what you're saying... but there's also a tendency of many developers (like myself) to only think of shiny new products, or to only think about the surface-level details of most business problems. You write:
> So I mean yeah, on some level solving the business problem can take as many lines as you want it to, because it's always possible to add some special case enhancement for some edge case that takes more lines. But if you just keep growing the codebase until it's unprofitable then that's not actually particularly valuable code and it's not very nice to work on either.
I think this misunderstands how the companies that have stayed in business for so long have done so. Excel is the software we all use every day because it kept adding more and more features, stealing the best ideas from new products that tried to innovate. It's still doing so, though obviously to a lesser extent.
> I think this misunderstands how the companies that have stayed in business for so long have done so. Excel is the software we all use every day because it kept adding more and more features, stealing the best ideas from new products that tried to innovate.
Not convinced. I think a lot of companies keep adding features because they don't know how to do anything else - adding new features is how managers get promoted, so it's what devs get rewarded for, so it keeps happening even as the RoI drops lower and lower (and in many cases eventually goes negative - but this is masked because the core idea was good enough that the product as a whole is still profitable). At "best" a bunch of esoteric features act as a de facto moat that can shut out the competition in a tickbox-feature-comparison, rather than being something that's actually adding value in day-to-day use.
Yes, I understand the sentiment and am familiar with this argument. And sometimes it's totally true! But I'm pushing back because I think it's sometimes not true, and that this is more an "anti-growth" or "here's-how-real-devs-work" talking point that is sometimes applied incorrectly.
E.g. how far are you taking this? Excel was released in 1985. Do you think after 5 years there was no more business value? VBA, allowing scripting, wasn't released until 1993. Do you think Excel circa 2000 is as good as Excel today is? I'm sure it's roughly similar, but I'm just as sure there are many features that I would miss.
And that's not even getting to the fact that Excel lost a ton of marketshare to Google Sheets, because it was too late to adopt what is Sheets' biggest feature - collaborative editing. I'm sure in 2005 you could've made the case that Excel already has all the business value it needs, and trying to add something like collaborative editing is just a corporate waste of time, a totally "esoteric" feature that no one really needs and doesn't provide value, and is only there to get managers promoted or to get devs working on something "cool". Yet arguably it was a critical thing they needed to get done and didn't.
Anyway, I'm sure you're right a lot of the time. I just think applying this statement as a blanket rule is very wrong, when real people in the real world are sometimes working on systems 10, 20 or sometimes even 50 years old.
You're talking about a couple of general, powerful features, which is exactly the kind of thing that doesn't take 5 million lines of code to implement. Collaborative editing is definitely worthwhile. Integrating a scripting language is probably worthwhile. The only way you get to a huge ball of mud codebase (other than by being bad at programming) is by having lots of things that really are isolated special cases like, IDK, one more obscure statistical distribution that you can sample from, or one more date format, or one more connector integration.
Do important general features start out looking like this kind of esoteric edge case? I don't think so, though I'm not hugely confident. I think it should've been clear in 2005 that enabling collaborative editing would have been a fundamental shift that required reworking the core of Excel, not something that a single internal team could add as a checkbox feature. I think that's probably part of why it didn't happen.
There's definitely some business value from "supports everything", and in a messy human world that can be very complex - just things like knowing the business day calendar for every country take space in code, and if you're writing a piece of software that commits to integrating with everything else then the number of integrations you have to write is more a function of politics than of engineering. But I think that part is very much on the long tail, the "last 20%".
To answer the specific question, yes, I do think Excel 2000 is as good as Excel today is. I honestly can't think of a single positive new feature (automatically suggesting tables seems cool, but I don't trust it), and the changes I did notice are mainly the ones that got in my way like the "ribbon" menu. I'm sure some businesses are getting some extra value out of today's Excel, but only in long-tail stuff.
I don't blame people for working on the long-tail things, I've done plenty of it myself. It pays the bills, it can even be good engineering. But I don't think the article is right to imply that it's where most of the business value comes from.
Everyone wants to be a greenfield developer - even if they don't realize they are in a brown field.
I work on a 5M+ line code base. It's not copy/paste or the same problems solved in different ways. It's a huge website with over 1K pages that does many, many things our various clients need.
>making your little corner of the codebase nicer than the rest of it is fine, actually
As TFA points out, you might find out that you've made your little corner worse, actually.
I don't think the examples add up. Like, yes, if the system has a way to do auth then you should use it. But if the system's way of doing auth has nasty surprises, rather than cargo-cult the workaround to those nasty surprises, you should fix them! (Especially if the article wants to argue that adding inconsistency is bad - then by the same token removing the inconsistency that someone added before you is good). And if the system has its own custom auth implementation that does the same thing as a standard library, you should probably pull it out completely and replace it with that standard library.
That's the wrong model for the example. Some nasty surprises are derived from real world complexity. Imagine something like a timezone library.
Also:
>you should fix them!
OK, great! See you in 10 years when you've fixed everything and are ready to start implementing stakeholder requirements.
The "The cardinal mistake is inconsistency" is 100% true. We used to call the guiding philosophy of working in these codebases "When in Rome".
I have this bad codebase at work. Really bad. One of the things I’ve been working on for the past two years is making it consistent. I’m almost at the point where interfaces can be left alone and internals rewritten in a consistent style.
People often ask why I hardly ever have any prod issues (zero so far this year). This is part of the reason: having consistent codebases that are written in a specific style and implement things in a similar manner.
Some codebases make me feel like I’m reading a book in multiple languages …
> People often ask why I hardly ever have any prod issues (zero so far this year).
It also helps that we're still only in January!
Bahh thanks for the chuckle. The man is 7/7 as of today!
> zero so far this year
I saw what you did there.
Maybe that’s not even that bad if the number of issues went down from multiple a day to none in a couple of days.
> Some codebases make me feel like I’m reading a book in multiple languages …
In most cases the codebase does consist of multiple languages.
I don't like this philosophy as it often leads to stagnation in patterns and ways of working that seep into newer systems. "That's not how we do things here" becomes a common criticism, resulting in systems and services that share the same flaws and trade-offs, making progress difficult.
Engineers often adhere too rigidly to these principles rather than taking a pragmatic approach that balances existing practices with future improvements.
>improvements
Therein lies the rub. Everyone has a different idea of what is an improvement in a codebase. Unless there's some performance or security concern, I'd much rather work in an "old" style codebase that's consistent than a continually partially updated codebase by multiple engineers with different opinions on what an "improvement" is.
> Everyone has a different idea of what is an improvement in a codebase
Yes, and consistency is the tie-breaker. So the status quo remains, and improvements aren't made.
I completely agree with this.
> I don't like this philosophy as it often leads to stagnation in patterns and ways of working that seep into newer systems.
The rule isn't "don't introduce change", it's "be consistent". Using the example from the post, if you want to use a different method of doing auth that is simpler, the "be consistent" rule means you must change the way auth is done everywhere.
Interestingly, if you do that the negatives he lists go away. For example, if the global auth mechanism handles bots specially, you will learn that if you are forced to change it everywhere.
What's bad is code exhibiting multiple fragmentary inconsistencies, and no plan or effort exists to bring older code up to match the new patterns. An example I was closely involved with: A Java programmer who wrote a plethora of new code in a pure functional paradigm, scattering Vavr library uses all over it. The existing code was built on Dropwizard and any experienced Java programmer could rapidly get comfortable with it. The difference between the existing code and the new was jarring (sorry for the pun) to say the least, and I wonder if, later, the company ever managed to recruit anyone who understood both well enough to maintain the system.
ETA: upon reflection I'd consider that programmer a canonical example of the kinds of mistakes the author covers in the article.
And that's a fair criticism, however, if you change a pattern without changing it everywhere, you now have two patterns to maintain (the article mentions this). And if multiple people come up with multiple patterns, that maintenance debt multiplies.
Progress and improvement is fine, great even, but consistency is more important. If you change a pattern, change it everywhere.
Change it at once everywhere on an existing large codebase? That's going to be one huge PR no one will want to review properly, let alone approve.
Document the old pattern, document the new pattern, discuss, and come up with a piece by piece plan that is easy to ship and easy to revert if you do screw things up.
Unless the old pattern is insecure or burns your servers, that is.
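One way to read "piece by piece, easy to ship and easy to revert" is a single entry point that routes each caller to the old or the new pattern. The sketch below is only an illustration under assumptions: the serializers, the caller names, and the `MIGRATED_CALLERS` set are all hypothetical, and a real system would likely read the list from config or a flag service.

```python
import json

# Hypothetical opt-in list; removing a name reverts that caller to the old pattern.
MIGRATED_CALLERS: set[str] = {"billing"}


def old_serialize(record: dict) -> str:
    """Existing pattern (assumed): key=value pairs joined by ';', sorted by key."""
    return ";".join(f"{key}={record[key]}" for key in sorted(record))


def new_serialize(record: dict) -> str:
    """Proposed pattern (assumed): JSON with sorted keys."""
    return json.dumps(record, sort_keys=True)


def serialize(record: dict, caller: str) -> str:
    """Single entry point so callers can be migrated, or reverted, one at a time."""
    impl = new_serialize if caller in MIGRATED_CALLERS else old_serialize
    return impl(record)
```

Each step of the plan is then a one-line change to the opt-in list, and the old function is deleted only once no caller still depends on it.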
I don’t think you and the comment you are replying to are in conflict. Documenting and rolling it out piecemeal is the correct way to make a large change.
I think the point is, either actually commit to doing that, or don’t introduce the new pattern.
Yes, it usually can't be done with one massive checkin. First get buy-in that the old pattern will be changed to the new pattern (hopefully you won't have some senior developer who likes it the old way and fights you), then as you say, come up with a plan to get it done.
The down side to this that I've experienced more than once, though, is incomplete conversions: we thought we had agreement that the change should be done, it turns out to be more difficult than planned, it gets partially completed and then management has a new fire for us to fight, resources are taken away, so you still have two or more ways of doing things.
If things are consistent enough, tools like OpenRewrite can be used. I’ve seen reviews where it is the recipe for generating the code transformation that gets reviewed, not the thousands of spots where the transform was applied.
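To make the "review the recipe, not the thousands of call sites" idea concrete, here is a toy recipe in Python. It is not the tool mentioned above, just a sketch using the standard `tokenize` module; the function name `rename_identifier` is made up for illustration, and it has no scope awareness, so it renames every matching name token.

```python
import io
import tokenize


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename NAME tokens equal to `old`, leaving strings and comments untouched."""
    lines = source.splitlines(keepends=True)
    hits = [
        tok.start
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    # Apply edits from the end backwards so earlier column offsets stay valid.
    for row, col in sorted(hits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)
```

A reviewer can read these few lines carefully and then spot-check the generated diff, instead of reading every changed file.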
Naturally unless it's trivial do it in steps, but committing to doing the whole migration as quickly as prudent or rolling it back completely is key.
Yep when I have to work on old code I find something in the existing code that's close to what I want to do, and copy/paste it. I do not try to abstract it into a common function, unless that's already been done and can be used verbatim.
You don't know the 10 years of reasons behind why the code is the way it is, and the safest thing is to stay as close as possible to how the existing code is written, both to avoid landmines, and so that future you (or someone else) has one less peculiar style they have to figure out.
All that said, the more usual case is that the code is already a huge mess of different styles, because the 100 different developers who have touched it before you didn't follow this advice.
Do you ever find yourself taking an existing function and adding another parameter to it? The benefit is that you don’t break existing code. The problem is that the function is now more complicated and likely now does more than one thing because the extra parameter is effectively a mode switch.
The only time I abstract something into a function even if it’s only used in one place, is if the generic case is as easy as or easier than the code for the specific case.
But that’s a judgment call based on experience.
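The mode-switch concern above can be shown with a small, made-up example. All names here are hypothetical. Option A bolts a flag onto an existing function; option B keeps two single-purpose functions.

```python
def format_price(amount_cents: int, for_export: bool = False) -> str:
    """Option A: the added flag silently turns one function into two behaviours."""
    if for_export:
        return f"{amount_cents / 100:.2f}"    # machine-readable, no symbol or commas
    return f"${amount_cents / 100:,.2f}"      # human-readable


def format_price_display(amount_cents: int) -> str:
    """Option B: the original behaviour, unchanged for existing callers."""
    return f"${amount_cents / 100:,.2f}"


def format_price_export(amount_cents: int) -> str:
    """Option B: the new behaviour gets its own name instead of a flag."""
    return f"{amount_cents / 100:.2f}"
```

Option A does not break existing callers, which is the benefit mentioned above; option B costs a second name but keeps each function doing one thing. Which is better is, as the comment says, a judgment call.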
"When is Rome" is good, might use that.
My old boss used to say: "Be a chameleon. I don't want to know that I didn't write this."
How do you tackle the case where the codebase is consistent in a bad way, like pervasive use of antipatterns that make code difficult to change or to reason about? If you want to improve that, you have to start somewhere. Of course, Chesterton’s Fence applies.
Or when people keep the old pattern because there's a "higher priority".
I've worked on a project with absolutely terrible duplication of deserialisers of models, each one slightly different even though most properties were named the same and should've been handled the same. But we can't touch anything because important things are happening in business and we can't risk anything. The ignored part was that this led to bugs and extreme confusion from new people. They were even too worried to accept a repo-wide whitespace normalisation.
In the past I have performed needed refactorings as part of new feature development, without asking permissions. Even though my boss at the time said “don’t make it pretty, just make it work.”
Of course, I knew that writing the code the “pretty”, more maintainable, easier to understand way wouldn’t take any longer to write, and might take less time as the refactoring requires less new code overall.
But I didn’t bother explaining all that. Just nodded then went and implemented it the best way as I was the experienced software engineer.
I have three maxims that basically power all my decisions as an engineer:
1. The three C’s: Clarity always, Consistency with determination, Concision when prudent. 2. Keep the pain in the right place. 3. Fight entropy!
So in the context of the main example in this article, I would say you can try to improve clarity by e.g. wrapping the existing auth code in something that looks nicer in the context of your new endpoint but try very hard to stay consistent for all the great reasons the article gives.
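A rough sketch of what "wrapping the existing auth code in something that looks nicer" could mean. Everything here is assumed for illustration: `legacy_check_auth` stands in for whatever the codebase already has, and its signature, the token values, and the `authenticated` decorator are invented.

```python
from functools import wraps
from typing import Optional


def legacy_check_auth(headers: dict, allow_bots: bool, realm: str) -> Optional[dict]:
    """Stand-in for the existing auth routine (hypothetical signature and tokens)."""
    token = headers.get("X-Token")
    if token == "user-token":
        return {"id": 1, "kind": "user"}
    if token == "bot-token" and allow_bots:
        return {"id": 2, "kind": "bot"}
    return None


def authenticated(handler):
    """Thin wrapper: a cleaner call site, but the same legacy auth path underneath."""
    @wraps(handler)
    def wrapper(headers: dict, *args, **kwargs):
        principal = legacy_check_auth(headers, allow_bots=True, realm="api")
        if principal is None:
            return 401, {"error": "unauthorized"}
        return handler(principal, *args, **kwargs)
    return wrapper


@authenticated
def get_profile(principal: dict):
    """New endpoint: never touches the legacy call directly."""
    return 200, {"id": principal["id"], "kind": principal["kind"]}
```

The new endpoint reads more clearly, yet special cases such as bot users still go through the one shared routine, so behaviour stays consistent with the rest of the system.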
It's got a nice ring to it :)
I don't have a real critique because I don't have that many years in a codebase the size of OP (just 2). But I struggle with the advice to not try and make a clean section of the code base that doesn't depend on the rest of the application.
Isn't part of good engineering trying to reduce your dependencies, even on yourself? In a later part of the post, OP says to be careful tweaking existing code, because it can have unforeseen consequences. Isn't this the problem that having deep vertical slices of functionality tries to solve? High cohesion in that related code is grouped together, and low coupling in that you can add new code to your feature or modify it without worrying about breaking everyone else's code.
Does this high cohesion and low coupling just not really work at the scale that OP is talking about?
It's one thing to reduce dependency and another to have reduced consistency. If you have 10 web routes and 1 behaves differently, it doesn't matter if the code is cross coupled or not, it matters if it behaves similarly. Does it return the same status codes on error? Does it always return JSON with error messages inside? Do you auth the same way? The implementation can be wholly separate but end users will notice because logic on their side now has to special-case your 11th endpoint because you returned HTTP 20x instead of 40x on error. Or when you realize that you want to refactor the code to DRY it (Don't Repeat Yourself), now you can't reduce all the duplication because you have bespoke parts.
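A minimal sketch of the behavioural consistency described above: every route, however separately implemented, builds its error responses through one small helper. The helper name and the JSON shape are assumptions, not taken from any particular framework.

```python
import json


def error_response(status: int, code: str, message: str) -> tuple[int, dict, str]:
    """Build an error in the one shape every endpoint is expected to use."""
    if not 400 <= status <= 599:
        # Guard against the "HTTP 20x on error" mistake described above.
        raise ValueError("errors must use a 4xx or 5xx status")
    body = json.dumps({"error": {"code": code, "message": message}})
    return status, {"Content-Type": "application/json"}, body
```

The eleventh endpoint can keep a wholly separate implementation, as long as its failures go through the same helper and clients do not need a special case for it.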
I think the gist of it is humility: as a newcomer, you don't really know what's out there and why, and there are often good reasons for things to be how they are. Not always, but often enough that avoiding being too original is the safer default. This doesn't imply giving up on "good engineering habits" either.
Now, once you have a deeper understanding of the codebase, you'll know when and why to break away from existing patterns, but in the beginning phase, it's a good habit to start by learning carefully how things are designed and why.
Consistency makes code predictable and reduces mental overhead. It doesn't mean you have to write it poorly like the rest of the codebase, but it does mean using the same general practices as the rest of the codebase. Think of it like using knockoff legos vs the real thing. They both work interchangeably which makes it easy to use them together, but you'd prefer to use the nicer lego pieces as much as possible in your builds because the material is higher quality, tighter tolerances, just overall works better even if it's the same shape as the knockoff pieces.
These are two different concepts though; reducing dependencies is good, but you can have minimal dependencies AND have the code look / feel / behave like the rest of the codebase. Always remember, it's not your code. Assume someone else will need to read / maintain it. Thousands might. You might have made the perfect section of code, then get an offer you can't refuse or get hit by a bus.
Nope, you've got it.
Code-consistency is a property just like any other property, e.g. correctness, efficiency, testability, modifiability, verifiability, platform-agnosticism. Does it beat any of the examples I happened to list? Not a chance.
> worrying about breaking everyone else's code
You already said it, but just to expand: if you already have feature A, you might succeed in plumbing feature B through feature A's guts. And so on with feature C and D. But now you can't change any of them in isolation. When you try to fix up the plumbing, you'll now break 4 features at once.
A little buried since the overarching focus is on consistency but I found these two paragraphs from the blog post really relevant points as well:
"You need to develop a good sense of how the service is used in practice (i.e. by users). Which endpoints are hit the most often? Which endpoints are the most crucial (i.e. are used by paying customers and cannot gracefully degrade)? What latency guarantees must the service obey, and what code gets run in the hot paths? One common large-codebase mistake is to make a “tiny tweak” that is unexpectedly in the hot path for a crucial flow, and thus causes a big problem.
You can’t rely on your ability to test the code in development like you can in a small project. Any large project accumulates state over time (for instance, how many kinds of user do you think GMail supports?) At a certain point, you can’t test every combination of states, even with automation. Instead, you have to test the crucial paths, code defensively, and rely on slow rollouts and monitoring to catch problems."
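For the "slow rollouts" part of the quote, one common approach is deterministic percentage bucketing. This is a sketch under assumptions, not how any particular service does it; the function name and the `feature:user_id` hashing scheme are illustrative.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent
```

Because the bucket depends only on the feature and user, a user who gets the change at 10% still has it at 50%, so monitoring compares stable populations while the percentage is raised.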
I never really understood putting consistency on a pedestal. It's certainly nice when everything operates exactly the same way - but consistency for consistency's sake is awful to work in too. If a team realizes that logging library B is better than library A, but NEVER switches from A to B because of consistency concerns, then in two years they'll still all be using inferior tools and writing worse code. Similarly, if a team DOES decide to switch from A to B, they probably shouldn't spend months rewriting all previous code to use the new tool. It's ok for multiple established patterns to live in the same codebase, so long as everyone has an understanding of what the "correct" pattern should be for all new code.
The consistency that they're referring to specifically is to do with consistency in the way that certain features or functionality is implemented.
To make your example match, it would be more so that there are two teams A and B, Team A already created a framework and integration for logging across the entire application. Team B comes along and doesn't realize that this framework exists, and also invents their own framework and integration for logging.
This is the type of consistency that the author points to, because Team B could have looked at other code already referencing and depending on the logging framework from Team A and they would have avoided the need to create their own.
It's about minimizing cognitive load.
"Consistency for consistency's sake" is usually a misinterpretation of "Consistency because there are reasons for the way things are already done and you don't understand those reasons well enough to diverge". If you understand the current system completely, then you can understand when to diverge from the system (though this is usually better expressed as "when to improve the system" rather than doing something completely new). If you don't understand the current system, then you can't possibly ensure that you haven't missed something in your shiny new way of doing things.
Sometimes the right approach is to keep the consistency. Other times, that approach is either impossible or catastrophic.
IMO software development is so diverse and complex that universal truths are very very rare.
But to us programmers, anything that promises to simplify the neverending complexity is very tempting. We want to believe!
So we're often the equivalent of Mike Tyson reading a book by Tiger Woods as we look down a half-pipe for the first time. We've won before and read books by other winners, now we're surely ready for anything!
Which leads to relational data stored in CouchDB, datalayers reimplemented as microservices, simple static sites hosted in Kubernetes clusters, spending more time rewriting tests than new features, and so on.
IMO, most advice in software development should be presented as "here's a thing that might work sometimes".
> Single-digit million lines of code (~5M, let’s say)
> Somewhere between 100 and 1000 engineers working on the same codebase
> The first working version of the codebase is at least ten years old
> The cardinal mistake is inconsistency
Funnily enough, the author notes why consistency is impossible in such a project and then proceeds to call inconsistency the cardinal mistake.
You cannot be consistent in a project of that size and scope. Full stop. Half those engineers will statistically be below average and constantly dragging the codebase towards their skill level each time they make a change. Technology changes a lot in ten years, people like to use new language features and frameworks.
And the final nail in the coffin: the limits of human cognition. To be consistent you must keep the standards in working memory. Do you think this is possible when the entire project is over a million LOC? Don't be silly.
There's a reason why big projects will always be big balls of mud. Embrace it. http://www.laputan.org/mud/
> people like to use new language features and frameworks.
Half of the point of this article is that people need to suck it up and not use new frameworks sometimes...
There are times for coding in a way you, personally, find pleasing; and there are times when:
> So you should know how to work in the “legacy mess” because that’s what your company actually does. Good engineering or not, it’s your job.
A quote from the 'big ball of mud':
> Sometimes it’s just easier to throw a system away, and start over.
It is easier, but it's also a) not your decision and b) enormously disruptive and expensive.
How do you tell if you're in the 'naive and enthusiastic but misguided' camp, or in the 'understand the costs and benefits and it's worth a rewrite' camp?
Maybe the takeaway from the OP's post really should be this one:
> If you work at a big tech company and don’t think this is true, maybe you’re right, but I’ll only take that opinion seriously if you’re deeply familiar with the large established codebase you think isn’t providing value.
^ because this is the heart of it.
If you don't understand, or haven't bothered to understand, or haven't spent the time understanding what is already there, then you are not qualified to make large scale decisions about changing it.
I've successfully pulled off most of such a rewrite, but a key driver (though not the only one) was that the legacy language the existing system was implemented in had lost virtually all traction in the local and global market. Mostly only expensive contractors coming out of retirement were available, and on top of that the custom libraries meant we could recruit from only 10 percent of that segment. Any new hires straight up refused to pick it up, as they accurately deemed it career suicide.
"Coding defensively" is perhaps the understatement of the year. Good software architecture is, in my opinion, the single most powerful tool you have to keep things from becoming an unmanageable mess.
If I could give one "defensive coding" tip, it would be for seniors doing the design to put in road blocks and make examples that prevent components from falling for common traps (interdependency, highly variant state, complex conditions, backwards-incompatibility, tight coupling, large scope, inter-dependent models, etc) so that humans don't have to remember to avoid those things. Make a list of things your team should never do and make them have a conversation with a senior if they want to do it anyway. Road blocks are good when they're blocking the way to the bad.
Starting with good design leads to continuing to follow good design. Not starting with good design leads to years of pain. Spend a lot more time on the design than you think you should.
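One possible form of such a road block is a check that fails the build when a forbidden dependency is imported, so nobody has to remember the rule. The sketch below uses Python's standard `ast` module; the module name `legacy_db` and the advice text are hypothetical.

```python
import ast

# Hypothetical "never do this without talking to a senior" list.
FORBIDDEN_IMPORTS = {"legacy_db": "use the repository layer instead"}


def find_forbidden_imports(source: str) -> list[str]:
    """Return one message per import of a forbidden top-level module."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in FORBIDDEN_IMPORTS:
                problems.append(
                    f"line {node.lineno}: import of '{name}' is blocked: "
                    f"{FORBIDDEN_IMPORTS[root]}"
                )
    return problems
```

Run in CI over changed files, a non-empty result blocks the merge and starts the conversation with a senior, rather than relying on reviewers to notice.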
Wrong, wrong. Opposite of everything he said. All his examples are backwards. The article is basically inverting the Single Responsibility Principle.
First of all, consistency does not matter at all, ever. That's his main thesis so it's already wrong. Furthermore, all his examples are backwards. If you didn't know the existence of "bot" users, you probably don't want your new auth mechanism to support them. Otherwise, the "nasty surprise" is the inverse of what he said: not that you find you don't support bot users, but you find out that you do.
Build stuff that does exactly what you want it to do, nothing more. This means doing the opposite of what he said. Do not re-use legacy code with overloaded meanings.
> First of all, consistency does not matter at all, ever. That's his main thesis so it's already wrong.
Can you say more about this? Because I strongly disagree with your assertion.
> Build stuff that does exactly what you want it to do, nothing more
This is also confusing to me. In a multi-million line codebase, it's extremely difficult to find an actual place where you have zero side effects with ANYTHING you write.
Wrong. If code is written consistently everywhere, that allows any dev to dive in anywhere to get work done. Which is what you often have to do in large code bases to make cross functional updates.
Code bases where devs pick a different framework or library for every little thing are a nightmare to maintain. Agreed on standards is what gets your team out of the weeds to work on a higher and more productive level.
I have been working for many years (almost since the beginning) on a code base that is now 14 years old and well over 1M LoC of TypeScript (for Node.js). We are only 20-30 engineers working on it, rather than the 100-1000 suggested in the article. And I can say I couldn't agree more with the article.
If you have to work on such projects, there are two things to keep in mind: consistency and integration tests.
> integration tests
Yes. I remember working in a 700,000+ line PHP code base that had around 30% unit test coverage and an unknown percentage of e2e test coverage. I kept my changes very localised because it was a minefield.
Also, the unit tests didn't do teardown so adding a new unit test required you to slot it in with assertions accounting for the state of all tests run so far.
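To illustrate the teardown problem, here is a sketch in Python's `unittest` rather than the PHP setup described above. The shared `REGISTRY` dict is a made-up stand-in for whatever state the tests leaked. With `tearDown` restoring a snapshot, each test can assert as if it ran alone.

```python
import unittest

# Hypothetical shared state, standing in for a database, cache, or global config.
REGISTRY: dict[str, int] = {}


class RegistryTest(unittest.TestCase):
    def setUp(self):
        # Remember the state before the test so it can be restored afterwards.
        self._snapshot = dict(REGISTRY)

    def tearDown(self):
        # Undo whatever the test did; later tests start from the same state.
        REGISTRY.clear()
        REGISTRY.update(self._snapshot)

    def test_add_first(self):
        REGISTRY["a"] = 1
        self.assertEqual(len(REGISTRY), 1)

    def test_add_second(self):
        REGISTRY["b"] = 2
        self.assertEqual(len(REGISTRY), 1)  # holds regardless of test order


def run_suite() -> unittest.TestResult:
    """Run the tests in-process and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegistryTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Without the `tearDown`, the second test to run would see two entries, and its assertion would have to account for everything that ran before it, which is exactly the situation described.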
I really liked this: "as a general rule, large established codebases produce 90% of the value."
People see the ugliness -- because solving real problems, especially if business practices are involved, is often very messy -- but that's where the value is.
I also find it amusing that “legacy” more often than not gets used with a negative connotation. I hear “legacy” and I think “bunch of people wrote some AWESOME shit that lasted so long that now other people get to view it as ‘legacy’”
There's a good chance that's not what people mean by this term though.
It's probably used in the (now) classic sense as defined by M. Feathers in his "Working Effectively with Legacy Code" book.
Code that is old but otherwise awesome, maintainable (or even actively maintained) and easy / a joy to work with is rarely referred to as "legacy code".
It doesn’t seem awesome at first glance because it takes longer to get up to speed on a large, old code base than a small, young one.
But you will quickly learn how awesome the old code base is if you attempt to rewrite it, and realize everything the old code base takes into account.
hmmm almost 3 decades in the industry and have very seldom (some exceptions) heard “legacy” for code that is old but awesome.
Measuring value by what stays the longest is tempting, but sometimes it's just that the mess is such that no one can touch it :)
100% but mess or not - it works - otherwise you’d have no option but to touch it
Sometimes it doesn't work and no one can tell.
Sometimes it doesn't work, but fixing the mess is too expensive and we don't have time
how can that be possible?
In my experience people use it to mean "old crap that we can't get rid of (yet)".
Earning code trumps pretty code every time.
A big part of this advice boils down to the old adage: "Don't remove a fence if you don't know why it was put there." In other words, when making changes, make sure you preserve every behavior of the old code, even things that seem unnecessary or counter-intuitive.
Chesterton's Fence
Consistency is often helpful, but you also need to be wary of cargo culting. For example, you see a server back end that uses an ORM model and you figure you'll implement your new feature using the same patterns you see there. Then a month later the author of the original code you cribbed comes by and asks you, "just out of curiosity, why did you feel the need to create five new database tables for your feature?"
I know, that's a pretty specific "hypothetical," but that experience taught me that copying for the sake of consistency only works if you actually understand what it is you're copying. And I was also lucky that the senior engineer was nice about it.
>The other reason is that you cannot split up a large established codebase without first understanding it. I have seen large codebases successfully split up, but I have never seen that done by a team that wasn’t already fluent at shipping features inside the large codebase
I cannot resonate with this. Having worked with multiple large code bases 5M+, splitting the codebase is usually a reflection of org structure and bifurcation of domain within eng orgs. While it may seem convoluted at first, it's certainly doable and gets easier as you progress along. Also, code migrations of this magnitude are usually carried out by core platform oriented teams that rarely ship customer-facing features.
Another post from the same author puts this in an interesting context: https://www.seangoedecke.com/glue-work-considered-harmful/ (follow up: https://www.seangoedecke.com/cynicism/)
Keeping the code base tidy is glue work, so you should only do enough of it to ship features. So maybe these are not "mistakes" but rather tactical choices made by politically smart engineers focused on shipping features.
Following management instructions against your better judgement is hardly uncynical - unless it's coming from a place of such naivety that you actually think you're wrong and they're right, and I don't think it is.
My experience is the opposite of the author's: in terms of their revealed preferences, line workers care far more about the company and its customers than managers and executives do, precisely because it's far easier for the latter to fail upwards than the former.
I'm now working on a codebase which is quite large (13 micro-services required to run the main product); all containerized to run on Kubernetes. The learning curve was quite steep but luckily, I was familiar with most of the tech so that made it easier (I guess that's why they hired me). The project has been around for over 10 years so it has a lot of legacy code and different repos have different code styles, engine versions and compatibility requirements.
The biggest challenge is that it used to be maintained by a large team and now there are just 2 developers. Also, the dev environment isn't fully automated so it takes like 20 minutes just to launch all the services locally for development. The pace of work means that automating this hasn't been a priority.
It's a weird experience working on such project because I know for a fact that it would be possible to create the entire project from scratch using only 1 to 3 services max and we would get much better performance, reliability, maintainability etc... But the company wouldn't be willing to foot the cost of a refactor so we have to move at steady snail's pace. The slow pace is because of the point mentioned in the article; the systems are all intertwined and you need to understand how they integrate with one another in order to make any change.
It's very common that something works locally but doesn't work when deployed to staging because things are complicated on the infrastructure side with firewall rules, integration with third-party services, build process, etc... Also, because there are so many repos with different coding styles and build requirements, it's hard to keep track of everything because some bug fixes or features I implement touch on like 4 different repos at the same time and because deployment isn't fully automated, it creates a lot of room for error... Common issues include forgetting to push one's changes or forgetting to make a PR on one of the repos. Or sometimes the PR for one of the repos was merged but not deployed... Or there was a config or build issue with one of the repos that was missed because it contained some code which did not meet the compatibility requirements of that repo...
If you can’t change, test and deploy each service independently from the others, you don’t really have separate services. You just have a Ball of Mud held together with HTTP calls.
We can test and change them independently, some features just require changing just one repo but most features affect multiple repos. But yes, essentially, it's a ball of mud in our case anyway because the separation between the two most important microservices is unclear (cannot be explained simply, there is no clear logic/responsibility separation besides the fact that one is an older codebase and the other is newer).
I've worked on a lot of projects in my career and this one has one of the most complex/chaotic architectures I've seen yet. Surprisingly, it recovers from service downtimes and reboots pretty well. The main issues are maintainability, deployment and configuration. It's often the case that local env does not match staging when building features.
I'm just thinking about this time at a previous job, I was reviewing a PR and they decided to just find/replace every variable and switch from snake to camel case. I was like "why are you guys doing this, not part of the job". There was some back and forward on that. This is a place where PRs weren't about reviews but just a process to follow, ask someone to approve/not expect feedback.
edit: job = ticket task
For stuff like this, the best thing is to have a style guide. If you disagree with something in the style guide (tabs/spaces, casing, etc.), you make a PR to the style guide and hash it out there. Also, those PRs need updates everywhere to make the codebase agree with the change, and updates to the relevant linters to enforce it.
Outside of that, the style guide is law.
This. There's consistency and there's yak-shaving. We had a guy who spent two months changing tabs to multiple spaces (or vice versa; whichever was in fashion at the time) mostly to get those sweet commits for "productivity" metrics. Yes, our bad for not realizing that sooner, but at some point you have to let inconsistencies go.
What was the established code style (...if any) in that project?
Anyway, it doesn't sound like that was a very mature project or set of developers, not when the reviewer decides to just edit code instead of providing a review.
the old/existing code was all underscore, they wanted to use camelcase instead. it's a dumb thing to argue about, I know, but it made the code review harder when instead of 10s of line diffs there's almost a hundred, granted it's easy to see they're just changing casing
I just insist that style only changes go in a separate commit.
And when it impacts a lot of files, it can break the compiler or introduce bugs. It MUST go in its own PR/MR.
It's also literally not part of the job.
I once worked on a large project in the past where it took 3 days to rename a field in an HTTP response because of how many services and tests were affected. Just getting that through QA was a huge challenge.
Working in a large dev team, focusing on a small feature and having a separate product manager and QA team makes it easier to handle the scale though. Development is very slow but predictable. In my case, the company had low expectations and management knew it would take several months to implement a simple form inside a modal with a couple of tabs and a submit button. They hired contractors (myself included), paying top dollar to do this; for them, the ability to move at a snail's pace was worth it if it provided a strong guarantee that the project would eventually get done. I guess companies above a certain size have a certain expectation of project failure or cancellation so they're not too fussed about timelines or costs.
It's shocking coming from a startup environment where the failure tolerance is 0 and there is huge pressure to deliver on time.
Only 3 days? That's incredible.
Getting a PR reviewed in 3 days is an achievement!
Not sure if you're serious. I can't remember working for any company that took more than a day to review a PR. I think this company took about 1 day to provide QA feedback and I was thinking that it's so slow.
In startup land, I got my code reviewed by the CTO within a few hours. It was rare for it to take a whole day, e.g. if he was too busy.
In my current company, the other dev usually reviews and merges my code to staging within an hour. Crazy thing is we don't even write tests. A large project with no tests.
In my current job I did a PR in the first week of joining. It was reviewed after exactly 2 years. I had to rewrite the whole PR because the affected lines had changed. Of course I did not remember at all what it was about.
Some PRs are faster but some are slower as well.
Maybe you meant getting a PR merged. Then 3 days seems possible depending on the approach. At my current company, the team is small so approval process is quite fast and not too fussy.
I was definitely serious.
OP has some particular type of project in mind, where what they say probably makes sense. Not all large codebases are like that.
For example, it could be a lot of individual small projects all sitting on some common framework. Just as an example: I've seen a catering business that had an associated Web site service which worked as follows. There was a small framework that dealt with billing and navigation etc. issues, and a Web site that was developed per customer (couple hundreds shops). These individual sites constituted the bulk of the project, but outside of the calls to the framework shared nothing between them, were developed by different teams, added and removed based on customer wishes etc. So, consistency wasn't a requirement in this scheme.
Similar things happen with gaming portals, where the division is between some underlying (and relatively small) framework and a bunch of games that are provided through it, which are often developed by teams that don't have to talk to each other. But, to the user, it's still a single product.
Debates about the technical merits aside...the alternative to not allowing people to build their own nice little corner of a legacy codebase is not a bunch of devs building a consistent codebase. It's devs not wanting to touch the codebase at all.
Working on an old shitty codebase is one thing. Being told you have to add to the shit is soul crushing.
This is good advice but only if it has been followed from the beginning and consistently throughout the development of the original code. It is applicable to large organizations with lots of resources who hire professional developers and have a lot of people who are familiar with the code that are active in code reviews and have some minimum form of documentation / agreement on what the logic flow in the code should look like (the article does not claim otherwise). But I would implore those who work at the 80% of other companies that this advice is nearly useless and YMMV trying to follow it. The one thing that I think is universally good advice is to try and aggressively remove code whenever possible.
Except, the old stuff will be effectively untestable, and they'll demand near perfect coverage for your changes.
Also, there will be four incomplete refactorings, and people will insist on it matching the latest refactoring attempt. Which will then turn out to be impossible, as it's too unfinished.
A big problem I come across is also half-assing the improvements.
Take as an example - for some reason you need to update an internal auth middleware library, or a queue library - say there is a bug, or a flaw in design that means it doesn't behave as expected in some circumstances. All of your services use it.
So someone comes along, patches it, makes the upgrade process difficult / non-trivial, patches the one service they're working on, and then leaves every other caller alone. Maybe they make people aware, maybe they write a ticket saying "update other services", but they don't push to roll out the fixed version in the things they have a responsibility for.
My only advice is "if it ain't broke don't fix it". And if you're going to improve something, make sure it's something small and local, ideally further from the "core logic" of the business.
> If they use some specific set of helpers, you should also use that helper (even if it’s ugly, hard to integrate with, or seems like overkill for your use case). You must resist the urge to make your little corner of the codebase nicer than the rest of it.
This reads like an admission of established/legacy codebases somewhat sucking to work with, in addition to there being a ceiling for how quickly you can iterate, if you do care about consistency.
I don't think that the article is wrong, merely felt like pointing that out - building a new service/codebase that doesn't rely on 10 years old practices or code touched by dozens of developers will often be far more pleasant, especially when the established solution doesn't always have the best DX (like docs that tell you about the 10 abstraction layers needed to get data from an incoming API call through the database and back to the user, and enough tests).
Plus, the more you couple things, the harder it will be to actually change anything, if you don't have enough of the aforementioned test coverage - if I change how auth/DB logic/business rules are processed due to the need for some refactoring to enable new functionality, it might either go well or break in hundreds of places, or worse yet, break in just a few untested places that aren't obvious yet, but might start misbehaving and lead to greater problems down the road. That coupling will turn your hair gray.
> Plus, the more you couple things, the harder it will be to actually change anything, if you don't have enough of the aforementioned test coverage
The author cites some imaginary authentication module where "bot users" are a corner case, and you can imagine how lots of places in the software are going to need to handle authentication at some point
Say you don't use the helper function. Do you think you've avoided coupling?
The thing is, you're already coupled. Even if you don't use it
Fundamentally, at the business level, your code is coupled to the same requirements that the helper function helps to fulfill.
Having a separate implementation won't help if one day the requirements change and we suddenly need authentication for "super-bot" users. You'll now need to add it to two different places.
> Say you don't use the helper function. Do you think you've avoided coupling?
> The thing is, you're already coupled. Even if you don't use it.
In the case of using multiple services, your auth service would need some changes. Now, whether those need to be done in a somewhat small service that's written in Spring Boot, launches in 5 seconds and can be debugged pretty easily due to very well known and predictable surface area, or a 10 year old Spring codebase that's using a bunch of old dependencies, needs a minute or two just to compile and launch and has layers upon layers of abstractions, some of which were badly chosen or implemented, but which you would still all need to utilize to stay consistent, making your changes take 2-5x as long to implement and then still risk missing some stuff along the way... well, that makes a world of difference. Breaking up a codebase that is a million lines big wouldn't make any of the business requirements not be there, but might make managing the particular parts a bit easier.
The impact of both old code and heavily bloated codebases is so great to the point where some people hate Java because a lot of projects out there are enterprise brownfield stuff with hopelessly outdated tech stacks and practices, or like other languages just because they don't have to deal with that sort of thing: https://earthly.dev/blog/brown-green-language/
That's even before you consider it from the egoistic lens of a software dev that might want to ship tangible results quickly and for whom a new service/stack will be a no-brainer, the team lead whose goals might align with that, or even the whole business that would otherwise be surprised why they're slower in regards to shipping new features than most of their competitors. Before even considering how larger systems that try to do everything end up, e.g. the likes of Jira and DB schemas that are full of OTLT and EAV patterns and god knows what else.
If you find a codebase that is pristine, best of luck on keeping it that way. Or if you have to work in a codebase that's... far less pleasant, then good luck on justifying your own time investment in the pursuit of long term stability. Some will, others will view that as a waste, because they'll probably be working in a different company by the time any of those consequences become apparent. Of course, going for a full on microservices setup won't particularly save anyone either, since you'll still have a mess, just of a different kind. I guess that the main factor is whether the code itself is "good" or "bad" at any given point in time (nebulous of a definition as it may be), except in my unfortunate experience most of it is somewhere in the middle, leaning towards bad.
Writing code that is easy to throw away and replace, in addition to being as simple as you can get away with and with enough documentation/tests/examples/comments might be a step in the right direction, instead of reading an OOP book and deciding to have as many abstractions and patterns as you can get. Of course, it doesn't matter a lot if you need to call a helper method to parse a JWT or other comparably straightforward code like that, but if you need to set up 5 classes to do it, then someone has probably fucked up a little bit (I know, because I have fucked that up, bit of a mess to later develop even with 100% test coverage).
> like docs that tell you about the 10 abstraction layers needed to get data from an incoming API call through the database and back to the user,
docs.. lol
But if you see yourself as trying to make the world a better place, you'll accept that because the large code base is actually doing a lot of good out there. The article discusses this at the end.
Sure, I can accept that, I'm not saying they're all bad, just that they have very obvious drawbacks, same as projects that pick untested technologies and get burned when they cease to exist after a year.
> Large codebases are worth working in because they usually pay your salary
Though remember that greenfield projects might also pay your salary and be better for your employability and enjoyment of your profession in some cases. They might be the minority in the job market, though.
During my last job I formulated what for me are the 2 unquestionable metrics to care about when trying to build long-term maintainable systems:
- Consistency (fully agree with the article here)
- Control
Control to me means that you have to work extremely hard to lose the ability to change the parts you care about. For example:
- Do not leak libraries and frameworks far into your business logic. At some point you want to introduce a new capability but say the class/type you re-used from a library makes it really awkward. Now you are faced with a huge refactor. The more logic, the purer and simpler the code should be. Ideally stdlib only.
- Do not build magic, globally shared test harnesses. Helpers yes, but if you give up control over the environment a test runs in / setting up fixtures, test data etc. you will run into a world of pain due to dependencies between tests and especially the test data.
- Do not let libraries dictate your application architecture. E.g. I always separate the web framework layer (controllers, views etc.) from the service and data layers.
- Consistency plays a major part here. If you introduce 3 libraries to do the same thing you have basically given up control over those dependencies, and refactors in the future will be much harder.
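To make the "control" points a bit more concrete, here is a minimal sketch of keeping the business rule free of library types. Every name in it (`Invoice`, `InvoiceStore`, `bill_customer`, `handle_request`) is hypothetical and invented for illustration; the in-memory store stands in for whatever ORM or client a real project would wrap.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Invoice:
    """Plain domain type: stdlib only, no framework or ORM base class."""
    customer_id: str
    amount_cents: int


class InvoiceStore(Protocol):
    """The only thing the business logic knows about persistence."""
    def save(self, invoice: Invoice) -> None: ...


def bill_customer(store: InvoiceStore, customer_id: str, amount_cents: int) -> Invoice:
    """Business rule lives here and never imports a library type."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    invoice = Invoice(customer_id, amount_cents)
    store.save(invoice)
    return invoice


class InMemoryInvoiceStore:
    """Adapter at the edge; a real one would wrap the ORM or HTTP client."""
    def __init__(self) -> None:
        self.saved: list[Invoice] = []

    def save(self, invoice: Invoice) -> None:
        self.saved.append(invoice)


def handle_request(store: InvoiceStore, form: dict[str, str]) -> dict[str, object]:
    """Web layer: translates framework-shaped input and output, nothing more."""
    invoice = bill_customer(store, form["customer_id"], int(form["amount_cents"]))
    return {"status": 201, "customer_id": invoice.customer_id}
```

Swapping the web framework or the storage library then only touches `handle_request` or the adapter class, while `bill_customer` stays as it is.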
I lump those together into what I call Dependency-Driven Development. Manage and reduce dependencies first, both external dependencies and internal dependencies within code, and better code will follow.
It's not always 100%, but in general the fewer the dependencies the better the code.
https://jimmyhmiller.github.io/ugliest-beautiful-codebase
Here's an alternate take on how a lack of consistency and a tendency to build microservices makes for an enjoyable work environment (and an admittedly ugly codebase)
> Because it protects you from nasty surprises, it slows down the codebase’s progression into a mess, and it allows you to take advantage of future improvements.
The codebase is already a nasty surprise for people coming in from the outside with experience or people that are aware of current best practices or outside cultures, therefore, the codebase is already a mess and you cannot take advantage of future improvements without a big bang since that would be inconsistent.
How to keep your code evolving in time and constantly feeling like it is something you want to maintain and add features to is difficult. But constantly rewriting the world when you discover a newer slight improvement will grind your development to a halt quickly. Never implementing that slight improvement incrementally will also slowly rot your feelings and your desire to maintain the code. Absolute consistency is the opposite of evolution: never allowing experimentation; no failed experiments mean no successes either. Sure, too much experimentation is equally disastrous, but abstinence is the other extreme and is not moderation.
> You can’t practice it beforehand (no, open source does not give you the same experience).
This is ridiculous. Even if you want to ignore the kernel, there are plenty of "large established codebases" in the open source world that are at least 20 years old. Firefox, various *office projects, hell, even my own cross-platform DAW Ardour is now 25 years old and is represented by 1.3M lines of code at this point in time.
You absolutely can practice it on open source. What you can't practice dealing with is the corporate BS that will typically surround such codebases. Which is not to say that the large established codebases in the open source world are BS free, but it's different BS.
Personally, I think that part is just incomplete. I think that there is an attitude that because you have jumped into several older open source projects and successfully made meaningful changes, that you have mastered the art of diving into legacy codebases and can handle anything. Even if those projects were not all that large. People sometimes think that because in theory thousands of people could be looking at the code, that they can treat the code as if it has been vetted over time by thousands of people.
But I agree that there are absolutely eligible open source codebases that could be used to practice beforehand. I'd better; I work on Firefox. It is not a small thing to dive into, but people successfully do, and get solid experience from it.
Agreed. If you didn’t work on a codebase of 1000 engineers, how else would you practice.
All these are part of a different mistake - lack of common culture among the coders. It’s really really hard, and often antithetical to the politics of the org, but having a common area (email, actual physical meetings) between the “tech leads” - so that’s one in 8 to one in twenty devs, is vital.
Sharing code, ideas, good and bad etc is possible - but it requires deliberate effort
I'm all for consistency. But imagine having a codebase where most operations point back to an in-memory dataset representation of a database table, and everytime you change your position in this dataset (the only "correct" way of accessing that table's data), it updates the UI accordingly.
New feature where you compare 2 records? Too bad, the UI is going to show them both then go back to the first one in an epileptic spasm.
Sometimes, things are just bad enough that keeping it consistent would mean producing things that will make clients call saying it's a bug. "No sorry, it's a feature actually".
Small prior discussion:
Mistakes engineers make in large established codebases - https://news.ycombinator.com/item?id=42570490 - Jan 2025 (3 comments)
I agree that consistency is important, and also this is the real problem. There is no perfect architecture. Needs evolve. So consistency is a force, but architecture evolution (pushed by new features, for example) is an opposite force.
Balancing the two is not easy, and often if you do not have time, you are forced to drop your strong principles.
Let me do a simple example.
Imagine a Struts2 GUI. One day your boss asks you to upgrade it to fancy AJAX. It is possible, for sure, but it can require a lot of effort, and finding the right solution is not easy.
I mostly agree, however I experienced a different challenge, exactly because of consistency:
I used to work within the Chromium codebase (on the order of tens of millions of LOC) and the parts I worked in were generally in line with Google's style guide, i.e. consistent and of decent quality. The challenge was to identify legacy patterns that shouldn't be imitated or cargo-culted for the sake of consistency.
In practice that meant having an up to date knowledge of coding standards in order to not perpetuate anti-patterns in the name of consistency.
Very good advice. An implication of being reluctant to introduce dependencies is that you should remove dependencies if you can. Perhaps different parts of the system are using different PDF-generation libraries, or some clever person introduced Drools at some point but you might as well convert those rules to plain old Java.
Tooling is important too. IDE:s are great, but one should also use standalone static analysis, grepping tools like ripgrep and ast-grep, robust deterministic code generation, things like that.
Sounds like common law—one of the biggest, oldest "codebases" there is.
To quote Wikipedia:
> Common law is deeply rooted in stare decisis ("to stand by things decided"), where courts follow precedents established by previous decisions.[5] When a similar case has been resolved, courts typically align their reasoning with the precedent set in that decision.[5] However, in a "case of first impression" with no precedent or clear legislative guidance, judges are empowered to resolve the issue and establish new precedent.
Unit tests, exhaustive regression tests, and automated tests are the best way to prevent regressions.
Time spent writing good unit tests today allows you to make riskier changes tomorrow; good unit tests de-risk refactors.
Unit tests cover the single functionality but ignore the system as a whole. Most regressions I've seen in industry are because of a lack of understanding how the system components interact with one another.
Therefore, I see unit tests as one pillar but also suspect that without good quality integration or end-to-end testing you won't be able to realize the riskier re-factors you describe. Perhaps you consider these part of your regression testing and if so, I agree.
The way I like to view it is that Lego bricks might pass unit tests aka QC with zero issue, but you can still easily build an unreliable mess with them.
I see an over-reliance on automated tests recently. Often suggesting just passing the CI tests is enough to approve a change. In ancient code it's just as important to limit the blast radius of your changes, and have a good reason for making them. Not changing something is the ultimate way to prevent a regression.
If you have no idea what the usage is you are already doomed. However, tests help: whenever there is a regression you should write a test so that the same thing won't regress again.
Writing code is not like real life, where herd mentality usually saves your life. Go ahead and improve the code; what helps is tests... but also at least logging errors, and throwing errors. Tests and errors go hand in hand. Errors are not your enemy, errors help you improve the program.
> Single-digit million lines of code (~5M, let’s say)
as someone working on a 60M codebase, we have very different understandings of the word "large". My team is leaning more towards "understand the existing code, but also try to write maintainable and readable code". Everything looks like a mess built by a thousand different minds, some of them better and a lot of them worse, so keeping consistency would just drag the project deeper into hell.
I love how the first example is "use the common interfaces for new code". If only! That assumes there _is_ a common interface for doing a common task, and things aren't just a copy-paste of similar code and tweaked to fit the use case.
So the only tweak I'd make here, is that if you are tempted to copy a bit of code that is already in 100 places, but with maybe 1% of a change - please, for the love of god, make a common function and parameterize out the differences. Pick a dozen or so instances throughout the codebase and replace it with your new function, validating the abstraction. So begins the slow work of improving an old code base created by undisciplined hands.
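As a rough sketch of what "parameterize out the differences" can look like: the helper and call sites below are hypothetical, standing in for two copy-pasted formatting snippets that differed only in currency and decimal places.

```python
def format_amount(value: float, *, currency: str = "USD", decimals: int = 2) -> str:
    """One shared helper; the former per-call-site tweaks become parameters."""
    return f"{value:,.{decimals}f} {currency}"


# Former copy-paste variants, now thin call sites of the shared helper.
def invoice_total_label(total: float) -> str:
    return format_amount(total)


def jpy_price_label(price: float) -> str:
    return format_amount(price, currency="JPY", decimals=0)
```

Migrating a dozen call sites like these is a cheap way to find out whether the parameters you picked actually cover the real variation.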
Oh, and make sure you have regression tests. The stupider the better. For a given input, snapshot the output. If that changes, audit the change. If the program only has user input, consider capturing it and playing it back, and if the program has no data as output, consider snapshotting the frames that have been rendered.
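A "stupid" snapshot test of that kind can be very small. This is only a sketch under my own assumptions: `check_snapshot` and `legacy_report` are made-up names, the snapshot is stored as JSON on disk, and the first run simply records whatever the code currently produces.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(snapshot_dir: Path, name: str, output: object) -> bool:
    """Return True if `output` matches the stored snapshot.

    On the first run there is nothing to compare against, so the output
    is recorded and accepted; later runs compare against that record.
    """
    path = snapshot_dir / f"{name}.json"
    rendered = json.dumps(output, indent=2, sort_keys=True)
    if not path.exists():
        path.write_text(rendered, encoding="utf-8")
        return True
    return path.read_text(encoding="utf-8") == rendered


def legacy_report(order_ids: list[int]) -> dict[str, int]:
    """Stand-in for the old code whose behaviour we want to pin down."""
    return {"count": len(order_ids), "max_id": max(order_ids, default=0)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshots = Path(tmp)
        print(check_snapshot(snapshots, "report", legacy_report([3, 1, 2])))  # records
        print(check_snapshot(snapshots, "report", legacy_report([3, 1])))     # differs
```

When the comparison fails, the useful step is the audit: diff the stored file against the new output and decide whether the change was intended before overwriting the snapshot.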
At some points, new improvement and occasionally ingenuity need to find a healthy way back into the workflow. Moreso early on, but consistently over time as well.
If we just create copies of copies forever, products degrade slowly over time. This is a problem in a few different spheres, to put it lightly.
The main rule is a good one, but the article overfocuses on it.
Yes, this is the counterpoint I'd make to "resist the urge to make every corner of the codebase nicer than the rest of it": in an inconsistent codebase, maybe we should prioritize making it consistent where possible, and reducing unnecessary duplication is one way to reduce future change costs.
Fantastic article.
One small point - consistency is a pretty good rule in small codebases too, for similar reasons. Less critical, maybe, but if your small codebase has a standard way of handling e.g. Auth, then you don't want to implement auth differently, for similar reasons (unified testing, there might be specialized code in the auth that handles edge cases you're not aware of, etc.)
> no, open source does not give you the same experience
Why not? There are open source projects that are many years old with millions lines of code and many developers.
yeah, this doesn't make sense.
The project I work on has had a thousand+ contributors of extremely varied skill levels, and is over 15 years old. Many lines of code, but I'm not going to count them because that's a terrible metric.
This fits all of the criteria outlined in the article. Sure, it might not apply to portfolio project #32 but there's plenty of open source repositories out there that are huge legacy codebases.
> Single-digit million lines of code (~5M, let’s say)
> Somewhere between 100 and 1000 engineers working on the same codebase
> The first working version of the codebase is at least ten years old
All of these things, or any of them?
In any event, though I agree with him about the importance of consistency, I think he's way off base about why and where it's important. You might as well never improve anything with the mentality he brings here.
One thing I did was implement a code formatter, and enforce it in CI.
"dotnet format" can do wonders, and solved most serious inconsistency issues.
This is good practice, but I don't think it's the kind of inconsistency the author is talking about. There are forms of inconsistency that an auto formatter can't fix. An example: old code deciding that an unhandled error is a 400 status code, but in newer code it causes a 500 status code (real problem I'm dealing with at work).
Honestly Go's approach to code formatting and it being taken over by other parties has saved so many trivial debates. I remember spending stupid amounts of review comments on stupid shit like formatting, trailing commas, semicolons, spacing, etc that should have been spent on more important things. How come automatic, standardized, non-ide bound automatic formatting has only been a thing in the past decade or so? I do recall Checkstyle for Java ages ago but I forgot if it did any formatting of its own.
> How come automatic, standardized, non-ide bound automatic formatting has only been a thing in the past decade or so?
A lot of it boils down to "because the people writing code parsers/lexers weren't thinking about usability". Writing a C formatter, for example, depends on having a parser that doesn't behave like a compiler by inlining all your include files and stripping out comments. For a long time, writing parsers/lexers was the domain of compiler developers, and they weren't interested in features which weren't strictly required by the compiler.
Another effect of those improvements, incidentally, has been higher quality syntax errors.
The problem with consistency is that people miss the forest for the trees.
So lots of nitpicking on irrelevant stuff - keep files under 50 lines - that is the silly consistency of little minds.
The author of the post fortunately writes from experience, with architectural examples, so I can say that it is a good article.
Or you can help migrate them to Karenina Microservices: each service can be dysfunctional in its own way.
Have you ever tried Sourcegraph to handle such consistency issues? (We are trying to do the same at Anyshift for Terraform code.) For me the issue will only be exacerbated by gen AI and the era of "big code" that's ahead.
"as a general rule, large established codebases produce 90% of the value."
This is only until your new upstart competitor comes along, rewrites your codebase from scratch and runs you out of the market with higher development velocity (more features).
> rewrites your codebase from scratch
This almost never happens. It takes a long time and huge amounts of money to come up to parity, and in the meantime, the legacy org is earning money on the thing you're trying to rewrite.
It's more often the case that the technology landscape shifts dramatically, helping a niche player (who has successfully saturated the niche) become mainstream or more feasible. Take, for example, Intel. Their CISC designs and higher power consumption are now being challenged by relatively simpler RISC and lower power designs. Or Nvidia with its GPUs. In both cases, it's the major shifts that have hurt Intel. No one can outcompete Intel in making server CPUs of old, if they are starting from scratch.
Take another example, this time, of a successful competitor (of sorts). Oracle vs Postgres. Same deal, except that Postgres is the successor of Ingres (which doesn't exist anymore), and was developed at Berkeley and was open-source (i.e., it relied upon the free contributions of a large number of developers). I doubt that another proprietary database has successfully challenged Oracle. Ask any Oracle DB user, and you will likely get the answer that other databases are a joke compared to what it offers.
Not really, a startup can certainly disrupt the old big cos, but as it's growing and taking on more large enterprise customers and scaling up teams, by the time it's "producing 90% of the value" you're a few short years from finding yourself with a large, complex, and legacy codebase.
There was only one mistake that the article felt like giving a header to: "The cardinal mistake is inconsistency"
The instinct to keep doing things the wrong way because they were done the wrong way previously is strong enough across the industry without this article.
I love to
> take advantage of future improvements.
However, newer and better ways of doing things are almost invariably inconsistent with the established way of doing things. They are dutifully rejected during code review.
My current example of me being inconsistent with our current, large, established database:
Every "unit test" we have hits an actual database (just like https://youtu.be/G08FxxwPjXE?t=2238). And I'm not having it. For the module I'm currently writing, I'm sticking the reads behind a goddamn interface so that I can have actual unit tests that will run without me spinning up and waiting for a database.
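For anyone wondering what "sticking the reads behind an interface" might look like, here is a minimal sketch. The names (`OrderReader`, `outstanding_balance`, `FakeOrderReader`) are invented for illustration and are not from the codebase described above; the production implementation of the interface would run the actual query.

```python
from typing import Protocol


class OrderReader(Protocol):
    """Hypothetical read interface; production code would back it with SQL."""
    def unpaid_totals(self, customer_id: str) -> list[int]: ...


def outstanding_balance(reader: OrderReader, customer_id: str) -> int:
    """Logic under test: depends on the interface, not on a connection."""
    return sum(reader.unpaid_totals(customer_id))


class FakeOrderReader:
    """In-memory stand-in so the unit test needs no database."""
    def __init__(self, totals: dict[str, list[int]]) -> None:
        self._totals = totals

    def unpaid_totals(self, customer_id: str) -> list[int]:
        return self._totals.get(customer_id, [])
```

A unit test then builds `FakeOrderReader({"c1": [100, 250]})` and asserts on `outstanding_balance` without spinning anything up; the SQL itself still needs its own test against a real database.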
If it's wrong then it needs to be fixed, obviously, but only if you fix it in a way that ensures consistency and doesn't break existing functionality. But the article doesn't mention wrong code per se, just different code. There's always multiple ways to solve a problem, stick to one for you and the 999 other developers' sakes.
Your example is a good one; you call it a unit test, but if it hits a real database it's by definition an integration test. No mocked database will be as accurate as the real deal. A mock will be good enough for unit tests (amortize / abstract away the database), but not for an integration test.
You will find someday that you'd rather have comprehensive tests than fast ones, especially when a significant portion of your program logic is in SQL statements.
Code which is unit testable is integration testable. Not the other way around.
I test my units more thoroughly than integrations allow. Make the db return success, failure, timeout, cancellation, etc.
One of my colleagues was trying to prevent a race condition towards the end of last year. He wanted the first write to succeed, and the second to be rejected.
I suggested "INSERT IF NOT EXISTS". We agreed that it was the best approach but then he didn't put it in because the codebase doesn't typically use raw SQL.
I like keeping things consistent even if the consistent way is "wrong". One thing that bugged me about the large codebase I most recently worked on is that we used a custom assert library for tests. The Go team says this about assert libraries: https://go.dev/wiki/TestComments#assert-libraries , and having learned Go at Google, I would never have been allowed to check in code like that. But this place wasn't Google and there were tens of thousands of lines of these tests, so I told new developers to keep doing things the "wrong" way. This didn't cause many problems, even if tests failing too soon is pretty annoying. Most of the time the tests pass, and the yes/no signal is valuable even if you could debug more by simply calling `t.Errorf(...)` and continuing.
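To make the "`t.Errorf(...)` and continuing" point concrete, here is a small sketch. The `reporter` interface is just the one method of `*testing.T` used here, and `recorder` stands in for it so the example runs outside `go test`; `user` and `checkUser` are made-up names. Each mismatch is reported and the check keeps going, so one run shows every field that is wrong instead of only the first.

```go
package main

import "fmt"

// reporter is the subset of *testing.T this sketch needs.
type reporter interface {
	Errorf(format string, args ...any)
}

// recorder collects failures so the example can run outside `go test`.
type recorder struct{ failures []string }

func (r *recorder) Errorf(format string, args ...any) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// user is a hypothetical value under test.
type user struct {
	Name  string
	Age   int
	Email string
}

// checkUser reports every mismatch instead of stopping at the first one.
func checkUser(t reporter, got, want user) {
	if got.Name != want.Name {
		t.Errorf("Name = %q, want %q", got.Name, want.Name)
	}
	if got.Age != want.Age {
		t.Errorf("Age = %d, want %d", got.Age, want.Age)
	}
	if got.Email != want.Email {
		t.Errorf("Email = %q, want %q", got.Email, want.Email)
	}
}

func main() {
	rec := &recorder{}
	got := user{Name: "ada", Age: 36, Email: "ada@example.com"}
	want := user{Name: "Ada", Age: 37, Email: "ada@example.com"}
	checkUser(rec, got, want)
	for _, f := range rec.failures {
		fmt.Println(f)
	}
}
```

An assert helper that aborts on the first mismatch would have reported only the `Name` problem here.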
As for starting databases during tests, it's saved me a lot of trouble over the years. One time, we used SQLite for tests and Postgres for production. We had some code that inserted like `insert into foo (some_bool) values ('t')` and did a query like `select * from foo where some_bool='true'`. This query never matched rows in the tests, because t != true in SQLite, but t == true in Postgres. After that, I found it easier to just run the real database that's going to be used in production for tests. The only thing that behaves identically to production is the exact code you're running in production.
Over here, I have code that uses a hermetic Postgres binary (and chain of shared libraries because Postgres hates static linking) that starts up a fresh Postgres instance for each test. It takes on the order of a millisecond to start up: https://github.com/jrockway/monorepo/blob/main/internal/test.... The biggest problem I've had with using the "real" database in tests is low throughput because of fsync (which `perf` showed me when I finally looked into it). Fortunately, you can just disable fsync, and boy is it fast even with 64 tests running in parallel.
One thing that's been slow in the past is applying 50 migrations to an empty database before every test. When you have one migration, it's fast, but it's one of those things that starts to slow down as your app gets big. My solution is to have a `go generate` type thing that applies the migrations to an empty database and pg_dumps the resulting database to a file that you check in (and a test to make sure you remembered to do this). This has two benefits: one, tests just apply a single SQL file to create the test database, and two, you get a diff over the entire schema of your database for the code reviewer to look at during code reviews. I've found it incredibly useful (but don't do it for my personal projects because I've been lazy and it's not slow yet).
Overall, my take on testing is that I like an integration test more than a unit test. I'd prefer people spend time on exercising a realistic small part of the codebase than to spend time on mocks and true isolation. This is where a lot of bugs lie.
Of course, if you are writing some "smart" code and not just "glue" code, you're going to be writing a lot of unit tests. Neither replaces the other, but if you can spend 30 seconds writing a test that does actual database queries or 2 weeks mocking out the database so the test can be a unit test instead of an integration test, I'd tell you to just write the integration test. Then you know the real code works.
I work on a service where a big percentage of the code is persisting to and reading from various stores, so unit tests have very limited value compared to integration tests.
Is this not a solved problem? Why do you need to write so much persistence logic?
I have no idea how to interpret your comment. Do you mean just throw an ORM library into your code and never give another thought to persistence issues?
At scale, there will always be challenges with latency, throughput, correctness, and cost of persisting and retrieving data that require considering the specifics of your persistence code.
The service I’m describing handles abstracting these persistence concerns so other services can be more stateless and not deal with those issues.
Yeah, if you can start up the actual things, then you know your code's going to work against the actual things. That's ultimately what we're aiming for.
That's not sufficient.
The actual things are IO devices, and will sometimes fail and sometimes succeed. No judgement, just a fact of life.
I code my tests such that my logic encounters successes, timeouts, exceptions, thread-cancellations, etc. All at unit-test speed.
I can't trick an MSSQL deployment into returning me those results.
It doesn't take 30 seconds to test what your system will do in 30 seconds.
I once did some contract work for a development group at Apple in the late 90's working on a product not yet released. It was the first time I was exposed to a large codebase that could be updated by any of a large number of programmers at any time. While investigating a bug, it was scary to constantly see large, complicated routines headed with one-line comments from 20 or 30 people logging the changes they had made. There would be no consistent style of code, no consistent quality, no consistent formatting, no real sense of ownership, a real free-for-all. The system not only immediately projected a sense of hopelessness, but also indicated that any attempts at improvement would quickly be clobbered by future sloppy changes.
Great article, has much wisdom. If you're contributing to a large code base, you need to be knee deep. It's against your developer instinct, but it's constant pruning and polishing. It is the only way.
This is reminiscent of the stupid things people have tried to fix email.
If I just do this simple thing in my mail client. ... or server ... mail security and spam and whatever else will be solved.
One common mistake: not playing Factorio and understanding how to scale, maintain, and debug a large factory with friends >>
(this is a half-joke... iykyk)
there is a balance to be had here. oftentimes people make their own corner of the code because they are afraid, or their superiors are, of the scope of work which is actually about 3 hours of consistent work with good discipline and not the 17 years they imagine.
millions of lines of code itself is a code smell. some of the absolute worst code i have to work with comes from industry standard crapware that is just filled with lots of do nothing bug factories. you gotta get rid of them if you want to make it more stable and more reliable.
however... i often see the problem, and it's not "don't do this obvious strategy to improve qol", it's "don't use that bullshit you read a HN article about last week"
i suspect this is one of those.
Coders aren’t engineers, they are just bit bureaucrats. Everything in this post isn’t engineering, it’s autism.
Interesting point of view
this should be tattooed on every newly hired CTO's arm.
To be fair, the previous engineers got paid to write the legacy mess and were employed for a long time if there's a lot of it.
Where is the incentive to go the extra mile here? Do you eventually put up with enough legacy mess, pay your dues, then graduate to the clean and modern code bases? Because I don't see a compelling reason you should accept a job or stay in a code base that's a legacy mess and take on this extra burden.
I'm somewhat in agreement with this. OP sounds like he has legacy codebase Stockholm syndrome. Imagine being appointed mayor of a slum and deciding to build more slums because you wanted to fit in.
Would you be willing to put that take on your LinkedIn profile? If not... well that's why.
If a controversial take is remembered over the code someone produced or improved, that is evidence to me that it's not settled that we should be spending the extra effort to align our practices with the article. The incentives are not there.
Of course it's not settled. It's awful (the code, not the article). It was written at a time when they didn't know better. Now they do, but they need somebody to maintain it anyway.
> Do you eventually put up with enough legacy mess, pay your dues, then graduate to the clean and modern code bases?
Yeah, that's called retirement. The point of the article isn't that whatever you're conforming to in the legacy codebase is worth preserving. The point is that whatever hell it is, it'll be a worse hell if you make it an inconsistent one.
If they want to pay more to maintain legacy messes, then I'm fine with more rules. That shows the business wants it done right. They don't though, so I can't agree with putting in more work for no extra compensation.
I know this counterargument sounds crabby, but going along with existing conventions on a legacy code base might be a lot of work for someone who's only familiar with more recent practices. It's not something you can passively do. Plus, having to adopt these older patterns won't help your resume, which is an opportunity cost we are absorbing for free (and shouldn't have to).
Big codebases develop badly/well because of the established company culture.
Culture tends to flow from the top. If it's very expedient at the top then the attitude to code will be too.
You get stuck in the "can't do anything better because I cannot upgrade from g++-4.3 because there's no time or money to fix anything, we just work on features" situation. Work grinds to a near halt because of the difficulties imposed by tech debt. The people above don't care because they feel they're flogging a nearly-dead horse anyhow, or they're just inappropriately secure about its market position. Your efforts to do more than minor improvements are going to be a waste.
Even in permissive environments one has to be practical - it's better to have a small improvement that is applied consistently everywhere than a big one which affects only one corner. It has to materially help more than just you personally otherwise it's a pain in the backside for others to understand and work with when they come to do so. IMO this is where you need some level of consensus - perhaps not rigid convention following but at least getting other people who will support you. 2 people are wildly more powerful and convincing than 1.
The senior programmers are both good and bad - they do know more than you and they're not always wrong, and yet if you're proposing some huge change then you very likely haven't thought it out fully. You probably know how great it is in one situation but not what all the other implications are. Perhaps nobody does. The compiler upgrade is fine except that on Windows it will force the retirement of win2k as a supported platform .... and you have no idea if there's that 1 customer that pays millions of dollars to have support on that ancient platform. So INFORMATION is your friend in this case and you need to have it to convince people. In the Internet world I suppose the equivalent question is about IE5 support or whatever.
You have to introduce ideas gradually so people can get used to them and perhaps even accept defeat for a while until people have thought more about it.
It does happen that people eventually forget who an idea came from and you need to resist the urge to remind them it was you. This almost never does you a favour. It's sad but it reduces the political threat that they feel from you and lets them accept it. One has to remember that the idea might not ultimately have come from you either - you might have read it in a book perhaps or online.
At the end, if your idea cannot be applied in some case or people try to use it and have trouble, are you going to help them out of the problem? This is another issue. Once you introduce change be prepared to support it.
In other words, I have no good answers - I've really revolutionised an aspect of one big system (an operating system) which promptly got cancelled after we built the final batch of products on it :-D. In other cases I've only been able to make improvements here and there, in areas where others didn't care too much.
The culture from the top has a huge influence that you cannot really counter fully - only within your own team sometimes or your own department and you have to be very careful about confronting it head on.
So this is why startups work of course - because they allow change to happen :-)
OP has identified a universal norm: the "Law of Large Established Codebases (LLEC)" states that codebases with "Single-digit million lines of code, Somewhere between 100 and 1000 engineers, first working version of the codebase is at least ten years old" tend to naturally dissipate, increasing the entropy of the system, with inconsistency being one of its characteristics.
OP also states that in order to 'successfully' split a LEC you need to first understand it. He doesn't define what 'understanding the codebase' means but if you're 'fluent' enough you can be successful. My team is very fluent in successfully deploying our microfrontend without 'understanding' the monstrolith of the application.
I would even go further and make the law a bit more general: any codebase will be in both a consistent and an inconsistent state. If you use a framework, library, or go vanilla, the consistency would be the boilerplate, autogenerated code, and conventional patterns of the framework/library/programming language. But inconsistency naturally crops up because not all libraries follow the same patterns, not all devs understand the conventional patterns, and frameworks don't cover all use cases (entropy increases after all). Point being, being consistent is how we 'fight' against entropy, and inconsistency is a manifestation of increasing entropy. But there is nothing that states that all 'consistent' methods are the same, just that consistency exists and can be identified, but not that the identified consistency is the same 'consistency'. And taking a snapshot of the whole, you will always find consistent and inconsistent coexisting.