> With Prequal we saw dramatic improvements across all metrics [for youtube], including reductions of 2x in tail latency, 5-10x in tail RIF, 10-20% in tail memory usage, 2x in tail CPU utilization, and a near-elimination of errors due to load imbalance. In addition to meeting SLOs, this also allowed us to significantly raise the utilization targets that govern how much traffic we are willing to send to each datacenter, thereby saving significant resources.
This feels like one of those "my company realized savings greater than my entire career's expected compensation."
The question you should ask is how much less others would accept to do it. How much it saves isn't how you price work; you price work based on the people available who can do it.
If the fire department puts out a fire in your house you don't pay them the cost of the building. You don't give your life to a doctor, etc. That way of thinking is weird.
It's not weird, but incomplete. It is broadly captured in the Economics concept of "willingness to pay", loosely the maximum price someone or some firm would be willing to pay for something of benefit to them.
This contrasts with "willingness to accept", loosely the minimum compensation someone or some firm would accept to produce a good or service (or accept some negative thing).
Neither of these is sufficient to determine the price of something precisely, but, in aggregate, these concepts bound the market price for some good or service.
I agree I wasn't super precise; I thought it was enough to show that pricing work at or close to the value it produces is unlikely as long as there are many people available to do it.
In my example of the fire department, if nobody else is coming and you have no insurance or other way to save your things, you would indeed pay a lot. From what I read, those were the dynamics in Roman times.
If you're thinking of Marcus Crassus, the dynamic there was that he'd offer to buy your burning property from you at a steep discount, and would only put the fire out if you agreed.
> "willingness to pay" [and] "willingness to accept"
Often abbreviated somewhat sloppily into demand and supply.
which should make the employee proud, and I'm sure Google is compensating them very, very well.
let's also not forget that the people involved didn't create this in a vacuum. it cost Google a LOT more than their compensation to make it possible for them to even start working on this project, let alone carrying it forward to completion.
people underestimate how hard and expensive it is to manage a company in a way that allows its employees to do a good job.
> it cost Google a LOT more than their compensation to make it possible for them to even start working on this project
The actual "not a vacuum" context here is an environment that has been basically printing money for google for the last twenty years. It did not "cost" them anything. It's fine to acknowledge that the people who built google, however well paid they are, are creating vastly more value than they are personally receiving.
Profit margins matter a hell of a lot to publicly traded companies. Share price is a multiple of earnings.
If it costs nothing, why don't we both get together and build a money-printing machine as well?
You say that like it's something easy that anyone can get done. Have you ever tried? If you try to build a sustainable and successful business, you'll see how hard it is.
> let's also not forget that the people involved didn't create this in a vacuum. it cost Google a LOT more than their compensation to make it possible for them to even start working on this project
...let's also not forget that Google didn't manufacture this opportunity in a vacuum. It cost the rest of society a lot more than their revenue to make it possible for them to employ people who can spend their entire days working on software.
Didn't it cost society to raise us to adulthood so that we can work for Google and other businesses? How many people have sacrificed their time and attention, directly or indirectly, to our benefit, since we were born? Yet, nobody's saying our salaries aren't deserved or the merit of our own efforts.
I think many people are explicitly saying that.
The person chose to be employed instead of starting their own business. Less risk, less reward.
I disagree with toenail being downvoted. In Europe, employees enjoy a great deal of stability. In the Netherlands it's nigh-impossible to fire someone: you have to fill in 10 pages of justification paperwork and have an independent government agency review and approve it. If someone has a long-term illness, you have to pay 70% of their salary for up to 2 years, even when they do no work at all. Most people don't want to be entrepreneurs: they want clear instructions and stability. When you try to give stock to employees, the tax authorities raise an eyebrow: why would you give stock to employees when they enjoy none of your risks? It makes no sense, so we'll treat it as a form of salary and tax you 52% on the stock's paper value.
At the end of the day, what's left for the entrepreneur? You enjoy all the risk, but you don't get to have a paid 2 year sick leave. Even sympathy for your hard work can be hard to get. The potential of money is all you have.
Things are different in the US of course, where people can be fired the next minute without reason. That looks like borderline abuse to me. But from a European perspective, the above comment does not deserve downvoting at all.
> In the Netherlands it's nigh-impossible to fire someone: you have to fill in 10 pages of justification paperwork and have an independent government agency review and approve it.
Practically speaking, won't they fire someone by negotiating a "voluntary" severance agreement? It's not like an employee wants to stay on for long once it's been made known that they are unwanted.
Though obviously a severance agreement is better than at-will employment, it's also not a guarantee of long employment.
That is possible, but you'll have to go through a lawyer to draft a contract. It costs time and money (apart from the severance fee). Assuming negotiations are successful. It's still not easy.
What you get as an employee is certainty that your compensation will only increase in the single digits per year, and that the pension fund will compound part of your income at single-digit rates as well (pension funds typically underperform the S&P, even in bad years).
So it's either 2 years of “stable income” or the chance of a much higher compounding rate. As I see it, employment is a nice backup if being self-employed doesn’t work out.
> why would you give stock to employees when they enjoy none of your risks? It makes no sense, so we'll treat it as a form of salary, so we'll tax you 52% on the stock's paper value.
The Netherlands is the only country in the EU that taxes on unrealised gains.
Not sure if you mean just for stock options, but more countries/regions have some form of wealth tax on unrealised gains.
Just stock, but I forgot that Spain also has stock included in its wealth tax for people with over 700k in assets. I think it’s just NL and ES though - I don’t think any other EU countries tax stocks in this way?
I thought there were more, but a quick search only revealed Norway, apart from Spain, which you've already mentioned.
Also worth noting that in the Madrid region they don't have the wealth tax.
Also Denmark I believe
Especially given that many people will elect to be stably and cheaply employed and do just as good work as someone charging on a “royalties model” fee basis for this kind of solution, it's hard to see the advantage of going it alone. The competition is the “cheap” FTE.
I'm surprised that S3/AWS hasn't blogged or done a white paper about their approach yet. It's been something like 7 years now since they moved away from standard load balancing.
If you think about an object storage platform, much like with YouTube, traditional load balancing is a really bad fit. No two requests are even remotely the same in terms of resource requirements, duration etc.
The latest deep dive on S3 at re:Invent sheds some light on how it’s done.
Out of curiosity, are there any documents on this, even for something other than AWS's S3? I find the idea very interesting.
They talked a lot about using probes to select candidate servers, but I struggled to find a good explanation of what exactly a probe was.
However, the "Replica selection" section does shed some light, albeit somewhat indirectly. From what I can gather, a probe consists of N metrics, which are gathered by the backend servers upon request from the load balancers.
In the paper they used two metrics, requests in flight (RIF) and measured latency for the most recent requests.
I assume the backend server maintains a RIF counter and a circular list of the last N requests, which it uses to compute the average latency of recent requests (so skipping old requests in the list presumably). They mention that responding to a probe should be fast and O(1).
At least that's my understanding after skimming through the paper.
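To make that concrete, here is a minimal sketch of what such a probe endpoint could look like (Go; probeState, the ring size of 128, and the JSON field names are all my own invention, not anything taken from the paper):

```go
package probesketch

import (
	"encoding/json"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

// probeState is a hypothetical per-server structure holding the two signals
// the paper uses: requests in flight (RIF) and recent request latency.
type probeState struct {
	rif atomic.Int64 // incremented when a request starts, decremented when it finishes

	mu        sync.Mutex
	latencies [128]time.Duration // ring buffer of the last N request latencies
	sum       time.Duration      // running sum, so the mean is O(1) to report
	next      int
	filled    int
}

// recordLatency is called when a request finishes; O(1) per request.
func (p *probeState) recordLatency(d time.Duration) {
	p.mu.Lock()
	p.sum -= p.latencies[p.next] // drop the entry being overwritten
	p.latencies[p.next] = d
	p.sum += d
	p.next = (p.next + 1) % len(p.latencies)
	if p.filled < len(p.latencies) {
		p.filled++
	}
	p.mu.Unlock()
}

// probeHandler answers a load-balancer probe with the current RIF and the
// mean latency over the ring buffer; both are O(1) to compute, in line with
// the paper's requirement that probe responses be cheap.
func (p *probeState) probeHandler(w http.ResponseWriter, r *http.Request) {
	p.mu.Lock()
	var meanMs float64
	if p.filled > 0 {
		meanMs = float64(p.sum.Milliseconds()) / float64(p.filled)
	}
	p.mu.Unlock()

	json.NewEncoder(w).Encode(map[string]any{
		"rif":             p.rif.Load(),
		"mean_latency_ms": meanMs,
	})
}
```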
Need to read this in depth over the weekend. I've been fully immersed in LLMs for the past 2 years and have ignored systems research.
This appears to be a very logical solution, i.e., scheduling based on estimated service quality instead of resource metrics. It has also been more or less common knowledge in recent years: systems have become so complex and so intertwined in their distribution that host load is only a minor factor in how well requests get served. It's like growing taller: you need to worry less about tripping over hurdles and more about bumping your head on door frames.
But we do need this kind of research to formalize the practice and get everyone on board.
Google's applied research absolutely winning here.
Using observed latency, power of two choices, and requests in flight reminds me a lot of Finagle https://twitter.github.io/finagle/guide/Clients.html#p2c-lea...
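For anyone unfamiliar, the plain least-loaded P2C heuristic is easy to sketch. This is only the generic idea, not Finagle's actual implementation and not Prequal's rule; Backend and PickP2C are made-up names:

```go
package p2c

import "math/rand"

// Backend is a hypothetical handle to a replica, tracking how many of this
// client's requests are currently outstanding against it.
type Backend struct {
	Name     string
	InFlight int
}

// PickP2C is the classic "power of two choices" least-loaded heuristic:
// sample two distinct replicas at random and send the request to whichever
// currently has fewer requests in flight.
func PickP2C(backends []*Backend) *Backend {
	switch len(backends) {
	case 0:
		return nil
	case 1:
		return backends[0]
	}
	i := rand.Intn(len(backends))
	j := rand.Intn(len(backends) - 1)
	if j >= i {
		j++ // ensures the second pick is distinct from the first
	}
	a, b := backends[i], backends[j]
	if b.InFlight < a.InFlight {
		return b
	}
	return a
}
```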
They specifically compare to client-side least-loaded with Po2C in section 5.2/figure 7.
From TFA:
> PReQuaL does not balance CPU load, but instead selects servers according to estimated latency and active requests-in-flight
So, still load balancing
Load balancing is a term of art; the actual algorithm for distributing requests need not be load-based. A more accurate term for the component might be "request distributor," but I don't foresee people changing their vocabulary any time soon.
Abstract:
We present PReQuaL (Probing to Reduce Queuing and Latency), a load balancer for...
“Don’t load balance, erm, here’s our load balancer” struck me as quite humorous too. :)
Maybe RIF-balancing is a better term.
Fascinating that 2-3 probes per request is the sweet spot; intuitively that seems like a lot of overhead.
Requests (in flight or currently processing) are the load in this case. But I guess "queue balancing" captures the intuition better: what matters for latency is future delay more than current delay.
They estimate what load will be in the future too.
Are people really balancing load based on simple cpu utilization for non-trivial services? That seems really surprising to me, but they present it as the current best practice?
When reading a paper like this, take what it frames as the current state of the art with a grain of salt. Research is often framed in a way that boosts its apparent impact by setting up somewhat of a strawman about the current state.
Every production load balancer that I have come across in the last 10 years load balances on the metric that is important to it.
In my experience, by far the most widespread way of directing requests is randomly. Round robin is pretty popular. Everything more sophisticated than that is vanishingly rare outside of large organizations. Look at gRPC: what client-side load balancers does it come with? pick_first and round_robin.
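To illustrate, switching off the default in grpc-go is basically one line of service config. This is just a sketch: the target is hypothetical, and newer grpc-go versions prefer grpc.NewClient over the deprecated grpc.Dial.

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	_ "google.golang.org/grpc/balancer/roundrobin" // makes sure the round_robin policy is registered
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// The default policy is pick_first; switching to round_robin is a
	// one-line service config. Anything fancier generally means writing
	// (or importing) a custom balancer.
	conn, err := grpc.Dial(
		"dns:///my-service.internal:50051", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	// ... create a generated client stub on conn and issue RPCs ...
}
```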
Weighted round robin has some traction, but even in engineering groups with lots of experience and talent, the complexity of measuring CPU utilization rates is underappreciated.
What about this is a surprising result? Optimize as directly as possible for load balancing the thing(s) that matter most to your users (e.g. latency), not the thing that indirectly correlates to the thing(s) that matter (CPU Utilization). I've been making this exact argument at work for a while.
Funny to read this article today, just after myself and two others at work also just saved the company we work at far more than our combined expected lifetime gross earnings with a single (different) optimization.
The paper reiterates -- multiple times -- that optimizing for latency is obviously the right choice; the other major signal is instantaneous requests-in-flight (RIF). They explicitly mention, on the very first page, that their contributions are not that "low latency is good, aim for that", it's that A) they use a particular combination of RIF and latency to select replicas called their "hot-cold lexicographic rule" which they find works really well, and B) their "probes" (requests by the balancer to the backends, to discover what their current state is) are collected asynchronously rather than in the critical request serving path, to help further drive utilization. Most of the ink in the paper in Section 4 explains the design choices they made around probing as a result.
I'm not done with the paper yet, but the basics are in fact written on page one.
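In rough Go, the hot-cold lexicographic rule as I read it comes out to something like this; rifThreshold stands in for the RIF quantile the paper derives, and this is my reading, not the paper's actual code:

```go
package hcl

// Probe is a hypothetical snapshot reported by a replica: its current
// requests in flight and an estimated latency for serving a new request.
type Probe struct {
	Replica   string
	RIF       int
	LatencyMs float64
}

// SelectHCL sketches the hot-cold lexicographic idea as I understand it:
// replicas whose RIF is at or above rifThreshold are "hot". If any cold
// replica exists, pick the cold one with the lowest estimated latency;
// if all replicas are hot, fall back to the one with the lowest RIF.
func SelectHCL(probes []Probe, rifThreshold int) (Probe, bool) {
	if len(probes) == 0 {
		return Probe{}, false
	}
	isCold := func(p Probe) bool { return p.RIF < rifThreshold }

	best := probes[0]
	for _, p := range probes[1:] {
		switch {
		case isCold(p) && !isCold(best):
			best = p // any cold replica beats any hot one
		case isCold(p) && isCold(best) && p.LatencyMs < best.LatencyMs:
			best = p // among cold replicas, lowest estimated latency wins
		case !isCold(p) && !isCold(best) && p.RIF < best.RIF:
			best = p // everything hot: lowest RIF wins
		}
	}
	return best, true
}
```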
you got me curious. can you share?
I would have liked to have read something quantitative about the measured cost (in terms of client and server CPU) other than describing them as "small". I'm trying to imagine doing this in gRPC and it seems like the overhead would be pretty high. I know Stubby is more efficient but some hard numbers would have been nice.