Fair: A Go library for serving resources fairly (github.com)
ignoramous 3 days ago [-]
> Since the state is stored in a multi-level Bloom Filter style data structure, the memory needed is constant and does not scale with the number of clients.

Constant memory, but those hashes will take up CPU cycles. If you're running a workload that completes in under 20 milliseconds, the cycles spent hashing may not be worth it compared to, say, a constant-time admission control like a token bucket.
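
For comparison, here is a minimal constant-time token-bucket check in Go. This is just an illustration using golang.org/x/time/rate, not anything from the Fair library:

    package main

    import (
        "fmt"

        "golang.org/x/time/rate"
    )

    func main() {
        // Allow roughly 100 requests per second with a burst of 20.
        // Allow() is O(1) per call: a timestamp comparison, no hashing.
        limiter := rate.NewLimiter(rate.Limit(100), 20)

        for i := 0; i < 5; i++ {
            if limiter.Allow() {
                fmt.Println("admit request", i)
            } else {
                fmt.Println("reject request", i)
            }
        }
    }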

mnadkvlb 3 days ago [-]
Absolutely my thought as well. During my thesis I found token buckets to be better and fairer than other algorithms [1]. This was in libvirt. My finding was that it scaled well up to around 1k buckets, if memory serves; after that the results were weird, to say the least. (Of course, the servers I tested on were quite old and didn't have much memory.) Would be nice to run my thesis tests again to see what has changed in the last decade.

I tried writing another algorithm for network splitting but didn't get any better results.

[1]: https://www.csg.uzh.ch/publications/details.php?id=1007

JensRantil 1 day ago [-]
The world is full of trade-offs. I don't think the author intended to solve CPU-bound workloads like that. Or have they claimed that?
remram 2 days ago [-]
I'm not exactly sure what you propose, a token bucket per client? In a hashmap client=>bucket?
otterley 2 days ago [-]
That's the typical design for a token-bucket rate limiter. A token-bucket limiter uses more memory but is a simpler design. I believe this Bloom-filter-based implementation is designed to be more memory-efficient at the cost of being less CPU-efficient.

As usual, tradeoffs are everywhere.
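
For concreteness, a sketch of the hashmap-of-buckets design being discussed, with one golang.org/x/time/rate limiter per client (names and parameters are made up). Its memory grows with the number of distinct clients, which is exactly what the Bloom-filter approach avoids:

    package main

    import (
        "sync"

        "golang.org/x/time/rate"
    )

    // clientLimiter keeps one token bucket per client ID.
    // Memory grows with the number of distinct clients seen.
    type clientLimiter struct {
        mu      sync.Mutex
        buckets map[string]*rate.Limiter
        limit   rate.Limit
        burst   int
    }

    func newClientLimiter(r rate.Limit, burst int) *clientLimiter {
        return &clientLimiter{buckets: make(map[string]*rate.Limiter), limit: r, burst: burst}
    }

    // Allow reports whether the given client may proceed right now.
    func (c *clientLimiter) Allow(clientID string) bool {
        c.mu.Lock()
        b, ok := c.buckets[clientID]
        if !ok {
            b = rate.NewLimiter(c.limit, c.burst)
            c.buckets[clientID] = b
        }
        c.mu.Unlock()
        return b.Allow()
    }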

hinkley 2 days ago [-]
Bloom filters really shine when they avoid a network round trip or pulling data from disk. Those are so far away you can easily afford to spend 5% of the cost of the request to avoid 99% of the requests.
otterley 2 days ago [-]
Agreed! In a typical token bucket implementation, bucket state must be shared across multiple servers, which means you need a shared state repository such as Redis or a database (SQL or noSQL). This can add milliseconds to each request being handled.
hinkley 2 days ago [-]
> This can add milliseconds

My previous (very long) project was in such a state when I got there that it was only in my last year there that I could get measuring things in microseconds onto other people's radar. I wish I had started sooner, because I found a lot of microseconds in about a four-month period. That was the most intensive period of user-visible latency reduction I saw in my entire time there, and second through fourth place took years of manpower to accomplish. Past me and future me are both still mad about that.

foobazgt 2 days ago [-]
Rate limiters seem like the poster child for something you'd want to host in an in-memory KVS. Typical access times look more like ~100us, not milliseconds. Even if you have a complicated algorithm, you can still limit it to a single round trip by using a Redis Function or some moral equivalent.
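
As a sketch of the single-round-trip idea: a simple fixed-window counter evaluated atomically inside Redis with one EVAL call. The key name, window, and go-redis wiring here are my own illustration, not a recommendation of a particular algorithm:

    package main

    import (
        "context"
        "fmt"

        "github.com/redis/go-redis/v9"
    )

    // The whole check runs inside Redis, so each request costs exactly
    // one round trip no matter how elaborate the logic gets.
    const rateScript = `
    local n = redis.call("INCR", KEYS[1])
    if n == 1 then
      redis.call("EXPIRE", KEYS[1], ARGV[2])
    end
    if n > tonumber(ARGV[1]) then
      return 0
    end
    return 1
    `

    func allowed(ctx context.Context, rdb *redis.Client, clientID string, limit, windowSecs int) (bool, error) {
        res, err := rdb.Eval(ctx, rateScript, []string{"rl:" + clientID}, limit, windowSecs).Int()
        if err != nil {
            return false, err
        }
        return res == 1, nil
    }

    func main() {
        ctx := context.Background()
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
        ok, err := allowed(ctx, rdb, "tenant-42", 100, 1)
        fmt.Println(ok, err)
    }
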
jrockway 2 days ago [-]
You can also rate limit on the server side. Service A talks to Service B. Service B is 3 replicas. Each replica keeps track of how many times Service A talked to it. If that goes over the limit / 3, deny the request. Now there is no IPC required.

This isn't as accurate, but it's often adequate. I have never liked making a network request to see if I can make a network request; too slow. (I do use this architecture to rate-limit my own website, mostly because I wanted to play with writing a service to do that. But I'd hesitate to make someone else use that.)
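
A rough sketch of that local, no-IPC variant: each replica keeps its own per-caller counter and enforces its share of the global limit (all numbers and names here are made up for illustration):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // localCounter tracks, per calling service, how many requests this replica
    // has seen in the current one-second window. With N replicas each enforcing
    // globalLimit/N, the fleet approximates the global limit with no IPC.
    type localCounter struct {
        mu     sync.Mutex
        counts map[string]int
        window time.Time
    }

    func (l *localCounter) allow(caller string, perReplicaLimit int) bool {
        l.mu.Lock()
        defer l.mu.Unlock()

        now := time.Now().Truncate(time.Second)
        if now.After(l.window) {
            l.window = now
            l.counts = map[string]int{} // new window, reset all counters
        }
        l.counts[caller]++
        return l.counts[caller] <= perReplicaLimit
    }

    func main() {
        lc := &localCounter{counts: map[string]int{}}
        // e.g. a global limit of 300 rps across 3 replicas -> 100 rps per replica
        fmt.Println(lc.allow("service-a", 100))
    }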

otterley 2 days ago [-]
That would work if you can assign requests from a given tenant to a single instance, but there are many situations in which that's either impossible or unwise. What if a single server doesn't have enough capacity to handle all the traffic for a tenant? How do you preserve the state if that instance fails?
foobazgt 2 days ago [-]
I'm sorry, I don't think I understand your question. Are you talking about the KVS? You shard the servers for extra capacity. Several KVS's have built-in clustering if you want to go that route. They're usually incredibly stable, but if one goes down for whatever reason (say the physical machine fails), you just spin another one up to take its place.

In terms of preserving state, the answer for rate limiting is that it is almost always far, far less dangerous to fail open than it is to deny requests during a failure. If you really, really wanted to preserve state (something I'd suggest avoiding for a rate limiter), several KVS's have optional persistence you can turn on, for example Redis's AOF.

The end services themselves should be designed with some sort of pushback mechanism, so they shouldn't be in any danger of overloading, regardless of what's going on with the rate limiter.

otterley 2 days ago [-]
I think I misunderstood what you were saying. By "in-memory KVS" with "access times in the microseconds" I thought you were implying a KVS hosted on the server that handles the requests. Otherwise, even if the KVS can respond in 100us to a local query, network latency is going to add much more than that.
foobazgt 2 days ago [-]
Ah ok, that makes sense. Over loopback, you can roundtrip Redis at least as fast as double-digit micros. Intra-DC, your network latency should be somewhere in the triple-digit micros. I'd say if you're not seeing that, something is probably wrong.
hinkley 2 days ago [-]
I don't recall all the details, but we did end up using Lua to talk to Redis or memcached to do some traffic shaping at one point. One was for bouncing to error pages before we switched to a CDN (long story), and another was for doing something too clever by half with TTFB. It's still really cheap, especially on a box that is just doing load balancing and nothing else.

If you wanted to throw another layer of load balancer in, there are consistent hashing-adjacent strategies in nginx+ that would allow you to go from 2 ingress routers to 3 shards with rate limiters to your services, using one KV store per box. But I highly suspect that the latency profile there will look remarkably similar to ingress routers doing rate limiting talking to a KV store cluster.

tazu 3 days ago [-]
Does anyone have some real-world use cases for something like this? The algorithm is cool but I'm struggling to see where this is applicable.
codaphiliac 3 days ago [-]
Thinking this could be useful in a multi-tenant service where you need to fairly allocate job-processing capacity across tenants to a number of background workers (like data export API requests, encoding requests, etc.).
jawns 3 days ago [-]
That was my first thought as well. However, in a lot of real world cases, what matters is not the frequency of requests, but the duration of the jobs. For instance, one client might request a job that takes minutes or hours to complete, while another may only have requests that take a couple of seconds to complete. I don't think this library handles such cases.
hinkley 2 days ago [-]
Lots of heuristics continue to work pretty well as long as the least and greatest are within an order of magnitude of each other. It’s one of the reasons why we break stories down to 1-10 business days. Anything bigger and the statistical characteristics begin to break down.

That said, it’s quite easy for a big job to exceed 50x the cost of the smallest job.

codaphiliac 3 days ago [-]
Defining a unit of processing (like duration or quantity) and then feeding the algorithm the equivalent number of units consumed (pre- or post-processing a request) might help.
dtjohnnymonkey 2 days ago [-]
To mitigate this case you could limit capacity in terms of concurrency instead of request rate. Basically it would be like a fairly-acquired semaphore.
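
A minimal sketch of that idea, capping per-tenant concurrency with golang.org/x/sync/semaphore (this only bounds concurrency; the "fairly acquired" ordering across tenants would need more work):

    package main

    import (
        "context"
        "fmt"
        "sync"

        "golang.org/x/sync/semaphore"
    )

    // tenantSlots caps how many jobs each tenant may run concurrently,
    // independent of how quickly it submits them.
    type tenantSlots struct {
        mu        sync.Mutex
        perTenant int64
        sems      map[string]*semaphore.Weighted
    }

    func (t *tenantSlots) sem(tenant string) *semaphore.Weighted {
        t.mu.Lock()
        defer t.mu.Unlock()
        s, ok := t.sems[tenant]
        if !ok {
            s = semaphore.NewWeighted(t.perTenant)
            t.sems[tenant] = s
        }
        return s
    }

    func (t *tenantSlots) run(ctx context.Context, tenant string, job func()) error {
        s := t.sem(tenant)
        if err := s.Acquire(ctx, 1); err != nil { // blocks until a slot frees up
            return err
        }
        defer s.Release(1)
        job()
        return nil
    }

    func main() {
        ts := &tenantSlots{perTenant: 4, sems: map[string]*semaphore.Weighted{}}
        _ = ts.run(context.Background(), "tenant-a", func() { fmt.Println("long job running") })
    }
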
hinkley 2 days ago [-]
I believe nginx+ has a feature that does max-conns by IP address. It’s a similar solution to what you describe. Of course that falls down wrt fairness when fanout causes the cost of a request to not be proportional to the response time.
itake 21 hours ago [-]
The text suggests a method for managing GPU or rate-limited resources across multiple clients. It highlights the problem of spiky workloads, where a client might generate a large number of events (e.g., from a CSV upload), causing resource starvation. The text advises against naive solutions like FIFO, which could disadvantage clients with steady live traffic.
mnadkvlb 3 days ago [-]
I responded above, but it could maybe be used for network libraries, e.g. libvirt. I did my thesis on this topic a couple of years ago.

I am very intrigued to find out how this would fit in, if at all.

otterley 2 days ago [-]
Rate limiters are used to protect servers from overload and to prevent attackers--or even legitimate but unintentionally greedy tenants--from starving other tenants of resources. They are a key component of a resilient distributed system.

See, e.g., https://docs.aws.amazon.com/wellarchitected/latest/framework...

This project, however, looks like a concurrency limiter, not a rate limiter. I'm also not sure how it works across a load-balanced cluster.

nstateful 4 days ago [-]
Very interesting; looking forward to trying this. I am a big fan of SFB for this type of stuff but haven't seen anything in the distributed space that's beyond a science project. Would be great if I could use it on my 10k+ user website.
JensRantil 1 day ago [-]
I coded up something like Fair 1-2 years ago: https://github.com/JensRantil/conc. It's alpha software, but maybe it's interesting to someone, I don't know.
roboben 2 days ago [-]
Shouldn't this be built into a queue somehow? I'd love to see a queuing solution like SQS but with built-in fairness, where you can fully utilize capacity, but as soon as, say, customers compete for resources, some fairness kicks in. Is there anything like that?
ahoka 1 days ago [-]
Multiple queues?
salomonk_mur 3 days ago [-]
Why would you learn and use this over typical load-balancing solutions like K8S? Honest question.
otterley 2 days ago [-]
They are complementary solutions, not substitutes. Load balancers distribute traffic among servers and are a key component for enabling horizontal scalability. Throttling is a prophylactic feature that prevents attackers or greedy consumers from overutilizing capacity and depriving other legitimate users of it. This library relates to the latter.

Unfortunately the title of the GitHub repo ("A Go library for serving resources fairly") is misleading. This is not a server; it's a library that a server can utilize to determine whether a request has exceeded fairness bounds and should be rejected with an HTTP 429 (too many requests) response.
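
To illustrate that usage, here is how such a check would typically be wired in as HTTP middleware. The allow() function below is a hypothetical stand-in, not Fair's actual API:

    package main

    import (
        "fmt"
        "net/http"
    )

    // allow is a hypothetical stand-in for whatever fairness check the
    // library provides; it is not Fair's actual API.
    func allow(clientID string) bool {
        return true
    }

    // fairnessMiddleware rejects requests that exceed their fairness bounds
    // with 429 Too Many Requests, as described above.
    func fairnessMiddleware(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            client := r.Header.Get("X-Client-ID") // however you identify the tenant
            if !allow(client) {
                http.Error(w, "too many requests", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "ok")
        })
        _ = http.ListenAndServe(":8080", fairnessMiddleware(mux))
    }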

joshuanapoli 3 days ago [-]
In a multi-tenant system, you might have one customer who drops a big job that creates a huge number of tasks. We'd like to process this as fast as possible, but not block the tasks of small jobs from other customers, which should normally be completed very quickly.
kgeist 2 days ago [-]
We had to build something similar. We have both huge tenants (200k users in one tenant) and small tenants with 10 users. Sometimes there are spikes when a large tenant generates thousands of jobs. In a naive implementation, one tenant would be able to completely block job processing for all other tenants. We have to make sure the next job is picked from a different tenant each time, so that all tenants are served fairly. However, a large tenant may then end up waiting too long for its jobs to complete. In that case, we move such a tenant to different infrastructure (sometimes fully dedicated).
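
A rough sketch of that pick-from-a-different-tenant-each-time idea (greatly simplified; the real setup described above also involves moving big tenants to dedicated infrastructure):

    package main

    import "fmt"

    // roundRobin pops the next job by cycling through tenants so that one
    // tenant with thousands of queued jobs cannot starve the others.
    type roundRobin struct {
        tenants []string            // rotation order
        queues  map[string][]string // per-tenant FIFO of pending jobs
        next    int
    }

    func (r *roundRobin) pop() (tenant, job string, ok bool) {
        for i := 0; i < len(r.tenants); i++ {
            t := r.tenants[(r.next+i)%len(r.tenants)]
            if q := r.queues[t]; len(q) > 0 {
                r.queues[t] = q[1:]
                r.next = (r.next + i + 1) % len(r.tenants) // resume after this tenant
                return t, q[0], true
            }
        }
        return "", "", false
    }

    func main() {
        rr := &roundRobin{
            tenants: []string{"big-tenant", "small-tenant"},
            queues: map[string][]string{
                "big-tenant":   {"job1", "job2", "job3"},
                "small-tenant": {"jobA"},
            },
        }
        for t, j, ok := rr.pop(); ok; t, j, ok = rr.pop() {
            fmt.Println(t, j) // big-tenant job1, small-tenant jobA, big-tenant job2, big-tenant job3
        }
    }
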
dpatterbee 2 days ago [-]
Isn't this exactly the problem that preemptive multitasking is built for? For example, any program built on the BEAM [1] presumably wouldn't have this problem. Do most languages not have a standard solution for this?

[1]: https://en.m.wikipedia.org/wiki/BEAM_(Erlang_virtual_machine...

jerf 2 days ago [-]
Pre-emptive multitasking is a tool that a solution might use, but it is not a solution on its own. If you have three users spawning one thread/process/task and one user spawning a million (literally, not figuratively), that one user can easily completely starve the three little users. Some of their million tasks may also be starved. But they'll get more done overall.

This whole problem gets way more complicated than our intuition can generally work with. The pathological distribution of the sizes of various workloads, and the pathological distribution of the variety of resources that tasks can consume, is not modeled well by our human brains, which really want to work with tasks that are essentially uniform. But they never are. A lot of systems end up punting, either to the OS, which has to deal with this anyhow, or to letting programs do their own cooperative internal scheduling, which is what this library implements. In general, "but what if I 'just'" solutions to this problem have undesirable pathological edge cases that seem like they "ought" to work, especially at the full generality of an operating system. See also the surprisingly difficult task of OOM-killing the "correct" process; the "well, obviously you 'just'" algorithms don't work in the real world, for a very similar reason.

As computers have gotten larger, the pathological distributions have gotten worse. To be honest, if you're thinking of using "fair" it is likely you're better off working on the ability to scale resources instead. There's a niche for this sort of library, but it is constantly shrinking relative to the totality of computing tasks we want to perform (even though it is growing in absolute terms).

dpatterbee 2 days ago [-]
Thank you for the explanation, I hadn't clocked the difference between fair allocation for tasks and fair allocation for users.
udkl 2 days ago [-]
Appreciate this very insightful comment
maxmcd 2 days ago [-]
In preemptive multitasking you are trying to get all the work done as quickly as possible, but with FAIR and similar systems you want to make sure that a single client/user/resource cannot steal an inordinate amount of capacity.

I do not think languages/runtimes typically implement that kind of prioritization/limiting mechanism.

remram 2 days ago [-]
Those are completely unrelated. K8s does not provide client rate-control and FAIR does not do load-balancing. It appears you misunderstood what this is.
waqasloper 2 days ago [-]
[flagged]
alegonza212 2 days ago [-]
[flagged]
AnnaMere 3 days ago [-]
Extremely interesting and valuable