NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
Swiss Tables (abseil.io)
dan-robertson 22 hours ago [-]
This description reminds me of a hashtable used for some APL things, which is described in a talk online somewhere. The difference is that the APL table did Robin Hood insertions (an open table where new elements are inserted at or after the position corresponding to their hash, but you rearrange things to try to reduce the maximum distance any element is from where it should be) so it stored 4 bits of offset information and 4 bits of hash in each entry of the metadata table. Lookups were still sse based, something like:

  position = (hash>>4) & mask
  test = set1_16x8(hash&0xf) | 0xf0e0d0c0b0a090807060504030201000
  candidates = eq_16x8(load_16x8(table + position), test)
  // convert candidates to mask and test each option
Sesse__ 17 hours ago [-]
FWIW, I found that when doing Robin Hood, SSE is just a loss, since you usually find the element so soon. (If you are dominated by non-matching lookups, you can consider putting a Bloom filter in front.)
pkhuong 6 hours ago [-]
With linear probing RH, a failed lookup can stop as early as a successful one, as long as you also store the hash (useful for fast insertion): stop looking when you'd insert the key you're looking for before the current entry.
fmajid 21 hours ago [-]
Apparently Go 1.24's internal implementation of maps was changed to use Swiss Tables (reimplemented in Go, of course, not the Abseil C++ implementation).

https://www.bytesizego.com/blog/go-124-swiss-table-maps

I'd be interested to see how the new Tiny Pointers algorithm would perform against Swiss Tables, specially when occupation rate is high.

https://www.quantamagazine.org/undergraduate-upends-a-40-yea...

jsheard 17 hours ago [-]
Rust also switched their standard maps over to Swisstables a while ago. Unfortunately despite the original implementation being in C++, I don't think it's possible for C++ implementations to do the same with the STL maps due to constraints imposed by the spec. They'd have to create a new STL type for them.
tialaramex 17 hours ago [-]
Yes, in C++ in 2011 they standardized the hash table design you'd have been shown in a 1980s (or maybe 1990s if you got unlucky in where you went to school) Data Structures class. They called this type std::unordered_map

This was a Closed Addressing aka Separate Chaining table. The idea is easy enough that it's actually an Advent Of Code problem one year which is why this design is much older than the modern efficient Open Addressing tables. In C++ with std::unordered_map you are guaranteed that the address of items in the table won't change while they're in the table, this works because they're not actually stored directly in the table. You also provided an API to work with these chains of related items and to fiddle with "how full" the table is.

This API makes complete sense for a Closed Addressing table, but both the API functions and the pointer stability guarantee don't make any sense for Open Addressing.

If you only need the pointer stability you can pay just for that, Abseil offers a compatible type with that property, but if you need all the Chaining APIs (and you might in principle) then it doesn't have those, too bad.

tonfa 19 hours ago [-]
Tiny Pointers doesn't currently have practical use, it's a complexity results.

> The team’s results may not lead to any immediate applications, but that’s not all that matters, Conway said. “It’s important to understand these kinds of data structures better. You don’t know when a result like this will unlock something that lets you do better in practice.”

tialaramex 16 hours ago [-]
Yeah, naive application of big-O complexity models can lead you far astray when you have real world problems. If you can do F in n*k1 time, but I need n*n*k2 time you may think your algorithm for F is better, O(n) compared to O(n squared) but if k1 is 1 second while k2 is 80 nanoseconds then actually my "worse" algorithm is faster until n is so huge that we have runtimes of about half a year.
camel-cdr 20 hours ago [-]
Here is a great talk about the new boost::unordered_map implementation: https://m.youtube.com/watch?v=Rg8MZ5pJIJA

It's design is clearly inspired by abseil::flat_map but improves it in a few aspects.

spacechild1 18 hours ago [-]
Nitpick, but I think you mean boost::unordered_flat_map.

boost::unordered_map is a drop-in replacement for std::unordered_map and as such uses seperate chaining.

Anyway, thanks for the video!

camel-cdr 17 hours ago [-]
Yeah, thanks for the correction.
jbreckmckye 24 hours ago [-]
I'm lazy. What's a Swiss table in a couple of sentences? Assume I am _moderately_ competent with C++ but not some Level 800 cyberbrain
procaryote 24 hours ago [-]
Caveat that this is new to me and I might get it wrong

It seems to be a way to arrange a hashtable so you use most of the hashed value (H1) to identify a range of cells in your table, and the rest of the hashed value (H2) as metadata. In the metadata for a cell it's tagged as empty/full/deleted and a full cell is also tagged with its H2 value

So when looking up things you find the range with H1, then you check H2 against the metadata for that range, to see which of the cells might contain the right value.

I imagine the benefit is that checking H2 against warm metadata is cache friendly and can use vector instructions, whereas with regular double hashing you'd need to lookup cold memory to check the key, and then perhaps do it again a couple of times

Do correct me if I am wrong

tialaramex 23 hours ago [-]
It's an open addressed hash table using SIMD to get faster probing? If all that makes sense but you feel unsatisfied I can't help you and you need to ask somebody who knows exactly what you do or don't know.

If those phrases don't make sense, Wikipedia has entries on Open Addressing and on SIMD which can help you.

jbreckmckye 23 hours ago [-]
Thanks, I don't fully understand but I know enough about each idea to Google it further
froh 21 hours ago [-]
especially it doesn't "degrade" when nearly full (text book disadvantage of linear probing), by cleverly rearranging existing items on hash table insert (and also on delete).

so there is a kinda sorta "balancing" of the linear probing lengths.

jrimbault 18 hours ago [-]
Not a couple of sentences

CppCon 2017: Matt Kulukundis “Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step” : https://www.youtube.com/watch?v=ncHmEUmJZf4

n2d4 24 hours ago [-]
A hash table implementation that's faster than the usual
cjj_swe 15 hours ago [-]
I always get excited when I see Abseil related things online. I was a member of the team from 2018 until 2022 and had a blast getting to learn from the best.
DeathArrow 24 hours ago [-]
skunkworker 24 hours ago [-]
For those reading, Go officially adopted swisstable into Map for 1.24 last week.
procaryote 24 hours ago [-]
Hmm, the article is a bit hard to follow and lacks any elaboration on which cases this data structure is better or worse than other hash tables

As swiss tables are new to me I looked around a bit and found this to be a better explanation

https://faultlore.com/blah/hashbrown-tldr/

bsdooby 24 hours ago [-]
...and why is it called a Swiss table? (I am from CH, asking for a friend).
GeneralMayhem 24 hours ago [-]
Because they were designed and implemented by a team in Zurich. I'm not joking, that's literally the reason.
DeathArrow 24 hours ago [-]
I believe that's because they were first crafted in 19th century a small village in Jura Mountains by some skilled artisans.
n2d4 24 hours ago [-]
Some of its authors work(ed) at Google Zurich
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 08:27:37 GMT+0000 (UTC) with Wasmer Edge.