Does someone know why these are not reentrant without the GNU extension? It seems like a really weird choice to make a library for hash tables and then make the state global like that.
layer8 154 days ago [-]
You can look them up here in System V Interface Definition Issue 2 from 1986: http://www.bitsavers.org/pdf/att/unix/SVID/System_V_Interfac.... It has a note “Future Directions: The restriction on only having one hash search table active at any given time will be removed.” Which of course couldn’t be done without breaking compatibility.
Most programs at the time probably didn’t need more than one hash table, and it made for a simpler interface. (A program could in principle also work around the limitation by storing multiple values per hash table entry.) Most Unix libraries worked with global state at the time, since multi-threading also didn’t exist yet. I remember running into the limitation of only being able to have one lexer/parser grammar (via lex/yacc) per program.
loeg 154 days ago [-]
They're really, really old. Dating to System V Unix. At the time it was pretty common to have stateful libc functions.
kelnos 154 days ago [-]
Back when these functions were created, it was normal and expected for libc to store global state for a variety of things. Reentrancy wasn't really an issue that was considered then.
(Though this particular API has an even weirder limitation: you can only create a single hash table at a time in your entire program!)
kragen 154 days ago [-]
probably they were added in 2.9bsd or something when programs couldn't be more than about 5000 lines of code before you ran out of the 16-bit address space
> The hcreate(), hdestroy(), and hsearch() functions first appeared in AT&T System V UNIX.
which... does seem a bit late for thinking it was a good idea to only support one hash table per process? that was in 01983, 6 years after the vax was introduced and 4 years after the 68000. i mean awk supported arbitrarily many hash tables in a single process in 7th edition unix, even before it had subroutines
Someone 154 days ago [-]
> i mean awk supported arbitrarily many hash tables in a single process
This API supports that, too, as long as you don’t call one of the functions while another call is in progress. You get that for free if your process is single threaded.
SunOS 4.x implemented light-weight processes or LWPs. NetBSD 2.x+, and DragonFly BSD implement LWPs as kernel threads (1:1 model)”
According to Wikipedia, SunOS 4.0 is from 1988, NetBSD 2.0 from 2004, so I guess chances are good that AT&T System V UNIX didn’t support threads in user space programs.
kragen 154 days ago [-]
you seem to be discussing what does or doesn't support multiple threads, but what we were talking about is what does or doesn't support multiple simultaneously existing hash tables, which hcreate() doesn't. i've edited my comment upthread to clarify
(it's also true, as kimixa points out, that a single-threaded process with signal handlers isn't really single-threaded, but i don't think that's really the important consideration in this context)
for what it's worth, it's not very difficult to use alarm() to invoke a context-switching subroutine to get preemptive user-space multithreading on unix without any kernel support. in theory you have to write the context-switching subroutine in assembly language, which isn't difficult at all; here's one i wrote in arm assembly for the arm eabi, for cooperative multithreading:
.thumb_func
.globl einyield
einyield:
push {r0, r4-r11, lr} @ save all callee-saved regs plus r0
ldr r0, =current_task_pointer
ldr r1, [r0]
str sp, [r1] @ save stack pointer in current eintask
ldr r1, [r1, #4] @ load pointer to next eintask
str r1, [r0]
ldr sp, [r1] @ switch to next eintask's stack
pop {r0, r4-r11, pc} @ return into einyielded context there
but in practice you can usually just use longjmp()
preemptive multithreading requires you to save all the registers, not just the callee-saved registers, but so does signal handling, so you can usually just use the signal-handling machinery and use the above code (or its equivalent on your vax or 68000 or whatever) to context-switch between signal-handler stack frames
kimixa 154 days ago [-]
Or a signal handler? Even if there's a single kernel thread, if it can be interrupted while inside one of these functions you can hit the same sort of problem.
Reentrancy was an issue on consumer devices well before multithreading was commonly supported there, after all
rdtsc 152 days ago [-]
Good point. There are multiple levels of "safety". I think glibc API does a decent job marking and indicating which is which:
this isn't about safety, though; hcreate() and hsearch() don't support having two hash tables in the same process at the same time, not even in an unsafe way. that also makes them non-reentrant, but that isn't the main problem!
p_l 154 days ago [-]
Adding threading support is why there was a bunch of interest in using Mach microkernel in Unix systems.
clbrmbr 152 days ago [-]
Don’t key prefixes take care of the need for an arbitrary number of tables?
tedunangst 154 days ago [-]
Probably started as a function in one program. Somebody copied it into a second program. Somebody wanted it in a third program and copied it into a library.
metadat 154 days ago [-]
Is there a command-line interface for this, or is it strictly library-only?
Would be cool if there were a way to point to a memmapped region under /dev/shm or equivalent.
sph 154 days ago [-]
Type `man redis` or `man sqlite3` for memmapped hash tables with a lot of bells and whistles.
I can't see what would you do with a command line hash table CLI. That's basically a poor man's database.
metadat 154 days ago [-]
Redis is way too much, but you've got a really good point with SQLite. Thanks!
154 days ago [-]
Neywiny 154 days ago [-]
I usually turn to C++ when I need such things. Hopefully I remember this exists next time.
nextaccountic 154 days ago [-]
Don't use this, it's worse than useless
self_awareness 154 days ago [-]
I did a quick search on GitHub for today's code that uses it, and
actually found out that people do use it today, but generally I agree,
this is a legacy API, legacy implementation, and the best thing we can
do for it is to forget about it.
The `hcreate(3)` API is interesting only because it has a very limited
use, yet it still exists in libc. So it's kind of interesting in a
circus way; we want to see it, but then we want to get out of it, and we
want to continue our rational lives without clowns and bearded ladies.
There are probably many thirdparty hashtables for C which work much
better than this.
152 days ago [-]
Rendered at 01:06:00 GMT+0000 (UTC) with Wasmer Edge.
But I prefer openbsd or man7 pages, sure.
Debian: https://manpages.debian.org/
Ubuntu: https://manpages.ubuntu.com/
FreeBSD: https://man.freebsd.org/
OpenBSD: https://man.openbsd.org/
https://man.archlinux.org/
Useful if you need to see the manpage of a very recent version of a package.
This page there: https://man7.org/linux/man-pages/man3/hsearch.3.html
https://www.mankier.com/1/qemu
https://man.freebsd.org/cgi/man.cgi?query=hcreate&manpath=Fr...
Most programs at the time probably didn’t need more than one hash table, and it made for a simpler interface. (A program could in principle also work around the limitation by storing multiple values per hash table entry.) Most Unix libraries worked with global state at the time, since multi-threading also didn’t exist yet. I remember running into the limitation of only being able to have one lexer/parser grammar (via lex/yacc) per program.
(Though this particular API has an even weirder limitation: you can only create a single hash table at a time in your entire program!)
hmm, no, it doesn't seem to be in https://www.tuhs.org/Archive/Distributions/UCB/2.9BSD/ or even in https://www.tuhs.org/Archive/Distributions/UCB/4.2BSD/ or https://www.tuhs.org/Archive/Distributions/UCB/4.3BSD/. and https://man.freebsd.org/cgi/man.cgi?query=hsearch&apropos=0&... says explicitly:
> The hcreate(), hdestroy(), and hsearch() functions first appeared in AT&T System V UNIX.
which... does seem a bit late for thinking it was a good idea to only support one hash table per process? that was in 01983, 6 years after the vax was introduced and 4 years after the 68000. i mean awk supported arbitrarily many hash tables in a single process in 7th edition unix, even before it had subroutines
This API supports that, too, as long as you don’t call one of the functions while another call is in progress. You get that for free if your process is single threaded.
https://en.wikipedia.org/wiki/Thread_(computing)#Threading_m... says
“History of threading models in Unix systems
SunOS 4.x implemented light-weight processes or LWPs. NetBSD 2.x+, and DragonFly BSD implement LWPs as kernel threads (1:1 model)”
According to Wikipedia, SunOS 4.0 is from 1988, NetBSD 2.0 from 2004, so I guess chances are good that AT&T System V UNIX didn’t support threads in user space programs.
(it's also true, as kimixa points out, that a single-threaded process with signal handlers isn't really single-threaded, but i don't think that's really the important consideration in this context)
for what it's worth, it's not very difficult to use alarm() to invoke a context-switching subroutine to get preemptive user-space multithreading on unix without any kernel support. in theory you have to write the context-switching subroutine in assembly language, which isn't difficult at all; here's one i wrote in arm assembly for the arm eabi, for cooperative multithreading:
(from http://canonical.org/~kragen/sw/dev3/einkornix.S)but in practice you can usually just use longjmp()
preemptive multithreading requires you to save all the registers, not just the callee-saved registers, but so does signal handling, so you can usually just use the signal-handling machinery and use the above code (or its equivalent on your vax or 68000 or whatever) to context-switch between signal-handler stack frames
Reentrancy was an issue on consumer devices well before multithreading was commonly supported there, after all
https://sourceware.org/glibc/manual/2.40/html_node/POSIX-Saf...
There is a host of other "safety" remarks also useful to keep in mind: https://sourceware.org/glibc/manual/2.40/html_node/Other-Saf...Would be cool if there were a way to point to a memmapped region under /dev/shm or equivalent.
I can't see what would you do with a command line hash table CLI. That's basically a poor man's database.
The `hcreate(3)` API is interesting only because it has a very limited use, yet it still exists in libc. So it's kind of interesting in a circus way; we want to see it, but then we want to get out of it, and we want to continue our rational lives without clowns and bearded ladies.
There are probably many thirdparty hashtables for C which work much better than this.