I see various comments below along the lines of “oh, the article is missing so and so”. OK… then please see the other articles in this series! I think they cover most of what you are mentioning :-)
The first was on EMS, XMS, HMA and the like: https://blogsystem5.substack.com/p/from-0-to-1-mb-in-dos
The second was on unreal mode: https://blogsystem5.substack.com/p/beyond-the-1-mb-barrier-i...
The third was on DJGPP: https://blogsystem5.substack.com/p/running-gnu-on-dos-with-d...
And the last, which follows this one, is on 64-bit memory models: https://blogsystem5.substack.com/p/x86-64-programming-models
Some of these were previously discussed here too, but composing this on mobile and finding links is rather painful… so excuse me for not providing those links now.
WalterBright 49 minutes ago [-]
The Zortech C/C++ compiler had another memory model: handle pointers. When dereferencing a handle pointer, the compiler emitted code that would swap in the necessary page from expanded memory, extended memory, or disk.
It works like a virtual memory system, except that the compiler emitted the necessary code rather than the CPU doing it in microcode.
https://www.digitalmars.com/ctg/handle-pointers.html
Similarly, Zortech C++ had the "VCM" memory model, which worked like virtual memory: your code pages would be swapped in and out of memory as needed.
https://digitalmars.com/ctg/vcm.html
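A purely illustrative C-flavored sketch of the handle-pointer idea (the struct layout, swap_in_page() and all other names here are invented, not Zortech's actual scheme; see the links above for the real thing): the compiler turns each dereference into code that makes the backing page resident, then reads through an ordinary far pointer.

    /* Hypothetical sketch only; DOS-era C with the far keyword. */
    typedef struct {
        unsigned page;           /* which swappable page the object lives in */
        unsigned offset;         /* offset of the object within that page    */
    } __handle;

    /* Assumed runtime helper: ensures the page is present in conventional
       memory, fetching it from EMS, XMS or disk if needed, and returns the
       far address of the in-memory copy. */
    extern void far *swap_in_page(unsigned page);

    int read_int_through_handle(__handle h) {
        char far *base = swap_in_page(h.page);   /* code the compiler emits  */
        return *(int far *)(base + h.offset);    /* before every dereference */
    }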
That is sort of like inlining the demand paging code from the OS. When we have exokernels, they exist as a library, so paging can be dealt with like regular code.
This would be trivial (and fun) to implement with Wasm.
Aardwolf 3 hours ago [-]
Many things in computing are elegant and beautiful, but this is not one of them imho (the overlapping segments, the multiple pointer types, the usage of 32 bits to only access 1MB, 'medium' having less data than 'compact', ...)
akira2501 58 minutes ago [-]
> but this is not one
It really is though. Memory and thus data _and_ instruction encoding were incredibly important. Physical wires on the circuit board were at a premium then as well. It was an incredibly popular platform because it was highly capable while being stupidly cheap compared to other setups.
Engineering is all about tradeoffs. "Purity" almost never makes it on the whiteboard.
Joker_vD 2 hours ago [-]
Yeah, good thing that e.g. RV64 has RIP-relative addressing mode that can address anywhere in the whole 56-bits of available space with no problems, unlike the silly 8086 that resorted to using a base register to overcome the short size of its immediate fields.
akira2501 57 minutes ago [-]
...and then x86_64 went ahead and added RIP relative addressing back in, and you get the full 64 bits of address space.
Joker_vD 31 minutes ago [-]
...you know that that's not true, neither for x64 nor RV64, and my comment was sarcastic, right? Both can only straightforwardly address ±2 GiB from the instruction pointer; beyond that, it's the "large code model" all over again, with the same inelegant workarounds that have been rediscovered since the late sixties or so. GOT and PLT versus pools of absolute 64-bit addresses: pick the least worst one.
akira2501 11 minutes ago [-]
> and my comment was sarcastic, right?
Pardon me for not realizing and treating it appropriately.
> with the same inelegant workarounds that's been rediscovered since the late sixties or so
Short of creating instructions that take 64-bit immediate operands, you're always going to pay the same price: an indirection. It will look different on different architectures, because each one implements it most efficiently in its own way.
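For what it's worth, a small C illustration of that price; the -mcmodel flag is GCC/Clang's real option on x86-64, but the code-generation notes in the comments describe the typical pattern rather than a guarantee.

    /* The same access costs more once data can live beyond +/-2 GiB. */
    extern long counter;        /* may be linked farther than 2 GiB away */

    long read_counter(void) {
        /* -mcmodel=small: a single RIP-relative load (reach is +/-2 GiB).   */
        /* -mcmodel=large: the full 64-bit address is materialized into a
           register first (or pulled from a GOT slot when position-independent)
           and the value is then loaded through it: that extra step is the
           indirection mentioned above.                                       */
        return counter;
    }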
> GOT and PLT versus pools of absolute 64-bit addresses, pick the least worst one.
Or statically define all those addresses within your binary. Does that seem more "elegant" to you? You'll have the same problem, but your loader will now be turned inside out, or you'll lose all the features the loader can provide for you.
At that point just statically link all your dependencies and call it an early day.
skissane 48 minutes ago [-]
I think it is a pity Intel went with 16-byte paragraphs instead of 256-byte paragraphs for the 8086.
With 16-byte paragraphs, a 16-bit segment and 16-bit offset can only address 1MiB (ignoring the HMA you can get on 80286+).
With 256-byte paragraphs, the 8086 would have been able to address 16MiB in real mode (again not counting the HMA, which would have been a bit smaller: 65,280 bytes instead of 65,520 bytes).
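A quick back-of-the-envelope check of those numbers, assuming the usual real-mode formula of segment × paragraph size + offset:

    #include <stdio.h>

    int main(void) {
        /* Highest address reachable with segment = offset = 0xFFFF. */
        unsigned long max16  = 0xFFFFUL * 16  + 0xFFFF;  /* 0x10FFEF : 1 MiB  + 65,520 HMA bytes */
        unsigned long max256 = 0xFFFFUL * 256 + 0xFFFF;  /* 0x100FEFF: 16 MiB + 65,280 HMA bytes */

        printf("16-byte paragraphs : top address %#lx\n", max16);
        printf("256-byte paragraphs: top address %#lx\n", max256);
        return 0;
    }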
nox101 3 hours ago [-]
I feel like this is missing EMS and XMS memory. Both were well-supported ways of getting more than 640k. EMS worked by bank switching: one or two 64k segments of memory would be changed to point to different 64k banks on an add-on memory card. XMS just did a copy instead of a bank switch, IIRC. It's been a long time, but I wrote DOS apps that used both standards to support more than 640k of memory.
https://en.wikipedia.org/wiki/Expanded_memory
https://en.wikipedia.org/wiki/Extended_memory
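A hedged sketch of the EMS calls being described, using the int86() helper from the old Borland/Microsoft <dos.h>; the INT 67h function numbers are the LIM EMS ones, but treat the details as illustrative (real-mode DOS only, error checking omitted).

    #include <dos.h>
    #include <stdio.h>

    int main(void) {
        union REGS r;
        unsigned frame_seg, handle;

        r.h.ah = 0x41;                       /* get the 64k page-frame segment */
        int86(0x67, &r, &r);
        frame_seg = r.x.bx;

        r.h.ah = 0x43;                       /* allocate 4 logical 16k pages   */
        r.x.bx = 4;
        int86(0x67, &r, &r);
        handle = r.x.dx;

        r.h.ah = 0x44;                       /* map logical page 0 into        */
        r.h.al = 0;                          /* physical page 0 of the frame   */
        r.x.bx = 0;
        r.x.dx = handle;
        int86(0x67, &r, &r);

        /* The bank-switched memory is now visible at frame_seg:0000. */
        printf("EMS page frame at segment %04X\n", frame_seg);
        return 0;
    }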
As someone who was already coding during those days, having done the transition from a Timex 2068 to MS-DOS 3.3 and wonderful 5¼-inch floppies, the article is quite good.
One thing missing is overlays, which gave us a primitive form of dynamic loading: multiple code segments shared the same memory region, and naturally only one could be active at a time.
PaulHoule 3 days ago [-]
Today Java has pointer compression, where you use a 32-bit reference but shift it a few places to the left to make a 64-bit address. This saves space on pointers but wastes it on alignment.
xxs 2 hours ago [-]
All allocated objects have their three least significant bits set to 0. No Java object can be 'too small', as they all have object headers (more if you need a fully blown synchronized/mutex). So with compressed pointers (up to 32GB heaps) all objects are aligned, but then again, each pointer is only 4 bytes (instead of 8). Overall it's a massive win.
o11c 3 hours ago [-]
It's not wasted on alignment, since that alignment is already required (unless you need a very large heap). Remember that Java's GC heap is only used to allocate Objects, not raw bytes. There are ways to allocate memory outside of the heap and if you're dealing with that much raw data you should probably be using them.
geon 3 hours ago [-]
Is this only relevant to real mode, or is it still in use in protected mode and/or x64?
Dwedit 2 hours ago [-]
On 32-bit Windows, segmentation registers still exist, but they are almost always set to zero. CS (code segment), DS (data segment), ES (extra segment), and SS (stack segment) are all set to zero. But FS and GS are used for other purposes.
For a 32-bit program, FS is used to point to the Thread Information Block (TIB). GS has been used to point to thread-local storage since after Windows XP; programs using GS for thread-local storage won't work on earlier versions of Windows (they'll just crash on the first access).
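For the curious, a small sketch of that FS usage; __readfsdword() and NtCurrentTeb() are real Windows/MSVC APIs and offset 0x18 is the TIB's self-pointer, but this only builds as a 32-bit x86 MSVC program and is meant as an illustration, not production code.

    #include <windows.h>
    #include <intrin.h>
    #include <stdio.h>

    int main(void) {
        /* On 32-bit Windows, FS:[0x18] holds the linear address of the TIB/TEB. */
        void *tib = (void *)(ULONG_PTR)__readfsdword(0x18);
        printf("TIB via FS:[0x18] = %p, NtCurrentTeb() = %p\n",
               tib, (void *)NtCurrentTeb());
        return 0;
    }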
X64 made it even more formal that CS, DS, SS and ES are fixed at zero. 32-bit programs running on a 64-bit OS can't reassign them anymore, but basically no programs actually try to do that anyway.
---
As for shorter types of pointers still being in use: basically, shorter pointers are only used for things relative to the program counter (EIP), such as short jumps. With 32-bit protected mode code, you can use 32-bit pointers and not worry about 64K-sized segments at all.
---
Meanwhile, some x64 programs did adopt a convention to use shorter pointers, 32-bit pointers on a 64-bit operating system. This convention is called x32, but almost nobody adopted it.
xxs 2 hours ago [-]
>some x64 programs did adopt a convention to use shorter pointers, 32-bit pointers on a 64-bit operating system.
It's doable in managed languages, e.g. Java has compressed pointers by default on sub-32GB heaps. I suppose it's doable even in a C-like setup (incl. OS calls), but that would require wrappers to bit-shift the pointers on each dereference (and when passing them to the OS or other extern code).
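A minimal sketch of what such wrappers could look like in C (names like cref_t and heap_base are invented for illustration): with 8-byte alignment, a 32-bit reference shifted left by 3 covers 2^32 × 8 = 32 GiB of a single arena.

    #include <stdint.h>

    static uint8_t *heap_base;           /* start of a single <=32 GiB arena */
    typedef uint32_t cref_t;             /* 32-bit "compressed" reference    */

    /* decode: shift by 3 (8-byte alignment) and rebase onto the arena */
    static inline void *cref_to_ptr(cref_t r) {
        return heap_base + ((uint64_t)r << 3);
    }

    /* encode: the pointer must lie inside the arena and be 8-byte aligned */
    static inline cref_t ptr_to_cref(const void *p) {
        return (cref_t)(((const uint8_t *)p - heap_base) >> 3);
    }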
o11c 3 hours ago [-]
It's worth noting that all the memory models have DS=SS, which makes sense for C (where you often take the address of a local variable - though nothing is stopping you from having a separate "data stack" for those) but is a silly restriction for some other languages.
I'm sure someone took advantage of this, but my knowledge is purely theoretical.
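A tiny example of why C compilers in the near-data models want DS=SS: a plain int * carries no segment of its own, so an address taken on the stack (SS) has to be usable as if it were DS-relative.

    /* fill() receives a near pointer, which the compiler treats as DS-relative. */
    static void fill(int *p, int n) {
        while (n--) *p++ = 0;
    }

    void caller(void) {
        int buf[8];
        fill(buf, 8);    /* buf lives on the stack (SS); passing its address as a
                            near pointer is only correct because DS == SS        */
    }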
xxs 2 hours ago [-]
I never had SS=DS in assembly; I used that setup for TSRs, for example.
brudgers 2 days ago [-]
"DOS Memory Models" brought "QEMM" immediately to mind.