This is innovative and kind of brilliant, I'm impressed!
- Binary message formatting for efficiency and data integrity (float to/from string is messy)
- Familiar printf()-style format strings that the compiler can check[*], which gives type safety!
- No preprocessing and/or code generation from message templates or anything like that.
Really nice!
Edits: markdown and grammar.
[*]: I think this uses a compiler extension to mark the argument as a format string.
lelanthran 154 days ago [-]
> This is innovative and kind of brilliant, I'm impressed!
As someone who is not the author of this blog post, I'm glad you are impressed: I wrote pretty much the same thing around 7 years ago: https://www.lelanthran.com/chap2/content.html
The difference with mine is that I intended for the library to be endian-independent, so usable over a network transmission to a different endian machine.[1]
Great minds ... (everyone knows how the rest of it goes).
[1] I also eschewed using `%` for the format string specifiers, because (in my words at the time):
> Our parameterised functions must take a specification that tells it how each field should be written, very similar to the printf and scanf family of functions. While it is indeed possible to reuse the standard format specifiers as our own field specifiers it might not be a good idea to do so as this would break The Principle of Least Astonishment.1 This is because anyone who is changing the code and who sees a string literal with well-known format specifiers such as %02x and %c would naturally (but incorrectly) assume that all format specifiers are supported. We do not wish to confuse the reader.
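To make the endian-independence point concrete, here is a minimal sketch using Python's standard `struct` module rather than either library discussed here. The field layout (a 32-bit sequence number plus a 16-bit port) is made up for illustration.

```python
import struct

# Hypothetical message: a 32-bit sequence number and a 16-bit port.
seq, port = 0x01020304, 8080

# "!" selects network (big-endian) byte order with fixed, standard sizes,
# so the bytes are the same no matter which host produced them.
wire = struct.pack("!IH", seq, port)
print(wire.hex())  # 010203041f90

# "<" forces little-endian: same values, different byte layout.
little = struct.pack("<IH", seq, port)
print(little.hex())  # 04030201901f

# Decoding with the same explicit byte order recovers the values anywhere.
decoded = struct.unpack("!IH", wire)
```

As long as both ends agree on an explicit byte order, the receiver's native endianness stops mattering.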
thechao 154 days ago [-]
You can use `funopen` (BSD) or `fopencookie` (glibc) to create a binding of a socket to a `FILE*`; in that case you could literally use fprintf and fscanf to send messages...
shawn_w 154 days ago [-]
`fdopen()` is a lot simpler.
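For anyone who wants to try the idea without writing C, Python's `socket.makefile()` plays roughly the same role as `fdopen()` on a socket. This is only an analogue, and it sends text rather than binary, which is exactly what the fprintf/fscanf suggestion would do.

```python
import socket

# A connected pair of local sockets stands in for a real network connection.
left, right = socket.socketpair()

# makefile() wraps each socket in a buffered file object, much like fdopen().
writer = left.makefile("w", encoding="utf-8", newline="\n")
reader = right.makefile("r", encoding="utf-8", newline="\n")

# "fprintf" side: format the message as text and flush it onto the socket.
writer.write("%d %s\n" % (42, "hello"))
writer.flush()

# "fscanf" side: read one line and parse the fields back out.
number_text, word = reader.readline().split()
number = int(number_text)

writer.close()
reader.close()
left.close()
right.close()
```

Note the explicit `flush()`: buffered stream wrappers hold data back until flushed, which is an easy way to deadlock a request/response protocol.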
pjmlp 154 days ago [-]
Not really innovative, this used to be common in C libraries for network programming, back when C's paint was still rather fresh.
Also reminds me of the huge shortcoming that these "simple" serialization libraries have. In most use-cases you're eventually going to need some way to add/remove fields from the payloads.
JSON and protobuf make that easy. You can start emitting the new field and then update the receiving end at your leisure.
With these libraries you have to write your own versioning system. For a small performance improvement over JSON parsing it's pretty much not worth it. There's almost always going to be a piece of lower hanging fruit.
sltkr 154 days ago [-]
This is a valid concern, but it's possible to work around in a similar way that you do with protobufs: never redefine the format of any message type used in production, but instead, introduce a new message type whenever you need to change the format.
For example, imagine you are sending message type 1 with format "%d %d" and later you realize you actually need three instead of two ints. You introduce message type 2 with format "%d %d %d" and update your readers to support both types. Once those are deployed in production, you update your writers to send the new message.
This is kind of the opposite of what happens with protobufs, where unsupported fields are silently dropped by the receiver, so you can update the sender before the receiver. But this is arguably less safe, since if the receiver drops some fields, it might not interpret the message the way the sender intended. In that case, it might be more sane if the receiver outright refuses to process a message it doesn't understand.
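A rough sketch of the "new message type per format change" scheme described above, again with Python's `struct`. The one-byte type tag, the type numbers, and the `FORMATS` table are assumptions for illustration, not anything taken from the library.

```python
import struct

# Hypothetical message types: type 1 carries two ints, type 2 carries three.
FORMATS = {1: "!ii", 2: "!iii"}

def encode(msg_type, *values):
    # One leading byte identifies the message type.
    return struct.pack("!B", msg_type) + struct.pack(FORMATS[msg_type], *values)

def decode(frame):
    msg_type = frame[0]
    fmt = FORMATS.get(msg_type)
    if fmt is None:
        # Refuse messages we do not understand instead of guessing.
        raise ValueError(f"unknown message type {msg_type}")
    return msg_type, struct.unpack(fmt, frame[1:])

# A reader deployed with both entries accepts old and new writers alike.
old = decode(encode(1, 10, 20))
new = decode(encode(2, 10, 20, 30))
```

The rollout order follows from the table: ship readers that know both types first, then switch writers over to type 2.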
To mention a usecase for this library: it is used as part of the software Parrot ships with their drones.
Davidbrcz 154 days ago [-]
And like all of Parrot's code, it's terrible.
Source: I worked on Parrot's code. And you can make up your own mind by looking at the other projects in Parrot's dev group...
0xbadcafebee 154 days ago [-]
Show me code written by a corporation that isn't terrible and I'll show you one developer's pet project
Diederich 154 days ago [-]
Honestly most of the code I saw inside of Facebook was pretty solid, and some of it was exceptional.
ibash 154 days ago [-]
This is a bit sad and disappointing.
Writing good code is not impossible. Or even difficult.
It just requires engineers to hold themselves to a higher standard.
shadowgovt 154 days ago [-]
It requires a lot of things.
For starters, engineers will disagree on what "a higher standard" means.
travisgriggs 154 days ago [-]
And none of the manager types really agree with what the various engineers think either. They'll do tongue-in-cheek buy-in, but usually their feedback loops drive for different standards.
gtirloni 154 days ago [-]
design sessions, code reviews, style guidelines, sanitizers, etc... the list is endless. you know, software engineering and all.
shadowgovt 153 days ago [-]
These are the things engineers disagree on: the structure of these, how much of them to do, what constitutes "good" or "necessary" practice. (I have a friend who is fond of saying that every place he's worked that claims to use scrum engages in "scrum-but" in practice. "We do scrum, buuuut the daily standup is a waste of time." "We do scrum, buuut we don't do sprint planning; we just have people pick up tasks on a common queue as they come up dry." And so on.)
(Indeed, I've worked in industry sub-sectors where some of these best practices are widely understood to be counterproductive. Videogames often do few if any unit tests, preferring instead to rely on an army of human testers reviewing every build over and over because you can't generally capture the je ne sais quoi that is the feeling of "fun" in a test or even a design doc. By the time they've rapid-developed until they have a prototype that feels right, that prototype is the product and it doesn't make sense to add tests at that point because you can just test that one artifact to death by hand and ship it. This does, indeed, damage the reliability of the end product and result in something that often needs a week or two of follow-up patches... But because, for all they complain, gamers don't actually stop buying games day 1 because of the bugs, the industry has learned that leaving them in is an acceptable risk that saves time).
0xbadcafebee 154 days ago [-]
I've been writing code for decades and I only recently got to the point where I'd consider my code good. Either writing good code isn't that easy, or I'm just really bad at it...
solarkraft 154 days ago [-]
“good code” is code that makes money, in the context of corporations.
drdaeman 154 days ago [-]
And that is exactly how we ended up in this modern world of crappy software. By not caring.
gtirloni 154 days ago [-]
in the context of someone who won't be maintaining it, sure.
shadowgovt 154 days ago [-]
Well, that's the thing. The money is in novelty, not maintenance.
If we shifted resources to supporting maintenance over novelty, this could change... But the people with the money want to make more money, and that mostly happens in the "come out with something novel and everyone gives you a buck" space, not the "keep the electrical grid control architecture running" space.
Incentives drive the industry.
immibis 154 days ago [-]
IIRC Cube 2: Sauerbraten uses a similar concept for network, though the format strings aren't similar to printf (just a simple list of sized types) and the types/format aren't transmitted on the wire.
nurettin 154 days ago [-]
> How the knowledge of the format string is shared is out of the scope of this library. It can simply be a shared header with defines.
They don't force you to share the format strings at connection time. In fact, it is just a thin layer around sockets and you just override the message handler. I like it! Might be improved further with something like libfmt which is also used by spdlog.
jbverschoor 154 days ago [-]
How is the string length determined?
How does this handle packed bits and odd-sized numbers, for example a 5-bit number and 3 flags?
edflsafoiewq 154 days ago [-]
Strings are serialized as (type code) (uint16 length) (string contents) (null byte).
In the API, the string length is determined the normal way in each language: strlen() in C, len() in Python. The trailing NUL allows the C decoder to return just a pointer to the string (without allocating).
So internal null bytes and strings longer than 0xFFFF are functionally prohibited.
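Here is a small sketch of that string layout in Python, to show where the two restrictions come from. The type code value, the little-endian length, and whether the length counts the trailing NUL are all guesses on my part, not taken from the library.

```python
import struct

TYPE_STR = 0x09  # hypothetical type code, not taken from the library

def encode_str(text):
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("embedded NUL bytes are not representable")
    # Assumption: the length counts the trailing NUL; the real library may differ.
    length = len(data) + 1
    if length > 0xFFFF:
        raise ValueError("string too long for a uint16 length")
    # Assumption: little-endian uint16 length after a one-byte type code.
    return struct.pack("<BH", TYPE_STR, length) + data + b"\x00"

def decode_str(frame):
    type_code, length = struct.unpack_from("<BH", frame, 0)
    if type_code != TYPE_STR:
        raise ValueError("not a string field")
    body = frame[3:3 + length]
    if len(body) != length or body[-1:] != b"\x00":
        raise ValueError("truncated or unterminated string")
    return body[:-1].decode("utf-8")

frame = encode_str("hi")
print(frame.hex())  # 0903006869​00 without the invisible break: 090300686900
```

A C decoder can hand back a pointer straight into the receive buffer precisely because the terminator is already on the wire.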
rangerelf 154 days ago [-]
I think that would be up to you; just use a union struct with that data and send/receive the corresponding int.
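A sketch of that "pack it yourself and send the int" approach in Python, using shifts and masks instead of a C union. The bit layout (value in the high 5 bits, flags in the low 3) is an arbitrary choice for illustration.

```python
def pack_byte(value, flag_a, flag_b, flag_c):
    if not 0 <= value < 32:
        raise ValueError("value must fit in 5 bits")
    # Layout (an assumption): value in the high 5 bits, flags in the low 3.
    return (value << 3) | (int(flag_a) << 2) | (int(flag_b) << 1) | int(flag_c)

def unpack_byte(byte):
    # Reverse the layout: shift out the flags, then test each flag bit.
    return byte >> 3, bool(byte & 4), bool(byte & 2), bool(byte & 1)

packed = pack_byte(19, True, False, True)
print(packed)               # 157
print(unpack_byte(packed))  # (19, True, False, True)
```

The resulting integer can then go through whatever single-byte integer field the serializer offers. Explicit shifts also avoid C bitfield layout, which is implementation-defined.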
tonyg 154 days ago [-]
Feels similar to the dbus serialisation format.
inetknght 154 days ago [-]
I've used libpomp to interact with Parrot's Anafi drones.
This library takes the idea of modern type safety and throws it away. Instead, the library leans on `printf()`, which is known to be unsafe. And it does it in a memory-unsafe language.
...on a drone. Where safety needs to be important (even a small drone can do significant damage).
It's neat and all. But that's a killer anti-feature in my opinion. I wouldn't use it on my drones.
khimaros 154 days ago [-]
i wonder if there is anything equivalent to this for rust?
touringmachine 154 days ago [-]
This is very clever. Furthermore, lmao.
lukevp 154 days ago [-]
I don’t really get the advantage of this over json (for human readability) or protobuf (for type safety and schemas at the parsing layer). It seems like it mostly has disadvantages and would be difficult to implement efficiently because there are no consistent characters to tokenize.
jitl 154 days ago [-]
It’s binary and there’s no code generating step like with Protobuf. Protobuf is hugely complicated, I can see using this for IPC between processes I own. I’m not sure what you mean about tokenization, but printf format strings are pretty well understood.
klodolph 154 days ago [-]
I read the page and missed the part where it’s binary. I had to read it a second time to find that information.
The reason why I missed it the first time is because this information is introduced in the second half of a paragraph, where the first half is about a different topic. This is a bad way to divide paragraphs.
MobiusHorizons 154 days ago [-]
The paragraph:
> The encoding/decoding is done with printf/scanf like functions with a format string and a variable number of arguments. However, no actual string formatting is done, the payload is a binary representation of arguments.
Seems to encapsulate the complexity of having a string formatting interface to a binary protocol reasonably clearly, but is not crafted to withstand speed reading or skipping over sentences.
packetlost 154 days ago [-]
It's far, far simpler than Protobuf, more efficient than JSON, and uses existing infrastructure that many people will be at least loosely familiar with. I think it's a pretty good idea!
shadowgovt 154 days ago [-]
Key things it lacks that protobuf offers:
- bindings to other languages: I'm seeing C/C++ and Python supported. I could, hypothetically, use anything with a C-native interface to wrap that C library and bind to other languages, but that's work I'll have to do. And that binding is going to be as easy-to-use as a wrapper around a C-native library.
- versioning (or some other way to deal with drift between serialization and deserialization). In practice, protocol buffers are often used to do both long-term storage serialization and short-term on-wire transmission. Both use cases always eventually involve two services that cannot be kept in lockstep sync on releases (either because you have to read older files or because you needed to release your backend before your frontend, do a rolling release, whatever).
This is simpler than protobuf, which is cool (and I'd recommend looking over its implementation to figure out how to write something like this), but if you put it into implementation in a non-toy distributed-system project it will not scale. It may have other applications though; I can definitely see it as the backbone for a simple videogame messaging protocol.
Still, no harm in writing a thing and putting it out there; I'm happy it exists even if I won't use it.
packetlost 154 days ago [-]
> but if you put it into implementation in a non-toy distributed-system project it will not scale
Plenty of distributed systems don't have the requirements you outline or need to scale. Distributed systems are not necessarily large, don't necessarily cross organizational lines, and don't always require rolling releases. Sometimes the tradeoff in complexity from, for example, introducing a dependency on protoc et al. is not worth it. I've built several systems that this would've been a perfectly fine choice for.
I'd also point out that because it's so simple, adding implementations in new languages should be fairly straightforward, but I think this really shines in C codebases where complexity must be mostly dealt with by hand (for better or worse).
0xbadcafebee 154 days ago [-]
It's intended to be a barebones IPC between 2 processes. It's not for Google to use for 100-gigabit control planes, or for web development.
(But fwiw, if you did want to inspect this traffic as human-readable, it would be insanely easy to create either a packet sniffer for this or add support to tcpdump/Wireshark. It's just dumping a printf format code and some raw data onto the wire, so you just pick up a new frame, read the format code, snip the data from the frame, pass it to printf() with the format code and the data. It's like 12 lines of code, no library needed other than libc. As a hacker I find this way more exciting than using a bloated standard to pass utf-8 strings.)
dhagrow 154 days ago [-]
Seems like a better comparison would be any schemaless binary format, like msgpack, bson, or the many others. This adds some basic type validation, which you would have to do with a separate library, like msgspec if you're using Python.
lanstin 154 days ago [-]
The format (%ii) semantics aren't platform independent; that seems like an obvious error.
And code gen and protocol descriptions bring a certain useful discipline to modifying existing protocols.
But it has that funky C style so I bet people using C like it.
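To illustrate the general worry about specifiers tied to C types, here is how Python's `struct` module separates platform-dependent sizes from fixed ones. I can't tell from the thread how the library itself maps its specifiers to wire sizes, so this shows the hazard and not its actual behaviour.

```python
import struct

# Native mode ("@", the default) uses the host C compiler's sizes and alignment,
# so the result depends on the platform (a C long is 4 or 8 bytes).
native_long = struct.calcsize("@l")

# Standard mode ("<", ">", "!" or "=") fixes the size: "l" is always 4 bytes.
standard_long = struct.calcsize("<l")

# Native alignment can also insert padding between fields.
native_pair = struct.calcsize("@bi")    # typically 8: 1 byte + 3 padding + 4
standard_pair = struct.calcsize("<bi")  # always 5: no padding

print(native_long, standard_long, native_pair, standard_pair)
```

A wire format is only portable if each specifier maps to one fixed width and byte order, whatever the C types on either end happen to be.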
https://perldoc.perl.org/perlpacktut
https://docs.python.org/3/library/struct.html