Even ancient[1] JavaScript is a safer language for programming with integers[2] than modern[3] C

2022-04-01 (some minor fixes: 2022-04-02)

I hope you enjoy this whirlwind tour. Which language do you suppose is my favourite?[4]

Integer sizes

C offers many integer types (char, short, int, long, long long) and guarantees a minimum size for each type, but not much else: you're at the mercy of the platform you're targeting. Bytes are at least 8 bits. long might be larger than int, which might be larger than short. What's the range of a signed integer? It might be [-2⁽ⁿ⁻¹⁾, 2⁽ⁿ⁻¹⁾-1], or it might be [-2⁽ⁿ⁻¹⁾+1, 2⁽ⁿ⁻¹⁾-1]. It is difficult to handle all the possibilities!

JavaScript has exactly one integer size supported by its integer operations: 32 bits. The range is always [-2³¹, 2³¹-1]. That's not a lot of choice, but it is very predictable. Predictability is safety.
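
To make that range concrete, here is a minimal sketch of a membership test for it. The helper name isInt32 is hypothetical (it is not a built-in); it relies only on the fact that | 0 coerces its operand to a 32-bit signed integer.

```javascript
// Hypothetical helper: true only for numbers that are already 32-bit
// signed integers, i.e. values that `| 0` leaves unchanged.
function isInt32(x) {
  return typeof x === "number" && x === (x | 0);
}

console.log(isInt32(0x7FFFFFFF));   // true  (2^31 - 1, the maximum)
console.log(isInt32(-0x80000000));  // true  (-2^31, the minimum)
console.log(isInt32(0x80000000));   // false (2^31 is one past the maximum)
console.log(isInt32(1.5));          // false (not an integer)
console.log(isInt32(NaN));          // false (NaN | 0 is 0, and NaN !== 0)
```

The same answer comes back on every conforming engine, which is the whole point.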

Arithmetic overflow

Let's do some arithmetic on integers!

In JavaScript, a + b with integer inputs could overflow and produce a value that's no longer an integer.[5] The resulting value is well-defined and guaranteed by the standard, so it's not the end of the world. But if you want to detect this overflow and handle it, it's easy: if ((a + b) > 0x7FFFFFFF || (a + b) < -0x80000000). If you would prefer to have a two's-complement signed integer wraparound, that's also easy: (a + b) | 0. If you would like an unsigned integer wraparound, add 0x100000000 to that and then take the result modulo 0x100000000: (((a + b) | 0) + 0x100000000) % 0x100000000.
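
The three options for addition can be sketched as follows. The function names are hypothetical, and the inputs are assumed to already be 32-bit integers, so the sum is always computed exactly.

```javascript
var INT32_MAX = 0x7FFFFFFF;
var INT32_MIN = -0x80000000;

// Detect overflow: the sum of two int32 values is exact as a Number,
// so it can simply be compared against the limits.
function addChecked(a, b) {
  var sum = a + b;
  if (sum > INT32_MAX || sum < INT32_MIN) {
    throw new RangeError("int32 overflow");
  }
  return sum;
}

// Two's-complement signed wraparound.
function addWrapSigned(a, b) {
  return (a + b) | 0;
}

// Unsigned wraparound: shift the signed result into [0, 2^32).
function addWrapUnsigned(a, b) {
  return (((a + b) | 0) + 0x100000000) % 0x100000000;
}

console.log(addWrapSigned(INT32_MAX, 1));    // -2147483648
console.log(addWrapUnsigned(INT32_MAX, 1));  // 2147483648
console.log(addWrapUnsigned(-1, 0));         // 4294967295
```

For these inputs the unsigned variant gives the same results as (a + b) >>> 0, which is a shorter way to spell it.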

In C, a + b with integer inputs could overflow. If the types of a and b are unsigned after the usual arithmetic conversions, then you'll get an unsigned integer wraparound and the resulting value is guaranteed by the standard — though what exactly it is depends on the size and corresponding range of the integer type, which, as we've established, depends on the platform!

But that's just the nice case. If the types of a and b are signed after the usual arithmetic conversions, then if the result would overflow, you've triggered undefined behaviour. Not only is the resulting value then undefined, but so is the behaviour of every aspect of the entire program! Whether the result would overflow depends, of course, on the size and range of the type, which depends on the platform. Oh, and by the way, the usual arithmetic conversions can turn unsigned integers into signed ones, depending on (you guessed it) the platform! Now, of course, you can try to detect this overflow and handle it, but be careful: your signed overflow check must not itself contain undefined behaviour, which can be non-trivial. Alternatively, if you want a guaranteed wraparound: sorry, there is no standard facility for this. Cast to unsigned and pray the system is two's-complement.
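
For illustration, here is the shape such a check usually takes: it never computes the sum, only differences that cannot leave the range. This sketch is written in JavaScript to keep the examples in one language; the name addWouldOverflow and the 32-bit limits are assumptions, and in C the limits would come from the platform's INT_MAX and INT_MIN.

```javascript
// Assumed limits for a 32-bit signed type.
var INT_MAX = 0x7FFFFFFF;
var INT_MIN = -0x80000000;

// Decide whether a + b would leave [INT_MIN, INT_MAX] without ever
// evaluating a + b. Each subtraction below stays inside the range:
// INT_MAX - b with b > 0, and INT_MIN - b with b < 0.
function addWouldOverflow(a, b) {
  if (b > 0) return a > INT_MAX - b;
  if (b < 0) return a < INT_MIN - b;
  return false;
}

console.log(addWouldOverflow(INT_MAX, 1));   // true
console.log(addWouldOverflow(INT_MIN, -1));  // true
console.log(addWouldOverflow(INT_MAX, -1));  // false
```

Compare this with the JavaScript check, which can just perform the addition and look at the result.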

All of this gets worse for C once you use operations more complicated than addition and subtraction, by the way. In particular, signed multiplication overflow is far from trivial to test for. In fairness to C, though, 1999 JavaScript didn't make it easy either — but that was fixed a decade ago.
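
To show why multiplication was awkward in old JavaScript: the product of two 32-bit integers can exceed what a Number holds exactly, so (a * b) | 0 can give the wrong low bits. A sketch of a workaround that only needs 1999-era operators follows; the name mulWrapSigned is hypothetical. I am assuming the later fix referred to is Math.imul, which is used here only for comparison.

```javascript
// Wrapping 32-bit signed multiplication using 16-bit halves, so that
// every intermediate product stays small enough to be exact.
function mulWrapSigned(a, b) {
  var ah = (a >>> 16) & 0xFFFF, al = a & 0xFFFF;
  var bh = (b >>> 16) & 0xFFFF, bl = b & 0xFFFF;
  // ah * bh only affects bits 32 and above, so it is dropped.
  return (al * bl + (((ah * bl + al * bh) << 16) >>> 0)) | 0;
}

var m = 0x7FFFFFFF;
console.log((m * m) | 0);          // 0 (wrong: the exact product was rounded)
console.log(mulWrapSigned(m, m));  // 1
console.log(Math.imul(m, m));      // 1
```

Detecting overflow is the easy half even in old JavaScript: comparing a * b against the 32-bit limits works, because rounding only occurs for products far outside that range.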

Bitwise shifts

There's a similar story here. a << b is always well-defined in JavaScript, no matter the values of a and b. In C, b must be in range for the size of a after the integer promotions (which depend on the platform!), and you mustn't try to touch the sign bit. Like the usual arithmetic conversions, the integer promotions can turn an unsigned integer into a signed one and cause unexpected undefined behaviour. At least these conditions are relatively easy to check for dynamically if you're sure you know the types involved…
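
A few cases that C leaves undefined for a 32-bit int, with the results JavaScript defines for them. The wrapper shl is a hypothetical name, used only to make the cases easy to read.

```javascript
// JavaScript coerces the left operand to int32 and uses only the low
// five bits of the shift count, so every combination has a defined result.
function shl(a, b) {
  return a << b;
}

console.log(shl(1, 31));   // -2147483648 (shifting into the sign bit)
console.log(shl(1, 32));   // 1           (count 32 is masked to 0)
console.log(shl(1, -1));   // -2147483648 (count -1 is masked to 31)
console.log(shl(-1, 4));   // -16         (negative left operand)
console.log(-8 >> 1);      // -4          (sign-propagating right shift)
console.log(-1 >>> 0);     // 4294967295  (zero-fill shift, unsigned result)
```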

Division

Division or modulo by zero is undefined in C. In JavaScript, it's just NaN (or ±Infinity, when a non-zero value is divided by zero)!
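
A sketch of integer division and remainder built on the floating-point operators, with hypothetical helper names. The | 0 truncates towards zero and also turns NaN and ±Infinity into 0, so a zero divisor gives a defined (if arbitrary) result.

```javascript
// Truncating integer division for int32 inputs.
function divInt(a, b) {
  return (a / b) | 0;
}

// Remainder with the sign of the dividend.
function modInt(a, b) {
  return (a % b) | 0;
}

console.log(divInt(7, 2));             // 3
console.log(divInt(-7, 2));            // -3
console.log(modInt(-7, 2));            // -1
console.log(divInt(1, 0));             // 0 (Infinity | 0)
console.log(divInt(0, 0));             // 0 (NaN | 0)
console.log(modInt(5, 0));             // 0 (NaN | 0)
console.log(divInt(-0x80000000, -1));  // -2147483648 (wraps)
```

The last case, the minimum value divided by -1, is also undefined in C when the quotient does not fit in the type. If a silent 0 is not what you want, test b === 0 before dividing.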

Array indexing

Indexing an array with an out-of-bounds index in C is undefined. In JavaScript, it's merely undefined. ;)
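
In code, the difference looks like this. The checked accessor getChecked is a hypothetical helper for when you would rather have an exception than undefined.

```javascript
var data = [10, 20, 30];

// Out-of-bounds reads yield the value undefined; nothing else happens.
console.log(data[5]);      // undefined
console.log(data[-1]);     // undefined
console.log(data[5] | 0);  // 0 (undefined coerces to 0 in integer operations)

// Hypothetical bounds-checked read.
function getChecked(arr, i) {
  if (i !== (i | 0) || i < 0 || i >= arr.length) {
    throw new RangeError("index out of bounds: " + i);
  }
  return arr[i];
}

console.log(getChecked(data, 2));  // 30
```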

Binary input and output

If you're manipulating integers, you probably want to get them in and out of your program. The most efficient formats for this are binary.

Ancient JavaScript didn't have standardised binary I/O, so in some sense this isn't a fair contest. But it did have binary-safe strings with 16-bit code units. That's a primitive you can use to safely consume any binary format, so long as you know what endianness (UTF-16LE or UTF-16BE) the original has been read with. You can interpret the string as little-endian and read an arbitrary byte with (s.charCodeAt(offset/2) >> ((offset%2) * 8)) & 0xff, for example.
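
Building on that expression, here is a sketch of reading a 32-bit little-endian signed integer from such a string at any byte offset, aligned or not. The function names are hypothetical, and the string is assumed to hold the data as little-endian 16-bit code units.

```javascript
// Read one byte from a string whose 16-bit code units hold the data
// in little-endian order (low byte first).
function byteAt(s, offset) {
  return (s.charCodeAt(offset >> 1) >> ((offset % 2) * 8)) & 0xff;
}

// Assemble four bytes, least significant first, into an int32.
function readInt32LE(s, offset) {
  return byteAt(s, offset) |
         (byteAt(s, offset + 1) << 8) |
         (byteAt(s, offset + 2) << 16) |
         (byteAt(s, offset + 3) << 24);
}

// Bytes: 01 02 03 04 FF FF
var s = String.fromCharCode(0x0201, 0x0403, 0xFFFF);
console.log(readInt32LE(s, 0));  // 67305985 (0x04030201)
console.log(byteAt(s, 5));       // 255
```

Here offset >> 1 replaces offset/2 from the text. Both select the same code unit, since charCodeAt truncates a fractional position.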

C is another creature. Of course, it has always had standardised functions for I/O, including binary. But those facilities are difficult to use portably for anything other than text:

Endianness is critical, yet C has no standard facility to find out what endianness the host uses, nor to convert values to and from a specific endianness. Due to other problems, these are not trivial to handle yourself either!
Type sizes matter, yet C does not, as we have already established, guarantee to provide types with the specific sizes that are commonly used in binary formats (e.g. 32-bit two's-complement signed integer). There also aren't standard facilities for converting from those types to the types C does provide.
C does not even guarantee bytes are eight bits wide, nor are there standard facilities for converting to and from eight-bit bytes.
C has structs, which are convenient for representing binary data structures, yet since you can't control their layout, they aren't safe to use in I/O.
The most convenient and fast way to consume binary data is to read it in large chunks of bytes, and then use pointer casting to interpret the bytes as convenient. However, it is very easy to do something undefined here (due to alignment or strict aliasing violations), and it is non-portable for the previously elaborated reasons. Instead you should make many fread() or memcpy() calls, which is not much safer, possibly slower, and still questionably portable. Yay!

It gets worse for C, though. If we fast-forward to the early 2010s, nothing has changed here for C. In fact, still in 2022, nothing has changed here for C (though some heroic people are trying…). But early 2010s JavaScript has Typed Arrays and Data Views, which not only provide safe and portable facilities for interpreting arbitrary byte-streams of any alignment in all the popular binary formats, but convenient and even fast ones. I/O was solved in the early 2000s, too. The “close to the metal” language is worse at speaking to metal than the infamously high-level scripting language.
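
A short sketch of what that looks like with DataView. The byte values are made up for illustration; the offsets are deliberately unaligned, and endianness is an explicit argument on each read.

```javascript
// Nine arbitrary bytes standing in for data read from a file or socket.
var bytes = new Uint8Array([0x00, 0x01, 0x02, 0x03, 0x04,
                            0xFF, 0xFF, 0xFF, 0xFF]);
var view = new DataView(bytes.buffer);

// The second argument selects little-endian (true) or big-endian (false).
console.log(view.getInt32(1, true));   // 67305985   (bytes 01 02 03 04 as LE)
console.log(view.getInt32(1, false));  // 16909060   (the same bytes as BE)
console.log(view.getInt32(5, true));   // -1         (FF FF FF FF, signed)
console.log(view.getUint32(5, true));  // 4294967295 (the same bytes, unsigned)

// Reading past the end throws instead of touching other memory.
var outOfBounds = false;
try {
  view.getInt32(6, true);
} catch (e) {
  outOfBounds = e instanceof RangeError;
}
console.log(outOfBounds);              // true
```

There is no alignment requirement, no aliasing rule to violate, and the result does not depend on the host's endianness.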

Conclusion

I'm not advocating for using JavaScript instead of C for integer programs in general, though I'm not advocating against it, and if those were the only choices for a truly safety-critical project, I couldn't defend picking the latter. The real purpose of this blog post is to provoke thought about the problems with C, and perhaps to prompt some questioning of preconceived notions about which domains low-level and high-level languages belong to. (I wanted to post this on the 1st of April because the thought amused me, but this post is sincere, not a joke. Sadly.)

Footnotes

[1] ECMAScript 3rd Edition (1999)
[2] The title originally said “integer programming”, but that's actually a technical term that isn't what I meant, so I've changed it now (2022-04-02).
[3] ISO/IEC 9899:1999 (1999). Yes, I know there have been some changes since, but nothing of interest here.
[4] I've spent years working with both of them and like them equally, I suppose. Neither would be my first choice for a new project that makes heavy use of integers.
[5] For the sake of convenience I'll refer to the values produced and preserved by the bitwise operations as “integers”, even though old JavaScript did not have an integer type, and there are many more integers that can be represented when constrained to floating-point operations (up to ±2⁵² without rounding).

hikari's blog