I hope you enjoy this whirlwind tour. Which language do you suppose is my favourite?
C offers many integer sizes (char, short, int, long, long long) and guarantees the minimum size of each, but not much else: you're at the mercy of the platform you're targeting. Bytes are at least 8 bits. long might be larger than int, which might be larger than short. What's the range of a signed integer? It might be [-2⁽ⁿ⁻¹⁾, 2⁽ⁿ⁻¹⁾-1], or it might be [-2⁽ⁿ⁻¹⁾+1, 2⁽ⁿ⁻¹⁾-1]. It is difficult to handle all the possibilities!
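If you want to see what your own platform chose, the answers are in limits.h. Here is a minimal sketch; the function names are mine and purely illustrative, and the printed values will differ from platform to platform:

```c
#include <limits.h>
#include <stdio.h>

/* Counts the value bits of unsigned int at run time, rather than
   assuming sizeof(unsigned int) * CHAR_BIT (which may include padding). */
static int unsigned_value_bits(void)
{
    int bits = 0;
    for (unsigned int v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    return bits;
}

/* 1 if int has one more negative value than positive values
   (INT_MIN == -INT_MAX - 1), 0 if the range is symmetric. */
static int int_range_is_asymmetric(void)
{
    return INT_MIN < -INT_MAX;
}

/* Prints the limits this particular platform settled on. */
static void print_integer_limits(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("int:  [%d, %d]\n", INT_MIN, INT_MAX);
    printf("long: [%ld, %ld]\n", LONG_MIN, LONG_MAX);
    printf("unsigned int has %d value bits\n", unsigned_value_bits());
    printf("int range is %s\n",
           int_range_is_asymmetric() ? "asymmetric" : "symmetric");
}
```

Calling print_integer_limits() from main is enough to find out which of the two signed ranges you are dealing with.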
Let's do some arithmetic on integers!
In JavaScript, a + b with integer inputs could overflow and produce a value that no longer fits in a 32-bit integer. The resulting value is well-defined and guaranteed by the standard, so it's not the end of the world. But if you want to detect this overflow and handle it, it's easy: if ((a + b) > 0x7FFFFFFF || (a + b) < -0x80000000). If you would prefer to have a two's-complement signed integer wraparound, that's also easy: (a + b) | 0. If you would like an unsigned integer wraparound, add 0x100000000 to that, then take the result modulo 0x100000000.
In C, a + b with integer inputs could overflow. If the types of a and b are unsigned after the usual arithmetic conversions, then you'll get an unsigned integer wraparound and the resulting value is guaranteed by the standard, though what exactly it is depends on the size and corresponding range of the integer type, which, as we've established, depends on the platform!
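The nice case can be sketched in a few lines. This assumes both operands really are unsigned int, so the usual arithmetic conversions leave them alone; the function names are made up for illustration:

```c
#include <limits.h>

/* Unsigned addition wraps modulo UINT_MAX + 1. This is defined behaviour,
   but the value of UINT_MAX itself is up to the platform. */
static unsigned int wrapping_add_unsigned(unsigned int a, unsigned int b)
{
    return a + b;
}

/* The sum wrapped around exactly when it ends up smaller than an operand. */
static int add_unsigned_overflows(unsigned int a, unsigned int b)
{
    return a + b < a;
}
```

So wrapping_add_unsigned(UINT_MAX, 1u) is 0 everywhere, even though UINT_MAX is not the same number everywhere.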
But that's just the nice case. If the types of a and b are signed after the usual arithmetic conversions, then if the result would overflow, you've triggered undefined behaviour. Not only is the resulting value undefined then, but so is the behaviour of every aspect of the entire program! Whether the result would overflow depends, of course, on the size and range of the type, which depends on the platform. Oh, and by the way, the usual arithmetic conversions can turn unsigned integers into signed ones, depending on (you guessed it) the platform! Now, of course, you can try to detect this overflow and handle it, but be careful: your signed overflow check must not itself contain undefined behaviour, which can be non-trivial. Alternatively, if you want a guaranteed wraparound: sorry, there is no standard facility for this. Cast to unsigned and pray the system is two's-complement.
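For what it's worth, here is one way such a check might look for plain int. It is a sketch with invented function names, not a standard facility; the trick is to compare against the limits before adding, using only subtractions that cannot overflow themselves:

```c
#include <limits.h>

/* Returns 1 and stores a + b in *out if the sum is representable as int.
   Returns 0 and leaves *out untouched otherwise. */
static int checked_add_int(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return 0;               /* sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return 0;               /* sum would fall below INT_MIN */
    *out = a + b;
    return 1;
}

/* The "cast to unsigned and pray" approach. The unsigned addition is well
   defined, but converting an out-of-range result back to int is
   implementation-defined in C99, so wraparound is likely, not guaranteed. */
static int wrapping_add_int(int a, int b)
{
    return (int)((unsigned int)a + (unsigned int)b);
}
```

checked_add_int(INT_MAX, 1, &r) returns 0 without ever evaluating INT_MAX + 1, which is the whole point.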
There's a similar story for shifts such as a << b. In C, b must be in range for the size of a after the integer promotions (which depend on the platform!) and you mustn't try to touch the sign bit. Like the usual arithmetic conversions, the integer promotions can turn an unsigned integer into a signed one and create an unexpected undefined behaviour. At least these conditions are relatively easy to check for dynamically if you're sure you know the types involved…
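Assuming the type involved is known to be int, the dynamic checks for a left shift might be sketched like this. The function names are hypothetical, and the width is derived from INT_MAX rather than assumed:

```c
#include <limits.h>

/* Width of int in bits (value bits plus the sign bit), derived from INT_MAX. */
static int int_width(void)
{
    int width = 1;                      /* the sign bit */
    for (int v = INT_MAX; v != 0; v >>= 1)
        width++;
    return width;
}

/* Returns 1 and stores a << b in *out only when the shift is defined:
   a is non-negative, b is within the width of int, and no set bit
   would reach or pass the sign bit. Returns 0 otherwise. */
static int checked_shl_int(int a, int b, int *out)
{
    if (a < 0)
        return 0;                       /* shifting a negative value */
    if (b < 0 || b >= int_width())
        return 0;                       /* shift count out of range */
    if (a > (INT_MAX >> b))
        return 0;                       /* result would not fit */
    *out = a << b;
    return 1;
}
```

Note that this only covers int. For a narrower unsigned type such as unsigned short, the promotion to int happens first, so the same signed rules can apply to a value you thought was unsigned.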
Binary input and output
If you're manipulating integers, you probably want to get them in and out of your program. The most efficient formats for this are binary.
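In JavaScript, a byte can be pulled out of a string s that packs two bytes into each 16-bit character, low byte first, with (s.charCodeAt(offset/2) >> ((offset%2) * 8)) & 0xff, for example.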
C is another creature. Of course, it has always had standardised functions for I/O, including binary. But those facilities are difficult to use portably for anything other than text:
- Endianness is critical, yet C has no standard facility to find out what endianness the host uses, nor to convert values to and from a specific endianness. Due to other problems, these are not trivial to handle yourself either!
- Type sizes matter, yet C does not, as we have already established, guarantee to provide types with the specific sizes that are commonly used in binary formats (e.g. 32-bit two's-complement signed integer). There also aren't standard facilities for converting from those types to the types C does provide.
- C does not even guarantee bytes are eight bits wide, nor are there standard facilities for converting to and from eight-bit bytes.
- C has structs, which are convenient for representing binary data structures, yet since you can't control their layout, they aren't safe to use in I/O.
- The most convenient and fast way to consume binary data is to read it in large chunks of bytes, and then use pointer casting to interpret the bytes as convenient. However, it is very easy to do something undefined here (due to alignment or strict aliasing violations), and it is non-portable for the previously elaborated reasons. Instead you should make many memcpy() calls, which is not much safer, possibly slower, and still questionably portable. Yay!
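One way to dodge most of the problems in this list is to never reinterpret memory at all and instead assemble values from individual bytes with shifts. The following is a sketch under my own assumptions (a 32-bit little-endian field, invented function names), not a standard facility:

```c
/* Decodes a 32-bit little-endian unsigned value from four bytes.
   Each byte is masked to 8 bits, so the result is the same even where
   unsigned char is wider than 8 bits. unsigned long is used because
   C99 guarantees it has at least 32 bits. No pointer casts are involved,
   so alignment, strict aliasing and host endianness do not matter. */
static unsigned long read_u32_le(const unsigned char *p)
{
    return  ((unsigned long)(p[0] & 0xFFu))
          | ((unsigned long)(p[1] & 0xFFu) << 8)
          | ((unsigned long)(p[2] & 0xFFu) << 16)
          | ((unsigned long)(p[3] & 0xFFu) << 24);
}

/* Interprets the low 32 bits of u as a two's-complement signed value.
   Only the single result -2^31 relies on long being able to represent
   it, which a 32-bit long without two's complement could not. */
static long u32_to_s32(unsigned long u)
{
    u &= 0xFFFFFFFFul;
    if (u <= 0x7FFFFFFFul)
        return (long)u;
    /* u is in [2^31, 2^32): compute u - 2^32 without overflowing. */
    return -(long)(0xFFFFFFFFul - u) - 1;
}
```

For the bytes 0x78 0x56 0x34 0x12, read_u32_le returns 0x12345678 regardless of the host's endianness. It is more typing than a pointer cast, which rather proves the point.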
- ECMAScript 3rd Edition (1999)
- The title originally said “integer programming”, but that's actually a technical term that isn't what I meant, so I've changed it now (2022-04-02).
- ISO/IEC 9899:1999 (1999). Yes, I know there have been some changes since, but nothing of interest here.
- I've spent years working with both of them and like them equally, I suppose. Neither would be my first choice for a new project that makes heavy use of integers.