When coding efficient algorithms having to do with hashing, random number generations or even cryptography, a common construction is the expression “-n%n“. My experience has been that it confuses many programmers, so let us examine it further.
To illustrate, let us look at the implementation of std::uniform_int_distribution found in the GNU C++ library (Linux) and clean up the line in question:
threshold = -range % range;
The percent sign (%) in this expression refers to the modulo operation. It returns the remainder of the integer division. To simplify the discussion, let us assume that range is strictly positive since dividing by zero causes problems.
We should pay attention to the leading minus sign (–). It is the unary operator that negates a value, and not the subtraction sign. There is a difference between “-range % range" and “0-range % range". They are not at all equivalent. They will actually give you different values; the latter expression is always zero. And that is because of the priority of operation. The negation operation has precedence on the modulo operation which has precedence on the subtraction operation. Thus you can rewrite “-range % range" as “(-range) % range". And you can write “0-range % range" as “0- (range % range)“.
When the variable range is a signed integer, then the expression -range % range is zero. In a programming language with only signed integers, like Java, this expression is always zero.
So let us assume that the variable range is an unsigned type, as it is meant to be. In such cases, the expression is generally non-zero.
We need to compute -range. What does it mean to negate an unsigned value?
When the variable range is an unsigned type, Visual Studio is likely to be unhappy at the expression -range. A recent Visual Studio returns the following warning:
warning C4146: unary minus operator applied to unsigned type, result still unsigned
Nevertheless, I believe that it is a well defined operation in C++, Go and many other programming languages. Jonathan Adamczewski has a whole blog post on the topic which suggests that the Visual Studio warning is best explained by a historical deviations from the C++ standard from the Microsoft Visual Studio team. (Note that the current Visual Studio team seems committed to the standards going forward.)
My favorite definition is that –range is defined by range + (-range) = 0. That is, it is the value such that when you add it to range, you get zero. Mathematicians would say that it is the “additive inverse”. In programming languages (like Go and C++) where unsigned integers wrap around, then there is always one, and only one, additive inverse to every integer value.
You can define this additive inverse without the unary negation: if max is the maximum value that you can represent, then you can replace –range by maximum – range + 1. Or, maybe more simply, as (0-range). And indeed, in the Swift programming language, this particular line was represented as follow:
let threshold = (0 &- range) % range
The Swift language has two subtraction operations, one that is not allowed to overflow (the usual ‘-‘), and one that is allowed to overflow (‘&-‘). It is somewhat inconvenient that Swift forces us to write so much code, but we must admit that the result is probably less likely to confuse a good programmer.
In C#, the system will not let you negate an unsigned integer and will instead cast it as a signed integer, so you have to go the long way around if you want to remain in unsigned mode, like so…
threshold = (uint.MaxValue - scale + 1) % scale
This expression is unfortunately type specific (here uint).
To conclude: you can learn a lot just by examining one line of code. To put it another way, programming is a much deeper and complex practice than it seems at first. As I was telling a student of mine yesterday: you are not supposed to read new code and understand it right away all of the time.
Sorry, but I miss the point of such code, then. If the value of -range % range is always 0, what’s the intent?
It is only zero when you have signed types. It is non trivial when you have unsigned types. You should be able to test it out for yourself.
Petite coquille : “just by examining just”
I reckon a whole series of posts could be inspired by that statement.
That resonated with me, it’s the opposite of “software should be readable like proze” which always sounded odd to me.
Tiny amendment: % is the percent sign, not the ampersand (that’s &).
This is a neat trick! But I found something surprising with
unsigned short
: https://gcc.godbolt.org/z/4xWo1G. It looks like-n
is type-promoted (probably to a signed integer? I haven’t checked the standard) so unless you re-cast it (I’m not sure that cast isn’t UB…) you don’t get the expected result.You do an excellent job of explaining what this seemingly odd construction does, but it might be good to mention in the intro why “-n %n” is a common construction in the first place. Stealing from an HN comment making the same suggestions, “It computes (2^b) % n, assuming n is an unsigned b-bit integer. You can’t do this directly, since 2^b itself doesn’t fit into a b-bit integer.”
Yes, this fault in my blog post was pointed out to me on twitter.
A nit: I believe you meant to write ‘percent’ (%) and not ‘ampersand’ (&). cheers.
I dont understand. I tried it in C# and printed out the result of -8%8 and i got 0 so i dont get what is supposed to be happening
In C#, try (4294967295 - n + 1) % n if n is of type uint.
I noticed unexpected results with unsigned short and unsigned char. It’s worth noting that -n will be converted to a signed int if n is a narrower type.
I’m finding out the hard way that gcc 10 seems to optimize this expression to 0 for unsigned values. I’m having to cast to signed to get the unary minus to take effect, and then cast back to unsigned for the modulo to work the way it should. YMMV.
See my reference to the standard C++ library in my blog post. It is used in GCC 10.
This sentence is a bit confusing to read. If unsigned integers cannot overflow (i.e. unsigned overflow doesn’t exist), then there can be no (unsigned) additive inverse.
By the way, I expected this post to discuss something else entirely. In Python, you have the following:
>>> (-3) % 5
2
>>> (-3) % (-5)
-3
>>> 3 % (-5)
-2
But in C (and presumably other C-like languages), the results are -3, -3, 3 respectively.
I agree. I rephrased it as “wrap around”.
The division and remainder (modulo) between signed integers are defined differently depending on the programming. E.g., we all agree that 4/3 is 1. But what is -4/3? It could be -2 or -1. You can round toward zero or toward minus infinity.