Building better software with better tools: sanitizers versus valgrind

We often have to write code using  permissive programming languages like C and C++. They tend to generate hard-to-debug problems that can crash your applications. Thankfully, many compilers offer “sanitizers”. I discussed them in my post No more leaks with sanitize flags in gcc and clang. I strongly encourage the use of sanitizers as I think it is the modern way to write C and C++. When many people describe how impossibly difficult it is to build good software in C and C++, they often think about old-school bare metal C and C++ where the code do all sorts of mysterious things without any protection. Then they feel compelled to run their code in a debugger and to manually run through it. You should not write code this way! Get some tools! Sanitizers can catch undefined behaviour, memory leaks, buffer overflows, data races, and so forth.

Sanitizers are a game changer, they bring C++ closer to languages like Java and Rust. They do not bring the safety to production, since you probably do not want to use sanitizers in production or release, but as you are building and testing your code, they help you a great deal catch potential issues right away.

A competitive solution that people often use is a great tool called “valgrind“. It is a general-purpose tool that checks your software as it runs. Under Windows, you have related programs like Application Verifier and WinDbg. Under macOS, valgrind used to be supported, but that is no longer the case.

I believe you should almost always use sanitizers when they are available. Here is a comparison between tools like valgrind and sanitizers.

    1. With the caveat that valgrind needs support for all the instructions your software is using, valgrind can run pretty much any software, even when you do not have the source code. Sanitizers work at the compiler level, so you need the source code. Thus if you need to debug a closed source library, sanitizers are unhelpful.
    2. Sanitizers can catch problems that valgrind will not catch. For example, it will catch undesirable undefined behaviour: code that may work right now but may not work if you use a different compiler or a different processor. They can catch unsafe memory accesses that will look safe to valgrind.
    3. Sanitizers are more precise. You often can turn on or off specific sanitizers for specific functions.
    4. If your compiler has sanitizers, you can run your tests with the sanitizers on simply by turning on some flags.
    5. Valgrind is slow. Like debuggers, it often does not scale. If you are working over large data sets, it might take a really long time. People often dismiss “execution time”, and it is easy to do if you work on toy problems, but performance is an essential quality-of-life attribute. I do not think you can run valgrind in a simulated production setting. However, you can compile your code with sanitizers and emulate a production setting. Sure, your throughput is going to be impacted, but the effect is not large. Code with sanitizers is not 10x slower, valgrind is.
    6. Sanitizers are relatively new and so the support is sometimes missing.
      • For example, under macOS, Apple does not yet ship a compiler that can detect memory leaks, you need to install your own compiler.
      • Even if you compile your code with debug symbols, it is common for the sanitizers to report the errors without proper links to the source code, you often need to fiddle with the system configuration.
      • Under Linux, when using GNU GCC, I have found it necessary to use the gold linker to get good results (-fuse-ld=gold): the default link frequently gives me errors when I try to use sanitizers.
      • The “memory sanitizer” that check that you do not read from uninitialized inputs is not available under GNU GCC and under LLVM requires you to manually replace the C++ standard library and possibly recompile all of your software with the sanitizer enabled (including all dependencies) if you want to avoid false positives.
      • And Visual Studio has some of its own sanitizers, but it is largely behind LLVM. Better sanitizers may be coming to Visual Studio 2019. (Update: better sanitizers have arrived.)
      • Furthermore, you cannot freely use all possible sanitizers at once.

So, sadly, there are cases when sanitizers are just not available to you. Yet I think it is a safe bet that all competitive C/C++ compilers will soon have powerful sanitizers.

Daniel Lemire, "Building better software with better tools: sanitizers versus valgrind," in Daniel Lemire's blog, May 16, 2019.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

19 thoughts on “Building better software with better tools: sanitizers versus valgrind”

  1. I’m not sure how one would read that list and come to the conclusion you should almost always prefer santizers. Things like “install a new compiler” seem like they might be pretty major roadblocks to someone who just wants a quick solution.

    I use both tools and both are very useful in the areas where they overlap (so I’m only talking about ASAN-type sanitizers which roughly overlap with valgrind not things like UBSAN which have no counterpart).

    Valgrind catches one giant source of corruption that ASAN sanitizers doesn’t: uninitalized reads. That has to be one of the biggest practical memory safety issues in C and C++, so failing to catch that is IMO a showstopper for ASAN-only-never-valgrind. ASAN also doesn’t catch all problems even in the categories it supports because it tracks memory accesses only at a certain granularity, so wrong accesses that fall within that granularity aren’t caught.

    I just say “use both tools”.

    That said, I usually use valgrind first, because it’s 9 characters away: just replace foo with valgrind foo and you are off and running: no need to recompile, no need to worry about compiler support, recompiling dependencies, etc. For most small projects valgrind I need because I never have complicated enough issues to go beyond that.

    On most projects, just use both: build and run your test suite with the sanitizers for sure, and now-and-then with valgrind to catch things the sanitizers don’t. Making sure the santized versions of the binaries available in your build goes a long way to reducing the friction for santizer use. Note that most sanitizers cannot be enabled together, so if you want to use “all” the sanitizers you’ll need to build many binaries. Maybe concentrate first on the most interesting ones.

    Sanitizers are great but this idea they are so much better that Valgrind is just … weird. I find them very comparable (again, in the area where they overlap).

      1. Agreed, but to be clear almost no one uses MSAN (I have never seen it used in the wild), because it requires all transitive dependencies to be recompiled using MSAN including the standard libraries.

        Have you used a project using MSAN or have you used it yourself?

        When people talk about good santizers to use, as you have done here, I always assume they are not talking about MSAN. If my assumption is wrong, let me know.

        1. You mean -fsanitize=memory?

          There are two reasons why I do not use this flag. The first one is that, to my knowledge, it is not supported by GNU GCC. The second one, and most important one, is that for what I do, this sanitizer throws off false positives (just like valgrind). Now, unlike valgrind, and given enough compiler maturity, I will be able to use this sanitizer after white-listing code using appropriate attributes. This may even be possible today, but given the lack of support in GNU GCC, I have not bothered.

          It is possible that you are right, and that this sanitizer misses too much if you link with unsanitized libraries. But given that I am complaining about false positives, I would say that this has not been my concern.

          1. Yes, MSAN is -fsanitize=memory.

            No, I’m not saying that MSAN “misses too much” if you don’t recompile all dependencies. I’m saying that MSAN requires you to recompile all dependencies as a basic requirement for it even to work properly. There is no mode where you run it on a subset of binaries. The MSAN documentation tells you straight up “you have to recompile everything”. It is not like ASAN in this regard.

            Even for a basic project recompiling all the transitive dependencies of the various OS-provided libraries is not for the faint of heart and in some cases it is impossible (e.g., where source is not provided). On a larger project it becomes a bigger mess.

            I doubt you or I will ever use MSAN unless this requirement disappears, and in any case there is a lot of functional overlap with ASAN so there is little pressure to get this to work in a more friendly way. You can basically consider it a Google-internal tool that happens to be available publicly, but is of little actual use to people who don’t already have the entire forest of dependencies recompiled for MSAN.

            About false positives, I am not sure what you are referring to, but if it were true that Valgrind suffers from false positives that ASAN does not (in scenarios were both can detect non-false positives), it would be a big point in favor. Details?

            Are you referring to something like where you do something that is normally invalid (e.g., reading uninitialized memory, writing outside the bounds of allocated memory) but you somehow know it is correct (e.g., because you end up masking off the result of the uninitialized data, or because you expect to trap the OOB write with a signal handler and do something smart)?

            In that case I don’t think any sanitizer can avoid some types of false positives in those scenarios: you are doing something “wrong” by the rules of the game (language rules, or asm-level rules in Valgrind’s case), so you have to understand each case and whitelist them. In my experience, Valgrind has more sophisticated analysis in an attempt to avoid false positives: it doesn’t flag errors right away, but instead tracks “invalid bytes” through the code until it decides that an irreversible action uses the value of those bytes, such as jumping based on their value, has taken place. OTOH AFAICT ASAN flags errors right away. Both have their benefits: ASAN’s approach is simpler to understand and implement (probably the Valgrind technique relies on its VM architecture), while Valgrind’s approach reduces false positives.

            Of course, it may be entirely true that Valgrind has more false positives because it detects the “uninitialized memory read” issue that ASAN ignores, and that’s probably the largest source of false positives because uninitialized reads are more likely to be innocuous compare to out-of-bounds writes. You can hardly hold that against Valgrind though since it is additional functionality that leads to that: ASAN never false-positives on uninit reads because it never, ever detects such reads in the first place. If you really don’t want to detect uninit reads, I guess just turn that part off?

            Anecdotally I get zero false positives with Valgrind on every project I have used it on recently. I doubt the whitelisting capability of Valgrind are different than ASAN although I’d be interested to hear differently.

            1. My reading of the documentation is that if you don’t recompile everything, the memory sanitizers will issue false positives. Do you have another reading of the documentation?

              Because I mostly write C-like code, whether the extension is cpp or not, I have never encountered this issue of having to recompile the C++ library to do away with the C++-specific false positives. I agree that it is a crippling limitation. It is certainly an argument in favour of valgrind that I did not have before.

              As to the other point…

              I get false positives all the time with valgrind, or rather, I get emails from people who complain about bugs that valgrind found in code that I wrote… that are provably not bugs…

              I quickly made a toy example:


              (This is not code that makes sense on its own. It is an illustration.)

              To be clear, what I mean by false positive is not that valgrind is wrong per se. Uninitialized memory is indeed read in this example. What I mean is that there are legitimate uses of code that does not initialize everything. It is dangerous code, to be sure, but allocating a large buffer, and initializing only part of it, is a common paradigm in C (though, admittedly, STL/C++ makes it hard to do).

              My example will pass the memory sanitizers because I have white-listed the offending function. It is possible that I can do the same with valgrind. If so, I’d love to know how!

              1. We should be precise when talk about sanitizers. I think 90% of the time talk about “memory sanitizers” they actually mean ASAN (-fsanitze=address) and not MSAN (-fsanitize=memory) since ASAN is, in my experience, used an order of magnitude more than MSAN (because of its wide support and ease of use). MSAN got the better name, leading to confusion when people talk about lower-case-m memory sanitizers.

                That’s why I try to refer to MSAN and ASAN specifically.

                About the documentation, I looked it up twice. Once when I first started to use the other sanitizers, long ago, and I noticed the existence of both MSAN and ASAN. I stopped when I read that MSAN requires specially instrumented runtime libraries.

                The second time was fairly recently, when I spent a while debugging a sporadic uninitialized read issue using ASAN. Of course ASAN was never going to find it, because ASAN doesn’t find that issue ever, and it prompted me to look again at MSAN because it does. Again I found text like this:

                It is critical that you should build all the code in your program
                (including libraries it uses, in particular, C++ standard library)
                with MSan.

                That, plus anecdotal reports that you can’t use it without rebuilding means I didn’t go further.

                Yes, you can find text that suggests that if you don’t do this, you’ll get “false reports”, but I don’t think that puts it in the same category of “false reports” you get from (actual) uninitialized reads in Valgrind: I think it becomes totally unusuable since any memory written by un-instrumented libraries becomes invisible to the sanitizers. When un-instrumented libraries includes the standard libraries, you are are dead in the water.

                I quickly made a toy example:

                Yes, this falls into the category of “needs a human to intervene”.

                I would also argue that this function is not correct in general when it reads beyond the end of the initialized array: if the initialized part of the array has no zeros, the answer will depend on the uninitialized part, no? Your example has a zero in the array, so this particular call site will always return “true” but I think the function is bugged in the general case, so Valgrind’s warning would be a good thing here. I pick on this point to point out that correctness is subtle even for people, and you’ll never close this false positive hole permanently: even with perfect tracking of used bits or bytes you can have semantics that don’t appear in the code like “strings will be zero terminated”.

                I think 100% of existing sanitizers that can handle uninitialized reads will also trigger on this example. Certainly MSAN will.

                In fact if you write a lot of other toy examples that will trigger MSAN, you’ll find that Valgrind won’t trigger: because Valgrind has (a) a bunch of heuristics for common “innocent” real-world unit-read and OOB read patterns that are known-safe and (b) is able to track validity at the byte (bit?) level from the origin of the uninit/OOB read to see if something bad is actually done with it.

                Maybe you even wrote some other simpler examples first and found they didn’t trigger Valgrind: that is the validity tracking at work. In this example the _mm256_testz_si256 call “taints” the entire result if any bytes were tainted (after all, the result in principle depends on all bytes), and then you use the result, so that’s why it triggers. One might argue that Valgrind could apply a more specific rule here: calculate the result of the _mm256_testz_si256 on the valid bits alone, and then on the full registers including invalid bits, and only if the result is different should it trigger a warning. That would let you case pass, but still trigger if you had a[1]='b' instead. You can go pretty far down that rabbit hole though, and by “pretty far” I mean “forever”. One might also argue that it is better to fail always here since this is almost always a bug, so just whitelist the remaining cases.

                It is possible that I can do the same with valgrind.

                Valgrind definitely has whitelisting capabilities, and a search should turn up the details. Indeed, it comes with a default list of whitelist rules since otherwise maybe even “Hello World” wouldn’t run clean.

                Maybe you want to whitelist at the source level though? I’m not sure if that is possible with Valgrind since it works on the binary, not the source. That’s a nice advantage for sanitizers for sure, that you can include the whitelist “annotation” together with the source, and exactly at the place in the source where it applies.

                but allocating a large buffer, and initializing only part of it, is a common paradigm in C

                Agreed, but to be clear Valgrind has no issue with this in general, as it tracks which parts of the buffer have been initialized and only complains when you read and then subsequently use the unitialized part. That is much more rare and generally only appears in highly optimized code which is able to read slightly past the end of the buffer (yes, this is the code that you happen to deal with every day, so to you it probably seems like a lot of false positives).

                IMO it is practical to whitelist these few places, or do what I do as a mitigation: just initialize (e.g.) at least 31 bytes past the last actually initialized bytes, so that the over-reads fall in that range.

                1. That is much more rare and generally only appears in highly optimized code which is able to read slightly past the end of the buffer […]

                  Based on personal experience, I would consider such code to be broken. It tends to work most of the time, until your buffer happens to end on a page boundary and the next page is unmapped. The result is a segfault that is exceedingly hard to reproduce.

                  It is more effort to be careful, but doesn’t affect performance in practice. Basically you create a loop like this:

                  for (;;) {
                  if (!close_to_end(p))
                  else if (!reached_end(p))

                  You have to check for the end anyway. If you check for end-15 or end-31, it doesn’t make a difference in practice. The careful operation is slower, but you only use it a few times at the end of the loop.

                  1. Based on personal experience, I would consider such code to be broken.

                    I do not disagree with this sentiment. Yet it is a cheap one-time cost to check whether you are going to cross a page.

                    1. Is that test cheaper than the test for end-31?

                      You’d still have to test for end, so you’d end up with two conditionals instead of one. Also, given a multi-page buffer, you’d hit “false positives” every time you cross a page within the buffer and a) needlessly use the slow version and b) pollute the branch predictor.

                      Main annoyance is that you need two copies of whatever you are doing. I have memcpy_fast and memcpy_slow, etc. So the cost is primarily to the human coder, not runtime cost.

                    2. Is that test cheaper than the test for end-31?

                      You are thinking about a simple loop with no jumps or backtracking.

                      Main annoyance is that you need two copies of whatever you are doing. I have memcpy_fast and memcpy_slow, etc. So the cost is primarily to the human coder, not runtime cost.

                      Correct. I think that’s how it is going to play out. Either you build simple and slow code, or else you build more complex but faster code. The “safe buffer overflow” part allows you to save complexity and still get the performance.

                      Of course, experienced programmers know that complexity is not free. Throw enough complexity into a code base and soon enough progress slows down to a crawl.

                      Important note: I do not disagree with you.

                  2. Yes, I was talking about the type of code which reads past the end of the buffer, but makes sure it is safe first (typically by ensuring it will not cross into the next page).

                    The pattern you mention is one way. I don’t necessarily agree it’s free though: the careful_operation often involves a loop (i.e., to handle the last N bytes close to the end) which can be mispredicted, adding another mispredict to the one you usually suffered on exiting the main loop. Also, of course the careful code is usually slower too (since otherwise you’d just do everything the careful way), but maybe that can’t be avoided.

        2. I’ve used it internally at Google and have used it on a few personal projects. The biggest thing I’ve managed to use it with is Stan. It was a pain but only took me an afternoon to get running. You’re right that it doesn’t work at all for projects with closed source dependencies.

  2. One way valgrind has been useful in R is in monitoring incorrect use of the language’s own heap — you can tell it that particular parts of memory that are legal from the OS’s point of view are illegal from the point of view of your program.

  3. I find this interesting open source tool:

    Safe Memory-Leak Fixing for C Programs

    LeakFix is a safe memory-leak fixing tool for C programs. It builds procedure summaries based on pointer analysis results, uses the summaries to analyze each procedure separately, and then generates fixes by inserting “free(..)” at certain program points.

    Other resources on the matter:

  4. I’ve used Valgrind extensively and lately picked up the sanitizers.

    In my opinion Valgrind is much more mature. The sanitizers have bugs and way more false positives. The latter is a serious problem:

    In one of my code bases I had a 100% false positive rate and spent a
    week following false leads.

    The sanitizers have found exactly one thread issue in another code
    base. Valgrind rarely has false positives and has widely published suppression files for toolchain bugs.

    So, by all means, use the sanitizers, but I don’t think they are superior to Valgrind in any way.

    1. Part of the reason why sanitizers are more likely to have more false positives is that they can do much more. But though I am sure there are bugs, I have not encountered them. The false positives were logical.

Leave a Reply

Your email address will not be published.

You may subscribe to this blog by email.