No more leaks with sanitize flags in gcc and clang

If you are programming in C and C++, you are probably wasting at least some of your time hunting down memory problems. Maybe you allocated memory and forgot to free it later.

A whole industry of tools has been built to help us trace and solve these problems. On Linux and macOS, the state of the art has been valgrind. Build your code as usual, then run it under valgrind, and memory problems should be identified.

Tools are nice but a separate check breaks your workflow. If you are using recent versions of the GCC and clang compilers, there is a better option: sanitize flags.

Suppose you have the following C program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
   char * buffer = malloc(1024);
   sprintf(buffer, "%d", argc);
   printf("%s",buffer);
}

Save this file as s.c. The program should simply print out how many arguments were entered on the command line. Notice the call to malloc that allocates a kilobyte of memory. There is no accompanying call to free and so the kilobyte of memory is “lost” and only recovered when the program ends.

Let us compile the program with the appropriate sanitize flags (-fsanitize=address -fno-omit-frame-pointer):

gcc -ggdb -o s s.c -fsanitize=address -fno-omit-frame-pointer

When you run the program, you get the following:

$ ./s

=================================================================
==3911==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1024 byte(s) in 1 object(s) allocated from:
    #0 0x7f55516b644a in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x9444a)
    #1 0x40084e in main /home/dlemire/tmp/s.c:6
    #2 0x7f555127eec4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)

SUMMARY: AddressSanitizer: 1024 byte(s) leaked in 1 allocation(s).

Notice how it narrows down to the line of code where the memory leak came from?

It gets even nicer: the return value of the command will be non-zero, meaning that if this code were run as part of software testing, you could automagically flag the code as being buggy.

While you are at it, you can add other sanitize flags such as -fsanitize=undefined to your code. The undefined sanitizer will warn you if you are relying on undefined behavior as per the C or C++ specifications.

These flags represent significant steps forward for people programming in C or C++ with gcc or clang. They make it a lot more likely that your code will be reliable.

Really, if you are using gcc or clang and you are not using these flags, you are not being serious.

Further reading: Building better software with better tools: sanitizers versus valgrind

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

36 thoughts on “No more leaks with sanitize flags in gcc and clang”

    1. The caveat is that they are only available on recent versions of the compilers but I stress that they are no longer “experimental” or “bleeding edge”. They work out-of-the-box without fiddling, without unnecessary bugs.

      I don’t know whether they work for all targets, but I suspect that they must.

  1. Will this functionality find memory allocations in more complex scenarios? Like when you allocate mem at some point, pass it around the app and then forget about deleting?

    BTW: Is there something similar for MSVC? You can use clang with VS 2015… so maybe that way you can take advantage of it somehow?

    1. Will this functionality find memory allocations in more complex scenarios? Like when you allocate mem at some point, pass it around the app and then forget about deleting?

      Of course. Though it will only tell you where the memory was allocated, not where it should have been freed.
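
      To make this concrete, here is a small sketch where the allocation, the use and the missing free sit in different places. The function names make_greeting and use_greeting are hypothetical, made up for this illustration. With -fsanitize=address on Linux, I would expect the leak report to point at the malloc inside make_greeting, not at the spot in main where the free belongs.

      ```c
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical helper: the allocation happens here... */
      char *make_greeting(const char *name)
      {
          size_t len = strlen("hello ") + strlen(name) + 1;
          char *out = malloc(len);
          if (out != NULL) {
              strcpy(out, "hello ");
              strcat(out, name);
          }
          return out;
      }

      /* ...the pointer is passed around... */
      size_t use_greeting(const char *greeting)
      {
          return greeting ? strlen(greeting) : 0;
      }

      int main(void)
      {
          char *g = make_greeting("world");
          size_t n = use_greeting(g);
          /* ...and nobody frees it: a call to free(g) is missing here. */
          return n == 11 ? 0 : 1;
      }
      ```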

      BTW: Is there something similar for MSVC? You can use clang with VS 2015… so maybe that way you can take advantage of it somehow?

      I don’t think you can “use clang with VS 2015”. As far as I can tell, Microsoft only allows you to use the clang parser. These sanitizers have to do with the generated code, not merely the parser. So it is different.

      1. aaa… right, so the sanitizers cannot be invoked from VS and thus you cannot use this feature.
        I hope VS will create something similar soon…

  2. It’s a bad idea to run with these sanitizers outside of testing environments though, definitely not in production.

    As far as I’m aware there’s been no effort to ensure the security of the sanitizer runtimes themselves, so even if they protect against memory bugs in application code, there are pretty huge security holes in the runtimes. See: http://seclists.org/oss-sec/2016/q1/363

    They’re great for testing though (we run address-sanitizer builds as part of our regular testing).

    1. It’s a bad idea to run with these sanitizers outside of testing environments (…)

      Though I was maybe not sufficiently clear in my blog post, I meant to refer to these sanitizers as superior alternatives (or complements) to other testing and debugging tools like valgrind.

      However, since they can help produce better code, I think that they may end up generating more secure software.

  3. Hello Daniel, I would like to ask you a question. Do you know why the AddressSanitizer would be taking a whole different set of libraries?

    For instance, I was trying to recreate strcmp, but what I realized is that compiling it normally it just gives me the difference, but with -fsanitize=address it gives me 1, 0, -1 outputs.

    Thanks

    1. Do you know why the AddressSanitizer would be taking a whole different set of libraries?

      I very much doubt that it is what it is doing.

      I was trying to recreate strcmp, but what I realized is that compiling it normally it just gives me the difference, but with -fsanitize=address it gives me 1, 0, -1 outputs.

      Can you post your code?

      1. Here is my source in the left and the two different outputs in the right:
        http://imgur.com/uU2SZyB

        You can clearly see that output of libc strcmp changes from difference to hardcoded outputs of 1, 0, -1 only when the -fsanitize=address is used.

        Btw, this is my testfile:

        #include
        #include
        #include "libft.h"
        #include

        int a, b, i, n;
        char *ra, *rb;

        i = 0;
        n = 1000;
        while (i <= n)
        {
        if (i < n)
        {
        ra = strdup(ft_itoa(arc4random()));
        rb = strdup(ft_itoa(arc4random()));
        }
        else
        {
        ra = "cba";
        rb = "cba";
        }
        a = ft_strcmp(ra, rb);
        b = strcmp(ra, rb);
        if (a != b)
        printf("\033[1m\033[31m[ FAIL ]\x1b[0m: str1: [%s] \t| str2: [%s] \t| ft_strcmp: %d\t| strcmp: %d\n", ra, rb, a, b);
        else
        printf("\033[1m\033[32m[ OK ]\x1b[0m: str1: [%s] \t| str2: [%s] \t| ft_strcmp: %d\t| strcmp: %d\n", ra, rb, a, b);
        i++;
        }
        }

  4. For future readers, here a code sample to reproduce the issue:

    #include <stdio.h>
    #include <string.h>
    
    int main() {
      const char * ra = "1375154539";
      const char * rb = "-497308599";
      printf("%d \n", strcmp(ra, rb));
    }
    
    1. Were you able to get a different output as well with the -fsanitize=address?
      I am on OSX 10.11.6 btw

      And this is the configuration of gcc/clang in this machine:
      Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
      Apple LLVM version 8.0.0 (clang-800.0.38)
      Target: x86_64-apple-darwin15.6.0
      Thread model: posix
      InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

      1. Yes. I am able to reproduce the issue and it can be explained by looking at the code of the sanitizer in LLVM:

        https://github.com/llvm-mirror/compiler-rt/blob/35f212efc287a7b582afcb41d86bdff7a29e7367/lib/sanitizer_common/sanitizer_common_interceptors.inc#L757

        As you can see in this code, the sanitizer has its own implementation of the memcmp function. It calls CharCmpX which you can find in the file above, and that returns -1, 0, 1.

        There is a set of functions that are re-implemented with various safety checks in this manner.

        So it is not loading a whole other library, it is simply the compiler handling these functions as special cases.

        Note that if your code relied on getting specific values out of memcmp, then it was wrong as per the standard.
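
        A portable test should therefore compare only the sign of the two results. Here is a minimal sketch of that idea; sign_of and diff_strcmp are hypothetical names of my own, with diff_strcmp standing in for a hand-written strcmp that returns the byte difference.

        ```c
        #include <stdio.h>
        #include <string.h>

        /* Reduce a comparison result to -1, 0 or 1 so that only its sign matters. */
        int sign_of(int x)
        {
            return (x > 0) - (x < 0);
        }

        /* Hypothetical stand-in for a hand-written strcmp returning the byte difference. */
        int diff_strcmp(const char *a, const char *b)
        {
            while (*a != '\0' && *a == *b) {
                a++;
                b++;
            }
            return (unsigned char)*a - (unsigned char)*b;
        }

        int main(void)
        {
            const char *ra = "1375154539";
            const char *rb = "-497308599";
            int a = diff_strcmp(ra, rb);
            int b = strcmp(ra, rb);
            /* Compare signs only: the exact values are implementation details. */
            printf("%s\n", sign_of(a) == sign_of(b) ? "OK" : "FAIL");
            return 0;
        }
        ```

        Written this way, the test should pass whether strcmp returns the difference or just -1, 0, 1.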

        1. Thank you very much for this answer.
          Everyone else I asked was handwaving it or not really caring about this by just saying that “you should not use strcmp that way anyways, so why bother”.

          But now my next question is, why would they decide to intercept strcmp and the other functions, is it really a security risk to be doing s1 - s2?

          1. But now my next question is, why would they decide to intercept strcmp and the other functions, is it really a security risk to be doing s1 - s2?

            It is definitely wrong for your code to assume a specific implementation of strcmp.

            1. Yeah, I get that, so are you saying that the reason they intercept strcmp in AddressSanitizer is to check if someone is using the strcmp implementation improperly?

              *Scratching my head*

              I would like to know what is the rationale for the devs to add these intercepts:

              #if SANITIZER_INTERCEPT_STRCMP
              static inline int CharCmpX(unsigned char c1, unsigned char c2) {
              return (c1 == c2) ? 0 : (c1 < c2) ? -1 : 1;
              }

              I am interested in the decision making process and the reasons behind them, thanks!

              1. I guess you would want to see…

                int CharCmpX(unsigned char c1, unsigned char c2) {
                  return c1 - c2;
                }
                

                As far as I can tell, the result of this function is not well defined in C. The subtraction is fine, but the assignment to a signed integer is implementation dependent.

                1. Btw my colleagues in school are telling me to just not use AddressSanitizer and to use “leaks” or “valgrind” instead.

                  And that AddressSanitizer actually doesn’t work in OSX.

                  Thank you a lot for your patience.
                  What would be your input on that?

                  1. I comment on valgrind in my blog post and why I think that using the sanitizers in your compiler is better. And yes, the sanitizers work under macOS, they are officially supported by Apple (as of Xcode 7).

                    1. Yes, I just wanted to reconfirm, thank you very much.
                      If you have a bitcoin address, let me tip you!

                      I learned a lot!

  5. Hello,

    I’ve tried your C code on macOS Mojave, compiled with:

    clang -std=c17 -Wall -pedantic -g -fsanitize=address -fsanitize=undefined -fno-omit-frame-pointer test.c -o test

    At runtime there is no leak info printed. Same code works as expected on Linux. Any suggestion ?

    Thanks

      1. Forgot to mention that in my previous comment I’ve used a custom build of GCC 8.2. You are right about Apple’s Clang, it doesn’t detect memory leaks.

  6. I’m not following the “separate check breaks your workflow” issue with Valgrind.

    Both sanitizers and valgrind require you to modify the default process for building and running your executable: sanitizers by running a modified build command and valgrind by modifying the run command.

    In practice, when both are automated, there is little difference.

    If running this “manually” it is not clear to me that one is much better than the other: one requires you to rebuild with the new flags and then run your normal command, the other requires you to run your normal command prefixed with valgrind. One could argue that valgrind is somehow more convenient since you can choose on each invocation whether to use it or not, and without rebuilding the binary (especially useful when you don’t have the source for all the components). On the other hand, some might like the always-on behavior of the sanitizers.

    That said, I have absolutely nothing against the sanitizers, they are great! IMO their main benefits are not what you mention but rather:

    They are much faster than Valgrind, in some cases making it feasible to leave them on in distributed binaries (although this is currently uncommon).
    They catch a different set of issues, even where the domain overlaps with Valgrind: by having source-level access they can detect invalid accesses, e.g., in between structures allocated together and on the stack, that Valgrind can’t.
    There are many sanitizers that do things totally outside the scope of Valgrind such as undefined behavior.

    On the other hand Valgrind works without source, doesn’t require a rebuild and is compiler-independent.

    Serious projects would do well to use both.

    1. I am not advocating against the use of valgrind. I use valgrind all the time (more later on that). I am arguing however that, for most people and most projects, sanitizers are the way of the future. And yes, it is in large part due to workflows.

      Valgrind is an extra tool not typically bundled with your compiler. Having an extra dependency as part of your build system is not ideal. I know it is fashionable to build projects with long lists of dependencies, but I think it comes at a cost.

      I can automate sanitizer flags as part of a cmake build trivially (I have done so in many projects). As far as I can tell, a tool like cmake does not come with support for valgrind. You will probably need to instruct your users to install valgrind if you depend on it. Note that you need to check the valgrind version because valgrind needs to interpret all instructions in your binary (an old valgrind won’t do).

      If you are using gcc and clang already, you have the sanitizers at your disposal, just one flag away. At least under Linux, the sanitizers will catch many more problems than valgrind.

      The instrumented code will run faster. I did not point it out but I am glad you did: valgrind is unusable for some use cases. A sanitize test that would run in under a second could take a minute to run under valgrind thus, effectively, breaking your workflow.

      It is even better with sanitizers because you can also easily instruct the compiler to only check some of the code with sanitizers (at least under clang). In a large project, this can make a massive difference.
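
      As one hedged illustration of this kind of selective checking, clang lets you opt a single function out of a sanitizer with the no_sanitize attribute. The macro name NO_ASAN and the function sum_array are hypothetical, made up for this sketch, and I only enable the attribute under clang since other compilers may need a different spelling.

      ```c
      #include <stddef.h>
      #include <stdio.h>

      /* Assumption: only clang is asked to honor the attribute here;
         on other compilers the macro expands to nothing. */
      #if defined(__clang__)
      #define NO_ASAN __attribute__((no_sanitize("address")))
      #else
      #define NO_ASAN
      #endif

      /* Hypothetical hot function that we choose not to instrument. */
      NO_ASAN
      long sum_array(const int *values, size_t count)
      {
          long total = 0;
          for (size_t i = 0; i < count; i++) {
              total += values[i];
          }
          return total;
      }

      int main(void)
      {
          int values[] = {1, 2, 3, 4};
          printf("%ld\n", sum_array(values, 4));
          return 0;
      }
      ```

      The rest of the program is still checked when you build with -fsanitize=address; only the marked function is left alone.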

      Sanitizers have been getting fancier and more useful over time. It is no longer just about undefined behavior: more sanitizers keep getting added. You can also use them to detect and debug data races. I really stand by my statement that serious projects should use sanitizers. I also make the prediction that they will keep on getting more useful with each new generation of compilers.

      There are benefits to valgrind, of course. You point one out: it allows you to run a check on an unmodified binary. That’s important because sanitizer flags, as far as I know, always create a modified binary. So you are not checking the binary that will actually run (assuming you run code without sanitizers), and that’s a concern.

      Also, sanitizers are still kind of new and sometimes flaky. So, for example, the sanitizers under macOS do not detect leaks (they do detect access violations however). Valgrind does detect leaks.

      Implicit in my assessment is the prediction that the sanitizers will fix their problems and become easier to use. So far, this prediction has stood.

      Valgrind will always remain useful but, in my view, not as a central component of the regular workflow of most projects, the way sanitizer flags should be.

  7. I think you are wrong that it is easier to add sanitizers to a typical workflow (unless you are talking about the scenario where you turn the sanitizers on always, i.e., only produce a sanitized binary). Of course it is easy to add sanitizers to CMake, because CMake is a build system and sanitizers are added at build time. All versions of CMake (even those that don’t natively support sanitizers) perfectly support Valgrind at build time because there is nothing you need to do to your build to support Valgrind.

    Every type of continuous integration framework is going to support running the command as you want it, which includes prefixing valgrind. I have worked on many such systems and adding Valgrind would never be harder than adding the sanitizers! Adding the sanitizers is often harder because it requires a different set of specially instrumented binaries produced at build time. Imagine you are testing 100s or 1000s of changes a day: you’ll have some type of sophisticated system for moving the build artifacts around to the places they need to get to, and suddenly multiplying the number of build artifacts by N (for the N types of sanitizers you want to run) is a big deal. Setting up Valgrind is trivial: it works just like any other test on the existing binaries.

    Also, at least in the early days, and to some extent now, the “dependency” for sanitizers was much worse than Valgrind: you need specific compiler versions or even a different compiler (clang had it first and is still ahead of gcc)! Valgrind is an independent component that works with any compiler. Yes, it’s a dependency – but most serious projects have dozens or 100s of them. Changing the compiler version is a far bigger, cross-cutting concern compared to adding a dependency.

    Sanitizers are better because they cover many more issues, and do it faster (and this difference is fundamental because they work at the source level). They will probably “win” in the long run due to those advantages and because the amount of resources poured into clang is orders of magnitude more than a tool like Valgrind. I reach the opposite conclusion as you though: sanitizers can win despite being harder to integrate into existing workflows!

    BTW: it’s the same story for profilers: some profilers require you to generate a specially instrumented binary, and then run that to get your profile. These are almost always harder to integrate than profilers that work “as is” on any existing binary (including those that do runtime re-instrumentation). I don’t see any reason for sanitizers to be different.

    1. unless you are talking about the scenario where you turn the sanitizers on always, i.e., only produce a sanitized binary

      I suggest you develop your code using sanitizers.

      My view is that sanitizers basically change the game entirely. They bring C/C++ to a level closer to Java, Rust and so forth. That is, they make it easier to produce C/C++ code that is safe and bug free. They can become tightly integrated in your programming.

      So checking leaks is not a separate step. The check is right there, each time you run the program. Same with overflows and so forth.

      Of course, it is possible to also release the code with sanitizers, and I bet that many teams do that, but I imagine that the release would do away with the sanitizer. Then, of course, you should probably run the release code with valgrind.

      Other than that, yes, I agree that sanitizers are flaky, capricious. But it has been getting better.

      Annoyingly, Microsoft does not seem to be interested in introducing the equivalent in its compilers.

      1. I see!

        Yes, that is a reasonable approach for local development, and in this case I agree: it is totally transparent to your workflow since you just change a compiler flag and leave it like that. Sanitizers are fast enough that it’s reasonable for most work. You’ll catch bugs quickly this way.

        Unfortunately last time I checked you couldn’t really do this with all sanitizers since some were mutually exclusive, but you can still pick a reasonable default set. See this question for example.

        1. Yes. Sanitizers are not as easy to use as they could be and my understanding is that they are just not available under Visual Studio. That’s a shame.

          I am not putting down valgrind in the least, but I think that sanitizers are underrated.

          1. I think they’ve come a long way in popularity since 2016, at least. I certainly hear more about sanitizers than I do about Valgrind.

            They are definitely a “competitive advantage” for compilers. I often use clang over gcc (which was always my default) because of its better sanitizers (and because originally only clang had them at all), and on Windows there is a big incentive to use clang-cl or whatever rather than MSVC cl.exe so you can get access to the sanitizers. The MSVC compilers are progressing fairly rapidly (at least compared to the past 15 years), so maybe they’ll get this stuff soon.

  8. Hi Daniel, could you please elaborate a bit why you think sanitizers are better than valgrind? Not that I disagree, but I’d like to know your reasoning.

  9. Interestingly, especially regarding “sanitizers versus valgrind”:

    Valgrind isn’t available for armv5(tejl), because the architecture is missing some instruction. So I found that Debian ships libasan for arm-linux-gnueabi (which this armv5 is) – hurray! But now, your above case (malloc() without free()) does not trip on arm5, and neither on armv7(l) with gcc-5.4.0 or gcc-6.3.0, while it does on x86_64.

    At the same time, libasan generally does work on armv5 (unlike valgrind), as introducing an out-of-bounds access (e.g. to array element n in array i[n]) to the source does trigger an error. So I wonder why it’s partially working on arm. It depends on the class of address violation? I’d really like to catch all possible errors. Well, better than nothing so far…
