May 302011

In my computer architecture course, I was told that a processor is like a car. The steering and the pedals are the ISA, the engine is the microarchitecture, and the software is the driver. I will extend that analogy further, and say that using a compiler is like using a remote-controller to operate the car. While remote controls are great, it is also important to understand their inner-workings. I see a lot of code –in professional software– that can confuse even the smartest compilers. In this post I cover three common tricks to confuse a compiler.

Before I fully dive into compilers, I want to add a quick disclaimer: While I do agree that our compilers need improvement, I also believe that compilers will forever have melting points and can always benefit from a little help from the programmers.


A compiler takes your high-level code and converts it into machine code. I view the compiler as the following:

Screen shot 2011-05-30 at 12.56.00 PM

Optimizing the code requires the compiler to perform some extensive analysis. A simple example is dead code elimination (DCE). In DCE, the compiler eliminates code that is not reachable. For example, the loop on lines 3 and 4 will be eliminated by the compiler because the loop’s results are never used.

1: void foo(){
2:   int x;
3:   for(int i = 0; i < N; i++)
4:     x += i;
5: }

In DCE, the compiler performs reverse data flow analysis (def-use) to determine which variables are unused. It then eliminates the lines that are writing this unused variable. For the compiler to perform DCE and other similar optimizations, it must be able to decipher, with 100% confidence, whether or not a variable is going to be accessed. Herein lies our opportunity to confuse the compiler. We can exploit some well-known weak points.

Function calls

Compiler analysis is a very compute-intensive task. For reference, the code of the compiler itself is perhaps the most irregular and complex code in existence. Moreover, the optimization problems emerge very quickly. To keep compile time under control, compilers often limit the scope of analysis to functions and only perform limited–or even no–inter-procedural analysis (IPA). As an added benefit, treating functions as independent entities also allows compilers to compile files in parallel.

Adding multiple layers of function calls, ideally split across files, is often a good method to confuse the compiler.

Lets look at this example:

1: void foo(){
2:   int x;
3:   for(int i = 0; i < N; i++)
4:     x += bar(i);
5: }

A different file contains:

1: void bar(int i){
2:   return i;
3: }

A programmer may think that the only overhead is the function call to bar. However, when compiling foo, the compiler is unable to see that bar does not have any side-effect like writing to a global variable or I/O. For this reason, it disables DCE and the entire useless loop now needs to execute.  This can result in a substantial slowdown.


I have seen a lot of programmers use pointers unnecessarily. In general, the pointer dereference (if anything) is considered to be the main overhead of using pointers. In reality, pointers pose a major overhead by confusing the compiler.

A pointer, in theory, is allowed to point anywhere in memory. As soon as the compiler sees a pointer, it can no longer be sure which variables are being accessed.  Consequently, compilers take a pessimistic assumption that several variables in scope can be accessed using the pointer. This often creates many false dependencies in the eyes of the compiler. My classic example is the following loop I found in an image processing application:

1: for(int i = 0; i < N; i++)
2:  *a++ = *b++ + *c++;

After careful inspection, a human can decipher that the code is really just doing:

1: for(int i = 0; i < N; i++)
2:  a[i] =b[i] + c[i];
3: a += N;
4: b += N;
5: c += N;

The first code confuses the compiler. Even though the loop iterations are independent, the compiler assumes that they are not; thus, the compiler does not vectorize this code. In contrast, the compiler is able to vectorize the second code.

For reference: The vectorized code runs 5x faster. Recall that multi-threading this kernel gave us 4x on 4 cores with some programmer effort. This auto-vectorized version gives 5x speedup with a single core and no programmer effort. I tried both the Intel C Compiler (ICC) 9.1 and  GCC 4.1. I quote results with ICC only because it outperforms GCC in all my examples.

If all else fails, use global variables

Global variables are considered evil for many reasons. I am about to provide one more. Since they have global scope, code involving global variables cannot be fully optimized by the compiler. For example, I noticed that the following code runs 30% faster if N was a local variable than if it was a global variable.

1: for(int i = 0; i < N; i++)
2:  a[i] =b[i] + c[i];


When I declared N as a global variable, the compiler left it in memory and added a load instruction to the loop. When I put N as a local variable, the compiler loaded it into a register.   While I do blame the compiler for this particular behavior (because I did not declare N as volatile), we have to work with what we have.


It is important to keep IPA and pointer-aliasing in mind when writing C/C++ code. This simple consideration can help the compiler generate code which runs much faster.

A related tip: I have observed that both GCC and ICC are sensitive to the link order, i.e., the generated code is a function of the order in which the input  files are specified in the linker command. This can inflict a major impact (up to 10%) on performance due to caches and branch prediction.  If you truly value performance, it is worth manipulating the link order.

  53 Responses to “How to trick C/C++ compilers into generating terrible code?”

  1. Shouldn’t dead code usually be eliminated by the programmer? :-)

    I understand that you’re just trying to make things simple in these examples, but it would be great to see a real-world example where the compiler can eliminate code that shouldn’t have been blatantly obvious to the programmer.

    It would seem that dead-code optimization would be much more useful if it worked on libraries, so that code paths in a library function would be removed if they were never used by the actual program being compiled. To do so you’d need to link the libraries to the main code either at the source or the intermediate representation level. OK, that would be a whole lot slower, but does it really matter if the final build takes 48 hours?

    • Hi Harry,

      Thanks for the comment. You are right:) I try to keep things simple for illustration purposes. I really like questions like yours as they provide me with the opportunity to explain things further for the curious readers, without overloading the casual ones.

      Here is a simple use case where the programmer will not see it easily but the DCE will. It is inspired from code I found in the Linux kernel.

      const int i386 = 0; /* in a different file */
      /* somewhere in another file */
      a = some compute;
      x = a;
      y = a + some other computation ;
      . /* some code that doesn’t access a, x, or y */
      } else {
      /* end of code */

      Now, this entire code can be replace by just bar() but the programmer is unlikely to realize that.

      I agree with your comments that libraries can use extensive optimization. DCE is just one examples but there are a dozen others. I have not even touched instruction scheduling which can really benefit from more computes.

      For completeness, I must mention that there are compilers that do exhaustive searches to optimize code. The first company I know who wrote such a compiler was for VAX by DEC. (It is possible that it wasn’t the first one and I am mistaken).

      • Hmm. Unless the compiler is looking at both files at once it won’t be able to do that optimization, although I guess if the constant is defined in an include file it would be a valid example. In fact, I suppose in cases like this the programmer typically *can’t* remove the dead code because it’s only dead for certain builds – although I have to wonder why i386 would be a constant variable rather than a #define. :-)

        Are you by any chance talking about the compiler that was released when DEC moved to a RISC processor for the new VAX models? I wasn’t doing much programming at the time, but I was using a VAX system and I remember reading about the changes. (VMS is still my favorite OS in many ways.)

        • I was thinking include files actually so DCE will work. In fact, you have constructed a great example there. This is yet another way to trick the compiler and deny optimizations:-)

          Most (non-stupid) use cases of DCE involve different builds. In fact, one can argue that DCE is what allows different builds to co-exist efficiently in the first place. Otherwise, co-existing builds will have higher performance costs.

          I agree that #define will be a more conventional way of declaring i386. However, #define should not be used for a good reason: unlike #defines, consts do not eliminate the code pre-compilation thus even the dead code gets checked for syntax errors. This helps in maintaining code with different builds. My example should still be fine if i386 was #define. The programmer will still not remove the dead code for the reasons you describe and DCE will.

          I have never worked on a VAX myself. I do think that VAX’s virtual memory system is the most elegant VM system I have seen.

    • A lot of dead/redundant code can also appear from the compiler’s code generator.

      • And from generic code generated by things like macros or templates. It’s common to rely on compiler optimization in order to treat code constructs as compile-time determinable when the meta-language itself doesn’t offer such facilities.

        A good example of dead code elimination being important in compile-time generated code is when functions are inlined. Constant arguments can vastly simplify the inlined code after liveness analysis and propagation passes.. which is a big reason why inlining is about more than just removing function call overhead.

  2. pointer example:
    isn’t the ‘restrict’ key word designed to solve this problem? if a,b and c where marked are restricted then the compiler can assume that they do not overlap. does this help the vectorisation?

    • ssam, excellent comment! Yes restrict does help with pointer aliasing issue. I was really hoping someone says that:) You have just made the point stronger. A programmer who understands the compiler better will know and remember to use restrict. Unfortunately, many of us do not. We need more awareness…

  3. also could you comment on link time optimisation. it has been recently included into GCC (-lto). maybe it helps in the function call example.

  4. I have trouble with your pointer example. Without restrict in the pointer declarations, the compiler is surely unable to vectorise either loop. They both suffer from the same problem i.e. that the ‘a’ array may overlap either or both of the ‘b’ and ‘c’ arrays. I have tested both with the ARM C compiler and this seems to be the case. It will happily vectorise both loops with restrict’d pointers but neither without this. Interestingly, without the restrict, it will compile two separate versions of the loop, one vectorised and one not, plus a preceding runtime test to determine whether the arrays overlap. The result of the test determines which version of the code is actually executed.

    Am I missing something?


    • Hi Chirs,

      Sorry my response is a bit late. I agree with you that I need a restrict. However, I have omitted the restrict intentionally to make a point that most programmers do not think about such compilers’ constraints by default. We need more compiler-aware code and it should be second nature to all programmers.

      The Intel Compiler vectorized A[i] = B[i] + C[i] even without restrict. Its a dangerous optimization but I guess it trusts the programmer to stay within array bounds.

      The ARM compiler seems really smart in case of generating both versions and a dynamic test. Its impressive. Actually, it also brings out a great point about Dollar being the only metric in our field: ARM has more financial incentive than Intel to make their compiler better because ARM’s cores are leaner and a good compiler can help ARM more than it will help Intel. Hence, the smarter compilers. As Intel makes leaner cores like Knights, I would expect their compiler to get smarter.

      On that note, I also have a question for you. I really want to play with an ARM core, where can I get a development board? I have been benchmarking A8 using my iPhone 3GS but iOS comes in the way.


      • beaglebaord, pandaboard and gumstix are quite cheap.

      • Thanks for the response, Aater.
        I have to say that the I believe the Intel compiler to be seriously in error for vectorizing the case without restrict. It’s not just dangerous! I would be interested in what other compilers do in these cases but sadly I don’t have the time to investigate…

        ssam’s responses about development boards are all good. Pandaboard and beagleboard are both excellent and very reasonably priced.


        • Maybe ICC doesn’t do it in general, but was able to determine via static analysis in this case that the operands can’t overlap. For instance, if the parameters were tied exclusively to static arrays, and that their indexing is limited to valid ranges known by the loop variable. Arrays do seem a little easier to perform this analysis on than raw pointers.

  5. These are just some of the reasons why C/C++ irritates me more and more these days. Working on a compiler for a language without these flaws will do that :)

  6. 1) void bar(int i){
    return i;

    should be

    int bar(int i){
    return i;

    so it compiles.
    2) Microsoft C/C++ compilers since (at least) Visual Studio 2005 can do Link Time Code Generation and Whole Program Optimization, which changes your flow diagram and eliminates the issues you mentioned related to functions in separate source files.

  7. Nice info …..

    Thanks ..

  8. I’m sorry I didn’t catch this thread until now. Is there a language out there that takes advantage of all of the latest multi-threaded, multicore processors with SIMD hardware?

    I have been looking at OpenCL C and C++ for general purpose, highly parallel computing. However, is this the ideal? I mean C for dealing with vast arrays?

  9. Nice site. On your blogs extremely interest and i will tell a buddies. facdakdcdeck

  10. I read this paragraph fully about the comparison of newest and preceding technologies, it’s amazing article.

  11. Very great post. I just stumbled upon your blog and wished to say that
    I’ve really enjoyed browsing your blog posts. After all I’ll be subscribing in your feed and I hope you write once more
    very soon!

  12. Hey there, You hzve done a great job. I’ll definitely digg it and
    personally recommend to my friends. I’m sure
    they will be benefited from this website.

  13. This results game servers in a student house in the house etc.
    Counter Strike, you will need to do is join a game server.
    China, Taiwan and Western countries server is the worst in the history of
    the CD version does not. If you do is join a game server route to VPNwith FlyVPN client, click here.

    You can set up many service providers, namely those that are played
    on the Server Usability.

  14. It’s an awesome paragraph designed for all the internet
    viewers; they will get advantage from it I am sure.

  15. I quite like looking through an article that will make men and women think.

    Also, thanks for allowing for me to comment!

  16. Thanks for sharing your thoughts. I really appreciate your efforts and I
    am waiting for your next post thank you once again.

  17. To add to the benefits, women do not have to wear mineral products heavily to get
    their desired look. Aloe Vera may be best known for its use in treating burns and skin lesions.

    Minerals are mined and then refined; then they are mixed,
    blended, and tested to produce chemical free makeup products.

  18. Selling your broken auto for hard cash to a good and dependable
    damaged car dealership is likely to lower out a whole lot of problems.

    When looking to sell your junk car, it is a
    good idea to first establish ownership of the vehicle, as
    many companies that buy junk cars only do so from titleholders.
    While continuously offer us the finest customer service.

  19. Then work on your meta description so that it appeals
    to searchers. There are two types of Search Engine Optimisation SEO.
    Your SEO professional will be up to create an online presence and back links which will help improve your ranking position. For companies who are new to and are looking for SEO services, finding a provider who can give professional SEO services can be a great
    challenge. If someone selects any Local SEO firm, all the keywords will
    be easily searchable in local search engines.

  20. Selling your broken auto for hard cash to a
    good and dependable damaged car dealership is likely to lower out a whole lot of problems.
    The experts, who have many years of experience in this field, often suggest junk car owners to
    sell out their vehicles to charitable institutions. Venturing to
    a business that involves making custom cars.

  21. Hurrah! Finally I got a weblog from where I can in fact get useful facts concerning my study and knowledge.

  22. When some onee searches for his essential thing, thus he/she desires to be
    available that in detail, therefore that thing is maintained over here.

    Feel free too visit my web site … short url

  23. Before buying a sex toy, you should always consider the opinion of your partner.
    is part of a growing community of sex positive medical doctors, who treat sexual health related issues
    in both men and women as a holistic team approach. The best tip to improve sex life is to be
    honest with your partner.

  24. La diferencia primordial entre la versión gratuita y la de pago es
    que la de pago permite múltiples cuentas y además de
    esto publicar en G+, páginas de empresa de Linkedin, y Pinterest

  25. As a homeowner, you will have maintenance and other related costs to
    keep your drain system running efficiently. * Edwin Hubbel Chapin once said, “Every action of our lives touches on. These microbes will settle in the tank and digest the waste in the tank.

  26. There is no need to purchase bottled water, which is expensive; you can store your own fresh drinking water for purposes of drinking in vertical
    storage tanks. It is necessary to wisely choose carpets to
    be placed in a room so that they can add color, style and warmth.
    Experience the difference an experienced, reliable
    home theater setup company makes.

  27. Because of this capability, this equipment is mainly used in drainage cleaning where solids have already mixed in with the fluids.
    This tool is a long, hollow, transparent piece of plastic
    branded with one-foot accretion. Many of the Earthships
    have propane tanks for cooking and a backup hot water supply; these are the only utility bills.

  28. When getting rid of drain clogs, avoid using chemical cleaners.
    But make sure that you consider the style your home is built with.
    The most popular types and styles of these include the following:

  29. Thanks for ones marvelous posting! I definitely enjoyed reading it, you will be a great author.

    I will be sure to bookmark your blog and definitely will come back from now on. I
    want to encourage you continue your great job, have a nice weekend!

  30. Through car ramp, raise is possible in the vehicle.
    With the rising temperatures of summer, outgassing becomes
    increasingly problematic. This is probably one of the highest inverter generator rating you can have
    out there at a price of three times lower than any well known other

  31. Most of those supplements work by stopping the cause of baldness.
    Almost all bodies anticipate that bodybuilding supplements is an above allotment of the blueprint back it comes to accretion beef mass.
    To attain this level in the blood, most people must supplement with an additional 4000 to 5000 IU of vitamin D3.

  32. Most of those supplements work by stopping the cause of baldness.
    Dog glucosamine can also cure a dogs inflamed discs, sciatic nerve and inflamed joints-conditions that are usually
    connected with aging. As after the tiring workouts session the body of bodybuilder is malnourished then the
    requirements of post workout supplements increases.

  33. Every sports person whether he is a man or women needs sports supplements in order to avail best results.
    Excess consumption of zinc may cause bleeding stomach and severe abdominal pain. They offer the consumer more
    freedom of choice, an ecological capsule, and a low price.

  34. Another 10% do it kind of well although not at a professional stage.
    I set an example of such information to demonstrate the formatting of a
    typical contact page as shown below:. Predicting i – Phone
    6: Size, features and release date.

  35. This is how quite a few website owners make substantial
    6 and 7 figure incomes each year. Small claims
    court cases are much cheaper than superior court cases for both the plaintiff (the
    person doing the suing) and the defendant (the person being
    sued) because the parties are not allowed to have any attorneys represent them and other
    rules that simplify the lawsuit process, making the
    whole thing much cheaper, faster, and easier.
    However, there are certain things which can assist you differentiate which one is best and which is not.

  36. This is especially useful if the person you’re inquiring about has a less
    than common name. Not only do you want lots
    of people to visit your website but you also want your visitors to stay on your
    site and see if the deal is worthwhile by simply checking out your site.
    A money making attitude plus a money making opportunity
    almost always result in a money making reality.

  37. Make money on autopilot is available for all of us.
    That means you literally can make money 24 hours a day.
    I discovered the way to make money on the internet.

  38. Quality posts is the key to be a focus for the visitors to
    pay a quick visit the web site, that’s what this web page is providing.

  39. This article is truly a good one it assists new the web visitors,
    who are wishing for blogging.

  40. If some one wishes expert view regarding blogging and site-building after that i recommend him/her to pay a quick visit this website,
    Keep up the good work.

 Leave a Reply



You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>