Optimized gcc compiling (cflags/cxxflags, distcc, ccache)

How to make your programs compile and run faster with a few small adjustments to your compiler flags.

What are the flags CHOST, CFLAGS, CXXFLAGS?

These environment variables are used by the build process (./configure && make) to optimize the program code when you compile software.

Program files are usually built so they can run on any x86-64-compatible computer. They can instead be optimized for a single CPU type, which makes them run faster on that CPU, sometimes noticeably.

For the best performance, use binaries made for your CPU or compile them yourself, using optimized settings for CHOST, CFLAGS and CXXFLAGS.
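
As a rough sketch of how these variables reach the build: the snippet below assumes an autoconf-style source tree whose configure script reads CFLAGS and CXXFLAGS from the environment, and the flag values are only placeholders. The variables must be exported, otherwise child processes such as configure and make never see them.

```shell
# Export the variables so that child processes (configure, make) inherit them.
export CFLAGS="-O2 -pipe"
export CXXFLAGS="${CFLAGS}"

# A child process now sees the value, which is how configure picks it up.
sh -c 'echo "child sees CFLAGS=$CFLAGS"'

# Only start the build when a configure script is present in the current
# directory (assumption: an autoconf-style package).
if [ -x ./configure ]; then
    ./configure && make
fi
```

The variables can also be set for a single run only, as in CFLAGS="-O2 -pipe" ./configure, which leaves the rest of the shell session untouched.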

The important flags

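The most important flags are -march and -mtune. Older guides use -mcpu for tuning; on x86, GCC now treats -mcpu as a deprecated synonym for -mtune.

-mtune=pentium3 means the code will be tuned to run well on a Pentium3, but will still run on any CPU allowed by the compiler's default -march (on old 32-bit setups, that can be as far back as the i386).

-march=pentium3 means the code may use Pentium3 instructions, so it will only run on a Pentium3 or a newer CPU that supports the same instructions.

When -march=arch is set, -mtune=arch is implied.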
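
To see the difference between the architecture flag and the tuning flag on your own machine, something like the following sketch can help. It relies on GCC's -Q --help=target query and uses -mtune, the name current GCC uses for the tuning flag on x86. The show_target helper is a made-up name for this example, and the flag values are x86-64 examples only.

```shell
# Show which -march and -mtune values GCC settles on for a set of flags.
# Uses GCC's "-Q --help=target" query; show_target is a hypothetical helper.
show_target() {
    gcc -Q --help=target "$@" 2>/dev/null | grep -E '^ +-m(arch|tune)='
}

# Only query when gcc is installed and we are on an x86-64 machine.
if command -v gcc >/dev/null 2>&1 && [ "$(uname -m)" = "x86_64" ]; then
    # Portable: any x86-64 CPU can run it, scheduling tuned for common CPUs.
    show_target -march=x86-64 -mtune=generic
    # -march alone: the matching tuning is picked automatically.
    show_target -march=native
fi
```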

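-O[n] (the letter O, and a number) enables various levels of optimization: -O1, -O2 and -O3, where -O3 is the highest. GCC itself defaults to -O0 (no optimization) when no -O flag is given, although many build scripts pass -O2 by default.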

The available flags depend on what your compiler supports. Check the gcc man page for your compiler version to see the available values for -march and -mtune (or -mcpu).

Generally, only using -O2 (or -O3) is recommended. Some code will fail when more optimization is applied, some fail even at -O2. Distributions like Gentoo Linux filter away flags on specific packages because of this. The higher the optimization, the higher the risk of errors.
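
If you are curious what a higher level actually adds, GCC can list the optimizer options it enables at each level. The sketch below follows the approach shown in the GCC manual (-Q --help=optimizers); diff_opt_levels is a made-up helper name, and the output differs between GCC versions.

```shell
# Print the optimizer options whose state differs between two -O levels.
# Uses GCC's "-Q --help=optimizers" query; diff_opt_levels is hypothetical.
diff_opt_levels() {
    low=$(mktemp) && high=$(mktemp) || return 1
    gcc -c -Q "$1" --help=optimizers > "$low"
    gcc -c -Q "$2" --help=optimizers > "$high"
    diff "$low" "$high" | grep enabled
    rm -f "$low" "$high"
}

# Only query when gcc is installed.
if command -v gcc >/dev/null 2>&1; then
    diff_opt_levels -O2 -O3
fi
```

Lines starting with > are options that -O3 turns on in addition to -O2. If a package misbehaves at -O3, this list is a reasonable place to start looking.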

GCC has an option, -march=native, which automatically selects the instruction sets and tuning for the CPU you are compiling on.

In most cases this is enough:

CFLAGS="-O2 -pipe -march=native"
CXXFLAGS="${CFLAGS}"
TIP: gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 will tell you exactly which flags -march=native activates on the machine you are on.

As an example, gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 outputs the following on a first-generation AMD Ryzen chip:

/usr/libexec/gcc/x86_64-redhat-linux/9/cc1 -E -quiet -v - -march=znver1 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=znver1
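
That line is hard to read because it also lists every feature that is switched off (-mno-...). A small filter, sketched here with a made-up function name enabled_m_flags, keeps only the enabled -m options. The sample input is just the first few options from the Ryzen output above.

```shell
# Read a cc1 command line on stdin and print only the enabled -m options,
# one per line, dropping the -mno-* entries as well as -march= and -mtune=.
enabled_m_flags() {
    tr ' ' '\n' | grep -E '^-m' | grep -Ev '^-m(no-|arch=|tune=)'
}

# Sample taken from the start of the Ryzen output above.
printf '%s\n' "cc1 -E -quiet -v - -march=znver1 -mmmx -mno-3dnow -msse -msse2" \
    | enabled_m_flags
```

On a real machine the filter can be appended to the command from the tip: gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | enabled_m_flags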

There are some other flags you may want to add:

-pipe speeds up the compilation process (no gain at runtime)

-fomit-frame-pointer makes programs faster at runtime, but makes debugging impossible on i686. -O turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging. x86 processors need the frame pointer for debugging, so -fomit-frame-pointer is not turned on by default.

-m3dnow -msse -mfpmath=sse -mmmx are generally not safe, required or desired. -mfpmath=sse is an optimisation analogous to -ffast-math, in that programs that are expecting 80-bit results may be confused (SSE can only do 64-bit precision math). -mfpmath=sse is only the default on x86_64, not on x86. (James Harlow)

-funroll-loops will give you a bigger executable that may be a tad faster.

-O2 -O3 Various grades of optimization. O3 is recommended for most CPUs. NOTE: It's the letter O, not the number 0

-Os This makes the executable as small as possible, but does not optimize for CPU performance. This is a flag worth considering if you are still using a very slow HDD. It used to be a really good option in the early 2000s.

The file $HOME/.bashrc is a good place to set CFLAGS. Use export so that the variable is passed on to configure and make.

Optimized CFLAGS for modern processors

CFLAGS="-O2 -pipe -march=native" makes gcc enable all the features of the CPU it is running on. That is great as long as you never plan to run the resulting binaries on another computer. If you do want to run them elsewhere and still optimize for modern processors, some compromises are required. One you may accept is to optimize for desktop computers made after 2015. That gives you:

  • -mcx16 # CMPXCHG16B, AMD64 2005
  • -mmmx -msse -msse2 # Intel 2001, AMD 2003
  • -msse3 # Intel 2004, AMD 2007
  • -mssse3 # Intel 2006, AMD 2011
  • -msse4.1 -msse4.2 # Intel 2008, AMD 2011
  • -mavx # Intel 2011, AMD 2011
  • -mavx2 # Intel 2013, AMD 2015

Note: Low-powered Intel processors like Bay Trail, Goldmont, etc. DO NOT have AVX/AVX2 instructions.
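
Put together, the list above could end up in your environment roughly like this. The sketch also checks whether the machine you are on reports AVX2, which is worth doing before running such binaries on a low-powered chip. The cpu_has helper is a made-up name, and it assumes a Linux system where /proc/cpuinfo lists the CPU feature flags.

```shell
# Flags from the list above: a baseline for desktop CPUs made after 2015.
CFLAGS="-O2 -pipe -mcx16 -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2"
CXXFLAGS="${CFLAGS}"
export CFLAGS CXXFLAGS

# Succeed when the named CPU feature appears as a whole word in a
# cpuinfo-style file. Defaults to /proc/cpuinfo (Linux only); the optional
# second argument exists so the helper can be tried on a sample file.
cpu_has() {
    grep -qw -- "$1" "${2:-/proc/cpuinfo}" 2>/dev/null
}

if cpu_has avx2; then
    echo "this CPU reports AVX2"
else
    echo "no AVX2 reported: binaries built with -mavx2 may crash here"
fi
```

A binary built with these flags is not guaranteed to use AVX2 everywhere, but the compiler is free to, so a CPU without it can hit an illegal instruction at any point.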