Linus Torvalds: AVX512 Is "A Hot Mess" "I hope AVX512 dies a painful death"
Linus Torvalds is not happy with the variety of instructions that may or may not be present on a processor that supposedly supports what Intel refers to as AVX512. "I hope AVX512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on," Linus wrote in a message on the Real World Tech forums. He went on to call it "special-case garbage" and a "power virus", and said it is "not reliably enough there".
written by 윤채경 (Yoon Chae-kyung). published 2020-07-12 - last edited 2020-07-12
Intel stock price 2011-2020: Intel closed at $59.53 per share this Friday. Intel is nowhere near being bankrupt and finished; it is valued like the integral part of the US military-industrial complex that it is.
One of the major annoyances with what is in reality a family of instruction sets under Intel's "AVX-512" banner is that what "AVX-512" actually is varies widely from one CPU family to the next. The AVX instruction set is well defined, and so are AVX2 and most other standardized instruction sets. AVX-512 is a confusing mess. Consider the following table:
| CPU Family | AVX-512 Instructions Supported |
|---|---|
| Knights Landing (Xeon Phi x200) | F, CD, ER, PF |
| Knights Mill (Xeon Phi x205) | F, CD, ER, PF, 4FMAPS, 4VNNIW, VPOPCNTDQ |
| Skylake-SP, Skylake-X | F, CD, VL, DQ, BW |
| Cannon Lake | F, CD, VL, DQ, BW, IFMA, VBMI |
| Cascade Lake | F, CD, VL, DQ, BW, VNNI |
| Cooper Lake | F, CD, VL, DQ, BW, VNNI, BF16 |
| Ice Lake | F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES |
| Tiger Lake | F, VL, BW, DQ, CD, VBMI, IFMA, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES and VP2INTERSECT; WBNOINVD, MOVDIRI, MOVDIR64B, CLWB |
Going by the table, Cooper Lake is the only "AVX-512" CPU family with the BF16 instructions such as VDPBF16PS. A program using any of those "AVX-512" instructions will only run on Cooper Lake processors. A program taking advantage of any VPOPCNTDQ instructions will run on the old Knights Mill family of processors and on the new Ice Lake and Tiger Lake processors. No other "AVX-512" processors support those particular instructions.
"AVX-512" is not a standard; it is more like a marketing term Intel can use to benchmark whatever random set of instructions they decided to throw into the "AVX-512" bucket in their latest CPU family launch.
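The fragmentation in the table can be made concrete with a small lookup. The following is a minimal Python sketch, not anything published by Intel: the dictionary is simply transcribed from the table above (AVX-512 subsets only, with the extra non-AVX-512 entries listed for Tiger Lake left out), and the names `AVX512_SUPPORT`, `families_supporting` and `common_subsets` are made up for illustration.

```python
# Support matrix transcribed from the table above. Only the AVX-512 subsets
# are included; the family names are shortened for readability.
_SKYLAKE_BASE = {"F", "CD", "VL", "DQ", "BW"}
_ICE_LAKE = _SKYLAKE_BASE | {
    "IFMA", "VBMI", "VBMI2", "VPOPCNTDQ", "BITALG",
    "VNNI", "VPCLMULQDQ", "GFNI", "VAES",
}

AVX512_SUPPORT = {
    "Knights Landing": {"F", "CD", "ER", "PF"},
    "Knights Mill": {"F", "CD", "ER", "PF", "4FMAPS", "4VNNIW", "VPOPCNTDQ"},
    "Skylake-SP/X": set(_SKYLAKE_BASE),
    "Cannon Lake": _SKYLAKE_BASE | {"IFMA", "VBMI"},
    "Cascade Lake": _SKYLAKE_BASE | {"VNNI"},
    "Cooper Lake": _SKYLAKE_BASE | {"VNNI", "BF16"},
    "Ice Lake": set(_ICE_LAKE),
    "Tiger Lake": _ICE_LAKE | {"VP2INTERSECT"},
}


def families_supporting(required):
    """Return the families (in table order) that support every subset in `required`."""
    required = set(required)
    return [name for name, subsets in AVX512_SUPPORT.items() if required <= subsets]


def common_subsets():
    """Return the subsets that every family in the table supports."""
    return set.intersection(*AVX512_SUPPORT.values())
```

For example, `sorted(common_subsets())` gives `['CD', 'F']`: per the table, F and CD are the only subsets a program can count on across every listed family, and calling `families_supporting` with any other subset returns a shorter list.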
"I hope AVX512 dies a painful death"
Linus Torvalds has long been a fan of the old-school technology forum Real World Tech, started by Dean Kent in 1996. David Kanter has been managing and moderating it since 2002. Real World Tech is the kind of forum where readers and participants are, and expect you to be, an industry expert, a professional or Linus Torvalds before you post. It is not /g/; installing Gentoo Linux and watching a few good anime shows like "Gabriel Dropout" and "Dragon Maid" is not enough to make you qualified to post on Real World Tech.
Linus Torvalds threw himself into a long discussion on the Real World Tech forum about Intel's "AVX-512" support in their latest processor series. The thread was a response to an article titled "GCC 11 Compiler Lands Intel Sapphire Rapids + Alder Lake Support", written by journalist and kernel expert Michael Larabel of the leading Linux publication Phoronix.
Linus Torvalds has this to say in a message posted with time-stamp July 11, 2020 11:41 am:
> Hope you didn't get too attached to AVX-512. The GCC 11 compiler target for Alder
> Lake doesn't enable it, only AVX2. As Phoronix mentions the target likely is what
> the small cores support and not necessarily the big... but it makes me wonder...
I hope AVX512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on.
I hope Intel gets back to basics: gets their process working again, and concentrate more on regular code that isn't HPC or some other pointless special case.
I've said this before, and I'll say it again: in the heyday of x86, when Intel was laughing all the way to the bank and killing all their competition, absolutely everybody else did better than Intel on FP loads. Intel's FP performance sucked (relatively speaking), and it matter not one iota.
Because absolutely nobody cares outside of benchmarks.
The same is largely true of AVX512 now - and in the future. Yes, you can find things that care. No, those things don't sell machines in the big picture.
And AVX512 has real downsides. I'd much rather see that transistor budget used on other things that are much more relevant. Even if it's still FP math (in the GPU, rather than AVX512). Or just give me more cores (with good single-thread performance, but without the garbage like AVX512) like AMD did.
I want my power limits to be reached with regular integer code, not with some AVX512 power virus that takes away top frequency (because people ended up using it for memcpy!) and takes away cores (because those useless garbage units take up space).
Yes, yes, I'm biased. I absolutely destest FP benchmarks, and I realize other people care deeply. I just think AVX512 is exactly the wrong thing to do. It's a pet peeve of mine. It's a prime example of something Intel has done wrong, partly by just increasing the fragmentation of the market.
Stop with the special-case garbage, and make all the core common stuff that everybody cares about run as well as you humanly can. Then do a FPU that is barely good enough on the side, and people will be happy. AVX2 is much more than enough.
Yeah, I'm grumpy.
Linus Torvalds did not stop there. He went on to elaborate on the practical problems caused by "AVX-512" being a random bucket of instructions that may or may not be present on a given CPU that supposedly supports "AVX-512" in a message time-stamped July 11, 2020 8:34 pm:
> But we have some SIMD-based table lookup
> stuff that's way faster than the integer equivalent both because you're doing a lot of stuff at once, but you're
> also doing stuff where there's no integer equivalent (there's no PSHUFB for a GPR register).
Yeah, and we might even use some of it. We have places where we do "vectorization" by hand and use integer registers to hold as many bytes as possible, and look for '/' or the terminating NUL byte (obviously I'm talking about filename copies) and create a hash of the result at the same time, one (integer) word at a time.
We could possibly even have an AVX512 version.
If it was available, and if it didn't tank performance due to frequency issues.
But it isn't, and it does.
Fragmentation kills your market. The fact is, AVX512 isn't worth it, because it's not reliably enough there. And I don't think it's reasonably ever going to be, because it was never designed to work on low end.
With a new not-even-released-yet CPU's not supporting it being a case in point.
And that makes AVX512 actively bad. It was literally designed not to be used in any generic code, and is basically only useful for "hey, we have this kernel of code that is so hot that we'll just create five different versions of it".
What part of that is hard to understand? It sure seems to be something Intel cannot get its head around, since Intel keeps making that mistake over and over again.
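What the "five different versions" complaint above means in practice can be sketched as a runtime dispatcher. This is only an illustration under stated assumptions: it is written in Python for brevity (real dispatch code would normally be C or assembly), it takes text in the format of Linux's `/proc/cpuinfo` as input, the flag spellings are the ones Linux reports there, and the variant names and the `VARIANTS` table are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical versions of one hot kernel, most specialised first.
# Each entry pairs a variant name with the CPU flags it requires.
VARIANTS = [
    ("avx512_vbmi2", {"avx512f", "avx512bw", "avx512vl", "avx512_vbmi2"}),
    ("avx512_bw", {"avx512f", "avx512bw"}),
    ("avx512_f", {"avx512f"}),
    ("avx2", {"avx2"}),
]


def pick_variant(cpuinfo_text):
    """Pick the first variant whose required flags are all present, else 'generic'."""
    flags = cpu_flags(cpuinfo_text)
    for name, required in VARIANTS:
        if required <= flags:
            return name
    return "generic"
```

On a Linux machine, passing the contents of `/proc/cpuinfo` to `pick_variant` selects one of the five code paths. Every extra "AVX-512" subset a kernel wants to exploit adds another row to the table and another version to write, test and maintain.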
Linus did admit that there are some parts of "AVX-512" that he actually likes, even though he thinks it is overall a "hot mess", in a message time-stamped July 11, 2020 7:46 pm:
> No! Keep up the good work adding sweet SIMD instruction sets that give you an edge
> over AMD, who have been kicking your ass in most other ways.
The thing is, those sweet SIMD instructions aren't making up for the fact that AMD has been kicking ass everywhere else...
That kind of proves my point.
> Actually put them on all your chips, not just some weird random subset so people actually use them.
.. and that's actually a (very big) part of the problem with AVX512.
I'd argue that you simply can't put it in a smaller chip. Not while actually making it worth using. It wasn't designed that way.
Some of the AVX512 parts I like: intel did make things more generic with it. I think the masking stuff is new to AVX512, no?. But that's less about the width, and more about cleaning things up and making them more useful in general.
But I think the 512-bit part is a hot mess, and I think ARM potentially did things much better with SVE2. Exactly because hopefully you can have a small and large implementation co-existing, instead of the fragmentation that is the AVX world.
Are there any actual SVE2 chips and users out that validate that point? Not that I know. I haven't really followed it. But I appreciate people trying to do it right. Maybe SVE2 won't work out well, but at least ARM tried.
Not like Intel.
I'll give kudos to Intel when they do things well, and they do do many things well (well, used to, and I'm still hoping to see the old Intel come roaring back, because it's been so depressing lately). But I'll also point out when I think they've screwed up. AVX512 and transactional memory have been bad, I think. They've been bad both from a technical standpoint, but equally importantly from that "fragmenting the market" standpoint.
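For readers unfamiliar with "the masking stuff" mentioned in the message above: AVX-512 operations can take a per-lane mask that decides which lanes of the result are actually written. The following is a rough Python model of that behaviour for a 32-bit integer add, not real SIMD code. The function name `masked_add` and its `zeroing` parameter are made up for illustration; the semantics are meant to mirror the merge-masking and zero-masking forms exposed by intrinsics such as `_mm512_mask_add_epi32` and `_mm512_maskz_add_epi32`.

```python
def masked_add(src, mask, a, b, zeroing=False):
    """Model a masked lane-wise 32-bit add.

    Lane i receives (a[i] + b[i]) mod 2**32 when bit i of `mask` is set.
    Otherwise it keeps src[i] (merge-masking) or becomes 0 (zero-masking).
    """
    result = []
    for i, (s, x, y) in enumerate(zip(src, a, b)):
        if (mask >> i) & 1:
            result.append((x + y) & 0xFFFFFFFF)  # wrap like a 32-bit lane
        else:
            result.append(0 if zeroing else s)
    return result
```

With sixteen lanes this corresponds to one 512-bit register of 32-bit integers, but nothing in the idea depends on the width, which matches the point that masking is "less about the width, and more about cleaning things up".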
All the concerns Linus Torvalds has raised about Intel's "AVX-512" umbrella appear to be valid. It does seem like "AVX-512" can be accurately described as "a hot mess", "not reliably enough", a "power virus" and "special-case garbage".