Well, it is a nice 3-operand, RISC-like approach, so you can finally do A+B=C in a single opcode: AMD's proposed SSE5 is supposed to go along this line as well. Later, with fused multiply-adds, this could even become A*B+C=D. Then, you get a more efficient instruction format with a lot of baggage (and length) removed - again, one of the major problems of x86 compared with efficient fixed-opcode-length RISCs. No need to mention the good ship Itanic here: it can have more instruction FORMATS than some RISCs have instructions - that's how 'elegant' it is.
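To make the operand-count point concrete, here is a toy sketch in Python. It is not real AVX or SSE code: the register file is just a dictionary of lists, and every function and register name is made up for illustration. It only shows why a destructive two-operand add needs an extra copy when you want to keep both inputs, while a three-operand add does not.

```python
# Toy register file: each "register" is a list of lanes.
# All names here are hypothetical; this models operand shapes, not real opcodes.

def add_two_operand(regs, dst, src):
    """Destructive SSE-style add: dst = dst + src (dst is overwritten)."""
    regs[dst] = [x + y for x, y in zip(regs[dst], regs[src])]

def add_three_operand(regs, dst, src1, src2):
    """Non-destructive AVX-style add: dst = src1 + src2."""
    regs[dst] = [x + y for x, y in zip(regs[src1], regs[src2])]

def fma_four_operand(regs, dst, a, b, c):
    """Fused multiply-add shape: dst = a * b + c.
    Real FMA hardware rounds once; this toy version rounds twice."""
    regs[dst] = [x * y + z for x, y, z in zip(regs[a], regs[b], regs[c])]

def sum_two_operand_style(regs):
    """Compute C = A + B while keeping A and B; returns the op count."""
    regs["C"] = list(regs["A"])      # op 1: copy A so it is not clobbered
    add_two_operand(regs, "C", "B")  # op 2: C = C + B
    return 2

def sum_three_operand_style(regs):
    """Same result in a single operation."""
    add_three_operand(regs, "C", "A", "B")  # op 1: C = A + B
    return 1

regs = {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]}
ops_old = sum_two_operand_style(regs)
c_old = regs["C"]
ops_new = sum_three_operand_style(regs)
c_new = regs["C"]
fma_four_operand(regs, "D", "A", "B", "C")  # D = A * B + C
print(ops_old, ops_new, c_old == c_new, regs["D"])
```

Both styles leave A and B intact and produce the same C; the only difference is the extra register copy in the two-operand version.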
Then, AVX doubles the SSE register length to 256 bits - doubling the amount of data that fits in and, matched with doubled data paths, providing twice the FP throughput per clock in the Sandy Bridge CPU some two years from now. And, one day maybe, you could fit two quad-precision 128-bit FP numbers into each of these registers. Marvellous!
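The register-width arithmetic is easy to check. The short Python sketch below (the helper name `lanes` is ours, not anything from Intel) counts how many values of each floating-point width fit into a 128-bit SSE register versus a 256-bit AVX register.

```python
def lanes(register_bits, element_bits):
    """Number of whole elements of a given width that fit in one register."""
    return register_bits // element_bits

SSE_BITS, AVX_BITS = 128, 256
FORMATS = {"single": 32, "double": 64, "quad": 128}  # FP element widths in bits

for name, bits in FORMATS.items():
    print(name, lanes(SSE_BITS, bits), "->", lanes(AVX_BITS, bits))
```

Every lane count doubles, which is where the "twice the FP throughput per clock" figure comes from, provided the data paths are widened to match as the text says.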
But then, are these innovations really that new? Back at the turn of the century, there was something called EV9 - a 21464 Alpha CPU planned for somewhere in 2005. The thing was proposed to have one (possibly two) 8-way superscalar EV8 cores, each multithreaded of course, plus a dedicated vector engine with 16 MB of L3 cache. Now, that was to be an interesting beast for a general-purpose CPU: a 1024-bit wide monster, with matching L3 cache width, and 32 1024-bit wide vector registers (yeah, four kilobytes of numbers in there).
Closer look at Intel's AVX instruction set
Posted on Saturday, May 03 2008 @ 14:15 CEST by Thomas De Maesschalck