What constitutes a CPU core? That’s the question on the minds of many in the wake of the news concerning a class action lawsuit against microprocessor company AMD. In a suit filed by Tony Dickey of Alabama, Dickey alleges that AMD falsely claimed that its Bulldozer-based CPUs had a higher core count than they really did. Essentially, he claims that all the eight-core, six-core and four-core CPUs using Bulldozer cores are actually only quad-core, tri-core and dual-core, respectively, and that AMD intentionally hid this fact from consumers.
Dickey’s assertion is that since a Bulldozer cluster (two Bulldozer cores on a single module) only has one FPU (floating point unit) shared by both cores on the module, only the cluster itself, not the Bulldozer cores inside, should count as a complete CPU core, thus cutting actual core counts in half.
Dickey claims that all of this is AMD’s ill-conceived attempt to match Intel’s CPU performance by hacking together its own version of Hyper-Threading. Hyper-Threading is an Intel technology that lets a single CPU core run two threads at the same time, whether they belong to two different programs or to one program, by sharing the core’s execution resources between them.
Dickey also claims that even in tasks that do not make use of the FPU, CPUs with Bulldozer cores do not scale the way one would expect for a given core count. The example he cites is that 8-core Bulldozer CPUs exhibit similar performance to Intel quad-core CPUs in the same tasks.
Ultimately, Dickey alleges that AMD intentionally lied about the performance characteristics of its Bulldozer-based CPUs, even though it has been common knowledge since day one that Bulldozer cores share FPUs, and that AMD’s design provides lower FPU performance than Intel parts. At no point has AMD ever tried to claim its Bulldozer parts had exceptional FPU performance.
What is a core?
So we’ve discussed the basics of the case, but we still haven’t answered the question at the heart of the matter: what is a CPU core?
That is unfortunately not an easy question to answer, partly because no two manufacturers design their cores the same way or with the same principles in mind. It’s much the same way automobile engine manufacturers all make engines for the same task, i.e. driving, but each goes about it differently, with similar but different parts in different configurations for different styles of vehicles and different workloads. They could all make the same part, but none of them would ever make a profit. It’s the differences between the engines, and similarly between CPUs, that drive sales.
This presents a problem: how do you define a complete CPU core when nobody makes them the same way? If you take Dickey at his word, no CPU core is complete unless it has its own FPU. But the history of computer technology is littered with CPU designs that didn’t have an FPU at all, and still everyone called them CPUs without question.
Maybe we should keep it simple and stick to just the x86 instruction set. After all, Bulldozer is an x86 part, and the suit compares Bulldozer with x86 cores made by Intel. But then again, history tells us some interesting things about Intel’s CPU cores. In fact, Intel’s first CPUs didn’t have their own… well, anything really. They relied on numerous support chips, like external MMUs and, yes, even an optional external FPU.
And that is really a telling point: until the early 1990s, Intel didn’t even offer an integrated FPU as standard in its own x86 cores. Up to that point, as a consumer, you would buy an x86 PC with an Intel or AMD x86 CPU, and on the motherboard would be a second socket for an optional FPU add-on chip. Sometimes the FPU came pre-installed; otherwise you had to buy one on your own. Today, virtually every x86 CPU comes with some kind of integrated FPU, but how the FPU is used or shared in each part is completely up to the designers, with no real rules or regulations to determine how it is to be implemented.
That leads to another question: does any CPU actually need an FPU? The answer to that question is quite easy: NO. No Turing-complete x86 CPU ever made actually needs a dedicated floating point unit to do floating point math. Does it make it easier? Yes. But with the right programming even ENIAC, an early digital computer from the 1940s, could do floating point math. AMD’s Bulldozer parts could be custom made right now to not include FPUs at all and they would still be capable of computing floating point math. They would be a lot slower at it, but they would still get the job done.
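To make the point about doing floating point math without an FPU concrete, here is a minimal sketch in Python that multiplies and adds floating point values using nothing but integer shifts, adds and multiplies. The (sign, mantissa, exponent) layout, the 24-bit mantissa width and the function names are all assumptions chosen for illustration; this is a toy format, not IEEE 754 and not how any particular CPU or library does it. The Fraction type is used only to display and check results, not for the arithmetic itself.

```python
from fractions import Fraction

MANT_BITS = 24  # assumed mantissa width, picked to resemble single precision


def normalize(sign, mant, exp):
    """Keep at most MANT_BITS bits of mantissa, adjusting the exponent."""
    if mant == 0:
        return (0, 0, 0)
    excess = mant.bit_length() - MANT_BITS
    if excess > 0:
        mant >>= excess  # drop low bits (plain truncation, not IEEE rounding)
        exp += excess
    return (sign, mant, exp)


def sf_mul(a, b):
    """Multiply: XOR the signs, multiply mantissas, add exponents."""
    sa, ma, ea = a
    sb, mb, eb = b
    return normalize(sa ^ sb, ma * mb, ea + eb)


def sf_add(a, b):
    """Add: shift both mantissas to the smaller exponent, then add integers."""
    sa, ma, ea = a
    sb, mb, eb = b
    e = min(ea, eb)
    va = (ma << (ea - e)) * (-1 if sa else 1)
    vb = (mb << (eb - e)) * (-1 if sb else 1)
    total = va + vb
    return normalize(1 if total < 0 else 0, abs(total), e)


def to_fraction(x):
    """Exact value of (sign, mantissa, exponent); used for display only."""
    sign, mant, exp = x
    value = Fraction(mant) * (Fraction(2) ** exp)
    return -value if sign else value


a = (0, 3, -1)  # 3 * 2**-1 = 1.5
b = (0, 9, -2)  # 9 * 2**-2 = 2.25
print(to_fraction(sf_mul(a, b)))  # 27/8, i.e. 3.375
print(to_fraction(sf_add(a, b)))  # 15/4, i.e. 3.75
```

Every step is an integer operation that an FPU-less core can perform, which is why the work gets done, only with many more instructions per result than dedicated hardware would need.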
With regard to the case, Dickey asserts that the shared FPU does not behave like the FPU in Intel parts. This too is a well-known fact, and one of many reasons why AMD targeted Bulldozer-based parts toward gamers and enthusiasts who wouldn’t need a lot of FPU capacity. Simply put, Intel designed its CPU to fit certain workloads, i.e. office and server work, and AMD designed its CPU to fit other workloads, like gaming and streaming the results live on the internet.
However, all that being said, there is more going on in AMD’s and Intel’s FPUs than the case lets on. For instance, Intel’s FPU in each of its own CPU cores operates on 128-bit floating point instructions, whereas Bulldozer’s FPU operates on 256-bit floating point instructions. Another difference is that Bulldozer’s FPU is capable of working on two 128-bit instructions simultaneously or on one 256-bit instruction that processes more data at once. In fact, a single Bulldozer FPU outperforms Intel FPUs when used on 256-bit operations, but when used to calculate two 128-bit instructions, as when two cores are sharing the FPU, it lags behind two Intel FPUs each doing one of the 128-bit instructions. This isn’t a surprise, and is indicative of how Intel and AMD have different priorities when designing CPUs.
The important thing to take away here is that the FPU in Bulldozer parts can be used by both cores simultaneously as though it is two FPUs working individually. It doesn’t quite work as well as Intel’s solutions, but nobody designs the same exact part unless somebody wants to get sued. One will always outperform the other because they are never identical.
So, if we base our definition of a CPU core on all the CPUs that have come before, and on the fact that CPUs do not need FPUs to do floating point math, then we would logically not require any core to have an FPU in order to meet that definition.
The second part of Dickey’s case hinges on his assertion that 8 core Bulldozer CPUs do not scale in performance the way one might expect an 8 core CPU to scale up when 8 threads are being worked on at one time. Essentially, Dickey believes that Bulldozer parts behave more like true quad cores rather than true 8 core parts.
This, luckily, is not a difficult claim to prove or disprove. Anyone with an AMD FX CPU can test this out. I happen to have an 8 core FX-8350 to test and a few programs capable of utilizing 8 cores.
I’m using HandBrake for this test as it uses a single process with 8 threads. This is important, because if we use a program that uses multiple processes to utilize all 8 threads, then it’s not truly a test of multi-threaded scaling. HandBrake was given a 1080p video to convert into an MP4 file. To produce the results, I started with HandBrake having access to all 8 cores, then I began cutting off access to each core, one at a time, by changing the affinity settings for the process in Task Manager. Below are the results.
What we see here is exactly what one would expect if the FX-8350 were a genuine 8-core part. I could have used a bunch of other programs to back up these results, but with how clean the HandBrake data is, I don’t see any point in continuing to collect data. If AMD were really making CPUs with something like Intel’s Hyper-Threading, you wouldn’t see this kind of nearly perfect linear scaling. That only happens when dealing with single-thread cores. Based on performance scaling data, the FX-8350 most certainly is an 8-core CPU.
Based on my own experience with the Bulldozer microarchitecture, I can say with zero doubt that Dickey’s claims regarding performance scaling are completely false.
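For anyone who wants to repeat the scaling check, the arithmetic behind "linear scaling" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be elapsed encode time per core count (the test above does not specify its metric), and the affinity helper uses the Linux-only os.sched_setaffinity call as a stand-in for the Task Manager affinity setting used in the test.

```python
import os


def speedup(times_by_cores):
    """Map core count -> speedup relative to the 1-core run.

    times_by_cores is assumed to hold elapsed times, e.g. seconds per encode,
    keyed by the number of cores the process was allowed to use.
    """
    base = times_by_cores[1]
    return {n: base / t for n, t in sorted(times_by_cores.items())}


def efficiency(times_by_cores):
    """Speedup divided by core count; 1.0 means perfectly linear scaling."""
    return {n: s / n for n, s in speedup(times_by_cores).items()}


def restrict_to_cores(n):
    """Pin the current process to cores 0..n-1 (Linux only).

    Assumption: stands in for changing affinity in Task Manager on Windows.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    os.sched_setaffinity(0, set(range(n)))
```

Feeding efficiency() one measured time per core count gives a single number per step. A value close to 1.0 at eight cores corresponds to the linear scaling described above, while a part that really behaved like a quad-core under eight threads would land closer to 0.5.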
Follow the money…
It is, in this writer’s opinion, highly suspect that Dickey has filed his case now, almost 4 years after the Bulldozer microarchitecture was released in late 2011, just as AMD has announced that its new Zen microarchitecture will arrive early next year. Zen is predicted to bring in large profits for AMD if its performance claims hold true.
It’s been a matter of public record from day one that the Bulldozer microarchitecture was built around modular clusters, and that, outside of vendor customized parts, each cluster would consist of two Bulldozer cores, and a single FPU. This information has been available for years, and only now, in the wake of recent news, has somebody decided to do something about that information. Again, even if Dickey’s claims are true, it seems highly convenient for this case to appear just as AMD is about to come into some decent profits.
I would be surprised if the judge handling the case doesn’t see right through it for what it really is: one big, well-timed money grab.