With AMD closing the gap with Intel, competition in the CPU market is back after a ten-year absence.
What I'm waiting for is an AMD GPU that can compete with a top-tier Nvidia offering. Vega is nice, but not really a contender at the mid to high end. The G-series CPUs with Vega inside are great, but where is the 2080 Ti, or even 1080 Ti, killer? Even something a bit slower, but close, would be great.
Is it too much to ask AMD to handle both? I'm not sure, but I would love to see Nvidia in a price and performance war at the same time Intel is in one. Competition makes the resulting products better. Do you think the 9th-gen Intel chips would be eight-core without Ryzen?
> MIOpen is a step in this direction but still causes the VEGA 64 + MIOpen to be 60% of the performance of a 1080 Ti + CuDNN based on benchmarks we've conducted internally at Lambda. Let that soak in for a second: the VEGA 64 (15TFLOPS theoretical peak) is 0.6x of a 1080 Ti (11.3TFLOPS theoretical peak). MIOpen is very far behind CuDNN.
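As a rough sanity check using only the numbers quoted above (the 0.6x figure and the two theoretical peaks), the implied software efficiency gap is even worse than the headline suggests:

```python
vega_peak, ti_peak = 15.0, 11.3   # theoretical TFLOPS, from the quote above
relative_perf = 0.6               # measured VEGA 64 / 1080 Ti throughput

# How much of its peak-FLOPS advantage the MIOpen stack actually converts
# into throughput, relative to cuDNN on the 1080 Ti: the VEGA 64 has a
# 1.33x FLOPS advantage on paper but delivers only 0.6x the performance.
relative_utilization = relative_perf / (vega_peak / ti_peak)
print(round(relative_utilization, 2))  # -> 0.45
```

In other words, per theoretical FLOP, the MIOpen stack is delivering less than half of what cuDNN gets out of the 1080 Ti.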
The market made it pretty clear that as long as Nvidia was increasing performance substantially with each release, they could collect a healthy margin of profit. There was no incentive to artificially slow down their progress. (Additionally, the benefits of GPU power were clear to gamers, and new benefits from parallel processing units became apparent in the areas of cryptocurrency and machine learning.)
Intel, however, ran into engineering issues with their process improvement, while having limited market-driven incentive to spend and innovate improvements to their efficiency (IPC) beyond a certain level (or to increase core count.) So their "tick-tock" cycle was broken.
Nvidia came to my place of work to do a presentation on the stuff they were working on around AI and deep learning. More than anything, I took away the message that regardless of how large your budget is, they've got a way for you to spend all of it on GPU compute.
AMD is still behind per core in both clocks and per-clock performance. Intel has demonstrated the ability to keep matching AMD's largest offerings on core count by scaling up architectures it has been milking cheaply for high margins for years.
TSMC's 7nm node will be the first legitimate tech advantage over Intel, and if Intel gets its act in gear and manages to deploy its 10nm next year, it will at least keep parity.
I am a huge AMD fan and will almost certainly be considering a Zen 2 build next year (though the fact that I can disable Intel ME but can't remove AMD PSP will temper my interest), but that's more to support the underdog and get better value for money than to get the absolute best possible performance.
Intel has been the undisputed champion of process since the 80486 days so it's surprising to see how badly they're scrambling to hit even 10nm, let alone 7nm.
I would never have bet money that the tiny little CPU designed by Acorn Computers that ended up powering the Newton would be the first CPU to jump two nodes ahead of Intel in terms of process, but here we are.
ARM's doing great work and I hope they continue to push core counts to even more ridiculous levels.
> it's surprising to see how badly they're scrambling to hit even 10nm, let alone 7nm
The feature size there is a bit misleading – "Intel 10nm" and "TSMC 7nm" are roughly equivalent, with IIRC the Intel 10nm process actually having a higher transistor density. The TSMC chips aren't really "two nodes ahead" – but it is likely that Intel will lose its process lead for maybe a year or so.
I get the impression that Intel overextended on their 10nm process, in that they were perhaps a bit more ambitious than other manufacturers, and it came back to bite them when there were scaling problems. On the other hand, the last I heard was that the scaling problems experienced with the 10nm node haven't held up Intel's 7nm node, which could well see them re-establish their process lead.
At the end of the day, it's great that the market is seeing some more competition, so hopefully we will all be able to enjoy the benefits from a variety of manufacturers soon!
> I get the impression that Intel overextended on their 10nm process, in that they were perhaps a bit more ambitious that other manufacturers and it came back to bite them when there were scaling problems.
I’m no microelectronics expert, but I wonder if we are hitting limits in clock speed scaling with regard to feature size - i.e. shrinking past a certain feature size, clock speeds actually have to drop for the chip to be stable.
Intel’s priority is clock speed first and foremost due to what they produce - desktop and server CPUs. A new process is pointless for them if they can’t get at least equal clock speeds out of it as their old process.
TSMC caters to mobile CPU and GPU production - those will never boost to 5 GHz like desktop CPUs; the former for power efficiency (and heat) reasons, and the latter because it tends to go for more “cores”, as it focuses on parallelizable workloads.
As I understand it, it's not chip speed. It's chip voltage. Everything is a conductor if the voltage is high enough, and the closer the traces get, the less resistance the insulation provides. The problem is that at the temperatures we run computers at, the conductor traces need a fair bit of voltage to push the current through the entire chip.
The ratio of conductivities of insulators and conductors stays the same.
It's more that making many very long conductors with very short insulators between them becomes problematic. That was the case at any process size, but now we are pushing the limits as far as possible to try to make bigger chips.
I don't doubt that Intel will get it together and remain competitive but right now they're really in a bad place. They're usually a step or two ahead, even when pushing ridiculous designs with no merit at all like the Pentium 4 or Itanium. To see them scrambling now to catch up is pretty much unprecedented.
I'm talking about how ARM got there first on the process, and now AMD has a chance to fab using it as well. AMD got out of the fabrication game because they couldn't keep up, which means they can use specialists like TSMC, which are killing it now.
Intel's largely "secret sauce" process has been their greatest asset. Now it looks like a huge liability.
> AMD is still behind per core in both clocks and per clock performance.
Yes, that's true, but the per-clock performance is close. They are only 5-10% behind in single-threaded tasks without AVX (depending on the workload). The IPC increase is expected to be 10-15% (it will of course depend on the workload). And their Achilles' heel, AVX performance, will also improve with Zen 2 (256-bit instead of 128-bit, etc.).
Due to 7nm, the clocks (for consumer hardware like Ryzen and Threadripper) will probably also increase (not 5 to 5.2 GHz after overclocking like Intel, but up to 4.7 GHz overclocks could be possible, seeing that 4.3 GHz is possible on the current node, which is mobile-optimized).
Depending on how much the clocks increase I believe they can close the gap. Maybe even pass Intel. The future certainly looks promising for AMD.
I wonder if they will revive their X APUs for the Server. In the past they had Opteron X APUs to increase the compute density of servers. Now with Zen and Vega this could be a nice combo in addition to discrete GPUs.
Imagine replacing these two 32 core Epyc CPUs made in 14nm with two 32 core Epyc APUs (Zen 2) made in 7nm, which would use the saved space due to 7nm for Compute Units, and you might get an additional 10-16 TFlops per System. Which is basically one additional GPU.
People keep thinking AMD's 7nm is this amazing thing, as if everyone were talking about the same thing when it comes to CPUs and nm. In reality, nm has just been marketing fluff for a decade now, just like response times in monitors.
AMD's 7nm might get them close to on par with Intel's current 14++ nm chips, but it's not like AMD has really figured out how to make the entire CPU half the size.
People who follow the tech press talk about the same thing when talking about nm, and they know TSMC 7nm competes with Intel 10nm not Intel 14nm.
It's been all over town for months that TSMC's 7nm is estimated to be worse than Intel's ambitious failure that is 10nm, but quite a bit better than Intel 14nm (with the exception of clocks), and that 7nm+ with EUV for cost savings (which TSMC already taped out last month) is estimated to be equal or even slightly better.
So I'm not really sure what to make of your comment?
> AMD's 7nm might get them close to on par with Intel's current 14++ nm chips, but it's not like AMD has really figured out how to make the entire CPU half the size.
No, they didn't, but they don't claim that, do they? From what they say, they decided on the IO die exactly because IO doesn't scale as much, and that decision allowed them to double the number of cores. Since 7nm is expected to be much more expensive than previous nodes, this seems really clever from a money standpoint as well. The core-only Zen 2 chiplets are expected to be around 70 mm², which is mobile SoC territory.
AMD is already close in IPC to Intel even though AMD uses a worse node (GloFo's mobile-optimized 14nm is more like Intel 22nm than Intel 14nm) and wins in multithreaded workloads because their SMT implementation seems to scale better than Intel's. They also seem to have better performance/watt under load. I have not seen idle wattage numbers for Xeons, but Intel's desktop CPUs are slightly (5-10 watts) better at idle.
So I'm looking forward to them having the better node for the first time ever.
>For whatever reason, Nvidia hasn't stagnated the way Intel has.
Nvidia doesn't fab their own chips and never had a process lead.
People underestimate just how huge a deal Intel's traditional lead in fabrication tech was. I've long argued that the real casualty of Intel's anticompetitive tactics in the early 2000s was AMD being forced to spin off GloFo. Far more than AMD's near-term market share at the time, it led to a situation where AMD couldn't even fall back to their traditional position of competing on price at the low end of the market, and it played a direct hand in Intel's decade-plus domination of the market.
>Nvidia doesn't fab their own chips and never had a process lead.
I would argue that has nothing to do with Intel's stagnation. Look at Intel's leadership and management. Look at Jensen Huang. The last time Intel had any energy at the management level was with Pat Gelsinger, and they pushed him out.
When I built my PC, it seemed like Intel had a pretty small single-core edge but AMD had twice the cores per dollar. Easy choice if you do anything but gaming, imo. I really hope they catch up in the GPU space soon.
“A lot of productivity PC builds” are going for $1700 Intel CPUs... really? Even if the $1700-CPU PC market were really popular, tell me: why would they choose the slower of the $1700 CPUs, other than corrupt or bureaucratic business practices?
And don’t try to argue how Intel’s $1700 18-core CPU is faster than AMD’s similarly priced 32-core CPU because Intel’s has slightly faster per-core performance. Such an argument would be absolutely absurd: the point of an 18-32 core CPU is NOT the single threaded performance :)
> Intel is still king in high-end gaming performance
Not really if you consider the price per core. Normally designed games utilize all cores, and something like Ryzen 7 2700X provides a major benefit. Comparable Intel CPUs are a lot more expensive. Their only advantage is higher overclock frequency. But if you need to overclock your CPU to play something, that game is already poorly designed and is probably not using all cores properly.
It means they are poorly designed, which is exactly my point. It's not really a measure of CPU quality, but rather a measure of those games' quality. Normal games today use something like Vulkan to saturate the GPU and should not be CPU-bound.
So if you need a single thread performance that requires overclocking, it's a poor engine design.
> It’s moot regardless since Intel’s i9-9900K has 8 cores and 16 threads too
And costs a lot a lot more. That's why I mentioned price per core above. I'd use such price difference to get a better GPU instead.
Unfortunately, time is at a premium in game development - the amount of crunch is already absurd.
Multithreading hasn’t gotten any easier.
Even when games are forced to multithread, like on consoles, those same games run on PCs with half the cores (admittedly at nearly twice the clock speed and with higher IPC) and outpace the consoles at 2x the frame rate.
> And costs a lot a lot more. That's why I mentioned price per core above. I'd use such price difference to get a better GPU instead.
Of course, it’s the best on the market. Intel would be stupid not to charge a premium. It’s how such things are priced.
My point is, for gaming there is no need to spend so much money just to get higher single core frequency. There are some games that are very poorly optimized, but I see them as edge cases which you can skip if it becomes an issue. Most games don't require overclocking really.
That’s correct. I never buy the top of the line because I know it doesn’t have a good cost:benefit ratio.
BUT there are people that want the absolute best available and have the money to afford it ... /shrug
> There are some games that are very poorly optimized, but I see them as edge cases which you can skip if it becomes an issue.
There are a lot of games that aren’t well threaded.
Well-multithreaded games come primarily from rich AAA developers - and not even all of them do it; some just don’t have the programming talent for it, and some have games that have run for decades and are too old to multithread without rewriting the whole game.
PS: Sorry for late reply. Apparently people disagreed with me and I had negative Karma for a while. Which slows down posting?
Most game source code I've seen has exhibited this "poorly designed" trait. Some because it was originally written in a single-threaded context and continued to provide shareholder value, and others because it didn't have high enough performance needs to utilize parallelism.
I think that will slowly change over time though, especially for big-budget titles that want to scale with performance better. Architectures like Unity's Job System and the specs package in Rust with a stronger emphasis on staged data processing can help with utilizing cores and cache.
> They're ahead of Intel and are widening that lead.
For awhile in the early 2000's, AMD's CPUs were supposedly better than Intel's CPUs. There was a lot of doom and gloom predicted.
I briefly worked for Intel during this period. At an internal quarterly meeting, they shared some confidential information. It was very simple, and very damning to AMD. (In short, Intel very quickly came back on top for reasons that were very obvious to anyone paying attention.)
I'd love to be a fly on the wall at Intel right now. I wonder if they really are falling behind, or if they know some things that we don't?
I'm sorry, that's deeply revisionist history. Intel kept their dominance through illegal market practices, despite AMD's tech advantage at the time. Intel eventually paid out >$1B, but by then the damage was done, and it would take AMD almost 10 years to come back.
> But illegal business practices only have impact on economics.
That's patently false. As OP stated, the biggest impact of Intel's illegal business practices was getting rid of any competition for over a decade in spite of having a technically inferior and underperforming product line.
The basic principle of GPUs: performance scales linearly with transistor count and die size. Since a GPU is nothing more than a massively parallel beast, the more you throw in, the better.
You can't really expect a 400mm2 GPU to compete with an 800mm2 GPU. So unless AMD makes a monster-sized GPU, they will never be able to compete directly with Nvidia.
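A crude linear-scaling estimate makes the point; this is purely illustrative, and the base figures below are hypothetical, not specs of any real part:

```python
def scaled_perf(base_tflops, base_area_mm2, target_area_mm2):
    # Naive model from the comment above: GPU throughput scales roughly
    # linearly with die area, since area buys more parallel units.
    return base_tflops * target_area_mm2 / base_area_mm2

# Hypothetical 400mm2 part at 10 TFLOPS vs a same-architecture 800mm2 part.
print(scaled_perf(10.0, 400, 800))  # -> 20.0
```

Under that (admittedly idealized) model, the smaller die simply cannot close a 2x area gap through architecture alone.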
So why doesn't AMD make one? Economies of scale. Nvidia can afford the huge cost of design, testing, and the (relatively) low yield of an 800mm2 chip, as long as they have customers buying it in bulk. Nvidia is basically enjoying all the deep learning / machine learning money buying into their CUDA ecosystem; they can afford to make such a bet, and they are selling the chips as fast as they can make them.
AMD doesn't have this luxury, and Lisa Su knows it well; that is why they can only compete in the segments that make sense, until the day ROCm can compete directly with CUDA and demand is high enough that AMD can afford a monster-die-size chip. But AMD already has plans to use the same chiplet strategy for GPUs, and hopefully everything learned with EPYC will be put to full use in those GPUs.
AMD's resources are limited, and they picked their priorities properly:
1. CPU first, GPU next, since a breakthrough on the CPU side is easier than on the GPU side - just go with more cores via chiplets, since Intel has basically stopped innovating - while the GPU side will be much tougher.
2. Data center first, consumers/gamers second. Vega is not meant to compete with the best Nvidia card; it was designed to handle both data center/ML needs and gaming needs, and maybe the gaming version is just a placeholder. The data center version will bring more profit and buy time for AMD to develop the software ecosystem - CUDA is Nvidia's moat, and AMD needs time to overcome that.
So 7nm is used on the data center version instead of a gaming card, which makes perfect sense for AMD.
Well, not really, only low end and a bit of midrange. No chance to buy a trivial retina display with any Ryzen U. Like they just dump excess inventories of TN and HD IPS panels at whatever AMD has to offer, even if Ryzen U is far more suitable to 2.5k/3k/4k screens than any Intel UHD.
AMD, please consider making your own premium notebook brand to teach your 3rd party manufacturers what your APUs are capable of!
Specifically retina-style display. I have a perfect vision, can't go back to 1080p and use all-retina/HiDPI screens exclusively for 5 years already. Even got the very latest LG 5k2k ultrawide display yesterday.
I think it's impossible, because the whole concept of the Zen architecture is to produce 4-core dies and then glue them together to create a gigantic 8-core or 16-core processor. There isn't a 6-core (or 8-core) Ryzen mobile processor because it wouldn't physically fit inside a laptop.
But Rome architecture, unveiled yesterday uses 8-core dies, so there's hope.
I was looking into getting an A485 Thinkpad which has a Ryzen in it. Sadly, from reviews it sounded like AMD's platform isn't as good at getting into really low power states as Intel is and that shows up in battery life.
I saw some pretty slick business models come out from HP. Actually held one that somebody showed me, it seemed well built.
It'll obviously take time for high-quality AMD-based laptops to become a normal thing, but with the CPU being on par and the GPU clearly being better than Intel's offering, it should really only be a matter of time.
Yo! I just bought an E485 from Lenovo, and it's definitely not "low". 4C/8T 2.5 GHz, Vega 10 graphics, 16GB RAM, 128GB SSD and 1080p 14" matte display.
The A485 is better spec'ed with docking capability, ports, and an external battery.
Linux works on it with some tweaks, and in 4.20 full support is added.
Granted, it's harder to find, but I am hopeful that with Ryzen performing well, and their Vega GPU beating the crap out of Intel integrated and competing with Nvidia MX150, that we'll see more respect from laptop makers.
First there are 'node wars' where what it means to be a "10nm" process or a "7nm" process has become rather murky. This is because transistors themselves don't work well at these sizes and you start getting novel structures which make it hard to compare things. Back when all transistors were flat rectangles it was easier but now they all have some amount of a verticalness to them (FinFets) and there are various patents around this stuff and so nobody talks about 'transistor' size any more they talk about 'feature' size. But what is a feature? Is it a polysilicon line? (equivalent to a trace width on a printed circuit board) Or is it the smallest thing you can render with your lithography process?
But all of that explains, when you step back, what has happened to Intel. It used to be that what Intel was doing in their fabs would take other fabs 2-3 years to match, and those were big differences, like copper, or FinFETs, or smaller feature sizes. But as time goes on, the features get harder and harder to develop, so when you're two years behind, the difference appears smaller and smaller.
What is more the cost of trying something that doesn't work out is more and more expensive and time delaying. And costs are huge here.
That introduces part three of the puzzle: as fabs have been closed and companies have switched to using TSMC, we have gone from having a dozen semiconductor companies spending their R&D budgets on their own process improvements in competition with Intel, to those dozens of companies sending their R&D dollars to TSMC, which in aggregate can spend more on R&D than Intel does while still being profitable.
So the bottom line is that the market has settled out and there are just a few giant foundries (GlobalFoundries, TSMC, Intel, Samsung, etc.), and one of them doesn't make a business out of others' use of its equipment (Intel). Worse, the biggest consumer of silicon has become phones, and Intel isn't a serious player there.
Intel is under siege and I think they know it (they certainly act like they know it).
In my experience, no company is uniquely talented forever. The reasons for that are complex as well (Christensen did a good job of sketching the mechanisms in The Innovator's Dilemma). Every time you have a CEO change, there is a new vision of the "secret of this company's success", which is likely not the same as the previous CEO's vision. As a result, different things get emphasized, or priorities get shifted, and the next thing you know, that center of excellence isn't as excellent as it once was.
From an organizational dynamics point of view it really shows the value of process as a means of preserving institutional integrity and durability, but we're humans and communities change. So companies change, some employees leave, some new ones show up, and the mix may not be as effective as the previous mix in getting stuff done.
I really admire Dan Warmenhoven's management philosophy which was very low politics. People serve their ambition in one of two ways, by lifting up the community around them or by pushing everyone but themselves down. The latter type destroy companies and senior management's role is to be the antibody that detects and then removes the offending folks.
So you need to build a healthy community of employees who are working toward lifting the company further. The people who do that, and the community itself, however move on eventually, and the special quality of the group can erode over time.
>I really admire Dan Warmenhoven's management philosophy which was very low politics. People serve their ambition in one of two ways, by lifting up the community around them or by pushing everyone but themselves down.
Just because a company loses its competitive edge doesn't mean that they weren't better at what they did at one point. Sears just filed bankruptcy, but that doesn't take away the impact they had on retail throughout the 20th century.
The explanation I've seen most frequently is that process sizes aren't really comparable between fabs any more. In this case, that would mean that Intel's 10nm process is equivalent to another fab's 7nm process.
Comparing processes between fabs might not make sense, but the real question is — does it still make sense within a single fab? If so, then it's still very much the case that Intel, for whatever reason, has been stagnant while AMD has been moving forward at a fair pace, which still lends credibility to the narrative that AMD is closing the gap and Intel is losing some of its lead. Of course, as has been the case for decades, it still remains to be seen whether this will translate into AMD taking more of the market.
I think the HDL guys were running around with their hair on fire for Spectre bugs, hence why it was a straight transistor shrink with next to no logic changes. It's previously unheard of to not take advantage of a process shrink with logic changes; so much of your logic decisions are ultimately rooted in the process node.
You could still specify the maximum possible transistor density for the process. It doesn't mean a concrete design actually has to use it. Or make it an SRAM bit, because caches take up the bulk of the area anyway.
It's a useful number, but it's not the whole story. Fitting more transistors into a given area lets you put more chips on a wafer, which is good economically. And it correlates with performance, but, for example, Intel has traditionally accepted more restrictive design rules in exchange for more performant transistors, and that has hurt their effective transistor density even if their individual transistors have been fast.
The problem is that you need to be able to produce it in volume. The originally projected 10nm is better than TSMC 7nm - that is assuming the 2019 10nm is still the same spec, which rumour has it is not. And it will be up against TSMC 7nm+, the next generation of 7nm.
Let's just assume they are equal in absolute terms. By late 2019, Intel would have barely launched 10nm, possibly shipping in 30-50M quantity (and I think even that is an optimistic number). TSMC would have shipped more than 300M chips across their entire 7nm generation.
And TSMC has 5nm ready in 2020. I don't think Intel will have their EUV 7nm ready even in 2021.
Combine that with the fact that there is exactly ONE EUV equipment maker on the market: ASML. And they have limited capacity for producing these machines. As far as I am aware, all of the 2018 and 2019 capacity is already locked up by Samsung and TSMC.
Yeah, I don't think anyone would argue that Intel has been having issues in recent years shrinking their process. The interesting question would be whether other fabs would end up facing similar issues with their next process node.
A nanometer is the same everywhere, but what you’re measuring isn’t. When they say 7nm, are they talking about the smallest feature they can produce, the minimum wire size, the minimum transistor size, the average transistor size, or...?
For an analogy, a GHz is a GHz everywhere but that doesn’t mean a 3GHz CPU is always faster than a 2GHz CPU.
If AMD can suggest they are on a smaller process size because they are measuring a smaller feature, why wouldn't Intel just start measuring the same feature on their chips? I have trouble believing they would stick to some principle about what is the right feature to measure at the cost of losing out on marketing themselves.
7nm does not refer to any feature size. Process node names have continued to follow the pattern of the next node being named as roughly the current node divided by sqrt(2), even though density increases are no longer coming from simple uniform horizontal shrinks.
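That naming pattern is easy to see in a quick sketch: each nominal node is the previous one divided by sqrt(2), which would halve transistor area per node if it were still a true uniform shrink (area scales with the square of the linear dimension):

```python
import math

# Start from a nominal 14nm node and apply the historical ~1/sqrt(2)
# naming convention three times.
node = 14.0
names = []
for _ in range(3):
    node /= math.sqrt(2)
    names.append(round(node))

print(names)  # -> [10, 7, 5]
```

So "10nm", "7nm", and "5nm" fall out of the naming convention itself, independent of what any feature on the chip actually measures.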
Like others wrote, there's no longer a standard for what the measurement actually means. Most structures aren't actually 7nm in a 7nm process.
For example, a typical metal pitch on the low metal layers is 40nm, meaning you get one wire every 40nm, or 25 wires in parallel in a 1um channel.
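The pitch arithmetic there is straightforward (a sketch using only the example numbers above):

```python
pitch_nm = 40      # one wire every 40 nm on a low metal layer
channel_nm = 1000  # a 1 um routing channel

# How many parallel wires fit in the channel at that pitch.
wires = channel_nm // pitch_nm
print(wires)  # -> 25
```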
What Intel is calling 10nm does indeed appear to be close to the others' 7nm. Then again, Intel is seriously behind on 10nm, so the bottom line remains the same: they seem to have essentially lost their process advantage.
My best guess is a combination of historical inertia and the fact that the names are actually meaningful, just not in the way that one might naively expect as an outsider.
When the foundry sends you a design kit which contains all their design rules and tooling around a process, then this process has some codename that appears everywhere (think filenames, names of library elements, and so on). This codename tends to be something like GF14 (for GlobalFoundries' 14nm) or N7 (for TSMC's 7nm) plus cryptic suffixes for different revisions of the foundry process.
So the 14/12/10/7nm terms are actually part of the design engineers' everyday work flow. They just also filtered through to marketing for whatever reason.
I could imagine that at some point in the future, foundries will switch to a year-based versioning similar to what happened with a lot of software. So you'll have a GF2027 process and so on. That's pure speculation on my part though, and inertia is definitely a thing.
Nanometers is a marketing term now, just like frequency used to be a marketing term when talking about CPUs.
In reality, the sizes of various features in a CPU differ widely. Intel 10nm could have a transistor gate pitch of 50nm, while TSMC 7nm could have a pitch of 60nm. All the meaningful parts you care about, like the size of the transistor components and interconnects, are _not_ that small, and every company designs its own tweaks on these building blocks for reliability/manufacturability/performance/power/etc.
Semiaccurate actually has some pretty detailed dives into exactly what is going wrong with Intel's 10nm but it's all subscriber only. They've also done some reporting on leadership problems inside Intel that I think make the execution failures more understandable.
Intel wanted to do a number of ambitious things with their 10nm node such as putting contacts over gates and using cobalt for wiring. These were gambles of the sort they had made before but this time the advances just didn't work out. Now they're trying to do a 10nm process without CoG and I hope it works out this time.
But the story we want told is: who got complacent? Was it the C-levels making budget decisions? Was it engineering staff leaving for greener pastures, or R&D decisions that ended up at dead ends with nothing to show for them?
It's easy to point at the big player and say they lost their lead, but the fine detail of who made the decisions that landed them there is the story we want told.
The best way to compare CPUs would be performance/watt benchmarks which is what matters ultimately to the customer; in this case, data center customers. Let me clarify - we are not talking about Geekbench type benchmarks. Each customer has their own validation & qualification process for new datacenter chip procurement.
The public gets hung up on all kinds of marketing terms (7nm, 7nm+, 10nm, 10nm+, 10nm++, etc.). What does 7nm+ even mean!? It is purely a marketing term, and large datacenter customers know this. They run their workloads on test samples and make a decision to go with Intel or AMD.
Furthermore, there are also the aspects of maintainability, servicing, and infrastructure inertia that are priced into Intel's server chips. An apples-to-apples chip comparison (sorry for the pun, not intended) between Intel and AMD would not be priced the same, since Intel knows there is a giant amount of switching inertia for a customer moving to AMD Epyc. Datacenter customers also want predictability and proven performance. Here, Intel again wins with its history, and you betcha it's modeled into the pricing.
So, this is all business as usual. HN loves beating on Intel but their numbers in quarterly reports depict a different story.
Let me repeat: no sane customer gives a shit about 7nm or 10nm. My comments are only applicable to datacenter customers. Desktop/client chips are a whole other enchilada, where marketing plays a bigger role (have you seen the ridiculous packaging from AMD & Intel? It's there to please the RGB gamer crowd).
You can always use your 1nm tech to craft more precise circuits of the same size, packing them better. When it comes to the working chips that get made, assuming a tighter lithography that's equally functional to the previous one, it's only going to help.
Basically it seems like Intel over-extended on their 10nm process by trying to introduce a bunch of new techniques at once, and they had trouble scaling it to volume production. But I think it's generally accepted that Intel's 10nm process and other manufacturers' 7nm processes were broadly equivalent, and it seems unfair to accuse people of being shills for thinking so!
These are very nice and detailed feature figures, but I do not see that Intel gains any advantage from them (possibly my reading comprehension is failing here...). Can you cite a source that clearly shows Intel is manufacturing and selling something better than the competing fabs?
Increasing the vector width to 256 bits (assuming no crazy thermal throttling) is a pretty big deal and would get me to move off Intel, unless Intel can figure out 512 bit widths without massive throttling.
That's really a matter of Intel's 10nm process (which is roughly equivalent to TSMC 7nm).
AMD used 128-bit units and simulated 256-bit ops by doing two passes. This reduced peak power consumption and let them keep clocks consistently high. That matters because while your AVX code is running slowly, your non-AVX code is also running slowly. There was simply no way x86 could do vectors that wide on 14nm without throttling.
With the 7nm shrink, AMD can use the reduced feature size to move to native 256-bit at full speed (and they may do 512-bit in two passes). I expect Intel to do the same once they replace their 10nm process with something that works. It'll probably be a couple more shrinks, though, before 512-bit can run at full speed.
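A toy model of the trade-off (my own illustrative sketch, not AMD's actual micro-op accounting):

```python
# Toy model (not real silicon behavior): executing 256-bit vector adds
# either natively or "double-pumped" as two 128-bit halves, the way
# Zen 1 handled AVX2 instructions.

def simd_add(a, b, lane_bits, datapath_bits):
    """Add two equal-length vectors of 32-bit values, counting the
    micro-ops needed when each instruction covers lane_bits but the
    hardware datapath is only datapath_bits wide."""
    assert lane_bits % datapath_bits == 0
    passes = lane_bits // datapath_bits          # 256/128 -> 2 passes on Zen 1
    uops = (len(a) * 32 // lane_bits) * passes   # 32-bit elements per lane
    return [x + y for x, y in zip(a, b)], uops

data = list(range(16))          # 16 x 32-bit values = 512 bits of work
_, native_uops = simd_add(data, data, 256, 256)  # native 256-bit unit
_, split_uops = simd_add(data, data, 256, 128)   # double-pumped 128-bit unit
print(native_uops, split_uops)  # the split design issues twice the uops
```

The split design finishes the same work but needs twice the issue slots, which is why it trades peak AVX throughput for sustained clocks.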
This AMD card can compete with NVIDIA's high end Tesla V100 accelerator.
At 7.4 TFlops of double-precision, it is smack in the middle between the PCIe version of the V100 at 7.0 and the NVLink version at 7.8.
Memory bandwidth for the MI60 is a bit better at 1000GB/s, compared to the Tesla V100's 900GB/s.
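A quick back-of-the-envelope check on those figures (using only the theoretical peak specs quoted above; real-world throughput will differ):

```python
# Peak specs quoted above (vendor theoretical numbers, not measured).
mi60 = {"fp64_tflops": 7.4, "mem_gbps": 1000}
v100_pcie = {"fp64_tflops": 7.0, "mem_gbps": 900}
v100_nvlink = {"fp64_tflops": 7.8, "mem_gbps": 900}

# The MI60 sits at the midpoint of the two V100 variants:
midpoint = (v100_pcie["fp64_tflops"] + v100_nvlink["fp64_tflops"]) / 2
print(midpoint)  # 7.4

# Crude compute-to-bandwidth balance: peak FLOPs per byte of memory traffic.
for name, card in [("MI60", mi60), ("V100 PCIe", v100_pcie)]:
    ratio = card["fp64_tflops"] * 1e12 / (card["mem_gbps"] * 1e9)
    print(f"{name}: {ratio:.1f} FLOPs/byte")
```

By this crude measure the two cards are near-identical in balance; the differences lie elsewhere (software, as below).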
However, AMD's problem is usually not the actual hardware but the software around it. NVIDIA has done amazing work with CUDA and the surrounding frameworks, while AMD really hasn't. They need to catch up with software that makes writing code for their GPUs easier.
It is pretty crazy. I felt the same way. Individual x64 cores tend to be so much more powerful than other architectures, and now single chips will effectively have 128 logical cores.
For my purposes (large builds and rendering), I think RAM prices are holding back AMD here. To feed that many cores, you want really big RAM sticks. The CPUs have become a comparatively small cost compared to the RAM these days.
I've recently built a TR-based DL/ML workstation and bought 128GB ECC 2,667MHz UDIMMs for ~$1600, roughly the same price as 2990WX, but would have vastly preferred to get 256GB instead. Unfortunately, only Samsung is now sampling 32GB ECC DDR4 UDIMMs - I haven't seen them anywhere yet, and I expect the price is going to be insanely high :-(
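My own arithmetic on the prices quoted above; the 2990WX price here (~$1799 at launch) is an assumption from memory, not a figure from this thread:

```python
# Rough cost-per-GB from the quoted workstation build.
ram_price_usd, ram_gb = 1600, 128   # 128GB ECC 2,667MHz UDIMMs, as above
cpu_price_usd = 1799                # assumed approximate 2990WX launch MSRP
print(ram_price_usd / ram_gb)                   # USD per GB of ECC UDIMM
print(round(ram_price_usd / cpu_price_usd, 2))  # RAM spend vs CPU spend
```

At that rate, doubling to 256GB would exceed the cost of the CPU itself even before any 32GB-UDIMM price premium.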
The most important bit WRT TR3 is going to be the central I/O chiplet instead of memory controllers divided between individual Zeppelin dies. No more NUMA headaches to deal with on their workstation/enthusiast CPUs. I'm glad AMD saw that such an approach wasn't going to work long-term (at least not while basically anything outside large database systems and hypervisors lacks even basic NUMA-awareness).
It's really hard to say what the memory latency is going to be, but at the very least this will mean that latency will remain consistent for access to every installed DIMM regardless of which CCX the request originates from.
On that note, I'm really interested to see whether a dedicated I/O chiplet will help with the memory frequency scaling issues we see with the IMC on Zen/Zen+. I'm not sure what made the integrated controller on Zen so finicky compared to Intel's IMC, but this move will at the very least allow AMD to bin memory controllers, or maybe work around some issues with their design.
Is anyone knowledgeable enough to comment on the memory bandwidth? I thought Zen 1 was eight-channel with 32 cores; now Zen 2 is the same eight channels with 64 cores. Wouldn't that cause issues, or is the new memory system that much better?
The other interesting thing is that they said memory access would be more uniform, effectively NUMA-independent, given that the controller is no longer part of each individual chiplet but a common element. That definitely makes good performance easier to achieve with such a beast of a chip, but does it come at the cost of the lowest possible latency, as in Zen 1 when the access hit a channel on the same die? I would hope that a single massive I/O chip lets them design the thing better, but does anyone know, or care to guess?
This probably bodes worse for memory performance on (most) VM hosts and better for (most) bare-metal workloads. I don't think anyone is concerned about many-VM workloads lacking the highest possible memory throughput, though, and I doubt going to even 16 channels would make a big difference anyway. It'll be fairly easy to find out whether memory bandwidth needs to scale linearly or whether you hit massive perf losses in the real world, since Intel's competing 2x24-core part was announced with 12 channels.
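To put rough numbers on the channels-vs-cores question (assumption on my part: 8 channels of DDR4-3200, i.e. 3200 MT/s over a 64-bit channel; actual supported speeds may differ):

```python
# Back-of-the-envelope aggregate and per-core memory bandwidth.
channels = 8
bytes_per_transfer = 8          # 64-bit DDR4 channel
mts = 3200                      # mega-transfers per second (assumed DDR4-3200)
total_gbs = channels * bytes_per_transfer * mts / 1000   # GB/s aggregate
for cores in (32, 64):
    print(cores, "cores:", total_gbs / cores, "GB/s per core")
```

Per-core bandwidth halves going from 32 to 64 cores on the same 8 channels, which is exactly the concern raised above; whether that matters depends on the workload's arithmetic intensity.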
There was a longstanding rumor that the IO chip was going to have ~512 MB of L4 cache. Since it wasn't announced, I'm thinking that turned out not to be true, but from a pure performance perspective that is probably more useful than a couple more memory channels (though likely more complicated).
I'm really curious about cache on the IO die. Ian Cutress of AnandTech said Rome is ~1000 mm² in total. Based on that and the pictures of the Rome die that were shown, some users estimated the size of the IO die to be 387-407 mm².
- Zen 1 (8 cores) with IO is 213 mm².
- the Zen 2 core only chiplets are estimated to be around 70 mm².
- if we assume IO scales as well as the rest (which it doesn't) Zen 1 would be ~106 mm² on 7nm.
- let's just say the difference between Zen 2 core only chiplets and the imaginary Zen 1 on 7nm is the size of the IO per Zeppelin die => ~36 mm²
- now double the area again because the IO die is on 14nm => 72 mm²
- now quadruple the size because we have 8 memory channels and 128 PCIe 4.0 lanes => 288 mm²
Going by my flawed layman estimation this would mean we still have a budget of ~100 mm² for additional functionality. Either PCIe 4.0 takes much more space than PCIe 3.0, they have some secret sauce in there, or maybe just a large L4 cache.
If they use eDRAM instead of SRAM, like Intel did with some of their Broadwell and Skylake CPUs, they could probably fit quite a bit of cache in this area. Intel used 128 MB of eDRAM fabbed on a 22nm node, which required 84 mm².
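The estimation steps above translate directly into code; all figures are the rounded guesses from the list, not official die sizes:

```python
# Reproducing the (admittedly flawed) layman area estimate step by step.
zen1_with_io = 213                            # Zeppelin die incl. IO, 14nm
zen2_chiplet = 70                             # estimated core-only chiplet, 7nm
zen1_scaled = zen1_with_io // 2               # ~106 mm^2 if Zen 1 shrank to 7nm
io_per_die_7nm = zen1_scaled - zen2_chiplet   # ~36 mm^2 of IO per Zeppelin die
io_per_die_14nm = io_per_die_7nm * 2          # ~72: the IO die stays on 14nm
io_total = io_per_die_14nm * 4                # ~288: 8 channels, 128 PCIe lanes
for estimate in (387, 407):                   # estimated IO die size range
    print(estimate - io_total, "mm^2 left over")
```

That leftover budget of roughly 100 mm² is what could plausibly host an L4 cache, given Intel fit 128 MB of eDRAM in 84 mm² on 22nm.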
I don't think it's one or the other; rather, it's both. TSMC is definitely starting to lead the foundries, but e.g. nobody expects the next-gen Qualcomm Snapdragon chip to beat the Apple A12X even though both are coming out of TSMC.
IMO Intel is lagging on both fronts: AMD is catching Intel a bit, ARM is steamrolling with year-over-year perf increases compared to x86, and Apple remains 25%+ ahead of every other ARM chip.