NVIDIA Ampere A100 GPU Specs, Ray Tracing, $200,000 DGX-3, & EGX | GTC 2020 Keynote Recap

by birtan, published on September 2, 2020

– might have some competition from Nvidia CEO Jensen Huang, who today hosted his GTC 2020 keynote from his kitchen, so he's going for the old Linus kitchen set look as opposed to the build-a-kitchen-in-your-office look. The keynote focused on Ampere, Nvidia's new architecture, and the launch thereof. There are a lot of points to go through today. Nvidia's GTC events are typically focused on data center, deep learning, AI, things of that nature, but a lot of the technology comes to gaming, and there was explicit mention of gaming technologies at Nvidia's GTC 2020 keynote this year, including of course RTX and some mention of DLSS as well. So we're going to talk through all of it today, including Ampere and its data center installment right now and how that will come down to gaming later.

Before that, this video is brought to you by CD Projekt Red and the Cyberpunk 2077 PC modding contest. The Cyberpunk 2077 team is hosting a case modding contest that gives winners the opportunity to work with professional case modders to build the ultimate system. You don't have to do any physical modding to enter the contest, just a mock-up with three views of the mod. The theme is "the future is recyclable." The contest ends on May 17th, and Cyberpunk's team will select five winners to partner with pro case modders to make it a reality. Learn more and enter the contest using the link in the description below, or go to cyberpunk net cyber up.

Based on keynotes from Nvidia in the past, we end up with things like a Volta/Turing situation, where Volta technically had a deployment in the Titan V, but for the most part it wasn't really a type of card our audience would buy. Turing did end up picking up where Volta left off, though, and so with Ampere it looks like there's going to be a direct gaming deployment of the architecture, as opposed to something like a Volta/Turing split, but we'll find out later in the year for that. It's still relevant, though, and going through the initial installment of Ampere is going to be important for setting the stage for what Nvidia is working on for the rest of the year.

We ended up watching all of the company's uploads to its YouTube channel; it had about eight at the time of filming. We thought that it was going to be a live stream of a pre-recorded video, but instead they just ended up doing a straight upload in parts. Admittedly, it was at times difficult to listen to because the editing was pretty rough. There were a lot of correction phrases that were thrown in and spliced into it, like this: "Ampere is the largest, most complex processor the world has ever made. Thousands of engineers worked on it for several years, and it came together in this one incredible chip." Someone decided that they wanted to add clarifying statements into an existing cut, and as we know from recording, you record voiceover at different times of the day.

And your voice sounds different, so we can pick it out, but for the most part it doesn't really matter. It's just a keynote.

All right, so there wasn't a lot of RTX news. Let's start with the gaming stuff; timestamps are hopefully below. There was some RTX news where the company mostly reminded everyone of DLSS and its existence. It showed off DLSS 2.0 again. This was not an announcement of DLSS 2.0; that's already been announced, and from the deployments we've seen so far, it is genuinely improved over DLSS in its original deployments and even the more recent 1.9 or whatever it was. Huang acknowledged that, quote, "most people didn't think this would work."

Speaking of DLSS and showing the initial, blurrier installation of it, he even acknowledged that, quote, "the first generation was a little bit blurry," and it really was, so it's good to see that Nvidia is self-aware. Nvidia claims that DLSS 2.0 did a better job than 1080p native did, which we'd assume is talking about 720p DLSS.

Regarding RTX and ray tracing products, it was mostly re-emphasis of the product stack and the focus on ray tracing in terms of just concept and getting it into gaming more. No new RTX cards today, which is not really that surprising. They said, quote, "when we launched, people were skeptical, but now it's here," and just to be fair here, people were skeptical because when Nvidia launched, the RTX cards were dying left and right. We had viewers send us like 13 or 14 or 16, a lot of RTX cards that were dead. And also, to be fair, when you launched, Nvidia, it was about 55 days until the first RTX game came out for a card that had RTX in the name. That's another marketing issue, or just like what AMD has done: you end up in a scenario where you're marketing a product on something with, at launch at least for RTX, questionable existence, so the naming really mattered there. That's why people were skeptical. To give Nvidia credit, absolutely they are correct that DLSS 2.0 is massively improved, and RTX in a lot of ways has finally gotten wider spread in gaming. Look at Control for a better implementation of it, or Minecraft for a look at the most recent implementation that we've tested.

Nvidia talked about its Omniverse solution, which is an RTX server filled with RTX 8000 Quadros. None of these feature new GPU architecture. The company showed off a fully playable physics-based ball game, kind of like Spectraball or other classics. Camera movements in that were somewhat nauseating to watch; it was really hard to look at, but the graphics were the focal point, and you can look at the demo on the channel if you want.

Nvidia Ampere is the big one here, so Nvidia did its usual "world's largest GPU" announcement. You've likely seen the clip of Jensen pulling a video card out of his oven that circulated the other day, and that's what this was. Nvidia Ampere and the A100 processor board is what got the unveil in Jensen's kitchen. This is built for data center and enterprise use, and it isn't a gaming product, clearly, but it is something that will eventually feed into them, and it's worth covering just because it's a major advancement in its respective space.

We learned more about that board in today's video keynote. The A100 processor board weighs 50 pounds, it hosts 8 GPUs via the new NVLink 600 gigabyte per second interface, and it has 6 switches. Some additional interesting facts that were thrown in just for fun include one kilometer of copper traces connecting all the hardware and 1 million drill holes to hold it all together. That last fact isn't much of a surprise given how much Nvidia likes its screws in RTX cards, but Nvidia also noted that it's comprised of over 30,000 components, so those were some busy SMT lines, and it also noted, quote, that it has "the most transistors on one computer ever made."

Nvidia's Ampere also features a new MIG architecture, or multi-instance GPU, for elastic GPU computing. After a somewhat strained rocketship analogy, Huang ended up explaining that MIG allows each A100 GPU to be split into up to seven instances. You can run it as one GPU or as any subset of one to seven, and most cards as they're used now are typically in a one-GPU solution. We're not sure if virtualization is the correct terminology to be used here, but it's sort of virtualizing seven instances, isolated for applications or isolated users in those instances. For data center, the implication is that you can have down-costed access to less computing hardware for applications that don't need as much horsepower, or you can run standalone GPU solutions to host a higher-end user, and they're just sub-users on a card. So if you're Amazon, you may end up doing something like selling six instances to a major AI company and then selling one instance to a university student or something like that.

Ampere's focus is on inference and training, and the partitioning into smaller GPUs seemingly is its primary selling point, other than just the direct speed improvement over the previous generation's V100. Nvidia says that data centers can be architected such that smaller GPU partitions are used for scale-out applications or that large GPU instances are used for scale-up applications. This is out of our coverage scope, so we're just giving you the keywords, and if you work in this area, hopefully that means something to you.

For those in our audience who may do more artificial intelligence, deep learning, or machine learning applications (we know there's a few of you), we'll go through some of the stats that Nvidia presented in its keynote today. Nvidia's performance slide had only the data labels for peak performance, but most of the peaks were close enough to the sustained average, and at Gamers Nexus we admittedly aren't sure exactly what researchers for this type of card want in specs anyway.
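To make the MIG partitioning described above a bit more concrete, here's a minimal Python sketch of the bookkeeping only. It assumes the figures quoted in the keynote (up to seven instances per A100, eight GPUs on the board); the `Gpu` class and tenant names are hypothetical, and this is not how Nvidia's actual tooling works or is invoked.

```python
from dataclasses import dataclass, field

MAX_INSTANCES_PER_GPU = 7  # "up to seven instances" per A100, per the keynote
GPUS_PER_BOARD = 8         # GPUs on the A100 processor board, per the keynote


@dataclass
class Gpu:
    """Toy model of one GPU whose instances are handed out to tenants."""
    allocations: dict = field(default_factory=dict)  # tenant name -> instance count

    def free_instances(self) -> int:
        # Instances not yet assigned to any tenant.
        return MAX_INSTANCES_PER_GPU - sum(self.allocations.values())

    def allocate(self, tenant: str, count: int) -> None:
        # Refuse requests that are empty or exceed what is left on this GPU.
        if count < 1 or count > self.free_instances():
            raise ValueError(f"cannot give {count} instance(s) to {tenant}")
        self.allocations[tenant] = self.allocations.get(tenant, 0) + count


def max_instances(gpus: int = GPUS_PER_BOARD) -> int:
    # Upper bound on simultaneous instances across a multi-GPU board.
    return gpus * MAX_INSTANCES_PER_GPU


# The cloud-provider example from the video: six instances to one big
# customer, the remaining one to a small user (hypothetical tenant names).
gpu = Gpu()
gpu.allocate("ai_company", 6)
gpu.allocate("student", 1)
print(gpu.allocations, "free:", gpu.free_instances())
print("max instances on an 8-GPU board:", max_instances())
```

The sketch treats every instance as the same size, which is a simplification; it's only meant to show the "one GPU, up to seven tenants" idea.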

It's not our coverage area, but for the numbers given, Nvidia claims that A100 FP64 double-precision performance is 20 teraflops versus 8 on V100 Volta with FP64. Assuming the same approach to measuring that number, that's obviously a big jump, but how that translates to real-world performance will hinge on the application. Again, in the world of gaming that we cover, it's certainly not linear to go from TFLOPS and translate that to something like FPS or frame times, but we aren't sure exactly for the deployments here.

The tensor float performance was listed as 16 teraflops on the V100 for FP32, or 160 on the A100 for FP32 tensor float, and 310 teraflops with sparse data optimization. For peak FP16, Nvidia noted the A100 sparse data performance at 625, non-sparse performance at 310 teraflops, and V100 at 125. Nvidia separately noted that most people use FP32 for their work in the space, and so it focuses on FP32 and not FP16. INT8, or integer 8, performance has Nvidia claiming that the A100 is, quote, "the first processor over 1 petaflop," marking the A100 at 1250 teraflops peak for sparse, 625 non-sparse, and using the V100's 60 teraflops for INT8 as a reference point.

Nvidia also used a speech recognition demo to identify birds based on the sounds they made, primarily using this as an example of how the instances process data when split or combined on the A100 cards. With all seven MIGs working as a single GPU, just as a reference point, Nvidia noted 500 queries per second, whereas it compared this to Volta's 80 queries per second for the same application.

The next major announcement was DGX, which is Nvidia's mini supercomputer that it sells to business and enterprise and data center clients. We've actually seen parts of the definitely-not-DGX being made at the definitely-not-Cooler Master factory as well, and the new one features the same gold mesh faceplate as the previous generation of DGX. Nvidia's new DGX A100 solution is the third installment of its DGX line; it's had two before this.
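Going back to the A100 versus V100 peak numbers quoted above, here's a small Python sketch that just divides one by the other. The figures are the non-sparse peaks as best they can be read from this recap, not independently verified specs, and the dictionary keys are our own labels. As noted, a ratio of peak numbers doesn't translate linearly into real-world performance.

```python
# Peak figures as quoted in this recap (non-sparse). Units are teraflops,
# except the speech recognition demo, which is in queries per second.
PEAK = {
    "FP64": {"V100": 8, "A100": 20},
    "FP32 / tensor float": {"V100": 16, "A100": 160},
    "FP16": {"V100": 125, "A100": 310},
    "INT8": {"V100": 60, "A100": 625},
    "bird demo (queries/s)": {"V100": 80, "A100": 500},
}


def speedup(v100: float, a100: float) -> float:
    # Simple ratio of the quoted A100 figure to the quoted V100 figure.
    return a100 / v100


ratios = {name: speedup(row["V100"], row["A100"]) for name, row in PEAK.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Keeping the sparse figures out of the table is a deliberate choice here, since the V100 reference points quoted in the keynote have no sparse counterpart to compare against.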

Nvidia says that it's been optimizing for training, data analytics, and for inference, and because the DGX has 8 of the Nvidia A100 GPUs, it can be split into 56 instances for simultaneous users, or it can be used as 8 GPUs if you want to do it that way. The machine has nine Mellanox ConnectX-6 interconnects. This is worth pointing out just because Nvidia only recently acquired Mellanox, or at least, well, ate Mellanox. I was gonna say merged, but it didn't really do that; they just consumed them. The NICs are at 200 gigabits per second. For CPUs, Nvidia has tapped AMD's 64-core Epyc Rome processors. It's got 2 of them in the new DGX version 3, and those are running 128 cores total for each DGX box, including one terabyte of memory between them for system memory.

The new NVLink also makes an appearance, again at 600 gigabytes per second. Really important note here: NVLink in these applications is not the same as that bridge that you get for gaming cards. NVLink for gaming is still using SLI in terms of its architecture, but NVLink in data center is dealing with a lot more data going across the bridge. So when they say new 600 gigabyte per second NVLink, don't think you're gonna get a 600 gigabyte per second NVLink bridge for gaming, because it's not gonna need all that data. They're already at cut-down rates for the existing deployments, and that's what makes them so much cheaper for gaming versus the professional bridges you can buy.

Other specs on DGX include 15 terabytes of PCIe NVMe SSD storage and 4.8 terabytes per second of bi-directional bandwidth. The DGX will cost $199,000, making it actually pretty significantly cheaper than previous DGXs we've seen, where they've been four hundred grand, for example. And if you felt buyer's remorse from the 2070 Super launch about buying an RTX 2080 and maybe being able to have spent $200 less if you had waited a few months, imagine being the guy who bought a $400,000 mini supercomputer and then hearing that the new one's two hundred grand.

Nvidia gave an example of a 25-rack AI data center priced at an estimated $11 million and said it required, in this example, 630 kilowatts to run, noting it had 50 DGX-1 systems (so that's not the most recent, but two generations ago) with 600 CPU systems for inference. It said that that's compared to the newest DGX A100 solution, which would be $1 million for one rack at 28 kilowatts with a major space reduction.

Nvidia used the PageRank algorithm as a benchmark here and the Common Crawl dataset to test performance of the two devices, noting 2.6 terabytes of data and 128 billion edges, calling this a small fraction of the internet. They're definitely correct on that. Nvidia said that it usually takes 3,000 servers and 105 racks at the same time to analyze 52 billion edges per second, versus four DGX A100s via NVLink, combining them into one giant DGX, basically, as Jensen Huang described it, to process 688 billion edges per second. Huang then made the usual "the more you buy, the more you save" comment, but this time kind of laughed at himself, so he's apparently in on the joke now.

Nvidia also announced its Nvidia EGX A100 solution, and this seems to have a heavier focus on security and authenticated boot and IoT to some extent. EGX has a Mellanox ConnectX-6 network card that's integrated onto the PCB with the A100, and this solution's 100 gigabit per second Ethernet or InfiniBand is an onboard solution. The two together are what differentiate this as EGX as opposed to the other A100 deployments that we've seen in Nvidia's keynote today.

It noted that its focus for these is on automation and advancement in training, and highlighted that it's got a partnership with BMW now, where Nvidia provided some stats that are actually kind of interesting for car enthusiasts: it said BMW builds 40 car models with 100 options every day. It apparently imports 30 million parts to do this across 2,000 suppliers, distributes those to apparently 30 factories, and then referenced just-in-time manufacturing, where new crates of parts are dropped off as the old ones are getting emptied.

And the assembly part, at least, is almost entirely done by automation at this point, or human-assisted automation. This is all done to assemble one car in 56 seconds, which is insane for a number of reasons. Well, that's really more of a BMW stat than it is an Nvidia stat. Either way, Nvidia is partnering with BMW, USPS, and some others for robotics and automation training, so that'll be an interesting thing to watch videos of if you like to see robots doing stuff. They have some clips of it in their keynotes.

That about covers us for all of Nvidia's announcements for GTC 2020. The big one is obviously Ampere. It is not currently a solution that we're going to be getting here and overclocking or working with for gaming tests, but it's something that should lead into the next gaming architecture, which is definitely due out this year, for another RTX launch, probably 3000 series, as everyone's calling it right now.

So that's it for this one. Thanks for watching, and subscribe for more. As always, go to patreon.com/gamersnexus or store.gamersnexus.net to help us out directly, and we'll see you all next time.

