6th HLF – Turing Lectures: A New Golden Age for Computer Architecture

Video in TIB AV-Portal: 6th HLF – Turing Lectures: A New Golden Age for Computer Architecture

Formal Metadata

6th HLF – Turing Lectures: A New Golden Age for Computer Architecture
History, Challenges and Opportunities
Title of Series
No Open Access License:
German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Release Date

Content Metadata

Subject Area
David A. Patterson: "A New Golden Age for Computer Architecture" In the 1980s, Mead and Conway democratized chip design and high-level language programming surpassed assembly language programming, which made instruction set advances viable. Innovations like Reduced Instruction Set Computers (RISC), superscalar, and speculation ushered in a Golden Age of computer architecture, when performance doubled every 18 months. The ending of Dennard Scaling and Moore’s Law crippled this path; microprocessor performance improved only 3% last year!  In addition to poor performance gains of modern microprocessors, Spectre recently demonstrated timing attacks that leak information at high rates. The ending of Dennard scaling and Moore’s law and the deceleration of performance gains for standard microprocessors are not problems that must be solved but facts that if accepted offer breathtaking opportunities. We believe high-level, domain-specific languages and architectures, freeing architects from the chains of proprietary instruction sets, and the demand from the public for improved security will usher in a new Golden Age. Aided by open source ecosystems, agilely developed chips will convincingly demonstrate advances and thereby accelerate commercial adoption. The instruction set philosophy of the general-purpose processors in these chips will likely be RISC, which has stood the test of time. We envision the same rapid improvement as in the last Golden Age, but this time in cost, energy, and security as well as in performance. Like the 1980s, the next decade will be exciting for computer architects in academia and in industry! 
This video is also available on another stream: https://hitsmediaweb.h-its.org/Mediasite/Play/8832b244b24f4030ad62aca59c4dbeff1d?autoStart=false&popout=true The opinions expressed in this video do not necessarily reflect the views of the Heidelberg Laureate Forum Foundation or any other person or associated institution involved in the making and distribution of the video.

Good morning, I'm Raimund Seidel. I'm the director of Schloss Dagstuhl; for those who don't know what that is, it's the computer science version of Oberwolfach, and for those who don't know what either of those is, just ask the people around you. I have the honor of leading the first session this morning, and I'd like to start with a question to the young researchers here: who is interested in advice on having a bad career? If so, David Patterson, our first speaker today, is a perfect source for that advice, along with lots of good advice on how to have a good career. Actually, he's a source of good advice on all kinds of things, like how to run a big research project, and all those pieces of advice are reduced to the essential, to the important. And speaking of "reduced": he also reduced the instruction sets of processors to the important and the efficient, among many other things he has done. For instance, he was one of the co-inventors of RAID, the redundant array of independent disks, something many of us may even use at home today. I'm very happy that he'll tell us a little more about reduced-instruction-set processors and where they're going this morning. So please, Dave, the stage is yours.

Okay. I am going to attempt to give a talk where I explain the jargon; there's not a lot of math, but we've got a lot of jargon in our talks. I do need the slide on the projector to be projected; it should be on one of those two screens. While we're waiting for the screen to come up: we're doing something at Berkeley right now to celebrate our record in Turing Awards. Remarkably, there have been seven research projects at Berkeley that have led to Turing Awards, so we're starting a lecture series. In fact it started in the middle of last night with Shafi Goldwasser. This series is going to run every Wednesday, at one o'clock in the morning German time but four o'clock on the West Coast, and several of the people here are going to be speaking. Shafi gave the first talk last night to an overflow audience, in the back and in the room outside, so I think this will be well attended; it's broadcast as well as videotaped, so you can follow along.

What I'm going to do today is give you the history of computer architecture, 50 years, half a century, in ten minutes, and then talk about some of the challenges and opportunities. It's a subset of the Turing Lecture that has already been presented; you can find it there. From that half century there are three lessons that will drive the future as they drove the past: software advances inspire hardware people; when we raise the hardware/software interface, that leads to great opportunities in architecture; and finally, the way we settle debates in computer architecture is we spend billions of dollars building products, and the winner wins the argument. So let's go back more than 50 years. IBM had this problem:
they had four independent lines of computers, and principally they had different instruction sets. What's an instruction set? When hardware is directed by software, it has a vocabulary, and that vocabulary is what we call an instruction set. So IBM had four independent vocabularies, which meant four different software stacks for four different marketing teams. IBM's engineers decided that was a bad idea: they wanted one vocabulary for all markets, one instruction set to rule them all. That became the IBM 360. Now, to get four independent lines of computers to do the same thing, they needed some kind of idea for how to do it, so they went back to one of the pioneers of computing, Maurice Wilkes, the second person to win the Turing Award. He had an idea that was inspired by software. Then, as now, the hard part of computer design wasn't the datapath; that's the brawn, the thing that does the work. The hard part is the brain, the control. His idea was that you could specify the contents of the brain as a two-dimensional array, where every row of that array is what he called a microinstruction, holding the zeros and ones of the control signals. The process of filling in that array was called microprogramming, and the technology to build it was typically a read-only memory whose contents are filled in at manufacture, which was much cheaper than doing it in logic. That was the first example of software inspiring hardware. IBM took that idea and in April 1964 made the biggest announcement in the life of the company: a family of computers that could all run the same vocabulary, the same instruction set, the IBM System/360. Here are four examples: the datapath, the thing that does the work, varied from 8 bits to 64 bits, a factor of eight, and the speed and the costs varied a lot; the 8-bit one was one and a half million dollars in today's money and the big one was eight million dollars. IBM bet the company on this idea, and they won the bet. Who led it? The person in the front row: that's Fred Brooks, another Turing laureate, and part of the reason he won was the System/360, one of the most famous computers of all time. If you want to know why a byte has eight bits, it started with this computer; essentially no computer before it had that, and all computers since have used the 8-bit byte. So that's microprogramming.

Then Moore's Law comes along and we're using integrated circuits. Before, read-only memory was a lot cheaper and faster; now everything is built from the same transistors, so read-only memory costs about the same as memory you can alter, called RAM. With Moore's Law, the idea that every year you'd have twice as many transistors, the memories got a lot bigger, which meant the vocabulary, the instruction set, could be much richer, because the control memory was bigger. A classic example was Digital Equipment's VAX, a so-called minicomputer (where the 360 was a bigger computer, a mainframe), which had a lot of microcode in it. Next comes the microprocessor revolution.
So instead of lots of chips, we could do a processor in a single chip. But the people who did this, companies like Intel, didn't know a lot of computer architecture; they were chip designers, so they pretty much followed what the big computers did. They would compete against each other by adding instructions to the vocabulary, and the way they argued theirs was a good one was to show examples written at this very low level; it was pretty easy to add these things in microcode, so they were battling in minutiae. Gordon Moore, of Moore's Law, was a visionary. He thought back then, when they had an 8-bit microprocessor, that the next instruction set Intel did, the next vocabulary, would be the one the company was stuck with for its whole life. So he hired a bunch of PhDs in computer science and sent them up from California to Portland, Oregon to invent the next great instruction set. It was certainly the most ambitious project of the 1970s, and it would be ambitious even today: instead of 8-bit it was 32-bit, it had an idea called capability-based addressing, which was more secure, and they even wrote their own operating system in Ada, which was a new language at the time. But sadly it was late and had lots of performance problems. It was several years late, so they had to go tell Gordon Moore: sorry, you sent us to Oregon to invent the next instruction set and we won't be ready for several years. Intel was forced to start an emergency project to provide the 16-bit microprocessor the market was going to need. They had 52 weeks to do everything, the instruction set and building the chips, so they spent three whole weeks on the instruction set, with three people, so about ten person-weeks on the instruction set. They basically just extended the old 8-bit microprocessor instruction set to 16 bits, went off and built it, and it was announced to not much fanfare. To Intel's great fortune, IBM had decided to compete with Apple in personal computers and needed a microprocessor. There was another chip, the Motorola 68000, that was much more like Fred Brooks's instruction set, more elegant, but it was late and IBM didn't have time for that, so they went with the 8086. IBM's sales forecast at the time was 250,000 units, which is pretty good, but they were dead wrong: it was a hundred million. Overnight, the emergency replacement 8086 became a huge success, and because software that ran on the PC would run on the x86, it suddenly became the future. Gordon Moore was right: it is the instruction set Intel has been stuck with ever since, just the emergency replacement, not the one the Oregon people were supposed to produce.

All right, now we can start looking at these microcoded instruction sets. You can think of what's inside as what computer people call an interpreter, and the term comes from language translation: if somebody gives a speech through an interpreter, it's kind of slow, but if you translate the speech in advance, you can read it much faster. So the question is, how good are these microcode interpreters? This is a picture of John Cocke, another Turing laureate and one of the pioneers of both computing and compilers. By the 1980s we had moved away from programming in assembly language at the very lowest level to high-level languages; the UNIX operating system demonstrated you could even write an operating system in a high-level language. So now what we cared about was the output of the translators, what we call compilers, not what somebody would program in the low-level details. John Cocke had built a kind of minicomputer with some of these ideas about a simplified instruction set, but maybe even more importantly he had made advances in compilers. His group asked this question: suppose we take this compiler technology for the IBM 360 instruction set and use only the simple instructions, not the complicated ones. What would happen? Well, it ran up to three times faster by leaving things out. That's a disturbing result. Over at Digital Equipment, where they had the VAX, engineers measured that computer and to their surprise found that on average it took ten microinstructions to get an instruction's work done, when people thought it was four or five. In terms of the instructions themselves, 20 percent of the vocabulary accounted for 60 percent of the microcode and was almost never used.

This leads to the transition in these vocabularies from the so-called complex instruction sets to reduced, simpler instruction sets. We have these alterable memories that used to hold microcode interpreters, and the big change is to switch to what's called a cache. A cache is simply a small, fast memory that holds recent instructions, and it turns out, given the way we write programs, that if you've used instructions recently you're likely to use them again. So now the chip doesn't hold an interpreter; it holds whatever is currently running on the computer. Again, these are software concepts: compiling, or translating, versus interpreting. And this instruction set was really simple, about as simple as the microinstructions Maurice Wilkes talked about, just not as wide. There's an idea called pipelining that this enables, one of the go-to ideas of computer architecture: it's like manufacturing a car, where you do one step at a time and the steps proceed in parallel. And by the way, the complicated instructions weren't used all that much anyway. Then finally there was a breakthrough in compiler technology, based on an analogy between graph coloring and what's called register allocation. Registers are kind of the bricks of computer architecture, the fastest memory we have; there aren't very many of them, so you have to use them efficiently, and this compiler breakthrough based on graph coloring used them much more effectively. That led to the transition to RISC.

How do we explain why a simpler instruction set works better than a complex one? If you think about vocabularies, you can think of the complex one as having lots of five-dollar words, so it shouldn't take as many of them to complete a program; you should read, or execute, fewer of them. That's the number of instructions per program. That turned out to be true, but how many fewer? Typically only about 25 percent fewer than with the simple instructions. But then the question is, if you're reading this vocabulary, how much faster can you read it? It turned out the RISC designs could read it maybe five or six times faster, so the net effect was about a factor-of-four advantage. This formula, which we knew intuitively but which wasn't really spelled out, first appeared in a book like this one; I'm surprised I can't find it earlier. The book attempted to make computer architecture more quantitative, and in fact tried to be like a physics textbook, with lots of formulas you could use to design a computer.
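That quantitative argument can be checked with quick arithmetic. Here is a sketch using the round numbers quoted above (the ~25% instruction-count advantage of the complex ISA and the roughly 5-6x faster instruction rate of RISC are the talk's figures, not precise measurements):

```python
# Iron law of processor performance:
#   time/program = (instructions/program) * (cycles/instruction) * (time/cycle)
# A complex ISA needs ~25% fewer instructions per program, so RISC executes
# about 1/0.75 = 1.33x as many, but each one completes roughly 5-6x faster.

cisc_instr = 1.0              # normalized instruction count for the complex ISA
risc_instr = 1.0 / 0.75       # RISC runs ~1.33x as many instructions
risc_speed_per_instr = 5.5    # RISC executes each instruction ~5-6x faster

net_advantage = risc_speed_per_instr / (risc_instr / cisc_instr)
print(f"net RISC advantage: about {net_advantage:.1f}x")  # roughly a factor of 4
```

Multiplying the two effects together is exactly what the iron-law formula does, and it lands at about the factor of four the talk describes.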
This is when Berkeley and Stanford entered the scene. The first microprocessor using the RISC ideas was called RISC-I; that happened in 1982, and then in 1983-84 there was a second design, and around the same time the Stanford one. These last two were presented at the leading computer circuits conference, held once a year, which thousands of people attend. I can still remember being in the audience when the Berkeley and Stanford presentations were given; there was a murmur afterwards. It was a remarkable moment, when a handful of grad students at Berkeley or Stanford could build a microprocessor that was arguably better than what industry could build.
So, wrapping up our 50 years of history: Intel didn't take this lying down. They had this complex instruction set that was compatible with all PC software, so it was very valuable. What they did, remarkably enough: I told you about translation, which would typically be done in software; they did it in hardware. They put a front end on their hardware that, while the program was running, translated the complex instructions into what were basically RISC instructions, and thereafter anything the RISC people could do, they could do. With more engineers and better technology, they basically took over the marketplace; at the peak it was 350 million chips a year, and they dominated both the desktop and the servers. That's the PC era. We are now in the post-PC era, where things are in the cloud or in your pocket. That's a big change: instead of buying chips from a company like Intel, you license designs, called intellectual property, that go into a system on a chip you build yourself, like the one in your phone. And now it's not just performance that matters but die area and energy, so the extra overhead the x86 carries to do that translation is expensive. Remarkably, last year twenty billion chips with microprocessors in them shipped, and 99 percent of those were RISC. So again the marketplace settled the debate: the PC era went to CISC; the post-PC era is RISC. Okay, that's 50 years. What's going on now? Moore's Law is really over.
It was a remarkable projection: in 1965 Moore said the number of transistors, the building blocks of chips, would double every year, and in 1975 he amended it to doubling every two years. Today, using Intel's technology, we are off by a factor of 15 even from his slower projection. The reason this is shocking is that people have claimed many times over the last couple of decades that it's over and been wrong, but this time it's really over. For those of us who had 50 years of Moore's Law, this is a shocking event; we're now in the post-Moore era. A lesser-known but maybe equally important observation was made by Robert Dennard. He asked how you could keep putting more transistors on a chip without the chip burning up, and his observation was that the voltage would shrink, so the same area of silicon would stay at the same power. You can see that back in the '70s the power curve, that blue line, was very flat, and then suddenly it goes up. So now we're limited by power on the chip more than by the number of transistors.
When you multiply those together, you get the first figure of that textbook I mentioned. There were the good old days, when performance doubled every year and a half; for those of you too young to remember, we would throw away perfectly good computers because a friend's computer was three times as fast as yours and you wanted to be that fast too. These days we keep laptops until they break, because as far as we can tell there's no difference. That's the result of the end of Dennard scaling, plus Amdahl's Law: because we switched to putting lots of processors on a chip, there's a law of diminishing returns, and in this last year performance improved only 3 percent. So we went from doubling every 18 months to doubling every 20 years. That looks pretty bad, so what advice do we have? This is from Bertha Calloway: we cannot direct the wind, but we can adjust the sails. This is the reality we have to live with, so what are we going to do?
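The doubling-time arithmetic above is easy to verify; here is a quick sketch (the 3% annual gain and the 18-month doubling are the figures quoted in the talk):

```python
import math

# At 3% annual improvement, how long until performance doubles?
rate = 0.03
doubling_years = math.log(2) / math.log(1 + rate)
print(f"doubling time at 3%/year: {doubling_years:.1f} years")  # about 23 years

# Compare the good old days: doubling every 1.5 years
old_rate = 2 ** (1 / 1.5) - 1
print(f"implied annual gain back then: {old_rate:.0%}")  # about 59% per year
```

The exact figure is closer to 23 years than 20, but in round numbers it matches the "doubling every 20 years" of the talk.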
I'll talk about a couple of opportunities here. On the software-centric side: modern programming languages, not like what some of us learned, are amazing in terms of productivity, but they're really slow, and I'll give you an example of that with Python, probably the most popular programming language. On the hardware-centric side: the only thing left for computer architects is what are called domain-specific architectures. We can't make single processors any faster, and there's a limit to what you gain by putting lots of processors on a chip, so the only thing left is to do something specific rather than general-purpose: architectures that do a few things really well but don't do everything. And you can combine the two, which gives another of those lessons: by using a domain-specific language you raise the level of the interface, which allows us to innovate. So let's talk about Python.
This is an example of matrix multiply in Python, and of how much faster it can go. If we rewrite it from Python to C, we go 50 times faster. Because it's running on a multicore computer, with multiple processors, if we go in by hand and identify the parallelism we get another factor of seven. Those caches I talked about, those temporary fast buffers, are very important for performance, so if we carefully lay out the data to use the caches well, we get another factor of 20. And finally there's something called single instruction, multiple data, where special-purpose instructions can do sixteen 32-bit operations every clock cycle; if we take advantage of that, we get another factor of nine. Multiply those all together, and what's being left on the table is a factor of 63,000. As John Hennessy says, if you're dealing with C or C++ and you're a compiler person who can make code go twice as fast, you're a hero. So there are opportunities here; there are Turing Awards waiting to be picked up. If you could make it go only a factor of a thousand faster, that would be an amazing result, but the potential seems to be there.
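How those factors compound is simple arithmetic. Here is a sketch using the round numbers quoted above (the individual factors are the talk's figures for one particular matrix-multiply experiment, not universal constants):

```python
# Successive speedups for matrix multiply, starting from naive Python,
# using the round figures quoted in the talk:
factors = {
    "rewrite in C": 50,
    "exploit multicore parallelism": 7,
    "lay out data for the caches": 20,
    "use SIMD instructions": 9,
}

total = 1
for step, gain in factors.items():
    total *= gain
    print(f"{step}: x{gain} (cumulative x{total})")

print(f"left on the table overall: about x{total}")  # 50 * 7 * 20 * 9 = 63,000
```

The point is that the factors multiply rather than add, which is how a seemingly modest list of optimizations compounds into a factor of tens of thousands.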
This is the part where I'm going to have a hard time avoiding jargon, so forgive me. Basically, a few kinds of parallelism come into play. Doing one instruction at a time across lots of data (SIMD) is more efficient than running lots of independent programs, so-called MIMD. There's also something called very long instruction words (VLIW), where the parallelism is under compiler control rather than the out-of-order execution modern computers do; that parallelism is easier to control and more efficient. Then there's using those caches well: caches are great, they probabilistically guess what you want, but under software control you can have a better idea than the caches do. And the last one, which is pretty easy to understand: instead of the very wide precision you often need in general-purpose computing, in specific domains you can get away with narrower data, which makes things run faster. The problem for special-purpose architectures has always been: where are you going to get the software? But the domain-specific programming languages, which are attractive because they make the programmer more productive, raise the level of abstraction, and that allows us to map programs onto special-purpose hardware. So who are we going to help?
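The "narrower data" point can be illustrated with a minimal quantization sketch. This is an assumption-laden toy, not any accelerator's actual scheme: it maps 32-bit floats onto 8-bit integers with a single scale factor, the way inference accelerators commonly trade a little precision for a quarter of the memory traffic and much cheaper multipliers.

```python
# Toy int8 quantization: narrower data = less memory traffic, cheaper math.
import numpy as np

def quantize_int8(x):
    """Map a float32 array onto int8 with one global scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)

print("bytes float32:", w.nbytes, " bytes int8:", q.nbytes)  # 64 vs 16
print("max quantization error:", np.abs(dequantize(q, s) - w).max())
```

Four times as many operands fit in each memory fetch, and the worst-case rounding error here is half the scale factor, which is tolerable in many machine-learning workloads.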
Here's a field where the number of papers grows like Moore's law, and that field, not surprisingly, is machine learning, deep learning. This plot shows the number of papers doubling every year: to keep up as of the end of last year, you needed to read 50 papers a day on arXiv. Fifty papers a day! So this is a pretty exciting area, and a lot of people are talking about it. What can we do to help it?
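As a back-of-the-envelope check on that growth claim, the talk's 50-papers-a-day figure can be extrapolated; the baseline year below is an assumption chosen to match "the end of last year" in the talk, not a measured number.

```python
# If new ML papers double every year, the daily reading load does too.
papers_per_day = 50  # figure quoted in the talk for the end of last year
base_year = 2017     # assumed baseline year

for years_later in range(5):
    per_day = papers_per_day * 2 ** years_later
    print(f"{base_year + years_later}: ~{per_day} papers/day "
          f"(~{per_day * 365:,} papers/year)")
```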
So this is called the TPU from Google, the Tensor Processing Unit. It was announced in May of 2016, just two and a half years ago, but it has been in production for three years; if you use Google, you're using it. It runs on queries, it helped win the AlphaGo match, and it happens to be on the cover of this issue of Communications of the ACM, the ACM's flagship magazine, if you want to learn more about it. Why are
people excited about it? Amazingly enough, this single microprocessor has 65,000 multiply-accumulate units. 65,000! That's like a factor of a hundred more than most processors today. Why can they do that? Because they throw away the general-purpose stuff and dedicate the chip to the application. It also has a lot of memory on the chip, which is important for both speed and energy efficiency. How much faster is it?
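The 65,000 figure falls out of simple arithmetic: the first TPU's matrix unit is a 256×256 systolic array, i.e. 65,536 multiply-accumulators, and peak throughput is the MAC count times two operations (a multiply and an add) times the clock rate. The 700 MHz clock used below is the figure from Google's published TPU paper.

```python
# Where "65,000 multiply-accumulate units" comes from, and what it buys.
array_dim = 256
macs = array_dim * array_dim        # 256x256 systolic array
clock_hz = 700e6                    # clock rate from the TPU paper

peak_ops = macs * 2 * clock_hz      # multiply + accumulate per MAC per cycle
print(f"MAC units: {macs}")                       # 65536, i.e. ~65,000
print(f"peak: {peak_ops / 1e12:.0f} tera-ops/s")  # ~92 TOPS at 8-bit
```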
It's 15 to 30 times faster, but a particularly interesting measure is performance per watt, since these systems are limited by power, and there it's a factor of 30 over GPUs and 80 over CPUs. Google measured this on production applications, but that's internal; how are we going to compare these things more broadly? There's recently been an attempt to create a standard benchmark for machine learning so that everybody can measure these systems, and that's starting to happen; I guess in a couple of months you'll see those results. So I talked about way back in May 2016; what's been going on at Google lately? They've announced two more generations. In May of 2017 they
announced TPU v2. This is not just for inference like the first one, it's for training, and not only is there that board, you can put them together in racks, and the peak performance is 11 petaflops. The following year they announced one that was liquid-cooled, so it could use more power, and it's a bigger multiprocessor: more than 100 petaflops. How much is a petaflop? There's the Top500 list of supercomputers, which uses a different measure; it's still called petaflops, but those are 64-bit floating point while the TPU numbers are 32- and 16-bit. Ignoring that difference, the sixteenth fastest computer in the world is 11 petaflops, and the fourth fastest in the world is a hundred petaflops. And Google's not building just one of each of these; they're building a lot of them. These are going to be the supercomputers for machine learning, and you can program them in Python with TensorFlow and PyTorch and things like that. OK, so what's happening right
now? It's a very exciting time to be a computer architect. There are big debates going on, with companies placing bets on different architectural approaches, and we'll see what's going to happen. Google tends to build these big multiply units; NVIDIA has lots of cores and, as they say, lots of threads; Microsoft is betting on flexible hardware devices, FPGAs; and Intel is betting on everything: Intel has bought companies, so they have some of those and some of these plus their own designs. And amazingly enough, there are at least 45 startup companies doing machine learning hardware. So what's going to happen? I don't know who's going to win, but I can tell you what's going to determine it: the marketplace. We're going to spend billions of dollars, put these things out there, and people are going to decide which ones are the most effective. OK, and
then I'll go through one more opportunity here that's again inspired by software: open source. Why is there open-source software but no open-source hardware, no open-source architectures? This takes us to RISC-V.
This was the fifth Berkeley RISC instruction set. It was time for us to do something: the proprietary instruction sets from the big companies, Intel and Arm, were not only really gated and unattractive, it was illegal for us to use them; they wouldn't let us. So we started building our own. The work was led by my colleague Krste Asanović and two graduate students, and it took us four years; we built some chips along the way. But then a funny thing happened: companies were complaining about us changing our instruction set, our vocabulary, inside our courses. Why would you complain about us changing our instruction set for our courses? Talking to them, we realized there was a thirst for an open instruction set: they had looked all over the world, they liked ours, and they had started using it. Once we heard about that, we thought that would be great, and we would make it happen. So
why do they like it? It's a really simple instruction set, a very simple vocabulary. We're 25 years later, so we could avoid the mistakes of the past. It's designed to be expanded, and it supports these domain-specific architectures. But the biggest difference is that it's run by a foundation, so it's not owned by a company but by an open foundation. This foundation started three years
ago, and now there are, I think, over 200 members, including big companies that are promising to ship it in their products. Interestingly, the way its designs are
being done is something like a standards committee: when a topic comes up, you bring experts from all over the world to talk about it, and they reach consensus before it's embraced. Typically, for proprietary instruction sets, the companies announce it and then everybody tells them what's wrong with it; we do it in the other order. And it's not just RISC-V; there's
also NVIDIA, which has an open instruction set for one of these domain-specific architectures, where everything is open: the software, the hardware, the implementation. So this is a new thing. So
I purposely kept this talk short so there's time for questions. The lessons of the last 50 years: software can inspire architecture innovations; that was the microprogramming idea from the IBM 360 days, the RISC ideas about translation versus interpretation, and open architectures. Raising the interface enabled the move to high-level languages, enabled the RISC ideas, and now looks like it's enabling these domain-specific architectures. And the marketplace, sadly, is how we settle these things; it's not fair, but that's the way it works. IBM was a big winner, then the Intel architecture in the PC era, and RISC in the post-PC era. Open versus proprietary instruction sets, which domain-specific architectures will win? Who knows; we'll find out in a few years. But it's a really exciting time in computer architecture, and I think it's going to be another golden age. And with that, I'm open to questions. [Applause] At the end of the questions I'm going to try something. I see hands up; microphones are on their way. [Question] Thank you for a nice history talk. What I was wondering about is, what do you think is
the role of academia versus industry, given that this is such a market-driven
area? [Answer] Yes, so I think one of the things I really like about computer science is that we have a synergistic relationship with industry; in some fields it's antagonistic. We academics think of our industrial colleagues, who are often our former students at these companies, and we're all trying to make better computing technology. It's the companies that embrace these ideas, but my career has certainly been in academia, and I felt I had plenty of opportunities to influence what was going on; I don't see any reason you can't change the world from academia, continually. Companies turn these ideas into products; we don't, but we can influence things. What's particularly exciting right now is RISC-V, because there's an open architecture: before, you'd have to convince Arm or Intel to use your ideas, but now you can demonstrate your ideas in RISC-V immediately and other people can embrace them. I think it's even easier now, which is another reason why I think it's a golden age. It's a great time, either in academia or in industry, for architects. Other questions? [Question] Thank you so much. I saw you wrote something about GPUs, NVIDIA GPUs, and when you're working with artificial intelligence techniques like deep learning and neural nets, they tell us it's computationally intensive, which means it takes a lot of resources, like the GPU and the RAM, and everything is expensive. I want to know: what is it that makes it so difficult for people to use? You need a system that has a GPU, and such systems are usually expensive; what is it about this resource that is so expensive? [Answer] A great question. You can do all your work on a standard computer; you can just use a standard laptop. If you want to go fast, you've got to do something else. So, to
NVIDIA's good fortune, they had built a chip that was special-purpose for graphics, which accelerated single-precision 32-bit floating-point performance; that's why they built it. It turns out that for machine learning, 32-bit floating point is good enough, so there was a market-driven force to build a chip costing hundreds of dollars with very high single-precision floating-point performance. Geoff Hinton and his students in 2012, when they were doing machine learning, used a GPU that was intended for graphics. So it's a lucky combination: an inexpensive, high-performance chip, hard to use maybe, that they mapped their software onto. Conventional wisdom today is that this is the best hardware for doing machine learning, but many architects think there are better ways to do it: Google, Microsoft, Intel, and 45 startup companies all think there are better ways. Who's right? The marketplace will help figure it out; that's part of the excitement here. OK, I think I have time for one more, and then I'm going to try this experiment. [Question] What do you think about many-core chips, chips with, say, thousands of cores? This was a way to overcome some obstacles, and my feeling is that this technological solution did not deliver what we expected, because we could not effectively use all the compute power offered by many-core platforms. What do you think about that? [Answer] I think that's fair. Out of necessity we went to multicore, just because we couldn't make faster single cores, and it was hard for software to take advantage of them. The comment, though, is that now that the future is domain-specific, ideas that wouldn't work for general-purpose computing might work well in narrower areas. Certainly all the old ideas in computer architecture and computer design are being reevaluated right now for domain-specific architectures, particularly for machine
learning. Ivan Sutherland loves asynchronous design, and there are asynchronous-design companies; there are people doing many-core designs; there are field-programmable gate arrays. All the possible ideas are being reevaluated given this opportunity in domain-specific architectures, so it's hard to say something won't work because it didn't work in the past; it may work now. It's an exciting, interesting time. OK, so I'm going to try this idea for the young researchers. I guess some people think that because you've had some success, your road has been perfectly smooth right to here: you woke up with a silver spoon in your mouth as a baby, and life has been beautiful all the way. So I thought I'd tell you how I got here, but I need the slides. I'm an accidental Berkeley professor; how's that? I seem to have a delay here. OK, so what happened? I'm the first of my family to graduate from college; my dad went to college, but he didn't finish before I did. I had no plans for graduate school and no plans for computer science; there was no computer science major. I was a wrestler in high school and in college. What happened was, I was a math major, and at the end of my junior year a math class was canceled. I had to find something in that time slot, and it happened to be an intro computer science class; it was only half of a class, even, but despite how primitive it was, I was hooked. So my senior year I took as many classes as I could in business and in engineering and kind of did my own informal degree. In the middle of my senior year I took a class from a professor, and, since I was working in my dad's factory to support us going to college, I said casually at the end of the class that I'd rather do computing stuff than work in a factory. On his own, he went and found me a job as an undergraduate, so I started working with grad students, and back then I could apply
for grad school. I talked to my wife and said, what about a master's degree? It seems pretty cost-effective; it only takes four quarters, and I get a master's degree. She said, sure, why don't you do that. Then I was put in an office with four of us, and the other three people were all getting PhDs, so that seemed like a good idea. I talked to my wife, and she said, well, if you think you can do it, go for it. We had two sons and were living on a 20-hour-a-week research assistantship, so we were pretty poor, and then the assistantship ran out, which was bad, but my advisor helped get me a job at an aerospace company building airborne computers, and it took me another three years or so to get my PhD. Now, my wife was from Northern California, I was from Southern California, and she always wanted to move back to Northern California. While I was a graduate student I was starting to get job offers, but I hadn't heard from Berkeley, so she forced me, as a grad student, to call the chair of the department at UC Berkeley to find out what had happened to my application. I can still remember the dread of getting on the phone and talking to Elwyn Berlekamp, who was a famous mumbler, trying to explain that I hadn't heard from Berkeley, the one place I would consider if only I'd heard from them. He told me, "Well, Dave, you're in the top ten but not the top five," and I remember this tremendous sense of relief that it wasn't as bad as I thought, and I hung up. It turns out, I found out later, he said that to anybody who called in that plight. But he took my application and gave it to a professor who was visiting Southern California; we hit it off, I got an interview, and I got the job. The first project I did with that professor was way too ambitious; we didn't have the resources; we were going to build operating systems and build chips and everything. But then, remarkably, I took a leave of absence from Berkeley and went to Boston, and I used that time to
rethink my career, and I came back and did the RISC stuff. But even tenure wasn't easy, because computer science is strange in that it values conference papers instead of journal papers, and everybody else values journal papers, so I think my case was difficult to make. But they made it, and since then things have been better. So it's not a smooth road all the way. Now, if they don't stop me, I was going to say what worked for me. So what
went well for me? In America you think wealth equals happiness; it doesn't. There are a lot of unhappy wealthy people; I think one of them is our president right now. So I just maximize happiness directly, not wealth. Family first: I had some time to think about it, and in a busy job like this, if you don't put the family first, the family can end up pretty far down the list. Passion and courage: I would like to think I'm a logical person, but I'm really a very passionate person, and ironically the physical courage that came from wrestling has translated into intellectual courage; I feel it's my individual job to stand up if something's not right and try to stop it. The passion means, to use baseball terms that may not work here, we swing for the fences: we try to hit home runs rather than play it safe. When I was a new professor they said we value positive impact, not the number of papers, and I believe in that, so that's what we tried to do. One of my colleagues told me, I think because I'm a passionate guy who would confront people: friends may come and go, but enemies accumulate. Let me pass that on: friendships can fade, you don't see each other anymore, "oh yeah, we used to be friends"; enemies never forget. They're always your enemy. As far as I know, I made two enemies. One came from a big, difficult decision about hiring a person, and that person was an active enemy for the rest of my career. The other fight I picked was with the head of DARPA under the Bush administration, because I thought he was doing things that were bad for our country and bad for our field, so I took him on. That was not a popular idea because he was a vengeful person, but so many people hated the guy that it turned out to be a good career move, though it didn't seem like it at the time. Winning as a team versus as an individual: I'm a
wrestler, which is an individual sport, but my coaches said if we bond as a team we'll be more successful, and I just believe this in my DNA. Quoting Fred Brooks again, who was quoting the basketball coach at his university: there are no losers on a winning team and no winners on a losing team. If you win, everybody wins; if you lose, everybody loses. I think I'm good at getting honest feedback and learning from it. It's easy to avoid feedback, but when somebody doesn't like what I'm doing, I show them my papers, because they'll tell me what they think for sure, and you need to learn from that. A warning sign I would pass along: if somebody thinks they're the smartest person in the room, run away. Examples: a president of Harvard thought he was the smartest one in the room, and he was fired; the founders of Enron are both convicted felons; and I actually have a relative, a brother-in-law, who thought he was the smartest person in the room, and he's in prison. Why would this be? I think it's because if you think you're the smartest person in the room, you think: what could I get from any of you? I'm smarter than all of you; why do I want your feedback? And if you don't get feedback on your ideas, bad things can happen. So if you run into somebody who thinks they're the smartest person in the room, don't do anything they do. And then, I still remember waking up one sunny morning, and it was like God spoke to me and said: it's not how many things you start, it's how many things you finish. I was thunderstruck, and since that moment I've tried to do only one big thing at a time. When Hennessy and I did our textbook, that was the one big thing I did; when I was president of the ACM, that was the one big thing I did. If you only do one big thing a year, over 40 years you can get a lot of things done. A lot of academics
just say yes to everything, and then it's hard to finish. And finally, I'm a natural-born optimist, and I think optimism is a better policy: if you're trying to catch a bus and you don't run, you're not going to catch the bus. You have to be cautious but optimistic. My story to illustrate what an optimist I am goes back to when I was 16 years old. I started dating a girl, and she had dated lots of other people and wasn't interested in being exclusive, or what we then called going steady. But I screwed up my courage and asked, could we go steady? And she said to me, because she felt pity, "I don't know how to say no." Well, to a logical person and an optimist, not a no is a yes, so I said, "Great!" She didn't know what to do but figured she'd let me down eventually; but we've been married 51 years now, and she hasn't let me
go. [Applause]

