It has been nearly 3 months since I joined TSMC. As with anyone joining a new company, I have been drinking from a firehose of information and data. One of the key topics that I first dug into was Moore's Law, which, simplified, states: the number of transistors in an integrated device or chip doubles about every 2 years.
Moore's Law is actually misnamed as a law, as it is more accurate to describe it as a guideline built on historical observation and future prediction of the number of transistors in a semiconductor device or chip. These observations and predictions have largely held true for the past several decades. As we approach a new decade, some appear to share an opinion that Moore's Law is dead.
It would appear that some have conflated Moore's Law with the idea that the performance of a chip, given the same area, doubles every 2 years. For many years, particularly in the development of CPUs and GPUs, this appeared to hold true. From the 1970s to the early 2000s, there was an explosive increase in transistor clock speed, going from single megahertz to multiple gigahertz. However, since the 2000s, compute performance has largely increased not through improvements in transistor clock speed but rather through both silicon architecture innovation and the threading or parallelization of computing workloads. Companies that develop CPUs and GPUs have responded to this software parallelization with further architecture innovations and by adding more compute cores. The more compute cores, the more threads a chip can handle, offering higher overall performance.
With the example above, compute performance is not improving because of individual transistor clock speed; rather, it is improving by throwing more transistors at a compute problem. What is the measure of squeezing more transistors into the same area? Density! Moore's Law is about density! Density is the number of transistors in a given two-dimensional area. Why do we care about the chip area? Chip cost is directly proportional to chip area. Moore's paper in 1965 made it clear in Figure 1 that there is a relationship between the manufacturing cost per component and the total number of transistors on a chip.
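To make the density argument concrete, here is a minimal sketch of the arithmetic. It assumes an idealized doubling every 2 years and, as a deliberate simplification, that chip cost depends only on area, so cost per transistor falls as density rises. The function names and the starting count are hypothetical and chosen only for illustration; they are not TSMC data.

```python
# Illustrative only: an idealized 2-year doubling cadence and a constant
# cost per unit area are assumptions, not measured figures.

def transistor_count(initial_count: float, years: float,
                     doubling_period_years: float = 2.0) -> float:
    """Projected number of transistors in a fixed area after `years`."""
    return initial_count * 2 ** (years / doubling_period_years)


def relative_cost_per_transistor(years: float,
                                 doubling_period_years: float = 2.0) -> float:
    """Cost per transistor relative to year 0.

    If chip cost is proportional to chip area and the area is held fixed,
    cost per transistor is inversely proportional to density.
    """
    return 1.0 / 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    start = 1_000_000  # hypothetical starting transistor count
    for years in (0, 2, 4, 10):
        count = transistor_count(start, years)
        cost = relative_cost_per_transistor(years)
        print(f"year {years:>2}: {count:>12,.0f} transistors, "
              f"relative cost per transistor {cost:.5f}")
```

Under these assumptions, a decade of doubling puts 32 times as many transistors in the same area, each at 1/32 of the original cost.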
Let's explore some of the compute problems that we are seeing today and how improvements in density will continue to improve performance.
First, let's discuss the elephant in the room. Some people believe that Moore's Law is dead because they believe it is no longer possible to shrink the transistor any further. Just to give you an idea of the scale of the modern transistor, the typical gate is about 20 nanometers long. A water molecule is only 2.75 angstroms, or 0.275 nanometers, in diameter! You can now start counting the number of atoms in a transistor. At this scale, many factors limit the fabrication of the transistor. The primary challenge is the control of materials at the atomic level. How do you place individual atoms to create a transistor? How do you do this for the billions of transistors found on a modern chip? How do you build these chips that have billions of transistors in a cost-effective manner?
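As a quick sanity check on that scale, the sketch below converts the water molecule's diameter to nanometers and counts how many would fit end to end along a 20 nanometer gate. The two lengths are the figures quoted above; the function names are hypothetical. The water molecule is only a yardstick here, not something a transistor is made of.

```python
# Figures taken from the text above; everything else is illustrative.
GATE_LENGTH_NM = 20.0
WATER_MOLECULE_DIAMETER_ANGSTROM = 2.75
ANGSTROMS_PER_NANOMETER = 10.0


def angstrom_to_nm(length_angstrom: float) -> float:
    """Convert a length from angstroms to nanometers."""
    return length_angstrom / ANGSTROMS_PER_NANOMETER


def molecules_across(length_nm: float, molecule_diameter_nm: float) -> float:
    """How many molecules of the given diameter span `length_nm` end to end."""
    return length_nm / molecule_diameter_nm


if __name__ == "__main__":
    diameter_nm = angstrom_to_nm(WATER_MOLECULE_DIAMETER_ANGSTROM)
    count = molecules_across(GATE_LENGTH_NM, diameter_nm)
    print(f"Water molecule diameter: {diameter_nm:.3f} nm")
    print(f"Water molecules across a {GATE_LENGTH_NM:.0f} nm gate: {count:.1f}")
```

The gate comes out at roughly 73 water molecules long, which is why counting atoms is no longer a figure of speech.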
To address this squarely, TSMC has recently announced our N5P node, which further extends our leadership beyond the N5 node, itself set to feature the world’s highest transistor density and offer the fastest performance. After being exposed to our technology roadmap, I can safely state that TSMC has many years of pioneering and innovation ahead of us, in which we will continue to shrink the individual transistor and continue to improve density. You will hear more from us in the coming months and years as we progress to new nodes.
Beyond the individual transistor, we also need to look at the system level density. Circling back and looking at the classic compute tasks of CPUs and GPUs, the modern chip has extremely fast transistor clock speeds that approach 5 gigahertz and beyond. The central challenge to these compute tasks is actually to keep the CPU and GPU cores fed with data. While this is classically a software challenge, modern architectures and methods for threading have squarely put the performance bottleneck at the hardware level. We have finally seen the limitations of memory caching in the era of big data analytics and AI.
To feed modern fast CPUs, GPUs and dedicated AI processors, it is critical to provide memory that is both physically closer to the cores requesting the data, for improved latency, and able to supply a higher bandwidth of data for the cores to process. This is what device level density provides. When memory is colocated with the logic cores, the system achieves lower latency, lower power consumption and higher overall performance.
Some of you may think that this is a system level concern and not an intrinsic attribute of a device technology. This may have been strictly true in the past, but the line between the definition of a chip and the definition of a system is already getting blurry. The line will continue to blur and will eventually be eliminated completely. We have now transitioned from an era of design-technology co-optimization (DTCO) to system-technology co-optimization (STCO).
Advanced packaging today brings memory close to the logic. Typically, logic cores are fed by standalone memory chips through interfaces such as DDR or GDDR. The physical distance between the memory device and the logic cores limits performance through increased latency. Bandwidth is also limited with discrete memory, as it offers only a limited interface width. Additionally, power consumption for discrete logic and memory also governs a device's overall performance, especially in applications such as smartphones or IoT devices, as there is limited ability to dissipate the thermal energy radiated by discrete devices.
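The link between interface width and bandwidth can be sketched with simple arithmetic: peak bandwidth is the number of bytes moved per transfer multiplied by the number of transfers per second. The widths and transfer rates below are hypothetical round numbers chosen for illustration, not specifications of any particular memory standard or TSMC product.

```python
def peak_bandwidth_gb_per_s(interface_width_bits: int,
                            transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in gigabytes per second.

    bytes per transfer = interface width in bits / 8
    bandwidth          = bytes per transfer * transfers per second
    """
    bytes_per_transfer = interface_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    # Hypothetical discrete memory channel: narrow interface, high transfer rate.
    narrow = peak_bandwidth_gb_per_s(interface_width_bits=64,
                                     transfers_per_second=3.2e9)
    # Hypothetical in-package memory: much wider interface, lower transfer rate.
    wide = peak_bandwidth_gb_per_s(interface_width_bits=1024,
                                   transfers_per_second=2.0e9)
    print(f"Narrow interface: {narrow:.1f} GB/s")
    print(f"Wide interface:   {wide:.1f} GB/s")
    print(f"Ratio:            {wide / narrow:.1f}x")
```

Even at a lower transfer rate, the wider interface in this sketch moves about ten times more data per second, and a very wide interface is only practical when the memory sits close to the logic.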
Other applications such as machine learning, both training and inferencing, are pushing the boundaries of power, bandwidth and latency.
Artificial Intelligence (AI) is often treated as one type of compute problem, but there are two distinct aspects of AI: training (machine learning) and inferencing. For any AI system to work, a neural network must first be trained. Training requires intensive compute operations such as Feed-Forward and Back-Propagation, where logic cores are fed copious amounts of data. The faster the logic cores can be fed, the faster the learning – bandwidth is critical here. The act of training a neural network consumes extreme amounts of energy. Many people are concerned about the carbon footprint of datacenters and AI training. Most of the energy is actually consumed by the memory and the memory interfaces. By packaging memory with the logic cores, you can greatly reduce the power consumption of AI training while greatly increasing memory bandwidth.
AI Inferencing is the application of the trained neural network in the real world. This is computing at the Edge. Once you have a trained neural network, Edge devices need to apply that training and execute their tasks in the shortest time possible. One clear example of the need for improved latency is the image classifier neural networks found in autonomous driving cars. It is critical for the operation of an ADAS 2+ car and the safety of its passengers to have low latency and fast execution of the neural network to recognize threats. As a car drives down the road at highway speeds, every millisecond counts for safety. Locating memory close to the Edge processing core is vital to reduce latency.
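To put "every millisecond counts" into numbers, the sketch below computes how far a car travels while the system is still waiting on a result. The 100 km/h highway speed and the latency values are assumptions picked for illustration; they are not measurements from any ADAS system.

```python
def distance_travelled_m(speed_km_per_h: float, latency_ms: float) -> float:
    """Distance in meters covered at a constant speed during `latency_ms`."""
    speed_m_per_s = speed_km_per_h * 1000 / 3600
    return speed_m_per_s * latency_ms / 1000


if __name__ == "__main__":
    highway_speed_km_per_h = 100.0  # assumed highway speed
    for latency_ms in (1, 10, 100):  # hypothetical end-to-end latencies
        metres = distance_travelled_m(highway_speed_km_per_h, latency_ms)
        print(f"{latency_ms:>3} ms of latency -> {metres:.3f} m travelled")
```

At this assumed speed the car covers nearly 3 centimeters per millisecond, so 100 milliseconds of added latency means almost 3 meters travelled before the network has recognized anything.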
Tight integration of logic cores with memory through advanced packaging techniques is already available today from TSMC. The line between a semiconductor and a system solution is blurry, as the new advanced packaging techniques are silicon wafer based. TSMC has pioneered advanced packaging techniques that allow our customers to deliver a complete system with a silicon-based interposer or fan-out-based chiplet integration. We also have advanced packaging techniques that allow us to stack chips on wafers or stack wafer on wafer prior to integration into packaged modules. These advanced packaging techniques allow TSMC customers to deliver much higher density and more performance. We will continue to drive innovation in advanced packaging technologies.
Moore's Law is about increasing density. Beyond the system level density achieved through advanced packaging, TSMC will continue to grow density at the transistor level. There are many paths available to TSMC for future transistor density improvements. One possible path forward is the use of transistors made of two-dimensional materials instead of silicon as the channel – we are raiding the periodic table. By potentially using these new materials, one possible route to great density improvements is the stacking of multiple layers of transistors in something we call Monolithic 3D Integrated Circuits. You could add a CPU on top of a GPU on top of an AI Edge engine with layers of memory in between. Moore's Law is not dead; there are many different paths to continue to increase density.
I cannot do justice to this topic in a simplified blog post. TSMC will be providing a keynote on the future of the transistor at Hot Chips on August 20th on the Stanford campus. Our lead researcher, Dr. Philip Wong, will be giving the keynote, called “What Will the Next Node Offer Us?” Come join us and see Dr. Wong's presentation on the future of the semiconductor and why Moore's Law is not dead!