GPU Architecture

created Nov 18th 2023, 08:28 by Ibrahim Elshahed


Graphics processing units (GPUs) are among the most widely used accelerators today. As of 2021, 7 of the world's top 10 supercomputers are powered by GPUs (Top500 2021). After gaining popularity in the high-performance computing domain in the mid-2000s, GPUs have been expanding into emerging domains such as deep learning, security, and virtual reality by offering higher performance than other general-purpose computing platforms, easier programmability than specialized accelerators, and greater affordability than server systems. It is now difficult to understand the performance of most computing systems, from servers to mobile devices, without some knowledge of GPUs. This chapter aims to provide a thorough description of the full stack of GPU computing, from the execution model and programming interfaces to hardware architecture details, including the organization of compute cores and memory subsystems. Readers will be able to grasp the unique characteristics of GPU computing and its architecture. The limited space, however, is not sufficient to cover all the latest designs of this quickly evolving architecture. This chapter therefore focuses on the fundamental architecture components and design details that have been maintained across all generations of GPU architectures, such as SIMT execution, batched processing (in warps or wavefronts), and the diverse memory types and their characteristics. A few recent studies that motivated architectural advances are also introduced. The authors hope that this chapter can be used to develop a basic understanding of GPUs and to find ways to navigate their advanced features.

The chapter is structured as follows. Section "Graphics Pipeline" gives an overview of the graphics pipeline by exploring the core functions of graphics processing and the architecture of the traditional GPU, which supports only graphics applications. This section provides the historical background of the baseline architecture of today's GPUs. Readers can learn the limitations of traditional GPUs and how the efforts to mitigate them led to a brand-new architecture that made GPUs one of the most important high-performance computing engines.

Section "GPU for General-Purpose Computing" introduces the full stack of general-purpose GPU (GPGPU) computing, from the execution model to microarchitecture components. The section begins by describing two-level parallelism and shows example GPU programs written with different programming interfaces. Section "Hardware Architecture" introduces the computing components of the GPU architecture, such as the overall processor organization, shader pipeline, banked register file, warp scheduler, and SIMT stack. Some of the unique architectural characteristics and limitations described in this section will be useful for understanding the research trends discussed in section "Recent Research on GPU Architecture." Section "Memories" explains the types of GPU memories and the characteristics of each. Section "Optimization Use Case: Access-Aware Variable Mapping to Memory" then uses example code to show how to assign data to different memories according to its access pattern.

Section "Recent Research on GPU Architecture" discusses research trends aimed at improving the performance, energy efficiency, and reliability of GPU architecture. Because GPUs play an important role in many computing fields, GPU architecture has been one of the most actively researched domains of the last decade. Given the limited space, only a small number of important studies are included in this chapter. The authors hope that the selected studies give readers useful insights and help them navigate related work.

Graphics Pipeline
Because GPUs were originally designed for graphics processing, traditional GPUs were equipped with a few specialized cores dedicated to the graphics pipeline. The common pipeline steps of graphics processing are vertex, geometry, pixel, and rendering, as illustrated in Fig. 1. The vertex step identifies the end points of an object's edges in the virtual space and maps them onto the two-dimensional screen. The geometry step identifies the curves and lines that connect any two vertices. The pixel step fills each unit space on the surface identified by the former two steps with color values. The rendering step smooths the color and shape of the surfaces to make the objects look more realistic. The rendering output is compressed and shown on the screen through the framebuffer. As an example, suppose that two triangles are to be shown on the screen. The vertex step identifies the three corner points of each triangle. Then, the geometry step draws lines between any two points that belong to the same triangle. The pixel step fills the pixels within each triangle boundary with the specified color values, one red and the other blue, with gradients, in the example of Fig. 1. After the rendering step smooths out the pixel boundaries, the screen projections of the two triangles are sent to the framebuffer to be shown on the display device (e.g., a monitor).
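
The two-triangle walk-through can be mimicked in software. The following is a minimal Python sketch, not a description of how any real GPU implements these stages. The function names, the 16x8 character "screen," the orthographic projection, and the edge-function inside test are all assumptions made for illustration. The rendering (smoothing) step and the color gradients of Fig. 1 are omitted for brevity.

```python
# Toy software version of the vertex, geometry, and pixel steps.
# Every name and the 16x8 screen size are illustrative assumptions.

WIDTH, HEIGHT = 16, 8

def vertex_step(vertices_3d):
    """Map 3D points onto the 2D screen (orthographic: drop z)."""
    return [(x, y) for (x, y, z) in vertices_3d]

def geometry_step(points_2d):
    """Connect each pair of a triangle's three vertices with an edge."""
    a, b, c = points_2d
    return [(a, b), (b, c), (c, a)]

def edge_function(p0, p1, p):
    """Signed-area test: tells on which side of edge p0->p1 point p lies."""
    return (p1[0] - p0[0]) * (p[1] - p0[1]) - (p1[1] - p0[1]) * (p[0] - p0[0])

def pixel_step(edges, color, framebuffer):
    """Color every pixel whose center lies on the same side of all edges."""
    for y in range(HEIGHT):
        for x in range(WIDTH):
            center = (x + 0.5, y + 0.5)
            signs = [edge_function(p0, p1, center) for p0, p1 in edges]
            if all(s >= 0 for s in signs) or all(s <= 0 for s in signs):
                framebuffer[y][x] = color

def draw(triangles):
    """Run the steps for each triangle and return the filled framebuffer."""
    framebuffer = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
    for vertices_3d, color in triangles:
        edges = geometry_step(vertex_step(vertices_3d))
        pixel_step(edges, color, framebuffer)
    return framebuffer

triangles = [
    ([(0, 0, 1), (7, 0, 1), (0, 7, 1)], "R"),    # "red" triangle
    ([(8, 0, 2), (16, 0, 2), (16, 8, 2)], "B"),  # "blue" triangle
]
framebuffer = draw(triangles)
print("\n".join("".join(row) for row in framebuffer))
```

Each pixel in `pixel_step` is tested independently of every other pixel, which hints at why graphics workloads map well onto many parallel cores.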
