Logical memory hierarchy diagram

Device code can

  • R/W per-thread registers
  • R/W per-thread local memory
  • R/W per-block L1 cache/shared memory
  • R/W per-grid global memory
  • Read-only per-grid constant memory (backed by device memory and read through a low-latency, high-bandwidth read-only cache; holds constants and kernel arguments)
  • Read-only per-grid texture memory (backed by device memory and read through a low-latency read-only cache for 2D/3D textures)
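
As a rough illustration of the list above, the following sketch touches registers, shared memory, global memory, and constant memory from a single kernel. The names `BLOCK`, `scale`, `scaledBlockSum`, and `runScaledBlockSum` are hypothetical, the kernel assumes it is launched with exactly `BLOCK` threads per block, and the host driver assumes a non-empty input and skips error checking.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;   // threads per block (assumed to be a power of two)

// Per-grid constant memory: read-only in device code, written by the host.
__constant__ float scale;

// Hypothetical kernel: each block writes the scaled sum of its slice of `in`.
__global__ void scaledBlockSum(const float *in, float *blockSums, int n)
{
    // Per-block shared memory: visible to all threads of this block.
    __shared__ float tile[BLOCK];

    // Automatic scalars such as `i` are normally kept in per-thread registers.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Read global and constant memory, write shared memory.
    tile[threadIdx.x] = (i < n) ? scale * in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block, entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    // One thread per block writes the block's result back to global memory.
    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = tile[0];
}

// Hypothetical host driver (error checking omitted for brevity).
float runScaledBlockSum(const std::vector<float> &host, float s)
{
    int n = static_cast<int>(host.size());
    int blocks = (n + BLOCK - 1) / BLOCK;

    float *dIn = nullptr, *dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(scale, &s, sizeof(float));

    scaledBlockSum<<<blocks, BLOCK>>>(dIn, dSums, n);

    std::vector<float> sums(blocks);
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);

    // Combine the per-block partial sums on the host.
    float total = 0.0f;
    for (float v : sums) total += v;
    return total;
}
```

Calling `runScaledBlockSum` with 1000 ones and a scale of 2 should yield 2000. Local and texture memory are left out of this sketch.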

Host code can

  • Transfer data to/from per-grid global, constant, and texture memory
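
A minimal sketch of such host-side transfers, using the CUDA runtime API, might look as follows. The symbol `coeffs` and the functions `globalRoundTrip` and `constantRoundTrip` are hypothetical names chosen for illustration. Texture transfers are omitted because they need additional setup.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical constant-memory symbol holding a few coefficients.
__constant__ float coeffs[4];

// Copies `src` into global memory and back; true if the data came back intact.
bool globalRoundTrip(const std::vector<float> &src)
{
    size_t bytes = src.size() * sizeof(float);
    std::vector<float> back(src.size());

    float *dBuf = nullptr;
    if (cudaMalloc(&dBuf, bytes) != cudaSuccess) return false;

    cudaError_t e1 = cudaMemcpy(dBuf, src.data(), bytes, cudaMemcpyHostToDevice);
    cudaError_t e2 = cudaMemcpy(back.data(), dBuf, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dBuf);

    return e1 == cudaSuccess && e2 == cudaSuccess && back == src;
}

// Writes four values into constant memory and reads them back from the host.
bool constantRoundTrip()
{
    const float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    if (cudaMemcpyToSymbol(coeffs, in, sizeof(in)) != cudaSuccess) return false;
    if (cudaMemcpyFromSymbol(out, coeffs, sizeof(out)) != cudaSuccess) return false;

    for (int k = 0; k < 4; ++k)
        if (out[k] != in[k]) return false;
    return true;
}
```

Global memory is addressed through a device pointer obtained from `cudaMalloc`, whereas constant memory is addressed through the `__constant__` symbol itself.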

Memory features and sizes in A100

Memory     Location   Access       Scope                    Lifetime          Amount in A100 SXM
Register   On chip    Read/Write   1 thread                 Thread            256 KB per SM
Local*     Off chip   Read/Write   1 thread                 Thread
Shared**   On chip    Read/Write   All threads in a block   Block             up to 164 KB per SM
Global     Off chip   Read/Write   All threads + host       Host allocation   40 GB or 80 GB
Constant   Off chip   Read only    All threads + host       Host allocation   64 KB
Texture    Off chip   Read only    All threads + host       Host allocation   Depends on textures used

* Local memory is not a physical type of memory, but an abstraction of global memory. It is used only to hold automatic variables. The compiler makes use of local memory when it determines that there is not enough register space to hold the variable.

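** Shared memory is configurable up to 164 KB per SM on A100 (compute capability 8.0). The limit depends on the compute capability; for example, it is 228 KB per SM on compute capability 9.0.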
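
One way this configurability shows up in code: a kernel that wants more than the default 48 KB of dynamic shared memory per block has to opt in explicitly. The sketch below assumes a device of compute capability 7.0 or newer; `fillShared` and `launchWithMaxShared` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel using dynamically sized shared memory.
__global__ void fillShared(float *out, int elems)
{
    extern __shared__ float buf[];   // size is set at launch time

    // All threads cooperatively fill the shared buffer with ones.
    for (int k = threadIdx.x; k < elems; k += blockDim.x)
        buf[k] = 1.0f;
    __syncthreads();

    // Thread 0 sums the buffer so the host can verify every element was usable.
    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int k = 0; k < elems; ++k) sum += buf[k];
        *out = sum;
    }
}

// Launches fillShared with the largest per-block shared-memory size the device
// reports. Returns the number of bytes used, or -1 on any failure.
int launchWithMaxShared(int device)
{
    int maxBytes = 0;
    if (cudaSetDevice(device) != cudaSuccess) return -1;
    if (cudaDeviceGetAttribute(&maxBytes, cudaDevAttrMaxSharedMemoryPerBlockOptin,
                               device) != cudaSuccess) return -1;

    // Opt in: raise this kernel's dynamic shared-memory limit above the default.
    if (cudaFuncSetAttribute(fillShared, cudaFuncAttributeMaxDynamicSharedMemorySize,
                             maxBytes) != cudaSuccess) return -1;

    int elems = maxBytes / static_cast<int>(sizeof(float));
    float *dOut = nullptr;
    if (cudaMalloc(&dOut, sizeof(float)) != cudaSuccess) return -1;

    // The third launch parameter is the dynamic shared-memory size in bytes.
    fillShared<<<1, 256, maxBytes>>>(dOut, elems);

    float sum = 0.0f;
    cudaError_t err = cudaMemcpy(&sum, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);

    return (err == cudaSuccess && sum == static_cast<float>(elems)) ? maxBytes : -1;
}
```

The per-block maximum reported by the runtime is somewhat smaller than the per-SM figure, so the value returned on an A100 is expected to be slightly below 164 KB.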

Caches

Type            Access       Amount in A100 SXM
L1 data cache   Read/Write   192 KB per SM (combined with shared memory)
L2 cache        Read/Write   40 MB
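
The sizes in the tables above are specific to A100. As a sketch, most of them can be read at run time from `cudaGetDeviceProperties`; the function name `printMemorySizes` is hypothetical, and the L1 cache size is not among the fields queried here.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Prints the memory sizes the CUDA runtime reports for `device`.
cudaError_t printMemorySizes(int device)
{
    cudaDeviceProp p;
    cudaError_t err = cudaGetDeviceProperties(&p, device);
    if (err != cudaSuccess) return err;

    std::printf("%s (compute capability %d.%d)\n", p.name, p.major, p.minor);
    // Each register is 32 bits wide, so 65536 registers correspond to 256 KB.
    std::printf("registers per SM:     %d x 32 bit\n", p.regsPerMultiprocessor);
    std::printf("shared memory per SM: %zu KB\n", p.sharedMemPerMultiprocessor / 1024);
    std::printf("global memory:        %zu MB\n", p.totalGlobalMem >> 20);
    std::printf("constant memory:      %zu KB\n", p.totalConstMem / 1024);
    std::printf("L2 cache:             %d KB\n", p.l2CacheSize / 1024);
    return cudaSuccess;
}
```

Calling `printMemorySizes(0)` reports the values for the first visible device, which makes it easy to compare another GPU against the A100 figures.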