CUDA - What is the difference: DRAM Throughput vs Global Memory Throughput


The actual throughput achieved by a kernel is reported by the CUDA profiler using four metrics:

  • Global memory load throughput
  • Global memory store throughput
  • DRAM read throughput
  • DRAM write throughput

The CUDA C Best Practices Guide describes global memory load/store throughput as the actual throughput, but says nothing specific about DRAM read/write throughput.

The CUPTI User's Guide defines:

  • Global memory load throughput as ((128 * global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests) * 32 - (l1_cached_local_ld_misses * 128)) / gputime
  • Global memory store throughput as ((l2_subp0_write_requests + l2_subp1_write_requests) * 32 - (l1_cached_local_ld_misses * 128)) / gputime
  • DRAM read throughput as (fb_subp0_read + fb_subp1_read) * 32 / gputime
  • DRAM write throughput as (fb_subp0_write + fb_subp1_write) * 32 / gputime
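For reference, the four formulas can be evaluated by hand from raw counter values. The following is only a sketch: the function names and the sample counter values are made up for illustration, and it assumes gputime is given in microseconds, so that bytes per microsecond divided by 1000 gives GB/s.

```python
def to_gb_per_s(num_bytes, gputime_us):
    """Convert a byte count over gputime_us microseconds to GB/s.

    Assumption: gputime is in microseconds. Bytes per microsecond
    equals MB/s, so dividing by 1e3 yields GB/s.
    """
    return num_bytes / gputime_us / 1e3


def profiler_metrics(c, gputime_us):
    """Evaluate the four quoted formulas from a dict of raw counters."""
    # Subtracted in both global formulas, exactly as quoted (128-byte lines).
    local_miss_bytes = c["l1_cached_local_ld_misses"] * 128

    load_bytes = (128 * c["global_load_hit"]
                  + (c["l2_subp0_read_requests"] + c["l2_subp1_read_requests"]) * 32
                  - local_miss_bytes)
    store_bytes = ((c["l2_subp0_write_requests"] + c["l2_subp1_write_requests"]) * 32
                   - local_miss_bytes)
    dram_read_bytes = (c["fb_subp0_read"] + c["fb_subp1_read"]) * 32
    dram_write_bytes = (c["fb_subp0_write"] + c["fb_subp1_write"]) * 32

    return {
        "global_load": to_gb_per_s(load_bytes, gputime_us),
        "global_store": to_gb_per_s(store_bytes, gputime_us),
        "dram_read": to_gb_per_s(dram_read_bytes, gputime_us),
        "dram_write": to_gb_per_s(dram_write_bytes, gputime_us),
    }


# Hypothetical counter values, not taken from a real profiler run.
sample_counters = {
    "global_load_hit": 1000,
    "l2_subp0_read_requests": 2000,
    "l2_subp1_read_requests": 2000,
    "l2_subp0_write_requests": 500,
    "l2_subp1_write_requests": 500,
    "l1_cached_local_ld_misses": 0,
    "fb_subp0_read": 1500,
    "fb_subp1_read": 1500,
    "fb_subp0_write": 400,
    "fb_subp1_write": 400,
}

for name, value in profiler_metrics(sample_counters, gputime_us=100.0).items():
    print(f"{name}: {value:.3f} GB/s")
```

With these made-up numbers the global load figure is larger than the DRAM read figure, because part of the requested data is counted as L1 hits and never shows up in the fb_subp* counters.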

I understand the DRAM read/write throughput, since the fb_subp* counters report the number of DRAM accesses (incremented by 1 for each 32-byte access) and are collected for all SMs. So it is clear to me that the throughput is calculated as a function of gputime and the number of bytes accessed.

I do not understand the global memory throughput definition. There is no definition of the global_load_hit counter. I also do not see why l1_cached_local_ld_misses is subtracted in both cases.

Is DRAM different from global memory in this context?

If I want to know the actual throughput of my kernel, should I use the DRAM or the global memory throughput metrics?

Global memory throughput is the amount of data requested by instructions to the global address space. global_load_hits is the number of L1 cache hits for global requests (the cache line size is 128 bytes). The rest of the formula approximates the global throughput for accesses that miss L1 by calculating the accesses to L2.

Global memory is a virtual memory space that can map to both device memory and system memory.

DRAM is the physical device memory (e.g. GDDR5 on the card). DRAM is accessed on L2 misses. The following virtual address spaces can reside in DRAM/device memory: global, local, constant, instruction, and texture. Note that many of these memory spaces are virtual address spaces, and the final data can reside in either DRAM or system memory.
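To make the distinction concrete, here is a toy model rather than a profiler formula. It assumes a kernel that only issues global loads, that the L1 and L2 hit rates are simple fractions of the traffic reaching each level, and that gputime is in microseconds. The function name and all numbers are hypothetical.

```python
def toy_throughputs(requested_bytes, l1_hit_rate, l2_hit_rate, gputime_us):
    """Return (global_gb_per_s, dram_gb_per_s) for a loads-only toy kernel.

    Global throughput counts every byte requested from the global
    address space. DRAM throughput counts only the bytes that miss
    both L1 and L2, since DRAM is accessed on L2 misses.
    """
    # Bytes per microsecond equals MB/s; divide by 1e3 for GB/s.
    global_gbps = requested_bytes / gputime_us / 1e3

    bytes_reaching_l2 = requested_bytes * (1.0 - l1_hit_rate)
    bytes_reaching_dram = bytes_reaching_l2 * (1.0 - l2_hit_rate)
    dram_gbps = bytes_reaching_dram / gputime_us / 1e3

    return global_gbps, dram_gbps


# Same requested traffic, with and without cache hits (made-up values).
cached = toy_throughputs(1_000_000, l1_hit_rate=0.5, l2_hit_rate=0.5, gputime_us=100.0)
uncached = toy_throughputs(1_000_000, l1_hit_rate=0.0, l2_hit_rate=0.0, gputime_us=100.0)
print("with cache hits  (global, DRAM):", cached)
print("no cache hits    (global, DRAM):", uncached)
```

In this model the two figures coincide only when nothing hits in cache. On real hardware DRAM traffic can also come from the other address spaces listed above (local, constant, instruction, texture), so the DRAM figure is not simply bounded by the global one.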

