gpu - CUDA shared memory bank conflicts report higher


I've been working on optimizing some code, and ran into an issue with the shared memory bank conflict report from the CUDA Nsight performance analysis. I was able to reduce it to a very simple piece of code that Nsight reports as having a bank conflict, when it doesn't seem like there should be one. Below is the kernel:

    __global__ void conflict() {
        __shared__ double values[33];
        values[threadIdx.x] = threadIdx.x;
        values[threadIdx.x+1] = threadIdx.x;
    }

And the main function to call it:

    int main() {
        conflict<<<1,32>>>();
    }

Note that I am using a single warp to reduce this to the bare minimum. When I run the code, Nsight says there is 1 bank conflict, but according to everything I have read, there should not be any. For each access to the shared memory array, each thread is accessing consecutive values, each belonging to a separate bank.

Has anyone else experienced issues with Nsight's reporting, or am I just missing something about how bank conflicts work? I would appreciate any feedback!

Btw, I am running the following setup:

  • Windows 8
  • GTX 770
  • Visual Studio Community 2013
  • CUDA 7
  • Nsight Visual Studio Edition version 4.5

If the intent is to run the posted code as-is, with the double data type, and with no bank conflicts, I believe it's possible with appropriate use of cudaDeviceSetSharedMemConfig (on cc3.x devices). Here's a test case:

    $ cat t750.cu
    #include <stdio.h>

    typedef double mytype;


    template <typename T>
    __global__ void conflict() {
        __shared__ T values[33];
        values[threadIdx.x] = threadIdx.x;
        values[threadIdx.x+1] = threadIdx.x;
    }

    int main(){

    #ifdef EBM
      cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
    #endif

      conflict<mytype><<<1,32>>>();
      cudaDeviceSynchronize();
    }

    $ nvcc -arch=sm_35 -o t750 t750.cu
    t750.cu(8): warning: variable "values" was set but never used
              detected during instantiation of "void conflict<T>() [with T=mytype]"
    (19): here

    $ nvprof --metrics shared_replay_overhead ./t750
    ==46560== NVPROF is profiling process 46560, command: ./t750
    ==46560== Profiling application: ./t750
    ==46560== Profiling result:
    ==46560== Metric result:
    Invocations                               Metric Name                        Metric Description         Min         Max         Avg
    Device "Tesla K40c (0)"
            Kernel: void conflict<double>(void)
              1                    shared_replay_overhead             Shared Memory Replay Overhead    0.142857    0.142857    0.142857
    $ nvcc -arch=sm_35 -DEBM -o t750 t750.cu
    t750.cu(8): warning: variable "values" was set but never used
              detected during instantiation of "void conflict<T>() [with T=mytype]"
    (19): here

    $ nvprof --metrics shared_replay_overhead ./t750
    ==46609== NVPROF is profiling process 46609, command: ./t750
    ==46609== Profiling application: ./t750
    ==46609== Profiling result:
    ==46609== Metric result:
    Invocations                               Metric Name                        Metric Description         Min         Max         Avg
    Device "Tesla K40c (0)"
            Kernel: void conflict<double>(void)
              1                    shared_replay_overhead             Shared Memory Replay Overhead    0.000000    0.000000    0.000000
    $

With the specification of EightByteMode, the shared memory replay overhead is zero.

