gpu - CUDA shared memory bank conflicts report higher
I've been working on optimizing some code and ran into an issue with the shared memory bank conflict report from the CUDA Nsight performance analysis. I was able to reduce it to a very simple piece of code that Nsight reports as having a bank conflict, when it doesn't seem like there should be one. Below is the kernel:
__global__ void conflict() {
    __shared__ double values[33];
    values[threadIdx.x] = threadIdx.x;
    values[threadIdx.x+1] = threadIdx.x;
}
And the main function that calls it:
int main() {
    conflict<<<1,32>>>();
}
Note that I am using a single warp to reduce this to the bare minimum. When I run the code, Nsight says there is 1 bank conflict, but according to everything I have read, there should not be any. For each access to the shared memory array, each thread is accessing consecutive values, each belonging to a separate bank.
Has anyone else experienced issues with the reporting of Nsight, or am I just missing something about how bank conflicts work? I would appreciate any feedback!
Btw, I am running the following setup:
- Windows 8
- GTX 770
- Visual Studio Community 2013
- CUDA 7
- Nsight Visual Studio Edition version 4.5
If the intent is to run the posted code as-is, with the double data type and with no bank conflicts, I believe it's possible with appropriate use of cudaDeviceSetSharedMemConfig (on cc3.x devices). Here's a test case:
$ cat t750.cu
#include <stdio.h>

typedef double mytype;


template <typename T>
__global__ void conflict() {
    __shared__ T values[33];
    values[threadIdx.x] = threadIdx.x;
    values[threadIdx.x+1] = threadIdx.x;
}

int main(){

#ifdef EBM
  cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
#endif

  conflict<mytype><<<1,32>>>();
  cudaDeviceSynchronize();
}

$ nvcc -arch=sm_35 -o t750 t750.cu
t750.cu(8): warning: variable "values" was set but never used
          detected during instantiation of "void conflict<T>() [with T=mytype]"
(19): here

$ nvprof --metrics shared_replay_overhead ./t750
==46560== NVPROF is profiling process 46560, command: ./t750
==46560== Profiling application: ./t750
==46560== Profiling result:
==46560== Metric result:
Invocations    Metric Name               Metric Description               Min         Max         Avg
Device "Tesla K40c (0)"
    Kernel: void conflict<double>(void)
          1    shared_replay_overhead    Shared Memory Replay Overhead    0.142857    0.142857    0.142857

$ nvcc -arch=sm_35 -DEBM -o t750 t750.cu
t750.cu(8): warning: variable "values" was set but never used
          detected during instantiation of "void conflict<T>() [with T=mytype]"
(19): here

$ nvprof --metrics shared_replay_overhead ./t750
==46609== NVPROF is profiling process 46609, command: ./t750
==46609== Profiling application: ./t750
==46609== Profiling result:
==46609== Metric result:
Invocations    Metric Name               Metric Description               Min         Max         Avg
Device "Tesla K40c (0)"
    Kernel: void conflict<double>(void)
          1    shared_replay_overhead    Shared Memory Replay Overhead    0.000000    0.000000    0.000000
$
With the eight-byte bank size specified (cudaSharedMemBankSizeEightByte), the shared memory replay overhead is zero.
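To make the effect of the bank size a little more concrete, here is a rough host-only sketch that counts replays for one warp-wide access to 8-byte elements. It is a simplified model based on my reading of how the CUDA programming guide describes cc 3.x banks, not a description of what Nsight or nvprof actually measure. The function name shared_replays, the "row" bookkeeping, and the assumption that the array starts at shared-memory byte offset 0 are all mine.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Simplified host-side model (an assumption, not NVIDIA code) of how a cc 3.x
// device services one warp-wide access to 8-byte elements in shared memory.
// Thread t touches element first_elem + t of an array that is assumed to
// start at shared-memory byte offset 0. bank_bytes selects the 4-byte or
// 8-byte bank mode. Returns the number of extra passes (replays) needed.
int shared_replays(int first_elem, int bank_bytes) {
    assert(bank_bytes == 4 || bank_bytes == 8);
    constexpr int kBanks = 32;
    constexpr int kWarp = 32;

    // For each bank, the distinct "rows" the warp requests from it. One row
    // is served in a single pass; every additional row costs one replay.
    std::array<std::set<int>, kBanks> rows;

    for (int t = 0; t < kWarp; ++t) {
        int first_word = 2 * (first_elem + t);  // index in 32-bit words
        for (int w = first_word; w < first_word + 2; ++w) {
            if (bank_bytes == 8) {
                // Successive 64-bit words map to successive banks.
                int w64 = w / 2;
                rows[w64 % kBanks].insert(w64 / kBanks);
            } else {
                // Successive 32-bit words map to successive banks; words i and
                // i + 32 inside the same aligned 64-word segment do not conflict.
                rows[w % kBanks].insert(w / 64);
            }
        }
    }

    std::size_t worst = 1;
    for (const auto& r : rows) {
        if (r.size() > worst) worst = r.size();
    }
    return static_cast<int>(worst) - 1;
}
```

Under this model, shared_replays(0, 4) returns 0 for the values[threadIdx.x] store, while shared_replays(1, 4) returns 1 for the values[threadIdx.x+1] store, because thread 31's double spills into the next 64-word segment and lands on the same banks as thread 15's. With bank_bytes set to 8, both calls return 0, which is at least consistent with the replay overhead dropping to zero above.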