gpgpu - CUDA kernel: executing a statement by a single thread only
How can I write a statement in a CUDA kernel that is executed by a single thread only? For example, if I have the following kernel:

    __global__ void kernel(bool *d_over, bool *d_update_flag_threads, int no_nodes)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid < no_nodes && d_update_flag_threads[tid])
        {
            ...
            *d_over = true; // writing a single memory location, only 1 thread should do this?
            ...
        }
    }

In the above kernel, "d_over" is a single boolean flag, while "d_update_flag_threads" is a boolean array.
What I did before was to use the first thread in the thread block, e.g.:

    if (threadIdx.x == 0)

But that does not work in this case, because I have a flag array here, and only the threads whose associated flag is "true" execute the if statement. The flag array is set by a CUDA kernel called before this one, so I don't have any knowledge of its contents in advance.
In short, I need something similar to the "single" construct in OpenMP.
A possible approach is to use atomic operations. If you need only one thread per block to do the update, you can do the atomic operation in shared memory (for compute capability >= 1.2), which is faster than performing it in global memory.
That said, the idea is as follows:

    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    __shared__ int sFlag;
    // initialize the flag
    if (threadIdx.x == 0) sFlag = 0;
    __syncthreads();

    if (tid < no_nodes && d_update_flag_threads[tid])
    {
        // safely update the flag; atomicAdd returns the value before the increment
        int singleFlag = atomicAdd(&sFlag, 1);
        // custom single operation: only the first thread to arrive sees 0
        if (singleFlag == 0)
            *d_over = true; // writing a single memory location, only 1 thread does this
        ...
    }

It is just an idea. I've not tested it, but it is close to an operation performed by a single thread, without that thread necessarily being the first thread of the block.
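To try the idea out, a minimal self-contained sketch could look like the one below. It is not from the original posts: the extra `d_winners` counter, the host helper `count_single_writers` and its `stride` parameter are hypothetical names added only so the host can observe how many threads actually entered the "single" section. Note that this gives one winner per block, not one per grid.

```cuda
#include <cuda_runtime.h>

// Per-block "single": the first flagged thread to reach the shared-memory
// atomic runs the guarded section. d_winners is a hypothetical counter,
// added only so the result can be checked from the host.
__global__ void kernel(bool *d_over, const bool *d_update_flag_threads,
                       int no_nodes, int *d_winners)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    __shared__ int sFlag;
    if (threadIdx.x == 0) sFlag = 0;   // one thread initializes the flag
    __syncthreads();                   // reached by every thread in the block

    if (tid < no_nodes && d_update_flag_threads[tid]) {
        // atomicAdd returns the old value, so exactly one thread sees 0
        if (atomicAdd(&sFlag, 1) == 0) {
            *d_over = true;            // the "single" section
            atomicAdd(d_winners, 1);   // count one winner for this block
        }
    }
}

// Hypothetical host helper: flags every stride-th node (stride >= 1),
// launches the kernel and returns the number of threads that ran the
// "single" section, or -1 if a CUDA call failed.
int count_single_writers(int no_nodes, int threads_per_block, int stride)
{
    bool *h_flags = new bool[no_nodes];
    for (int i = 0; i < no_nodes; ++i) h_flags[i] = (i % stride == 0);

    bool *d_flags = nullptr, *d_over = nullptr;
    int *d_winners = nullptr;
    int winners = -1;

    if (cudaMalloc(&d_flags, no_nodes * sizeof(bool)) == cudaSuccess &&
        cudaMalloc(&d_over, sizeof(bool)) == cudaSuccess &&
        cudaMalloc(&d_winners, sizeof(int)) == cudaSuccess) {
        cudaMemcpy(d_flags, h_flags, no_nodes * sizeof(bool),
                   cudaMemcpyHostToDevice);
        cudaMemset(d_over, 0, sizeof(bool));
        cudaMemset(d_winners, 0, sizeof(int));

        int blocks = (no_nodes + threads_per_block - 1) / threads_per_block;
        kernel<<<blocks, threads_per_block>>>(d_over, d_flags, no_nodes,
                                              d_winners);

        int h_winners = 0;
        bool h_over = false;
        // The blocking copies also wait for the kernel to finish
        if (cudaMemcpy(&h_winners, d_winners, sizeof(int),
                       cudaMemcpyDeviceToHost) == cudaSuccess &&
            cudaMemcpy(&h_over, d_over, sizeof(bool),
                       cudaMemcpyDeviceToHost) == cudaSuccess &&
            h_over)
            winners = h_winners;
    }

    cudaFree(d_flags);
    cudaFree(d_over);
    cudaFree(d_winners);
    delete[] h_flags;
    return winners;
}
```

For instance, with 1000 nodes, 256 threads per block and every 600th node flagged, nodes 0 and 600 fall into blocks 0 and 2, so `count_single_writers(1000, 256, 600)` should report 2 winners. If one writer for the whole grid is needed instead, the same pattern should work with the flag kept in global memory and zeroed before the launch, at the cost of the slower global atomic mentioned above.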