gpgpu - CUDA Kernel executing a statement by a single thread only


How can I write a statement in a CUDA kernel that is executed by a single thread only? For example, if I have the following kernel:

    __global__ void kernel(bool *d_over, bool *d_update_flag_threads, int no_nodes)
    {
        int tid = blockIdx.x*blockDim.x + threadIdx.x;
        if (tid < no_nodes && d_update_flag_threads[tid])
        {
            ...
            *d_over = true; // writing to a single memory location, only 1 thread should do it?
            ...
        }
    }

In the above kernel, "d_over" is a single boolean flag, while "d_update_flag_threads" is a boolean array.

What I normally did before was to use the first thread in the thread block, e.g.:

    if (threadIdx.x == 0)

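But that cannot work in this case, because I have a flag array here, and only the threads whose associated flag is "true" execute the if statement. The flag array is set by another CUDA kernel that is called before this one, so I don't have any knowledge of it in advance.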

In short, I need something similar to the "single" construct in OpenMP.

A possible approach is to use atomic operations. If you need only one thread per block to do the update, you can do the atomic operation in shared memory (for compute capability >= 1.2), which is generally much faster than performing it in global memory.
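For comparison, here is a minimal sketch of the global-memory variant, which gives one executing thread for the whole grid rather than one per block. It is an illustration under assumptions, not code from the original post: the names `single_per_grid`, `d_ticket`, `d_winners` and `count_grid_winners` are hypothetical, `d_ticket` must be zeroed before every launch, and `d_winners` exists only to check how many threads entered the guarded region.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: d_ticket is a global int that the host zeroes before launch.
__global__ void single_per_grid(const bool *d_flags, int no_nodes,
                                int *d_ticket, bool *d_over, int *d_winners)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < no_nodes && d_flags[tid]) {
        // atomicExch returns the previous value, so only the first flagged
        // thread to arrive anywhere in the grid reads 0.
        if (atomicExch(d_ticket, 1) == 0) {
            *d_over = true;
            atomicAdd(d_winners, 1);   // verification only
        }
    }
}

// Launches the kernel and returns how many threads entered the guarded region
// (stays at -1 if the device-to-host copy fails).
int count_grid_winners(const bool *h_flags, int no_nodes, int threads, bool *h_over)
{
    int blocks = (no_nodes + threads - 1) / threads;
    bool *d_flags = nullptr, *d_over = nullptr;
    int *d_ticket = nullptr, *d_winners = nullptr;
    cudaMalloc(&d_flags, no_nodes * sizeof(bool));
    cudaMalloc(&d_over, sizeof(bool));
    cudaMalloc(&d_ticket, sizeof(int));
    cudaMalloc(&d_winners, sizeof(int));
    cudaMemcpy(d_flags, h_flags, no_nodes * sizeof(bool), cudaMemcpyHostToDevice);
    cudaMemset(d_over, 0, sizeof(bool));
    cudaMemset(d_ticket, 0, sizeof(int));    // reset the ticket before each launch
    cudaMemset(d_winners, 0, sizeof(int));

    single_per_grid<<<blocks, threads>>>(d_flags, no_nodes, d_ticket, d_over, d_winners);

    int h_winners = -1;
    cudaMemcpy(&h_winners, d_winners, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_over, d_over, sizeof(bool), cudaMemcpyDeviceToHost);
    cudaFree(d_flags); cudaFree(d_over); cudaFree(d_ticket); cudaFree(d_winners);
    return h_winners;
}

int main()
{
    const int no_nodes = 1000;
    static bool h_flags[no_nodes];
    for (int i = 0; i < no_nodes; ++i) h_flags[i] = (i % 3 == 1);  // arbitrary test pattern

    bool h_over = false;
    int winners = count_grid_winners(h_flags, no_nodes, 128, &h_over);
    printf("threads in single region: %d, d_over: %d\n", winners, (int)h_over);
    return 0;
}
```

Every flagged thread in the grid issues one atomic on the same global address here, which is the cost that a shared-memory counter avoids when one update per block is enough.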

That said, the per-block idea is as follows:

    int tid = blockIdx.x*blockDim.x + threadIdx.x;

    __shared__ int sflag;
    // initialize the flag
    if (threadIdx.x == 0) sflag = 0;
    __syncthreads();

    if (tid < no_nodes && d_update_flag_threads[tid])
    {
        // safely update the flag; atomicAdd returns the value before the add
        int singleflag = atomicAdd(&sflag, 1);
        // custom single operation
        if (singleflag == 0)
            *d_over = true; // writing to a single memory location, only 1 thread will do it
        ...
    }

It is just an idea. I've not tested it, but it is close to an operation performed by a single thread per block, without that thread having to be the first thread of the block.
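Since the idea is described as untested, a small standalone program along the following lines could be used to check it. This is a sketch under assumptions: the kernel name `single_per_block`, the helper `count_single_executions`, the extra counter `d_single_count`, and the test flag pattern are all hypothetical and exist only to count how many threads entered the single region. The test pattern deliberately leaves thread 0 of every block unflagged.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel. d_single_count is used only for verification.
__global__ void single_per_block(const bool *d_flags, int no_nodes,
                                 bool *d_over, int *d_single_count)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    __shared__ int sflag;               // per-block ticket counter
    if (threadIdx.x == 0) sflag = 0;    // one thread initializes it
    __syncthreads();                    // reached by every thread of the block

    if (tid < no_nodes && d_flags[tid]) {
        int ticket = atomicAdd(&sflag, 1);   // returns the value before the add
        if (ticket == 0) {                   // first flagged thread to arrive in this block
            *d_over = true;
            atomicAdd(d_single_count, 1);    // verification only
        }
    }
}

// Runs the kernel and returns how many threads executed the single region
// (stays at -1 if the device-to-host copy fails).
int count_single_executions(const bool *h_flags, int no_nodes, int threads, bool *h_over)
{
    int blocks = (no_nodes + threads - 1) / threads;
    bool *d_flags = nullptr, *d_over = nullptr;
    int *d_count = nullptr;
    cudaMalloc(&d_flags, no_nodes * sizeof(bool));
    cudaMalloc(&d_over, sizeof(bool));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_flags, h_flags, no_nodes * sizeof(bool), cudaMemcpyHostToDevice);
    cudaMemset(d_over, 0, sizeof(bool));
    cudaMemset(d_count, 0, sizeof(int));

    single_per_block<<<blocks, threads>>>(d_flags, no_nodes, d_over, d_count);

    int h_count = -1;
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_over, d_over, sizeof(bool), cudaMemcpyDeviceToHost);
    cudaFree(d_flags); cudaFree(d_over); cudaFree(d_count);
    return h_count;
}

int main()
{
    const int no_nodes = 1000, threads = 128;
    const int blocks = (no_nodes + threads - 1) / threads;
    static bool h_flags[no_nodes];
    for (int i = 0; i < no_nodes; ++i) h_flags[i] = (i % 3 == 1);  // thread 0 of each block is unflagged

    // Host-side reference: number of blocks containing at least one flagged element.
    int expected = 0;
    for (int b = 0; b < blocks; ++b) {
        bool any = false;
        for (int i = b * threads; i < no_nodes && i < (b + 1) * threads; ++i) any = any || h_flags[i];
        if (any) ++expected;
    }

    bool h_over = false;
    int got = count_single_executions(h_flags, no_nodes, threads, &h_over);
    printf("single executions: %d (expected %d), d_over: %d\n", got, expected, (int)h_over);
    return got == expected ? 0 : 1;
}
```

If the counts match, each block that contains a flagged element ran the guarded statement exactly once. Note that this gives one execution per block, not one per grid, so several blocks may still write `*d_over`; that is harmless here because they all store the same value.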

