parallel processing - Why do Processes started in parallel in Python execute serially?


I have a problem/question concerning the multiprocessing library of Python:
Why do different processes, although started (almost) simultaneously, at least seem to execute serially instead of in parallel?

The task is to control a universe of a large number of particles (a particle being a set of x/y/z coordinates and a mass) and to perform various analyses on them while taking advantage of a multi-processor environment. In the example shown below I want to calculate the center of mass of the particles.
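For reference, the quantity computed below is the mass-weighted average of the particle positions (this also explains the normalization by the total mass at the end of the code):

    center of mass = (sum_i of m_i * r_i) / (sum_i of m_i)

where r_i is the position (x, y, z) of particle i and m_i its mass.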
Because the task explicitly says to use multiple processors, I didn't use the thread library, since there is this GIL-thingy in place that constrains execution to one processor.
Here's my code:

from multiprocessing import Process, Lock, Array, Value
from random import random
import math
from time import time

def exercise2(noOfParticles, noOfProcs):
    startingTime = time()
    particles = []
    processes = []
    centerCoords = Array('d', [0, 0, 0])
    totalMass = Value('d', 0)
    lock = Lock()

    # create the particles
    for i in range(noOfParticles):
        p = Particle()
        particles.append(p)

    for i in range(noOfProcs):
        # determine the number of particles every process needs to analyse
        particlesPerProcess = math.ceil(noOfParticles / noOfProcs)
        # create noOfProcs processes, each with a different set of particles
        p = Process(target=processBatch, args=(
            particles[i*particlesPerProcess:(i+1)*particlesPerProcess],
            centerCoords,       # handle to shared memory
            totalMass,          # handle to shared memory
            lock,               # handle to the lock
            'batch' + str(i)),  # also pass the name of the process for easier logging
            name='batch' + str(i))
        processes.append(p)
        print('created proc:', i)

    # start the processes
    for p in processes:
        p.start()  # here, the program waits for the started process to terminate. Why?

    # wait for all processes to finish
    for p in processes:
        p.join()

    # normalize the coordinates
    centerCoords[0] /= totalMass.value
    centerCoords[1] /= totalMass.value
    centerCoords[2] /= totalMass.value

    print(centerCoords[:])
    print('total time used', time() - startingTime, ' seconds')


class Particle():
    """A particle is a simple physical object, having a set of x/y/z coordinates and a mass.
    All values are randomly set at initialization of the object."""

    def __init__(self):
        self.x = random() * 1000
        self.y = random() * 1000
        self.z = random() * 1000
        self.m = random() * 10

    def printProperties(self):
        attrs = vars(self)
        print('\n'.join("%s: %s" % item for item in attrs.items()))


def processBatch(particles, centerCoords, totalMass, lock, name):
    """Calculates the mass-weighted sum of the coordinates of all particles as well as the sum of their masses.
    Writes the results to the shared memory in centerCoords and totalMass, using the lock."""

    print(name, ' started')
    mass = 0
    centerX = 0
    centerY = 0
    centerZ = 0

    for p in particles:
        centerX += p.m * p.x
        centerY += p.m * p.y
        centerZ += p.m * p.z
        mass += p.m

    with lock:
        centerCoords[0] += centerX
        centerCoords[1] += centerY
        centerCoords[2] += centerZ
        totalMass.value += mass

    print(name, ' ended')


if __name__ == '__main__':
    exercise2(2**16, 6)

Now I'd expect the processes to start at the same time and execute in parallel. But when I look at the output of the program, it looks as if the processes were executing serially:

created proc: 0
created proc: 1
created proc: 2
created proc: 3
created proc: 4
created proc: 5
batch0  started
batch0  ended
batch1  started
batch1  ended
batch2  started
batch2  ended
batch3  started
batch3  ended
batch4  started
batch4  ended
batch5  started
batch5  ended
[499.72234074100135, 497.26586187539453, 498.9208784328791]
total time used 4.7220001220703125  seconds

Also, when stepping through the program with the Eclipse debugger, I can see how the program waits for one process to terminate before starting the next one, at the line marked with the comment ending in 'Why?'. Of course, that might just be the debugger, but the output produced in a normal run shows the same picture as above.

  • Are the processes executing in parallel and I just can't see it due to some sharing problem with stdout? (See the timing sketch after this list for a way to check this independently of stdout.)
  • If the processes are executing serially: why? And how can I make them run in parallel?
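One way to settle the first question without relying on stdout: have every worker record its own start and end timestamps into a shared array and compare the intervals afterwards. This is a minimal sketch, not part of the original program; the names worker and spans are made up for the example.

from multiprocessing import Process, Array
from time import time

def worker(i, spans):
    spans[2 * i] = time()                    # record own start time
    s = 0
    for x in range(2000000):                 # some CPU-bound busywork
        s += x * x
    spans[2 * i + 1] = time()                # record own end time

if __name__ == '__main__':
    n = 4
    spans = Array('d', [0.0] * (2 * n))      # one (start, end) pair per worker
    procs = [Process(target=worker, args=(i, spans)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    t0 = min(spans[2 * i] for i in range(n))
    for i in range(n):
        print('worker %d ran from %.3f to %.3f' %
              (i, spans[2 * i] - t0, spans[2 * i + 1] - t0))
    # Overlapping [start, end] intervals mean the processes ran in parallel.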

Any help in understanding this is appreciated.

I executed the above code from PyDev and from the command line, using Python 3.2.3 on a Windows 7 machine with a dual core Intel processor.


Edit:
Due to the output of the program I misunderstood the problem: the processes actually were running in parallel, but the overhead of pickling large amounts of data and sending it to the subprocesses takes so long that it completely distorts the picture.
Moving the creation of the particles (i.e. the data) into the subprocesses, so that they don't have to be pickled in the first place, removed all problems and resulted in a useful, parallel execution of the program (a sketch of this fix follows below).
To solve the task, I will therefore have to keep the particles in shared memory so they don't have to be passed to the subprocesses.
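For illustration, a minimal sketch of that fix, assuming the names from the listing above (Particle, centerCoords, totalMass, lock); processBatchLocal is a hypothetical name. Each worker builds its own batch of particles locally, so the Process constructor only has to pickle a handful of small arguments:

def processBatchLocal(noOfParticles, centerCoords, totalMass, lock, name):
    """Like processBatch above, but creates its own particles, so no
    particle data has to be pickled and sent to the subprocess."""
    print(name, ' started')
    mass = centerX = centerY = centerZ = 0
    for _ in range(noOfParticles):
        p = Particle()              # created inside the child process
        centerX += p.m * p.x
        centerY += p.m * p.y
        centerZ += p.m * p.z
        mass += p.m
    with lock:
        centerCoords[0] += centerX
        centerCoords[1] += centerY
        centerCoords[2] += centerZ
        totalMass.value += mass
    print(name, ' ended')

# ... and in exercise2, each process is then created with only the batch size:
#    p = Process(target=processBatchLocal,
#                args=(particlesPerProcess, centerCoords, totalMass, lock,
#                      'batch' + str(i)),
#                name='batch' + str(i))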

I ran your code on my system (Python 2.6.5) and it returned with results almost instantly, which makes me think that perhaps your task size is so small that the processes finish before the next one can begin (note that starting a process is slower than spinning up a thread). I question the "total time used 4.7220001220703125 seconds" in your results, because that's about 40x longer than it took my system to run the same code. So I scaled the number of particles up to 2**20, and got the following results:

('created proc:', 0)
('created proc:', 1)
('created proc:', 2)
('created proc:', 3)
('created proc:', 4)
('created proc:', 5)
('batch0', ' started')
('batch1', ' started')
('batch2', ' started')
('batch3', ' started')
('batch4', ' started')
('batch5', ' started')
('batch0', ' ended')
('batch1', ' ended')
('batch2', ' ended')
('batch3', ' ended')
('batch5', ' ended')
('batch4', ' ended')
[500.12090773656854, 499.92759577086059, 499.97075039983588]
('total time used', 5.1031057834625244, ' seconds')

That's more in line with what you would expect. What happens if you increase the task size? (See the sketch below for one way to probe this.)
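To find the break-even point yourself, a rough sketch like the following (burn, timed_serial and timed_parallel are made-up names) times the same CPU-bound job run serially and split across processes, for growing task sizes; since process start-up cost is roughly constant, the parallel version should pull ahead as the task grows:

from multiprocessing import Process
from time import time

def burn(n):
    s = 0
    for i in range(n):          # CPU-bound stand-in for the real work
        s += i * i

def timed_serial(n, parts):
    t0 = time()
    for _ in range(parts):
        burn(n // parts)
    return time() - t0

def timed_parallel(n, parts):
    t0 = time()
    procs = [Process(target=burn, args=(n // parts,)) for _ in range(parts)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time() - t0

if __name__ == '__main__':
    for n in (10**4, 10**5, 10**6, 10**7):
        print('n=%8d  serial %.3fs  parallel %.3fs' %
              (n, timed_serial(n, 4), timed_parallel(n, 4)))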

