Walking and averaging values in Python
I have to process .txt files present in subfolders inside a folder, like:
New Folder > Folder 1 to 6 > xx.txt & yy.txt (files present in each folder)
Each file contains 2 columns, such as:
arg asp gln glu and
arg glu arg arg glu asp
Now I have to:
1) Count the number of occurrences of each word in each file, then average that count by dividing it by the total number of lines in that file.
2) Then take the values obtained after completing the 1st step and divide them by the total number of files present in the folder to get an average per folder (i.e. divide by 2 in this case).
I have succeeded in the 1st case, but I'm not getting the 2nd case. I have tried my code as follows:
    import os
    import glob

    for root, dirs, files in os.walk(path):
        asp_count = 0
        glu_count = 0
        lys_count = 0
        arg_count = 0
        his_count = 0
        acid_count = 0
        base_count = 0
        count = 0  # number of .txt files seen in this folder
        listoffile = glob.iglob(os.path.join(root, '*.txt'))
        for filename in listoffile:
            linecount = 0
            asp_count_col1 = 0
            asp_count_col2 = 0
            glu_count_col1 = 0
            glu_count_col2 = 0
            lys_count_col1 = 0
            lys_count_col2 = 0
            arg_count_col1 = 0
            arg_count_col2 = 0
            his_count_col1 = 0
            his_count_col2 = 0
            count += 1
            with open(filename) as inp:
                for line in map(str.split, inp):
                    linecount += 1
                    # the two words are read from fields 4 and 6 of each line
                    k = line[4]
                    m = line[6]
                    if k == 'asp':
                        asp_count_col1 += 1
                    elif m == 'asp':
                        asp_count_col2 += 1
                    if k == 'glu':
                        glu_count_col1 += 1
                    elif m == 'glu':
                        glu_count_col2 += 1
                    if k == 'lys':
                        lys_count_col1 += 1
                    elif m == 'lys':
                        lys_count_col2 += 1
                    if k == 'arg':
                        arg_count_col1 += 1
                    elif m == 'arg':
                        arg_count_col2 += 1
                    if k == 'his':
                        his_count_col1 += 1
                    elif m == 'his':
                        his_count_col2 += 1
            # per-file average: occurrences divided by lines in this file
            asp_count = (float(asp_count_col1 + asp_count_col2))/linecount
            glu_count = (float(glu_count_col1 + glu_count_col2))/linecount
            lys_count = (float(lys_count_col1 + lys_count_col2))/linecount
            arg_count = (float(arg_count_col1 + arg_count_col2))/linecount
            his_count = (float(his_count_col1 + his_count_col2))/linecount

Up to this point I am able to get the average value per file. How can I get the average per subfolder (i.e. by dividing by count, the total number of files)? My problem is the 2nd part; the 1st part is done. The code provided gives the average values for each file. I want to add up these averages and make a new average by dividing by the total number of files present in the sub-folder.
Your use of os.walk together with glob.iglob is bogus. Either use one or the other, not both together. Here's how I would do it:
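    import os, os.path, re, pprint, sys

    #...

    for root, dirs, files in os.walk(path):
        counts = {}   # word -> number of occurrences in this directory
        nlines = 0    # total number of lines read in this directory
        for f in filter(lambda n: re.search(r'\.txt$', n), files):
            # os.walk yields bare file names, so join them with root
            with open(os.path.join(root, f), 'rt') as fh:
                for l in fh:
                    nlines += 1
                    for k in l.split():
                        counts[k] = counts[k] + 1 if k in counts else 1
        if nlines == 0:
            continue  # no lines read from .txt files here, nothing to report
        for k, v in counts.items():
            counts[k] = float(v) / nlines
        sys.stdout.write('frequencies for directory %s:\n' % root)
        pprint.pprint(counts)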
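Note that the snippet above pools all lines of a directory together, which is not quite the same as the average of per-file averages asked for in the question (the two differ whenever the files have different numbers of lines). A minimal sketch of the two-step version might look like the following. The helper names file_rates and folder_averages are made up for illustration, and it assumes that every whitespace-separated word on a line is counted (as in the snippet above, rather than only fields 4 and 6) and that a word missing from a file contributes 0 to the folder average.

```python
import os


def file_rates(filepath):
    """Step 1: occurrences of each word in one file divided by its line count."""
    counts = {}
    nlines = 0
    with open(filepath, 'rt') as fh:
        for line in fh:
            words = line.split()
            if not words:
                continue  # blank lines are skipped and not counted
            nlines += 1
            for word in words:
                counts[word] = counts.get(word, 0) + 1
    if nlines == 0:
        return {}  # empty file: no rates, but it still counts as a file
    return {word: n / nlines for word, n in counts.items()}


def folder_averages(path):
    """Step 2: per folder, sum the per-file rates and divide by the file count."""
    result = {}
    for root, dirs, files in os.walk(path):
        txt_files = [f for f in files if f.endswith('.txt')]
        if not txt_files:
            continue  # folder without .txt files, nothing to average
        totals = {}
        for name in txt_files:
            rates = file_rates(os.path.join(root, name))
            for word, rate in rates.items():
                totals[word] = totals.get(word, 0.0) + rate
        result[root] = {w: t / len(txt_files) for w, t in totals.items()}
    return result
```

Calling pprint.pprint(folder_averages(path)) would then print one dictionary of averaged values per sub-folder, keyed by the folder's path.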