Baffling magic behaviour with dict variable & for loop in Python
Junior Python programmer here. I've been beating my head against a brick wall over some unexpected loop and dictionary behavior. I'm looping through a CSV file of log entries and parsing the data into a categories dict. When I initialize the categories dict each time through the loop, it works as expected, like so:
    log_entries = autovivification() # http://stackoverflow.com/questions/635483/what-is-the-best-way-to-implement-nested-dictionaries-in-python

    def scrublooper(log_file):
        for ll in log_file:
            # initialize the categories dict every time through the loop
            categories = {'requests': {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 0, 'pages': 0, 'content_files': 0},
                          'filter_action': {'re': 0, 'pl': 0, 'bs': 0}}
            lld = logdomain(ll)
            domain, hostname, lan_host = lld.domain, lld.hostname, lld.lan_host
            mimetypes = url_searcher(settings.mimetypes, lld.mime_type)
            if mimetypes:
                category = mimetypes[2]
                if not log_entries[lan_host].has_key(domain):
                    log_entries[lan_host][domain] = categories
                log_entries[lan_host][domain]['requests'][category] += 1

    print log_entries['192.168.5.210']['google.com']['requests']
    print log_entries['192.168.5.210']['webtrendslive.com']['requests']
    print log_entries['192.168.5.210']['osnews.com']['requests']
    print log_entries['192.168.5.210']['question-defense.com']['requests']
    print log_entries['192.168.5.210']['optimost.com']['requests']

Output, as expected:
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 95, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 1, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 2, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 18, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 3, 'pages': 0, 'content_files': 0}

However, here's the problem: I don't want to initialize the categories dict every time through the loop. In this simplified example it doesn't matter, but further down the road in my program it'll cause significant performance degradation (30%).
I need to initialize the categories dict only once:
    log_entries = autovivification()
    categories = {'requests': {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 0, 'pages': 0, 'content_files': 0},
                  'filter_action': {'re': 0, 'pl': 0, 'bs': 0}}

    def scrublooper(log_file):
        for ll in log_file:
            lld = logdomain(ll)
            # etc, etc, etc

However, when I initialize the categories dict anywhere outside the loop (whether in the scrublooper function or right after the log_entries variable), the output is:
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 685, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 685, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 685, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 685, 'pages': 0, 'content_files': 0}
    {'content_visual': 0, 'content_programsupdates': 0, 'content_text': 685, 'pages': 0, 'content_files': 0}

All the 'content_text' values have been incremented equally! What is happening here? I'm sure I've violated some Python principle, but I don't know which one or how to find out. It took me hours to figure out that the problem was connected to the categories dict.
Much obliged for any explanation.
I'm not familiar with the tools you're using, but when you create the dictionary outside of the loop, you're creating only one dictionary.
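A minimal, self-contained sketch of that point. The names `template`, `counts` and the domain keys are illustrative only, not from the original code. Assigning the same dict to two keys stores two references to one object, so incrementing through either key changes what both keys see. The `is` operator is a quick way to check whether two names point at the same object.

```python
# Illustrative names only: this mimics categories being created once, outside the loop.
template = {'requests': {'content_text': 0}}

counts = {}
counts['google.com'] = template    # stores a reference to template, not a copy
counts['osnews.com'] = template    # another reference to the very same object

counts['google.com']['requests']['content_text'] += 1

print(counts['google.com'] is counts['osnews.com'])      # True: one shared dict
print(counts['osnews.com']['requests']['content_text'])  # 1, although only google.com was incremented
```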
    if not log_entries[lan_host].has_key(domain):
        log_entries[lan_host][domain] = categories

This code makes log_entries[lan_host][domain] point to that single dictionary. Python doesn't copy values or anything like that, so these lines all refer to the same dictionary:
    log_entries['192.168.5.210']['google.com']
    log_entries['192.168.5.210']['webtrendslive.com']

P.S. I can't be sure, but my gut says that avoiding initializing a new dictionary for performance reasons is excessive.
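One way to keep the categories layout defined in a single place while still giving every domain its own counters is sketched below. `make_categories` and `record` are hypothetical helpers, and `collections.defaultdict` stands in for the question's `autovivification()` helper, whose code isn't shown. With this layout a fresh nested dict is built only when a new (lan_host, domain) pair is first seen, not once per log line. If you prefer one module-level template, `copy.deepcopy` works too. A shallow `dict.copy()` does not: it would still share the inner `'requests'` dict.

```python
import copy
from collections import defaultdict

def make_categories():
    # Hypothetical factory: returns a brand-new nested dict on every call.
    return {'requests': {'content_visual': 0, 'content_programsupdates': 0,
                         'content_text': 0, 'pages': 0, 'content_files': 0},
            'filter_action': {'re': 0, 'pl': 0, 'bs': 0}}

# lan_host -> domain -> categories; a missing domain triggers make_categories()
log_entries = defaultdict(lambda: defaultdict(make_categories))

def record(lan_host, domain, category):
    # Hypothetical stand-in for the counting line inside the question's loop.
    log_entries[lan_host][domain]['requests'][category] += 1

record('192.168.5.210', 'google.com', 'content_text')
record('192.168.5.210', 'google.com', 'content_text')
record('192.168.5.210', 'osnews.com', 'content_text')

print(log_entries['192.168.5.210']['google.com']['requests']['content_text'])  # 2
print(log_entries['192.168.5.210']['osnews.com']['requests']['content_text'])  # 1

# Alternative: one template, copied per domain.
CATEGORIES_TEMPLATE = make_categories()
fresh = copy.deepcopy(CATEGORIES_TEMPLATE)    # nested dicts are copied as well
shallow = CATEGORIES_TEMPLATE.copy()          # top level only
print(fresh['requests'] is CATEGORIES_TEMPLATE['requests'])    # False
print(shallow['requests'] is CATEGORIES_TEMPLATE['requests'])  # True: still shared
```

`copy.deepcopy` is usually slower than calling a small factory that rebuilds the literal, so the factory is the simpler choice here.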