abstract syntax tree - Python script stops processing on unexpected EOF; the loop doesn't continue to the next file
I have a script that reads in a number of files from a directory glob, then splits them line by line into new files based on the dates found on each line in a particular JSON field.
Here's the script, which works up to a point:
    import json
    import glob
    import fileinput
    from dateutil import parser
    import ast
    import gzip

    line = []

    filestobeanalyzed = glob.glob('../data/*')

    for filename in filestobeanalyzed:
        inputfilename = filename
        print inputfilename
        for line in fileinput.input([inputfilename]):
            line = line.strip();
            if not line: continue
            line = ast.literal_eval(line)
            line = json.dumps(line)
            if not json.loads(line).get('created_at'): continue
            date = json.loads(line).get('created_at')
            date_converted = parser.parse(date).strftime('%y%m%d')
            outputfilename = gzip.open(date_converted, "a")
            outputfilename.write(line)
            outputfilename.write("\n")
            outputfilename.close()

I'm getting the following error when the end of the first file in the directory is reached:
    python split_json_docs_by_date_with_dict-to-json.py
    ../data/research_data_p1.json
    Traceback (most recent call last):
      File "split_json_docs_by_date_with_dict-to-json.py", line 18, in <module>
        line = ast.literal_eval(line)
      File "/usr/lib64/python2.7/ast.py", line 49, in literal_eval
        node_or_string = parse(node_or_string, mode='eval')
      File "/usr/lib64/python2.7/ast.py", line 37, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        {u'user': {u'follow_request_sent': None, u'profile_use_background_image': False, u'default_profile_image': False, u'geo_enabled': False, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/1829421396/ya6hez2j_normal', u'profile_sidebar_fill_color': u'ddeef6', u'id': 15054232, u'profile_text_color': u'333333', u'followers_count': 117, u'protected': False, u'id_str': u'15054232', u'profile_background_color': u'858585', u'listed_count': 6, u'utc_offset': -25200, u'statuses_count': 9418, u'description': u"hi- i'm jordan, , refuse put effort bio. well... except enough type guess.", u'friends_count': 59, u'location': u'washington terrace, ut', u'profile_link_color': u'0084b4', u'profile_image_url': u'http://a3.twimg.com/profile_images/1829421396/ya6hez2j_normal', u'notifications': N

It's obvious to me that ast is failing to evaluate the line because it isn't complete. If I insert:
    if not ast.literal_eval(line): continue

before the:

    line = ast.literal_eval(line)

I still get the exact same error.
If you are just looking to ignore the errors, you can catch the exceptions in a try block and continue. If you want to parse multi-line JSON, fileinput might not be the best choice.
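The extra `if not ast.literal_eval(line): continue` check from the question cannot help, because the call raises before the `if` ever sees a result. Below is a minimal sketch of the try-block approach, written for Python 3 rather than the Python 2.7 of the question. The helper name `parse_or_none` and the sample strings are made up for illustration and are not part of the original script.

```python
import ast


def parse_or_none(text):
    """Return the evaluated literal, or None if text is not a valid literal.

    parse_or_none is a hypothetical helper, not part of the original script.
    """
    try:
        return ast.literal_eval(text)
    except (SyntaxError, ValueError):
        # Truncated or malformed input lands here instead of crashing the loop.
        return None


# Made-up sample lines: one complete record and one cut off mid-record.
complete = "{'created_at': 'Mon Jan 02 2012', 'id': 1}"
truncated = "{'created_at': 'Mon Jan 02 2012', 'id': "

print(parse_or_none(complete))
print(parse_or_none(truncated))
```

Inside the loop, a `None` result would then be the signal to `continue` to the next line.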
Here is an edit that should work for you. It contains both basic multi-line JSON support and try blocks, so unparsable lines do not crash the program. It is untested, as I do not have access to your test data. Remove the lines with comments to remove the rudimentary multi-line JSON support.
    import json
    import glob
    import fileinput
    from dateutil import parser
    import ast
    import gzip

    line = []

    filestobeanalyzed = glob.glob('../data/*')

    for filename in filestobeanalyzed:
        inputfilename = filename
        print inputfilename
        pastlines = ""  # stores previous unparsable lines
        for line in fileinput.input([inputfilename]):
            line = pastlines + line  # put past unparsable lines in front of the current line
            line = line.strip()
            if not line:
                continue
            try:
                line = ast.literal_eval(line)
                line = json.dumps(line)
                pastlines = ""  # reset unparsable lines
            except (SyntaxError, ValueError):
                pastlines = line  # line already includes the earlier unparsable lines
                continue
            date = json.loads(line).get('created_at', None)
            if not date:
                continue
            date_converted = parser.parse(date).strftime('%y%m%d')
            outputfilename = gzip.open(date_converted, "a")
            outputfilename.write(line)
            outputfilename.write("\n")
            outputfilename.close()
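The buffering idea on its own can be sketched as a small generator, which makes it easier to try out without any files on disk. This is an illustrative Python 3 sketch, not a tested replacement for the script above. The name `iter_records` and the sample lines are hypothetical.

```python
import ast


def iter_records(lines):
    """Yield one parsed literal per record, joining lines until they parse.

    iter_records is a hypothetical helper used only for this illustration.
    """
    pending = ""  # text of earlier lines that did not parse on their own
    for raw in lines:
        candidate = (pending + " " + raw.strip()).strip()
        if not candidate:
            continue
        try:
            record = ast.literal_eval(candidate)
        except (SyntaxError, ValueError):
            pending = candidate  # keep accumulating until the text parses
            continue
        pending = ""
        yield record


# Made-up input: the second record is split across two lines.
sample = [
    "{'id': 1, 'created_at': 'a'}\n",
    "{'id': 2,\n",
    " 'created_at': 'b'}\n",
    "\n",
    "{'id': 3}\n",
]
records = list(iter_records(sample))
print(records)
```

One caveat with this kind of buffering: a line that is genuinely corrupt, rather than merely split, never becomes parsable, so everything after it in the same file is absorbed into the buffer and silently dropped. If that is a concern, capping the number of buffered lines is one possible safeguard.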