abstract syntax tree - Python script stops processing on an unexpected EOF and the loop doesn't move on to the next file


I have a script that reads in a number of files in a directory using glob, then splits them line by line into new files based on the dates found on each line in a particular JSON field.

Here's the script, which works up to a point:

    import json
    import glob
    import fileinput
    from dateutil import parser
    import ast
    import gzip

    line = []

    filestobeanalyzed = glob.glob('../data/*')

    for filename in filestobeanalyzed:
        inputfilename = filename
        print inputfilename
        for line in fileinput.input([inputfilename]):
            line = line.strip()
            if not line: continue
            line = ast.literal_eval(line)
            line = json.dumps(line)
            if not json.loads(line).get('created_at'): continue
            date = json.loads(line).get('created_at')
            date_converted = parser.parse(date).strftime('%y%m%d')
            outputfilename = gzip.open(date_converted, "a")
            outputfilename.write(line)
            outputfilename.write("\n")

    outputfilename.close()

I'm getting the following error when the end of the first file in the directory is reached:

    python split_json_docs_by_date_with_dict-to-json.py
    ../data/research_data_p1.json
    Traceback (most recent call last):
      File "split_json_docs_by_date_with_dict-to-json.py", line 18, in <module>
        line = ast.literal_eval(line)
      File "/usr/lib64/python2.7/ast.py", line 49, in literal_eval
        node_or_string = parse(node_or_string, mode='eval')
      File "/usr/lib64/python2.7/ast.py", line 37, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        {u'user': {u'follow_request_sent': None, u'profile_use_background_image': False, u'default_profile_image': False, u'geo_enabled': False, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/1829421396/ya6hez2j_normal', u'profile_sidebar_fill_color': u'ddeef6', u'id': 15054232, u'profile_text_color': u'333333', u'followers_count': 117, u'protected': False, u'id_str': u'15054232', u'profile_background_color': u'858585', u'listed_count': 6, u'utc_offset': -25200, u'statuses_count': 9418, u'description': u"hi- i'm jordan, , refuse put effort bio. well...
    except enough type guess.", u'friends_count': 59, u'location': u'washington terrace, ut', u'profile_link_color': u'0084b4', u'profile_image_url': u'http://a3.twimg.com/profile_images/1829421396/ya6hez2j_normal', u'notifications': N

It's obvious to me that ast is failing to evaluate the line because it isn't complete, but if I insert:

if not ast.literal_eval(line): continue 

before the:

line = ast.literal_eval(line)  

I still get the exact same error.

If you are looking to ignore the errors, you can catch the exceptions in a try block and continue. If you want to parse multi-line JSON, fileinput might not be the best choice.
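To illustrate the first point, here is a minimal sketch of why the `if not ast.literal_eval(line): continue` guard from the question cannot help: on an incomplete line, `ast.literal_eval` raises instead of returning a falsy value, so the guard fails with the same error before `continue` is ever reached. The helper name `try_parse` and the two sample strings are made up for this illustration and are not part of the original script.

```python
import ast


def try_parse(text):
    """Return the evaluated literal, or None if text is not a complete literal.

    try_parse is a hypothetical helper used only for this illustration.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Truncated or malformed input raises; it never returns a falsy value.
        return None


# Made-up sample inputs: one complete record and one cut off mid-string.
complete = "{u'created_at': u'Mon Jan 02 15:04:05 +0000 2012'}"
truncated = "{u'description': u\"well..."

print(try_parse(complete) is not None)
print(try_parse(truncated) is None)
```

Catching only `ValueError` and `SyntaxError` keeps unrelated failures, such as a `KeyboardInterrupt`, from being silently swallowed.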

Here is an edit that should work for you. It contains both basic multi-line JSON support and try blocks, so that unparsable lines do not crash the program. It is untested, as I do not have access to your test data. Remove the lines with comments to remove the rudimentary multi-line JSON support.

    import json
    import glob
    import fileinput
    from dateutil import parser
    import ast
    import gzip

    line = []

    filestobeanalyzed = glob.glob('../data/*')

    for filename in filestobeanalyzed:
        inputfilename = filename
        print inputfilename
        pastlines = ""  # stores previous unparsable lines
        for line in fileinput.input([inputfilename]):
            line = pastlines + line  # put past unparsable lines before the current line
            line = line.strip()
            if not line: continue
            try:
                line = ast.literal_eval(line)
                line = json.dumps(line)
                pastlines = ""  # reset unparsable lines
            except (ValueError, SyntaxError):
                pastlines = line  # keep everything unparsed so far (line already includes pastlines)
                continue
            date = json.loads(line).get('created_at', None)
            if not date: continue
            date_converted = parser.parse(date).strftime('%y%m%d')
            outputfilename = gzip.open(date_converted, "a")
            outputfilename.write(line)
            outputfilename.write("\n")
            outputfilename.close()
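The buffering idea in the edit above can also be factored out into a generator that works on any iterable of lines, which makes it easy to try without fileinput or real data. This is only a sketch under assumptions: the name `iter_records` and the sample lines are invented here, and the line break inside a split string is dropped when the pieces are joined.

```python
import ast


def iter_records(lines):
    """Yield one dict per record, joining physical lines until they parse.

    iter_records is a hypothetical helper, not part of the original script.
    """
    buffered = ""
    for raw in lines:
        # Drop only the line ending so spaces inside a split string survive.
        buffered += raw.rstrip("\r\n")
        if not buffered:
            continue
        try:
            record = ast.literal_eval(buffered)
        except (ValueError, SyntaxError):
            continue  # incomplete so far: keep buffering
        buffered = ""
        if isinstance(record, dict):
            yield record


# Made-up sample: the second record is split across two physical lines.
sample = [
    "{u'created_at': u'Mon Jan 02 15:04:05 +0000 2012', u'text': u\"first\"}\n",
    "{u'created_at': u'Tue Jan 03 10:00:00 +0000 2012', u'text': u\"split \n",
    "across lines\"}\n",
    "\n",
]
records = list(iter_records(sample))
print(len(records))
```

One caveat of this approach, shared with the edit above: a line that is genuinely malformed, rather than merely incomplete, stays in the buffer and prevents every later record in that file from parsing, so a cap on the buffer size may be worth adding.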
