Python urllib2: reading only part of a document
OK, this is driving me nuts.
I'm trying to read the Crunchbase API using Python's urllib2 library. Relevant code:
api_url = "http://api.crunchbase.com/v/1/financial-organization/venrock.js"
len(urllib2.urlopen(api_url).read())

The result is either 73493 or 69397. The actual document is much longer. When I try it on a different computer, the length is either 44821 or 40725. I've tried changing the user-agent, using urllib instead, increasing the timeout to a large number, and reading small chunks at a time. Always the same result.
I assumed it was a server problem, but my browser reads the whole thing.
Python 2.7.2 on OS X 10.6.8 gives the ~40k lengths; Python 2.7.1 running IPython on OS X 10.7.3 gives the ~70k lengths. Any thoughts?
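For reference, the "small chunks at a time" attempt looks roughly like this (a sketch; the helper name and the in-memory stream standing in for the HTTP response are mine, not the exact code I ran):

```python
import io

def read_all(resp, chunk_size=1024):
    # Read the response in small chunks until EOF, then join them.
    chunks = []
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b''.join(chunks)

# Demo with an in-memory stream standing in for urllib2's response object.
fake_resp = io.BytesIO(b'x' * 5000)
print(len(read_all(fake_resp)))
```

Against the real API this loop still stops short, at the same truncated lengths as the one-shot read.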
There's a kooky server out there. It might work if you, like the browser, request the file with gzip encoding. Here is code that should do the trick:
import urllib2, gzip

api_url = 'http://api.crunchbase.com/v/1/financial-organization/venrock.js'
# Ask the server for a gzip-compressed response, as browsers do.
req = urllib2.Request(api_url)
req.add_header('Accept-Encoding', 'gzip')
resp = urllib2.urlopen(req)
data = resp.read()

>>> print len(data)
26610

The problem then is to decompress the data.
from StringIO import StringIO

# Only decompress if the server actually sent gzip-encoded data.
if resp.info().get('Content-Encoding') == 'gzip':
    g = gzip.GzipFile(fileobj=StringIO(data))
    data = g.read()

>>> print len(data)
183159
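To see the decompression step in isolation, here's a self-contained round trip (the sample payload is made up; io.BytesIO is the bytes-friendly stand-in for StringIO and works on both Python 2.7 and 3):

```python
import gzip
import io

# Made-up payload standing in for the Crunchbase response body.
original = b'{"name": "Venrock"}' * 100

# Compress it, as the server does when it honors Accept-Encoding: gzip.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as g:
    g.write(original)
compressed = buf.getvalue()

# Decompress it, the same way the answer does.
data = gzip.GzipFile(fileobj=io.BytesIO(compressed)).read()
print(len(compressed), len(data))  # compressed is much shorter than data
```

The same GzipFile(fileobj=...) pattern handles the API response once you wrap the raw bytes in a file-like object.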