Loading external Python modules for Pig UDFs on Amazon EMR
I've created a Python UDF to convert datetimes to different timezones. The script uses pytz, which doesn't ship with Python (or Jython). I've tried a couple of things:
- Bootstrapping Pig to install its own Jython, and including pytz in that Jython installation. I can't get Pig to use the newly installed Jython; it keeps reverting to Amazon's Jython.
- Setting PYTHONPATH to a local directory where the new modules have been installed.
- Setting HADOOP_CLASSPATH/PIG_CLASSPATH to the new installation of Jython.
Each of these ends with "ImportError: No module named pytz" when I try to load the UDF script. The script loads fine if I remove pytz, so it's the external module that's giving me problems.
EDIT: I put this in a comment, but thought I'd make it an edit:
I've tried every way I know of to get Pig to recognize a different Jython jar. None of them has worked. Amazon's Jython is here: /home/hadoop/.versions/pig-0.9.2/lib/pig/jython.jar, and it is recognizing this sys.path: /home/hadoop/lib/lib. I can't figure out how to build external libraries against this jar.
Could I manually hack sys.path inside of my Jython script?
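To make that last idea concrete, here is a minimal sketch of what hacking sys.path at the top of the UDF script might look like. The directory /home/hadoop/pylibs and the helper add_module_dir are hypothetical names chosen for illustration; the sketch assumes a bootstrap action has already unpacked the pure-Python pytz package into that directory on every node, since the UDF runs on the task nodes and not only on the master. I haven't confirmed that this works on EMR.

```python
import sys

# Hypothetical directory; assumed to contain the unpacked pytz package
# on every node of the cluster (e.g. placed there by a bootstrap action).
EXTRA_MODULE_DIR = "/home/hadoop/pylibs"


def add_module_dir(path, search_path=None):
    """Put `path` at the front of the module search path if it is missing.

    `search_path` defaults to sys.path; passing another list is only
    useful for trying the helper out in isolation.
    """
    if search_path is None:
        search_path = sys.path
    if path not in search_path:
        search_path.insert(0, path)
    return search_path


# Must run before the import below, so the interpreter embedded in Pig
# looks in the extra directory when resolving `import pytz`.
add_module_dir(EXTRA_MODULE_DIR)

try:
    import pytz
except ImportError:
    # Still not found: the directory is missing or empty on this node.
    pytz = None
```

The path is inserted at the front so that it is searched before the Lib directory bundled with Amazon's Jython. In a real UDF script it would probably be better to drop the try/except and let the ImportError surface, since a silent pytz = None only postpones the failure.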