Improve Python regex performance
I am trying to improve the performance of the regular expressions below:

    urlpath = columns[4].strip()
    urlpath = re.sub(r"(\?.*|/[0-9a-f]{24})", "", urlpath)
    urlpath = re.sub(r"/[0-9/]*", "/", urlpath)
    urlpath = re.sub(r";.*", "", urlpath)
    urlpath = re.sub(r"/", ".", urlpath)
    urlpath = re.sub(r"\.api", "api", urlpath)
    if urlpath in dlatency:
        ...  # rest of the loop body not shown

This transforms a URL like this:

    /api/v4/path/apicalltwo?host=wapp&trackid=1347158

to this:

    api.v4.path.apicalltwo

I would like to improve the regex performance as much as possible: the script runs every 5 minutes across approximately 50,000 files and takes about 40 seconds to run overall.

Thank you.
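For comparison, the question's own pipeline can keep its exact behavior while compiling each pattern once at import time and swapping the two purely literal substitutions for `str.replace`, which does the same left-to-right, non-overlapping replacement without the regex engine. A minimal sketch, assuming the input is a single URL path string; `normalize_urlpath` is a hypothetical name, and the `columns`/`dlatency` context from the question is left out.

```python
import re

# Compiled once at import time and reused for every line.
_QUERY_OR_ID = re.compile(r"\?.*|/[0-9a-f]{24}")  # query string, or '/' + 24 hex digits
_NUMERIC_SEGMENTS = re.compile(r"/[0-9/]*")        # '/' plus any following run of digits/slashes
_MATRIX_PARAMS = re.compile(r";.*")                # everything from the first ';'


def normalize_urlpath(urlpath: str) -> str:
    """Same five substitutions as the question, in the same order."""
    urlpath = urlpath.strip()
    urlpath = _QUERY_OR_ID.sub("", urlpath)
    urlpath = _NUMERIC_SEGMENTS.sub("/", urlpath)
    urlpath = _MATRIX_PARAMS.sub("", urlpath)
    # The last two steps replace fixed strings, so str.replace gives identical results.
    urlpath = urlpath.replace("/", ".")
    return urlpath.replace(".api", "api")


print(normalize_urlpath("/api/v4/path/apicalltwo?host=wapp&trackid=1347158"))
# api.v4.pathapicalltwo
```

Tracing the sample URL through the code shows that `\.api` (and therefore `.replace(".api", "api")`) also matches the `.api` at the start of `.apicalltwo`, so the pipeline actually returns `api.v4.pathapicalltwo` for this input. If the last step is only meant to drop the leading dot, slicing with `[1:]` is the targeted fix.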
Try this:
    import re

    s = '/api/v4/path/apicalltwo?host=wapp&trackid=1347158'
    re.sub(r'\?.+', '', s).replace('/', '.')[1:]
    # -> 'api.v4.path.apicalltwo'

For better performance, compile the regular expression once and reuse it, like this:
    regexp = re.compile(r'\?.+')
    s = '/api/v4/path/apicalltwo?host=wapp&trackid=1347158'
    # `s` can change; `regexp` can be reused as many times as needed
    regexp.sub('', s).replace('/', '.')[1:]

An even simpler approach, without using regular expressions:
    # partition() also works when there is no '?'; s.index('?') would raise ValueError
    s.partition('?')[0][1:].replace('/', '.')
    # -> 'api.v4.path.apicalltwo'
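To check whether compiling the pattern or dropping the regex actually matters for this workload, the three one-liners can be timed side by side with `timeit`. A minimal sketch; the function names are hypothetical wrappers, and timings depend on the machine and Python version, so none are quoted here. Per the `re` documentation, module-level functions such as `re.sub` cache recently used compiled patterns, so `re.compile` mainly saves a cache lookup per call rather than a full recompilation.

```python
import functools
import re
import timeit

QUERY = re.compile(r"\?.+")  # compiled once, reused on every call


def with_module_sub(path: str) -> str:
    # re.sub() fetches the compiled pattern from the re module's cache on each call
    return re.sub(r"\?.+", "", path).replace("/", ".")[1:]


def with_compiled(path: str) -> str:
    # same regex, but the pattern object is used directly
    return QUERY.sub("", path).replace("/", ".")[1:]


def with_partition(path: str) -> str:
    # no regex: keep everything before the first '?' (or the whole path if there is none)
    return path.partition("?")[0][1:].replace("/", ".")


sample = "/api/v4/path/apicalltwo?host=wapp&trackid=1347158"
for fn in (with_module_sub, with_compiled, with_partition):
    seconds = timeit.timeit(functools.partial(fn, sample), number=100_000)
    print(f"{fn.__name__:16} {seconds:.3f} s")
```

All three variants only remove the query string. They do not reproduce the question's other steps (dropping 24-hex-digit segments, numeric segments, and `;` parameters), so they are drop-in replacements only if the logged paths never contain those.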