Improve Python regex performance


I am trying to improve the regex below:

urlpath=columns[4].strip()
urlpath=re.sub("(\?.*|\/[0-9a-f]{24})","",urlpath)
urlpath=re.sub("\/[0-9\/]*","/",urlpath)
urlpath=re.sub("\;.*","",urlpath)
urlpath=re.sub("\/",".",urlpath)
urlpath=re.sub("\.api","api",urlpath)
if urlpath in dlatency:

This transforms a URL like this:

/api/v4/path/apicalltwo?host=wapp&trackid=1347158 

to

api.v4.path.apicalltwo 

I would like to improve the regex as far as performance goes: every 5 minutes the script runs across approximately 50,000 files, and the overall run takes about 40 seconds.

Thank you.

Try this:

import re
s = '/api/v4/path/apicalltwo?host=wapp&trackid=1347158'
re.sub(r'\?.+', '', s).replace('/', '.')[1:]
# => 'api.v4.path.apicalltwo'

For better performance, compile the regular expression once and reuse it, like this:

import re
regexp = re.compile(r'\?.+')
s = '/api/v4/path/apicalltwo?host=wapp&trackid=1347158'
# `s` changes, but you can reuse `regexp` as many times as needed
regexp.sub('', s).replace('/', '.')[1:]
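If the other substitutions from the question are still needed (24-character hex ids, numeric path segments, `;` parameters), the same compile-once idea can be applied to all of them. The following is only a sketch: the function name `normalize_urlpath` and the pattern names are made up for illustration. It also departs from the question's code in one place: it strips the leading dot instead of replacing every `.api`, since a global replace of `.api` would also hit a later segment such as `.apicalltwo`.

```python
import re

# Hypothetical names; patterns are compiled once at import time and reused.
_QUERY_OR_ID = re.compile(r"\?.*|/[0-9a-f]{24}")   # query string or 24-char hex id
_NUMERIC_SEGMENTS = re.compile(r"/[0-9/]*")        # slash followed by digits/slashes
_PARAMS = re.compile(r";.*")                       # ';' and everything after it


def normalize_urlpath(urlpath):
    """Turn a URL path such as '/api/v4/x?y=1' into a dotted key 'api.v4.x'."""
    urlpath = urlpath.strip()
    urlpath = _QUERY_OR_ID.sub("", urlpath)
    urlpath = _NUMERIC_SEGMENTS.sub("/", urlpath)
    urlpath = _PARAMS.sub("", urlpath)
    # Plain string methods replace the last two regex substitutions.
    # Assumption: only the leading dot should go, not every '.api'.
    return urlpath.replace("/", ".").lstrip(".")


print(normalize_urlpath("/api/v4/path/apicalltwo?host=wapp&trackid=1347158"))
```

Note that the `re` module already caches recently used patterns, so the gain from precompiling is mostly the saved cache lookup per call; it is worth timing both versions on real data before committing to one.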

A simpler approach, without using regular expressions:

s[1:s.index('?')].replace('/', '.')
# => 'api.v4.path.apicalltwo'
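One caveat with the slice above: `s.index('?')` raises `ValueError` when the URL has no query string. A possible variant that tolerates both cases uses `str.partition`; the helper name `path_to_key` is hypothetical, and `lstrip('/')` is an assumption that differs slightly from `[1:]`, since it removes every leading slash rather than exactly one character.

```python
def path_to_key(s):
    """Drop the query string (if any) and convert slashes to dots."""
    # partition returns the whole string as its first element when '?' is absent
    path, _, _ = s.partition('?')
    return path.lstrip('/').replace('/', '.')


print(path_to_key('/api/v4/path/apicalltwo?host=wapp&trackid=1347158'))
print(path_to_key('/api/v4/path/apicalltwo'))
```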
