regex - Regular expression for finding and replacing variable URL strings in XML
I'm having difficulty figuring out a regular expression for stripping out part of a string within a particular XML tag and replacing it. I have a number of URL paths with variable parts, and I need to find everything between a fixed string and the last slash in the URL. For example, I might have tags and URLs like this:
<bpoc:resourcemetadataloc>http://app01/media/images/i//1951-1960_embark_object_photos/1957.59.jpg</bpoc:resourcemetadataloc>
or
<bpoc:resourcemetadataloc>http://app01/media/images/contemporary/1986-2005/1991.2.jpg</bpoc:resourcemetadataloc>
The output should look like this:
<bpoc:resourcemetadataloc>http://app01/media/previews/1957.59.jpg</bpoc:resourcemetadataloc>
This is as far as I got. It matches up to the last slash in the whole string (the one in the closing tag), not the second-to-last slash (the last one in the URL):
(<bpoc:resourcemetadataloc>http://app01/media/images)+(.*[/])
That regex captures the following:
<bpoc:resourcemetadataloc>http://app01/media/images/i//1951-1960_embark_object_photos/1957.59.jpg</
What do I need to add to the regex to exclude the </bpoc:resourcemetadataloc> bit from the match and capture only what comes before the last slash in the URL?
Because this is XML, the URL itself cannot contain an unescaped < (and > is conventionally escaped as well). You can use that to your advantage:
<bpoc:resourcemetadataloc>http://app01/media/images[^<]*/([^<]*)
This should capture the last segment of the URL (e.g. "1957.59.jpg"). It works by greedily matching everything up to the start of the end tag (the first [^<]*), then backtracking to match the nearest (i.e. last) /, and finally capturing everything after that slash (the ([^<]*)) as group 1, which you can use during the replacement step.
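As a rough illustration of the replacement step, here is a minimal Python sketch. It assumes the pattern above is used unchanged and that the replacement string rebuilds the opening tag with the previews path taken from the desired output in the question. The function name rewrite_to_previews is hypothetical, and other regex engines may spell the group reference differently (for example $1 instead of \1).

```python
import re

# The pattern from the answer, unchanged: group 1 is the last URL segment.
PATTERN = re.compile(
    r"<bpoc:resourcemetadataloc>http://app01/media/images[^<]*/([^<]*)"
)

# Assumed replacement: rebuild the opening tag with the "previews" path from
# the question's desired output, then append the captured file name.
REPLACEMENT = r"<bpoc:resourcemetadataloc>http://app01/media/previews/\1"


def rewrite_to_previews(xml_text: str) -> str:
    """Hypothetical helper: point every matching image URL at /media/previews/."""
    return PATTERN.sub(REPLACEMENT, xml_text)


if __name__ == "__main__":
    samples = [
        "<bpoc:resourcemetadataloc>http://app01/media/images/i//"
        "1951-1960_embark_object_photos/1957.59.jpg</bpoc:resourcemetadataloc>",
        "<bpoc:resourcemetadataloc>http://app01/media/images/contemporary/"
        "1986-2005/1991.2.jpg</bpoc:resourcemetadataloc>",
    ]
    for sample in samples:
        print(rewrite_to_previews(sample))
```

The closing tag is never consumed, because [^<]* stops at the first <, so it passes through the substitution untouched. For anything beyond a one-off cleanup, an XML parser would likely be more robust than a regex over the raw text.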