How to extract an attribute from an HTML element with Ant?


I have an Ant configuration file that is becoming complicated, and I'm stuck on an issue. One of the tasks retrieves a page from a website and saves it to a file. I need to load that file and extract the href attribute of a specific element. The HTML is reasonably well formed, but I can't guarantee it.

I was thinking of a regex, but the element's attributes are not guaranteed to appear in the same order (e.g. class name, or id). Besides, I haven't found out how to return just the value of the href attribute, without the attribute itself.

I'm trying to limit the number of add-ons added to Ant, so a "self-contained" solution would be welcome. Thanks.

I'm not sure how you're going to find the specific HTML element that has the href you're looking for (I'd assume by checking the id attribute, but you did not say so). I put together a chain of regexes that filters the HTML down to candidate anchor tags and then strips out the hrefs. I used the source of this page as sample input, and since I couldn't find any id attributes associated with the anchors (that had hrefs), I filtered down to anchors with class="question-hyperlink". I'm hopeful this is a good starting point (and note: as stipulated, it does not contain dependencies on additional modules, etc., regardless of how easy they are to install):

<?xml version="1.0" encoding="utf-8"?>
<project name="test html attribute" default="test" basedir=".">
   <target name="test">
      <loadfile srcfile="ant.htm" property="html">
         <filterchain>
            <!-- keep only lines that match both patterns: an anchor with an href,
                 and an anchor with class="question-hyperlink" -->
            <linecontainsregexp>
               <regexp pattern="&lt;a.*href[^&gt;]*&gt;"/>
               <regexp pattern="&lt;a.*class=[&quot;']question-hyperlink[&quot;'][^&gt;]*&gt;"/>
            </linecontainsregexp>
            <!-- replace each remaining line with the captured href value -->
            <tokenfilter>
               <replaceregex pattern=".*&lt;a.*href=[&quot;']?([^&gt;&quot;']*).*&gt;[^&lt;]*" replace="\1" flags="gi"/>
            </tokenfilter>
         </filterchain>
      </loadfile>
      <echo>${html}</echo>
   </target>
</project>
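If the target element does turn out to carry an id, a variation along the following lines might be worth trying. It is only a sketch under assumptions: the file name page.html, the id download-link, and the property name link.href are hypothetical placeholders, and it assumes Ant is using its java.util.regex implementation, since the pattern relies on a lookahead. The lookahead checks for the id anywhere inside the same tag, so the id and href can appear in either order, and the capture group returns only the href value.

```xml
<?xml version="1.0" encoding="utf-8"?>
<project name="extract href by id" default="extract" basedir=".">
   <target name="extract">
      <!-- page.html and the id "download-link" are hypothetical placeholders -->
      <loadfile srcfile="page.html" property="link.href">
         <filterchain>
            <!-- keep only lines holding an <a> tag that has both the id and an href -->
            <linecontainsregexp>
               <regexp pattern="&lt;a\b(?=[^&gt;]*\sid=[&quot;']download-link[&quot;'])[^&gt;]*\shref="/>
            </linecontainsregexp>
            <!-- replace the whole line with the quoted href value of that tag -->
            <tokenfilter>
               <replaceregex pattern=".*&lt;a\b(?=[^&gt;]*\sid=[&quot;']download-link[&quot;'])[^&gt;]*\shref=[&quot;']([^&quot;']*)[&quot;'].*" replace="\1"/>
            </tokenfilter>
            <!-- drop the trailing line break so the property holds just the value -->
            <striplinebreaks/>
         </filterchain>
      </loadfile>
      <echo>${link.href}</echo>
   </target>
</project>
```

Like the build file above, this works line by line, so it only helps when the whole opening tag sits on a single line and the href value is quoted. For HTML messier than that, a real parser would be more robust, at the cost of adding an add-on to Ant.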
