[ htmlunit-Support Requests-1580497 ] html form's submittable element resides outside of the form

SourceForge.net
Support Requests item #1580497, was opened at 2006-10-19 14:38
Message generated for change (Comment added) made by mguillem
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=448267&aid=1580497&group_id=47038

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Manik (mc27y)
Assigned to: Nobody/Anonymous (nobody)
Summary: html form's submittable element resides outside of the form

Initial Comment:
Hi,

A question about "com.gargoylesoftware.htmlunit".

Suppose we have an HTML file (test1.html) like the one below, in which
the "<form>" tag is not placed properly. (I am not sure whether the
following HTML is valid, but browsers handle it fine.)

<html>
<head><title>Testing com.gargoylesoftware.htmlunit</title></head>
<body>
  <table>
    <tr><td>
    <form name="frmTest" method="post" action="test2.php">
    <table>
      <tr><td>Testing com.gargoylesoftware.htmlunit's html processing behaviour</td></tr>
    </table>
    </td></tr>
    <input type="hidden" name="hidXTNUM" value="50">
    </form>
  </table>
</body>
</html>

and suppose we use code like the following to download and
process that HTML file:

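//
import java.net.URL;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

String strUrl = "http://.../test1.html";
WebClient webClient = new WebClient();

// Download and parse the page. Exceptions are left to propagate (the
// enclosing method must declare "throws Exception") so that a failed
// download does not leave "page" null for the statements below.
HtmlPage page = (HtmlPage) webClient.getPage(new URL(strUrl));

// The form itself is found by name.
HtmlForm frmPage = page.getFormByName("frmTest");

// This is the statement that fails: in the parsed page the hidden
// input "hidXTNUM" is not inside the form.
frmPage.getInputByName("hidXTNUM").setAttributeValue("value", "100");
//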

Execution results:

1. It downloads the HTML page.
2. It also finds the form: HtmlForm frmPage =
page.getFormByName("frmTest");
3. It cannot set the "hidXTNUM" value in the last
statement; an exception is thrown.

I found that WebClient processed the <form> tag
incorrectly and put the "hidXTNUM" hidden element
outside the form.

Dumping the parsed page (test1.html) with page.asXml(), I
got the following text, in which the "hidXTNUM" hidden
input is placed outside the <form>:

<html>
<head><title>Testing com.gargoylesoftware.htmlunit</title></head>
<body>
  <table>
    <tr><td>
    <form name="frmTest" method="post" action="test2.php">
    <table>
      <tr><td>Testing com.gargoylesoftware.htmlunit's html processing behaviour</td></tr>
    </table>
    </form>
    </td></tr>
    <input type="hidden" name="hidXTNUM" value="50">
  </table>
</body>
</html>

I want "HtmlPage" to tolerate malformed html and
process the <form> tag accurately. Does "HtmlPage"
support this sort of malformed html?

Thanks
Manik


----------------------------------------------------------------------

>Comment By: Marc Guillemot (mguillem)
Date: 2006-10-20 12:46

Message:
Logged In: YES
user_id=402164

If you can't fix your html code at the source, a workaround
could be to provide your own WebConnection which would
modify the html source before it is parsed.
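
The source-rewriting half of that workaround might look like the sketch
below. It is only an illustration: the class name FormSourceFixer and
the method hoistHiddenInputs are made up for this example and are not
part of HtmlUnit, and the regular expressions are a heuristic tuned to
pages like test1.html, not a general HTML repair. The idea is to move
every hidden <input> found between a literal <form ...> and its literal
</form> to just after the opening tag, so the parser sees it before
anything can close the form early. How the fixed text is handed back to
the parser from a custom WebConnection depends on the HtmlUnit version
and is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an HtmlUnit class.
final class FormSourceFixer {
    // One <form ...> ... </form> region of the raw source text (not the DOM).
    private static final Pattern FORM = Pattern.compile(
            "(<form\\b[^>]*>)(.*?)(</form\\s*>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // A hidden input tag, with or without quotes around the type value.
    private static final Pattern HIDDEN = Pattern.compile(
            "<input\\b[^>]*\\btype\\s*=\\s*[\"']?hidden[\"']?[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Moves each hidden input to directly after its form's opening tag.
    static String hoistHiddenInputs(String html) {
        Matcher form = FORM.matcher(html);
        StringBuffer out = new StringBuffer();
        while (form.find()) {
            String body = form.group(2);
            // Collect the hidden inputs in source order.
            StringBuilder hidden = new StringBuilder();
            Matcher input = HIDDEN.matcher(body);
            while (input.find()) {
                hidden.append(input.group());
            }
            // Drop them from their old positions and re-emit them up front.
            String rest = HIDDEN.matcher(body).replaceAll("");
            String fixed = form.group(1) + hidden + rest + form.group(3);
            form.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        form.appendTail(out);
        return out.toString();
    }
}
```

Applied to test1.html, this would put the "hidXTNUM" input immediately
after the <form name="frmTest" ...> tag, before the inner <table>.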

----------------------------------------------------------------------

Comment By: Manik (mc27y)
Date: 2006-10-20 12:05

Message:
Logged In: YES
user_id=1624740

Is there any way to handle the issue using the existing parser?


----------------------------------------------------------------------

Comment By: Marc Guillemot (mguillem)
Date: 2006-10-19 20:56

Message:
Logged In: YES
user_id=402164

The html is not correct, form is not allowed at this place.
htmlunit parses it correctly as "normal" browsers do (try to
dump the DOM in a normal browser) BUT you're right, normal
browsers are able to "connect" the fields to the form.
I've an idea how the "lost children" could be mapped to the
form but I've never had time for this nor the necessity. The
parser needs to be modified to "see" the </form> (which is
ignored as <form> has already been closed due to the <tr>)
and register all input fields found since form opening to
the last form element.
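
As a toy model of that idea (not HtmlUnit's parser, and every name in it
is invented for illustration), the bookkeeping could be sketched as
below. The assumption is that the parser can report four events: a form
start tag, an early implicit close of the form in the tree, an input
field, and the literal </form> end tag. Fields keep being registered to
the last opened form until the literal </form> is seen, even when the
form has already been closed in the tree.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping class, not part of HtmlUnit.
final class LostChildrenTracker {
    private final Map<String, List<String>> fieldsByForm = new LinkedHashMap<>();
    private String lastForm;      // form whose literal </form> is still pending
    private boolean closedInTree; // true once the tree builder closed it early
    private int lostChildren;     // fields registered after the early close

    void formStart(String formName) {
        fieldsByForm.put(formName, new ArrayList<>());
        lastForm = formName;
        closedInTree = false;
    }

    // The tree builder closed the form early, e.g. because of </td></tr>.
    // Registration deliberately continues after this point.
    void formImplicitlyClosed() {
        closedInTree = true;
    }

    // The literal </form> tag: only now does the form stop collecting fields.
    void formEndTag() {
        lastForm = null;
    }

    void input(String fieldName) {
        if (lastForm == null) {
            return; // not between <form> and </form> in the source
        }
        fieldsByForm.get(lastForm).add(fieldName);
        if (closedInTree) {
            lostChildren++;
        }
    }

    List<String> fieldsOf(String formName) {
        return fieldsByForm.get(formName);
    }

    int lostChildrenCount() {
        return lostChildren;
    }
}
```

For test1.html the event order would be formStart("frmTest"),
formImplicitlyClosed(), input("hidXTNUM"), formEndTag(), which leaves
"hidXTNUM" registered to "frmTest" and counted as one lost child.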

----------------------------------------------------------------------

_______________________________________________
HtmlUnit-develop mailing list
https://lists.sourceforge.net/lists/listinfo/htmlunit-develop