
[HtmlUnit] [htmlunit:bugs] #1869 Parsing invalid numeric character references may fail the page load


Joerg Werner

[bugs:#1869] Parsing invalid numeric character references may fail the page load

Status: open
Group: 2.26
Created: Thu Apr 13, 2017 11:31 AM UTC by Joerg Werner
Last Updated: Thu Apr 13, 2017 11:31 AM UTC
Owner: nobody
Attachments:

Hi team,

There is a small parser issue when a numeric character reference is invalid. Imagine a text like "Nimbus™ 3000" that the page author entered as "Nimbus&#84823000" instead (mind the missing semicolon after &#8482). Without the semicolon, the parser reads all eight digits as a single reference, &#84823000, which lies far beyond the largest Unicode code point (U+10FFFF) and is therefore invalid. When such a text is parsed, browsers usually handle this by inserting the replacement character U+FFFD (�).
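For illustration, here is a minimal Java sketch of the browser behaviour described above. It is not Neko's or HtmlUnit's code; the class and method names are hypothetical, and it simplifies the HTML spec's rules (decimal references only, no hexadecimal `&#x...` form, no Windows-1252 remapping of 0x80 to 0x9F). It shows why the missing semicolon matters: the scanner keeps consuming digits, so `&#84823000` becomes one number far above U+10FFFF, which is then replaced by U+FFFD rather than rejected.

```java
/**
 * Hypothetical sketch (not Neko's or HtmlUnit's code): decodes decimal numeric
 * character references roughly the way the HTML spec tells browsers to.
 * Simplified: no hex references (&#x...), no Windows-1252 remapping of 0x80-0x9F.
 */
final class NumericCharRefDemo {

    private static final int REPLACEMENT_CHAR = 0xFFFD;

    static String decodeDecimalRefs(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            boolean isRef = input.startsWith("&#", i)
                    && i + 2 < input.length()
                    && isAsciiDigit(input.charAt(i + 2));
            if (!isRef) {
                out.append(input.charAt(i++));
                continue;
            }
            int j = i + 2;
            long value = 0;
            // Like browsers, consume every digit; cap the value so it cannot overflow.
            while (j < input.length() && isAsciiDigit(input.charAt(j))) {
                value = Math.min(value * 10 + (input.charAt(j) - '0'),
                        Character.MAX_CODE_POINT + 1L);
                j++;
            }
            // The semicolon is optional: a missing one is a parse error, not a fatal one.
            if (j < input.length() && input.charAt(j) == ';') {
                j++;
            }
            out.appendCodePoint(toSafeCodePoint(value));
            i = j;
        }
        return out.toString();
    }

    /** Maps 0, surrogates and values above U+10FFFF to U+FFFD instead of failing. */
    static int toSafeCodePoint(long value) {
        boolean surrogate = value >= Character.MIN_SURROGATE && value <= Character.MAX_SURROGATE;
        if (value == 0 || surrogate || value > Character.MAX_CODE_POINT) {
            return REPLACEMENT_CHAR;
        }
        return (int) value;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        // Intended markup: the reference is terminated, so U+2122 (TRADE MARK SIGN) is produced.
        System.out.println(decodeDecimalRefs("Nimbus&#8482; 3000"));
        // Markup from the report: all eight digits are read as one out-of-range number.
        System.out.println(decodeDecimalRefs("Nimbus&#84823000"));
    }
}
```

With this sketch, `decodeDecimalRefs("Nimbus&#84823000")` returns `"Nimbus\uFFFD"`, while the correctly terminated `"Nimbus&#8482; 3000"` returns `"Nimbus™ 3000"`.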

Whether HtmlUnit fails depends on where the invalid reference appears. The parser seems to handle it gracefully when the offending text is in the body of an element, but when it is in an attribute value, the page load fails with an IllegalArgumentException thrown by Neko.
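As a hedged illustration of how such an exception can arise (we have not inspected Neko's attribute-scanning code, so this is an assumption, not a diagnosis): the standard `Character.toChars(int)` throws exactly an `IllegalArgumentException` when given a value that is not a valid Unicode code point. The class and method names below are hypothetical; the guarded variant only checks the code-point range, whereas the HTML spec also replaces 0 and surrogate values with U+FFFD.

```java
/**
 * Hypothetical illustration, not Neko's code: how an unchecked int-to-char
 * conversion turns an out-of-range character reference into an exception.
 */
final class UnguardedConversionDemo {

    /** Naive: trusts the parsed number. Character.toChars rejects values above U+10FFFF. */
    static String naive(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** Guarded (range check only): substitutes U+FFFD, as browsers do, instead of throwing. */
    static String guarded(int codePoint) {
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : "\uFFFD";
    }

    public static void main(String[] args) {
        int parsed = 84823000; // the value of "&#84823000" from the report
        System.out.println(guarded(parsed).equals("\uFFFD")); // prints true
        try {
            naive(parsed);
        } catch (IllegalArgumentException e) {
            // prints "naive conversion threw IllegalArgumentException"
            System.out.println("naive conversion threw " + e.getClass().getSimpleName());
        }
    }
}
```

If text content and attribute values are decoded by separate code paths, one path can carry such a guard while the other does not, which would be consistent with only the attribute case failing; putting the check at a single shared conversion point avoids that kind of divergence.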

See the attached test case that demonstrates this behavior.

Thanks,
J.


Sent from sourceforge.net because [hidden email] is subscribed to https://sourceforge.net/p/htmlunit/bugs/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/htmlunit/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
HtmlUnit-develop mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/htmlunit-develop

RBRi-2
  • assigned_to: RBRi

[bugs:#1869] Parsing invalid numeric character references may fail the page load

Status: open
Group: 2.26
Created: Thu Apr 13, 2017 11:31 AM UTC by Joerg Werner
Last Updated: Thu Apr 13, 2017 11:31 AM UTC
Owner: RBRi
Attachments:


RBRi-2
In reply to this post by Joerg Werner
  • status: open --> closed
  • Comment:

Fixed in SVN; you need an updated Neko.

Thanks for reporting...

PS: Only people with umlauts in their names can find bugs like this ;-)


[bugs:#1869] Parsing invalid numeric character references may fail the page load

Status: closed
Group: 2.26
Created: Thu Apr 13, 2017 11:31 AM UTC by Joerg Werner
Last Updated: Thu Apr 13, 2017 05:44 PM UTC
Owner: RBRi
Attachments:


Joerg Werner
In reply to this post by Joerg Werner

That's what we're here for ... ;-)


[bugs:#1869] Parsing invalid numeric character references may fail the page load

Status: closed
Group: 2.26
Created: Thu Apr 13, 2017 11:31 AM UTC by Joerg Werner
Last Updated: Thu Apr 13, 2017 06:53 PM UTC
Owner: RBRi
Attachments:
