Need help to access a page (Illegal Character)


Need help to access a page (Illegal Character)

Guillaume Lepinay
Hello everyone,

I'm trying to use HtmlUnit to load a web page, but one of its JavaScript files contains accented characters (such as é, è, or à), and HtmlUnit throws an exception.

The code to reproduce the issue is very simple:

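        import com.gargoylesoftware.htmlunit.BrowserVersion;
        import com.gargoylesoftware.htmlunit.WebClient;
        import com.gargoylesoftware.htmlunit.html.HtmlPage;

        // Inside a method that declares "throws IOException":
        String urlDepart = "https://www.ca-languedoc.fr/";
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Fetch the start page and read its title.
            HtmlPage page = webClient.getPage(urlDepart);
            String title = page.getTitleText();

            System.out.println("Title : " + title);
        }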

I tried adding the JVM option "-Dfile.encoding=UTF-8", but it didn't solve the problem.
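
A possible reason the flag has no effect, stated as an assumption that this thread does not confirm: "-Dfile.encoding" only changes the JVM's default charset, so code that decodes an HTTP response with an explicitly chosen charset is not influenced by it. The sketch below uses only the JDK (no HtmlUnit) and a hypothetical one-line script to show how accented characters stored as ISO-8859-1 bytes turn into the replacement character U+FFFD when those bytes are decoded as UTF-8. A JavaScript parser could plausibly reject such a character as illegal. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Encodes the text as ISO-8859-1 bytes, then (wrongly) decodes them as UTF-8.
    static String latin1BytesReadAsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // Encodes the text as ISO-8859-1 bytes, then decodes them with the same charset.
    static String latin1BytesReadAsLatin1(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical script line containing é, è and à, written with
        // Unicode escapes so the demo does not depend on the source file encoding.
        String script = "var label = '\u00E9 \u00E8 \u00E0';";

        String wrong = latin1BytesReadAsUtf8(script);
        String right = latin1BytesReadAsLatin1(script);

        // The mismatched decode contains U+FFFD; the matching decode round-trips.
        System.out.println("mismatched decode has U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("matching decode round-trips:  " + right.equals(script));
    }
}
```

If this is what happens, the charset that matters is the one applied to the script's response, not the JVM default.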

Do you have any idea what could cause this?

I'm using HtmlUnit 2.27, from Maven:

        <dependency>
            <groupId>net.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit</artifactId>
            <version>2.27</version>
        </dependency>

Thank you for your help :)
Guillaume

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Htmlunit-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/htmlunit-user

Re: Need help to access a page (Illegal Character)

Ronald Brill
Thanks for the info; I will have a look.
Could you please open an issue?

Thanks
     RBRi

On Wed, 21 Jun 2017 01:51:46 -0700 Guillaume Lepinay wrote:

>
>Hello every one
>
>I'm trying to use HtmlUnit to connect a webpage, but a javascript has some
>accent characters (like é or è or à) and it throws an exception.
>
>The code to reproduce the issue is very simple :
>
>        String urlDepart = "https://www.ca-languedoc.fr/";
>        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
>
>            HtmlPage page = webClient.getPage(urlDepart);
>            String title = page.getTitleText();
>
>            System.out.println("Title : "+title);
>
>        }
>
>The page is : https://www.ca-languedoc.fr/Vitrine/ObjCommun/js/xiti.js
>I tried to add the "-Dfile.encoding=UTF-8" but it didn't solve the problem.
>
>Do you have any idea about it ?
>
>I'm using HTML Unit 2.27, from maven :
>
>        <dependency>
>            <groupId>net.sourceforge.htmlunit</groupId>
>            <artifactId>htmlunit</artifactId>
>            <version>2.27</version>
>        </dependency>
>
>Thank you for your help :)
>Guillaume

Re: Need help to access a page (Illegal Character)

Guillaume Lepinay
Hello,

Thank you for your fast reply.

I just checked the bug tracker and found this similar issue: https://sourceforge.net/p/htmlunit/bugs/1895/
One comment there says it was working with version 2.23. I tested my case with version 2.23 and can confirm it: it works with 2.23 but not with 2.27.

On Wed, 21 Jun 2017 at 11:32, Ronald Brill <[hidden email]> wrote:
Thanks for the info, will have a look.
Can you please open an issue....

Thanks
     RBRi


--
Best regards,
Guillaume Lepinay
09 52 95 97 99
