Failing to load the complete html content of a page with ajax


EricWong


Hello.

I am trying to load a page with HtmlUnit:
https://www.ecmwf.int/en/forecasts/charts/catalogue/medium-mslp-wind850?time=2017070900,0,2017070900&projection=classical_europe
(It is a page of a major meteorological agency in Europe)

I try to get the complete HTML content with "HtmlPage.asXml()". However, the image tag:
img class="chart-image" id="map_1_image" src="..."
never appears in the result, even after waiting for a period of time.

The whole page loads successfully in both Chrome and Firefox. The attached screenshot shows the page loaded in Chrome with the F12 panel open. The page does not require any user click: just enter the URL and wait for the ajax calls to finish.

(Please substitute the YYYYMMDD component of the URL with the previous day's date for testing if necessary. E.g. if today is 15 Jul 2017, please use 20170714.)

How can this page be loaded completely with HtmlUnit? Thanks.
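
The date substitution described above can be sketched in Java as follows. This is only an illustration: the class name ChartUrls and the method forPreviousDay are made up for this example, and the trailing "00" after the date is copied verbatim from the URL in this post (reading it as the hour is an assumption).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of HtmlUnit or of the original program.
final class ChartUrls {
    // Base URL and query layout copied from the link in this post.
    private static final String BASE =
            "https://www.ecmwf.int/en/forecasts/charts/catalogue/medium-mslp-wind850";

    private ChartUrls() {}

    /** Builds the chart URL for the day before the given date. */
    static String forPreviousDay(LocalDate today) {
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20170714.
        String day = today.minusDays(1).format(DateTimeFormatter.BASIC_ISO_DATE);
        // The "00" suffix is kept exactly as it appears in the original URL.
        String stamp = day + "00";
        return BASE + "?time=" + stamp + ",0," + stamp + "&projection=classical_europe";
    }

    public static void main(String[] args) {
        System.out.println(forPreviousDay(LocalDate.now()));
    }
}
```

Passing LocalDate.of(2017, 7, 15) gives a URL whose time parameter uses 20170714, matching the example in the post.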

Re: Failing to load the complete html content of a page with ajax

albu77
I used HtmlUnit in the past and it did not load images. But you can have a look at this link anyway, because it now depends on the version you are using: https://stackoverflow.com/questions/3425697/does-htmlunit-load-images-when-it-browses-page

Re: Failing to load the complete html content of a page with ajax

EricWong
Thanks for your reply.

I have tried "webClient.getOptions().setDownloadImages(true);" but it does not work.
I am using the latest version 2.27.

I do not need to download any images from the page with HtmlUnit. I just want the complete HTML, as shown in the F12 panel of Chrome.

Re: Failing to load the complete html content of a page with ajax

albu77
Can you show what you get when you save the page?

Re: Failing to load the complete html content of a page with ajax

EricWong
I have uploaded a text file with the result of HtmlPage.asXml():
AsXmlResult.txt
As it shows, the image tag described above and visible in the screenshot is not present.

Re: Failing to load the complete html content of a page with ajax

albu77
The chart-controls div is not showing either.
I think you are not capturing the page at the right time: there should be an ajax call that appends to the DOM when it succeeds. If I were you, I would dig in this direction. Perhaps have a look at this link.
It has been a long time since I used HtmlUnit, but looking through my sources I found that I used my own class:
 public class MyWebClient extends WebClient ...
and also:
 if (ajaxSynchrone) {
     webClient.setAjaxController(new NicelyResynchronizingAjaxController());
 }

There is nothing more I can tell you. It was too long ago, and I have no way of building a solution now.
Good luck...
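
One way to check whether the page has reached "the right time" is to poll for a condition until a deadline, instead of sleeping for a fixed period. The sketch below uses only the Java standard library; the class name Polling and the method waitUntil are hypothetical and not part of HtmlUnit.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: re-checks a condition until it holds or a timeout expires.
final class Polling {
    private Polling() {}

    /**
     * Returns true as soon as the condition holds, or false if the timeout
     * elapses (or the thread is interrupted) first.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare by subtraction so the check stays valid if nanoTime overflows.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition for the demo: becomes true after about 300 ms.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2_000, 50);
        System.out.println(ready);
    }
}
```

With HtmlUnit, the condition could be something like () -> page.getElementById("map_1_image") != null, assuming page is the HtmlPage from the first post. That usage is untested here and is only a suggestion.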

Re: Failing to load the complete html content of a page with ajax

EricWong
Thanks for your information. The problem is solved.

My program already included this line of code:
webClient.setAjaxController(new NicelyResynchronizingAjaxController());

Following your advice, I focused on this line. I commented it out:
//webClient.setAjaxController(new NicelyResynchronizingAjaxController());
and the complete HTML page now loads successfully.

In your program, you decide whether to use it with:
if(ajaxSynchrone) ...

Could you say a little about how you determine whether "ajaxSynchrone" is true or false?

Re: Failing to load the complete html content of a page with ajax

albu77
When I created my webclient factory, I checked for that: I pass asynchrone as a parameter, and it is set to true. So it's strange. One more thing to mention is that I set the browser version of the webclient to BrowserVersion.FIREFOX_24.
Last but not least, I also have some code I can call:
webClient.attendPourJavascriptSaufTimers(pageAffichageLicence, AttentePourJavascript.BEAUCOUP.getTempo());
webClient.waitForBackgroundJavaScript(AttentePourJavascript.DIX_SECONDES.getTempo());
These two methods let any background JavaScript run for a given amount of time; in some cases the wait is long, in others shorter. The first method also kills any timers running on the page.

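// Cancels recent timers on the page, then waits up to tempo milliseconds
// for background JavaScript. Returns the number of background JavaScript
// jobs still pending, as reported by waitForBackgroundJavaScript.
public int attendPourJavascriptSaufTimers(HtmlPage page, long tempo) {
        String texteDuScript = ScriptAExecuter.ANNULE_LES_TIMERS.getScript();
        // Run the timer-cancelling script; its result is not needed.
        page.executeJavaScript(texteDuScript);
        return this.waitForBackgroundJavaScript(tempo);
}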
 


public enum ScriptAExecuter {
        // Creates a dummy interval to learn the newest timer id, then clears
        // that id and the ones just below it (at most 10 ids in total).
        ANNULE_LES_TIMERS(" limit= 10; \r\n    var np, n= setInterval(function(){},100000); \r\n    np= Math.max(0, n-limit);\r\n    while(n> np){\r\n        clearInterval(n--);\r\n    }");

        private final String script;

        ScriptAExecuter(String script) {
                this.script = script;
        }

        public String getScript() {
                return script;
        }
}
As I said, this was a long time ago, so I don't even remember the why and how of this code. What I do know is that it is still in production and running well with HtmlUnit 2.14.

I hope it helps.
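
The AttentePourJavascript enum used in the calls above is not shown in this thread. A guess at its shape, following the same pattern as ScriptAExecuter, might look like the sketch below. The values are assumptions: 10 000 ms is inferred from the name DIX_SECONDES ("ten seconds"), and the value for BEAUCOUP ("a lot") is purely a placeholder. Milliseconds are used because waitForBackgroundJavaScript takes its timeout in milliseconds.

```java
// Hypothetical reconstruction; the real enum was not posted in this thread.
enum AttentePourJavascript {
    DIX_SECONDES(10_000L), // "ten seconds": value inferred from the name
    BEAUCOUP(60_000L);     // "a lot": placeholder value, the real one is unknown

    // Wait time in milliseconds.
    private final long tempo;

    AttentePourJavascript(long tempo) {
        this.tempo = tempo;
    }

    long getTempo() {
        return tempo;
    }

    public static void main(String[] args) {
        for (AttentePourJavascript attente : values()) {
            System.out.println(attente + " = " + attente.getTempo() + " ms");
        }
    }
}
```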

Re: Failing to load the complete html content of a page with ajax

EricWong
Thanks for sharing your source code.
The programming technique is quite advanced and not easy to understand, but it's quite interesting. Thanks.