Clarification Required


Vasudevan Comandur
Hi,

   I am using HtmlUnit 2.23. I received a response from the site whose
   page object was an instance of TextPage. I tried to get the content using
   the getContent() method, but I got no data. The response code was 200
   and the content type was text/css.

   Am I missing something?

   The site I am scraping is http://adh.sagepub.com

   The CSS data I was trying to read is http://journals.sagepub.com/pb/css/t1493676764000-v1493676764000/head_1_6_7.css

   I appreciate your help in advance.

Regards
 Vasu

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Htmlunit-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/htmlunit-user

Re: Clarification Required

asashour
Hi Vasu,

Please use the latest version, if not the latest build.

Also, please post your complete code.

Ahmed



From: Vasudevan Comandur <[hidden email]>
To: "[hidden email]" <[hidden email]>
Sent: Tuesday, May 2, 2017 9:13 PM
Subject: [Htmlunit-user] Clarification Required

Hi,

   I am using HtmlUnit 2.23. I received a response from the site whose
   page object was an instance of TextPage. I tried to get the content using
   the getContent() method, but I got no data. The response code was 200
   and the content type was text/css.

   Am I missing something?

   The site I am scraping is http://adh.sagepub.com

   The CSS data I was trying to read is http://journals.sagepub.com/pb/css/t1493676764000-v1493676764000/head_1_6_7.css

   I appreciate your help in advance.

Regards
 Vasu


Re: Clarification Required

Vasudevan Comandur
Hi Ahmed,

    I believe the getContent() call I am making to read the TextPage instance is correct.

Regards
 Vasu

On 3 May 2017 at 00:59, Ahmed Ashour <[hidden email]> wrote:
Hi Vasu,

Please use the latest version, if not the latest build.

Also, please post your complete code.

Ahmed




Re: Clarification Required

Vasudevan Comandur
Hi Ahmed,

    When I changed the HTTP Accept-Encoding header to deflate, HtmlUnit 2.23 read the content. However, when I left it at the
    default, Accept-Encoding: gzip, deflate, it did not give me the content.

    Let me know if you need anything else from me.

    Response Header from Host when deflate was set


HTTP/1.1 200 OK
Server: AtyponWS/7.1
Last-Modified: Mon, 01 May 2017 22:55:39 GMT
Expires: Thu, 19 Oct 2017 05:40:28 GMT
Cache-Control: public
Vary: User-Agent,Accept-Encoding
Content-Type: text/css; charset=UTF-8
Date: Tue, 02 May 2017 20:23:20 GMT
Content-Encoding: deflate
Transfer-Encoding: chunked

 

    Response Header from Host when gzip was set


HTTP/1.1 200 OK
Server: AtyponWS/7.1
Content-Encoding: gzip
Last-Modified: Mon, 01 May 2017 22:55:39 GMT
Expires: Thu, 19 Oct 2017 05:40:28 GMT
Cache-Control: public
Vary: User-Agent,Accept-Encoding
Content-Type: text/css; charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 02 May 2017 11:34:24 GMT

Regards
 Vasu
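
The two responses above differ only in their Content-Encoding value, and each value needs a different decoder on the client. The following is a minimal sketch in plain Java (java.util.zip only, not HtmlUnit's own code) of how a body could be decoded according to that header. The class name ContentEncodingDemo, its method names, and the sample CSS string are hypothetical, and "deflate" is assumed here to mean the zlib-wrapped format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, for illustration only; not part of HtmlUnit.
class ContentEncodingDemo {

    /** Decodes a response body to UTF-8 text according to its Content-Encoding value. */
    public static String decodeBody(byte[] body, String contentEncoding) {
        try (InputStream in = wrap(new ByteArrayInputStream(body), contentEncoding)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Raised, for example, when the bytes are not in the declared format.
            throw new UncheckedIOException(e);
        }
    }

    /** Picks a decompressing stream for the declared encoding; unknown values pass through. */
    private static InputStream wrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw;
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                // Assumption: zlib-wrapped deflate, which InflaterInputStream reads by default.
                return new InflaterInputStream(raw);
            default:
                return raw;
        }
    }

    /** Compresses text with gzip, to produce sample data. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Compresses text with zlib-wrapped deflate, to produce sample data. */
    public static byte[] deflate(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String css = "body { margin: 0; }"; // made-up sample content
        System.out.println(decodeBody(gzip(css), "gzip"));
        System.out.println(decodeBody(deflate(css), "deflate"));
    }
}
```

In this sketch, decoding fails with an exception when the declared encoding does not match the actual bytes, for example when deflate data is labelled gzip.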


Re: Clarification Required

asashour
Hi Vasu,

The information below covers only the client side. The server may be sending a different encoding than the one it states.


Ahmed
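
One way to test that possibility is to compare the first bytes of the raw body with the declared Content-Encoding. The following is a minimal sketch in plain Java, not HtmlUnit code; the class name EncodingSniffer and its methods are hypothetical, and it assumes the raw, still-compressed bytes have been captured by some other means, such as a proxy or a packet capture.

```java
// Hypothetical helper, for illustration only; not part of HtmlUnit.
class EncodingSniffer {

    /** True if the body starts with the gzip magic number 0x1f 0x8b (RFC 1952). */
    public static boolean looksLikeGzip(byte[] body) {
        return body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    /** True if the body starts with a plausible zlib header (RFC 1950). */
    public static boolean looksLikeZlib(byte[] body) {
        if (body.length < 2) {
            return false;
        }
        int cmf = body[0] & 0xff;
        int flg = body[1] & 0xff;
        // Compression method 8 means deflate; the two header bytes, read as a
        // big-endian 16-bit number, must be a multiple of 31.
        return (cmf & 0x0f) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    /**
     * Checks whether the leading bytes agree with the declared Content-Encoding.
     * Encodings this sketch does not know are reported as matching, because it
     * cannot judge them. Some servers send raw deflate without a zlib header
     * under "deflate"; this check reports that case as a mismatch.
     */
    public static boolean matchesDeclaredEncoding(byte[] body, String contentEncoding) {
        if (contentEncoding == null) {
            return true;
        }
        String declared = contentEncoding.trim().toLowerCase(java.util.Locale.ROOT);
        if (declared.equals("gzip")) {
            return looksLikeGzip(body);
        }
        if (declared.equals("deflate")) {
            return looksLikeZlib(body);
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] gzipStart = {(byte) 0x1f, (byte) 0x8b, 8, 0}; // start of a gzip stream
        byte[] zlibStart = {(byte) 0x78, (byte) 0x9c};       // common zlib header
        System.out.println(matchesDeclaredEncoding(gzipStart, "gzip"));
        System.out.println(matchesDeclaredEncoding(zlibStart, "gzip"));
    }
}
```

A mismatch reported by such a check would point at the server rather than at the getContent() call.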


From: Vasudevan Comandur <[hidden email]>
To: Ahmed Ashour <[hidden email]>; "[hidden email]" <[hidden email]>
Sent: Tuesday, May 2, 2017 10:30 PM
Subject: Re: [Htmlunit-user] Clarification Required

Hi Ahmed,

    When I changed the HTTP Accept-Encoding header to deflate, HtmlUnit 2.23 read the content. However, when I left it at the
    default, Accept-Encoding: gzip, deflate, it did not give me the content.

    Let me know if you need anything else from me.

    Response Header from Host when deflate was set


HTTP/1.1 200 OK
Server: AtyponWS/7.1
Last-Modified: Mon, 01 May 2017 22:55:39 GMT
Expires: Thu, 19 Oct 2017 05:40:28 GMT
Cache-Control: public
Vary: User-Agent,Accept-Encoding
Content-Type: text/css; charset=UTF-8
Date: Tue, 02 May 2017 20:23:20 GMT
Content-Encoding: deflate
Transfer-Encoding: chunked



    Response Header from Host when gzip was set


HTTP/1.1 200 OK
Server: AtyponWS/7.1
Content-Encoding: gzip
Last-Modified: Mon, 01 May 2017 22:55:39 GMT
Expires: Thu, 19 Oct 2017 05:40:28 GMT
Cache-Control: public
Vary: User-Agent,Accept-Encoding
Content-Type: text/css; charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 02 May 2017 11:34:24 GMT

Regards
 Vasu
