Friday, April 26, 2013

WebRequest URL not returning the correct page source if proxynull isn't used as part of the URL query string (Web Scraping)

On one of the sites I'm crawling, I encountered a situation where
the site expects a query string parameter such as proxynull=90B69303-3A61-4482-AF0725FDA1DAE548 appended to the URL, like this: http://samplesite/bin/jobs_list.cfm?proxynull=90B69303-3A61-4482-AF0725FDA1DAE548

I wondered whether I could scrape the website by sending only the POST data to the URL without the proxynull query string, like this: http://samplesite/bin/jobs_list.cfm

After a series of experiments, I found that the URL without proxynull (http://samplesite/bin/jobs_list.cfm) works once the Proxy property of the web request object is set to the default proxy, as in the code below:
((HttpWebRequest)webRequest).Proxy = WebRequest.DefaultWebProxy;
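
For context, here is a minimal sketch of how that proxy setting might fit into a complete POST request. The class name JobListScraper, the method names BuildRequest and Fetch, and the sample form field are my own placeholders, not anything from the site; the real POST fields depend on the form you are scraping.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class JobListScraper
{
    // Builds a POST request to the plain URL (no proxynull query string)
    // and assigns the default proxy, as described above.
    public static HttpWebRequest BuildRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.Proxy = WebRequest.DefaultWebProxy;
        return request;
    }

    // Sends the form data and returns the page source as a string.
    // postData is a hypothetical URL-encoded form body, e.g. "keyword=developer".
    public static string Fetch(string url, string postData)
    {
        HttpWebRequest request = BuildRequest(url);
        byte[] body = Encoding.UTF8.GetBytes(postData);
        request.ContentLength = body.Length;

        // Write the POST body to the request stream.
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        // Read the whole response body.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling JobListScraper.Fetch("http://samplesite/bin/jobs_list.cfm", "keyword=developer") would then return the page source, assuming the site accepts that form field.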



