Posts

Showing posts from April, 2013

WebRequest Url Not Returning Correct Page Source If Proxynull Not Used As Part Of Url Query String (Web Scraping) In C#

On one of the sites I'm crawling, the server expects a query string such as proxynull=90B69303-3A61-4482-AF0725FDA1DAE548 appended to the URL, like this: http://samplesite/bin/jobs_list.cfm?proxynull=90B69303-3A61-4482-AF0725FDA1DAE548

I wondered whether I could send just the POST data to the URL without the proxynull query string (http://samplesite/bin/jobs_list.cfm) and still scrape the site. After a series of experiments, the solution was to set the Proxy property of the WebRequest object to the default proxy:

((HttpWebRequest)webRequest).Proxy = WebRequest.DefaultWebProxy;

With that line in place, the URL http://samplesite/bin/jobs_list.cfm can be used without proxynull.
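For context, here is a minimal sketch of where that proxy line could sit in a full POST request. The class name JobListScraper, the method names, and the form-encoded content type are assumptions for illustration; the post only shows the single Proxy assignment.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class JobListScraper
{
    // Builds the POST request without sending it.
    // The URL is used as-is, with no proxynull query string.
    public static HttpWebRequest BuildRequest(string url)
    {
        WebRequest webRequest = WebRequest.Create(url);
        webRequest.Method = "POST";
        // Assumption: the site accepts ordinary form-encoded POST data.
        webRequest.ContentType = "application/x-www-form-urlencoded";

        // The fix described in the post: explicitly use the default proxy.
        ((HttpWebRequest)webRequest).Proxy = WebRequest.DefaultWebProxy;

        return (HttpWebRequest)webRequest;
    }

    // Sends the POST data and returns the page source as a string.
    public static string Fetch(string url, string postData)
    {
        HttpWebRequest request = BuildRequest(url);
        byte[] body = Encoding.UTF8.GetBytes(postData);
        request.ContentLength = body.Length;

        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call would look like JobListScraper.Fetch("http://samplesite/bin/jobs_list.cfm", "page=1"), where "page=1" is a placeholder for whatever form fields the real site expects.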

Cannot Find JavaScriptSerializer In .NET 4.0

These are the steps for using JavaScriptSerializer in .NET 4.0:

1. Create a new console application.
2. Change the target framework to .NET Framework 4 instead of .NET Framework 4 Client Profile.
3. Add a reference to System.Web.Extensions (4.0).
4. JavaScriptSerializer is now accessible in Program.cs. :-)

Source: Cannot Find Javascript Serializer in .NET 4.0
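Once the reference is in place, a quick round trip is an easy way to confirm the class resolves. This is only a sketch: the JsonDemo class and its method names are made up for illustration, and it assumes a project targeting the full .NET Framework 4 with System.Web.Extensions referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Web.Script.Serialization; // defined in System.Web.Extensions.dll

public static class JsonDemo
{
    // Serializes a dictionary to a JSON string.
    public static string ToJson(Dictionary<string, object> data)
    {
        var serializer = new JavaScriptSerializer();
        return serializer.Serialize(data);
    }

    // Parses a JSON object back into a dictionary.
    public static Dictionary<string, object> FromJson(string json)
    {
        var serializer = new JavaScriptSerializer();
        return serializer.Deserialize<Dictionary<string, object>>(json);
    }
}
```

If the project still targets the Client Profile, the using directive for System.Web.Script.Serialization will not compile, which makes this a handy check for step 2.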

Remove HTML Tags In An XML String Document Using Regular Expressions (REGEX) In C#

Here's a regex pattern that matches HTML tags present in an XML string document, where xml, node1, node2, node3, node4, node5, node6 and node7 are the XML tags to keep. node1 could stand for any valid XML tag name, such as employername. The negative lookahead lists the XML tag names that must not be removed; every other tag is replaced with a space.

xmlStringData = Regex.Replace(xmlStringData, @"<((\??)|(/?)|(!))\s?\b(?!(\b(xml|node1|node2|node3|node4|node5|node6|node7)\b))\b[^>]*((/?))>", " ", RegexOptions.IgnoreCase);

Greg
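To make the behavior concrete, here is a small sketch that wraps the same pattern in a helper and builds the list of tag names to keep from its arguments. The HtmlTagStripper class and Strip method are hypothetical names, and the sketch writes the lookahead with no space after (?! and a single | between names, since an empty alternative or a literal space there changes what the lookahead rejects.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlTagStripper
{
    // Replaces every tag with a space unless its name is in allowedTags.
    public static string Strip(string xmlStringData, params string[] allowedTags)
    {
        // Escape each name so it is treated literally, then join with '|'.
        string names = string.Join("|", allowedTags.Select(Regex.Escape));

        // Same shape as the pattern in the post, with the names filled in.
        string pattern =
            @"<((\??)|(/?)|(!))\s?\b(?!(\b(" + names + @")\b))\b[^>]*((/?))>";

        return Regex.Replace(xmlStringData, pattern, " ", RegexOptions.IgnoreCase);
    }
}
```

For example, HtmlTagStripper.Strip("<node1>Hello <b>World</b></node1>", "xml", "node1") keeps the node1 tags and turns the b tags into spaces, giving "<node1>Hello  World </node1>".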
