Thursday, April 11, 2013

Remove HTML tags in an XML String Document using REGEX (C#)

Here's a regex pattern that matches the HTML tags present in an XML string document while leaving the XML tags alone.
In the pattern, xml, node1, node2, node3, node4, node5, node6 and node7 are the XML tags to keep. node1 stands for any valid XML tag name in your document, such as employername.
Code:
 // Requires: using System.Text.RegularExpressions;
 xmlStringData = Regex.Replace(xmlStringData,
     @"<((\??)|(/?)|(!))\s?\b(?!(\b(xml|node1|node2|node3|node4|node5|node6|node7)\b))\b[^>]*((/?))>",
     " ", RegexOptions.IgnoreCase);

Note: This is only suitable for small XML files. Using this pattern on large XML files can cause an out-of-memory exception, because the whole document has to be held in memory as a single string.
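
To see what the pattern does, here is a minimal sketch that runs it against a small sample string. The pattern is the one above written on a single line, with each XML tag name listed once. The class name StripHtmlDemo, the helper StripHtml and the sample input are made up for illustration; swap in your own tag names.

```csharp
using System;
using System.Text.RegularExpressions;

class StripHtmlDemo
{
    // The names inside the negative lookahead are the XML tags to keep.
    // Any other tag is treated as HTML and replaced with a space.
    const string Pattern =
        @"<((\??)|(/?)|(!))\s?\b(?!(\b(xml|node1|node2|node3|node4|node5|node6|node7)\b))\b[^>]*((/?))>";

    // Hypothetical helper: returns the input with non-whitelisted tags replaced by a space.
    public static string StripHtml(string xml)
    {
        return Regex.Replace(xml, Pattern, " ", RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        // Hypothetical sample: HTML markup embedded inside node1.
        string xmlStringData =
            "<?xml version=\"1.0\"?><node1><b>Acme</b><br/>Corp</node1>";

        // <b>, </b> and <br/> are each replaced by a space;
        // the XML declaration and node1 are left untouched.
        Console.WriteLine(StripHtml(xmlStringData));
    }
}
```

Because each removed tag becomes a space, the result can contain runs of spaces where several HTML tags sat next to each other. Tags whose names merely start with a kept name (for example node10) are still removed, since the lookahead requires a word boundary after the name.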
Greg
