
Using Microsoft.mshtml With The WebBrowser Control In C#

Hello and Good Evening

When you use a WebBrowser control to extract information from a web page, the classes most commonly used to traverse its DOM come from the System.Windows.Forms namespace, such as HtmlDocument and HtmlElementCollection. Another way to traverse the DOM of the WebBrowser's document is the mshtml namespace from the Microsoft.mshtml assembly, which contains the COM interfaces of Internet Explorer's rendering engine. To access those interfaces in your application, add a reference to Microsoft.mshtml.
The code sample below sets an alias for the mshtml namespace. You need to qualify each mshtml class or interface explicitly with MSHTML, because System.Windows.Forms has classes whose names differ only in casing, such as HtmlDocument versus mshtml's HTMLDocument. One thing to remember: a missing value in mshtml does not come back as null. Instead, it comes back as a DBNull value.
using System.Collections.Generic;
using MSHTML = mshtml; // namespace alias, so mshtml types are not confused with System.Windows.Forms types
....
....
List<string> myVidsData = new List<string>();

// DomDocument exposes the underlying COM document object
MSHTML.IHTMLElementCollection anchorElements =
        ((MSHTML.HTMLDocument)WebBrowser1.Document.DomDocument).getElementsByTagName("a");

if (anchorElements != null && anchorElements.length > 0)
{
    foreach (MSHTML.HTMLAnchorElement element in anchorElements)
    {
        // getAttribute returns an object; read it once and reuse the result
        object attribute = element.getAttribute("data-vids");

        // a missing attribute comes back as DBNull, not null
        if (attribute != null && !System.DBNull.Value.Equals(attribute))
        {
            myVidsData.Add(attribute.ToString());
        }
    }
}
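
If you read several attributes this way, the DBNull check can be moved into one small helper. The sketch below is only an illustration: MshtmlValues and AttributeOrNull are hypothetical names, not part of mshtml or Windows Forms, and it assumes you want a missing attribute to become a plain null string.

```csharp
using System;

static class MshtmlValues
{
    // Hypothetical helper (not part of mshtml): converts the object returned
    // by getAttribute into a string, mapping both null and DBNull to null.
    public static string AttributeOrNull(object value)
    {
        if (value == null || value is DBNull)
        {
            return null;
        }

        return value.ToString();
    }
}
```

With a helper like this, the loop body could call MshtmlValues.AttributeOrNull(element.getAttribute("data-vids")) and add the result to the list only when it is not null.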
