Tuesday, November 20, 2012

GC.Collect() method slowing web crawlers

Our web crawlers were crawling items slowly, especially XML feeds, so we checked our code for lines that might be causing bottlenecks.

It turned out that we had placed GC.Collect() in an inner loop that processes items one by one, so a full garbage collection was being forced for every single item.

The solution was to comment out GC.Collect() in the inner loop and move it to the outer loop, so that it runs once per page instead of once per item. Here's the code:
Code:
  while (!EndOfPage)
  {
    do
    {
      if (date != String.Empty)
      {
        // Commented out: this forced a full collection for every item.
        // GC.Collect();
        // processing of items code statements...
      }
    } while (!url.Equals(String.Empty));
    // Moved here: collect once per page instead of once per item.
    GC.Collect();
    PageNo++;
    if (PageNo > totalPage)
    {
      EndOfPage = true;
    }
  }
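
To see why the placement matters, here is a minimal sketch that counts and times forced collections for both placements. This is not our crawler code: the GcPlacementDemo class, the Crawl method, the page and item counts, and the small byte array standing in for item processing are all made up for illustration. Actual timings depend on the machine and on how much is on the heap.

```csharp
using System;
using System.Diagnostics;

public static class GcPlacementDemo
{
    // Hypothetical stand-in for a crawl: "pages" pages with "itemsPerPage" items each.
    // When collectPerItem is true, GC.Collect() runs in the inner loop (the slow version);
    // otherwise it runs once per page in the outer loop.
    // Returns how many times GC.Collect() was forced.
    public static int Crawl(int pages, int itemsPerPage, bool collectPerItem)
    {
        int forcedCollections = 0;
        for (int page = 1; page <= pages; page++)
        {
            for (int item = 0; item < itemsPerPage; item++)
            {
                // Assumed placeholder for the real item processing.
                byte[] buffer = new byte[1024];
                buffer[0] = 1;

                if (collectPerItem)
                {
                    GC.Collect();
                    forcedCollections++;
                }
            }

            if (!collectPerItem)
            {
                GC.Collect();
                forcedCollections++;
            }
        }
        return forcedCollections;
    }

    public static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        int perItem = Crawl(5, 200, true);
        sw.Stop();
        Console.WriteLine("Per item: {0} collections, {1} ms", perItem, sw.ElapsedMilliseconds);

        sw.Restart();
        int perPage = Crawl(5, 200, false);
        sw.Stop();
        Console.WriteLine("Per page: {0} collections, {1} ms", perPage, sw.ElapsedMilliseconds);
    }
}
```

With 5 pages of 200 items, the inner-loop version forces 1000 collections while the outer-loop version forces only 5, which is the same kind of reduction we got in the crawler.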

Credit goes to my fellow developer for the discovery.

Greg
