Configuration

<?xml version="1.0" encoding="utf-8" ?>
<!-- Configuration -->
<configuration>
  <appSettings>
    <add key="RobotRulesUseCache" value="True"/>
    <add key="RobotRulesCacheLibrary" value="RobotRules.Cache.MemoryCache, RobotRules"/>
    <add key="RobotRulesCacheTimeout" value="00:01:00" />
  </appSettings>
</configuration>
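
The RobotRulesCacheTimeout value looks like an hours:minutes:seconds string. As a rough illustration, assuming the library reads it as a standard .NET TimeSpan (an assumption, this page does not say so), the value above would mean a one-minute cache lifetime. The class and method names below are hypothetical and are not part of RobotRules:

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of RobotRules.
public static class CacheTimeoutSketch
{
    // Parses an "hh:mm:ss" setting value into a TimeSpan.
    public static TimeSpan ParseTimeout(string value)
    {
        return TimeSpan.Parse(value, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        TimeSpan timeout = ParseTimeout("00:01:00");
        Console.WriteLine(timeout.TotalSeconds); // prints 60
    }
}
```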

Use the parser

using System;
using RobotRules;

// Identify the crawler with its full user-agent string.
var robotParser = new RobotsFileParser
{
    LocalUserAgent = @"Mozilla/5.0 (compatible; Bluebot/1.0; +http://bluecurve.codeplex.com/)"
};

// Parse the robots rules for the site, then check a URL before crawling it.
robotParser.Parse(new Uri("http://blablabla.com"));
if (robotParser.IsAllowed("Bluebot", new Uri("http://blablabla.com")))
{
    // your code
}

Embedded robots control

To handle robots directives embedded in HTML (the robots meta tag), use the library like this:
var strategy = robotParser.CheckRobotControlStrategy("Bluebot", "HTML CONTENT");

if (strategy.CanFollow)
{
    // your code
}
if (strategy.CanIndex)
{
    // your code
}
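
For a more concrete picture, here is a minimal sketch that passes real markup instead of the "HTML CONTENT" placeholder. The page content is hypothetical, and the values the library actually returns for it are not documented on this page, so the sketch only prints them. Whether Parse must be called before CheckRobotControlStrategy is also not stated here; the sketch assumes it is not required.

```csharp
using System;
using RobotRules;

public static class MetaRobotsExample
{
    public static void Main()
    {
        // Hypothetical page content: the robots meta tag asks crawlers
        // not to index the page and not to follow its links.
        const string html =
            @"<html><head><meta name=""robots"" content=""noindex, nofollow""></head><body></body></html>";

        var robotParser = new RobotsFileParser
        {
            LocalUserAgent = @"Mozilla/5.0 (compatible; Bluebot/1.0; +http://bluecurve.codeplex.com/)"
        };

        // Same call as above, with markup in place of the placeholder string.
        var strategy = robotParser.CheckRobotControlStrategy("Bluebot", html);

        Console.WriteLine("CanFollow: {0}", strategy.CanFollow);
        Console.WriteLine("CanIndex: {0}", strategy.CanIndex);
    }
}
```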

Deal with the cache

If you set RobotRulesUseCache to True, you can clear the cache:

   robotParser.ClearCache();

The Dispose() method of RobotsFileParser always calls the Dispose() method of the cache.
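
One way to make sure the parser and its cache are released is to call Dispose() in a finally block. This is a sketch that uses only the members shown on this page. If RobotsFileParser implements IDisposable (likely, but not confirmed here), a using statement would do the same job.

```csharp
using System;
using RobotRules;

public static class DisposeExample
{
    public static void Main()
    {
        var robotParser = new RobotsFileParser
        {
            LocalUserAgent = @"Mozilla/5.0 (compatible; Bluebot/1.0; +http://bluecurve.codeplex.com/)"
        };

        try
        {
            robotParser.Parse(new Uri("http://blablabla.com"));
            if (robotParser.IsAllowed("Bluebot", new Uri("http://blablabla.com")))
            {
                // your code
            }
        }
        finally
        {
            // Per the note above, this also disposes the cache.
            robotParser.Dispose();
        }
    }
}
```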

Last edited Jun 10, 2014 at 9:49 PM by teddyalbina, version 11

Comments

teddyalbina Jun 14, 2014 at 11:29 AM 
Version 1.5.2.4 :
Fix cache initialization