Most load balancers run health checks against their backend nodes. While this is a good thing, you most likely don't want these requests to be counted in analytics. One simple solution is to add the load balancer's IP address to the "exclude robots" list (in our specific case, matching on the user agent string was not an option).
However, Sitecore doesn't take the X-Forwarded-For HTTP header (or rather the Analytics.ForwardedRequestHttpHeader setting) into account when resolving the client IP for robot detection (CheckIpAddress pipeline processor). This is a known issue with a patch available: https://kb.sitecore.net/articles/454115
Please note that instead of resolving all inconsistencies related to robot detection and the Analytics.ForwardedRequestHttpHeader setting, this patch adds another one. While the XForwardedFor pipeline processor of the createVisit pipeline allows you to specify an index to determine which IP address from the header to use, the fix always uses the last entry. To avoid this, you can adjust the patch:
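// Namespaces below are assumed from the Sitecore.Analytics and Sitecore.Kernel assemblies; adjust to your version.
using System.Linq;
using Sitecore.Analytics.Configuration;
using Sitecore.Analytics.Pipelines.ExcludeRobots;
using Sitecore.Diagnostics;

public class CheckIpAddress : ExcludeRobotsProcessor {
    // Zero-based index of the header entry to use; set from the config patch.
    public int HeaderIpIndex { get; set; }

    public override void Process(ExcludeRobotsArgs args) {
        var httpContext = System.Web.HttpContext.Current;
        Assert.IsNotNull(httpContext?.Request, "httpContext or httpContext.Request");
        var headerValue = httpContext.Request.Headers[AnalyticsSettings.ForwardedRequestHttpHeader];
        // Prefer the forwarded address; fall back to the direct peer address.
        var ipAddress = GetIpFromHeader(headerValue) ?? httpContext.Request.UserHostAddress;
        if (ipAddress == null || !AnalyticsSettings.Robots.ExcludeList.ContainsIpAddress(ipAddress)) {
            return;
        }
        args.IsInExcludeList = true;
    }

    private string GetIpFromHeader(string headerValue) {
        if (string.IsNullOrEmpty(headerValue)) {
            return null;
        }
        var strArray = headerValue.Split(',');
        // Use the configured entry; fall back to the last one if the index is out of range (or negative).
        var inRange = HeaderIpIndex >= 0 && HeaderIpIndex < strArray.Length;
        var str = inRange ? strArray[HeaderIpIndex] : strArray.LastOrDefault();
        return string.IsNullOrWhiteSpace(str) ? null : str.Trim();
    }
}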
And then patch it in:
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <excludeRobots>
        <processor type="YourAssembly.YourNamespace.CheckIpAddress, YourAssembly" patch:instead="*[@type='Sitecore.Analytics.Pipelines.ExcludeRobots.CheckIpAddress, Sitecore.Analytics']">
          <HeaderIpIndex>0</HeaderIpIndex>
        </processor>
      </excludeRobots>
    </pipelines>
  </sitecore>
</configuration>
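With this in place, IP resolution for robot detection takes the forwarded-for header into account and behaves consistently with the XForwardedFor processor, provided both are configured with the same index.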
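To see how the index picks an address, here is a small stand-alone sketch of the header parsing only, without any Sitecore dependencies. The ForwardedIpResolver and Demo classes are hypothetical names used for illustration, the addresses are documentation examples, and the guard against negative indexes is an assumption about the desired behavior rather than part of the official patch.

```csharp
using System;
using System.Linq;

// Hypothetical helper that mirrors only the header parsing of the processor.
public static class ForwardedIpResolver
{
    // Returns the entry at `index`, or the last entry when the index is out of range.
    // Returns null when the header is missing, so the caller can fall back to UserHostAddress.
    public static string Resolve(string headerValue, int index)
    {
        if (string.IsNullOrEmpty(headerValue))
        {
            return null;
        }

        var entries = headerValue.Split(',');
        var inRange = index >= 0 && index < entries.Length;
        var entry = inRange ? entries[index] : entries.LastOrDefault();
        return string.IsNullOrWhiteSpace(entry) ? null : entry.Trim();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Assumed header shape: first entry appended by the first proxy, later entries by later hops.
        const string header = "203.0.113.7, 10.0.0.5";

        Console.WriteLine(ForwardedIpResolver.Resolve(header, 0)); // prints "203.0.113.7"
        Console.WriteLine(ForwardedIpResolver.Resolve(header, 5)); // prints "10.0.0.5" (index out of range, last entry used)
        Console.WriteLine(ForwardedIpResolver.Resolve(null, 0) ?? "(fall back to UserHostAddress)");
    }
}
```

Which index is right depends on how many proxies sit in front of the site and which of them append to the header, so it is worth checking a few real requests before settling on a value.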