Monday, March 8, 2010

Why are some Squid/Lusca ACL types slower than others? And which ones?

This post should likely be part of the documentation!

One thing which hasn't really been documented is the relative speed of each of the Squid/Lusca ACL types. This is important to know if you're administering a large Squid/Lusca install - it's entirely possible that the performance of your site will be massively impacted by the wrong ACL setup.

Firstly - the types themselves:
  1. Splay tree lookups are likely the fastest - src, dst, myip, dstdomain, srcdomain
  2. The wordlist checks are linear, but move hits back to the top of the wordlist to try and speed up the most looked up items - portname, method, snmp community, urlgroup, hiercode
  3. The regular expression checks are also linear and also reshuffle the list based on the most popular items - url regex, path regex, source/destination domain regex, request/reply mime type

Now the exceptions! Some types require DNS lookups before they can match - eg "dst" has to resolve the hostname being connected to, and "srcdom_regex" / "dstdom_regex" may need a reverse lookup when all Squid has is an IP address.
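
To make the three groups concrete, here is a small squid.conf sketch with a couple of ACLs from each. The ACL names, addresses and patterns are invented for illustration and aren't from any real install.

```
# Splay tree lookups
acl localnet src 10.0.0.0/8
acl our_sites dstdomain .example.com

# Wordlist lookup (linear, hits move to the front)
acl safe_methods method GET HEAD

# Regular expression lookups (linear, each pattern is tried in turn)
acl flash_video url_regex -i \.flv$
acl ads_path urlpath_regex -i ^/ads/

# Needs a DNS lookup of the requested hostname before it can match
acl test_net dst 192.0.2.0/24
```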

A lot of places will simply use URL regular expression ACLs ("url_regex") to filter/forward requests. Unfortunately these scale poorly under high load and are almost always the reason a busy proxy server is pegged at full CPU.

I'll write up an article explaining how to work around these behaviours if enough people ask me nicely. :)

2 comments:

  1. What is a better alternative to url_regex?

  2. Well, for the most part, it boils down to whether you can short-cut the ACL lookup process somehow. Eg, if you have 50 URL regexes for the same domain, put a dstdomain ACL check for that domain in front of them. That way the URL regexes are only evaluated for requests to that domain.

    But for really large ACL sets, I'd suggest external ACL helpers for now.
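
    A rough sketch of the dstdomain short-cut, in squid.conf syntax. The ACL names and the example.com patterns are made up for illustration. It relies on the ACLs on one http_access line being ANDed and checked left to right, so the regexes only run once the cheap dstdomain check has matched.

```
# Hypothetical names and patterns, for illustration only.
acl example_dom dstdomain .example.com
acl example_bad url_regex -i /ads/ /tracking/ \.swf$

# example_dom is checked first; example_bad is only evaluated when it matches.
http_access deny example_dom example_bad
```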