Saturday, December 19, 2009

How does a proxy interfere with throughput?

Another big question with the filtering debate is figuring out how much of an impact on performance an inline proxy filter will have.

Well, it's quite easy to estimate how much of an impact an inline server running Windows/UNIX will have on traffic. And this will be important as the filtering mechanisms tested by the government and which will be implemented by more than one of the ISPs.

Inline proxies are very popular today. They're used in a variety of ISP and corporate environments. Traffic is either forced there via configuration on users' machines, or redirected there transparently by some part of the network (eg a router, or transparent bridge.)

The inline proxy will "hijack" the TCP sessions from the client and terminate them locally. The client believes it is talking to the web server but it actually talking to the proxy server.

Then the inline proxy will issue outbound TCP sessions to the web server as requested by the user - and in some configurations, the web server will think it is talking directly to the client.

This is all relatively well understood stuff. It's been going on for 10-15 years. I was involved in some of the early implementations of this stuff in Australia back in the mid 1990's. What isn't always well understood is how it impacts performance and throughput. Sometimes this doesn't matter for a variety of reasons - the users may not have a large internet connection in the first place, or the proxy is specifically in place to limit how much bandwidth each user can consume. But these proxies are going to be used for everyone, some of which will have multi-megabit internet connections. Performance and throughput suddenly matter.

I'll cover one specific example today - how inline proxies affect data throughput. There are ways that inline proxies affect perceived request times (ie, how long it takes to begin and complete a web request) which will take a lot more space to write about.

Each request on the inline proxy - client facing and server facing - will have a bit of memory reserved to store data which is being sent and received. The throughput of a connection is, roughly speaking, limited by how big this buffer is. If the buffer is small, then you'll only get fast speeds when speaking to web sites that are next door to you. If the buffer is large, you'll get fast speeds when speaking to web sites that are overseas - but only if they too have large buffers on their servers.

These buffers take memory, and memory is a fixed commodity in a server. Just to give you an idea - if you have 1GB of RAM assigned for "network buffers", and you're using 64 kilobyte buffers for each session, then you can only hold up (1 gigabyte / 64 kilobyte) sessions - ie, 16,384 sessions. This may sound like a lot of sessions! But how fast can you download with a 64 kilobyte buffer?

If you're 1 millisecond away from the webserver (ie, its on the same LAN as you), then that 64 kilobyte buffer will give you (64 / 0.001) kilobytes - or ~ 64 megabytes a second. That's 512 megabits. Quite quick, no?

But if you're on DSL, your latency will be at least 10 milliseconds on average. That's 6.4 megabytes a second, or 51.2 megabits. Hm, it's still faster than ADSL2, but suddenly its slower than what bandwidth the NBN is going to give you.

Say you're streaming from Google. My Perth ISP routes traffic to/from Google in Sydney for a few services. That's 53 milliseconds. With 64 kilobyte buffers, that's (64 / 0.053), or 1207 kilobytes/second. Or, around a megabyte a second. Or, say, 8-10 megabits a second. That isn't even ADSL2 speed (24 megabits), let alone NBN speeds (100 megabits.)

So the operative question here is - how do you get such fast speeds when talking to websites in different cities, states or countries to you? The answer is quite simple. Your machine has plenty of RAM for you - so your buffers can be huge. Those streaming websites you're speaking to build servers which are optimised for handling large buffered streams - they'll buy servers which -just- stream flash/video/music, which handle a few thousand clients per server, and have gigabytes of RAM. They're making enough money from the service (I hope!) to just buy more streaming servers where needed - or they'll put the streaming servers all around the world, closer to end-users, so they don't need such big buffers when talking to end users.

What does this all mean for the performance through a filtering proxy?

Well, firstly, the ISP filtering proxy is going to be filtering all requests to a website. So, it'll have to filter all requests (say) to Youtube, or Wikipedia. This means that all streaming content is potentially passing through it. It's going to handle streaming and non-streaming requests for the websites in question.

So say you've got a filtering proxy with 16GB of RAM, and you've got 64 kilobyte buffers. You have:
  • Say, minimum, 262,144 concurrent sessions (16 gigabytes / 64 kilobytes) going through the proxy before you run out of network buffers. You may have more sessions available if there aren't many streaming/downloading connections, but you'll always have that minimum you need to worry about.
  • Actually, it's half of that - as you have a 64 kilobyte buffer for transmit and a 64 kilobyte buffer for receive. So that's 131,072 concurrent sessions.
  • If your streaming site is luckily on a LAN, and you're on a LAN to the proxy - you'll get ~ 100mbit.
  • If you're on ADSL (10 milliseconds) from the proxy - you'll get 6.4 megabytes/second, or 51 megabits/sec from the proxy.
  • If you're on NBN (1 millisecond, say) from the proxy - you'll get 64 megabytes/second, or 512 megabits from the proxy.
  • BUT - if the proxy is 50 milliseconds from the web server - then no matter how fast your connection is, you're only going to get maximum (65536 / 0.050) bytes/sec, or 1.2 megabytes/second, or 12 megabits/second.
  • And woe be if you're talking to a US site. No matter how fast your connection is, the proxy will only achieve speeds of 320 kilobytes/sec, or 2.5 megabits. Not even ADSL1 speed.
The only way to increase the throughput your proxy has is to use larger buffers - which means either packing much more RAM into a server, or limiting the number of connections you can handle, or buying more servers. Or, if you're very unlucky, all of the above.

Now, the technical people in the know will say that modern operating systems have auto-tuning buffering. You only need to use big buffers for distant connections, rather than for all connections. And sure, they're right. This means that the proxy will handle more connections and obtain higher throughput. But the question now is how you design a proxy service which meets certain goals. Sure, you can design for the best case - and things like auto-tuning buffering is certainly good for raising the best-case performance. But the worst case performance doesn't change. If you have lots of international streaming sessions suddenly being filtered (because, say, some very popular US-centric website gets filtered because of one particular video), the performance will suddenly drop to the worst case scenario, and everyone suffers.

Now, just because I can blow my own trumpet a bit - when I design Squid/Lusca web proxy solutions, I always design for the worst case. And my proxies work better for it. Why? Because I make clear to the customer that the worst case solution is what we should be designing and budgeting for, and new equipment should be purchased based on that. The best case performance is just extra leg room during peak periods. That way clients are never surprised by poorly performing and unstable proxies, and the customer themselves knows exactly when to buy hardware. (They can then choose not to buy new proxies and save money - but then, when you're saving $100k a month on a $6k server, buying that second $6k server to save another $100k a month suddenly makes a lot of sense. Skimping on $6k and risking the wrath of your clients isn't appealing.)


  1. /If/ they're smart, they'll do what many ISPs in the UK do - announce fake BGP routes to sites they want to do some filtering for, and only proxy and filter those hosts. Meaning that in order to hit a proxy, the site has to share an IP with a bad site..

  2. theducks: sure, but when you block say, Youtube, you can't just block a single IP route. You have to start redirecting large parts of the Google infrastructure because of the underlying CDN.

    Which is a good point. I'm going to write a basic post about that! Thanks Alex!