Archive for the ‘Cache’ Category

Terracotta: unbelievable cool stuff!

Tuesday, July 3rd, 2007


I just found this cool stuff: Terracotta, open source software that provides distributed scalability and availability for the JVM. The following is from their home page; I guess every Java developer who has ever dreamed of a nice scalable solution will be thrilled to see it, especially if it can really be done in easy steps:

A great blog post, "Amazon Web Services (EC2 & S3) - The Future of Data Centre Computing? Part 2", shows how to install Terracotta on Amazon EC2 and build a scalable Tomcat cluster, and in its comments I found something even more interesting:

More details on EHCache support are here - http://www.terracotta.org/confluence/display/integrations/EHCache

Also, OSCache has been clustered by us for a long time, but we have never had any effort to wrap it up into a Config Module ala EHCache. A forum user had success with OSCache (and describes how to get it working) here - http://forums.terracotta.org/forums/posts/list/197.page

Distributed OSCache is what I have been crying out for for years! It's unbelievable that those guys are doing all this nice stuff, and that it is completely open source!

I will spend some time on it and run some tests myself. I hope this stuff really works!


Global DNS Load Balancing, for FREE!

Saturday, June 30th, 2007

More than a year and a half ago, the company I worked for faced a serious requirement: splitting web traffic across servers in different data centers. When we approached network solution providers and CDN (content delivery network) service providers, we got unbelievable and unacceptable quotations, so I spent some time researching the issue. In the end we got a good enough solution built entirely from open source software and data (Linux + PowerDNS + geo backend + geo IP data we collected). It does exactly the same job as the solutions that cost many thousands of dollars per month.

What’s Global DNS Load Balancing

Global server load balancing (GSLB for short), also known as geographic load balancing, is a DNS technique that answers a query with different records depending on the geographic location of the requester's IP address. For example, when a US user visits www.google.com, he or she reaches a web server in a data center in the United States; at the same time, a user in China who types in exactly the same URL, "www.google.com", is served from a server in Google's China data center.

GSLB

Diagram credit: http://www.oes.co.th

It's a very important and valuable technology for big web sites: Google, Yahoo, MSN... almost all multinational web sites use it.

How?

A special DNS server, or a module attached to a DNS server, returns different answers to different requests based on the geo-location of the requester's IP address (the requester is generally another DNS server, namely the resolver of your ISP):

www.yourdomain.com --[CNAME record]--> geo.yourdomain.com --[GSLB handling, CNAME]--> us.geo.yourdomain.com --[A record]--> 68.178.110.21

Of course you can configure a simpler chain directly: www.yourdomain.com --[GSLB, return A record]--> IP address. However, most sites use some CNAME records to make the configuration more flexible and easier to manage:

  • first CNAME
    • Because you may have many different host names, e.g. photo.yoursite.com, blog.yoursite.com, etc., you can handle them all in geo.yoursite.com.
  • next CNAME
    • GSLB generally returns another CNAME record. This is much more useful for configuration, because you don't want GSLB to know too much about a bunch of IP addresses; it's better to use names such as us.yoursite.com, jp.yoursite.com, etc.
  • IP address
    • Of course you can configure multiple IPs in the A records to enable DNS round robin, which is a simple form of load balancing across servers.
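The chain described above can be sketched in a few lines of Python. This is only an illustration, not how PowerDNS or any real DNS server is implemented: the tables, the region keys, the `resolve` and `rotate` helpers, and the second US address are all made up for the example.

```python
# Hypothetical zone data modelled on the example chain above.
CNAMES = {"www.yourdomain.com": "geo.yourdomain.com"}

# What the GSLB step hands back per region: names, not IP addresses.
GEO_TARGETS = {
    "us": "us.geo.yourdomain.com",
    "cn": "cn.geo.yourdomain.com",
}
DEFAULT_REGION = "us"  # assumed fallback when the region is not mapped

A_RECORDS = {
    "us.geo.yourdomain.com": ["68.178.110.21", "68.178.110.22"],  # 2nd IP is invented
    "cn.geo.yourdomain.com": ["202.102.24.100"],
}


def resolve(name, region):
    """Follow plain CNAMEs, then the geo CNAME, then return the A records."""
    chain = [name]
    while name in CNAMES:
        name = CNAMES[name]
        chain.append(name)
    if name == "geo.yourdomain.com":
        # The GSLB step: choose the next name from the requester's region.
        name = GEO_TARGETS.get(region, GEO_TARGETS[DEFAULT_REGION])
        chain.append(name)
    return chain, A_RECORDS[name]


def rotate(addresses, query_count):
    """DNS round robin: shift the answer order by one on every query."""
    k = query_count % len(addresses)
    return addresses[k:] + addresses[:k]
```

Calling `resolve("www.yourdomain.com", "cn")` walks www -> geo -> cn.geo and returns the China address list; an unmapped region falls back to the default one.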

In short one important thing is

Here is what Google's setup looks like; you can see it in your browser via this link:

DNS Lookup: www.google.com A record

Generated by www.DNSstuff.com at 07:23:25 GMT on 30 Jun 2007.

How I am searching:

Searching for www.google.com A record at d.root-servers.net [128.8.10.90]: Got referral to G.GTLD-SERVERS.NET. (zone: com.) [took 46 ms]
Searching for www.google.com A record at G.GTLD-SERVERS.NET. [192.42.93.30]: Got referral to ns1.google.com. (zone: google.com.) [took 52 ms]
Searching for www.google.com A record at ns1.google.com. [216.239.32.10]: Got CNAME of www.l.google.com. and referral to g.l.google.com. [took 73 ms]
Searching for www.l.google.com A record at h.root-servers.net [128.63.2.53]: Got referral to b.gtld-servers.net. (zone: com.) [took 44 ms]
Searching for www.l.google.com A record at b.gtld-servers.net. [192.33.14.30]: Got referral to ns1.google.com. (zone: google.com.) [took 162 ms]
Searching for www.l.google.com A record at ns1.google.com. [216.239.32.10]: Got referral to a.l.google.com. (zone: l.google.com.) [took 72 ms]
Searching for www.l.google.com A record at a.l.google.com. [209.85.139.9]: Reports www.l.google.com. [took 71 ms] Response:

Domain Type Class TTL Answer
www.l.google.com. A IN 300 209.85.135.147
www.l.google.com. A IN 300 209.85.135.99
www.l.google.com. A IN 300 209.85.135.104
www.l.google.com. A IN 300 209.85.135.103

NOTE: One or more CNAMEs were encountered. www.google.com is really www.l.google.com.

How much?

It really depends on who you ask! Most network equipment providers, such as Cisco and F5 Networks, can give you a nearly perfect solution and sell you a bunch of boxes that cost over $10,000 or even over $100,000! If you ask CDN service providers, they may sell you a "Global DNS" service that costs over $1,000 per month.

But there are open source solutions which cost you almost nothing! There are several free and open source solutions (though not too many of them); I think PowerDNS is the best choice, and I used PowerDNS and its geo backend to fulfill our requirements.

Poor(smart) man’s Global DNS Load Balancing solutions, FREE!

There is a very specific wiki page describing how to implement GSLB with open source software and how to configure it.

PowerDNS is free, comes with full source code, and is really powerful, with many advanced features. It supports plugins (called "backends") to extend it. The geo backend is one of the free backends that ship with PowerDNS. Unfortunately, very little documentation can be found about the geo backend; fortunately, there is a set of setup notes which is simple but explains almost everything step by step.

You need to set the TTL of the CNAME records that geobackend returns to a reasonable (generally short) time; in my case I use 5 minutes.

Another important point is how to build the IP -> geo location map. You can use rsync to grab ready-made country geo data, or you may need to build your own.

To grab the country data:

rsync -va rsync://countries-ns.mdc.dk/zone .

(updated: check here http://countries.nerd.dk/more.html for the zonefile rsync)

The PowerDNS config:

# This is the real guts of the data that drives this backend.  This is a DNS
# zone file for RBLDNSD, a nameserver specialised for running large DNS zones
# typical of DNSBLs and such.  We choose it for our data because it is easier
# to parse than the BIND-format one.
#
# Anyway, it comes from http://countries.nerd.dk/more.html - there are details
# there for how to rsync your own copy.  You'll want to do that regularly,
# every couple of days maybe.  We believe the nerd.dk guys take the netblock
# info from Regional Internet Registries (RIRs) like RIPE, ARIN, APNIC.  From
# that they build a big zonefile of IP/prefixlen -> ISO-country-code mappings.
geo-ip-map-zonefile=/usr/local/etc/zz.countries.nerd.dk.rbldnsd

Map the ISO numeric country codes to your own location names:

# Andorra
20 eu
# United Arab Emirates
784 uk
# Afghanistan
4 uk
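To make the two-step lookup concrete (netblock -> ISO numeric country code -> your own location name), here is a minimal Python sketch. It does not parse the real rbldnsd zone file or the geobackend map file; the netblocks below are private address ranges chosen purely for illustration, and the names `NETBLOCKS`, `CODE_TO_NAME`, `country_code` and `geo_name` are assumptions of this example.

```python
import ipaddress

# Hypothetical netblock -> ISO numeric country code table. Real data would
# come from the rsynced zone file; these private ranges are only examples.
NETBLOCKS = {
    "10.0.0.0/8": 840,      # pretend: United States (ISO numeric 840)
    "10.1.0.0/16": 156,     # pretend: China (ISO numeric 156)
    "192.168.0.0/16": 20,   # pretend: Andorra (ISO numeric 20)
}

# ISO numeric country code -> the location name used in your DNS records.
CODE_TO_NAME = {840: "us", 156: "cn", 20: "eu"}
DEFAULT_NAME = "us"  # assumed fallback for unmapped addresses


def country_code(ip):
    """Return the code of the most specific netblock containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for block, code in NETBLOCKS.items():
        net = ipaddress.ip_network(block)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, code)
    return best[1] if best else None


def geo_name(ip):
    """Map an IP address to a location name, falling back to the default."""
    return CODE_TO_NAME.get(country_code(ip), DEFAULT_NAME)
```

A linear scan is fine for a sketch; with the full country data a real server would want a prefix tree or a sorted range table instead.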

You don't have to replace your original DNS server with PowerDNS; you can let PowerDNS handle only the geo-location part and keep all other DNS records in your favorite DNS server.

Here is a sample configuration that implements: www.yoursite.com --> geo.yoursite.com --> us.yoursite.com --> IP address.

Inside your original DNS configuration:

www CNAME geo

pdns A 192.168.1.1 ; your server installed PowerDNS
geo NS pdns ; use PowerDNS to handle geo

Also inside your original DNS server configuration, add A records for the servers located in different places:

us A 68.178.100.12 ; server IP address in the United States, ‘us’ defined in your country code to name mapping.

cn A 202.102.24.100 ;server IP address in China…

It seems that not many people talk about this or related matters. (I guess that is because most web sites don't need GSLB; those who are experienced with GSLB, such as Google and Yahoo, feel it is so simple that they don't want to waste time explaining such an "easy thing"; and some people may treat GSLB as technical know-how to make money from, so they are not willing to say much either.)

Wikipedia uses PowerDNS with geobackend to do global load balancing. Here is what Wikipedia says about PowerDNS:

As of early 2005, PowerDNS in combination with the bind and geo backends is used by Wikimedia to handle all DNS traffic. By using the geobackend, incoming clients can be redirected to the nearest Wikipedia server (based on their geographic location). This facility provides an effective way of load balancing and it reduces response times for the clients

Global DNS Load Balancing Limitations (on HA)

Global load balancing is not a good solution for HA (high availability). There are many reasons: the DNS refresh time, the web browser's DNS cache, the lag in detecting that a server is down, etc. There is an article that explains the limitations very clearly and is easy to understand: “Why DNS Based Global Server Load Balancing (GSLB) Doesn’t Work”. Don't let its title fool you; I think "doesn't work" there means that HA doesn't work. GSLB is still very useful for running a huge site across the world.
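A rough back-of-the-envelope calculation shows why these delays matter. The sketch below simply adds up the three lags named above as an upper bound; the 5-minute TTL is the value I use, while the detection lag and the browser cache time are made-up illustrative numbers, and the function name is hypothetical.

```python
def worst_case_failover_seconds(detect_lag, dns_ttl, browser_cache):
    """Upper bound on how long a client may keep hitting a dead server.

    detect_lag    - seconds until monitoring notices the failure and DNS changes
    dns_ttl       - seconds a resolver may keep the stale record after that
    browser_cache - seconds a browser may keep the address it got from the resolver
    """
    return detect_lag + dns_ttl + browser_cache


# Assumed numbers: 60 s detection lag, 300 s TTL, 1800 s browser DNS cache.
window = worst_case_failover_seconds(60, 300, 1800)
print(f"worst case: {window} s ({window / 60:.0f} min)")
```

Even with a short TTL, the parts you do not control can dominate the total.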

Useful links:

[1] Thoughts on Global Server Load Balancing: Dave Walker from Sun talks about GSLB in general.

[2] DNS Balancing: a very specific explanation and configuration of free GSLB.

[3] GeoDNS: a simple but clear setup of PowerDNS + geobackend.


What’s inside X-Forwarded-For and how to handle

Saturday, June 30th, 2007

X-Forwarded-For

In one of my projects I needed to handle HTTP headers. The X-Forwarded-For header contains IP addresses and some words such as "unknown". What does this mean? Can this list contain a domain name? Do we need to run a DNS lookup (which may slow things down), or do we only need to validate the IP addresses? I searched across the Internet, and here is the information I collected.

X-Forwarded-For was originally invented by Squid and became a de facto standard for most other proxy implementations.

From Wikipedia: http://en.wikipedia.org/wiki/X-Forwarded-For

The X-Forwarded-For (XFF) HTTP header is a de facto standard for identifying the originating IP address of a client connecting to a web server through an HTTP proxy. XFF headers are supported by most proxy servers, notably Squid, Apache mod_proxy, Blue Coat ProxySG, Cisco Cache Engine, and NetApp NetCache.

In this context, the caching servers are most often those of large ISPs who either encourage or force their users to use proxy server for access to the World Wide Web, something which is often done to reduce external bandwidth through caching. In some cases, these proxy servers are transparent proxies, and the user may be unaware that they are using them.

Without the use of XFF or another similar technique, any connection through the proxy would reveal only the originating IP address of the proxy server, effectively turning the proxy server into an anonymizing service, thus making the detection and prevention of abusive accesses significantly harder than if the originating IP address was available. The usefulness of XFF depends on the proxy server truthfully reporting the original host’s IP address; for this reason, effective use of XFF requires knowledge of which proxies are trustworthy, for instance by looking them up in a whitelist of servers whose maintainers can be trusted.

The general format of the header is:

X-Forwarded-For: client1, proxy1, proxy2

From SQUID’s FAQ: (http://www.comfsm.fm/computing/squid/FAQ.html#toc4.17)

When a proxy-cache is used, a server does not see the connection coming from the originating client. Many people like to implement access controls based on the client address. To accommodate these people, Squid adds its own request header called “X-Forwarded-For” which looks like this:

X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30

Entries are always IP addresses, or the word unknown if the address could not be determined or if it has been disabled with the forwarded_for configuration option.

We must note that access controls based on this header are extremely weak and simple to fake. Anyone may hand-enter a request with any IP address whatsoever. This is perhaps the reason why client IP addresses have been omitted from the HTTP/1.1 specification.

Conclusion

By reading the source code of Squid and some Apache modules, I found that implementations of this header never try to put a host name into it. They may put "unknown" there, but that is surely not intended to be a name.

From Squid source code: http://squid.cvs.sourceforge.net/squid/squid/src/http.c?view=markup

/* append X-Forwarded-For */
if (opt_forwarded_for) {
    strFwd = httpHeaderGetList(hdr_in, HDR_X_FORWARDED_FOR);
    strListAdd(&strFwd,
        (((orig_request->client_addr.s_addr != no_addr.s_addr) && opt_forwarded_for) ?
            inet_ntoa(orig_request->client_addr) : "unknown"), ',');
    httpHeaderPutStr(hdr_out, HDR_X_FORWARDED_FOR, strBuf(strFwd));
    stringClean(&strFwd);
}

I also searched for and read a bunch of source code that processes this header (http://www.google.com/codesearch?hl=zh-CN&q=+x-forwarded-for&start=20&sa=N); none of the code I read checks the entries with a name lookup.

So we can simply change our code: remove the DNS lookup and just validate each entry as an IP address. This will dramatically improve the speed of this function.
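As an illustration of that conclusion, here is a minimal Python sketch that validates each entry purely as an IP literal, with no DNS lookup. It is not the code from my project; the function name `parse_x_forwarded_for` is made up, and treating "unknown" and any other non-IP token as `None` is an assumption of this example.

```python
import ipaddress


def parse_x_forwarded_for(header_value):
    """Split an X-Forwarded-For value into one entry per hop.

    Each entry is an ipaddress object, or None when the token is "unknown"
    or anything else that is not a literal IP address. No DNS lookup is done.
    """
    hops = []
    for token in header_value.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        try:
            hops.append(ipaddress.ip_address(token))
        except ValueError:
            hops.append(None)
    return hops


hops = parse_x_forwarded_for("128.138.243.150, unknown, 192.52.106.30")
print(hops)
```

Keep in mind the warning from the Squid FAQ: the header is trivial to fake, so the parsed addresses are only as trustworthy as the proxies that added them.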


Cache control header and browser cache behaviours

Saturday, June 30th, 2007

Conclusion

By capturing and analysing the packets, I discovered that:

  • When a browser sends a request to the server in any of these situations:
    • there is no local cached copy
    • the browser cache is disabled or has been cleaned up
    • the user requests a "deep refresh"
      the browser sends an HTTP request without any cache-control related headers (e.g. If-Modified-Since,
      If-None-Match, Cache-Control)
  • pressing the "refresh" button means different things in different browsers
    • Opera does a "deep refresh" without considering the local cache
    • Firefox and IE consider the local cache unless the user presses "CTRL+F5" to force a "deep refresh"
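The distinction above can be checked mechanically on a captured request. The following Python sketch is only an illustration: the helper names and the shortened sample requests are made up, and it only looks for the two validator headers (If-Modified-Since and If-None-Match) that let a server answer with 304.

```python
# Headers that turn a plain GET into a revalidation request.
VALIDATORS = ("if-modified-since", "if-none-match")


def parse_headers(raw_request):
    """Return the request headers as a dict with lower-cased names."""
    lines = raw_request.strip().splitlines()
    headers = {}
    for line in lines[1:]:          # skip the request line
        if not line.strip():
            break                   # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def is_conditional(raw_request):
    """True if the browser is revalidating a cached copy."""
    headers = parse_headers(raw_request)
    return any(name in headers for name in VALIDATORS)


first_request = "GET /logo.png HTTP/1.1\nHost: example.com\nConnection: keep-alive\n"
cached_request = (
    "GET /logo.png HTTP/1.1\n"
    "Host: example.com\n"
    "If-Modified-Since: Mon, 11 Dec 2006 23:44:12 GMT\n"
    'If-None-Match: "501c-be1ceb00"\n'
    "Cache-Control: max-age=0\n"
)
print(is_conditional(first_request), is_conditional(cached_request))
```

A first request or a "deep refresh" comes out as unconditional, while a normal refresh in Firefox or IE comes out as conditional.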

Very useful links for cache control and browser behaviors

Test results for each browser

  • Ethereal was used to capture and analyse the packets

Firefox 2.0.0.3:

#1: First request without cache

GET /7/692/809/d102c6510d49b9/sp.ask.com/sh/i/a10/p/logo_ask_x.png HTTP/1.1
Host: a692.g.akamai.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Accept: image/png,*/*;q=0.5
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: gb2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.0 200 OK
Server: Apache/2.0.55 (Unix)
Last-Modified: Mon, 11 Dec 2006 23:44:12 GMT
ETag: "501c-be1ceb00"
Accept-Ranges: bytes
Content-Length: 20508
Content-Type: image/png
Cache-Control: max-age=0
Expires: Tue, 22 May 2007 02:09:05 GMT
Date: Tue, 22 May 2007 02:09:05 GMT
Connection: keep-alive
.PNG.
...

#2: with cache

GET /7/692/809/d102c6510d49b9/sp.ask.com/sh/i/a10/p/logo_ask_x.png HTTP/1.1
Host: a692.g.akamai.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: gb2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
If-Modified-Since: Mon, 11 Dec 2006 23:44:12 GMT
If-None-Match: "501c-be1ceb00"
Cache-Control: max-age=0

HTTP/1.0 304 Not Modified
Date: Tue, 22 May 2007 02:07:17 GMT
Connection: keep-alive

#3: Press the “refresh” button

same as #2

#4: Press CTRL+F5 to force a deep “refresh”

same as #1

#5: Force no cache by disabling the cache

same as #1

Opera 9.0

#1: empty cache request

GET /sh/i/a10/p/askx_logo_home.gif HTTP/1.1
User-Agent: Opera/9.00 (Windows NT 5.1; U; zh-cn)
Host: sp.ask.com
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: en_US,en;q=0.9
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Cookie: accepting=1; wz_uid=074159E941E886C853D1D5E429E169AE; ptid=3131|1; user=l=dir; tbe=1; ax=4; wz_sid=084E6B5F2B5B2688C3D1D5E429E16925; wz_scnt=8
Cookie2: $Version=1
Connection: Keep-Alive, TE
TE: deflate, gzip, chunked, identity, trailers

HTTP/1.1 200 OK
Date: Tue, 22 May 2007 02:21:47 GMT
Server: Apache/2.0.55 (Unix)
Last-Modified: Mon, 18 Dec 2006 21:08:31 GMT
ETag: "fce-623c31c0"
Accept-Ranges: bytes
Content-Length: 4046
Cache-Control: max-age=31536000
Expires: Wed, 21 May 2008 02:21:47 GMT
Connection: close
Content-Type: image/gif
GIF89a..{
.........

#2: request with cache

304, not modified.

#3: Press the “refresh” button (Opera always refreshes when the button is pressed)

same as #1

#4: disable cache

same as #1

IE 7

#1: first request

GET /7/692/809/d102c6510d49b9/sp.ask.com/sh/i/a10/p/logo_ask_x.png HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Host: a692.g.akamai.net
Connection: Keep-Alive

HTTP/1.0 200 OK
Server: Apache/2.0.55 (Unix)
Last-Modified: Mon, 11 Dec 2006 23:44:12 GMT
ETag: "501c-be1ceb00"
Accept-Ranges: bytes
Content-Length: 20508
Content-Type: image/png
Cache-Control: max-age=0
Expires: Tue, 22 May 2007 02:28:39 GMT
Date: Tue, 22 May 2007 02:28:39 GMT
Connection: keep-alive
.PNG.
....

#2: refresh with cache

GET /7/692/809/d102c6510d49b9/sp.ask.com/sh/i/a10/p/logo_ask_x.png HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, 11 Dec 2006 23:44:12 GMT; length=20508
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Host: a692.g.akamai.net
Connection: Keep-Alive

HTTP/1.0 304 Not Modified
Content-Type: image/png
Last-Modified: Mon, 11 Dec 2006 23:44:12 GMT
ETag: "501c-be1ceb00"
Cache-Control: max-age=0
Expires: Tue, 22 May 2007 02:29:36 GMT
Date: Tue, 22 May 2007 02:29:36 GMT
Connection: keep-alive

#3: press “refresh” button

same as #2

#4: Press CTRL+F5 (force a clean refresh)

same as #1


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.