Friday, July 17, 2009

Cuckoo: How a Country Censors Internet Information and Why Google Reader Wasn't Filtered in China

Here's how a country can censor internet information, and why China didn't filter Google Reader.

Why Google Reader survived the June 2 GFW ban, and what you can learn from it


The GFW (Great Firewall of China) is the national firewall deployed on China's Internet backbone. It uses four types of blocking:

* URL block: i.e., if twitter.com is blocked, any HTTP request in which the string twitter.com appears between GET and \r\n\r\n (the end of the request headers) is blocked. The GFW will repeatedly send TCP RST packets to both you and the server.
* DNS pollution: all UDP port 53 traffic in China is censored. If your domain is polluted, it will resolve to fake IPs. Currently known fake IPs include:

202.106.1.2, 211.95.129.161, 211.94.66.147, 220.250.64.23, 216.234.179.13, 4.36.66.178

* IP ban.
* Content filter: all unencrypted TCP/UDP data is monitored. You might trigger a shit load of keywords during your daily surfing without even knowing it. As with URL blocking, if your traffic matches enough keywords at a high enough level, the GFW will repeatedly send RST to both you and the server, and your IP and client info will be logged for further investigation if needed.
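To make the four checks above concrete, here is a toy Python model of what a filtering middlebox might do with a single packet. It is only a sketch under stated assumptions: the blocked host, the banned IP (a TEST-NET documentation address), the keyword list and the hit threshold are all hypothetical placeholders, and the real GFW's rules and thresholds are not known here. Only the fake-IP list is taken from above.

```python
# Toy model of the four GFW checks described above.
# All rules and thresholds except KNOWN_FAKE_IPS are hypothetical placeholders,
# not the real GFW configuration.

BLOCKED_HOSTS = {"twitter.com"}          # hypothetical URL-block list
KNOWN_FAKE_IPS = {                       # fake DNS answers listed in the post
    "202.106.1.2", "211.95.129.161", "211.94.66.147",
    "220.250.64.23", "216.234.179.13", "4.36.66.178",
}
BANNED_IPS = {"192.0.2.1"}               # hypothetical IP-ban list (TEST-NET)
KEYWORDS = {"keyword-a", "keyword-b"}    # hypothetical keyword list
KEYWORD_THRESHOLD = 2                    # hypothetical "level and quantity"


def url_blocked(payload: bytes) -> bool:
    """URL block: a blocked host string between 'GET' and the end of headers."""
    if not payload.startswith(b"GET "):
        return False
    end = payload.find(b"\r\n\r\n")
    # If the header terminator hasn't arrived yet, inspect what we have.
    head = payload if end == -1 else payload[:end]
    return any(host.encode() in head for host in BLOCKED_HOSTS)


def dns_polluted(answers: list[str]) -> bool:
    """DNS pollution: the answer contains one of the known fake IPs."""
    return any(ip in KNOWN_FAKE_IPS for ip in answers)


def ip_banned(dst_ip: str) -> bool:
    """IP ban: the destination address is on the ban list."""
    return dst_ip in BANNED_IPS


def content_flagged(payload: bytes) -> bool:
    """Content filter: count keyword hits in unencrypted data."""
    text = payload.decode("utf-8", errors="ignore").lower()
    hits = sum(text.count(k) for k in KEYWORDS)
    return hits >= KEYWORD_THRESHOLD


def inspect(dst_ip: str, payload: bytes) -> str:
    """Return the action this toy middlebox takes on one TCP payload."""
    if ip_banned(dst_ip):
        return "drop"
    if url_blocked(payload) or content_flagged(payload):
        # The post describes the real GFW as sending RST to both ends, repeatedly.
        return "rst"
    return "pass"
```

For example, `inspect("203.0.113.5", b"GET / HTTP/1.1\r\nHost: twitter.com\r\n\r\n")` returns `"rst"`, while an encrypted TLS record sent to the same address passes, because neither the GET prefix nor any keyword is visible in it.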

During the great ban of 2009-06-02, *.live.com, bing.com, twitter.com, flickr.com and hotmail.com, along with the previously banned youtube.com and *.blogspot.com, became inaccessible directly from within China. But Google Reader, one of the main sources of anti-☭ propaganda, survived. Why?

Let's look at a typical Google Reader HTTP request:

https://www.google.com/reader/api/0/stream/contents/user%2F1338082....
  |     |            |
  |     +-----+------+
  |           |
  |           +--- DNS pollution and IP ban are unlikely,
  |                unless the GFW bans all of www.google.com
  |
  +--- https: data transferred between your IP and the
       www.google.com IP (64.233.189.99) is encrypted,
       so URL blocking and content filtering are useless,
       unless the GFW mounts some sort of MITM attack, which is too costly
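A rough way to see why the https:// prefix matters: the sketch below models what a passive on-path observer can read from a single request. It is a simplification, assuming a plain GET for http:// and treating everything inside TLS as opaque; the function name observer_view is hypothetical. One caveat the diagram doesn't show: clients that send the TLS SNI extension reveal the hostname during the handshake, though never the path, query string or body.

```python
from urllib.parse import urlsplit


def observer_view(url: str, server_ip: str) -> dict:
    """Simplified model of what a passive on-path observer sees for one request.

    http://  -> request line and Host header cross the wire in clear text.
    https:// -> they travel inside the TLS tunnel; only the destination
                address and handshake metadata (e.g. SNI) are visible.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    if parts.scheme == "http":
        request = f"GET {target} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"
        return {"dst": (server_ip, parts.port or 80), "plaintext": request}

    if parts.scheme == "https":
        return {
            "dst": (server_ip, parts.port or 443),
            # Clients that send SNI expose the hostname in the handshake,
            # but the path and body stay encrypted.
            "sni": parts.hostname,
            "plaintext": None,
        }

    raise ValueError(f"unsupported scheme: {parts.scheme}")
```

With `observer_view("https://www.google.com/reader/api/0/stream/contents/", "64.233.189.99")`, the observer gets only the destination `("64.233.189.99", 443)` and the hostname, so a filter keyed on the `/reader/...` path or on page content has nothing to match.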

So what can we learn from it?

* The bigger your website is, the sooner it gets unblocked. If a little-known site gets blocked, who cares?
* Avoid HTTP GET, since it's very easily URL-blocked. This also helps reduce XSS and XSRF exposure.
* Make every URL you link or redirect to HTTPS-compatible, even for static files.
* Use RFC 2068-compatible caching proxies such as Squid for your CDN, so you can route traffic for your other nodes through a node your clients can reach (e.g. YouTube could be accessed within China if you proxied all www.youtube.com/* requests to the www.google.cn IP). But you have to make sure your CDN can deliver content across the GFW.
* Use as few subdomains as possible. This limits DNS pollution damage, since you have only dozens of names to recover instead of thousands. If Google Reader's URL had been something like reader.google.com, it would very likely have been banned years ago.
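Two of the points above, HTTPS everywhere and fewer subdomains, are easy to check against your own pages. A minimal sketch, assuming OWN_HOSTS is a hypothetical list of hosts you actually serve over HTTPS: force_https rewrites http:// links (pages and static files alike) only for those hosts, and distinct_hostnames counts how many separate DNS names a set of URLs depends on, i.e. how many names DNS pollution could hit.

```python
import re
from urllib.parse import urlsplit

# Hypothetical: hosts you control and can serve over HTTPS.
OWN_HOSTS = {"www.example.com", "static.example.com"}

# Matches "http://" followed by a hostname; "https://" never matches.
_HTTP_URL = re.compile(r"http://([A-Za-z0-9.-]+)")


def force_https(html: str) -> str:
    """Rewrite http:// links to https:// for your own hosts only."""
    def repl(m: re.Match) -> str:
        host = m.group(1)
        if host.lower() in OWN_HOSTS:
            return f"https://{host}"
        return m.group(0)  # leave third-party links untouched
    return _HTTP_URL.sub(repl, html)


def distinct_hostnames(urls: list[str]) -> set[str]:
    """Each distinct hostname is a separate DNS name that could be polluted."""
    hosts = set()
    for u in urls:
        host = urlsplit(u).hostname
        if host:
            hosts.add(host)
    return hosts
```

Running distinct_hostnames over every URL your pages reference gives a quick count of the names you would have to recover if they were polluted; the smaller that set, the better.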


initiative.yo2.cn/archives/640553
