Symkat

Bloggity Blog

Introducing SymPullCDN

Posted in Code


Last week we discussed speeding up WordPress by using a cookieless domain to host all the static files that WordPress uses. Since then, SymPullCDN has been written and released to the public for free on GitHub. SymPullCDN leverages the free bandwidth, storage and power of Google AppEngine to create a trivial-to-implement reverse proxy.

What Is Google AppEngine?

From the Google AppEngine website: "Google App Engine enables you to build and host web apps on the same systems that power Google applications."

Google gives this away for free, provided you stay within their free limits. To go beyond the free limits you'll need to pay for the service. Most people will never need to pay for Google AppEngine.

The relevant limits for using SymPullCDN at the time of this writing are:

| Max Quantity | Resource | How SymPullCDN Uses It |
|---|---|---|
| 1 gigabyte per day | Bandwidth In | Incoming bandwidth is used by SymPullCDN contacting your origin, and for request headers sent by a browser. |
| 1 gigabyte per day | Bandwidth Out | Outgoing bandwidth is used for sending cached entities or passing an entity without caching to the browser. |
| 1 gigabyte total | Database Storage | The maximum size of the database that holds cached entities. |
| 1 megabyte per entity | Cached File Size | The maximum size of a file that may be sent cached or passed to a browser. |
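
Because of the per-entity limit, it can help to check your static files before pointing anything at the CDN. The following is a minimal sketch, not part of SymPullCDN: the function name oversized_files and the constant MAX_ENTITY_BYTES are made up for illustration, and it assumes "1 megabyte" means 1,000,000 bytes (the stricter of the two common readings).

```python
import os

# Assumption: "1 megabyte" is read as 1,000,000 bytes, the stricter of
# the two common interpretations of the limit.
MAX_ENTITY_BYTES = 1000000


def oversized_files(root, limit=MAX_ENTITY_BYTES):
    """Return sorted (relative path, size) pairs under root larger than limit."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            size = os.path.getsize(full)
            if size > limit:
                found.append((os.path.relpath(full, root), size))
    return sorted(found)
```

Calling oversized_files on the directory that holds your static files returns the ones that would be too large to serve this way; an empty list means everything fits.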

While a website that's getting in the hundreds of thousands of page-views per day may not be able to live within Google's free limits, most blogs on the Internet and even most websites easily can.

SymKat has been using SymPullCDN to serve the static files that this site uses for the last 5 days. We have offloaded 36 static files and saved approximately 300 megabytes in bandwidth for those files.
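
To put those numbers next to the daily quota, here is a rough back-of-the-envelope calculation. It assumes the 300 megabytes were spread evenly over the 5 days, that the bandwidth saved at the origin roughly equals what AppEngine sent out, and that 1 gigabyte is counted as 1024 megabytes; none of these is confirmed by Google's accounting.

```python
# Figures quoted above: 300 megabytes saved over 5 days.
saved_mb = 300.0
days = 5

# Assumption: the daily quota of 1 gigabyte is counted as 1024 megabytes.
daily_quota_mb = 1024.0

daily_mb = saved_mb / days          # average megabytes served per day
share = daily_mb / daily_quota_mb   # fraction of the daily outgoing quota

print("%.0f MB per day, %.1f%% of the daily outgoing quota" % (daily_mb, share * 100))
```

Under those assumptions the site is using roughly 6% of the free outgoing bandwidth, which leaves plenty of headroom.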

How Fast Is It?

In testing through Gomez Networks and ApacheBench throughout the United States on a dedicated 512-kilobyte file, the average request time was 0.80 seconds, with faster response times on the East Coast.

Google AppEngine is not a traditional CDN in the same vein that, for instance, EdgeCast is. EdgeCast's network is built on the idea that an entity should be served from as close a geographical location as possible, and it uses a handful of technologies to accomplish this. If you're in New York, chances are you're downloading the file from New York. If you're in Washington, chances are you're downloading the file from Washington. However, the awesomeness of such technologies and speed come at a price far, far above Google's free.

Google seems to make an effort to serve files from a nearby location. Based on our testing, we've concluded that routing is likely optimized on a per-country basis rather than a per-state or per-province basis.
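
If you want a rough number for your own setup, a few lines of Python can time repeated downloads. This is only a sketch and not a substitute for ApacheBench or Gomez: it runs sequentially from a single location, and the helper names average_seconds and download are made up for illustration.

```python
import time
import urllib.request


def average_seconds(fetch, url, runs=10):
    """Call fetch(url) runs times and return the mean wall-clock time."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        total += time.perf_counter() - start
    return total / runs


def download(url):
    """Fetch url in full and discard the body."""
    with urllib.request.urlopen(url) as response:
        response.read()
```

For example, average_seconds(download, "http://your-app-id.appspot.com/path/to/file") would time ten full downloads, where your-app-id and the path are placeholders for your own application and file.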

How does SymPullCDN Work?

SymPullCDN acts as a reverse proxy and caches the content it pulls from an origin. The origin is where the content is originally hosted: your own website. SymPullCDN exists on a completely different domain name. When a request is made against SymPullCDN for a file, it tries to download it from your website, the origin. Once it has downloaded the file, it saves the file in a database and sends it to whoever requested it.

All subsequent downloads of that file are served from SymPullCDN itself, without utilizing your origin further.
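
The pull-and-cache idea can be sketched in a few lines. This is not SymPullCDN's actual code: it uses plain Python 3 and an in-memory dictionary in place of the AppEngine database, and the names ORIGIN, fetch_from_origin and PullThroughCache are made up for illustration.

```python
import urllib.request

# Hypothetical origin; in SymPullCDN this role is played by the Origin
# setting in main.py.
ORIGIN = "http://www.example.com/"


def fetch_from_origin(path):
    """Download one file from the origin and return its bytes."""
    url = ORIGIN.rstrip("/") + "/" + path.lstrip("/")
    with urllib.request.urlopen(url) as response:
        return response.read()


class PullThroughCache:
    """Serve files from a local store, pulling from the origin on a miss."""

    def __init__(self, fetch=fetch_from_origin):
        self._fetch = fetch
        self._store = {}  # stands in for the database: path -> body

    def get(self, path):
        if path not in self._store:
            # First request for this path: contact the origin, keep a copy.
            self._store[path] = self._fetch(path)
        # Every later request is answered from the store alone.
        return self._store[path]
```

Passing the fetch function in as a parameter keeps the caching logic separate from the network call, so the cache can be exercised with a stand-in function that never touches a real origin.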

How Do I Install It?

SymPullCDN runs on Google AppEngine. You can sign up for an account by following the directions at http://appengine.google.com/.

Once you have created an account and are logged in, click Create An Application. Choose a unique name for the Application Identifier and remember it. Choose SymPullCDN - [Your Origin] as the application title, replacing [Your Origin] with the domain that will serve as the origin.

Once you have created the application instance, download the Google AppEngine SDK for Python for your operating system from http://code.google.com/appengine/downloads.html.

After it is installed, configure an application with the application identifier that you used previously.

You can download the latest version of SymPullCDN from http://github.com/symkat/SymPullCDN

You will have to modify two files: main.py and app.yaml.

Open main.py in your favorite text editor and find this line:

Origin = "http://replace*me/"

Change it to the domain name you would like to use as your origin.

Save main.py and open app.yaml and find the following line:

Application: *replace*me*

Replace this with the unique application identifier you've configured your Google Application Engine app to run on.

Save the file, and upload the application to Google. Once it's done, it should automatically be mirroring and caching your content.

Is It Stable?

Not by any stretch of the imagination. It's stable only if you don't give it something it doesn't expect. Your car is stable, as long as you don't drive it into a wall. SymPullCDN still needs copious amounts of rubber padding added to it.

This is a proof of concept for an idea that occurred to me last Thursday. It's been written following the assumption that everything is okay and wonderful. As such, there is virtually no error handling outside of the HTTP protocol handling. We have not run into any errors with it here at SymKat. However, there are a number of theoretical conditions that SymPullCDN can run into which are not handled correctly. We will be working on implementing more error catching over the next week or two. The best place to find out what has changed is http://github.com/symkat/SymPullCDN/blob/master/ChangeLog

Why release it if it has bugs? Why write about it if it's not done?

Typically SymKat does lots of planning, specification writing, proofs of concept for various aspects of a project, and lots of other Very Good™ things before working on a program. SymPullCDN went from idea to implementation without so much as opening TaskPaper, so it felt fitting to release it with the same haste. Writing a little blurb about it for the Wednesday article here made sense.

Bug reports that are reproducible are welcome, as well as any suggestions.
