Movable Type System Architectures

Movable Type is renowned in operations circles for a number of its "abilities," mainly its scalability, availability and reliability. The larger your site gets, however, and the more readers you have creating content for you in the form of comments and posts [1], the more complex your network architecture may need to become. For many, architecting a scalable, high-performance Movable Type system can be a daunting task, mostly because the process is largely undocumented.

The truth is that there is no one canonical way to design your Movable Type system, or any system for that matter, which is most likely one of the primary reasons for the lack of documentation. So let's approach this challenge another way. Let's start with a basic footprint designed for large sites powered by Movable Type, and then let the architects add or remove pieces as needed, according to their unique operational requirements and cost constraints.

Below is just such a network, one that serves as the basis for the large-scale Movable Type sites I typically design:

[Diagram: Basic MT Network Architectures]

  • Front-end Web Servers - these servers serve static files only. If all else fails in your network, these will continue to serve content, not to mention your ads, which are the lifeblood of any large media site.

  • NFS Server - invest in a single, and very reliable, network storage device, one that your web servers read from, and your publishing daemons write to.

  • Database Server - have one or two dedicated database servers, depending upon whether you want one available for redundancy purposes. These should be beefy machines with a lot of CPU power and a lot of memory. Fine-tuning your database is a whole other chapter, and for that I highly recommend consulting an expert. In my experience, however, text doesn't take up a lot of memory, and with a large enough cache configured for your database, you can practically load up your entire database into memory, resulting in dramatic speed improvements.

  • Comment Servers - these web servers handle all the write requests from your community and readers including favoriting, commenting, and the like. These can be broken off from your front-end web servers so that they can be scaled independently from the rest. The diagram doesn't show this, but you may consider having these connected to the NFS server as well and have them handle publishing of your permalink pages synchronously with each comment received. This ensures that when a reader returns to the entry they commented on they will see the comment they just left.

  • Admin Web Servers - these servers are what your editors access. They are given dedicated machines so that if your site is under high load you can rest assured that authors and editors can still log in and be productive administering the site.

  • Publishing Machines - these servers are workhorses. They handle most of the publishing and virtually all of the non-critical, non-blocking processes on your system, like Action Streams aggregation. One simple way to approach this little cluster is with a lot of small cheap machines, or virtual machines that you can easily spin up when your site is under serious load.
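To make the footprint above a little more concrete, here is a minimal sketch of the roles as data. It is only a model for reasoning about who touches the shared storage: the `Role` class, the `TOPOLOGY` list and the "optional" label are my own illustrative names, not anything Movable Type defines. Only the front-end (read) and publishing (write) access come straight from the list above; marking the admin servers "optional" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    nfs_access: str      # "read", "write", "optional" or "none"
    public_facing: bool  # True if readers hit this role directly


# Hypothetical model of the server roles; the NFS device itself is left out.
TOPOLOGY = [
    Role("front-end web", nfs_access="read", public_facing=True),
    Role("comment", nfs_access="optional", public_facing=True),
    Role("admin web", nfs_access="optional", public_facing=False),  # assumption
    Role("publishing", nfs_access="write", public_facing=False),
    Role("database", nfs_access="none", public_facing=False),
]


def roles_with(topology, nfs_access):
    """Return the names of all roles with the given kind of NFS access."""
    return [role.name for role in topology if role.nfs_access == nfs_access]


print("writes to NFS:", roles_with(TOPOLOGY, "write"))
print("reads from NFS:", roles_with(TOPOLOGY, "read"))
```

Writing it down this way makes it easy to see which roles you can scale out without touching the storage device at all.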

Anyways, that's the basic gist. I would love to learn how others have architected their Movable Type clusters. Let me know in the comments if you don't mind sharing.

[1] Something one is likely to see a lot more of when Motion is released.

7 Comments

Byrne, this diagram makes perfect sense. As someone who manages a relatively high-volume set of MT blogs, I'm trying to move to a multi-box architecture.

The part that I'd love to learn more about is how you get the boxen talking to each other. What configuration changes are made in order to get the publishing servers to push their files to the right place? How do you handle pointing users to the comment servers when it's time to handle a comment (subdomains?)

I'd love some pointers in the right direction here -- and am willing to pitch in and document my progress, of course!

John, thanks for your inquiry. That is a common question and something I should probably add to the operations manual. In general there are two ways to configure Movable Type and the Publish Queue for large scale deployments. They are:

  • Linked together via NFS
  • Replication via RSync

I am not sure what solution is right for you however. So let me try to explain how each works.

NFS

With the NFS solution, all of your publishing servers (or Publish Queue workers) write files to an external NFS mount. These files never physically reside on the publishing server; they only appear to be local thanks to NFS, which lets different servers share the same set of files.

The front-end web servers then mount this shared NFS directory for reading. Simple enough, right?
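One thing to watch with a shared directory is a web server reading a file while a publisher is still writing it. Below is a minimal sketch of a common way around that: write to a temporary file in the same directory, then rename it into place. This is not a description of how Movable Type itself writes files; `publish_atomically` is a hypothetical helper, and a temporary directory stands in for the NFS mount. Rename behaviour over NFS depends on your server and client, so treat it as a starting point.

```python
import os
import tempfile


def publish_atomically(shared_root, relative_path, html):
    """Write html under shared_root so readers never see a half-written file."""
    final_path = os.path.join(shared_root, relative_path)
    target_dir = os.path.dirname(final_path)
    os.makedirs(target_dir, exist_ok=True)

    # The temporary file lives in the target directory so the rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        # mkstemp creates files readable only by the owner; open the file up
        # so the web server user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


# A temporary directory stands in for the shared NFS mount point.
shared_root = tempfile.mkdtemp()
path = publish_atomically(shared_root, "2009/06/entry.html", "<html>hello</html>")
print(path)
```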

Pros:

  • Scales better because each file is written once and immediately made visible on the front end web server.
  • Easier to set up IMHO.

Cons:

  • Single point of failure. If something were to go wrong with your shared filesystem, then much of your system will be hosed. This can be mitigated with a solid RAID config or other highly reliable disk storage.

RSync

When using rsync, Movable Type will invoke a command line utility designed for keeping two different file systems in sync with one another. This is what happens when Movable Type is configured to use RSync:

  1. User leaves a comment.
  2. Job is created in Publish Queue.
  3. Worker pulls job off queue and publishes file to local file system.
  4. Worker then begins to rsync the file (usually over ssh) to each of the designated servers.
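The last step above boils down to one rsync invocation per front-end server. Here is a minimal sketch of what those command lines could look like. The host names, paths and the `www` user are made up for illustration, and this is not the command Movable Type itself builds; the rsync flags used (`-a`, `-z`, `-e ssh`) are standard ones.

```python
def build_rsync_commands(local_root, targets, ssh_user="www"):
    """Build one rsync argv per (host, remote_root) target.

    The trailing slash on the source tells rsync to copy the directory's
    contents rather than the directory itself.
    """
    source = local_root.rstrip("/") + "/"
    commands = []
    for host, remote_root in targets:
        destination = f"{ssh_user}@{host}:{remote_root.rstrip('/')}/"
        commands.append(["rsync", "-az", "-e", "ssh", source, destination])
    return commands


# Hypothetical front-end servers and document roots.
targets = [
    ("web1.example.com", "/var/www/html"),
    ("web2.example.com", "/var/www/html"),
]
commands = build_rsync_commands("/var/mt/published", targets)
for command in commands:
    # To actually run one: subprocess.run(command, check=True)
    print(" ".join(command))
```

Note that the loop runs once per target, which is exactly where the scalability concern listed under the cons comes from.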

Pros:

  • Failure tolerance - by replicating your published content you ensure that if one file system or server goes bad, you still have something to fall back on.

Cons:

  • Slightly harder to set up IMHO.
  • Scalability - the more front end web servers you have the more servers you will need to synchronize with. This can add latency to your publishing process and cause some servers for a brief period of time to have slightly different content from one another.
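To put rough numbers on that last point, here is a back-of-the-envelope sketch. It assumes every transfer takes the same time and that transfers in the same batch overlap completely, which is optimistic; the figures are illustrative, not measurements.

```python
import math


def sync_seconds(num_servers, seconds_per_server, parallel_transfers=1):
    """Estimate how long it takes to push one file to every front-end server."""
    batches = math.ceil(num_servers / parallel_transfers)
    return batches * seconds_per_server


# Pushing to 8 servers at half a second each, one at a time...
serial = sync_seconds(8, 0.5)
# ...versus four transfers at once.
batched = sync_seconds(8, 0.5, parallel_transfers=4)
print(serial, batched)
```

During that window the servers synced first are serving newer content than the ones synced last.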

I must have a misunderstanding here, but I expect I am not alone. I was trying to figure out how Schwartz is being used for RSync as well as queued publishing.

Here's my [mis-]understanding of how RSync works in a scaled-out set up (using server type names from the "Advanced Configuration" http://www.movabletype.org/documentation/enterprise/system-architecture.html and your own diagram):

  1. Content may be immediately published on an "App Server" (called "Admin Web Server" in your diagram for content) or "Comment Server" (for new comments) or it can be queued to be published asynchronously by a "Publisher" (called "Publishing Machine(s)" in your diagram).
  2. Publish job requests are queued in Schwartz and fielded by a "Publisher", when run-periodic-tasks gives it a bite of the cherry.
  3. When a page has been published by an "App Server", "Comment Server" or "Publisher" another job request is put into Schwartz to RSync the file.
  4. A "Publisher" picks up the RSync job request from Schwartz. It expects to find the file on its local file system and sends it off to SyncTargets. The SyncTargets would be servers known as "Page Servers" (called "Front-end Web Server" in your diagram).

But I must have something wrong in my understanding. Step 4 would only work if the published content was local to the Publisher that picked up the RSync job request and that is not guaranteed.

In your reply to John Young above, it sounds like the RSync is part of the publish job itself, which would work fine because the file would be local. However, that doesn't appear to be the case when I skim through the code. It appears to be a separate job, which could therefore be fielded by a different server than the one that published the content locally to itself.
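The concern described above can be shown with a toy simulation. This is only a model of the scenario, not Movable Type or Schwartz code: each `Worker` has a "file system" represented by a set of paths, one worker handles the publish job, and a different worker picks up the sync job.

```python
class Worker:
    """A toy publisher whose file system is just a set of paths."""

    def __init__(self, name, filesystem):
        self.name = name
        self.filesystem = filesystem

    def publish(self, path):
        self.filesystem.add(path)

    def can_sync(self, path):
        # A sync job can only send a file the worker can see locally.
        return path in self.filesystem


def sync_succeeds(shared_filesystem):
    """Publish on one worker, then attempt the sync job on another."""
    if shared_filesystem:
        disk = set()  # one disk seen by both workers, as with NFS
        first, second = Worker("publisher-a", disk), Worker("publisher-b", disk)
    else:
        first, second = Worker("publisher-a", set()), Worker("publisher-b", set())
    first.publish("entry.html")
    return second.can_sync("entry.html")


print("separate disks:", sync_succeeds(shared_filesystem=False))
print("shared disk:", sync_succeeds(shared_filesystem=True))
```

With separate disks the second worker has nothing to send; with a shared disk it does not matter which worker fields the sync job.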

This is an excellent and very relevant observation. A farm of PQ workers almost certainly should share a file system via NFS. Doing so will eliminate the potential for this problem.

That being said, I believe one Schwartz job is capable of spawning an additional job immediately, which may be exactly what is happening here. So while there are two jobs, MT will ensure that the jobs are executed by the same worker. That certainly is in line with what I have seen: that files are published and then immediately transferred.

There remains the thorny issue of pages, which are immediately published by "App Server" or "Comment Server", when it is set up for static publishing. These will need to be distributed to the "Page Servers", which means that they too would need to be published to an NFS, shared with the PQ "Publishers" for rsync to the "Page Servers" to work.

I guess NFS is mandatory for all of the servers except for the "Page Servers" as things stand.

Either that or a different approach should be invoked for the rsync in MT::Worker::Sync (i.e. some Perl mods required), where the file is first pulled from the server that published the page, when applicable. Is there data in the queue to indicate which server published the page so it could be fetched from the appropriate server?
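To illustrate the second idea, here is a minimal sketch of a sync job that remembers which server published the file. It is purely hypothetical: the `origin_host` field and `plan_transfers` function are my own inventions, and I don't know whether the real queue carries this information. Since rsync cannot copy directly between two remote hosts, the plan pulls the file to the local disk first when it was published elsewhere, then pushes it to each target.

```python
from dataclasses import dataclass


@dataclass
class SyncJob:
    path: str         # absolute path of the published file
    origin_host: str  # hypothetical field: the server that published it


def plan_transfers(job, local_host, targets):
    """Return the rsync argv lists a worker on local_host would need to run."""
    commands = []
    if job.origin_host != local_host:
        # Pull from the publishing server before pushing anywhere.
        commands.append(
            ["rsync", "-az", "-e", "ssh", f"{job.origin_host}:{job.path}", job.path]
        )
    for target in targets:
        commands.append(
            ["rsync", "-az", "-e", "ssh", job.path, f"{target}:{job.path}"]
        )
    return commands


# Hypothetical host names.
job = SyncJob(path="/var/www/html/entry.html", origin_host="comment1.example.com")
page_servers = ["web1.example.com", "web2.example.com"]
remote_plan = plan_transfers(job, "publisher1.example.com", page_servers)
local_plan = plan_transfers(job, "comment1.example.com", page_servers)
print(len(remote_plan), len(local_plan))
```

The extra pull costs one more transfer per job, which is the price of not putting every publishing server on NFS.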

With my other non-Movable Type web server I segment the images from the main page server, i.e. images.domain.com and www.domain.com. That reduces the hard drive / CPU load on the www.domain.com page server.

I'm wondering if there is a way to do that with Movable Type while still retaining the awesome MT Asset Management tool. I don't think you can have assets (images, videos, files) on domain A and then have web pages served from a different file server B. The MT app interface would have to somehow push images to server A and pages to server B. Don't think you can do that.

Correct me if I'm wrong.

Meant to say this: I don't think you can have assets (images, videos, files) on domain A and then have web pages served from a different domain B. The MT app interface would have to somehow push images to domain A and pages to domain B (each on a different server).

AFAIK the MT app must push pages and images to the SAME domain.
