Movable Type System Architectures

Posted by Byrne, 15 Jan 2009

Movable Type is renowned in operations circles for a number of its "abilities," mainly its scalability, availability and reliability. However, the larger your site gets, and the more readers you have creating content for you in the form of comments and posts [1], the more complex your network architecture may need to become. For many, architecting a scalable, high-performance Movable Type system can be a daunting task, largely because the process is undocumented.

The truth is that there is no single canonical way to design a Movable Type system, or any system for that matter, which is most likely one of the main reasons for the lack of documentation. So let's approach this challenge another way. Let's start with a basic footprint designed for large sites powered by Movable Type, and then let the architects add or remove pieces as needed, according to their own operational requirements and cost constraints.

Below is just such a network, one that serves as the basis for the large-scale Movable Type sites I typically design:

[Diagram: Basic MT Network Architectures]

Anyways, that's the basic gist. I would love to learn how others have architected their Movable Type clusters. Let me know in the comments if you don't mind sharing.

1 - something one is likely to see a lot more of when Motion is released.


7 Comments

Byrne, this diagram makes perfect sense. As someone who manages a relatively high-volume set of MT blogs, I'm trying to move to a multi-box architecture.

The part that I'd love to learn more about is how you get the boxen talking to each other. What configuration changes are made in order to get the publishing servers to push their files to the right place? How do you handle pointing users to the comment servers when it's time to handle a comment (subdomains?)

I'd love some pointers in the right direction here -- and am willing to pitch in and document my progress, of course!

John, thanks for your inquiry. That is a common question and something I should probably add to the operations manual. In general there are two ways to configure Movable Type and the Publish Queue for large scale deployments. They are:

  • Linked together via NFS
  • Replication via RSync

I am not sure which solution is right for you, however, so let me try to explain how each works.

NFS

With the NFS solution, all of your publishing servers (or Publish Queue workers) write files to an external NFS mount. The files never physically reside on the publishing server; they only appear to be local thanks to NFS, which lets different servers share the same set of files.

The front end web servers then mount this shared NFS directory for reading. Simple enough, right?

Pros:

  • Scales better because each file is written once and immediately made visible on the front end web server.
  • Easier to set up IMHO.

Cons:

  • Single point of failure. If something were to go wrong with your shared filesystem, then much of your system will be hosed. This can be mitigated with a solid RAID config or other highly reliable disk storage.
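To make the shared-mount idea concrete, here is a minimal Python sketch. Movable Type itself is written in Perl, so this is an illustration only, not MT code. A temporary directory stands in for the NFS export, and `publish` and `serve` are hypothetical helper names. The write-then-rename step is an assumption about how one might keep a web server from reading a half-written page; nothing above says how MT handles that.

```python
import os
import tempfile
from pathlib import Path


def publish(shared_root: Path, rel_path: str, html: str) -> Path:
    """Publisher role: write a rendered page under the shared mount."""
    target = shared_root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place,
    # so a reader sees either the old page or the new one, never a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.replace(tmp_name, target)
    return target


def serve(shared_root: Path, rel_path: str) -> str:
    """Front end role: read the page from the very same mount."""
    return (shared_root / rel_path).read_text(encoding="utf-8")


# Assumption: this temp directory plays the part of the NFS export that the
# publishers mount for writing and the web servers mount for reading.
shared = Path(tempfile.mkdtemp())
publish(shared, "2009/01/post.html", "<h1>Hello</h1>")
page = serve(shared, "2009/01/post.html")
```

The point is simply that the file is written once and every machine that mounts the share sees it, with no copy step in between.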

RSync

When using rsync, Movable Type will invoke a command line utility designed for keeping two different file systems in sync with one another. This is what happens when Movable Type is configured to use RSync:

  1. User leaves a comment.
  2. Job is created in Publish Queue.
  3. Worker pulls job off queue and publishes file to local file system.
  4. Worker then rsyncs the file (usually over ssh) to each of the designated servers.
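Step 4 can be sketched roughly as follows. This is a hedged Python illustration, not Movable Type's actual worker code (which is Perl). The function names, host names and paths are made up for the example, and the `-a`, `-z` and `-e ssh` flags are just one common way to invoke rsync.

```python
import subprocess
from typing import Callable, List


def build_rsync_commands(local_root: str, targets: List[str]) -> List[List[str]]:
    """Return one rsync command per target, e.g. 'web1:/var/www/html/'."""
    # Trailing slash on the source means "copy the contents of this directory".
    src = local_root.rstrip("/") + "/"
    # -a keeps permissions and timestamps, -z compresses, -e ssh picks the transport.
    return [["rsync", "-az", "-e", "ssh", src, target] for target in targets]


def sync_all(local_root: str, targets: List[str],
             run: Callable = subprocess.run) -> List[str]:
    """Push to every target in turn and return the targets that failed."""
    failed = []
    for cmd in build_rsync_commands(local_root, targets):
        result = run(cmd)
        if result.returncode != 0:
            failed.append(cmd[-1])
    return failed


# Hypothetical front end servers; nothing is executed here, we only build commands.
commands = build_rsync_commands(
    "/var/www/html", ["web1:/var/www/html/", "web2:/var/www/html/"]
)
```

Calling `sync_all("/var/www/html", [...])` would actually run the commands, one server after another.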

Pros:

  • Failure tolerance - by replicating your published content you ensure that if one file system or server goes bad, you still have something to fall back on.

Cons:

  • Slightly harder to set up IMHO.
  • Scalability - the more front end web servers you have the more servers you will need to synchronize with. This can add latency to your publishing process and cause some servers for a brief period of time to have slightly different content from one another.
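Here is a back-of-the-envelope sketch of that latency cost in Python. It assumes the worker pushes to the front end servers one after another and that each push takes about the same time. Both are assumptions, and the numbers are invented.

```python
def total_sync_seconds(num_servers: int, seconds_per_server: float) -> float:
    """Time for one worker to push a file to every server, one at a time."""
    return num_servers * seconds_per_server


def staleness_window_seconds(num_servers: int, seconds_per_server: float) -> float:
    """How long the last server lags behind the first one to be updated."""
    return max(num_servers - 1, 0) * seconds_per_server


# Invented figure: half a second per rsync push.
for n in (2, 5, 10):
    print(n, total_sync_seconds(n, 0.5), staleness_window_seconds(n, 0.5))
```

Both figures grow linearly with the number of front end servers, which is the scaling concern in a nutshell.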

I must have a misunderstanding here, but I expect I am not alone. I was trying to figure out how Schwartz is being used for RSync as well as queued publishing.

Here's my [mis-]understanding of how RSync works in a scaled-out set up (using server type names from the "Advanced Configuration" http://www.movabletype.org/documentation/enterprise/system-architecture.html and your own diagram):

  1. Content may be immediately published on an "App Server" (called "Admin Web Server" in your diagram for content) or "Comment Server" (for new comments) or it can be queued to be published asynchronously by a "Publisher" (called "Publishing Machine(s)" in your diagram).
  2. Publish job requests are queued in Schwartz and fielded by a "Publisher", when run-periodic-tasks gives it a bite of the cherry.
  3. When a page has been published by an "App Server", "Comment Server" or "Publisher" another job request is put into Schwartz to RSync the file.
  4. A "Publisher" picks up the RSync job request from Schwartz. It expects to find the file on its local file system and sends it off to SyncTargets. The SyncTargets would be servers known as "Page Servers" (called "Front-end Web Server" in your diagram).

But I must have something wrong in my understanding. Step 4 would only work if the published content was local to the Publisher that picked up the RSync job request and that is not guaranteed.

In your reply to John Young above, it sounds like the RSync is part of the publish job itself, which would work fine because the file would be local. However, that doesn't appear to be the case when I skim through the code. It appears to be a separate job, which could therefore be fielded by a different server than the one that published the content locally to itself.

This is an excellent and very relevant observation. A farm of PQ workers almost certainly should share a file system via NFS. Doing so will eliminate the potential for this problem.

That being said, I believe one Schwartz job is capable of spawning an additional job immediately, which may be exactly what is happening here. So while there are two jobs, MT will ensure that the jobs are executed by the same worker. That certainly is in line with what I have seen: files are published and then immediately transferred.
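The concern, and why a shared file system sidesteps it, can be modeled with a toy simulation in Python. This is not how TheSchwartz or MT is implemented; the `Worker` class, the job tuples and the in-memory "disk" are all invented for illustration.

```python
from collections import deque


class Worker:
    def __init__(self, name, disk=None):
        self.name = name
        # A set of paths stands in for the file system this worker writes to.
        # Pass the same set to several workers to model a shared NFS mount.
        self.disk = set() if disk is None else disk

    def run(self, job, queue):
        kind, path = job
        if kind == "publish":
            self.disk.add(path)
            queue.append(("sync", path))  # the sync is a separate follow-up job
            return True
        if kind == "sync":
            # The sync can only succeed if this worker can see the file.
            return path in self.disk
        raise ValueError(f"unknown job type: {kind}")


def simulate(publisher, syncer, path="index.html"):
    """Run a publish job on one worker and the resulting sync job on another."""
    queue = deque([("publish", path)])
    publisher.run(queue.popleft(), queue)
    return syncer.run(queue.popleft(), queue)


w1 = Worker("pub1")
same_worker_ok = simulate(w1, w1)                                    # local disk, same worker
different_worker_ok = simulate(Worker("pub1"), Worker("pub2"))       # local disks, different workers
nfs = set()
shared_disk_ok = simulate(Worker("pub1", nfs), Worker("pub2", nfs))  # shared mount
```

With separate local disks the sync only works when the same worker handles both jobs; with a shared disk it no longer matters who picks up the second job.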

There remains the thorny issue of pages that are immediately published by an "App Server" or "Comment Server" when it is set up for static publishing. These will need to be distributed to the "Page Servers", which means that they too would need to be published to an NFS share, shared with the PQ "Publishers", for rsync to the "Page Servers" to work.

I guess NFS is mandatory for all of the servers except for the "Page Servers" as things stand.

Either that or a different approach should be invoked for the rsync in MT::Worker::Sync (i.e. some Perl mods required), where the file is first pulled from the server that published the page, when applicable. Is there data in the queue to indicate which server published the page so it could be fetched from the appropriate server?
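To illustrate the idea, here is a purely hypothetical Python sketch of a sync job that records where the file was published. It does not answer whether MT's queue actually carries such data: the `origin_host` field, the `plan_sync` function and the host names are all invented, and the real MT::Worker::Sync is Perl.

```python
from typing import Dict, List


def plan_sync(job: Dict[str, str], this_host: str, targets: List[str]) -> List[List[str]]:
    """Return the rsync commands a worker would run for one sync job."""
    path = job["path"]
    steps = []
    # If another machine published the page, pull it from that machine first.
    if job["origin_host"] != this_host:
        steps.append(["rsync", "-az", "-e", "ssh", f'{job["origin_host"]}:{path}', path])
    # Then push the (now local) file out to every page server.
    for target in targets:
        steps.append(["rsync", "-az", "-e", "ssh", path, f"{target}:{path}"])
    return steps


# Hypothetical job: published on app1, picked up by pub2.
job = {"path": "/var/www/html/index.html", "origin_host": "app1"}
steps = plan_sync(job, this_host="pub2", targets=["page1", "page2"])
```

When the worker is also the origin, the pull step is skipped and only the pushes remain.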

With my other, non-Movable Type web server, I separate the images from the main page server, i.e. images.domain.com and www.domain.com.

That reduces the hard drive / CPU load on the www.domain.com page server.

I'm wondering if there is a way to do that with Movable Type while still retaining the awesome MT Asset Management tool. I don't think you can have assets (images, videos, files) on domain A and then have web pages served from a different file server B. The MT app interface would have to somehow push images to server A and pages to server B. Don't think you can do that.

Correct me if I'm wrong.

Meant to say this: I don't think you can have assets (images, videos, files) on domain A and then have web pages served from a different domain B. The MT app interface would have to somehow push images to domain A and pages to domain B (each on a different server).

AFAIK the MT app must push pages and images to the SAME domain.