Movable Type is renowned in operations circles for a number of its "abilities," mainly its scalability, availability and reliability. However, the larger your site gets, and the more readers you have creating content for you in the form of comments and posts1, the more complex your network architecture may need to become. For many, architecting a scalable, high-performance Movable Type system can be a daunting task, largely because the process is undocumented.
The truth is that there is no one canonical way to design your Movable Type system, or any system for that matter, which is most likely one of the primary reasons for the lack of documentation. So let's approach this challenge another way. Let's start with a basic footprint designed for large sites powered by Movable Type, and then let architects add or remove pieces according to their unique operational requirements and cost constraints.
Below is just such a network, one that serves as the basis for any large-scale Movable Type site I typically design:
Front-end Web Servers - these servers serve static files only. If all else fails in your network, these will continue to serve content, not to mention your ads, which are the lifeblood of any large media site.
NFS Server - invest in a single, very reliable network storage device, one that your web servers read from and your publishing daemons write to.
Database Server - have one or two dedicated database servers, depending on whether you want a second one available for redundancy. These should be beefy machines with a lot of CPU power and a lot of memory. Fine-tuning your database is a whole other chapter, and for that I highly recommend consulting an expert. In my experience, however, text doesn't take up a lot of memory, and with a large enough cache configured for your database, you can load practically your entire database into memory, resulting in dramatic speed improvements.
Comment Servers - these web servers handle all the write requests from your community and readers, including favoriting, commenting, and the like. They can be split off from your front-end web servers so that they can be scaled independently. The diagram doesn't show this, but you may consider connecting these to the NFS server as well and having them republish your permalink pages synchronously with each comment received. This ensures that when readers return to the entry they commented on, they will see the comment they just left.
Admin Web Servers - these servers are what your editors access. They are given dedicated machines so that if your site is under high load, you can rest assured that authors and editors can still log in and be productive administering the site.
Publishing Machines - these servers are workhorses. They handle most of the publishing and virtually all of the non-critical, non-blocking processes on your system, like Action Streams aggregation. One simple way to approach this little cluster is with a lot of small, cheap machines, or virtual machines that you can easily spin up when your site is under serious load.
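In practice this split is usually enforced at the load balancer or reverse proxy. The Python function below is a minimal sketch of that routing decision only, assuming the admin interface is served by mt.cgi and comments are posted to mt-comments.cgi (installs can rename these scripts). The pool names and hostnames are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical pools, one per role in the architecture above.
POOLS = {
    "frontend": ["web1.internal", "web2.internal"],
    "comments": ["comments1.internal"],
    "admin": ["admin1.internal"],
}


def choose_pool(method: str, url: str) -> str:
    """Return the name of the pool that should handle this request."""
    script = urlsplit(url).path.rsplit("/", 1)[-1]
    if script == "mt.cgi":
        # Editors and authors get dedicated boxes so they can work under load.
        return "admin"
    if script == "mt-comments.cgi" or method.upper() == "POST":
        # Reader writes (comments, favorites and the like) go to the comment tier.
        return "comments"
    # Everything else is a read of a static, already-published file.
    return "frontend"


def hosts_for(method: str, url: str) -> list[str]:
    """Return the candidate hosts for a request."""
    return POOLS[choose_pool(method, url)]
```

A real deployment would more likely express the same rules in the proxy's own configuration, but the decision it has to make is the same: static reads, reader writes, and editor traffic each go to their own tier.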
Anyway, that's the basic gist. I would love to learn how others have architected their Movable Type clusters. Let me know in the comments if you don't mind sharing.
1 - something one is likely to see a lot more of when Motion is released.







Byrne, this diagram makes perfect sense. As someone who manages a relatively high-volume set of MT blogs, I'm trying to move to a multi-box architecture.
The part that I'd love to learn more about is how you get the boxen talking to each other. What configuration changes are made in order to get the publishing servers to push their files to the right place? How do you handle pointing users to the comment servers when it's time to handle a comment (subdomains?)
I'd love some pointers in the right direction here -- and am willing to pitch in and document my progress, of course!
John, thanks for your inquiry. That is a common question and something I should probably add to the operations manual. In general, there are two ways to configure Movable Type and the Publish Queue for large-scale deployments: NFS and rsync.
I am not sure which solution is right for you, however, so let me try to explain how each works.
NFS
When using the NFS solution, all of your publishing servers (or Publish Queue workers) write files to an external NFS mount. In so doing, these files never physically reside on the publishing server; they only appear to be local thanks to NFS, which lets different servers share the same set of files.
The front-end web servers then mount this shared NFS directory for reading. Simple enough, right?
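Because the front-end servers read from the same share the publishers write to, it helps if a page never appears half-written. A common way to get that is to write to a temporary file in the same directory and rename it into place. The sketch below assumes a Python publisher and a hypothetical docroot path; rename is atomic on the NFS server in most setups, though client-side caching can delay when readers see the new file.

```python
import os
import tempfile


def publish_file(docroot: str, rel_path: str, content: str) -> str:
    """Write one published page under a shared (e.g. NFS-mounted) docroot."""
    dest = os.path.join(docroot, rel_path.lstrip("/"))
    dest_dir = os.path.dirname(dest)
    os.makedirs(dest_dir, exist_ok=True)
    # Write to a temporary file in the same directory first...
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".publish-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())
        # mkstemp creates 0600 files; make them readable by the web servers.
        os.chmod(tmp, 0o644)
        # ...then rename it over the old page in a single step.
        os.replace(tmp, dest)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return dest
```

The same pattern applies if your comment servers republish permalink pages to the share.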
Pros:
Cons:
RSync
When using rsync, Movable Type invokes a command-line utility designed to keep two different file systems in sync with one another. This is what happens when Movable Type is configured to use RSync:
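The rsync step boils down to one command per front-end server for each batch of freshly published files. The Python below only sketches that invocation; it is not Movable Type's own (Perl) sync code, and the target hosts and paths are hypothetical, assuming the publishing box can reach each front end over SSH.

```python
import shlex
import subprocess
from collections.abc import Sequence

# Hypothetical front-end destinations in rsync's host:/path form.
TARGETS = ("web1.internal:/var/www/html/", "web2.internal:/var/www/html/")


def rsync_commands(docroot: str, rel_paths: Sequence[str],
                   targets: Sequence[str]) -> list[list[str]]:
    """Build one rsync invocation per target for the given published files."""
    # The "/./" marker tells --relative where to start preserving the path,
    # so 2009/01/entry.html lands at <target>/2009/01/entry.html.
    sources = [f"{docroot.rstrip('/')}/./{p.lstrip('/')}" for p in rel_paths]
    return [["rsync", "-a", "--relative", *sources, t] for t in targets]


def sync(docroot: str, rel_paths: Sequence[str],
         targets: Sequence[str] = TARGETS, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, failing loudly on error."""
    for cmd in rsync_commands(docroot, rel_paths, targets):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Note that this only works if the files exist locally on whichever machine runs it, which is exactly the concern raised in the comments.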
Pros:
Cons:
I must have a misunderstanding here, but I expect I am not alone. I was trying to figure out how Schwartz is being used for RSync as well as queued publishing.
Here's my [mis-]understanding of how RSync works in a scaled-out set up (using server type names from the "Advanced Configuration" http://www.movabletype.org/documentation/enterprise/system-architecture.html and your own diagram):
But I must have something wrong in my understanding. Step 4 would only work if the published content were local to the Publisher that picked up the RSync job request, and that is not guaranteed.
In your reply to John Young above, it sounds like the RSync is part of the publish job itself, which would work fine because the file would be local. However, that doesn't appear to be the case when I skim through the code. It appears to be a separate job, which could therefore be fielded by a different server than the one that published the content locally.
This is an excellent and very relevant observation. A farm of PQ workers almost certainly should share a file system via NFS. Doing so will eliminate the potential for this problem.
That being said, I believe one Schwartz job is capable of spawning an additional job immediately, which may be exactly what is happening here. So while there are two jobs, MT will ensure that they are executed by the same worker. That certainly is in line with what I have seen: files are published and then immediately transferred.
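The failure mode raised above, and why a shared file system removes it, can be shown with a toy simulation. This does not model how TheSchwartz actually assigns jobs; it simply assumes that any of three hypothetical workers may pick up the publish job and, independently, the sync job.

```python
import random


class Worker:
    def __init__(self, name: str, storage: set[str]):
        self.name = name
        self.storage = storage  # the files this worker can see

    def publish(self, path: str) -> None:
        self.storage.add(path)

    def sync(self, path: str) -> bool:
        """True if the file is visible here, so rsync has something to send."""
        return path in self.storage


def run(shared: bool, jobs: int = 1000, seed: int = 1) -> int:
    """Publish then sync each page; return how many syncs found no file."""
    rng = random.Random(seed)
    common: set[str] = set()
    # With shared=True every worker sees one file set (the NFS case);
    # otherwise each worker only sees what it wrote to its own disk.
    workers = [Worker(f"pq{i}", common if shared else set()) for i in range(3)]
    misses = 0
    for n in range(jobs):
        path = f"entry-{n}.html"
        rng.choice(workers).publish(path)       # publish job
        if not rng.choice(workers).sync(path):  # separate sync job, any worker
            misses += 1
    return misses
```

With local disks roughly two thirds of syncs land on a worker that never saw the file; with a shared file set none do. Running both jobs on the same worker, as described above, would also avoid the misses.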
There remains the thorny issue of pages that are immediately published by the "App Server" or "Comment Server" when it is set up for static publishing. These will need to be distributed to the "Page Servers", which means that they too would need to be published to an NFS volume shared with the PQ "Publishers" for rsync to the "Page Servers" to work.
I guess NFS is mandatory for all of the servers except for the "Page Servers" as things stand.
Either that, or a different approach should be taken for the rsync in MT::Worker::Sync (i.e. some Perl modifications required), where the file is first pulled from the server that published the page, when applicable. Is there data in the queue to indicate which server published the page so it could be fetched from the appropriate server?
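One way the pull-then-push idea could work is sketched below, assuming a queue record that carries the hostname of the server that wrote the file. Whether Movable Type's queue stores anything like that is exactly the open question here, so `SyncJob`, `origin_host` and all hostnames are hypothetical; the function only builds the rsync commands.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncJob:
    path: str         # published file, relative to the docroot
    origin_host: str  # HYPOTHETICAL field: the server that wrote the file


def pull_then_push(job: SyncJob, docroot: str, targets: Sequence[str],
                   local_host: str) -> list[list[str]]:
    """Build rsync commands: fetch from the origin if needed, then push."""
    root = docroot.rstrip("/")
    marked = f"{root}/./{job.path.lstrip('/')}"  # "/./" anchors --relative
    commands: list[list[str]] = []
    if job.origin_host != local_host:
        # Pull the file from the server that published it into our docroot.
        commands.append(["rsync", "-a", "--relative",
                         f"{job.origin_host}:{marked}", f"{root}/"])
    # Push it to every page server, preserving the path below the docroot.
    commands += [["rsync", "-a", "--relative", marked, t] for t in targets]
    return commands
```

The pull is skipped when the worker happens to be the origin, so the common case costs nothing extra.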
With my other non-Movable Type web server, I segment the images from the main page server, i.e. images.domain.com and www.domain.com.
That reduces the hard drive / CPU load on the www.domain.com page server.
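The split described here comes down to a rule that decides, for each file, which host serves it. Below is a minimal sketch, assuming the decision is made by file extension (the extension list is an assumption). It says nothing about whether Movable Type's asset manager can be configured this way, which is the question raised next.

```python
import posixpath

# Assumed set of extensions treated as assets rather than pages.
ASSET_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp4", ".pdf"}

ASSET_HOST = "images.domain.com"
PAGE_HOST = "www.domain.com"


def host_for(rel_path: str) -> str:
    """Pick the host that serves a published file, based on its extension."""
    ext = posixpath.splitext(rel_path)[1].lower()
    return ASSET_HOST if ext in ASSET_EXTENSIONS else PAGE_HOST


def public_url(rel_path: str) -> str:
    """Build the URL a page would use to reference the file."""
    return f"http://{host_for(rel_path)}/{rel_path.lstrip('/')}"
```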
I'm wondering if there is a way to do that with Movable Type while still retaining the awesome MT Asset Management tool. I don't think you can have assets (images, videos, files) on domain A and then have web pages served from a different file server B. The MT app interface would have to somehow push images to server A and pages to server B. Don't think you can do that.
Correct me if I'm wrong.
Meant to say this: I don't think you can have assets (images, videos, files) on domain A and then have web pages served from a different domain B. The MT app interface would have to somehow push images to domain A and pages to domain B (each on a different server).
AFAIK the MT app must push pages and images to the SAME domain.