I recently added the following chapter to Movable Type's Open Source Operations Manual and wanted to publish it here for community review and feedback.
About the Publish Queue
The Movable Type Publish Queue is an essential component of any large-scale, Movable Type-powered web site because it plays a crucial role in optimizing publishing performance. Using the publish queue has several benefits:
- It eliminates redundant and unnecessary publishing of files.
- It offloads publishing to a stand-alone process that can be throttled and scaled independently of the Movable Type web application itself.
- It speeds up the commenting experience by reducing the number of files a commenter must wait on to be published before they can continue navigating the web site.
How it Works
It might be best to explain how the publish queue works by walking through a scenario that uses it: republishing the necessary files in response to a comment.
Adding Jobs to the Queue
When Movable Type receives a comment, multiple files often need to be updated: not only must the comment be published to the entry's permalink page, but other pages that display the entry's comment count may also need to be republished.
Each of those pages (assuming it is configured to be published via the publish queue) is then added to the "publish queue." When this happens, a publishing "job" is created and added to the database for each page that needs to be published, with one row in the database for each individual job in the system.
Now suppose that shortly after the first comment arrives, a different visitor submits a second one. This also results in pages needing to be republished. This time, however, before those pages are added to the queue as jobs, the system checks whether a job for each page is already on the queue. If one is, the new job is discarded, because its work would otherwise be duplicated. If not, the job is added. This ensures that the system performs no unnecessary work.
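To make the duplicate check concrete, here is a minimal in-memory sketch. It is not Movable Type's code: the `PublishQueue` class, its `enqueue` method and the page paths are hypothetical, and a dictionary keyed by output path stands in for the job table, so "is a job for this page already queued?" becomes a key lookup.

```python
from dataclasses import dataclass, field


@dataclass
class PublishQueue:
    """Hypothetical in-memory stand-in for the job table: one entry per page."""

    # Maps a page's output path to the priority of its waiting job.
    jobs: dict = field(default_factory=dict)

    def enqueue(self, file_path: str, priority: int) -> bool:
        """Queue a publishing job unless one for the same page is already waiting.

        Returns True if a job was added, False if it was discarded as a duplicate.
        """
        if file_path in self.jobs:
            return False  # the waiting job will publish this page anyway
        self.jobs[file_path] = priority
        return True


queue = PublishQueue()
# Pages affected by a comment on one entry (paths are made up for illustration).
pages = [("/2008/05/my-entry.html", 10), ("/index.html", 9)]

first_comment = [queue.enqueue(path, prio) for path, prio in pages]
second_comment = [queue.enqueue(path, prio) for path, prio in pages]
print(first_comment, second_comment, len(queue.jobs))
# [True, True] [False, False] 2
```

The second comment touches the same two pages, so both of its jobs are discarded and only two jobs remain queued.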
In addition, each page that is added to the publish queue is given a priority which dictates the order in which the corresponding job will be processed. The higher the priority, the sooner the system will work on the job. Movable Type assigns priority based upon the following criteria:
| Page type | Priority |
|---|---|
| Preferred Page and Entry archives | 10 |
| Index templates with a filename beginning with "index" or "default" | 9 |
| Feed index templates | 9 |
| All other index templates | 8 |
| Non-preferred Page and Entry archives | 5 |
| Any Category archive | 1 |
| Any Author archive | 1 |
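The priority rules can be expressed as a small lookup function. This is a sketch only: `publish_priority` and its `kind` labels ("entry", "page", "index", "category", "author") are hypothetical names rather than Movable Type's internal archive-type identifiers, and because the table lists no rule for other archive types (such as date-based archives), the sketch raises an error for them instead of guessing.

```python
import os


def publish_priority(kind: str, preferred: bool = False,
                     is_feed: bool = False, outfile: str = "") -> int:
    """Return a queue priority following the table above (higher runs sooner).

    `kind` is a hypothetical label: "entry", "page", "index", "category" or "author".
    """
    if kind in ("entry", "page"):
        # Preferred archives outrank non-preferred ones.
        return 10 if preferred else 5
    if kind == "index":
        name = os.path.basename(outfile)
        if is_feed or name.startswith(("index", "default")):
            return 9
        return 8
    if kind in ("category", "author"):
        return 1
    # The table lists no rule for other archive types (e.g. date-based ones).
    raise ValueError(f"no priority rule for archive type {kind!r}")


jobs = [
    ("/category/news/index.html", publish_priority("category")),
    ("/index.html", publish_priority("index", outfile="index.html")),
    ("/atom.xml", publish_priority("index", is_feed=True, outfile="atom.xml")),
    ("/2008/05/my-entry.html", publish_priority("entry", preferred=True)),
]
# Highest priority first; sorted() is stable, so ties keep their queue order.
for path, prio in sorted(jobs, key=lambda job: job[1], reverse=True):
    print(prio, path)
```

Sorting by priority puts the entry's permalink page first and the category archive last, matching the intent that the pages a visitor is most likely to look at are refreshed soonest.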
That is how jobs are added to the queue. A separate process is then responsible for the actual publishing.
Creating Publish Queue Workers
One or more publish queue "workers" can be created to process jobs on the queue. The number of workers a system needs depends largely on two variables:
- The capacity of any one worker to process jobs on the queue.
- The volume of jobs being added to the queue over time.
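Those two variables combine into a simple back-of-the-envelope estimate: if jobs arrive faster than all workers together can finish them, the backlog grows, so the worker count is the arrival rate divided by one worker's throughput, rounded up. The sketch below assumes steady, evenly spread load (real comment traffic is bursty), and the function name and the numbers in the example are hypothetical.

```python
import math


def workers_needed(jobs_added_per_minute: float,
                   jobs_per_worker_per_minute: float) -> int:
    """Rough steady-state estimate of how many workers keep the queue from growing.

    If jobs arrive faster than the workers together can finish them,
    the backlog grows without bound, so we round up.
    """
    if jobs_per_worker_per_minute <= 0:
        raise ValueError("worker throughput must be positive")
    return max(1, math.ceil(jobs_added_per_minute / jobs_per_worker_per_minute))


# Hypothetical numbers: 45 jobs queued per minute, each worker finishes 20 per minute.
print(workers_needed(45, 20))  # 3
```

Measuring a single worker's throughput on your own templates and hardware is what makes this estimate meaningful; leave headroom above the result for traffic spikes.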
A worker is created by running the "run-periodic-tasks" script that comes with every copy of Movable Type. This script can be run in three modes:
- daemon mode: the script never quits. Instead, it continuously monitors the job queue and begins work on a job almost the instant one becomes available.
- run-once: the script is run from the command line and quits only when there is no more work left on the queue.
- scheduled task: the script is executed in "run-once" mode periodically, on a schedule defined by cron or a similar service.
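The difference between the modes is mostly about when the worker loop stops. The sketch below models that with a plain in-memory queue; `run_once`, `run_daemon` and the `should_stop` hook are hypothetical names for illustration, not options of the actual run-periodic-tasks script.

```python
import time
from collections import deque
from typing import Callable


def run_once(queue: deque, publish: Callable[[str], None]) -> int:
    """'run-once': work until the queue is empty, then return the job count."""
    processed = 0
    while queue:
        publish(queue.popleft())
        processed += 1
    return processed


def run_daemon(queue: deque, publish: Callable[[str], None],
               poll_seconds: float = 1.0,
               should_stop: Callable[[], bool] = lambda: False) -> None:
    """'daemon': never quit on its own; drain the queue, sleep briefly, look again.

    `should_stop` exists only so the loop can be ended in a test.
    """
    while not should_stop():
        run_once(queue, publish)
        time.sleep(poll_seconds)

# A 'scheduled task' is simply run_once() invoked periodically by cron or similar.

published = []
pending = deque(["/index.html", "/2008/05/my-entry.html"])
print(run_once(pending, published.append))  # 2
```

A daemon reacts fastest because it is always polling, while a scheduled run-once worker trades some latency (up to one scheduling interval) for not keeping a process alive between runs.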
Processing Jobs on the Queue
Each worker monitors the queue for jobs. When one becomes available, the worker pulls it off the queue to work on it. Once a job is "off the queue," no other worker can claim it, which ensures that no two workers are ever working on the same job at the same time.
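One common way to guarantee that only one worker claims a job is a conditional UPDATE that succeeds only if the row is still in the state the worker last saw. The sketch below shows that technique with SQLite; it is not the actual Movable Type or The Schwartz code, and the `job` table, its columns and the function names are hypothetical. A claim is a time-limited lease (`claimed_until`), which is also what lets an abandoned job become claimable again later.

```python
import sqlite3
from typing import Optional


def create_job_table(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified schema; column names are illustrative only.
    conn.execute(
        "CREATE TABLE job ("
        " jobid INTEGER PRIMARY KEY,"
        " page TEXT NOT NULL,"
        " priority INTEGER NOT NULL,"
        " claimed_until INTEGER NOT NULL DEFAULT 0)"
    )


def claim_next_job(conn: sqlite3.Connection, now: int,
                   lease_seconds: int = 60) -> Optional[int]:
    """Claim the highest-priority job no worker currently holds, or return None.

    The UPDATE only succeeds if claimed_until still holds the value we read,
    so if two workers race for the same row, exactly one of them wins.
    """
    while True:
        row = conn.execute(
            "SELECT jobid, claimed_until FROM job WHERE claimed_until <= ?"
            " ORDER BY priority DESC, jobid LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        jobid, seen = row
        cursor = conn.execute(
            "UPDATE job SET claimed_until = ? WHERE jobid = ? AND claimed_until = ?",
            (now + lease_seconds, jobid, seen),
        )
        conn.commit()
        if cursor.rowcount == 1:
            return jobid  # the job is now ours until the lease expires
        # Another worker claimed it between our SELECT and UPDATE; look again.


conn = sqlite3.connect(":memory:")
create_job_table(conn)
conn.executemany(
    "INSERT INTO job (page, priority) VALUES (?, ?)",
    [("/index.html", 9), ("/2008/05/my-entry.html", 10)],
)
conn.commit()
print([claim_next_job(conn, now=1000) for _ in range(3)])  # [2, 1, None]
```

The third call finds nothing because both jobs are claimed; once their 60-second leases lapse, they become claimable again.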
If something goes wrong during publishing and the file is not published, the system notices, saying something akin to, "uh-oh, look at this job that was claimed from the queue but never successfully finished," and then frees up the job so a worker can pick it up and try again. If the job is retried more than 5 times, it is marked as failed and left on the queue. In this state a similar job can be placed on the queue, and if the problem causing the publishing failure is not transient, that job is likely to fail again.
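The retry limit can be sketched as a counter on each job. This sketch reads the limit as "at most five retries after the first attempt"; Movable Type's exact counting may differ, and the `Job` class, `attempt` function and `broken_publish` stand-in are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

MAX_RETRIES = 5  # the chapter's limit; everything else here is hypothetical


@dataclass
class Job:
    page: str
    retries: int = 0
    failed: bool = False


def attempt(job: Job, publish: Callable[[str], None]) -> bool:
    """Run one publishing attempt; return True on success."""
    if job.failed:
        return False  # failed jobs stay on the queue but are not retried
    try:
        publish(job.page)
        return True
    except Exception:  # any publishing error counts as a failed attempt
        if job.retries >= MAX_RETRIES:
            job.failed = True  # give up after the first attempt plus 5 retries
        else:
            job.retries += 1  # free the job so a worker can try again
        return False


def broken_publish(page: str) -> None:
    raise OSError(f"cannot write {page}")  # a non-transient failure


job = Job("/index.html")
attempts = 0
while not job.failed:
    attempt(job, broken_publish)
    attempts += 1
print(attempts, job.retries)  # 6 5
```

Because the failure here is not transient, every retry fails the same way, which is why a fresh job for the same page is likely to fail too until the underlying problem is fixed.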
Note that once a worker has pulled a job off the queue, the same page can be added to the queue again in response to another comment. The rationale is that by the time the page finishes being rebuilt it is most likely out of date, and so needs to be published again.
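In other words, the duplicate check only looks at jobs still waiting on the queue, not at jobs a worker has already claimed. A minimal sketch, with a hypothetical `PageQueue` class that tracks the two states as separate sets:

```python
from dataclasses import dataclass, field


@dataclass
class PageQueue:
    """Hypothetical queue that tracks waiting and claimed pages separately."""

    waiting: set = field(default_factory=set)
    in_progress: set = field(default_factory=set)

    def enqueue(self, page: str) -> bool:
        # Only a *waiting* job makes a new one redundant. A claimed job may
        # already be rendering a copy that predates the latest comment.
        if page in self.waiting:
            return False
        self.waiting.add(page)
        return True

    def claim(self, page: str) -> None:
        self.waiting.remove(page)
        self.in_progress.add(page)

    def finish(self, page: str) -> None:
        self.in_progress.discard(page)


q = PageQueue()
q.enqueue("/2008/05/my-entry.html")           # first comment
q.claim("/2008/05/my-entry.html")             # a worker starts rebuilding
print(q.enqueue("/2008/05/my-entry.html"))    # True: second comment re-queues it
print(q.enqueue("/2008/05/my-entry.html"))    # False: third comment is a duplicate
```

At most one copy of a page is ever waiting, but a page can be both in progress and waiting, which is exactly the situation the paragraph above describes.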
What Powers It?
The Publish Queue is powered by a stand-alone job/queue management library called "The Schwartz." The Schwartz is a more generic and abstract job management system, capable of processing any number of tasks via a similar queuing mechanism. For the time being, Movable Type uses The Schwartz only for publishing, but in the future it may use this framework for sending email or other non-critical system tasks.
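What makes such a system generic is that each job records what kind of work it needs, and a worker only runs the kinds it knows how to handle. The sketch below illustrates that idea in miniature; it is not The Schwartz's API, and `JobManager`, its methods, and the "send_email" job type are hypothetical.

```python
from typing import Callable


class JobManager:
    """Hypothetical generic job manager: every job names the kind of work it needs."""

    def __init__(self) -> None:
        self.handlers: dict = {}   # job type -> function that performs it
        self.queue: list = []      # (job type, argument) pairs

    def register(self, job_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[job_type] = handler

    def insert(self, job_type: str, arg: dict) -> None:
        self.queue.append((job_type, arg))

    def work_once(self) -> int:
        """Run every queued job this worker can handle; leave the rest queued."""
        remaining, done = [], 0
        for job_type, arg in self.queue:
            handler = self.handlers.get(job_type)
            if handler is None:
                remaining.append((job_type, arg))
            else:
                handler(arg)
                done += 1
        self.queue = remaining
        return done


published = []
manager = JobManager()
manager.register("publish", lambda arg: published.append(arg["page"]))
manager.insert("publish", {"page": "/index.html"})
manager.insert("send_email", {"to": "author@example.com"})  # no handler yet
print(manager.work_once(), published, len(manager.queue))  # 1 ['/index.html'] 1
```

Adding a new kind of background task then means registering another handler, without changing the queue itself.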
Publish Queue Tools
One tool in particular, aptly named the Publish Queue Manager, is recommended for most systems that use the Publish Queue.
This tool provides a user interface within Movable Type that allows administrators to monitor and inspect jobs on the queue. Each job can be deleted or have its priority changed.
For more information, visit the plugin's web site at the following URL:
To learn more about the Publish Queue, consider reading the following resources:
- Using the Publish Queue
- Setting up run-periodic-tasks
- Scalable Publishing Models in Movable Type
- The Schwartz Homepage
By the way, a new version of the Movable Type Operations Manual is now available.