Clean Sweep Plugin for Movable Type

Clean Sweep is a plugin that assists administrators in finding and fixing broken inbound links to their website. It was built to support two use cases:

  • to help users get a clean start with their blog by letting them completely restructure their permalink URLs, with a system that adapts automatically by redirecting stale inbound links to the proper destination

  • to help users in the process of migrating to Movable Type who are forced to modify their web site's URL and permalink structure

Both of these use cases have to do with preserving a site's page rank in light of a major redesign.

Features and Benefits

  • Manage redirects in your blog through an easy-to-use interface.
  • Help maintain good SEO and page rank by keeping links fresh.
  • No need to hack Apache configuration files until you are sure your redirects are correct.

Screenshots

Dashboard Widget

Clean Sweep Dashboard Widget

Create a Redirect

Clean Sweep Map 404 to Destination

View a List of All 404s

Clean Sweep 404 Listing

View a List of Recommended Rewrite Rules

apache-rewrite.png

Download

Installation Instructions

  1. Unpack the Clean Sweep archive.
  2. Copy the contents of CleanSweep-1.1/plugins to: /path/to/mt/plugins/ and copy the contents of CleanSweep-1.1/mt-static to: /path/to/your/mt-static/.
  3. Create a page in Movable Type called "URL Not Found". Give it a basename of "404". In the page body, write whatever personalized message you want your visitors to see when Clean Sweep cannot map a request to the correct page or destination.
  4. Publish the page and remember the complete URL to this page on your published blog.
  5. Navigate to the Plugin Settings area for Clean Sweep.
  6. Enter the full URL of the "URL Not Found" page you created in step 3 into the "404 URL" configuration parameter for Clean Sweep.
  7. In your plugin settings area for Clean Sweep, make note of the Apache configuration directive that Clean Sweep asks that you place in your httpd.conf or in an .htaccess file.
  8. Add the Apache configuration directive to your web server. This may be placed in your httpd.conf file or in an .htaccess file located in the DocumentRoot for your blog.
  9. Restart Apache.

How it Works

Once properly configured, Clean Sweep will track all inbound links that result in a 404. Administrators can monitor the list of 404s on their web site through a dedicated listing screen in the application found under the Manage > 404s menu, or through a convenient dashboard widget.
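To make the tracking idea concrete, here is a minimal sketch in Python of the kind of bookkeeping involved: one record per requested URI, holding a hit count and the time the URI was last requested. This is an illustration only. The names NotFoundLog, record, and most_frequent are hypothetical, and how the plugin actually stores its log is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class NotFoundRecord:
    """Hypothetical record for one URI that resulted in a 404."""
    count: int = 0
    last_seen: Optional[datetime] = None


class NotFoundLog:
    """Hypothetical in-memory tally of requests that resulted in a 404."""

    def __init__(self) -> None:
        # A new, empty record is created the first time a URI is seen.
        self._records = defaultdict(NotFoundRecord)

    def record(self, uri: str, when: datetime) -> None:
        """Register one more 404 for the given URI."""
        rec = self._records[uri]
        rec.count += 1
        rec.last_seen = when

    def most_frequent(self) -> List[Tuple[str, NotFoundRecord]]:
        """Return (uri, record) pairs, highest hit count first."""
        return sorted(
            self._records.items(),
            key=lambda item: item[1].count,
            reverse=True,
        )
```

A listing screen or dashboard widget could then simply render the result of most_frequent(), showing the URIs that are worth mapping to a destination first.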

If Clean Sweep can determine which file to serve in place of a request that resulted in a 404, it serves that file. If all else fails, the plugin serves a custom 404 page that you design.

Clean Sweep will also provide you with a list of Apache mod_rewrite rules that you can add to your web server's configuration to permanently redirect users to the proper resource, thereby bypassing the Clean Sweep plugin from that point forward for that specific set of links.
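As a rough illustration of what such rules can look like, the following Python sketch turns an old URL and its new destination into a mod_rewrite rule that issues a permanent (301) redirect. The exact rules Clean Sweep recommends are not shown on this page, so treat the output format as an assumption. The function names rewrite_rule and rewrite_block are hypothetical, and the sketch assumes the rules go in an .htaccess file at the site root and that the paths contain no spaces or query strings.

```python
import re
from typing import Dict
from urllib.parse import urlsplit

# Characters that have special meaning in a regular expression.
_REGEX_SPECIALS = re.compile(r"([.^$*+?()\[\]{}|\\])")


def rewrite_rule(old_url: str, new_url: str) -> str:
    """Return one RewriteRule line redirecting old_url to new_url with a 301."""
    # Assumption: in an .htaccess file at the site root, mod_rewrite matches
    # the request path without its leading slash.
    old_path = urlsplit(old_url).path.lstrip("/")
    new_path = urlsplit(new_url).path
    # Escape regex metacharacters so the old path is matched literally.
    pattern = _REGEX_SPECIALS.sub(r"\\\1", old_path)
    return f"RewriteRule ^{pattern}$ {new_path} [R=301,L]"


def rewrite_block(mappings: Dict[str, str]) -> str:
    """Return a block of rules, one per old-to-new mapping."""
    lines = ["RewriteEngine On"]
    lines += [rewrite_rule(old, new) for old, new in mappings.items()]
    return "\n".join(lines)
```

For example, mapping http://www.majordojo.com/archives/000675.php to http://www.majordojo.com/2005/07/goodbye-bookque.php yields the line RewriteRule ^archives/000675\.php$ /2005/07/goodbye-bookque.php [R=301,L]. Once a rule like this is in place, Apache answers the request itself and the plugin is never invoked for that URL.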

The Redirection Decision Making Process

The following is how Clean Sweep determines what files to serve in place of a requested file that could not be found on the file system:

  1. Check to see if a redirect has been set up by a user for the specific file being requested. If one exists, redirect the client to that file.

  2. Is the target resource using the entry ID as a URL? This is a prevalent URL pattern for older MT installations. This will:

    Map: http://www.majordojo.com/archives/000675.php To: http://www.majordojo.com/2005/07/goodbye-bookque.php

  3. Is the target resource using underscores when it should be using hyphens? Many users have switched to using hyphens for purported SEO benefits. This rule looks for a file in the system with the same name, but using '-' instead of '_'. This will:

    Map: http://www.majordojo.com/2005/07/goodbye_bookque.php To: http://www.majordojo.com/2005/07/goodbye-bookque.php

  4. Is there a target resource with the same basename somewhere? If a user switches their primary mapping to use a date-based URL as opposed to a category-based URL, then this rule will apply. This will:

    Map: http://www.majordojo.com/personal-projects/goodbye-bookque.php To: http://www.majordojo.com/2005/07/goodbye-bookque.php

  5. Is there another pattern that should be handled? Let me know and I will add it!

  6. If all else fails, serve up the user's configured custom 404 URL.
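The steps above can be sketched as a simple chain of fallbacks. The Python below is a minimal illustration under stated assumptions, not the plugin's actual code (Clean Sweep itself is a Perl plugin). The lookup tables MANUAL_REDIRECTS, ENTRY_URLS_BY_ID, and PUBLISHED_PATHS are hypothetical stand-ins for data that would come from Movable Type, the /archives/ prefix for entry-ID URLs is assumed from the example in rule 2, and rule 5 is left out because it is an open invitation rather than a rule.

```python
import re

# Hypothetical stand-ins for data the plugin would look up in Movable Type.
MANUAL_REDIRECTS = {"/old-about.php": "/about.html"}        # rule 1
ENTRY_URLS_BY_ID = {675: "/2005/07/goodbye-bookque.php"}    # rule 2
PUBLISHED_PATHS = {"/2005/07/goodbye-bookque.php", "/about.html"}
CUSTOM_404_URL = "/404.html"                                # rule 6


def resolve(path: str) -> str:
    """Return the path to serve for a request that resulted in a 404."""
    # 1. A redirect set up by a user always wins.
    if path in MANUAL_REDIRECTS:
        return MANUAL_REDIRECTS[path]

    # 2. Old-style URLs that use the zero-padded entry ID as the file name.
    match = re.fullmatch(r"/archives/(\d+)\.php", path)
    if match and int(match.group(1)) in ENTRY_URLS_BY_ID:
        return ENTRY_URLS_BY_ID[int(match.group(1))]

    # 3. Same path, but with hyphens where the request used underscores.
    hyphenated = path.replace("_", "-")
    if hyphenated != path and hyphenated in PUBLISHED_PATHS:
        return hyphenated

    # 4. A file with the same basename published elsewhere on the site.
    basename = path.rsplit("/", 1)[-1]
    if basename:
        for candidate in sorted(PUBLISHED_PATHS):
            if candidate.rsplit("/", 1)[-1] == basename:
                return candidate

    # 6. Nothing matched: fall back to the custom 404 page.
    return CUSTOM_404_URL
```

With these tables, resolve("/archives/000675.php"), resolve("/2005/07/goodbye_bookque.php"), and resolve("/personal-projects/goodbye-bookque.php") all return /2005/07/goodbye-bookque.php, matching the three mappings listed above. The order matters: an explicit redirect takes priority over any guess, and the cheaper, more specific guesses run before the broad basename search.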

Reporting Bugs

For the duration of the beta, please use the comment form at the bottom of the page to report any bugs with Clean Sweep.

License

Clean Sweep is licensed under the GPL (v2).

Copyright

Donated to the Movable Type Open Source Project. Copyright 2007-2008 Six Apart Ltd.

26 Comments

Byrne

I'm glad it's not automated and hope that if you offer an automated version at some stage that the automation is optional. I really don't like handing over too much control to something else :)

Michele

I get this error after installing Clean Sweep: Can't call method "id" on an undefined value at /home/23753/domains/mysite.com/html/plugins/CleanSweep/lib/CleanSweep/CMS.pm line 111.

For some reason, the dashboard widget doesn't show up on my system (perhaps because I haven't collected any 404s yet). Something happened to it though, because now the graph in the dashboard widget covers up part of the Manage menu (or the Manage menu is displayed behind the dashboard graph).

The code snippet that's supposed to go in the httpd.conf file gets cut off for longer URLs. It's still selectable, but not fully displayed, which might be a little confusing to some people.

Also, the Location directive is not allowed in .htaccess (at least for Apache). If people are going to put the ErrorDocument in .htaccess, they need to leave off the Location wrapper.

I'm having trouble implementing this for my site. I'm using dynamic publishing for most of my templates, so each of my blogs has the .htaccess file MT creates. How do I modify this to work with Clean Sweep?

How does Clean Sweep's Apache directive in .htaccess reconcile with the MT4-generated directives?

As an example, MT4 generates the following:

<IfModule !mod_rewrite.c>
  # if mod_rewrite is unavailable, we forward any missing page
  # or unresolved directory index requests to mtview
  # if mtview.php can resolve the request, it returns a 200
  # result code which prevents any 4xx error code from going
  # to the server's access logs. However, an error will be
  # reported in the error log file. If this is your only choice,
  # and you want to suppress these messages, adding a "LogLevel crit"
  # directive within your VirtualHost or root configuration for
  # Apache will turn them off.
  ErrorDocument 404 /mtview.php
  ErrorDocument 403 /mtview.php
</IfModule>

I've put the Clean Sweep 404 directive BELOW this MT4 directive, but my 404s are still picked up by mtview.php and not by Clean Sweep. Does the CS directive need to be ABOVE the MT one?

@Kelly - in thinking more about it, I am thinking that under dynamic publishing there will need to be an alternative to mtview.php and mtview.cgi. That is the only way to have CS work with MTDP.

Thanks for responding so quickly, Byrne. It sounds like Clean Sweep is not yet compatible with sites published dynamically. Is this a fair assertion? If so, you might want to put that in the documentation above to avoid excess user frustration.

All the same, it still looks like a great plugin - and as soon as my bug report on dynamic publishing clears I'll switch my site to static publishing and will use it in earnest.

I am having the same issue. Did you figure out a cure for this?

Carlo, w3.myopenid.com...

You must add the Clean Sweep widget to the dashboard for the blog in question.

If you try and add the widget to the 'System Overview' dashboard, you will receive this error.

(Hint: The 'id' parameter in the MT error string refers to a blog id).

Hey Byrne.

I don't understand these steps:

Navigate to the Plugin Settings area for Clean Sweep. Enter in the full URL to your "URL Not Found" page you created in step #3. Copy that URL into the "404 URL" configuration parameter for Clean Sweep.

Where do I navigate to? When I go to System Overview > Plugins, I get a listing of plugins, but no place to add a URL.

Thanks, Eli

@Eli - you need to:

  1. navigate to the dashboard for the blog you wish to enable Clean Sweep for.
  2. from the Preferences menu, select Plugins
  3. listed there you will see Clean Sweep

That is where you need to be.

Burning question: Where do I "clean" the Clean Sweep log file?

Wow... I don't think you can yet. I will need to add that capability.

That would be smashing! You have the header check box there to check all (that doesn't check all, btw) but no 'Delete' button like on the other pages with this same UI layout, such as Entries, Pages, etc, etc.

Have another question for ya Byrne, what is this:

'

That right now has the highest count (15, last was 1 hr. ago). I assume that is an apostrophe, but what does it mean? Many thanks.

I've got the plugin installed, I added the line to my htaccess without the tags, I go to "Manage 404's" but it says: "No cleansweep_logs could be found"

I have not yet restarted Apache because I don't know how (with Media Temple). Could this be the reason why it's not working?

thanks, shane

I also forgot to ask, will this work with dynamic publishing?

Shane, it will say that until it logs some 404s. Then the log file will be created, and it will show them on the Manage > 404s screen.

I noticed that message too the first time I checked.

I do not know about dynamic publishing, as I do not use it. But I think that was discussed earlier here in the comments. It sounds like it might be possible with some modifications to mtview.php and the ErrorDocument definition.

Ok thanks a lot, Ken. Does anyone know how to modify the "mtview.php and the ErrorDocument definition" to work with Dynamic Publishing? Our site has over 5,000 entries, so static publishing is not an option.

thanks!

To get the functionality (sorta) of this plugin, you could just look in your log file. Do you use AwStats or some other stats package?

In other words I have no idea what to modify, and I won't make any suggestions because of that ;)

Hi,

I uploaded the folders to both locations, after which the console said it needed updating and according to the log was successful:

User 'admin' installed plugin 'Clean Sweep', version 1.02 (schema version 0.17).

CleanSweep shows under System, but when I go to plugins for a blog I get a 500 error (no details in the raw log). Under my Manage menu for the blog, I do have the new 404s item which appears fine.

I'm on MT 4.1 and Apache/2.2.8 with mod_rewrite installed (though I've never used it).

Help?

Installed this version today and when I try to show the broken link report on the dashboard I get this error:

Can't call method "id" on an undefined value at /Library/WebServer/CGI-Executables/mt/plugins/CleanSweep/lib/CleanSweep/CMS.pm line 123.

Just installed it but it's only showing "mt4/mt.cgi?_mode=404&blogid=1" in the 404 list without any of my actual 404.

Also, it's outputting "Reading Config" to my logs. Debug statement left in there?

Hmm, I fixed my error by changing my ErrorDocument to be a relative path vs. absolute in my .htaccess.

ErrorDocument 404 /mt4/mt.cgi...

My 404 log only contains one item: cgi-bin/MT-4.1-en/mt.cgi?_mode=404&blogid=2

No matter what URL I try to go to. I'm turning CleanSweep off (from my .htaccess) until I can figure out why this is - but does anyone have any thoughts? Byrne?

If I'd bothered to read @seth's comment, then re-comment, I wouldn't have had to bother posting - or re-posting myself. OOoooops. It was late....

Similar to Carlo and kimonostereo, I'm getting: Can't call method "id" on an undefined value at /home/virtual/site50/fst/var/www/html/adam/cgi-bin/MT-4.1-en/plugins/CleanSweep/lib/CleanSweep/CMS.pm line 124.

This is when I try to go to my blog home page, where I have the module set to display. The only way I can get to my homepage is to disable CleanSweep. If it's linked to the wrong blog ID, where can I change it?

Adam
