How to Fix CGI

Over the many years of their coexistence, the terms CGI and Perl have become virtually synonymous. That conflation has contributed, at least to some small degree, to the perception that Perl is outdated and an inappropriate language for web programming, unlike more modern counterparts like Ruby or PHP.

I of course know this to be a complete fallacy. First of all, the company I work for is a devout Perl shop, and our products, all written in Perl, are collectively some of the largest and most scalable on the Internet. In fact, few other companies present more widely attended seminars and tutorials about scaling for the Web than we do at conferences like eTech and OSCON.

But the more significant fallacy is the premise itself: that CGI and Perl are synonymous. In actuality, they have very little to do with one another, and there is technically no reason one should be hampered by the other, despite all evidence to the contrary.

A little history and background

CGI stands for "Common Gateway Interface" and was invented to allow any script, written in any scripting language, to act as a web-based application. In the early days of the Internet this was incredibly helpful to the first Web programmers (a.k.a. system administrators) proficient in sh, bash, csh, tcsh and perl, because it allowed them to quickly deploy simple web-based automation tools built on scripts and libraries they had already written. Cool.

But the inherent flexibility of language agnosticism is also CGI's greatest liability, and by association, Perl's as well. You see, CGI is based upon the principle that when the web server receives a request, it does not know what scripting language will interpret that request. It therefore defers processing to the operating system, which means doing something geeks call "forking and exec'ing": the web server must start up an entirely new process on your server to handle each request. This may not sound like a big deal, but it can be, because each forked process holds the entire Perl interpreter in memory. First, that can consume a lot of your server's memory; second, depending upon the size of your application, it can be slow to initialize. It works, and works well by virtue of working, but it is by no means ideal.
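To make that concrete, here is a minimal sketch of the kind of script CGI was designed to run (a made-up hello-world, not from any particular project). The server fork/execs a brand-new perl process for every single request, passes the request in through environment variables, and reads the response back from standard output:

```perl
#!/usr/bin/perl
# Minimal CGI sketch: the web server fork/execs a fresh perl process
# like this one for every request it serves.
use strict;
use warnings;

# CGI hands the request to the script via environment variables.
my $method = $ENV{REQUEST_METHOD} // 'GET';
my $query  = $ENV{QUERY_STRING}   // '';

# The response, headers first then a blank line, goes to standard output.
print "Content-Type: text/plain\r\n\r\n";
print "method=$method query=$query\n";
```

Nothing here is Perl-specific: the same environment-in, stdout-out contract works for a shell script, which is exactly why CGI was so convenient for early web programmers.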

More modern languages have been designed to avoid this. Take PHP, for instance. PHP was designed exclusively for web programming. Therefore its architects made a critical (perhaps even obvious) decision early on: if the web server is going to handle a lot of PHP scripts, why bother forking a new process for every request, which is expensive, when you can load the PHP interpreter into memory once and interpret each request within the web server, which is much cheaper?

So when it comes to CGI vs PHP, it is not really about Perl vs. PHP at all. It is really about understanding two solutions to two different problems - one operating under the assumption that every request will be processed by the same interpreter and the other designed to execute any script via the web.

The solution as it stands today

In the Perl world, there are actually two Apache modules that attempt to do what PHP does inherently: load the Perl interpreter into memory so that you no longer have to spawn a new process each and every time your web server receives a CGI request. Those two modules are mod_perl and mod_fastcgi.

However, these two modules have a critical flaw: they are incredibly complex, because they attempt to solve a huge problem set having to do with creating a persistent and stateful execution context. The result is two modules that are not only too heavyweight for the average user but also incredibly difficult to install, even for me.

How to actually fix the problem

The more I have thought about this problem, the more I have come to believe that there is little standing in Perl's way of having all of the benefits that PHP has gained from being a language designed exclusively for the Web.

In theory, one should be able to take the source code of mod_php (the Apache module that dispatches web requests to a PHP interpreter) and swap out the component responsible for dispatching a request to PHP for one that dispatches it to Perl. It should just work (more or less). The result would be an Apache module that is easy to install and much more efficient at handling and processing requests.

Granted, this solution would not be stateful and persistent the way mod_perl and mod_fastcgi are, but that is not a problem this solution is engineered to solve.

Introducing mod_perlite

All of this is a really long-winded setup for what is a very quick conclusion.

I shared this hypothesis with an engineer at work, Aaron Stone, who shares a passion for Perl with me, but who also has a passion for operations. He took on this challenge and devoted part of his 20% time to testing this hypothesis.

The output of his work is called mod_perlite. It is largely derivative of mod_php and is capable of processing Perl scripts quickly and efficiently. The next step of the project is to make it compatible with the CGI protocol, which can be done by gutting parts of mod_cgi and dropping them into mod_perlite.

So far our results are promising, and it is possible that with a little hacking we may have just made Perl faster on the web and easier to deploy for everyone.

If you are interested in helping or participating in this project please let me know -- we could certainly use the help.


22 Comments

Sounds interesting, Byrne.

Can you elaborate: what can the current version of mod_perlite do, and what can it not do (yet)? And without getting into the details, what is the difference between mod_perl and mod_perlite (is it just the persistence part?)

Mark, the goal of mod_perlite is to run single Perl scripts in the Apache process space, caching Perl bytecode as it goes, but flushing script memory after every request. Installation is also incredibly simple, and 100% analogous to PHP installation.

Right now, mod_perlite can be loaded into Apache and serves requests for any file ending in ".pl" with the phrase "Just Another Perl Hacker" (ala man perlembed ;-)
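(If the installation really is "100% analogous to PHP," the httpd.conf wiring would presumably look something like the fragment below. This is a sketch modeled on mod_php's setup; the module and handler names are assumptions, not the project's actual directives.)

```apache
# Hypothetical httpd.conf fragment, modeled on how mod_php is wired up.
# Directive values here are assumptions for illustration only.
LoadModule perlite_module modules/mod_perlite.so
AddHandler perlite-script .pl
```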

Still to do:

- thrash at a few more bits of the PerlIO - Apache interface
- develop a script caching model (a la Zend Accelerator or APC)
- add a script run-timer to kill runaway scripts (a la PHP's max_execution_time)

Fundamentally, mod_perl seeks to map nearly all of Apache's API to Perl. mod_perlite seeks merely to put the Perl interpreter into the same process space and not much else.

Hi Byrne, I think you are hitting part of the nail right on the head. Once you get something like MT installed (the first part of the nail), then you have all of these performance elements that can reach up to bite you. This sounds like an interesting stab at those that could prove very useful.

Does mod_perlite give the user the option of sharing/stashing data across multiple requests if they want? Also, how does this module detect changes in the code? Said differently, if I write a script that loads a module like Data::ObjectDriver and later upgrade that package, how will the file running under mod_perlite pick up the change and flush the Perl bytecode?

The module would be completely stateless, just like PHP. So no, you could not stash data to be shared from one request to another. That is something mod_fastcgi and mod_perl were designed to do.

But mod_perlite is designed to be more lightweight. To be simple above all else.

That being said, if I change a .cgi file on my server then mod_perlite will pick up that change immediately, no server restart required.

Thanks for the clarification. Simplicity is probably the best course. Session or system scope is probably more than most can deal with, and sets up potential problems such as memory leaks.

I thought you'd find this link interesting: http://use.perl.org/~jjohn/journal/20761

I'm of the mind that the mod_perl project blew it in a number of ways.

Not sure I was clear enough on my question of detecting updates and flushing the bytecode cache. Let me try another example. Movable Type (you may have heard of it) has tiny .cgi files that essentially call one module that then does all of the work of loading other modules and processing the request, amongst a lot more. If I don't update that .cgi, but do update one of those modules that gets loaded later, how does mod_perlite pick that up and recompile the source?

That shouldn't be an issue, because mod_perlite should, by design, keep only the interpreter resident in memory. Perl modules should be loaded on demand.

However, a great feature I can see would be a mode where some modules can be included in or excluded from the cache. That way commonly used modules are only loaded once.

But that too would violate the design constraint of mod_perlite. The idea here is simply to avoid the cost of forking and exec'ing a process, not to try to gain any other efficiency, because doing so is a slippery slope that leads right into a well of complexity I'd just as soon avoid.

If you need persistence - use mod_perl.

Indeed, I intend to answer most feature requests with "sounds like you need mod_perl" -- but of course I'm only looking forward to getting to that point and not nearly there yet ;-)

Granted, this solution would not be stateful and persistent the way mod_perl and mod_fastcgi are, but that is not a problem this solution is engineered to solve.

Reverse that: statefulness and persistency is a problem in mod_perl that this solution is avoiding.

It is, IMO, the main reason why people use PHP over Perl for web applications, and most definitely on shared webservers: because it's too easy for independent projects on the same webserver to trip over each other.

If you are interested in helping or participating in this project please let me know -- we could certainly use the help.

Well, I'm definitely most interested in how this project is going to turn out, so I'm definitely going to keep an eye on it, but as I'm not a C programmer, nor have I ever done anything mod_anything related, I doubt if you could have any use for me at this stage. So I'll keep standing at the sideline for now. If you need a hand that ordinary Perl programmers can lend, just give me a yell.

Hmm, did you actually benchmark the benefits of this over pure CGI? As pointed out on perlmonks, the time the OS takes to fork a process and load the perl interpreter is rather minuscule in comparison to module load times. So unless I'm very much mistaken, I don't see what mod_perlite gains (unless you're on Windows, possibly).

To be honest, not yet. Remember, this is just a theory at this point. I think mod_perlite needs to be relatively stateless, but it still needs to address at its core the problem of start up time. If in the end fork and exec is not the bottleneck, then we need to shift attention to the more significant contributor to poor performance.
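(One quick way to put rough numbers on this, as a sketch only: time a bare fork/exec of the interpreter against one that also has to compile a module at start-up, the way a real CGI script would. Data::Dumper stands in here for an application's real dependencies; absolute timings will vary enormously by system.)

```perl
#!/usr/bin/perl
# Rough benchmark sketch: fork/exec cost of a bare perl interpreter
# vs. one that also compiles a module at start-up. Illustrative only.
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a command in a child process and return its wall-clock time.
sub time_cmd {
    my @cmd = @_;
    my $t0  = [gettimeofday];
    system(@cmd) == 0 or die "command failed: @cmd";
    return tv_interval($t0);
}

my $bare   = time_cmd($^X, '-e', '1');                    # interpreter start-up alone
my $loaded = time_cmd($^X, '-MData::Dumper', '-e', '1');  # plus loading a core module

printf "bare: %.4fs  with Data::Dumper: %.4fs\n", $bare, $loaded;
```

If the second number dwarfs the first, the perlmonks point stands and module load time, not fork/exec, is where the attention belongs.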

I think that this may very well introduce yet another use case for specifying a list of Perl modules to load at start-up. That way large Perl modules can be read into memory and shared.

But this must be done without opening the door to memory leaks. Is that possible?

Like Bart, I'm not a C programmer, but what about the model I suggest on perlmonks?

This provides environment separation for different web sites, while allowing you to preload modules (and even data), and still makes sure that each request gets served by a pristine Perl process.

Clinton, I am beginning to see, just based upon the many comments already, that some facility should be given to the module to allow for some modules to be pre-loaded at start-up. That clearly has too many benefits to ignore, and would help us to achieve our primary objective.

Where I am not so sure I agree is with any idea that encourages spawning a separate process to manage other processes. When it comes to thread and process management, we should leave that exclusively to Apache. If our solution ventures beyond that, then my fear is the complexity that would surely follow. All of a sudden you open the door for people to legitimately need a million configuration values to control load, resource allocation, etc.

I can't stress enough: the success of this module will lie with its simplicity. If in the process its architecture and design can inspire more complex solutions, or can inform the designs of existing solutions like mod_perl or mod_fastcgi, then that is a good thing.

This is really interesting indeed. I can’t wait to see where this goes.

As for module preloading, such a facility would be A Very Good Thing indeed.

If you omit that, users will have to fight with the same problem as CGI presents: minimising start-up time. The practical upshot of that is generally: try to use as few modules as possible in order to minimise load time. You can imagine what that means. Having start-up as a constraint seriously mangles code design decisions every step of the way.

Lack of preloading would also make it hard to use Catalyst, and impossible to use DBIx::Class with large schemata; the start-up penalty for both is such that they’re next to useless in fully non-persistent environments like CGI.

If you want to allow people to make unafraid use of CPAN, preload support is of the essence.

It's ALIVE! :-)

Basic CGI functions work at this time. Check out the module from http://code.sixapart.com/svn/mod_perlite/trunk and give it a whirl! Build and install instructions are in the README file and are incredibly straightforward.

I get the impression that mod_php is falling out of favor for mass virtual hosting. It seems that the request separation was not sufficient to prevent security and performance problems.

Dreamhost for example only runs PHP as CGI or FastCGI.

@saj - really? You think mod_php is falling out of favor? That is quite a statement... one that I can't imagine is true. PHP is one of the most ubiquitous web scripting languages, second only to Perl. I cannot imagine hosts degrading their support for this language.

Well, my evidence is anecdotal - one specific host. My main point, however, is that mod_perlite is likely to run into the same problems with multiple users running code within the same Apache module. Specifically, performance monitoring is difficult because you cannot pinpoint which user's code may be causing problems. Also, one user's code may be able to exploit security holes to access another user's data.

Of course, PHP isn't known for having the best code base so perhaps you guys can do a better job of finding solutions to these problems.

@saj - It's interesting you bring these things up because Aaron and I were just talking about what we could do to make the module more appealing to hosting providers - especially in regards to monitoring and process/thread control. Do you think hosting providers are simply interested in monitoring or doing resource control around specific realms/domains?

I think mostly monitoring. Someone notices that a server is running too slowly and the host wants to know which customer to kick off.

Regarding your note about mod_perl & mod_fastcgi: "However, these two modules have a critical flaw: they are incredibly complex ..."

Don't know about mod_perl, but FastCGI saved my day. I had no problem installing it on Win32 (don't know much about Linux etc. :-( ). I used CGI::Fast to convert a complex CGI application to a FastCGI application - did it within hours, and it works f a s t: what used to take 5 sec. now takes ~1 sec.

FastCGI keeps your script in memory, so you must be careful with memory leaks, but you only compile once.

Looking at what you have written about mod_perlite, it reminds me of ActiveState's Perl ISAPI, which runs only on Windows. Anyway, it is great to see new ways to do the same things, with advantages when you get memory leaks (we all do...).
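(For readers who haven't seen it, the conversion Roey describes follows CGI::Fast's documented accept-loop pattern. A minimal sketch, with a made-up response body; this is not Roey's actual code:)

```perl
#!/usr/bin/perl
# Sketch of the CGI::Fast conversion pattern: everything outside the
# loop is compiled and initialized once and stays resident; the loop
# body runs once per incoming request.
use strict;
use warnings;
use CGI::Fast;

# Expensive one-time set-up (loading modules, opening handles) goes here.

while (my $q = CGI::Fast->new) {    # blocks until the next request arrives
    print $q->header('text/plain');
    print "served by persistent process $$\n";
}
```

This is why leaks matter under FastCGI: any memory the loop body fails to release accumulates across requests in the long-lived process, which is exactly the class of problem mod_perlite's flush-after-every-request design sidesteps.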

@Roey - It is great to hear you were so successful in deploying FastCGI. What is so remarkable in my personal experience is how variable the ease of installation can be. I have been on some systems where FastCGI is a "yum update" or "apt-get" away, and then on some systems where FastCGI causes Apache to segfault constantly, and others where the prerequisites were virtually impossible to satisfy. It is so unpredictable.

And that is one of the experiences informing and motivating the mod_perlite project. Because nothing should be that difficult or unpredictable to install.
