There are times when you might want to have some dynamic information (information that is not constant) in your HTML documents. This could include simple information such as the date and time, or a counter that displays “You are visitor number xxx”, but it could also include such things as pie charts/graphs based on user input, results from searching a database, or animations. On the server side, the way to produce results like these is with CGI scripts (you can also do some of this with client-side technologies like Java and JavaScript, but that's a totally different story!).
Here is an excellent description that my editor, Andy Oram, wrote up:
Common
Assures you that CGI can be used by many languages and interact with many different types of systems. It doesn't tie you down to one way of doing what you want.
Gateway
Suggests that CGI's strength lies not in what it does by itself, but in the potential access it offers to other systems such as databases and graphic generators.
Interface
Means that CGI provides a well-defined way to call up its features; in other words, you can write programs that use it.
Simply put, a script is a program! OK, OK, there are semantic differences between the two words. If you really want to know, pick up a book on computer programming (or is that computer scripting :-)
You can create a lot of magic by writing a CGI program/script. You can create graphics on the fly, access databases and return results, and connect to other Internet information servers.
The answer is located in the first three lines of the Perl manpage:
Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information.
Most CGI applications involve manipulating data in some fashion and accessing external programs and applications. Perl provides easy-to-use tools that make these tasks a cinch.
Here is a list of books on CGI and Perl. I got this list from Cye H. Waldman:
Here is a table of books and CD-ROMs about CGI and Perl:
There is a very useful newsgroup, comp.infosystems.www.authoring.cgi, that is “monitored” by numerous CGI experts. However, you should not post a question to this group (or any other group, for that matter) until you have read the FAQ.
Various mailing lists for CGI and the Web exist, as well. Here are two of the most popular:
The cgi-perl mailing list [http://www.webstorm.com/local/cgi-perl]
This list is for those who are writing or interested in writing Perl 5 modules for CGI. It is not intended for any type of CGI support.
Tim Bunce wrote several elegant and useful CGI modules, although they are currently maintained by Lincoln Stein. These modules are located at:
http://www-genome.wi.mit.edu/WWW/tools/scripting/CGIperl
Lincoln has also written an excellent book on the Web and CGI (see the preceding table).
The libwww-perl mailing list [http://www.ics.uci.edu/WebSoft/libwww-perl/archive]
libwww-perl is a Perl library that provides a simple and consistent programming interface to the Web.
You can access the Perl 4 distribution at:
http://www.ics.uci.edu/pub/websoft/libwww-perl
The Perl 5 libwww modules are located at:
http://www.oslonett.no/home/aas/perl/www
Are there archives on the net of mailings or postings about this?
Yes, look at:
The Usenet Newstand (http://CriticalMass.com/Concord/)
All of the comp.infosystems.www.* newsgroups are archived. In addition, the cgi-perl and libwww mailing lists are archived as well.
It really depends on what you are trying to do. The CGI modules should generally be used for heavy-duty CGI scripts. For simple scripts, it is far easier and quicker to roll your own or use CGI Lite (the current version is v1.62, available at http://bytor.engr.wisc.edu/pub/perl/cpan/authors/id/SHGUN/CGI_Lite-1.62.pm.gz). If you really want, you can even use the Perl 4 cgi-lib.pl library (http://www.bio.cam.ac.uk/web/form.html).
Most modules have manpages embedded within the module itself. If that is the case, you can use the pod2man script to view the manpage:
% pod2man module.pm | nroff -man | more
The most widely used CGI library for Perl 4 is cgi-lib.pl, written by Steven Brenner (http://www.bio.cam.ac.uk/web/form.html). It is very, very simple to use!
CGI::* Modules
(http://www-genome.wi.mit.edu/WWW/tools/scripting/CGIperl/)
These modules allow you to create and decode forms as well as maintain state between forms.
CGI Lite
(http://bytor.engr.wisc.edu/pub/perl/cpan/authors/id/SHGUN/CGI_Lite-1.62.pm.gz)
An alternative to the CGI::* modules. It is a glorious Perl 5 version of cgi-lib.pl.
Both of these modules have the ability to decode the multipart/form-data encoding scheme.
You can use cgi-lib.pl (http://www.bio.cam.ac.uk/web/form.html), which is not object oriented, because it was designed for Perl 4.
But using the Perl 5 O-O libraries is a piece of cake! Here is a simple example that uses CGI Lite (http://bytor.engr.wisc.edu/pub/perl/cpan/authors/id/SHGUN/CGI_Lite-1.62.pm.gz) to print out form data:
#!/usr/local/bin/perl5

use CGI_Lite;

print "Content-type: text/plain", "\n\n";

$cgi = new CGI_Lite ();
$cgi->parse_form_data ();
$cgi->print_form_data ();

exit (0);
The server is generally configured so that it executes CGI scripts that are located in the cgi-bin directory. However, the server administrator can set up aliases in the server configuration files, so that scripts with certain extensions (e.g., .cgi, .pl) can also be executed.
File permissions allow read, write, and execute access to users based on their user identification (also known as UID), and their membership in certain groups. You can use the command chmod to change a file's permissions. Here is an example:
% ls -ls form.cgi
1 -rwx------  1 shishir      974 Oct 31 22:15 form.cgi*
This has a permission of 0700 (octal), which means that no one (besides the owner) can read from, write to, or execute this file. Let's use the chmod command to change the permissions:
% chmod 755 form.cgi
% ls -ls form.cgi
1 -rwxr-xr-x  1 shishir      974 Oct 31 22:15 form.cgi*
This changes the permissions so that users in the same group as “shishir,” as well as all other users, have permission to read from and execute this file.
See the manpages for the chmod command for a full explanation of the various octal codes.
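The following Perl sketch turns a mode such as 0755 into the rwxr-xr-x string that ls displays. Each octal digit covers one class of users (owner, group, other). Within a digit, 4 means read, 2 means write and 1 means execute. The function name mode_string is made up for this example, and it ignores the setuid, setgid and sticky bits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn an octal mode such as 0755 into the
# "rwxr-xr-x" string that ls -l displays (special bits are ignored).
sub mode_string {
    my ($mode) = @_;
    my $out = '';

    # Owner, group, and other each get three bits, highest digit first.
    for my $shift (6, 3, 0) {
        my $bits = ($mode >> $shift) & 7;
        $out .= ($bits & 4) ? 'r' : '-';
        $out .= ($bits & 2) ? 'w' : '-';
        $out .= ($bits & 1) ? 'x' : '-';
    }
    return $out;
}

printf "%04o => %s\n", $_, mode_string ($_) for (0700, 0755, 0644);
```

From inside a Perl script, the built-in chmod does the same job as the command, for example chmod 0755, "form.cgi";. Write the mode as an octal number with the leading zero, not as the string "755".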
Perl can be installed anywhere on the system! The only things you have to ensure are that the server is not running in a chroot-ed environment and that it can access the interpreter. In a chroot-ed environment, the system administrator has changed the root directory, so that “/” does not point to the actual root of the file system, but to another directory.
You can get a server error for the following reasons:
Generally, the HTTP server will be running as user “nobody,” “www,” or some other user ID that has minimal privileges. As a result, the directory where you intend to create the file must be writable by that user ID.
To be on the safe side, always check the return status from the open ( ) command to see if it was a success:
open (FILE, "/abc/data.txt") || &error ("Could not open file /abc/data.txt: $!");
.
.
.
sub error
{
    local ($message) = @_;

    print "Content-type: text/html", "\n";
    print "Status: 500 CGI Error", "\n\n";
    print "<TITLE>CGI Error</TITLE>", "\n";
    print "<H1>Oops! Error</H1>", "\n";
    print "<HR>", $message, "<HR>", "\n";

    exit (1);    # stop here rather than carrying on without the file
}
It is actually a fairly simple process. Your CGI script must be able to perform two tasks:
Decode the form data. Remember, all data in the form will be URL encoded (let's ignore Netscape 2.0 multipart MIME messages).
Open a pipe to mail (or sendmail), and write the form data to it.
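The first task can be illustrated with a minimal Perl sketch of URL decoding. It assumes the data has already been obtained, either from QUERY_STRING for GET or read from STDIN for POST. The name parse_form is hypothetical. In practice cgi-lib.pl or CGI Lite does this for you, and this sketch does not handle multipart/form-data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decode URL-encoded form data (the QUERY_STRING of
# a GET request, or the body of a POST) into a hash.
sub parse_form {
    my ($encoded) = @_;
    my %in;

    foreach my $pair (split (/&/, $encoded)) {
        next unless length ($pair);
        my ($name, $value) = split (/=/, $pair, 2);
        $value = '' unless defined $value;

        # Undo the encoding in both the name and the value.
        foreach ($name, $value) {
            tr/+/ /;                                  # "+" stands for a space
            s/%([0-9A-Fa-f]{2})/chr (hex ($1))/ge;    # "%XX" is a hex byte
        }

        # Fields that appear more than once are joined with "\0".
        $in{$name} = exists $in{$name} ? "$in{$name}\0$value" : $value;
    }
    return %in;
}

my %in = parse_form ("name=Shishir+G&msg=hi%21");
print "$in{'name'}: $in{'msg'}\n";    # prints "Shishir G: hi!"
```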
Let's assume you have an associative array called %in (for those of you using Steven Brenner's cgi-lib.pl library, this should be familiar) that contains the form data. Here is how you would deal with sendmail:
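# No form data is placed on the command line; -t tells sendmail to
# take the recipients from the To: header instead.
open (SENDMAIL, "| /usr/bin/sendmail -t -n -oi") || die "Cannot run sendmail: $!\n";

print SENDMAIL <<End_of_Mail;
From: $in{'name'} <$in{'from'}>
To: $in{'to'}
Reply-To: $in{'from'}
Subject: $in{'subject'}

$in{'message'}
End_of_Mail

close (SENDMAIL);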
One thing you should note is the “Reply-To:” header. Since the server is running as user “nobody,” the mail headers might be messed up (especially when people are trying to reply to it). The “Reply-To:” field fixes that.
There are a lot of mail gateways in operation that use mail in the following format:
open (MAIL, "| mail -s 'Subject' $in{'to'}");
                                 ^
                                 |
                                 +-- Possible security hole!!!!
If you don't check the $in{'to'} variable for shell metacharacters, you're in for a major headache! For example, if some malicious user enters the following:
; rm -fr / ;
you'll have a major problem on your hands.
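One defensive approach is to check the address against a whitelist of allowed characters before it goes anywhere near a command line. This is safer than trying to list every dangerous character. The sketch below uses a hypothetical safe_address function with a deliberately conservative pattern. It will reject some addresses that are technically legal, which is usually an acceptable trade-off for a form-to-mail gateway.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check: accept an address only if every character comes
# from a small, safe set. Shell metacharacters (; | & ` $ < > quotes,
# spaces, newlines) can never pass, so the value cannot smuggle in a
# second command. This is deliberately stricter than the mail RFCs.
sub safe_address {
    my ($address) = @_;
    return 0 unless defined $address;
    return ($address =~ /\A[\w.+-]+\@[\w-]+(?:\.[\w-]+)+\z/) ? 1 : 0;
}

foreach my $to ('shishir@example.com', '; rm -fr / ;') {
    print "$to: ", (safe_address ($to) ? "accepted" : "rejected"), "\n";
}
```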
Unfortunately, the mailto: URL is not supported by all browsers. If you use it in your document, it becomes a limiting factor: people whose browsers do not support it have no way of sending you mail.
Perl has been ported to all the platforms that are mentioned above. As a result, your Perl CGI program should be reasonably portable. If you're interfacing with various external programs on the UNIX side, then it probably will not be portable, but if you're just manipulating data, opening and reading files, etc., you should have no problem.
In a CGI environment, STDERR points to the server error log file. You can use this to your advantage by outputting debug messages, and then checking the log file later on.
Both STDIN and STDOUT point to the browser. Actually, STDIN points to the server, which interprets the client's (browser's) request and information and sends that data to the script.
In order to catch errors, you can “dupe” STDERR to STDOUT early on in your script (after outputting the valid HTTP headers):
open (STDERR, ">&STDOUT");
This redirects all of the error messages to STDOUT (or the browser).
Counter scripts tend to be very popular. The idea behind a counter is very simple:
Here is a simple counter script:
#!/usr/local/bin/perl

$counter = "/home/shishir/counter.dat";
print "Content-type: text/plain", "\n\n";

open (FILE, $counter) || die "Cannot read from the counter file.\n";
flock (FILE, 2);    # 2 = exclusive lock
$visitors = <FILE>;
flock (FILE, 8);    # 8 = unlock
close (FILE);

$visitors++;

open (FILE, ">" . $counter) || die "Cannot write to counter file.\n";
flock (FILE, 2);
print FILE $visitors;
flock (FILE, 8);
close (FILE);

# Output the count, so that the SSI directive can display it.
print $visitors, "\n";
You can now use SSI (Server Side Includes) to display a counter in your HTML document:
You are visitor number: <!--#exec cgi="/cgi-bin/counter.pl" -->
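The counter script above releases its lock between reading and writing. Two visitors arriving at the same moment can therefore read the same number, and one hit is lost. One way around this is to open the file once for both reading and writing and hold a single exclusive lock for the whole update. The sketch below does this; it assumes Perl 5 and the standard Fcntl module, and the name bump_counter is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Hypothetical helper: read, increment, and rewrite the count while holding
# a single exclusive lock, so two simultaneous hits cannot both read the
# same old value. The file is created if it does not exist yet.
sub bump_counter {
    my ($file) = @_;

    sysopen (my $fh, $file, O_RDWR | O_CREAT)
        or die "Cannot open the counter file $file: $!\n";
    flock ($fh, LOCK_EX) or die "Cannot lock $file: $!\n";

    my $count = 0;
    my $line  = <$fh>;
    $count = $1 if defined $line && $line =~ /^(\d+)/;
    $count++;

    # Rewind and empty the file before writing the new value.
    seek ($fh, 0, 0)  or die "Cannot rewind $file: $!\n";
    truncate ($fh, 0) or die "Cannot truncate $file: $!\n";
    print $fh "$count\n";

    close ($fh) or die "Cannot close $file: $!\n";    # also releases the lock
    return $count;
}
```

In the counter script, everything after the Content-type line could then be replaced by print bump_counter ($counter), "\n";.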
Here is a simple regular expression that will strip HTML tags:
$line =~ s/<(([^>]|\n)*)>//g;
Or you can “escape” the angle brackets of an HTML tag so that the tag itself is displayed:
$line =~ s/<(([^>]|\n)*)>/&lt;$1&gt;/g;
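The second regular expression only escapes the angle brackets around complete tags. A stray & or an unmatched < in the text is left alone. To display arbitrary text inside an HTML page, a simpler and more thorough approach is to escape every special character. The sketch below does this with a hypothetical escape_html helper. The & has to be replaced first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: make arbitrary text safe to print inside an HTML
# document. "&" must be handled first, or the entities produced for the
# other characters would themselves be escaped a second time.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html ('<B>Fish & Chips</B>'), "\n";
# prints: &lt;B&gt;Fish &amp; Chips&lt;/B&gt;
```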
You can use the environment variable HTTP_USER_AGENT to determine the user's browser.
[ From WWW FAQ ]
Five important environment variables are available to your CGI script to help in identifying the end user.
HTTP_FROM
This environment variable is, theoretically, set to the email address of the user. However, many browsers do not set it at all, and most browsers that do support it allow the user to set any value for this variable. As such, it is recommended that it be used only as a default for the reply email address in an email form.
REMOTE_USER
This variable is only set if secure authentication was used to access the script. The AUTH_TYPE variable can be checked to determine what form of secure authentication was used. REMOTE_USER will then contain the name the user authenticated under. Note that REMOTE_USER is only set if authentication was actually used, and is not supported by all web servers. Authentication may unexpectedly fail to happen under the NCSA server if the method used for the transaction is not listed in the access.conf file (i.e., <Limit GET POST> should be set rather than the default, <Limit GET>).
REMOTE_IDENT
This variable is set if the server has contacted an IDENTD server on the client machine. This is a slow operation, usually turned off in most servers, and there is no way to ensure that the client machine will respond honestly to the query, if it responds at all.
REMOTE_HOST
This variable will not identify the user specifically, but does provide information about the site the user has connected from, if the hostname was retrieved by the server. In the absence of any certainty regarding the user's precise identity, making decisions based on a list of trusted addresses is sometimes an adequate workaround. This variable is not set if the server failed to look up the hostname or skipped the lookup in the interest of speed; see REMOTE_ADDR below. Also keep in mind that you may see all users of a particular proxy server listed under one hostname.
REMOTE_ADDR
This variable will not identify the user specifically, but does provide information about the site the user has connected from. REMOTE_ADDR will contain the dotted-decimal IP address of the client. In the absence of any certainty regarding the user's precise identity, making decisions based on a list of trusted addresses is sometimes an adequate workaround. This variable is always set, unlike REMOTE_HOST, above. Also keep in mind that you may see all users of a particular proxy server listed under one address.
[ End of info from WWW FAQ ]
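The sketch below shows one way these variables might be combined. It uses a hypothetical describe_client function that prefers the authenticated user name and otherwise falls back to the host name or address. The function takes a reference to a hash (normally \%ENV), so it can be tried outside the server. Only REMOTE_USER says anything about the person, and only when authentication actually took place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: summarize what the server reports about the client.
# Only REMOTE_USER (set after authentication) names a person; the host and
# address identify a machine, possibly a proxy shared by many users.
sub describe_client {
    my ($env) = @_;
    my $site = $env->{REMOTE_HOST} || $env->{REMOTE_ADDR} || 'unknown site';

    if ($env->{REMOTE_USER}) {
        my $type = $env->{AUTH_TYPE} || 'unknown';
        return "user $env->{REMOTE_USER} ($type authentication) from $site";
    }
    return "unauthenticated visitor from $site";
}

print describe_client (\%ENV), "\n";
```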
If you configure your server so that it recognizes all files in a specific directory (e.g., /cgi-bin), or files with certain extensions (e.g., .pl, .tcl, .sh), as CGI programs, then it will execute the programs. There is no way for users to see the script itself.
On the other hand, if you allow people to look at your script (by placing it, for example, in the document root directory), it is not a security problem, in most cases.
No, your CGI scripts can access files outside the server and document root directories, unless the server is running in a chroot-ed environment.
No! The forms interface allows you to have a “password” field, but it should not be used for anything highly confidential. The main reason for this is that form data gets sent from the browser to the Web server as plain text, and not as encrypted data.
If you want to solicit secure information, you need to purchase a secure server, such as Netscape's Commerce Server (http://home.netscape.com/comprod/netscape_commerce.html).
You can have your CGI script determine whether your script is being accessed by Netscape:
$browser = $ENV{'HTTP_USER_AGENT'};

if ($browser =~ /Mozilla/) {
    #
    # Netscape
    #
} else {
    #
    # Non Netscape
    #
}
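Other browsers also send a User-Agent string that begins with "Mozilla". Microsoft Internet Explorer, for instance, sends something like "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)". The sketch below is a slightly stricter check, using a hypothetical is_netscape function that also looks for the word "compatible". It is still only a heuristic, because users and browsers can put anything in HTTP_USER_AGENT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a slightly stricter Netscape check. Browsers that
# merely imitate Netscape usually add "compatible" to their User-Agent.
sub is_netscape {
    my ($agent) = @_;
    return 0 unless defined $agent;
    return ($agent =~ m{^Mozilla/} && $agent !~ /compatible/i) ? 1 : 0;
}

my $browser = $ENV{'HTTP_USER_AGENT'};
print (is_netscape ($browser) ? "Netscape\n" : "Non Netscape\n");
```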
This has to do with the way the standard output is buffered. In order for the output to display in the correct order, you need to turn buffering off by using the $| variable:
$| = 1;
No, no! The concept of Java is totally different from that of CGI. CGI refers to server-side execution, while Java refers to client-side execution. There are certain things (like animations) that can be improved by using Java. However, you can continue to use Perl to develop server-side applications.
For more information, here are a few documents you can look at:
Sun's Java Documentation (http://java.sun.com/)
Java uber Alles (http://mox.perl.com/perl/versus/java.html) by Tom Christiansen
Java, the Illusion (http://www.nombas.com/otherdoc/javamagk.html)
You can access the environment variables through the %ENV associative array. Here is a simple script that dumps out all of the environment variables (sorted):
#!/usr/local/bin/perl

print "Content-type: text/plain", "\n\n";

foreach $key (sort keys %ENV) {
    print $key, " = ", $ENV{$key}, "\n";
}

exit (0);
If you send a MIME content type of HTML, you will have to “escape” certain characters, such as “<,” “&,” and “>”, or else the browser will think it is HTML.
You have to escape the characters by using the following construct:
&#ASCII Code;
Here is a simple script that you can run on the command line that will give you the ASCII code for non-alphanumeric characters:
#!/usr/local/bin/perl

print "Please enter a string: ";
chop ($string = <STDIN>);

$string =~ s/([^\w\s])/sprintf ("&#%d;", ord ($1))/ge;

print "The escaped string is: $string\n";
exit (0);
This most likely is due to permission problems. Remember, your server is probably running as “nobody,” “www,” or a process with very minimal privileges. As a result, it will not be able to execute your script unless it has permission to do so.
Again, this has to do with permissions! The server cannot write to a file in a certain directory if it does not have permission to do so.
You should make it a point to check for error status from the open command:
print "Content-type: text/plain\n\n";
.
.
.
open (FILE, ">" . "/some/dir/some.file") || print "Cannot write to the data file: $!\n";
.
.
.
You can use the CGI::MiniSvr module (http://www-genome.wi.mit.edu/ftp/pub/software/WWW/CGIperl/docs/MiniSvr.pm.html) to keep state between multiple entry points.
Or you can create a series of dynamic documents that pass a unique session identifier to each other (either in the query string, as extra path information, or in a hidden field).
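Here is a minimal sketch of the session identifier approach. The helper names, the field name session, and the URL /cgi-bin/step2.pl are all hypothetical. The identifier combines the time, the process ID and a random number. That keeps it unique for a demonstration, but it does not make it hard to guess, so it should not be the only thing protecting sensitive data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a session ID from the time, the process ID,
# and a random number. Unique enough for a demo, but NOT unguessable.
sub new_session_id {
    return sprintf ("%x%x%04x", time, $$, int (rand (65536)));
}

# Embed the ID in a form as a hidden field ...
sub hidden_field {
    my ($id) = @_;
    return qq{<INPUT TYPE="hidden" NAME="session" VALUE="$id">};
}

# ... or append it to a link's query string.
sub session_link {
    my ($url, $id) = @_;
    return "$url?session=$id";
}

my $id = new_session_id ();
print hidden_field ($id), "\n";
print session_link ("/cgi-bin/step2.pl", $id), "\n";
```

The next script in the series reads the session value back from its form data and uses it, for example, as the name of a file that holds that visitor's state.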
It's difficult to debug a CGI script. You can emulate a server by setting environment variables manually:
setenv HTTP_USER_AGENT "Mozilla/2.0b6"      (csh)
or
export HTTP_USER_AGENT="Mozilla/2.0b6"      (ksh, bash)
You can emulate a POST request by placing the data in a file and piping it to your program:
cat data.file | some_program.pl
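When you emulate a POST request this way, you usually need to set two more variables. The program typically looks at REQUEST_METHOD to decide where the data is, and reads exactly CONTENT_LENGTH bytes from STDIN. In csh, for example, use setenv REQUEST_METHOD POST and setenv CONTENT_LENGTH with the size of data.file. The sketch below shows that logic, built around a hypothetical raw_form_data function. It takes the environment and the input filehandle as arguments, so it can be exercised without a server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the raw (still URL-encoded) form data the way
# a CGI program receives it. GET data arrives in QUERY_STRING; POST data
# arrives on the input filehandle, and exactly CONTENT_LENGTH bytes are read.
sub raw_form_data {
    my ($env, $fh) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';

    if ($method ne 'POST') {
        return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    }

    my $length = $env->{CONTENT_LENGTH} || 0;
    my $data   = '';
    while (length ($data) < $length) {
        my $got = read ($fh, $data, $length - length ($data), length ($data));
        die "Error reading form data: $!\n" unless defined $got;
        last if $got == 0;    # input ended before CONTENT_LENGTH bytes
    }
    return $data;
}
```

In a real script you would call it as raw_form_data (\%ENV, \*STDIN).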
Or, you can use CGI Lint, which will automate some of this. It will also check for potential security problems, errors in open ( ), and invalid HTTP headers.
You can call a CGI program by simply opening the URL to it:
http://some.machine/cgi-bin/your_program.pl
You can also have a link in a document, such as:
<A HREF="http://some.machine/cgi-bin/your_program.pl"> Click here to access my CGI program</A>
Why people do this, I don't know. But you can check the information from all the fields and return a “No Response” if any of them are empty. Here is an example (assume the associative array %in contains your form information):
$error = 0;

foreach $value (values %in) {
    # A field counts as empty if it holds nothing but whitespace.
    # (Matching, rather than s/\s//g, leaves the form data unchanged.)
    $error = 1 unless ($value =~ /\S/);
}

if ($error) {
    print "Content-type: text/plain\n";
    print "Status: 204 No Response\n\n";
    print "You should only see this message if your browser does ";
    print "not support the status code 204\n";
} else {
    #
    # Process Data Here
    #
}
A CGI program can send specific response codes to the server, which in turn will send them to the browser. For example, if you want a “No Response” (meaning that the browser will not load a new page), you need to send a response code of 204 (see the answer to the last question).
A CGI program can only send one Location header. You also cannot send a MIME content type if you want the server to perform redirection. For example, this is not valid, though it may work with some servers:
#!/usr/local/bin/perl
.
.
.
print "Content-type: text/plain\n";
print "Location: http://some.machine/some.doc\n\n";
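For comparison, a valid redirection sends the Location header by itself, followed by the blank line that ends the headers. The sketch below wraps that in a hypothetical redirect_headers function. The function also refuses a URL containing a newline, since such a value could be used to inject extra headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a valid redirection response. It contains a
# single Location header, no Content-type, and the blank line that ends
# the headers.
sub redirect_headers {
    my ($url) = @_;
    die "Refusing a URL containing a newline\n" if $url =~ /[\r\n]/;
    return "Location: $url\n\n";
}

print redirect_headers ("http://some.machine/some.doc");
```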
How can I automatically include a “Last updated: ...” line at the bottom of all my HTML pages? Or can I only do that for SSI pages? How do I get the date of the CGI script?
If you are dynamically creating documents using CGI, you can insert a time stamp pretty easily. Here is an example in Perl 5:
$last_updated = localtime (time);
print "Last updated: $last_updated\n";
or in Perl 4:
require "ctime.pl";
$last_updated = &ctime (time);
print "Last updated: $last_updated";    # ctime's result ends with a newline
or even:
$date = `/usr/local/bin/date`;
print "Last updated: $date";
You can accomplish this with SSI like this:
<!--#echo var="LAST_MODIFIED" -->
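To get the date of the CGI script itself, or of any other file such as a static HTML page, you can ask for its modification time with stat. Here is a minimal Perl sketch. The function name last_updated is hypothetical. The sketch relies on $0 holding the path of the running script, which is normally the case when the server starts it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the "Last updated" line for a given file, based on
# its modification time (element 9 of the list returned by stat).
sub last_updated {
    my ($file) = @_;
    my @info = stat ($file) or return undef;
    return "Last updated: " . localtime ($info[9]);
}

# $0 holds the path of the running script, so this reports when the
# CGI program itself was last modified.
my $line = last_updated ($0);
print defined $line ? "$line\n" : "Cannot stat $0\n";
```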
Each language has its own advantages and disadvantages. I'm sure you've heard this many times: It depends on what you're trying to do. If you are writing a CGI program that's going to be accessed thousands of times in an hour, then you should write it in C or C++. If you are looking for a quick solution (in terms of development time), then Perl is the way to go!
You should generally avoid the shell for any type of CGI programming, just because of the potential for security problems.
The answer to this is: A CGI program is prone to security problems no matter what language it is written in!
Never expose any form of data to the shell. All of the following are possible security holes:
open (COMMAND, "/usr/ucb/finger $form_user |");
system ("/usr/ucb/finger $form_user");
@data = `/usr/ucb/finger $form_user`;
See more examples in the following answers. You should also look at:
WWW Security FAQ (by Lincoln Stein) (http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html)
CGI Security FAQ (by Paul Phillips) (http://www.cerf.net/~paulp/cgisecurity/safe-cgi.txt)
@ans = `grep '$user_field' some.file`;
is insecure?
Yes! It's very dangerous! Imagine if $user_field contains:
; rm -fr / ;
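A secure equivalent to the above command, which runs grep without involving the shell, is:
# Fork a child whose STDOUT is connected to the GREP filehandle.
$pid = open (GREP, "-|");
die "Cannot fork: $!\n" unless defined $pid;

if ($pid) {
    # Parent: read the child's output.
    @ans = <GREP>;
} else {
    # Child: exec with a list runs grep directly, never via the shell.
    exec ("/usr/local/bin/grep", $user_field, "some.file")
        || die "Error exec'ing command", "\n";
}
close (GREP);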
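Perl 5.8 and later, on UNIX, can do the same thing more compactly with the list form of a piped open, which also never involves the shell. The sketch below is a hypothetical safe_capture wrapper. It only avoids the shell when the program and its arguments are passed as separate list elements; a single string containing metacharacters is still handed to the shell. The "--" argument in the usage comment stops grep from treating a $user_field that starts with "-" as an option.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a program and capture its output without a
# shell. Pass the program and each argument as SEPARATE list elements;
# Perl then executes the program directly, so characters such as
# ; | & ` $ in an argument have no special meaning.
sub safe_capture {
    my (@command) = @_;
    open (my $pipe, '-|', @command)
        or die "Cannot run $command[0]: $!\n";
    my @lines = <$pipe>;
    close ($pipe);    # the program's exit status is left in $?
    return @lines;
}

# The grep example, using "--" so that a $user_field which starts
# with "-" is not taken as a grep option:
#
#   my @ans = safe_capture ("/usr/local/bin/grep", "--", $user_field, "some.file");
```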
No! It's not. It's a security hole if you evaluate the expression at runtime using the eval command. Something like this is dangerous:
foreach $regexp (@all_regexps) {
    eval "foreach (\@data) { push (\@matches, \$_) if m|$regexp|o; }";
}
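One safer pattern, sketched below, compiles each user-supplied expression with qr// instead of building Perl code as a string. The function name match_all is hypothetical, and unlike the loop above it records each matching line only once. Perl will not run (?{ ... }) code blocks that arrive through an interpolated pattern unless use re 'eval' is in effect. An unfriendly pattern can still be slow to match, so limiting the length or number of patterns is a good idea.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: match lines against user-supplied patterns without
# a string eval. qr// compiles each pattern once; a malformed pattern makes
# qr// die, which the block eval below traps (no string is evaluated).
sub match_all {
    my ($patterns, $data) = @_;
    my @compiled;

    foreach my $pattern (@$patterns) {
        my $re = eval { qr/$pattern/ };
        next unless defined $re;    # skip patterns that do not compile
        push (@compiled, $re);
    }

    my @matches;
    foreach my $line (@$data) {
        push (@matches, $line) if grep { $line =~ $_ } @compiled;
    }
    return @matches;
}
```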
--Shishir Gundavaram
(A big thanks to Perl guru Tom Christiansen for coming up with some of the most frequently asked questions.)