CGI (the Common Gateway Interface) is the simplest and by far the most common way of providing 
              Web pages with dynamic content. Essentially, CGI is a way for 
              the Web server to invoke a program to generate 
              HTML that gets sent back to the Web browser, rather than simply 
              serving up a static HTML file. Without CGI and other similar 
              dynamic content schemes, many things would be impossible on the 
              Web -- stock trading and booking of vacations, for example, 
              and just about anything else requiring input from users. The Web would 
              still be simply a mechanism for downloading static documents. Figure 
              1 shows how CGI scripts fit into the picture. 
            
 These programs invoked by the Web server are called CGI scripts. 
              The name of the program is sent by the Web browser in the URL, followed 
              by arguments to the CGI script. The Web server sets up the CGI script's 
              environment so that it can access the arguments, then starts the 
              CGI script. The CGI script then runs, does whatever the programmer 
              coded, and writes its output to stdout. The Web server redirects 
              stdout back to the Web browser that sent the request. 
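To make this flow concrete, here is a minimal sketch in C of what a CGI program does once the Web server has started it. It assumes the server passes the URL arguments in the QUERY_STRING environment variable, which is the usual CGI convention. The function name write_cgi_response is hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a complete CGI response to `out`: a header line, a blank line,
 * then the body. Returns 0 on success, -1 if writing failed.
 * The reply is sent as plain text so that the browser never interprets
 * the echoed arguments as HTML. */
int write_cgi_response(FILE *out, const char *query)
{
    if (query == NULL)
        query = "";   /* treat a missing variable as an empty query */

    if (fprintf(out, "Content-type: text/plain\n\n") < 0)
        return -1;
    if (fprintf(out, "Arguments received: %s\n", query) < 0)
        return -1;
    return 0;
}
```

A real CGI program would call write_cgi_response (stdout, getenv ("QUERY_STRING")) from main; whatever it writes to stdout is what the Web server sends back to the browser.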
            
 With static HTML, the Web server simply sends the requested HTML 
              file back to the user's Web browser, which then interprets 
              the HTML, formats it, and displays it. Take this URL for example: 
            
 A CGI script can be written in almost any programming language 
              that can read environment variables, read from stdin, 
              and write to stdout. 
              The majority are written in Perl, but they can also be written in 
              C or the shell scripting language of your choice. 
            
 A CGI script can, intentionally or otherwise, do anything that 
              the user it runs as can do. Typically, CGI scripts run as the same 
              user as the Web server. On most UNIX systems, the Apache Web server 
              is used and by default, Apache runs as user "nobody". 
              By convention, "nobody" is a user for unprivileged operations. 
              Some may think that something running as nobody could not do much 
              to compromise a Web server, but there are many ways security can 
              be compromised. 
            
 There are many files sprinkled around the typical UNIX system, 
              which are readable by all users, but which you probably don't 
              want in the public domain. A prime example is /etc/passwd. 
              This file contains a list of all users on your system, and if you 
              are not using a shadow password file, also contains the encrypted 
              forms of all of your users' passwords. If a hostile party can 
              manage to get a copy of /etc/passwd, you are wide open to 
              a password guessing attack, even if you use a shadow password file. 
              If you don't use a shadow password file, you will be easy prey 
              for a dictionary attack, whereby a program encrypts a long list 
              of words and compares them against encrypted passwords. 
            
 It is easy to write a CGI script that is vulnerable to malicious 
              query string contents, which can make a CGI script do things it 
              was never intended to do (e.g., sending a file on the server back 
              to a hacker's Web browser). A classic CGI security problem 
              occurs when a CGI script starts a shell and passes it data from 
              the query string without carefully checking the query string contents. 
            
 Listing 1 contains a CGI script that appears relatively innocuous 
              -- it provides a way for somebody to run a whois query 
              from his or her Web browser. For example, this HTTP request will 
              return information about IP address 207.46.130.45: 
            
 Of course, there are plenty of other things that this outwardly 
              innocent-looking CGI script could be coaxed into doing -- like 
              emailing the password file or any other file readable by the Web 
              server to a mailing list. 
            
 CGI scripts should not start a subshell if there is a way around 
              it. In Perl, a subshell is started in any of the following ways: 
            
 1. Using the backtick operator, as demonstrated above 
            
 2. Opening up a pipe to a program. For example, the code fragment 
              below would start the process /bin/lpr (a program that submits 
              print requests), and would cause anything that the Perl script writes 
              to SPOOLER to be redirected into the standard input of /bin/lpr: 
            
 1. Using system () 
            
 2. Using popen () 
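One way around the subshell, sketched here in C, is to start the external program directly with fork and execv. This is an illustration under stated assumptions rather than a drop-in replacement for any listing in this article; the function name run_without_shell is hypothetical. Because no shell is involved, each argument reaches the program as one literal string, so a ";" or "|" in user data is never treated as a command separator.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with the argument vector `argv`, without
 * starting a shell. Returns the child's exit status, or -1 if the child
 * could not be started or did not exit normally. */
int run_without_shell(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execv(path, argv);         /* execv never searches PATH */
        _exit(127);                /* reached only if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with the path /bin/echo and the arguments "echo" and "myfile;ls" prints the text myfile;ls and runs nothing else. The user data should still be validated, because the program being started may interpret its arguments in its own way.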
            
 1. Parse all input. Determine which set of characters is valid 
              for the particular type of input token you are expecting, and allow 
              ONLY those characters. Either remove or escape other characters. 
              It is not as simple as scanning a piece of data for shell metacharacters 
              and rejecting anything that contains a metacharacter, for two reasons: 
            
 Some characters that are shell metacharacters may be valid 
                in some positions in the input. For example, ";" 
                is a valid character to appear in a file name. But ";" 
                is also a command separator. Suppose a CGI script is displaying 
                the contents of a file on behalf of a user, with the file name 
                coming from the user. The script has a line of code like this:
             
              
`/bin/cat $filename`
             $filename is the name that came from the user. As we have 
            previously seen, a hacker could insert a ";" into 
            the file name with a command after it, thereby causing the shell to 
            execute the command. The hacker could send in a "file name" 
            that looks like this: 
             
              
myfile;ls
            This would cause the file myfile to be displayed, followed 
            by a directory listing. Suppose that you must allow ";" 
            to appear in the file name, since it is a valid file name 
            character. Before it is passed to the shell, the script should change $filename 
            to this: 
             
              
myfile\;ls
            This would cause the file myfile;ls, if there is such a file, 
            to be displayed. Be careful -- the hacker may know that you are 
            escaping metacharacters, and he may have already sent in a file name 
            that looks like this: 
             
              
myfile\;ls
            If the CGI script simply sticks another "\" before 
            the ";", then what gets passed to the shell is: 
             
              
myfile\\;ls
            This would cause the file myfile\ (if it exists) to be displayed, 
            followed by the directory listing that you didn't want the hacker 
            to see. 
             As you can see, there are all sorts of games that can be played 
              with metacharacters, and having anything other than a blanket ban 
              on metacharacters in input can be tricky and hard to test. It is 
              a matter of balancing flexibility versus security. Even when a CGI 
              script is checking its input, it is important to remember that all 
              programs have bugs and the more complex a program is, the more bugs 
              it is likely to have. Complex logic to check input is more likely 
              to have bugs than simple logic to check input, and exploiting bugs 
              is what hackers do. 
            
 This is a list of metacharacters for various shells: 
            
 
              
;<>*|`&$!#()[]{}:'"/^\n\r
            If any of these are in data passed to subshells, make sure that they 
            are properly escaped and make sure that all possibilities are tested. 
             2. Specify the absolute filename of any commands, so that the 
              PATH environment variable will not be used to find the command. 
              Also ensure that the PATH environment variable is set to a known 
              value. It should contain only directories that are writable solely 
              by the owner of the directory. The reason it is important to set 
              PATH to a known, good value, even if your CGI script does not use 
              PATH to find commands, is that your script might start a command 
              that relies on PATH. 
            
 The risk in relying on PATH to find commands is that a hacker 
              could have modified PATH to include a particular directory. That 
              directory could contain a malicious script placed there by the hacker. 
            
 If a command that your script starts relies on PATH, the danger 
              is mitigated by allowing only directories that are solely writable 
              by their owners. That will reduce the risk of executing a malicious 
              script. 
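The PATH advice can be sketched in C as follows. The value "/bin:/usr/bin" is an assumption made for illustration and should be replaced with the directories your own system needs; both function names are hypothetical. The first function checks that a directory is not writable by group or others, and the second replaces whatever PATH was inherited from the Web server.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>

/* Return 1 if `dir` is a directory that neither group nor others can
 * write to, 0 otherwise (including when it cannot be examined). */
int dir_is_safe_for_path(const char *dir)
{
    struct stat st;
    if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;
    return (st.st_mode & (S_IWGRP | S_IWOTH)) == 0;
}

/* Assumed value for illustration only. */
#define SAFE_PATH "/bin:/usr/bin"

/* Replace the inherited PATH with a fixed, known value.
 * Returns 0 on success, -1 on failure. */
int set_known_path(void)
{
    return setenv("PATH", SAFE_PATH, 1);   /* 1 = overwrite existing value */
}
```

A script would call set_known_path () before starting any command, and could run dir_is_safe_for_path () over each directory in the chosen value as a sanity check. The check does not look at who owns the directory, so it is only part of the picture.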
            
 
            
 If CGI scripts are coded in Perl, a further measure that can (and 
              should) be taken is to turn on "taint" checking. Taint 
              checking is a feature of Perl that forces a program to check untrusted 
              input and environment variables. To turn on taint checking in Perl 
              5, change the CGI script to add the -T option to the invocation 
              of Perl, as shown below: 
            
 
              
#!/usr/bin/perl -T
            If only this change and no other changes are made to the CGI script 
            shown in Listing 1, the script dies with the following message: 
             
              
Insecure dependency in `` while running with -T switch.
            This is because the scalar $parm is considered "tainted". 
             The principle of taint checking is that all data from outside 
              the program, or derived from data outside the program, must be "laundered" 
              before the data is used in such a way that it could affect something 
              outside your program. Until data is laundered, it is considered 
              tainted. Attempting to use tainted data in any command that invokes 
              a shell, or in any command that modifies files, directories, or 
              processes, will cause the program to die. 
            
 To launder tainted data, the program must perform a regular expression 
              match. It must then derive the new value of the data from subpattern 
              variables set by the regular expression match. Let's attempt 
              to untaint the data in our example CGI script. After splitting $key 
              and $val, insert the code from Listing 2, which will guarantee 
              that $val has only certain characters. 
            
 When a hacker attempts to add an additional shell command onto 
              the end of the whois request, the script will detect it and 
              terminate. Remember that although taint checking makes you check 
              your input, it doesn't enforce the quality of the checking. 
              The quality of the checking is up to you. 
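The idea of accepting only a known set of characters is not specific to Perl. In a language without taint checking, such as C, nothing forces the check, so it has to be written by hand. The following is a minimal sketch; the allowed character set and the name is_valid_host_arg are assumptions chosen to suit a host name or IP address argument.

```c
#include <stddef.h>
#include <string.h>

/* Characters accepted in a host name or IP address argument. This set is
 * an assumption for illustration; choose the set that fits your input. */
static const char ALLOWED[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789.-";

/* Return 1 if `value` is non-empty, at most `max_len` characters long,
 * does not start with '-', and contains only allowed characters.
 * Otherwise return 0. */
int is_valid_host_arg(const char *value, size_t max_len)
{
    if (value == NULL)
        return 0;

    size_t len = strlen(value);
    if (len == 0 || len > max_len)
        return 0;
    if (value[0] == '-')          /* could be mistaken for a command option */
        return 0;

    /* strspn counts the leading run of allowed characters; the value is
     * acceptable only if that run covers the whole string. */
    return strspn(value, ALLOWED) == len;
}
```

With this check, "207.46.130.45" is accepted, while a value with a ";" and a second command appended is rejected before it gets anywhere near a shell.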
            
 Now let's attempt to run this script with good input and 
              see what happens. This time, it dies with another error: 
            
 
              
Insecure $ENV{'PATH'} while running with -T switch
            The problem this time is that our script is still running with the 
            PATH environment variable that it inherited from the Web server, which 
            has an unknown value and is considered tainted. As previously mentioned, 
            not setting PATH to a known value, which contains only directories 
            non-writable by anyone except the owner, is a bad practice. Listing 
            3 shows the fixed script. This version of the script does not pass 
            extra commands to the shell, and will only execute the intended programs. 
             Imagine what could have happened if the Web server had been running 
              as root. The CGI script we started with could have been hijacked 
              to do anything, such as emailing the shadow password file to somebody, 
              or trashing an entire file system. Never run the Web server as root. 
              It is usually necessary to start the Web server as root so that 
              it can open the HTTP port, but it should be configured to change 
              to another user, such as nobody, after it is finished with initialization. 
              In fact, it is not a bad idea to set up a user specifically for 
              running the Web server, because there are often other services that 
              run as nobody. 
            
 Other Problems with Bad Input Data 
            
 Suppose you have a form with three radio buttons on it. At the 
              bottom of the form there is a "submit" button that, when 
              clicked, causes a CGI script to run. These radio buttons select 
              a text file on the server, which the CGI script will write to the 
              user's browser. 
            
 The three possible files that can be selected by this set of radio 
              buttons are "file1", "file2", and "file3". 
              Here is a possible implementation of the CGI code: 
            
 
            
 # Since this CGI script is outputting plain text, not HTML, tell 
 # the browser to expect plain text.
 print "Content-type: text/plain\n\n";
 
 # Write the contents of $radiobutton to stdout. Value should be 
 # file1, file2 or file3 since input could only have come from our 
 # form. There's no need to check it -- since all our users are 
 # nice people :-)
 open (FD, $radiobutton);
 print <FD>;
 
            The problem here is that perhaps the form was not used to send the 
            form input. It is trivial for a hacker to display the source HTML 
            for a form and determine what variables the CGI script is expecting 
            in the form data. From there it is not too much more work for the 
            hacker to manually generate an HTTP request using telnet or 
            a simple HTTP client of the hacker's own creation to send a request 
            to the server containing bad form data. A hacker might have guessed 
            (or known, if you are using a public domain CGI script) that the script 
            you are running to handle the form input has this vulnerability. Instead, 
            this piece of code should do something like this: 
             
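 if ($radiobutton =~ m/^(file1|file2|file3)$/)
 {
     # Tell the browser to expect plain text.
     print "Content-type: text/plain\n\n";
 
     # $1 holds the validated name. The three-argument form of open
     # treats it strictly as the name of a file to read.
     open (FD, '<', $1) or die "Cannot open $1: $!";
     print <FD>;
     close (FD);
 }
 else
 {
     # Either set $radiobutton to some default value and process or 
     # die with error.
 }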
 
            CGI scripts also need to take precautions with plain text input. Consider 
            a system in which users can enter plain text data into a form. Suppose 
            there is a CGI script that handles the form input and saves it in 
            a database verbatim, and another CGI script that retrieves this "plain 
            text" from the database and displays it. The retrieval CGI script 
            might have a section of code that looks like this: 
             
            
 print "<html><title>Here is the text you entered</title><body>";
 print "$userdata\n";
 print "</body></html>";
 
            If $userdata is something like "Hi Fred", then there 
            is no problem. But suppose that when the form data was saved in the 
            database, it contained something like: 
             
            
 <!--#include file="/etc/passwd" -->
 
            If server side includes were turned on in the server, it would display 
            the contents of the password file. 
             There are all kinds of nasty variations on this theme. Something 
              like the following could have been inserted to execute a command 
              to attempt to delete all files on the server: 
            
 
            
 <!--#exec cmd="cd /; rm -rf" -->
            A hacker could even insert HTML designed to blend into the Web site 
            being attacked, complete with a link to a rogue Web site where users 
            might be prompted to enter credit card data for the hacker to steal. 
             To fix this, the CGI script that handles the input should check 
              for "<" and ">" characters 
              in text input that could be used in HTML documents and change those 
              characters to something else, such as &lt; for <, 
              and &gt; for >. Additionally, if server-side 
              includes are enabled, it may be worth turning them off if they are not necessary. 
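As an illustration, the character replacement could look like the following C sketch. The function name html_escape is hypothetical. Besides "<" and ">", this version also replaces "&" and the double quote, which goes slightly beyond the advice above but keeps the output valid when the text already contains those characters.

```c
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, replacing characters that are special in HTML with
 * entities. Returns 0 on success, or -1 if `out` (of size `out_size`,
 * including the terminating null) is too small; `out` is then left empty. */
int html_escape(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;

    for (; *in != '\0'; in++) {
        const char *rep;
        char single[2] = { *in, '\0' };

        switch (*in) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        default:   rep = single;   break;
        }

        size_t n = strlen(rep);
        if (n >= out_size - used) {   /* need room for rep plus the null */
            out[0] = '\0';
            return -1;
        }
        memcpy(out + used, rep, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

Run through this function, a server-side include directive comes out as harmless text that the browser displays rather than markup that the server or browser acts on.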
            
 Buffer Overflows 
            
 A major source of vulnerabilities in C and other compiled languages 
              has been incorrect assumptions about the size of input to the program. 
              Here is an example in C showing how a buffer overflow could occur: 
            
 
            
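 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 
 char query_string_copy [256];
 
 int main (int argc, char *argv [])
 {
     char *qs;
 
     /* Pointer to the query string in the environment. */
     qs = getenv ("QUERY_STRING");
 
     /* UNSAFE: copies with no check against the size of the buffer. */
     strcpy (query_string_copy, qs);
 
     return 0;
 }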
 
            This piece of code gets a pointer to the query string in the environment 
            and makes its own copy of it. However, the buffer that is to receive 
            the copy is only 256 bytes long. If the query string (including the 
            null terminator) is more than 256 bytes long, strcpy will blindly 
            do what it is told and scribble all over whatever comes after query_string_copy 
            in memory. 
             The CGI program may merely crash in a situation like this. However, 
              CGI scripts that are open source and that have bugs like these become 
              easy for dedicated hackers to exploit. A classic form of exploitation 
              of buffer overflows is for the hacker to discover a place in a CGI 
              script where input is not properly length-checked. Then the hacker 
              can design an input string that is intended to overflow the buffer 
              and overlay something specific, such as a return address to the 
              calling function. Once this has happened, the hacker has effectively 
              hijacked the CGI script. The hacker could make the CGI script pass 
              control to some code supplied by the hacker, which could then do 
              just about anything (e.g., deleting files, opening up an xterm on 
              the hacker's host, etc.). 
            
 Fixing this Vulnerability 
            
 When writing CGI code in C, always check the size of all input 
              data and ensure that buffers are never overrun. Avoid the use of 
              the following C library functions, which copy into a destination 
              buffer and do not take a destination length argument or, on some 
              systems, are themselves vulnerable to overflowing of internal buffers: 
            
 
            
 gets (), strcpy (), strcat (), sprintf (), 
 fscanf (), scanf (), sscanf (), vsprintf (), 
 realpath (), getopt (), getpass (), streadd (), 
 strecpy (), strtrns ()
 
            If you use an ANSI C compiler, use function prototypes to ensure that 
            the types of the arguments passed to functions match what the functions 
            expect. If you don't use prototypes, it's very easy to have 
            a type mismatch and never know it. 
             Besides this, it is a matter of fixing compiler warnings, careful 
              inspection, testing, and debugging. 
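A length-checked copy might look like the following sketch. The helper name copy_bounded is hypothetical. It refuses input that does not fit instead of silently truncating it, on the assumption that an oversized query string is better rejected than processed in part.

```c
#include <stddef.h>
#include <string.h>

/* Copy `src` into `dst`, which holds `dst_size` bytes. Returns 0 on
 * success, or -1 if `src` is missing or does not fit (including its null
 * terminator). On failure `dst` is set to the empty string. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0)
        return -1;
    dst[0] = '\0';
    if (src == NULL)
        return -1;

    size_t len = strlen(src);
    if (len >= dst_size)          /* no room for the null terminator */
        return -1;

    memcpy(dst, src, len + 1);    /* length has already been checked */
    return 0;
}
```

In the earlier example, the strcpy call would become copy_bounded (query_string_copy, sizeof query_string_copy, getenv ("QUERY_STRING")), with the script returning an error page whenever the result is not 0. This also covers the case in which QUERY_STRING is not set at all.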
            
 Other Security Gotchas 
            
 Sometimes programs such as shells and interpreters that are designed 
              to run other programs are located in places where they can be invoked 
              by a request to the Web server. For example, in Windows environments, 
              the Perl interpreter (PERL.EXE) may be located in the cgi-bin 
              directory. This is extremely dangerous, because it allows anyone 
              to run arbitrary commands on the server. Do not do it! No program 
              that you don't want the whole world to be able to invoke should 
              be in any directory that is defined to the Web server as a CGI directory. 
            
 Be careful with temporary files because they could disclose information 
              about the CGI script, the configuration of the server, or confidential 
              information about users. If a CGI script has to create temporary 
              files, those files should be created with the most restrictive permissions 
              possible. If no other users need to read or write to the file, don't 
              give them permission to. If there is no need to have the file stay 
              around after the CGI script is no longer running, make sure it gets 
              deleted before the script terminates. If possible, create temporary 
              files in directories that are readable and writable only by the 
              user that the CGI script runs as. 
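In C, one way to follow this advice is to create the file with mkstemp under a restrictive umask and remove it when finished. The sketch below uses hypothetical function names, and the directory in the template passed by the caller is the caller's choice; a directory private to the CGI user is preferable to a shared one such as /tmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a temporary file that only the current user can read or write.
 * `path_template` must be a writable string ending in "XXXXXX"; mkstemp
 * replaces those characters with a unique suffix. Returns an open file
 * descriptor, or -1 on failure. */
int create_private_temp(char *path_template)
{
    mode_t old_mask = umask(077);      /* deny group and other access */
    int fd = mkstemp(path_template);   /* creates and opens in one step */
    umask(old_mask);
    return fd;
}

/* Close the file and remove it so that nothing is left behind.
 * Returns 0 on success, -1 if either step failed. */
int discard_temp(int fd, const char *path)
{
    int rc = close(fd);
    if (unlink(path) != 0)
        rc = -1;
    return rc;
}
```

Because mkstemp picks the name and opens the file in a single step, another user cannot slip a file or link of the same name into place between the two.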
            
 Also, beware of temporary files that text editors and other development 
              tools might leave in a CGI directory. A temporary file created by 
              an editor and left in a CGI directory could enable hackers to run 
              old versions of CGI scripts or get the source code. 
            
 Likewise, core files can also disclose information that could 
              be useful to somebody trying to compromise a system. Maybe a hacker 
              has found a way to make a CGI script core dump, and the hacker knows 
              that the CGI script has some confidential information in variables. 
              The hacker could feed the CGI script input to make it core dump, 
              and then get a copy of the dump. If a CGI script is written in C, 
              then when it is in production, use the setrlimit () system 
              call to limit the size of the core file to 0. 
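The setrlimit () call mentioned above can be wrapped as in this short sketch, where the function name disable_core_dumps is hypothetical. It sets both the soft and the hard limit, so the script cannot raise the limit again later.

```c
#include <sys/resource.h>

/* Set both the soft and hard core file size limits to 0 so that the
 * process cannot write a core file. Returns 0 on success, -1 on error. */
int disable_core_dumps(void)
{
    struct rlimit limit;

    limit.rlim_cur = 0;   /* soft limit */
    limit.rlim_max = 0;   /* hard limit */
    return setrlimit(RLIMIT_CORE, &limit);
}
```

Calling this at the very start of main means the limit is in place before the script has read any input.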
            
 SUID and SGID CGI Scripts 
            
 In UNIX systems, there is a bit in the file permissions called 
              SUID. When the SUID bit is set in a command's file permissions, 
              the program runs with the permissions of the owner of the file, 
              rather than the permissions of the user that started it. Likewise, 
              there is an SGID bit in the file permissions that causes the program 
              to run with the permissions of the group associated with the file. 
              Typically, SUID is used when the script or program needs to be superuser 
              (i.e., root). A well-behaved SUID program gives up its extra privileges 
              as soon as possible. 
            
 It can be dangerous to have SUID or SGID CGI scripts, so their 
              use should be avoided if at all possible. If it is necessary for 
              a CGI script to do something with more privilege than the Web server, 
              take these steps to limit the possible security exposure: 
            
 
            
 1. Do not just make it SUID root. Is there, or could there be, 
              another account that has sufficient privileges but is not superuser? 
              It is better not to run as root if not absolutely necessary. 
            
 2. Do not write a SUID CGI script in a shell scripting language 
              (csh, ksh, etc.). There are too many possible security 
              problems. 
            
 3. Make sure that the CGI script gives up its extra privileges 
              except when it needs them, by setting its effective user ID to the 
              real user ID. 
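Step 3 can be sketched in C with seteuid. The function names below are hypothetical. The sketch assumes a system with saved set-user-IDs, which is what allows the program to get its privileges back later.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Give up the extra privileges of a SUID program for now by setting the
 * effective user ID to the real user ID. The privileged ID is stored in
 * `*saved` so that it can be restored later. Returns 0 on success. */
int drop_privileges(uid_t *saved)
{
    *saved = geteuid();
    return seteuid(getuid());
}

/* Regain the privileges saved by drop_privileges(), just before the one
 * operation that needs them. Returns 0 on success. */
int regain_privileges(uid_t saved)
{
    return seteuid(saved);
}
```

A script would call drop_privileges () first thing, then bracket only the privileged operation with regain_privileges () and another drop_privileges (). Keep in mind that a drop of this kind is temporary: code that takes over the process could call seteuid itself, so the privileged section should be as small as possible.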
            
 
            
 Note that if Perl 5 is used, taint checking is automatically turned 
              on when the script is SUID or SGID. 
            
 Putting a CGI Script in Its Own Sandbox 
            
 In an environment in which there are multiple authors of CGI scripts 
              (e.g., a server that is hosting multiple Web sites), it is sometimes 
              advantageous to run CGI scripts as the user who is responsible for 
              the CGI script, not as the Web server. This is done with a piece 
              of software called a CGI wrapper. 
            
 A commonly used CGI wrapper is called CGIWrap, and it is available 
              from http://www.umr.edu/~cgiwrap. CGIWrap is a SUID CGI script 
              that executes other CGI scripts as the user who owns the file, rather 
              than the Web server. It will run under just about any UNIX-based 
              Web server. Typically, the Webmaster develops a policy that all 
              users' CGI scripts must run under CGIWrap. Users put their CGI 
              scripts in a directory under their home directories, and CGIWrap 
              executes the users' CGI scripts from there. 
            
 As an example of how CGIWrap might be used, suppose that a server 
              runs two Web sites, one owned by user Bob and one owned by user 
              Joe. Bob wants to have some CGI scripts, so he creates a directory 
              called public_html/cgi-bin under his home directory /home/Bob. 
              Joe puts the CGI scripts for his Web site in /home/Joe/public_html/cgi-bin. 
              The executable for CGIWrap goes in the Web server's main cgi-bin 
              directory and is SUID as root. CGIWrap runs all user scripts. 
            
 CGIWrap causes Bob's CGI scripts to run under the permissions 
              of user Bob and Joe's scripts to run as user Joe. Bob's 
              CGI scripts can, if carelessly coded, trash anything writable by 
              Bob, just as Joe's CGI scripts can trash Joe's data. However, 
              unless Bob has given Joe write permission to his files, Joe's 
              CGI scripts cannot trash Bob's data. 
            
 There are other CGI wrappers besides CGIWrap. Another commonly 
              used one is suEXEC, which comes with the Apache Web server. suEXEC 
              operates on the same general principles as other CGI wrappers, but 
              it is designed to take advantage of Apache's implementation 
              and can only be used with Apache. 
            
 In UNIX systems, there is a facility called chroot, which is a 
              way of giving a program its own root file system, outside of which 
              it cannot access anything. For example, if chroot was used to change a program's 
              root file system to /home/Joe, and that program tried to open 
              /etc/hosts, then it would actually open /home/Joe/etc/hosts. 
              Once chroot is done, it cannot be undone. Any programs started by 
              a program running in a chroot environment inherit the parent's 
              chroot environment. In effect, the program is locked in a cage that 
              it cannot break out of. This is a very good way of further restricting 
              the potential damage that untrusted CGI scripts can do. There is 
              another CGI wrapper called sbox, from http://stein.cshl.org/software/sbox 
              that makes use of chroot to restrict the environment in which CGI 
              scripts run. If Joe's CGI scripts are always started in a chroot 
              environment with /home/Joe as the root file system for Joe's 
              CGI scripts, then it is impossible for Joe's CGI scripts to 
              even attempt to access anything outside of /home/Joe. However, 
              using chroot to restrict CGI scripts can involve a lot of work. 
              All files that are needed in order for the CGI scripts to run (e.g., 
              shared libraries, the Perl interpreter, and various configuration 
              files) must exist within the restricted area. This means that 
              directories such as /usr, /tmp, /dev, /etc, 
              and others, will have to be created within the chroot environment. 
              These directories will have to be populated with the subsets of 
              the real directories' files, which are needed in order to support 
              the programs that run under the chroot environment. 
            
 The use of CGI wrappers makes users accountable for the actions 
              of their individual scripts, rather than having an amorphous mass 
              of scripts that various users have responsibility for, all running 
              as "nobody". CGI wrappers are not a security panacea, however. 
              All of the nasty things that CGI scripts running as "nobody" 
              can do can also be done by CGI scripts running as any other user. 
              There are a lot of world readable files on the typical UNIX system 
              that you don't want anybody with a Web browser to access. 
            
 Developing a CGI Security Strategy 
            
 There are obviously many security issues that a Webmaster must 
              consider, and high on the list should be the security issues associated 
              with CGI scripts. 
            
 Taking Responsibility 
            
 The Webmaster must ensure that all CGI scripts placed on any Web 
              server have been through a process to find and fix security holes. 
              Some of the items that should be on the CGI script security checklist 
              include: 
            
 
            
 
            
              -  Is all input parsed to ensure that the input is not going to 
                make the CGI script do something unexpected? Is the CGI script 
                eliminating or escaping shell metacharacters if the data is going 
                to be passed to a subshell? Is all form input being checked to 
                ensure that all values are legal? Is text input being examined 
                for malicious HTML tags? 
              
-  Is the CGI script starting subshells? If so, why? Is there 
                a way to accomplish the same thing without starting a subshell? 
              
-  Is the CGI script relying on possibly insecure environment 
                variables such as PATH? 
              
-  If the CGI script is written in C, or another language that 
                doesn't support safe string and array handling, is there 
                any case in which input could cause the CGI script to store off 
                the end of a buffer or array? 
              
-  If the CGI script is written in Perl, is taint checking being 
                used? 
              
-  Is the CGI script SUID or SGID? If so, does it really need 
                to be? If it is running as the superuser, does it really need 
                that much privilege? Could a less privileged user be set up? Does 
                the CGI script give up its extra privileges when no longer needed? 
              
-  Are there any programs or files in CGI directories that don't 
                need to be there or should not be there, such as shells and interpreters? 
            
 Language Considerations 
            
  One very important thing to consider is what programming languages 
              will be allowed for CGI scripts. Perl has the best security features, 
              but it is an interpreted language and therefore inherently slower 
              than a compiled language like C. If the job can be done quickly 
              enough by a Perl CGI script, it's probably better to go with 
              Perl; otherwise, use C. Never allow CGI scripts to be written in 
              shell scripting languages. There are too many potential security 
              problems. 
            
 Using Other People's Code 
            
 Using code that you have downloaded from the Internet is fine. 
              In fact, it has its advantages. A CGI script that has been used 
              previously by lots of other people probably has fewer bugs than 
              one that somebody has just cooked up. However, you need to be cautious. 
              When you get CGI scripts off the Internet, make sure you check on 
              what bug fixes might be available. Use the most current, stable 
              version. Read through the fix history to make sure you have the 
              latest applicable security bug fixes. Downloaded code should be checked as rigorously 
              as any script written in house. If it is written in a compiled language 
              like C, do not just download a binary and install it, even 
              if you've seen the source code. How do you know that the binary 
              matches the source? 
            
 Conclusion 
            
 CGI security is a difficult and complex subject to tackle. There 
              are many variables, involving the CGI script itself, its environment, 
              the Web server, the operating system, and whatever input all the 
              millions of users might throw at a CGI script. However, it is still 
              extremely important to come to grips with CGI security. Not doing 
              so could be disastrous. 
            
 Charles Walker is a computer consultant specializing in IP 
              based protocols. Originally from the U.S., he currently lives in 
              London. He can be contacted at: chw@trionetworks.com. 
            
 Larry Bennett is a networking consultant specializing in security 
              and performance. He is based near London and can be contacted at: 
              larry.bennett@trionetworks.com.