| Sidebar: What Uses Regular Expressions?
 
Regular expressions are a notation for describing string patterns,
used for searching in the UNIX editors, search programs, and the awk
programming language. A pattern can match either a specific fixed
string, such as someone's name, "Larry," or any of a set of possible
strings: "[Ll]arry," for example, matches the name whether or not its
first letter is capitalized.
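
The difference between the fixed string and the bracketed pattern can
be tried from the command line. The following is only a small sketch:
the three sample names are invented for illustration, and grep's -c
option is used to count matching lines rather than print them.

```shell
# Sample input (made up for this example): two spellings of the name
# plus one line that should never match.
names='Larry
larry
Barry'

# Fixed string: only the capitalized spelling matches.
fixed_count=$(printf '%s\n' "$names" | grep -c 'Larry')

# Bracket expression [Ll]: either capitalization matches.
either_count=$(printf '%s\n' "$names" | grep -c '[Ll]arry')

echo "fixed: $fixed_count  either: $either_count"   # fixed: 1  either: 2
```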

The editors ed, ex, and vi; the stream editor, sed; the search
programs grep and egrep (but not fgrep); and the awk language all use
regular expressions in their searching operations. Other programs not
included with UNIX but frequently found on UNIX systems, such as emacs
and Perl, also use regular expressions. Furthermore, even programs
that you write can use regular expressions: the regcmp(1) program
precompiles a regular expression into C source code that can be
included in a C program, and the regcmp(3G) function call lets a C
program compile regular expressions at run time.

Certain characters function as metacharacters; that is, they act as
commands to the pattern matcher instead of standing for themselves.
If a metacharacter is to function as an ordinary character, it must
be preceded by a symbol that prevents its being interpreted as a
command. This is rarely a serious problem, since the metacharacters
are punctuation characters that do not usually appear with the
alphanumeric characters commonly used for searching.

The metacharacter most likely to be needed as a fixed character is
the period (.), which might appear as a decimal point in the midst of
a number you are searching for. Because the period as a metacharacter
stands for any single character, specifying "23.45" actually asks for
"23" followed by any character followed by "45". Since an actual
period qualifies as "any character," the pattern does find "23.45",
but it matches "23945," too! To force the period to be simply a
period, put a backslash (\) in front of it: "23\.45". This is known
as "escaping" the metacharacter.
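
The effect of escaping the period can be checked with grep. This is a
minimal sketch; the third sample line ("2345") is an addition of this
example, included to show that the unescaped period still requires
some character in that position.

```shell
# Sample input: the number wanted, a look-alike, and a near miss.
numbers='23.45
23945
2345'

# Unescaped period: any single character is accepted between 23 and 45.
loose=$(printf '%s\n' "$numbers" | grep -c '23.45')

# Escaped period: only a literal decimal point is accepted.
strict=$(printf '%s\n' "$numbers" | grep -c '23\.45')

echo "unescaped: $loose  escaped: $strict"   # unescaped: 2  escaped: 1
```

The pattern is written in single quotes so that the shell hands the
backslash to grep untouched.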

But if the backslash is used as an escape metacharacter, how do you
specify the backslash? Use "\\", which makes the backslash
metacharacter a regular backslash.
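
A doubled backslash can be tried the same way. In this sketch the two
sample path strings are invented for illustration; printf with a %s
format is used so that the backslash in the data reaches grep
unchanged.

```shell
# Sample input (made up): one line with a backslash, one without.
paths='C:\temp
C:/temp'

# Inside single quotes the shell passes both backslashes to grep,
# and grep reads the pair as one literal backslash.
with_backslash=$(printf '%s\n' "$paths" | grep -c '\\')

echo "lines containing a backslash: $with_backslash"   # 1
```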

The vi editor is a screen-oriented front end to the ex editor. When
you use vi's slash command (/) to search, most of the regular
expression metacharacters will work, but some do not. However, a few
more of the metacharacters work with ex. Recall that any command in
vi beginning with a colon (:) is really an ex command: thus, ":s" is
the ex substitute command, which searches for a regular expression
and replaces what it matches with a replacement string. So, any
metacharacters that do not work in a vi search but do work in ex can
be used by issuing the equivalent ex colon command.

The grep family of programs really consists of three different
programs that all do searching. The fgrep program works only with
fixed strings, so regular expressions cannot be used with it.
(Contrary to popular belief, the f in fgrep does not stand for
"fast", since egrep is faster. It stands for "fixed-string.") The
grep and egrep programs do use regular expressions, but each uses a
different set of metacharacters. egrep uses a larger set, but grep
does use one handy metacharacter set (the \{\} range specifier, which
sets how many times the preceding item may repeat) that egrep does
not use. If you need the range specifier, use grep, not egrep.
Otherwise, egrep is the fastest of the three greps, allows more
metacharacters than grep, and can handle more complex regular
expressions when needed.
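
The three kinds of search can be compared side by side. This sketch
makes one substitution that should be stated plainly: it uses grep -F
and grep -E, the POSIX option spellings for fgrep-style and
egrep-style matching, in place of the separate fgrep and egrep
programs. The sample lines are invented for illustration.

```shell
# Sample input (made up for this example).
lines='23.45
23945
aaa
cat
dog'

# Fixed-string search (fgrep style): the period is only a period.
fixed=$(printf '%s\n' "$lines" | grep -F -c '23.45')

# grep's \{\} range specifier: exactly three consecutive a's.
range=$(printf '%s\n' "$lines" | grep -c 'a\{3\}')

# The larger egrep-style set includes alternation with |.
either=$(printf '%s\n' "$lines" | grep -E -c 'cat|dog')

echo "fixed: $fixed  range: $range  either: $either"
# fixed: 1  range: 1  either: 2
```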

The sed program uses the same regular expressions as grep (but not
the same ones as egrep). The grep programs do not perform
replacements, but the sed program does, so a few metacharacters used
for that are added to sed's list. For instance, the \(\) set allows
specification of a subexpression that can be referred to later in the
search expression or in the replace expression. A reference to the
first subexpression is done with the \1 metacharacter set, to the
second with the \2, and so on for up to \9. The interesting advantage
is that the \# (where # is the subexpression number) refers to the
actual characters matched, which might be unknown until the time they
are matched. Thus, "\(ab.de\)" when referred to by \1 might contain
any character between the "b" and "d," and whatever it turns out to
be will be used by the \1. The search and replace use of regular
expressions is essential to the successful use of sed.
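
A short sketch of both uses of a subexpression follows. The pattern
"\(ab.de\)" is the one from the text; the sample input, the angle
brackets in the replacement, and the "abab" test line are choices
made for this example only.

```shell
# In the replace expression: \1 reproduces whatever \(ab.de\) matched,
# including the middle character that was not known in advance.
wrapped=$(printf 'abcde abXde ab9de\n' | sed 's/\(ab.de\)/<\1>/g')
echo "$wrapped"   # <abcde> <abXde> <ab9de>

# In the search expression: \1 must repeat the text matched by \(ab\),
# so only the line containing "abab" is counted.
doubled=$(printf 'abab\nabcd\n' | grep -c '\(ab\)\1')
echo "$doubled"   # 1
```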

Finally, awk uses the same set of regular expression metacharacters
as egrep in the pattern part of its pattern { action } syntax. Each
of these very powerful programs would have very limited usefulness
without regular expressions. Regular expressions provide the ability
to search for sets of possible strings, the exact contents of which
may be unknown, but for which the format is specifiable.
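
As a closing illustration, here is one way the awk pattern { action }
form might look with an egrep-style pattern. The input records, the
second name, and the choice to print the first field are all
assumptions made for this sketch.

```shell
# Sample records (made up): a name followed by a number.
people='Larry 23.45
moe 23945
Curly 7.50'

# pattern { action }: for each line matching the pattern, print the
# first field. The | alternation belongs to the egrep-style set.
matched=$(printf '%s\n' "$people" | awk '/[Ll]arry|[Mm]oe/ { print $1 }')

echo "$matched"
# Larry
# moe
```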
 
 
 |