LevSelector.com New York

grep tutorial

www.engin.umich.edu/htbin/mangate?manpage=grep - grep manpage
There are many tutorials and FAQs; search Google for "grep tutorial" or "grep faq".

grep searches files for lines that match a pattern.
      shell> grep foo file(s)
      shell> grep foo *
returns all the lines in file(s) that contain a string matching the expression "foo" (which may be a regular expression).

Another way of using grep is to have it accept data through STDIN and filter to STDOUT. For example,
       shell> ls |grep blah
lists all files in the current directory whose names contain the string "blah".
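
The two calling styles above can be sketched as follows. The scratch directory, the file name sample.txt and its contents are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf 'foo bar\nbaz\nfood\n' > "$dir/sample.txt"

# File argument: print the lines of sample.txt that contain "foo".
from_file=$(grep foo "$dir/sample.txt")

# STDIN: the same filter applied to the output of another command.
from_stdin=$(printf 'foo bar\nbaz\nfood\n' | grep foo)

printf '%s\n' "$from_file"
rm -r "$dir"
```

Both variables hold the same two lines (foo bar and food), since grep only cares about the text it reads, not where it came from.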

Some common options:
-v   - invert (print all lines except those that contain the pattern).
-i    - ignore case of letters (small and capital treated as the same)
-l    - (list) - print only the names of the files where matches are found
-s   - suppress error messages about nonexistent or unreadable files.

-c   - print only a count of the lines that contain the pattern.
-n   - precede each line by its line number in the file (first line is 1).
-h   - suppress the file name that would otherwise be prefixed to each matching line (relevant when searching multiple files).
-w  - search for the expression as a word, as if surrounded by \< and \>.
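
A small sketch of these options in action. The files a.txt and b.txt and their contents are invented for the demonstration; -w is assumed to be available, as it is in GNU and BSD grep.

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'Hello world\nhello there\ngoodbye\nshell\n' > "$dir/a.txt"
printf 'nothing here\n' > "$dir/b.txt"

count=$(grep -c hello "$dir/a.txt")       # lines containing "hello"
count_i=$(grep -ic hello "$dir/a.txt")    # same, ignoring case
inverted=$(grep -vc hello "$dir/a.txt")   # lines NOT containing "hello"
numbered=$(grep -n goodbye "$dir/a.txt")  # prefixed with the line number
names=$(grep -l hello "$dir/a.txt" "$dir/b.txt")      # names of matching files only
with_name=$(grep hello "$dir/a.txt" "$dir/b.txt")     # several files: name is prefixed
no_name=$(grep -h hello "$dir/a.txt" "$dir/b.txt")    # -h drops that prefix
as_text=$(grep -c hell "$dir/a.txt")      # "hell" anywhere in a line
as_word=$(grep -cw hell "$dir/a.txt")     # "hell" only as a whole word

printf '%s\n' "$count $count_i $inverted $as_text $as_word" "$numbered" "$no_name"
rm -r "$dir"
```

Here -c reports 1 line, -ic reports 2, and -vc reports 3. The plain search for hell counts 2 lines (hello there, shell) while -w counts 0, because hell never appears as a separate word.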

Example: find all CGI scripts in the current directory that call a certain stored procedure:
      grep -il myproc *

Example: count how many 'httpd' processes are running (the [h] keeps grep from matching its own entry in the process list):
      ps -ef | grep -c '[h]ttpd'
Example: pipe several greps to filter out things:
      ps -ef | grep -v '^oracle' | grep -v '^root' | grep -v '^nobody'
Example: find all files in the directory tree containing a certain pattern:
      find . -type f -print | xargs grep -ls 'your pattern' /dev/null
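
The find/xargs combination can be tried on a throwaway tree. The directory layout, file names and the word needle below are made up for illustration.

```shell
# Hypothetical directory tree with two files that contain "needle".
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'needle\n' > "$dir/top.txt"
printf 'hay\n' > "$dir/sub/miss.txt"
printf 'a needle here\n' > "$dir/sub/hit.txt"

# List the files under the tree whose contents match; sort for a stable order.
found=$(cd "$dir" && find . -type f -print | xargs grep -ls 'needle' /dev/null | LC_ALL=C sort)

printf '%s\n' "$found"
rm -r "$dir"
```

This prints ./sub/hit.txt and ./top.txt. File names containing spaces would break the xargs pipeline; find . -type f -exec grep -ls 'needle' {} + is one way around that.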
Commonly used find commands:
find . -mtime -1 -print      - find files modified in the last 24 hours
find . -mtime -7 -name 'j*html' -print      - find files matching j*html that were modified within the last 7 days
find . -name '*.pl' -exec perl -wc {} \;    - syntax-check many perl files at once

Note:  the patterns in the examples above are regular expressions, so '^' means "match at the beginning of the line".

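Example:  using dot (matches any single character) and star (the preceding character may occur 0 or more times):
The file for these examples:
      shell> cat file
      bad bug
Wildcards #1:
      shell> grep 'b.*g' file
      bad bug
Wildcards #2 (no output, because no character follows the last g):
      shell> grep 'b.*g.' file
Wildcards #3 (no output, because the pattern needs at least two g's in a row):
      shell> grep 'ggg*' file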

Note:  If the pattern consists of several words, the shell passes each word as a separate argument and grep treats the 2nd word as a file name.  So you need to surround the whole pattern with single quotes.  Use double quotes instead if you want the shell to expand variables, for example:
      grep "$HOME" file       - searches file for the name of your home directory
      grep '$HOME' file        - searches for the string $HOME
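
The difference between the two kinds of quotes can be checked with a short sketch. It uses a made-up variable named user in place of $HOME so that the result does not depend on the environment; the file contents are also invented.

```shell
dir=$(mktemp -d)
user=alice    # hypothetical variable standing in for $HOME
printf '%s\n' 'login alice ok' 'login $user ok' > "$dir/file"

expanded=$(grep "$user" "$dir/file")      # shell substitutes alice before grep runs
literal=$(grep '$user' "$dir/file")       # grep receives the text $user unchanged
two_words=$(grep 'alice ok' "$dir/file")  # quoted, so both words form one pattern

printf '%s\n' "$expanded" "$literal" "$two_words"
rm -r "$dir"
```

The double-quoted pattern finds the line with alice, and the single-quoted one finds the line that literally contains $user.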

Note:  If the pattern contains characters that are special to the shell ( $  *  ?  [  ]  \ ), the shell will interpret them before passing the pattern to grep.  To avoid this, put the pattern in single quotes. Some characters also have a special meaning to grep itself ( .  *  [  ]  ^  $  \ ). If you want just the character itself, escape it with a backslash inside the single-quoted pattern.

     grep 'hello\.gif' file    - matches hello.gif
     grep 'hello.gif' file     - matches lines containing hello.gif , hello-gif , hello1gif , helloagif , etc.
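
A quick check of the escaped and unescaped dot, using a made-up file with four candidate lines:

```shell
dir=$(mktemp -d)
printf 'hello.gif\nhello-gif\nhello1gif\nhellogif\n' > "$dir/file"

escaped=$(grep -c 'hello\.gif' "$dir/file")    # only the literal dot
unescaped=$(grep -c 'hello.gif' "$dir/file")   # any single character in that spot

printf '%s\n' "escaped: $escaped" "unescaped: $unescaped"
rm -r "$dir"
```

The escaped pattern counts 1 line and the unescaped one counts 3. The line hellogif matches neither, since the dot still requires one character between hello and gif.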

Example: using '\?' (the preceding item is optional, i.e. it occurs zero or one time):
     grep 'bugg\?y' file      -  matches bugy and buggy but not bugggy
     grep 'Fred\(eric\)\? Smith' file     - matches Fred Smith or Frederic Smith
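
The optional-item examples can be tried as below. This sketch assumes GNU grep, where \? is accepted in basic regular expressions as an extension; the test lines are made up.

```shell
dir=$(mktemp -d)
printf 'bugy\nbuggy\nbugggy\nFred Smith\nFrederic Smith\nFredrick Smith\n' > "$dir/file"

optional_g=$(grep 'bugg\?y' "$dir/file")                # the second g may be absent
optional_group=$(grep 'Fred\(eric\)\? Smith' "$dir/file")   # the whole group may be absent

printf '%s\n' "$optional_g" "$optional_group"
rm -r "$dir"
```

The first search prints bugy and buggy, the second prints Fred Smith and Frederic Smith.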

Other regex constructs:
    grep '\(abc\)*' file    - matches abc , abcabcabc etc. (i.e. any number of repetitions of the string abc , including the empty string, so every line matches.)
    grep '[Hh]ello' file    - matches lines containing hello or Hello

    [0-3]   is the same as   [0123]
    [a-k]   is the same as   [abcdefghijk]
    [A-C] is the same as [ABC]
    [A-Ca-k] is the same as [ABCabcdefghijk]
There are also some alternate forms :
    [[:alpha:]] is the same as [a-zA-Z]
    [[:upper:]] is the same as [A-Z]
    [[:lower:]] is the same as [a-z]
    [[:digit:]] is the same as [0-9]
    [[:alnum:]] is the same as [0-9a-zA-Z]
    [[:space:]] matches any white space including tabs
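
A short sketch of bracket lists and named classes on a made-up file (one line contains a tab character):

```shell
dir=$(mktemp -d)
printf 'Hello\nhello\nJello\nroom 101\ntab\there\n' > "$dir/file"

either=$(grep '[Hh]ello' "$dir/file")         # H or h, then ello
range=$(grep -c '[0-3]' "$dir/file")          # any of the digits 0 1 2 3
digits=$(grep -c '[[:digit:]]' "$dir/file")   # any digit
spaces=$(grep -c '[[:space:]]' "$dir/file")   # a space or a tab
upper=$(grep -c '^[[:upper:]]' "$dir/file")   # line starts with a capital letter

printf '%s\n' "$either" "$range $digits $spaces $upper"
rm -r "$dir"
```

The first search prints Hello and hello (not Jello). The counts are 1, 1, 2 and 2 in that order.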

The [] may also be used to match any character that is not in the list. This is done by putting a '^' as the first character inside the square brackets.
      grep "([^()]*)a" file     -  returns any line containing a pair of parentheses that are innermost (don't have parentheses inside them) and are followed by the letter "a". So it matches lines such as
   (aksjdhaksj d ka)a
But not a line in which the parentheses followed by "a" contain another pair, such as ((x)y)a

  A closing square bracket loses its special meaning if placed first in a list. For example []12] matches ] , 1, or 2.
  A dash - loses its usual meaning inside a list if it is placed last.
  A caret ^ loses its special meaning if it is not placed first
  Most special characters lose their meaning inside square brackets
  Note that a $ sign loses its meaning if characters follow it
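
These bracket rules can be verified with a sketch like the following, where each line of the first made-up file holds a single character:

```shell
dir=$(mktemp -d)
printf '%s\n' ']' '-' '^' 'x' > "$dir/chars"
printf '%s\n' '(aksjdhaksj d ka)a' '((x)y)a' > "$dir/parens"

bracket_first=$(grep -c '[]x]' "$dir/chars")   # ] placed first is literal
dash_last=$(grep -c '[x-]' "$dir/chars")       # - placed last is literal
caret_later=$(grep -c '[x^]' "$dir/chars")     # ^ not placed first is literal
negated=$(grep -c '[^x]' "$dir/chars")         # ^ placed first negates the list
innermost=$(grep '([^()]*)a' "$dir/parens")    # no parentheses allowed inside

printf '%s\n' "$bracket_first $dash_last $caret_later $negated" "$innermost"
rm -r "$dir"
```

The first three counts are 2 each (the special character plus x), the negated list counts the 3 lines that are not x, and the last search prints only (aksjdhaksj d ka)a.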

Matching a Specific Number Of Repetitions of a Pattern:
Example: searching for a 7 digit phone number like this:
      grep "[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}" file
This matches phone numbers, possibly containing a dash or whitespace in the middle.
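
A sketch of the repetition count on made-up phone numbers. A class name such as [:digit:] only works inside a bracket list, so it is written [[:digit:]] here; the \? is assumed to be supported as in GNU grep.

```shell
dir=$(mktemp -d)
printf '%s\n' '555-1234' '555 1234' '5551234' '55-1234' '555_1234' > "$dir/file"

# Three digits, an optional space or dash, then four digits.
phones=$(grep '[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}' "$dir/file")

printf '%s\n' "$phones"
rm -r "$dir"
```

The first three lines are printed. 55-1234 has too few leading digits, and 555_1234 uses a separator that is not in the list. Note that the pattern is not anchored, so it would also match inside a longer run of digits.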

The $ character matches the end of the line. The ^ character matches the beginning of the line.
      grep "^[[:space:]]*hello[[:space:]]*$" file
      grep "^From.*somename" /var/spool/mail/myname    - searches mail inbox for headers from a particular person.
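
The anchors can be tried on a made-up file as follows (the mail-like lines are invented, not a real mailbox):

```shell
dir=$(mktemp -d)
printf '%s\n' 'hello' '  hello  ' 'hello world' \
    'From: bob@example.org' 'Reply From: bob' > "$dir/file"

# The whole line is "hello", optionally padded with white space.
only_hello=$(grep -c '^[[:space:]]*hello[[:space:]]*$' "$dir/file")

# "From" must start the line, so "Reply From: bob" is skipped.
headers=$(grep '^From.*bob' "$dir/file")

printf '%s\n' "$only_hello" "$headers"
rm -r "$dir"
```

The count is 2 (hello and the padded hello, but not hello world), and the header search prints only From: bob@example.org.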

Vertical line means 'either this or that':
      grep "cat\|dog" file    - matches lines containing the string "cat" or the string "dog"
      grep "I am a \(cat\|dog\)" file    - matches lines containing the string "I am a cat" or the string "I am a dog"
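
A sketch of alternation on a made-up file. It assumes GNU grep, where \| is accepted in basic regular expressions as an extension.

```shell
dir=$(mktemp -d)
printf '%s\n' 'I am a cat' 'I am a dog' 'I am a bird' 'catalog' > "$dir/file"

any=$(grep -c 'cat\|dog' "$dir/file")               # either string, anywhere
phrase=$(grep -c 'I am a \(cat\|dog\)' "$dir/file") # alternation inside a group
words=$(grep -cw 'cat\|dog' "$dir/file")            # either string as a whole word

printf '%s\n' "$any $phrase $words"
rm -r "$dir"
```

The plain alternation counts 3 lines because catalog contains cat. Adding -w or the surrounding phrase narrows that to 2.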

The expression <H[1-6]>.*</H[1-6]> is not good enough to match html headers, since it matches <H1>Hello world</H3> (Error here - opening tag is different from the closing one). Solution - use a backreference
A backreference is an expression \n where n is a number.  It matches the contents of the n'th set of parentheses in the expression.

<H\([1-6]\)>.*</H\1> matches what we were trying to match before.
"Mr \(dog\|cat\) came home to Mrs \1 and they went to visit Mr \(dog\|cat\) and Mrs \2 to discuss the meaning of life" matches only lines in which each Mrs has the same name as the Mr mentioned just before her.

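The header example can be checked with a sketch like this one, using three made-up HTML lines:

```shell
dir=$(mktemp -d)
printf '%s\n' '<H1>Hello world</H1>' '<H1>Hello world</H3>' '<H2>ok</H2>' > "$dir/file"

# Without a backreference the mismatched H1/H3 pair also matches.
loose=$(grep -c '<H[1-6]>.*</H[1-6]>' "$dir/file")

# \1 must repeat the digit captured by the first \( \) group.
strict=$(grep '<H\([1-6]\)>.*</H\1>' "$dir/file")

printf '%s\n' "$loose" "$strict"
rm -r "$dir"
```

The loose pattern counts all 3 lines, while the backreference version prints only the two lines whose opening and closing tags agree.
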
Despite the origin of its name (extended grep), egrep actually has less functionality, as it is designed for compatibility with the traditional egrep. A better way to do an extended "grep" is to use grep -E, which uses extended regular expression syntax without loss of functionality.
grep                        grep -E                    Available for egrep?
a\+                         a+                         yes
a\?                         a?                         yes
expression1\|expression2    expression1|expression2    yes
\(expression\)              (expression)               yes
\{m,n\}                     {m,n}                      no
\{,n\}                      {,n}                       no
\{m,\}                      {m,}                       no
\{m\}                       {m}                        no
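
A few rows of this table can be compared side by side. The sketch assumes GNU grep (for \+ and \| in the basic syntax) and uses a made-up file of short lines.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a' 'aa' 'b' 'ab' 'c' > "$dir/file"

plus_basic=$(grep 'a\+' "$dir/file")        # one or more a, basic syntax
plus_ext=$(grep -E 'a+' "$dir/file")        # same, extended syntax
count_basic=$(grep -c 'a\{2\}' "$dir/file") # exactly two a's in a row
count_ext=$(grep -Ec 'a{2}' "$dir/file")
alt_basic=$(grep -c 'b\|c' "$dir/file")     # b or c
alt_ext=$(grep -Ec 'b|c' "$dir/file")

printf '%s\n' "$plus_basic" "$count_basic $count_ext $alt_basic $alt_ext"
rm -r "$dir"
```

Each pair gives the same result: the + forms print a, aa and ab, the interval forms count 1 line, and the alternation forms count 3.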


This page adapted from http://www.pegasus.rutgers.edu/~elflord/unix/grep.html