Egrep for linguists

Egrep for linguists is an extensive egrep tutorial written by Nikolaj Lindberg. It covers Unix/Linux commands useful for text processing, with examples of how to use regular expressions, egrep, sed, sort, uniq, cat, cut, tr, etc. The text has been used in university courses.

[pdf] [html]

$ egrep '^(Hate|Death|Sin)\b' sonnets.txt
  Sin of self-love possesseth all mine eye,
  Death's second self that seals up all in rest.
  Hate of my sin, grounded on sinful loving,
$ egrep '\b[Ss]ources (\w+ ){0,4}said\b' newstext
yesterday, Whitehall sources said the Government may be forced to sus 
          Leadership sources said last night the new initiative would 
  British diplomatic sources in Paris said the joint flypast is inten 
   Senior Tory Party sources said there were practical difficulties
                     Sources close to Hizbollah said in Beirut last n
$ cat newstext|tr -cs '[a-zA-Z0-9-]' '\012'|egrep '\w-\w+-'| 
  egrep -v year-old|sort|uniq -c|sort -rn|head
      5 black-and-white
      4 state-of-the-art
      4 brother-in-law
      4 Wem-ber-lee
      3 vis-a-vis
      3 up-to-date
      3 up-and-down
      3 two-and-a-half
      3 over-the-counter
      3 off-the-record
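
The last pipeline above can be tried without the newstext file. The following is a minimal sketch under a few assumptions: the sample text is made up for illustration, `grep -E` stands in for `egrep` (which newer GNU grep releases mark as obsolescent), and the POSIX class `[[:alnum:]]` replaces `\w`, which is an extension not every grep supports.

```shell
# Hypothetical sample text standing in for the "newstext" file.
sample='A state-of-the-art lab bought state-of-the-art kit.
Her brother-in-law, a 40-year-old man, kept up-to-date notes.'

# 1. tr -cs: squeeze every run of characters that are NOT letters,
#    digits, or hyphens into a single newline (one token per line).
# 2. grep -E: keep tokens that contain at least two hyphens
#    with alphanumeric characters between them.
# 3. grep -v: drop tokens matching "year-old" (e.g. 40-year-old).
# 4. sort | uniq -c | sort -rn: count identical tokens and list
#    the most frequent first.
counts=$(printf '%s\n' "$sample" |
  tr -cs 'a-zA-Z0-9-' '\n' |
  grep -E '[[:alnum:]]-[[:alnum:]]+-' |
  grep -v 'year-old' |
  sort | uniq -c | sort -rn)

printf '%s\n' "$counts"
```

With this sample, state-of-the-art should come out on top with a count of 2, followed by brother-in-law and up-to-date with one occurrence each. On a real corpus, a final `| head` limits the listing to the ten most frequent items, as in the transcript.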