Egrep for linguists
Egrep for linguists was written around 1997. It covers Unix/Linux commands that are useful for text processing, with examples of how to use regular expressions, egrep, sed, sort, uniq, cat, cut, tr, etc. The text has been used in university courses.
[pdf] [html]
$ egrep '^(Hate|Death|Sin)\b' sonnets.txt
Sin of self-love possesseth all mine eye,
Death's second self that seals up all in rest.
Hate of my sin, grounded on sinful loving,
$ egrep '\b[Ss]ources (\w+ ){0,4}said\b' newstext
yesterday, Whitehall sources said the Government may be forced to sus
Leadership sources said last night the new initiative would
British diplomatic sources in Paris said the joint flypast is inten
Senior Tory Party sources said there were practical difficulties
Sources close to Hizbollah said in Beirut last n
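The pattern above matches "sources" (or "Sources"), then at most four intervening words, then "said". A minimal sketch of that behaviour on a small invented sample follows. The sample sentences and the variable names `sample` and `matches` are assumptions made for illustration and are not taken from newstext. It uses `grep -E`, the equivalent spelling of `egrep` that recent GNU grep releases recommend, and it relies on `\b` and `\w`, which are GNU extensions.

```shell
# Hypothetical sample text (not from newstext), one sentence per line.
sample='Whitehall sources said the plan may be delayed
Sources close to the minister said nothing
The sources of the river Nile were said to be unknown
No match on this line'

# Keep lines where "sources" is followed by 0 to 4 words and then "said".
# Line 3 has five words between "sources" and "said", so it is rejected.
matches=$(printf '%s\n' "$sample" | grep -E '\b[Ss]ources (\w+ ){0,4}said\b')
printf '%s\n' "$matches"
```

Changing `{0,4}` to `{0,5}` should let the third sample line through as well, which is a quick way to see what the interval does.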
$ cat newstext | tr -cs '[a-zA-Z0-9-]' '\012' | egrep '\w-\w+-' |
    egrep -v year-old | sort | uniq -c | sort -rn | head
5 black-and-white
4 state-of-the-art
4 brother-in-law
4 Wem-ber-lee
3 vis-a-vis
3 up-to-date
3 up-and-down
3 two-and-a-half
3 over-the-counter
3 off-the-record
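The pipeline above can be read stage by stage: `tr` turns every run of characters that are not letters, digits or hyphens into a newline, so each token ends up on its own line; the first `egrep` keeps tokens containing at least two hyphens; the second drops anything containing "year-old"; and `sort | uniq -c | sort -rn` counts the survivors and ranks them by frequency. Below is a small sketch of the same idea on invented input. The sample text and the names `sample` and `counts` are assumptions for illustration. The `tr` set is written without the surrounding brackets here, since GNU `tr` treats brackets in that position as literal characters, and `grep -E` stands in for `egrep`.

```shell
# Hypothetical two-line sample (not from newstext).
sample='A state-of-the-art lab and a black-and-white photo.
Another black-and-white film, an up-to-date list and a 5-year-old boy.'

# 1. tr: split into tokens, one per line (letters, digits, hyphens survive).
# 2. grep -E: keep tokens with at least two hyphens.
# 3. grep -v: drop the very frequent "N-year-old" pattern.
# 4. sort | uniq -c | sort -rn: count each token, most frequent first.
counts=$(printf '%s\n' "$sample" |
  tr -cs 'a-zA-Z0-9-' '\n' |
  grep -E '\w-\w+-' |
  grep -v 'year-old' |
  sort | uniq -c | sort -rn)
printf '%s\n' "$counts"
```

On this sample, black-and-white should come out on top with a count of 2; the order of the tokens that occur once is left to `sort`.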