We have compiled two sample lexica for demonstration use. One contains Dublin street names, in RP English, the other a set of Stockholm street names, in standard Swedish.
Symbol set
We try to follow the SAMPA/SAMPROSA conventions as far as possible, but we use a space character, / /, as phoneme delimiter. /$/ is used as a syllable delimiter, and each word is delimited by /
The plain text file format of the sample lexicon (generated from an internal database) looks like this:
<ORTHOGRAPHY>(<TAB><COMMENT>)?
<TAB><TRANSCRIPTION>
...
The orthography starts a new line and is followed by an optional tab-separated comment. One or more transcriptions then follow, each on its own line starting with a tab.
The first transcription following an entry should be considered the preferred pronunciation; any further transcriptions (zero or more) are variants.
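The layout described above can be read with a few lines of code. The following is only a minimal sketch in Python, not a tool shipped with the lexica: the function name `parse_lexicon`, the dictionary keys, and the sample strings (including the placeholder symbols `p1`, `q1`, and so on) are invented for illustration and do not come from the sample lexica. It assumes that blank lines, if any, can be skipped.

```python
from typing import List, Optional, TypedDict


class Entry(TypedDict):
    orthography: str
    comment: Optional[str]
    transcriptions: List[str]  # index 0 is the preferred pronunciation


def parse_lexicon(text: str) -> List[Entry]:
    """Parse lexicon text: entry lines, then tab-indented transcriptions."""
    entries: List[Entry] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no information
        if line.startswith("\t"):
            # A transcription line belongs to the most recent entry.
            if not entries:
                raise ValueError("transcription line before any entry")
            # Drop only the leading tab; spaces inside are phoneme delimiters.
            entries[-1]["transcriptions"].append(line[1:])
        else:
            # Entry line: orthography, optionally followed by tab + comment.
            orthography, _, comment = line.partition("\t")
            entries.append(
                {
                    "orthography": orthography,
                    "comment": comment or None,
                    "transcriptions": [],
                }
            )
    return entries


# Invented placeholder data, not taken from the lexica.
sample = (
    "Main Street\tplaceholder comment\n"
    "\tp1 p2 $ p3\n"
    "\tp1 p2 $ p4\n"
    "High Road\n"
    "\tq1 q2\n"
)
entries = parse_lexicon(sample)
preferred = entries[0]["transcriptions"][0]
variants = entries[0]["transcriptions"][1:]
```

Keeping the transcriptions in file order means the preferred pronunciation is always the first list element, and the variants are whatever follows it.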
If an entry consists of several words that all have multiple transcriptions, all possible combinations have been generated. For example, if an entry consists of two words, one of which has three pronunciations and the other has two, there will be six transcriptions of this item.
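The combination step amounts to a Cartesian product over the per-word pronunciation lists. The sketch below illustrates the idea in Python; it is not the procedure used to build the lexica. The function name `combine_pronunciations`, the placeholder symbols, and the `" | "` word separator are assumptions made for the example, and the order in which the lexica list their combinations is not specified here.

```python
from itertools import product
from typing import List, Sequence


def combine_pronunciations(
    word_pronunciations: Sequence[Sequence[str]], word_separator: str
) -> List[str]:
    """Return every combination of one pronunciation per word.

    word_pronunciations holds one list of transcriptions per word.
    word_separator is a hypothetical stand-in for the word delimiter.
    """
    return [
        word_separator.join(choice) for choice in product(*word_pronunciations)
    ]


# Two words: the first has three pronunciations, the second has two.
first_word = ["a1", "a2", "a3"]   # invented placeholders
second_word = ["b1", "b2"]        # invented placeholders
combined = combine_pronunciations([first_word, second_word], " | ")
```

With three pronunciations for one word and two for the other, `combined` holds 3 × 2 = 6 items. Because `itertools.product` keeps the input order, the first item in this sketch pairs the first-listed pronunciation of each word.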