Welcome!

Most commercial speech technology applications rely on large amounts of annotated data. The performance of a system depends on the quality of the collected data. It is important that the data are analysed professionally by people with suitable knowledge.

By appointing STTS, you gain a reliable subcontractor with experience in the development of text-to-speech systems, pronunciation lexica, and data processing for automatic speech recognition. We are comfortable handling large amounts of data, and have our own methods and tools to guarantee a quality that is hard to match. Read more about our services, and contact us if you want more information about what we can do for you.

»»» BUY STTS' SWEDISH PRONOUNCING DICTIONARY! «««




About STTS

STTS is a specialist company, focusing on computational linguistics and speech technology. The company offers speech technology advice, general and custom-made synthetic voices, lexicon development, annotation of speech and language data, and other consultancy services in speech technology, including software development.

STTS was founded in 2002 by Harald Berthelsen, Nikolaj Lindberg, Hanna Lindgren and Jessica Waywell. The company is owned by the founders, and has no external investors. Since January 2007, the company office has been located on Östgötagatan, Södermalm, in Stockholm.

The founders have years of experience in research and development in computational linguistics, speech technology and dialogue systems. Their university education covers linguistics, computational linguistics, speech technology, phonetics, programming, Swedish, English, German, Irish and Finnish.

Since its founding, STTS has primarily worked with the development of pronouncing dictionaries for a number of European languages, recording and annotation of speech data for unit selection speech synthesis, and transcription and semantic labelling of speech data to be used in dialogue systems.


Services

STTS offers development for language and speech technology, mainly lexicon databases, speech synthesis and speech recognition. Furthermore, we produce tools for development within this area. If you want more information on our services, please do not hesitate to contact us.

Lexicon

STTS offers general and customised pronunciation lexica for different languages and applications. We provide corrections and updates of existing lexica as well as development of completely new ones. Our services include, for example, development of transcription guidelines, transcriptions, testing and quality assurance.

Speech synthesis

STTS provides development of synthetic voices in different languages. We produce general-purpose as well as custom-made or limited-domain voices. We can be responsible for the whole process or parts of it, including manuscript design, speaker selection, recordings, post-processing and testing.

Speech recognition data

STTS provides assistance with development and maintenance of speech recognition data for different languages. We can manage transcriptions of speech data, semantic categorisation of utterances, etc.

STTS' Swedish pronouncing dictionary

STTS has developed a Swedish pronouncing dictionary of high quality, available for purchase. The lexicon includes morphological information, and is suitable for the development of text-to-speech and automatic speech recognition systems, etc.

The lexicon consists of about 8,500 lemmas along with expanded forms, some 47,000 word forms in total. The words are selected based on frequency, and transcribed according to the SAMPA conventions, with minor adjustments. More information:

DESCRIPTION (PDF) »
SAMPLE FILE (ISO-8859-1, TEXT) »

The lexicon is available for purchase in its current form at the cost of SEK 40,000. Quotations for additions and/or adjustments can be provided upon request.

If you are interested in purchasing this lexicon, or want more information, please contact us!


Sample lexica

We have compiled two sample lexica for demonstration use. One contains Dublin street names in RP English; the other contains a set of Stockholm street names in standard Swedish.

The samples were produced with no specific application in mind. This is not the usual way for STTS to work, since we prefer to formulate, implement and follow phonetic and other guidelines as strictly as possible. Such guidelines are typically tied to a specific application domain. These samples were produced from fresh data, in the sense that STTS has not previously transcribed this data, either for internal use or for any customer.

Symbol set

We try to follow the SAMPA/SAMPROSA conventions as far as possible, but we use a space character, / /, as phoneme delimiter. /$/ is used as a syllable delimiter, and each word is delimited by /#/.
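As an illustration of how these three delimiters nest, the following Python sketch splits a transcription into words, syllables and phonemes. The function name `split_transcription` and the example string are made up for this illustration and are not taken from the STTS sample files. The sketch assumes that /#/ may appear between words as well as at the edges of a transcription.

```python
def split_transcription(transcription):
    """Split a transcription into words -> syllables -> phonemes.

    Assumes '#' delimits words, '$' delimits syllables and a space
    delimits phonemes, as described above.
    """
    words = []
    for word in transcription.split("#"):
        word = word.strip()
        if not word:
            # Skip empty pieces produced by a leading or trailing '#'.
            continue
        syllables = [syllable.split() for syllable in word.split("$")]
        words.append([syllable for syllable in syllables if syllable])
    return words


# Hypothetical example string (stress marks left out for simplicity).
example = "m E $ r I $ @ n # s k w e@"
print(split_transcription(example))
```

The result is a list of words, each a list of syllables, each a list of phoneme symbols.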

File format

The plain text file format of the sample lexicon (generated from an internal database) looks like this:

 <ORTHOGRAPHY>(<TAB><COMMENT>)?
 <TAB><TRANSCRIPTION>
  ...  

The orthography starts a new line, and is followed by an optional tab-separated comment. One or more transcriptions then follow on lines starting with a tab.

The first transcription following an entry should be considered the preferred pronunciation, followed by zero or more variants.
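A reader for this format could look roughly like the Python sketch below. It is not an STTS tool: the function name `parse_lexicon`, the dictionary keys and the sample lines are assumptions made for illustration, and the transcriptions in the sample are placeholders rather than real data.

```python
def parse_lexicon(lines):
    """Parse lexicon lines into a list of entries.

    Each entry is a dict with 'orthography', 'comment' (or None) and
    'transcriptions', where the first transcription is the preferred one.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("\t"):
            # A line starting with a tab is a transcription of the
            # most recent orthography line.
            if not entries:
                raise ValueError("transcription before any orthography line")
            entries[-1]["transcriptions"].append(line.strip())
        else:
            orthography, _, comment = line.partition("\t")
            entries.append({
                "orthography": orthography.strip(),
                "comment": comment.strip() or None,
                "transcriptions": [],
            })
    return entries


# Placeholder data, not taken from the STTS sample files.
sample = [
    "Example Street\ta hypothetical comment\n",
    "\ttranscription-1\n",
    "\ttranscription-2\n",
    "Other Road\n",
    "\ttranscription-3\n",
]
for entry in parse_lexicon(sample):
    print(entry["orthography"], entry["transcriptions"][0])
```

Because the function accepts any iterable of lines, an open file object can be passed in directly; the right text encoding depends on the file at hand.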

If an entry consists of several words that all have multiple transcriptions, all possible combinations have been generated. For example, if an entry consists of two words, one of which has three pronunciations and the other has two, there will be six transcriptions of this item.
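The arithmetic behind this is a Cartesian product over the per-word variants. The short Python sketch below shows the idea; the function name `expand_entry`, the placeholder variants and the choice of joining words with " # " are assumptions for illustration, not a description of how STTS generated the files.

```python
from itertools import product


def expand_entry(word_variants):
    """Return every combination of per-word transcriptions.

    word_variants holds one list of transcriptions per word, each
    ordered with the preferred pronunciation first.
    """
    return [" # ".join(combo) for combo in product(*word_variants)]


# Placeholder variants: three for the first word, two for the second.
variants = [["a1", "a2", "a3"], ["b1", "b2"]]
combined = expand_entry(variants)
print(len(combined))  # 6
```

Since `product` varies the last word fastest, the first combination pairs the preferred pronunciation of every word.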


Dublin

stts_dublin_demo.txt

This is a sample lexicon produced by STTS. It consists of a few Dublin street names, more or less randomly picked. The language is British RP English, and the phoneme set is based on SAMPA.

Words of one syllable have no stress symbol, while all other words have exactly one stressed syllable (/"/), and possibly a syllable with secondary stress (/%/). Syllabic consonants are followed by /=/, in accordance with the X-SAMPA conventions. The syllabification follows the "maximum onset" principle, and has been automatically verified.

We distinguish between the /i:/ and /i/ phonemes (which in SAMPA can be collapsed into a single phoneme), but we do not make a distinction between /u:/ and /u/ (which is also an optional distinction in SAMPA).



Stockholm

stts_sodermalm_demo.txt

This is a sample lexicon produced by STTS. It consists of most of the street names of Södermalm, an island in central Stockholm. The language is standard Swedish, and the phoneme set is based on SAMPA.

Words of one syllable have no stress symbol, while all other words have exactly one stressed syllable (/"/ for accent 1, /""/ for accent 2). Words with accent 2 can also have a secondary stress (/%/).
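The stress rules above can be turned into a small check. The Python sketch below reads them as follows: a monosyllabic word carries no stress symbol, any other word carries exactly one of /"/ or /""/, and /%/ is only allowed together with accent 2. That last point is an interpretation of the description, and the function name `swedish_accent` and the example strings are hypothetical.

```python
def swedish_accent(word):
    """Return 1 or 2 for the word accent, or None for a monosyllable.

    Raises ValueError if the stress marking breaks the rules above.
    Syllables are assumed to be separated by '$'.
    """
    syllables = [s for s in word.split("$") if s.strip()]
    accent2 = word.count('""')
    # Count the single quotes that remain once accent 2 marks are removed.
    accent1 = word.replace('""', "").count('"')
    has_secondary = "%" in word

    if len(syllables) == 1:
        if accent1 or accent2 or has_secondary:
            raise ValueError("monosyllabic word carries a stress symbol")
        return None
    if accent1 + accent2 != 1:
        raise ValueError("expected exactly one stressed syllable")
    if accent1 and has_secondary:
        raise ValueError("secondary stress without accent 2")
    return 2 if accent2 else 1


# Hypothetical transcriptions, not taken from the STTS sample files.
print(swedish_accent('"" g A: $ t a n'))
print(swedish_accent('" g A: $ t a n'))
```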

The syllabification usually follows the morphological boundaries. When no obvious boundaries exist, the "maximum onset" principle has been applied.



Tools

STTS has developed a number of software tools for different purposes. Usually, a tool is created for a customer project with specific requirements for markup, testing and formats. We use the tools internally, and improve them as more projects are carried out.

We use Java, Scala, Tcl/Tk, Ruby, Python and Perl for software development. Our tools can be adapted to most computers and operating systems. Most of our tools are available for licensing or purchase, but they are primarily used for in-house projects. We can also develop tools, custom-made for your requirements. Contact us for more information on prices and terms.



Lexicon

STTS has developed a lexicon tool, LTool, a graphical interface for transcribing pronunciation lexica. The tool can handle both dictionary files and relational databases. Among the most important features are automatic consistency checks and validation. The validation rules are specified per project in an XML format, and can for example be configured to check for transcriptions lacking stress, illegal stress patterns, syllables without vowels, illegal syllabification, or endings not transcribed according to project guidelines.
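To give a feel for what per-project validation rules of this kind do, here is a minimal Python sketch of two of the checks mentioned: transcriptions lacking stress and syllables without vowels. It is not LTool's implementation. LTool reads its rules from an XML format that is not shown here, so this sketch uses plain Python functions instead, and the names `lacks_stress`, `make_vowel_rule` and `validate`, the vowel set and the example strings are all assumptions.

```python
def lacks_stress(syllables):
    """Flag polysyllabic transcriptions with no primary stress mark."""
    if len(syllables) > 1 and not any('"' in syl for syl in syllables):
        return "no stressed syllable"
    return None


def make_vowel_rule(vowels):
    """Build a rule that flags syllables with no phoneme from `vowels`."""
    def syllable_without_vowel(syllables):
        for index, syl in enumerate(syllables, start=1):
            # Drop stress marks so that only phoneme symbols are compared.
            phonemes = [p.strip('"%') for p in syl.split()]
            if not any(p in vowels for p in phonemes):
                return "syllable %d has no vowel" % index
        return None
    return syllable_without_vowel


def validate(transcription, rules):
    """Run every rule on one transcription; return the messages raised."""
    syllables = [s.strip() for s in transcription.split("$")]
    messages = (rule(syllables) for rule in rules)
    return [message for message in messages if message]


# Hypothetical project configuration: a tiny vowel set and two rules.
rules = [lacks_stress, make_vowel_rule({"E", "I", "@"})]
print(validate('" m E $ r I $ @ n', rules))
print(validate("m E $ r $ @ n", rules))
```

Keeping each rule as a separate function mirrors the idea of configuring checks per project: a project picks the rules and phoneme inventory that match its guidelines.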

Speech synthesis

STTS has a number of tools for development of speech synthesis, for example a labelling tool, which we use to verify the automatic labelling of speech databases. We also have tools for lexicon development, see above.

Speech recognition data

STTS has developed a transcription tool, TTool, for efficient manual transcription of recorded speech. Its main components are a graphical user interface, a validation component and a relational database. It is pre-configured with a set of standard tags for transcriptions and labels/events, but can be adapted for other markup systems.

Our categorisation tool, CTool, is used for semantic labelling of utterances, and can be configured with a varying number of semantic label types. It contains an automatic prediction component, which assigns a label that the user can accept or correct. The prediction component is customisable.


Research

STTS aims to work in close contact with research and academia. We do this partly because of our own interest in research, but mainly to improve our work by taking part in the latest developments in speech and language technology. We are currently involved in the following:



Texts

Below are some examples of texts produced by STTS.

XStream

An introduction (Swedish only) to a Java library for XML processing, XStream. In Datormagazin issue number 5, 2006. Can be ordered from www.datormagazin.se.

Java Webstart

An introduction (Swedish only) to Java Webstart, a painless way of distributing Java applications. In Datormagazin issue number 5, 2006. Can be ordered from www.datormagazin.se.

Weka

A text (Swedish only) introducing the freely available Weka machine learning system. Published in Datormagazin issue number 4, 2006. You can order a copy from www.datormagazin.se.

GnuPG mini tutorial

We use GnuPG to encrypt customer data. Here is a mini tutorial for encryption using GnuPG (pdf file, Swedish only).

Egrep for linguists

Egrep for linguists was written around 1997. It deals with different Unix/Linux commands useful for text processing. It contains examples of how to use regular expressions, egrep, sed, sort, uniq, cat, cut, tr, etc. The text has been used in university courses.

[pdf] [html]
$ egrep '^(Hate|Death|Sin)\b' sonnets.txt
  Sin of self-love possesseth all mine eye,
  Death's second self that seals up all in rest.
  Hate of my sin, grounded on sinful loving,
$ egrep -E '\b[Ss]ources (\w+ ){0,4}said\b' newstext
yesterday, Whitehall sources said the Government may be forced to sus 
          Leadership sources said last night the new initiative would 
  British diplomatic sources in Paris said the joint flypast is inten 
   Senior Tory Party sources said there were practical difficulties
                     Sources close to Hizbollah said in Beirut last n
$ cat newstext|tr -cs '[a-zA-Z0-9-]' '\012'|egrep '\w-\w+-'| 
  egrep -v year-old|sort|uniq -c|sort -rn|head
      5 black-and-white
      4 state-of-the-art
      4 brother-in-law
      4 Wem-ber-lee
      3 vis-a-vis
      3 up-to-date
      3 up-and-down
      3 two-and-a-half
      3 over-the-counter
      3 off-the-record

Jobs

At the moment, there are no open positions at STTS.

Projects

From time to time, we need to hire people for short-term projects. These usually involve annotation work, for example transcription of lexica or speech data for different languages. If you are interested in working on such projects, please contact us. We will get in touch with you if something comes up that matches your profile. University studies in linguistics, computational linguistics, language technology, phonetics or one or more languages are prioritised.

Contact

Email: [first-name] [at] stts.se

Phone:
Jessica Waywell (sales) +46 70 378 55 57
Harald Berthelsen +46 70 598 35 35
Nikolaj Lindberg +46 70 629 35 74
Hanna Lindgren +46 70 529 35 49
Sofie Dahl +46 73 640 80 11

Address:
STTS Södermalms talteknologiservice
Östgötagatan 36
SE-116 25 Stockholm, Sweden

Public transport:
Underground: Green line (17/18/19) to Medborgarplatsen or Skanstull
Commuter train: Södra station/Stockholm south

Maps:
Google Maps
eniro.se (in Swedish)
hitta.se (in Swedish)

VAT No: SE556632010601



Copyright

Unless otherwise specified, all material on this website is protected by Swedish copyright law. The website and its contents are property of Södermalms talteknologiservice AB (STTS), and may not be distributed, transmitted, displayed or otherwise published without the written permission of STTS.

Cookies

This website uses cookies to cache the visitor's preferred language. If you do not want this information to be cached, you can modify your browser's cookie settings.



© 2005–2009 Södermalms talteknologiservice AB.