Welcome!
Most commercial speech technology applications rely on large amounts of annotated data. The performance of a system depends on the quality of the collected data. It is important that the data are analysed professionally by people with suitable knowledge.
By appointing STTS, you will have a reliable subcontractor experienced in development of text-to-speech, pronunciation lexica, and data processing for automatic speech recognition. We are comfortable with handling large amounts of data, and have our own methods and tools to guarantee a quality that is hard to match. Read more about our services, and contact us if you want more information about what we can do for you.
About STTS
STTS is a specialist company, focusing on computational linguistics and speech technology. The company offers speech technology advice, general and custom-made synthetic voices, lexicon development, annotation of speech and language data, and other consultancy services in speech technology, including software development.
STTS was founded in 2002 by Harald Berthelsen, Nikolaj Lindberg, Hanna Lindgren and Jessica Waywell. The company is owned by the founders, and has no external investors. Since January 2007, the company office has been located at Östgötagatan, Södermalm, in Stockholm.
The founders have years of experience of research and development in computational linguistics, speech technology and dialogue systems. Their university education covers linguistics, computational linguistics, speech technology, phonetics, programming, Swedish, English, German, Irish and Finnish.
Since its foundation, STTS has primarily worked with development of pronouncing dictionaries for a number of European languages, recording and annotation of speech data for unit selection speech synthesis, and transcription and semantic labelling of speech data to be used in dialogue systems.
Services
STTS offers development for language and speech technology, mainly lexicon databases, speech synthesis and speech recognition. Furthermore, we produce tools for development within this area. If you want more information on our services, please do not hesitate to contact us.
Lexicon
STTS offers general and customised pronunciation lexica for different languages and applications. We provide corrections and updates of existing lexica as well as development of completely new ones. Our services include, for example, development of transcription guidelines, transcriptions, testing and quality assurance.
Speech synthesis
STTS provides development of synthetic voices in different languages. We produce general-purpose as well as custom-made or limited-domain voices. We can be responsible for the whole process or parts of it, including manuscript design, speaker selection, recordings, post-processing and testing.
Speech recognition data
STTS provides assistance with development and maintenance of speech recognition data for different languages. We can manage transcriptions of speech data, semantic categorisation of utterances, etc.
STTS' Swedish pronouncing dictionary
STTS has developed a Swedish pronouncing dictionary of high quality, available for purchase. The lexicon includes morphological information, and is suitable for the development of text-to-speech and automatic speech recognition systems, etc.
The lexicon consists of about 8,500 lemmas along with expanded forms, some 47,000 word forms in total. The words are selected based on frequency, and transcribed according to the SAMPA conventions, with minor adjustments. More information:
DESCRIPTION (PDF) »
SAMPLE FILE (ISO-8859-1, TEXT) »
The lexicon is available for purchase in its current form at the cost of SEK 40,000. Quotations for additions and/or adjustments can be provided upon request.
If you are interested in purchasing this lexicon, or want more information, please contact us!
Sample lexica
We have compiled two sample lexica for demonstration use. One contains Dublin street names in RP English; the other contains a set of Stockholm street names in standard Swedish.
The samples were produced with no specific application in mind. This is not the usual way for STTS to work, since we prefer to formulate, implement and follow phonetic and other guidelines as strictly as possible. Such guidelines are typically tied to a specific application domain. These samples were produced from fresh data, in the sense that STTS has not previously transcribed this data, neither for internal use nor for any customer.
Symbol set
We try to follow the SAMPA/SAMPROSA conventions as far as possible, but we use a space character, / /, as phoneme delimiter. /$/ is used as a syllable delimiter, and each word is delimited by /#/.
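The delimiter scheme above can be unpacked with a few lines of Python. This is only a minimal sketch: the function name `parse_transcription` and the example string are made up for illustration, and it assumes that stress marks appear as separate space-delimited symbols, which the description does not state explicitly.

```python
def parse_transcription(transcription):
    """Split a transcription into words, syllables and symbols.

    Words are separated by '#', syllables by '$', and symbols by spaces.
    """
    words = []
    for word in transcription.split("#"):
        syllables = [syllable.split() for syllable in word.split("$")]
        # Drop empty syllables caused by leading or trailing delimiters.
        syllables = [syllable for syllable in syllables if syllable]
        if syllables:
            words.append(syllables)
    return words


# Illustrative string only, not taken from the sample lexica.
example = '" d V $ b l I n # s t r i: t'
for word in parse_transcription(example):
    print(word)
```

Because empty pieces are discarded, the function behaves the same whether /#/ is written only between words or also at the start and end of an entry.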
File format
The plain text file format of the sample lexicon (generated from an internal database) looks like this:
<ORTHOGRAPHY>(<TAB><COMMENT>)?
<TAB><TRANSCRIPTION>
...
The orthography starts a new line, and is followed by an optional tab separated comment. One or more transcriptions then follow on lines starting with a tab.
The first transcription following an entry should be considered the preferred pronunciation, followed by zero or more variants.
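A reader for this layout could look roughly like the following. It is a sketch under stated assumptions, not the tool STTS uses: the names `Entry` and `parse_lexicon` are hypothetical, the sample text uses placeholder transcriptions (`T1`, `T2`, `T3`) rather than real data, and the input is assumed to be already decoded into a Python string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    orthography: str
    comment: Optional[str] = None
    transcriptions: List[str] = field(default_factory=list)

    @property
    def preferred(self) -> Optional[str]:
        # The first transcription is the preferred pronunciation.
        return self.transcriptions[0] if self.transcriptions else None


def parse_lexicon(text: str) -> List[Entry]:
    entries: List[Entry] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("\t"):
            # A line starting with a tab holds one transcription.
            if not entries:
                raise ValueError("transcription line before any entry")
            entries[-1].transcriptions.append(line.strip())
        else:
            # Orthography, optionally followed by a tab and a comment.
            orthography, _, comment = line.partition("\t")
            entries.append(Entry(orthography.strip(), comment.strip() or None))
    return entries


# Placeholder data for illustration only.
SAMPLE = "Example Street\ta hypothetical comment\n\tT1\n\tT2\nOther Road\n\tT3\n"

for entry in parse_lexicon(SAMPLE):
    print(entry.orthography, "|", entry.preferred, "|", entry.transcriptions[1:])
```

Keeping the transcriptions in file order means the preferred pronunciation is always the first list element and the variants are whatever follows it.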
If an entry consists of several words that all have multiple transcriptions, all possible combinations have been generated. For example, if an entry consists of two words, one of which has three pronunciations and the other has two, there will be six transcriptions of this item.
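The arithmetic in this example (three pronunciations times two gives six) is a Cartesian product, which can be sketched in Python as follows. The function name `combine_variants` and the placeholder strings are assumptions for illustration, as is joining the words with ` # `; the order in which STTS emits the combinations is not stated in the text.

```python
from itertools import product


def combine_variants(word_variants):
    """Return every combination of per-word pronunciations.

    word_variants holds one list of transcriptions per word,
    each list starting with the preferred pronunciation.
    """
    return [" # ".join(combo) for combo in product(*word_variants)]


# Placeholder transcriptions: three variants for the first word, two for the second.
first_word = ["a1", "a2", "a3"]
second_word = ["b1", "b2"]

combinations = combine_variants([first_word, second_word])
print(len(combinations))
for transcription in combinations:
    print(transcription)
```

With `itertools.product`, the first combination pairs the first variant of every word, so if each per-word list starts with the preferred pronunciation, the combined list does too.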
Dublin
stts_dublin_demo.txt
This is a sample lexicon produced by STTS. It consists of a few Dublin street names, more or less randomly picked. The language is British RP English, and the phoneme set is based on SAMPA.
Words of one syllable have no stress symbol, while all other words have exactly one stressed syllable (/"/), and possibly a syllable with secondary stress (/%/). Syllabic consonants are followed by /=/, in accordance with the X-SAMPA conventions. The syllabification follows the "maximum onset" principle, and has been automatically verified.
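The stress rules for the Dublin sample lend themselves to an automatic check. The sketch below is one possible reading, not the verification STTS actually runs: the function name `check_english_stress` is hypothetical, the test words are invented, stress marks are assumed to be separate space-delimited symbols, and "possibly a syllable with secondary stress" is read as "at most one".

```python
def check_english_stress(word):
    """Return a list of problems found in one word's transcription."""
    syllables = [syllable.split() for syllable in word.split("$")]
    primary = sum(syllable.count('"') for syllable in syllables)
    secondary = sum(syllable.count("%") for syllable in syllables)

    problems = []
    if len(syllables) == 1:
        # Monosyllables carry no stress symbol at all.
        if primary or secondary:
            problems.append("monosyllable carries a stress symbol")
    else:
        if primary != 1:
            problems.append(f"expected exactly one primary stress, found {primary}")
        if secondary > 1:  # assumption: at most one secondary stress
            problems.append(f"expected at most one secondary stress, found {secondary}")
    return problems


# Invented transcriptions for illustration.
for word in ['s t r i: t', '" d V $ b l I n', 'd V $ b l I n']:
    print(repr(word), check_english_stress(word))
```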
We distinguish between the /i:/ and /i/ phonemes (which in SAMPA can be collapsed into a single phoneme), but we do not make a distinction between /u:/ and /u/ (which is also an optional distinction in SAMPA).
Stockholm
stts_sodermalm_demo.txt
This is a sample lexicon produced by STTS. It consists of most of the street names of Södermalm, an island part of central Stockholm. The language is standard Swedish, and the phoneme set is based on SAMPA.
Words of one syllable have no stress symbol, while all other words have exactly one stressed syllable (/"/ for accent 1, /""/ for accent 2). Words with accent 2 can also have a secondary stress (/%/).
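Reading the word accent off a Swedish transcription can be sketched as below. The function names `word_accent` and `secondary_stress_ok` are hypothetical and the example transcriptions are invented. The code accepts the stress mark either as a separate symbol or attached to the front of a phoneme, since the text does not say which form the file uses, and it assumes that secondary stress is only legal together with accent 2.

```python
def word_accent(word):
    """Return 1 or 2 for the word accent, or None if there is no stress mark."""
    symbols = word.replace("$", " ").split()
    # Test for the double mark first, because '""' also starts with '"'.
    if any(symbol.startswith('""') for symbol in symbols):
        return 2
    if any(symbol.startswith('"') for symbol in symbols):
        return 1
    return None


def secondary_stress_ok(word):
    """Assumption: a secondary stress (%) may only occur in accent 2 words."""
    return "%" not in word or word_accent(word) == 2


# Invented transcriptions for illustration.
for word in ['"" g A: $ t a n', '" a n $ d e n', 'g A: t']:
    print(repr(word), word_accent(word), secondary_stress_ok(word))
```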
The syllabification usually follows the morphological boundaries. When no obvious boundaries exist, the "maximum onset" principle has been applied.
Tools
STTS has developed a number of software tools for different purposes. Usually, a tool is created for a customer project with specific requirements for markup, testing and formats. We use the tools internally, and improve them as more projects are carried out.
We use Java, Scala, Tcl/Tk, Ruby, Python and Perl for software development. Our tools can be adapted to most computers and operating systems. Most of our tools are available for licensing or purchase, but they are primarily used for in-house projects. We can also develop tools, custom-made for your requirements. Contact us for more information on prices and terms.
Lexicon
STTS has developed a lexicon tool, LTool, a graphical interface for transcribing pronunciation lexica. The tool can handle both dictionary files and relational databases. Among the most important features are automatic consistency checks and validation. The validation rules are specified per project in an XML format, and can for example be configured to check for transcriptions lacking stress, illegal stress patterns, syllables without vowels, illegal syllabification, or endings not transcribed according to project guidelines.
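To give a feel for project-specific validation rules, here is a small Python sketch. It does not reproduce LTool or its rule format: the XML element names (`validation`, `vowels`, `check`), the rule names, the vowel inventory and the function names are all invented for illustration. It implements two of the checks mentioned above, transcriptions lacking stress and syllables without vowels.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file; LTool's real XML format is not shown in this text.
RULES_XML = """
<validation>
  <vowels>a e i o u @</vowels>
  <check name="missing_stress"/>
  <check name="syllable_without_vowel"/>
</validation>
"""


def load_rules(xml_text):
    root = ET.fromstring(xml_text.strip())
    vowels = set(root.findtext("vowels", default="").split())
    checks = [check.get("name") for check in root.findall("check")]
    return vowels, checks


def validate(transcription, vowels, checks):
    """Return the names of the checks that one word's transcription fails."""
    syllables = [syllable.split() for syllable in transcription.split("$")]
    problems = []
    if "missing_stress" in checks and len(syllables) > 1:
        if not any(s.startswith('"') for syllable in syllables for s in syllable):
            problems.append("missing_stress")
    if "syllable_without_vowel" in checks:
        for number, syllable in enumerate(syllables, start=1):
            # Syllabic consonants are not handled in this simplified check.
            if not any(symbol in vowels for symbol in syllable):
                problems.append(f"syllable_without_vowel:{number}")
    return problems


vowels, checks = load_rules(RULES_XML)
for transcription in ['" b a $ t a', 'b a $ t a', '" b a $ t']:
    print(repr(transcription), validate(transcription, vowels, checks))
```

Keeping the rule selection in a data file rather than in code is what lets the same checker serve projects with different guidelines.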
Speech synthesis
STTS has a number of tools for development of speech synthesis, for example a labelling tool, which we use to verify the automatic labelling of speech databases. We also have tools for lexicon development, see above.
Speech recognition data
STTS has developed a transcription tool, TTool, for efficient manual transcription of recorded speech. Its main components are a graphical user interface, a validation component and a relational database. It is pre-configured with a set of standard tags for transcriptions and labels/events, but can be adapted for other markup systems.
Our categorisation tool, CTool, is used for semantic labelling of utterances, and can be configured with different numbers of semantic label types. It contains an automatic prediction component, which assigns a label that the user can accept or correct. The prediction component is customisable.
Research
STTS aims to work in close connection to research and academia. We do this partly because of our own interest in research, but mainly to improve our work by taking part in the latest developments in speech and language technology. We are currently involved in the following:
- STTS was one of the partners of CTT (Centre for Speech Technology) at KTH, the Royal Institute of Technology in Stockholm. As part of the partnership, STTS produced Swedish and English unit selection synthesis for Festival.
- Since 2005, Harald Berthelsen has been a PhD student at Trinity College in Dublin. His work deals with Irish speech synthesis.
- STTS has assisted in supervising students at CTT and the Language engineering programme at the University of Uppsala.
Texts
Below are some examples of texts produced by STTS.
XStream
An introduction (Swedish only) to a Java library for XML processing, XStream. In Datormagazin issue number 5, 2006. Can be ordered from www.datormagazin.se.
Java Webstart
An introduction (Swedish only) to Java Webstart, a painless way of distributing Java applications. In Datormagazin issue number 5, 2006. Can be ordered from www.datormagazin.se.
Weka
A text (Swedish only) introducing the freely available Weka machine learning system. Published in Datormagazin issue number 4, 2006. You can order a copy from www.datormagazin.se.
GnuPG mini tutorial
We use GnuPG to encrypt customer data. Here is a mini tutorial for encryption using GnuPG (pdf file, Swedish only).
Egrep for linguists
Egrep for linguists was written around 1997. It deals with different Unix/Linux commands useful for text processing. It contains examples of how to use regular expressions, egrep, sed, sort, uniq, cat, cut, tr, etc. The text has been used in university courses.
[pdf] [html]
$ egrep '^(Hate|Death|Sin)\b' sonnets.txt
Sin of self-love possesseth all mine eye,
Death's second self that seals up all in rest.
Hate of my sin, grounded on sinful loving,
$ egrep -E '\b[Ss]ources (\w+ ){0,4}said\b' newstext
yesterday, Whitehall sources said the Government may be forced to sus
Leadership sources said last night the new initiative would
British diplomatic sources in Paris said the joint flypast is inten
Senior Tory Party sources said there were practical difficulties
Sources close to Hizbollah said in Beirut last n
$ cat newstext|tr -cs '[a-zA-Z0-9-]' '\012'|egrep '\w-\w+-'|
egrep -v year-old|sort|uniq -c|sort -rn|head
5 black-and-white
4 state-of-the-art
4 brother-in-law
4 Wem-ber-lee
3 vis-a-vis
3 up-to-date
3 up-and-down
3 two-and-a-half
3 over-the-counter
3 off-the-record
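For readers more at home in Python, the last pipeline (count words with at least two hyphens, excluding anything containing "year-old") can be approximated as follows. This is a rough equivalent, not part of the original text: the function name `multi_hyphen_counts` and the sample sentence are invented, and words with equal counts may come out in a different order than from `sort -rn`.

```python
import re
from collections import Counter


def multi_hyphen_counts(text, top=10):
    """Count tokens containing at least two hyphens, most frequent first."""
    # Like tr -cs: split on every run of characters outside a-z, A-Z, 0-9 and '-'.
    tokens = re.split(r"[^A-Za-z0-9-]+", text)
    counts = Counter(
        token
        for token in tokens
        if re.search(r"\w-\w+-", token) and "year-old" not in token
    )
    return counts.most_common(top)


# Invented sample text for illustration.
sample = (
    "A state-of-the-art, up-to-date lab; a 5-year-old, "
    "state-of-the-art kit and a well-known brother-in-law."
)
for word, count in multi_hyphen_counts(sample):
    print(count, word)
```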
Jobs
At the moment, there are no open positions at STTS.
Projects
From time to time, we need to hire people for short-term projects. Usually, it has to do with annotation work, for example transcriptions of lexica or speech data for different languages. If you are interested in working on such projects, please contact us. We will get in touch with you if something comes up that matches your profile. University studies in linguistics, computational linguistics, language technology, phonetics or one or more languages are prioritised.
Contact
| E-mail | [first-name] [at] stts.se | |
| Phone | Jessica Waywell (sales) | +46 70 378 55 57 |
| | Harald Berthelsen | +46 70 598 35 35 |
| | Nikolaj Lindberg | +46 70 629 35 74 |
| | Hanna Lindgren | +46 70 529 35 49 |
| | Sofie Dahl | +46 73 640 80 11 |
| Address | STTS Södermalms talteknologiservice, Östgötagatan 36, SE-116 25 Stockholm, Sweden | |
| Public transport | Underground: Green line (17/18/19) to Medborgarplatsen or Skanstull | |
| | Commuter train: Södra station/Stockholm south | |
| Maps | Google Maps | |
| | eniro.se (in Swedish) | |
| | hitta.se (in Swedish) | |
| VAT No | SE556632010601 | |
Copyright
Unless otherwise specified, all material on this website is protected by Swedish copyright law. The website and its contents are property of Södermalms talteknologiservice AB (STTS), and may not be distributed, transmitted, displayed or otherwise published without the written permission of STTS.
Cookies
This website uses cookies to cache the visitor's preferred language. If you do not want this information to be cached, you can modify your browser's cookie settings.
© 2005–2009 Södermalms talteknologiservice AB. VAT No SE556632010601.