SC speech engine opened!

Hey there!

A lot of you already know about the work being done to make a speech engine for Siberia Complex, and several are participating in our second voice competition, The Sentence II. As promised in that competition, the speech engine code for our current work will be made public as Open Source.

And this happens now!

The speech engine, loosely named “Chatter” (or Chad amongst friends), is still fairly basic; this dialog is what is being used in the competition. It still lacks emotional tones in the voices, and it takes a little skill to make them sound as you want them to. The latter is partly because there is no dictionary yet; you write the sounds spoken, not the English words. This does give a lot more control, of course! And with a bit of practice, it matches most minor speech engines out there, including several commercial ones (it is actually now surpassing the small commercial one we use for draft animations!).

Chad is at version 0.6.10; the code is generally clean and has basic documentation. It is a single C++ file; no nightmare of file linking! In fact, it seems to be pretty compact compared to other Open Source ones out there. Hardened programmers will quickly find some redundancies and inefficient code, but it has been cleaned up pretty nicely, IMHO. Version 0.7 will be a near-full rewrite of the code to allow for better voices, more voice variation, and better toning of voices for anger, whispering, etc. So, although The Sentence II has a submission deadline of April 30, Chad 0.6.10 is being released to the public now.

Of course, the example sentences included are not those from the competition; those will be released on April 30!

THE CODE for reading or download. It is in one .txt file, and can thus easily be read online, downloaded, and/or copied into your favorite C++ development environment.

Further documentation, including an introduction to the individual speech sounds in there, will follow in the coming weeks.

VERY IMPORTANT

One shortcut we took in the coding is that the speech engine does not actually create sound files :eek: We used the GoldWave sound editor for much of the research, and the files produced are “numerical sound files”. GoldWave will load them with no trouble and can save them as standard sound files in the format of your choice (there is even a Batch function under Files; very fast and VERY nice for complete dialog conversions). If someone wants to write some code to make the engine produce actual sound files, please feel free to do so (hey, it’s Open Source, feel free to do anything. Except selling it; it’s not really yours, remember ;)). If you succeed, please let us know :smiley:

GoldWave is very cheap, not affiliated with us, and has a very generous trial version. Plus it’s a good program. Check it out.

Very interesting.

If there were a really good open speech engine, I would use it. The example of dialogue you’ve provided is a good start for a basic engine, but it is kinda hard to understand and needs a lot of improvement! Shame about GoldWave not being open source or even free.

Keep up the project, though, as a quality, open speech engine would be a great thing to have!

Koba

Yes, it is a shame. But the program is good and cheap, so anyone doing serious sound work on a budget should look into it.

But beyond its use in research, it really is not theoretically essential to Chad. It is ONLY used to convert the numerical sound files to .wav files. All that is needed is someone who knows how to program in that function. From my own research, it should be a very standard operation, but my audio guy knows no programming, and my programmer knows no audio, so I have nobody who knows actual audio programming :frowning: Our attempts to date have yielded nothing but creative errors.

If someone out there knows how to program a sound file converter, please speak up…

On the matter of problems in understanding, yes, we are now up to 20-25% of listeners understanding what is being said with fair precision. That’s what The Sentence II is about: figure out the full dialog and win a free pick at the Blender E-shop :smiley: Actually, people are doing pretty well at it…

Can this be compiled with Turbo C or with cc on Linux?

It’s nice to see some work being done on making an open source speech synthesizer.

Looking at the code, I’m not really convinced that it needs to be written in C++. Plain C would probably have done the job, but I guess the programmer might not have been familiar with it.

How exactly is the sound data being dumped? With that knowledge, it would be possible to write a Python script to convert from these text files to a raw sound file, I think. Having said that, I know very little about audio-related issues such as this.

Aligorith
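To give an idea of how small such a script could be, here is a minimal sketch of the "write a sound file" half, using only Python's standard `wave` and `struct` modules. It is not part of Chad; the function name `write_wav`, the mono layout, the 44100 Hz default rate, and the assumption that samples are floats between -1.0 and 1.0 are all mine and would need to be checked against what the engine really outputs.

```python
import struct
import wave


def write_wav(path, samples, sample_rate=44100):
    """Write mono samples as a 16-bit PCM .wav file.

    Assumption (not confirmed by the Chad source): samples are floats
    in the range -1.0..1.0. Values outside that range are clipped.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                 # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

Calling `write_wav("test.wav", samples)` with a list of floats should give a file that ordinary audio players can open. If the engine's numbers are on another scale (for example integers from -32768 to 32767), the scaling line would have to change accordingly.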

@kkrawal: I do not know those programs, but the code is standard C++. It was written in Dev-C++ on Windows XP (yeah… sorry about that…) but should easily be multiplatform.

@Aligorith: C++ was chosen for its wide popularity, and in part because Blender is written in the language. That latter reason is not so important anymore, but all things come from somewhere :slight_smile: I have personally been looking at some Python code for creating sound files, as it seems easier under Python, and C++ is capable (I believe) of calling a Python program. My experience with Python is limited, though, so it has been postponed.

Anyone with knowledge of Python audio file creation is more than welcome to contact me!

The current audio data is in “numeric text”. GoldWave uses it, and it makes research much easier to do. But real audio IS the next and vital step…
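For anyone who wants to experiment, reading such a file back into numbers might look roughly like the sketch below. The exact layout of the numeric text is not described in this thread, so this assumes one decimal sample value per line, with an optional header line in square brackets that is skipped. The function name `read_numeric_text` is made up for illustration.

```python
def read_numeric_text(lines):
    """Parse sample values from an iterable of text lines.

    Assumed layout (not confirmed here): one number per line, blank
    lines allowed, and an optional header line starting with "[".
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("["):   # skip blanks and assumed header
            continue
        samples.append(float(line))
    return samples
```

With a file on disk, usage would be along the lines of `with open("dialog.txt") as f: samples = read_numeric_text(f)`, where `dialog.txt` is a placeholder name. The resulting list could then be handed to Python's standard `wave` module to produce a real sound file.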

Cognis, you might also want to look at Festival; it is open source/free and, from what I understand, is already fairly advanced.

http://linux-sound.org/speech.html

LetterRip

Thanks, LetterRip. We are already looking at it, but it poses some very troubling problems. Firstly, we simply cannot get it to work :slight_smile: Secondly, it is made up of a range of components that we are trying to get a full overview of, to avoid unwittingly using proprietary software.

Still, there are hopes that Festival, or a derivative version thereof, can be used, should Chad fall short.