Speech Recognition

I'd like to ask the lovely Python programmers here to create a speech recognition script so we can design and manipulate the interface with voice commands. I know it might be a difficult task, but I think the effort is worth it. Wadaya think?
I'm a designer, not a programmer, so I can only make the proposal and give some ideas.
Would you like to take that challenge?

Why rewrite one? Try using Dragon speech recognition, or something similar, and see how difficult it might be to control.

How would you see it working? How would that be useful? Would it be faster? Do you gain functionality, or is this for those who cannot use a keyboard?

I wish that speech recognition had progressed beyond what it was 5 or even 10 years ago, but it hasn’t progressed enough. I tried Dragon again about 3 months ago and it is still the same dance: much less than 100% recognition in the best of conditions, a long time to train the software, an error rate that goes through the roof as soon as something changes (can’t eat, can’t chew gum, can’t drink, can’t have the flu, a cold, a cold sore ;), can’t argue with your wife, your boss, or both of them, can’t listen to music and hum, can’t do anything with your mouth but speak, not even salivate when the nice secretary passes by).
Not worth the effort.

Jean

Speech recognition, processing, and synthesis is a very hard and open research problem. Many, many people are studying it (and have been for some time now), and while gains are being made, it’s still quite hard (which is why voice recognition software tends not to work so well; to skip all the technical details, it’s largely based on statistical signal processing and very, very robust AI).

Even if there were a library for Python that was halfway decent for voice processing, it’d be hard (presently) to integrate it into Blender and have the script control most UI functions. Even “simple things” like extrude and scale would have to be manually implemented using space handlers and probably some wacky event-catching scheme (since script operation currently blocks the user interface). Perhaps you could code something in Python external to Blender that could somehow interface with the application, but that may not be worth the trouble.

I’ve never used Dragon, but I assume it allows you to map certain words/phrases to a key combination, correct? You’d just have to find a way to do that in Python or (insert language here), but at that point you’re really talking about a standalone application.
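The phrase-to-keystroke mapping described above is the easy part and can be sketched in a few lines. This is a minimal illustration, not any real Dragon or Blender API: the macro names, key sequences, and the idea of returning keys for some injection layer to replay are all assumptions for the example.

```python
# Hypothetical sketch: map recognized phrases to Blender-style
# keystroke sequences, the way a Dragon-like macro tool might.
# Phrase names and key sequences here are illustrative only.

VOICE_MACROS = {
    "extrude": ["E"],
    "scale double": ["S", "2", "RETURN"],
    "rotate x ninety": ["R", "X", "9", "0", "RETURN"],
}

def phrase_to_keys(phrase):
    """Look up a recognized phrase and return the keystrokes to send.

    A real tool would inject the returned keys into the target
    application via a platform keystroke API; here we just return them.
    """
    keys = VOICE_MACROS.get(phrase.lower().strip())
    if keys is None:
        raise KeyError("no macro bound to phrase: %r" % phrase)
    return keys
```

The hard part, of course, is the recognizer that produces the phrase in the first place.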

There are several free/OSS speech recognition programs and toolkits, including:
http://simon-listens.org/index.php?id=122&L=1
http://cmusphinx.sourceforge.net/html/cmusphinx.php

This is traditionally the part where someone says “wait for 2.5”, because then you’ll be able to set up some Python bindings between the UI and something like this: http://www.surguy.net/articles/speechrecognition.xml

That being said, I’m not sure having speech recognition in Blender would be even remotely worth the hassle; it would probably just get annoying after about 10 minutes.

Well, post 2.50 we will be able to plug in two mice/tablets (maybe three if one can fly by the seat of one’s pants :wink: ): it could be that we will have to use speech recognition anyway… although it is possible to use Morse code by blinking.

Jean

A very cool feature of Avid is that it matches the speech to the script, so its accuracy is much, much higher since it already has the words being spoken. It then also marks when those words are spoken and spreads the script out along the actual timeline. There’s a very cool feature video on the Avid site if you have some time to waste.

Uhm, I think what you are not taking into account is that we wouldn’t need recognition of a complete vocabulary; we wouldn’t need to cover the whole English dictionary. It is just a matter of creating “vocal shortcuts”: for example, instead of adding a sphere and typing “R, X, 90” we would say “3DView, Add, Sphere, R, X, 90”, or “3DView, Cube, Ipo, Material, N, Min 1, Max 2”, where 3DView is the already-open window that gets the focus of the commands, Cube is the name of the object, Ipo is the newly focused window, Material is the property to be modified, N is the shortcut for the properties panel in the Ipo Editor, and so on.
It is just a matter of taking the existing keyboard shortcuts and converting them into voice commands. That would make it much easier to train the program to understand us, and easier to build too, I guess.
We do not need aaaaaaall of the UI controlled through voice, but even half of it would go a long way toward freeing us from being just another input device for the computer.
Wadaya think?
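The “vocal shortcuts” grammar proposed above (first token names the window that takes focus, the rest replay existing keyboard shortcuts) could be sketched as a tiny parser. Everything here is hypothetical: the window names and the comma-separated utterance format are just the convention from the example above, not anything Blender actually supports.

```python
# Toy sketch of the proposed "vocal shortcuts" grammar: the first
# token names the window that takes focus, the remaining tokens are
# replayed as the existing keyboard shortcuts. Illustrative only.

KNOWN_WINDOWS = {"3dview", "ipo", "uv/image editor"}

def parse_vocal_shortcut(utterance):
    """Split 'Window, cmd, cmd, ...' into (window, [commands])."""
    tokens = [t.strip() for t in utterance.split(",") if t.strip()]
    if not tokens:
        raise ValueError("empty utterance")
    window = tokens[0]
    if window.lower() not in KNOWN_WINDOWS:
        raise ValueError("unknown window: %r" % window)
    return window, tokens[1:]
```

Restricting the recognizer to such a small, fixed command vocabulary is exactly what makes this easier than general dictation: the acoustic model only has to distinguish a few dozen words.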

Edit
Zoel, very interesting link.

Again: the software to do just that exists, and trial as well as OSS versions are available, so go ahead and try it.
Then come back with some experience. Chances are that you’ll understand that other methods of input are generally preferable. There are good reasons why speech recognition is rarely used.

J.

Personally I like the silence when I am modelling, but yes, I know people who want to make noises, like “bink”, even if it does not affect the workflow.

But one thing I came to think about after your suggestion: at some point we will need better tools for manipulating voices in general. There are multiple needs in animation, like lip sync, sound effects, syncing the music to film, and speech synthesis. What I imagine is manipulating the voice spectrum the way we use nodes today to manipulate colour channels or brightness. Node editors and logic bricks could even be used to study speech synthesis in more depth. Today there are some tools like Praat for these purposes, but I feel Blender could be a more ideal environment for this kind of thing. And once we succeed in understanding how to manipulate the voice spectrum to produce voice effects, we are not far from using it the other way around.
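The node-style spectral manipulation imagined above (treating frequency bands like colour channels) is easy to demonstrate outside Blender with an FFT. This is a minimal sketch with NumPy, not a Blender or Praat API; the function name and band choices are made up for the example.

```python
import numpy as np

# Toy sketch of node-style spectral editing: move a signal into the
# frequency domain, scale one band (like adjusting one colour
# "channel"), and transform back. Purely illustrative.

def scale_band(signal, rate, lo_hz, hi_hz, gain):
    """Multiply the [lo_hz, hi_hz) band of a real signal by `gain`."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    band = (freqs >= lo_hz) & (freqs < hi_hz)
    spectrum[band] *= gain
    return np.fft.irfft(spectrum, n=len(signal))

# Example: silence the 440 Hz component of a 110 Hz + 440 Hz mix.
rate = 8000
t = np.arange(rate) / rate  # one second of samples
mix = np.sin(2 * np.pi * 110 * t) + np.sin(2 * np.pi * 440 * t)
low_only = scale_band(mix, rate, 300, 600, 0.0)
```

A node in a hypothetical “voice compositor” would just be one such band operation chained with others, the same way colour nodes chain channel operations.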

pablow, just search the web. This is not a production-ready possibility at all yet; it is still a subject of scientific research, so Blender is definitely not the right application for such experiments. Once some operating systems and applications fully implement speech recognition without the problems mentioned above, maybe some coder will try to bring it to Blender as well. But that may be 5 or 10 years away, I guess.

But right now it would just be a waste of their precious time.

Mac OS X has Speakable Items; supposedly it can be programmed with some application commands,
although it’s tricky and inconsistent for me; I got fed up with it.