
Sharing (old) knowledge #1: Speech recognition and synthesis

23 Mar

I wrote the following information a year ago on the internal forum of the Intelligent Multimodal Interfaces course. Since there are new students in need of fresh info, I am reposting it here; make the best of it:


Sphinx Speech Recognition (open source): a cool demo can be seen here (controlling PureData with speech). I also found this how-to interesting (using Python + Sphinx, from the same author).

Sphinx4 (a rewrite of Sphinx in Java, so more cross-platform). There is also a pocket version for mobile systems (iPhone and so on). It is all part of this project from Carnegie Mellon.


eSpeak (written in C, runs on either Windows or Linux)
FreeTTS (Java, cross-platform)
flite (written in C, once again from CMU)

Web version: AT&T text-to-speech (not open licensed)
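
For anyone who wants to try one of the synthesizers above from a script, here is a minimal sketch of driving the eSpeak command-line tool from Python. It assumes the `espeak` executable is installed and on the PATH; the helper names `espeak_command` and `speak` are made up for this example, and only the `-v` (voice), `-s` (speed in words per minute) and `-w` (write to a WAV file) options are used.

```python
import shutil
import subprocess


def espeak_command(text, voice="en", speed_wpm=160, wav_path=None):
    """Build the argument list for an eSpeak call (hypothetical helper).

    -v selects the voice, -s the speed in words per minute,
    -w writes the audio to a WAV file instead of playing it.
    """
    cmd = ["espeak", "-v", voice, "-s", str(speed_wpm)]
    if wav_path is not None:
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Run eSpeak if it is installed; return False when it is missing."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_command(text, **options), check=True)
    return True
```

Calling `speak("Hello, multimodal world")` should then read the sentence aloud, and `speak("Hello", wav_path="hello.wav")` should save it to a file instead. The same pattern ought to work with other command-line synthesizers, with their own option names.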


Next conference: RecPad 2010

22 Oct

I will be presenting a poster at RecPad 2010, alongside Guilherme Fernandes. We will present our trainable DTW classifier for foot gesture recognition, which we built on a foot-controller device that lets you control the Mt-Djing application, as shown below.


(Controlling Mt-Djing with foot gestures)

The program is now available here.
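
For readers who have not met DTW (dynamic time warping) before, here is a rough sketch of how a trainable DTW classifier can work in general. This is not the code from the poster: it assumes a gesture is a sequence of 2-D points, uses a plain nearest-template rule, and the names `dtw_distance` and `DTWClassifier` and the gesture labels are made up for illustration.

```python
from math import hypot, inf


def dtw_distance(a, b):
    """DTW distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # extend the cheapest of: repeat a point of b, repeat a point
            # of a, or advance both sequences together
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class DTWClassifier:
    """Nearest-template classifier: training just stores labelled examples."""

    def __init__(self):
        self.templates = []  # list of (label, sequence) pairs

    def train(self, label, sequence):
        self.templates.append((label, list(sequence)))

    def classify(self, sequence):
        if not self.templates:
            raise ValueError("no templates have been trained yet")
        label, _ = min(self.templates, key=lambda t: dtw_distance(t[1], sequence))
        return label


# Hypothetical gestures: a swipe to the right and a swipe upwards
clf = DTWClassifier()
clf.train("swipe_right", [(0, 0), (1, 0), (2, 0), (3, 0)])
clf.train("swipe_up", [(0, 0), (0, 1), (0, 2), (0, 3)])
result = clf.classify([(0, 0), (0.5, 0.1), (1.5, 0), (3, 0.1)])
```

"Trainable" here only means that new gestures are added by recording examples; a real system would probably also normalise the input and reject gestures that are too far from every template.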