Deeje Cooley has been calling for an automated podcast transcription service for a while now. I’ve been meaning to comment on this, but wanted to tie in some other thoughts (what’s new).
But today, I learned (via Rubel) that a company named TVEyes intends to offer a service named Podscope (by the end of the month) that will do exactly that. (Video, err, “vodcasts”, “podshows”, insert-clever-meme-name-here, will be supported too. In fact, it’s hard to imagine that any audio content, including recordings of VoIP/Skype conference calls, university/conference lectures, etc., wouldn’t eventually be supported.)
Beautiful! Anything that makes audio and video more searchable, referenceable, etc. would be a huge boon even if that were the only application.
But let’s project out to a future point (just riffing, not doing a feasibility assessment)… Apple, Odeo and others have made creating audio and video an order of magnitude easier. Google, Yahoo and Ourmedia’s efforts allow for a ton of personal audio and video to be hosted server-side at no cost to the publisher… All of it is automatically transcribed to text (and largely, publishers care enough to fix the inevitable transcription mistakes)…
Now, let’s make a leap and assume that all of your transcribed audio is persisted within a personal voice profile (part of Identity 2.0, perhaps) and that it’s accessible via an API. That is, a single service, holding a very large vocabulary built from your continuous, speaker-dependent (i.e., personal) speech, could be invoked by apps that you approve.
This would seem to be a huge boon for IVR applications. Finally, your bank would reliably understand you when you say “account balance” or “let me talk to the operator you lousy piece of…”. More interestingly for our focus here, Yahoo! and Google voice-driven search from your cell phone (etc.) would have a large index of your pronunciations of common words and industry-specific jargon to work with.
Lots of other possibilities of course… drop on by if you care to riff. 🙂
Aside: The other topic… I wanted to tie all of this into the excellent series of posts that Tim Oren has been writing about Machine (Language) Translation, using blog text for language pair seeding. (I guess I just did.) Yes yes, tons of “fidelity” might be lost in early systems, but Voice->Text->Translated Text seems very compelling… and if you could close the loop by going from Translated Text -> Translated Voice… well ok, that’s just crazy talk. 😉
This post got me thinking that a transcription service could be done on a collaborative basis, much like Project Gutenberg.
I’ve posted further thoughts on my weblog.
Jared – An interesting idea in the very near-term. It also made me realize that I should have called out more clearly that the TVEyes solution is 100% automated transcription from audio file to text [I’ve updated my post to reflect this — thank you!].
We humans can relax while it churns, then double-check its work. Even better than parceling out the bits! 🙂
The problem is that the ASR [automated speech recognition] engines of today are not quite there. ASR has been around for a number of years, but it’s only about 80% accurate and it’ll be a long time before it’s close to 100%. While it’s a good idea to automate the conversion from speech to text, there are going to be hiccups. Trust me, I know a bit about this area 😉
On the flipside though, I’d like to see a TTS [text to speech] engine take the blogs I like and convert them into speech so I can use ODEO or AudioBlog to download them as podcasts to my iPod and listen to them during my daily commute. I can then make coffee and not worry about missing out on my fav bloggers’ thoughts for the morning.
The downside is everyone’s podcast would have the same synthesized voice, but at least I’d be able to drive and learn at the same time. We’ve always limped along with technology, and I’d be willing to do that for this little experiment.