The New York Times recently published an article about Google’s large-scale deep learning project, a system that learns to discover patterns in huge datasets, including... cats on YouTube!
What’s the point of building a gigantic cat detector, you might ask? When you combine large amounts of data, large-scale distributed computing, and powerful machine learning algorithms, you can apply the technology to a wide variety of practical problems.
With the launch of the latest Android platform release, Jelly Bean, we’ve taken a significant step towards making that technology useful: when you speak to your Android phone, chances are, you are talking to a neural network trained to recognize your speech.
Using neural networks for speech recognition is nothing new: the first proofs of concept were developed in the late 1980s(1), and after what can only be described as a 20-year dry spell, evidence has recently begun to emerge that the technology can scale to modern computing resources(2). What changed? Access to larger and larger databases of speech; advances in computing power, including GPUs and fast distributed computing clusters such as the Google Compute Engine, unveiled at Google I/O this year; and a better understanding of how to scale the algorithms to make them effective learners.
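At its core, a neural network acoustic model maps a short window of audio features to a probability distribution over phonemes. The sketch below is a toy illustration of that idea, not Google's production model: the layer sizes, the phoneme inventory, and the random weights are all placeholder assumptions standing in for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40 filterbank coefficients over an 11-frame
# context window, one hidden layer, and a 43-phoneme inventory.
N_FEATURES = 40 * 11
N_HIDDEN = 512
N_PHONEMES = 43

# Randomly initialized weights stand in for trained parameters.
W1 = rng.normal(0, 0.01, (N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0, 0.01, (N_PHONEMES, N_HIDDEN))
b2 = np.zeros(N_PHONEMES)

def phoneme_posteriors(features):
    """One forward pass: features -> hidden layer -> softmax over phonemes."""
    h = np.maximum(0.0, W1 @ features + b1)   # hidden layer (ReLU)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

frame = rng.normal(size=N_FEATURES)           # stand-in for real audio features
p = phoneme_posteriors(frame)
```

In a real recognizer, these per-frame phoneme probabilities are combined with a language model by a decoder to produce the final word hypothesis; the "scaling" that made this practical is in training such networks on vastly more data with many more parameters.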
The research, which reduces the recognition error rate by over 20%, will be presented(3) at a conference this September, but true to our philosophy of integrated research, we’re delighted to bring the bleeding edge to our users first.
1 Phoneme recognition using time-delay neural networks, A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. J. Lang. IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 3, pp. 328-339, Mar 1989.
2 Acoustic Modeling using Deep Belief Networks, A. Mohamed, G. Dahl and G. Hinton. Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing.
3 Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition, N. Jaitly, P. Nguyen, A. Senior and V. Vanhoucke, Accepted for publication in the Proceedings of Interspeech 2012.