Nuance - Dragon NaturallySpeaking Solutions

Dragon AudioMining

Enable the Text Search of Words and Phrases Spoken in Audio

The success of any online information management solution hinges upon making the information easily accessible. When the information is not in text form, but instead is in spoken content trapped within audio media, easy access to this information becomes more difficult. Conventional searches for words or phrases will not pinpoint where those terms are discussed within media files — until now. Dragon AudioMining is a revolutionary solution that automatically makes it possible to use text keywords and phrases to search audio files. Using Dragon AudioMining, the valuable information in news clips, government hearings, earnings announcements and analyst briefings can now be searched as easily as Web pages.

Nuance's AudioMining technology creates XML speech index data for every word spoken within an audio file. For each word, the index records the recognized text, a timestamp, a confidence level and any metadata associated with the speech information. The index can be created from both broadcast-quality and telephony-quality sources. XML speech index data makes the speech information within rich media files visible to text-based Web crawlers and search products, unlocking the information hidden within digital audio files on the Web and within private media archives.
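To make the idea of a word-level index concrete, here is a minimal sketch of a keyword lookup over such data. The XML layout is an assumption made for illustration: the element name `word` and the attributes `start` and `confidence` are hypothetical, since the actual AudioMining schema is not shown on this page. The function `find_keyword` and its `min_confidence` parameter are likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical index layout: element and attribute names are assumptions,
# not the real AudioMining schema.
SAMPLE_INDEX = """<speechindex source="earnings_call.wav">
  <word start="12.40" confidence="0.94">quarterly</word>
  <word start="12.95" confidence="0.91">earnings</word>
  <word start="13.60" confidence="0.42">rose</word>
  <word start="47.10" confidence="0.88">earnings</word>
</speechindex>"""


def find_keyword(index_xml, keyword, min_confidence=0.5):
    """Return the start times (in seconds) at which keyword was recognized.

    Words whose confidence falls below min_confidence are ignored.
    """
    root = ET.fromstring(index_xml)
    wanted = keyword.lower()
    hits = []
    for word in root.iter("word"):
        text = (word.text or "").strip().lower()
        confidence = float(word.get("confidence"))
        if text == wanted and confidence >= min_confidence:
            hits.append(float(word.get("start")))
    return hits


print(find_keyword(SAMPLE_INDEX, "earnings"))
```

Each returned value is an offset into the audio file, so a player could seek directly to the point where the keyword was spoken. Filtering on confidence is one plausible use of that field; a real application might instead rank results by it.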

Automatic XML speech indexing eliminates the time and cost associated with manually indexing rich media, enables the indexing of 100% of the speech information within audio files, and integrates with standard text-search products to enable rapid access to specific audio content. Applications include text-based search and precise audio playback within Web search, content management, CRM and media archive applications.

The underlying speech recognition software that AudioMining employs is the same software used in Dragon NaturallySpeaking. This technology has been proven in the marketplace on hundreds of thousands of machines. AudioMining extends the statistical models that support this recognition engine to a scalable server architecture, specifically tailored for use in the production and presentation of streaming media.

The Dragon AudioMining software development kit (SDK) enables XML speech indexing and search capabilities to be added to commercial, Web and custom applications. It is ideal for application and Web developers, system integrators and OEM customers.

Many organizations in the corporate, legal, call recording and broadcasting markets are looking for ways to leverage the information embedded in their spoken content assets. The Dragon AudioMining SDK allows software developers to integrate high-accuracy speech cataloging capabilities into virtually any hardware or software environment.

The AudioMining SDK provides developers with outstanding speech recognition accuracy across a wide variety of audio qualities, including broadcast and telephony. This is accomplished by offering developers a choice of acoustic models, which work in conjunction with complex language analysis to produce unsurpassed results. Accuracy can be further enhanced through the use of the AudioMining SDK Voc Tool. This tool provides an automated method for updating the speech engine's active dictionary with unique terms, such as industry-specific terminology and proper names.

The AudioMining SDK creates an XML speech index file of the spoken content in digital media. This standard-format file contains all the recognized words, as well as the timestamps for when those words were spoken in the digital media file. Developers can easily use this XML file as source data in virtually any searching or parsing process. The AudioMining SDK has been designed for integration, maintaining a small system footprint and modest memory requirements while still delivering better accuracy than systems twice its size.
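As one example of using the index file as source data for a search process, the sketch below matches a multi-word phrase against consecutive recognized words and returns the time at which each match begins. The XML layout (a `word` element with a `start` attribute) is hypothetical and stands in for the real file format, which is not documented here. The helper names `load_words` and `find_phrase` are also assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical index layout, for illustration only.
SAMPLE_INDEX = """<speechindex>
  <word start="3.10">the</word>
  <word start="3.30">analyst</word>
  <word start="3.85">briefing</word>
  <word start="4.50">begins</word>
  <word start="61.20">analyst</word>
  <word start="61.90">briefing</word>
</speechindex>"""


def load_words(index_xml):
    """Parse the index into a list of (lowercased word, start time in seconds)."""
    root = ET.fromstring(index_xml)
    return [
        ((w.text or "").strip().lower(), float(w.get("start")))
        for w in root.iter("word")
    ]


def find_phrase(words, phrase):
    """Return the start time of every run of consecutive words equal to phrase."""
    target = phrase.lower().split()
    if not target:
        return []
    tokens = [text for text, _ in words]
    n = len(target)
    return [
        words[i][1]
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == target
    ]


words = load_words(SAMPLE_INDEX)
print(find_phrase(words, "analyst briefing"))
```

Because the timestamps travel with the words, the same pass that finds a phrase also yields the playback position, which is what allows a text search result to link to a precise point in the audio.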

The Dragon AudioMining SDK utilizes the same core speech recognition technologies that have powered the award-winning Dragon NaturallySpeaking family of speech recognition products for over a decade. Now this technology takes a groundbreaking new step with Nuance's introduction of the Dragon AudioMining SDK, compatible with both the Windows 2000 and the Windows XP operating systems. Other components of the Dragon AudioMining SDK include:

  • AudioMining Runtime License Packs
    For deploying applications developed using the Dragon AudioMining SDK Development System.
  • AudioMining Custom Vocabularies
    For increasing recognition accuracy in content specific to the legal, medical, financial and technology markets.

The Dragon AudioMining SDK supports the following audio file types, in mono and stereo unless otherwise noted, at sample rates from 8 kHz to 99 kHz:

  • WAVE PCM (mono only)
  • MS ADPCM
  • IMA ADPCM
  • a-law
  • mu-law
  • VOX
  • MP3
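An application might screen files against these limits before submitting them for indexing. The sketch below does so for PCM WAVE files only, using Python's standard `wave` module, and applies the two constraints stated above: mono for WAVE PCM and a sample rate between 8 kHz and 99 kHz. It is an assumption that such a pre-check is useful; the SDK's own validation behavior is not described on this page, and `check_wav` and `make_wav` are hypothetical helper names.

```python
import io
import wave

# Limits taken from the format list above.
MIN_RATE_HZ = 8_000
MAX_RATE_HZ = 99_000


def check_wav(source):
    """Return a list of problems; an empty list means the file fits the limits.

    source may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside the supported range")
    return problems


def make_wav(channels, rate):
    """Build a short silent 16-bit PCM WAVE in memory for the demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * 100)
    buf.seek(0)
    return buf


print(check_wav(make_wav(1, 16_000)))
print(check_wav(make_wav(2, 4_000)))
```

The `wave` module reads uncompressed PCM only, so the other formats in the list (ADPCM, a-law, mu-law, VOX, MP3) would need a different reader.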

Datasheet

  • AudioMining datasheet