Word boosting / context biasing

#34
by hoavu1234 - opened

Hi Nvidia team,

Thanks for publishing the model. I'm wondering whether we can increase the probability of certain words without training/finetuning the model?

I've tried changing the decoding strategy to "flashlight" to do word boosting, but the model (TDT) does not support it. Is there any other way? Thank you.
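For reference, here is roughly what I tried. This is a sketch, not verbatim: the model ID is a placeholder and the config field names may differ across NeMo versions.

```python
import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# Placeholder model ID -- substitute the TDT checkpoint you are actually using.
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

decoding_cfg = asr_model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.strategy = "flashlight"  # flashlight beam search supports word boosting for CTC

# For TDT/RNNT models this call fails: only greedy/greedy_batch/beam-style
# strategies are accepted, so there is no flashlight word boosting here.
asr_model.change_decoding_strategy(decoding_cfg)
```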

NVIDIA org

Hi,

We are currently working on word/phrase boosting in shallow-fusion mode for RNNT/TDT models (no additional training/finetuning is needed; you only need a list of your key phrases). I hope this option will appear in the NeMo framework within 1-2 months. At the moment, you can use NGPU-LM for shallow fusion (https://github.com/NVIDIA/NeMo/pull/12729), but it requires training text data to build the n-gram LM and does not work as a true word-boosting approach; rather, it acts as context biasing for a specific data domain.
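To illustrate the shallow-fusion idea itself, here is a toy sketch (illustrative only, not the NGPU-LM API from the PR): during beam search, each candidate token's acoustic score is combined with an external LM score before hypotheses are pruned.

```python
def shallow_fusion_score(am_logprob: float, lm_logprob: float, alpha: float = 0.5) -> float:
    """Combined score: log P_AM(token) + alpha * log P_LM(token | history)."""
    return am_logprob + alpha * lm_logprob

# Toy numbers: the LM pulls the in-domain spelling ahead despite a lower acoustic score.
candidates = [("nvidia", -1.2, -0.8), ("nvdia", -1.0, -6.5)]  # (token, am, lm)
best = max(candidates, key=lambda c: shallow_fusion_score(c[1], c[2]))
print(best[0])  # -> nvidia
```

Tuning alpha trades off acoustic confidence against the strength of the LM/biasing signal, which is why an LM built on the wrong domain can hurt rather than help.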

Hi,
What should I do if I want to transcribe in real time?

Hi @aandrusenko , by any chance could you point me to the PRs or features that enable boosting with a list of key phrases? I could also contribute to speed up the process.

Sorry to bump. @aandrusenko , is there a ticket I could follow for the TDT context-biasing support? I'm not 100% sure I'm following everything, but at least for greedy decoding it seems the approach from your paper should work with the TDT model. Is support basically just refactoring the merging portion of the context-biasing code for a different set of hypotheses? Or, now that durations are predicted and there is no longer a 1:1 mapping between frames and states, does building the context graph itself change?
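For concreteness, here is the kind of structure I mean by "context graph": a prefix trie over tokenized key phrases. This is an illustrative sketch, not the NeMo implementation; the character-level tokenizer is just to keep it self-contained, whereas a real system would use the model's subword tokenizer.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    children: dict = field(default_factory=dict)  # token -> Node
    is_end: bool = False                          # marks a complete key phrase
    phrase: str = ""

def build_context_graph(phrases, tokenize):
    """Build a prefix trie whose paths spell the tokenized key phrases."""
    root = Node()
    for phrase in phrases:
        node = root
        for tok in tokenize(phrase):
            node = node.children.setdefault(tok, Node())
        node.is_end = True
        node.phrase = phrase
    return root

graph = build_context_graph(["nvidia", "nemo"], tokenize=list)
assert "n" in graph.children and "e" in graph.children["n"].children
```

My naive reading is that the trie itself is decoder-agnostic, and only the frame-synchronous matching against it would change for TDT, since predicted durations let the decoder skip frames; but I may be missing something.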
