Word boosting / context biasing

#34
by hoavu1234 - opened

Hi Nvidia team,

Thanks for publishing the model. I'm wondering whether we can increase the probability of certain words without training/finetuning the model?

I've tried changing the decoding strategy to "flashlight" to do word boosting, but the model (TDT) does not support it. Is there any other way? Thank you.
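For reference, here is roughly what I tried. This is a sketch, not verbatim: the model ID is a placeholder and the config field names may differ across NeMo versions.

```python
import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# Placeholder model ID -- substitute the TDT checkpoint you are actually using.
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

decoding_cfg = asr_model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.strategy = "flashlight"  # flashlight beam search supports word boosting for CTC

# For TDT/RNNT models this call fails: only greedy/greedy_batch/beam-style
# strategies are accepted, so there is no flashlight word boosting here.
asr_model.change_decoding_strategy(decoding_cfg)
```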

NVIDIA org

Hi,

We are currently working on word/phrase boosting in shallow-fusion mode for RNNT/TDT models (no additional training/finetuning is needed; you only need a list of your key phrases). I hope this option will appear in the NeMo framework within 1-2 months. At the moment, you can use NGPU-LM for shallow fusion (https://github.com/NVIDIA/NeMo/pull/12729), but it requires training text data to build the n-gram LM and does not work as a true word-boosting approach; rather, it acts as context biasing for a specific data domain.
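To illustrate the shallow-fusion idea itself, here is a toy sketch (illustrative only, not the NGPU-LM API from the PR): during beam search, each candidate token's acoustic score is combined with an external LM score before hypotheses are pruned.

```python
def shallow_fusion_score(am_logprob: float, lm_logprob: float, alpha: float = 0.5) -> float:
    """Combined score: log P_AM(token) + alpha * log P_LM(token | history)."""
    return am_logprob + alpha * lm_logprob

# Toy numbers: the LM pulls the in-domain spelling ahead despite a lower acoustic score.
candidates = [("nvidia", -1.2, -0.8), ("nvdia", -1.0, -6.5)]  # (token, am, lm)
best = max(candidates, key=lambda c: shallow_fusion_score(c[1], c[2]))
print(best[0])  # -> nvidia
```

Tuning alpha trades off acoustic confidence against the strength of the LM/biasing signal, which is why an LM built on the wrong domain can hurt rather than help.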

Hi,
What should I do if I want to transcribe in real time?

Hi @aandrusenko , by any chance could you point me to the PRs or features that enable boosting with a list of key phrases? I could also contribute to speed up the process.

Sorry to bump. @aandrusenko , is there a ticket I could follow for the TDT context-biasing support? I'm not 100% sure I'm following everything, but at least for greedy decoding it seems the approach from your paper should work with the TDT model. Is support basically just refactoring the merging portion of the context-biasing code for a different set of hypotheses? Or, now that durations are predicted and there is no longer a 1:1 mapping between frames and states, does building the context graph itself change?
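For concreteness, here is the kind of structure I mean by "context graph": a prefix trie over tokenized key phrases. This is an illustrative sketch, not the NeMo implementation; the character-level tokenizer is just to keep it self-contained, whereas a real system would use the model's subword tokenizer.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    children: dict = field(default_factory=dict)  # token -> Node
    is_end: bool = False                          # marks a complete key phrase
    phrase: str = ""

def build_context_graph(phrases, tokenize):
    """Build a prefix trie whose paths spell the tokenized key phrases."""
    root = Node()
    for phrase in phrases:
        node = root
        for tok in tokenize(phrase):
            node = node.children.setdefault(tok, Node())
        node.is_end = True
        node.phrase = phrase
    return root

graph = build_context_graph(["nvidia", "nemo"], tokenize=list)
assert "n" in graph.children and "e" in graph.children["n"].children
```

My naive reading is that the trie itself is decoder-agnostic, and only the frame-synchronous matching against it would change for TDT, since predicted durations let the decoder skip frames; but I may be missing something.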
