Add/update the quantized ONNX model files and README.md for Transformers.js v3
## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)
### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ❌ `int8`: `encoder_model_int8.onnx` (added but JS-based E2E test failed)

```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:25
            __classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
                                                                                           ^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
    at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:25:92)
    at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:67:29)
    at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```

↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)
### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)
### ✅ Based on `decoder_model_merged.onnx` *without* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)
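The test log above warns that no `dtype` was specified for `decoder_model_merged`, so the fp32 default was used. In Transformers.js v3 the quantization variants listed here can be selected explicitly through the `dtype` loading option, either as a single value or per module. A minimal sketch (the task and model id below are placeholders, not from this PR):

```javascript
import { pipeline } from "@huggingface/transformers";

// One dtype applied to every module of the model:
const pipe = await pipeline("automatic-speech-recognition", "your-org/your-model", {
  dtype: "q4", // e.g. "fp32", "fp16", "int8", "uint8", "q4", "q4f16", "bnb4"
});

// Or a per-module mapping — useful here, since the int8 encoder
// failed its E2E test while the decoder quantizations all passed:
const pipeMixed = await pipeline("automatic-speech-recognition", "your-org/your-model", {
  dtype: {
    encoder_model: "fp32",
    decoder_model_merged: "q4",
  },
});
```

Passing an explicit `dtype` also silences the "Using the default dtype (fp32)" warning shown in the log.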