Commit a6788a4
Parent(s): a3834ca
Upload Salesforce/codegen-350M-mono ctranslate fp16 weights
README.md CHANGED

@@ -11,7 +11,7 @@ Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on
 
 quantized version of [Salesforce/codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono)
 ```bash
-pip install hf-hub-ctranslate2>=2.0.
+pip install hf-hub-ctranslate2>=2.0.8
 ```
 Converted on 2023-05-21 using
 ```
@@ -33,10 +33,11 @@ model = GeneratorCT2fromHfHub(
     model_name_or_path=model_name,
     device="cuda",
     compute_type="int8_float16",
-    tokenizer=AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
+    # tokenizer=AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
 )
 outputs = model.generate(
     text=["def print_hello_world():", "def hello_name(name:"],
+    max_length=64
 )
 print(outputs)
 ```
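The README's headline claim is a 2x-4x memory reduction from int8 inference. A minimal pure-Python sketch of symmetric per-tensor int8 quantization (illustrative only, not CTranslate2's actual kernels; the function names are hypothetical) shows where the 4x figure for float32 weights comes from:

```python
# Illustrative sketch of symmetric per-tensor int8 quantization, the scheme
# behind the ~4x storage saving over float32 weights. This is NOT the
# CTranslate2 implementation; the function names here are hypothetical.

def quantize_int8(weights):
    """Map floats onto the int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.031, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# int8 stores 1 byte per weight vs 4 bytes for float32: a 4x reduction,
# at the cost of a rounding error bounded by scale / 2 per weight.
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

The `int8_float16` compute type in the snippet above pairs this idea with fp16 activations, which is where the more conservative 2x figure applies.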