Update README.md
README.md (changed)
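This commit pins the README's examples to the `scb10x/typhoon-ocr-3b` checkpoint: the `vllm serve` commands gain the full model id plus `--max-model-len` and `--served-model-name` flags, the prompt snippets are wrapped in a named `PROMPTS` dict, and `get_prompt` gets a working return statement.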
````diff
@@ -83,7 +83,7 @@ print(markdown)
 
 ```bash
 pip install vllm
-vllm serve scb10x/typhoon-ocr-
+vllm serve scb10x/typhoon-ocr-3b --max-model-len 32000 --served-model-name typhoon-ocr-preview # OpenAI Compatible at http://localhost:8000 (or other port)
 # then you can supply base_url in to ocr_document
 ```
 
````
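The comment in the hunk says the served endpoint can be passed straight to `ocr_document`. A minimal sketch, assuming the `typhoon-ocr` package exposes `ocr_document` with `base_url`/`api_key` keyword arguments (the file name is a placeholder):

```python
# Minimal sketch: point the typhoon-ocr helper at the local vLLM server.
# Assumption: ocr_document takes base_url/api_key kwargs, per the comment above.
from typhoon_ocr import ocr_document

markdown = ocr_document(
    "document.pdf",                        # placeholder path to a PDF or image
    base_url="http://localhost:8000/v1",   # vLLM's OpenAI-compatible endpoint
    api_key="no-key",                      # vLLM ignores the key unless one is configured
)
print(markdown)
```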
````diff
@@ -105,7 +105,7 @@ from openai import OpenAI
 from PIL import Image
 from typhoon_ocr.ocr_utils import render_pdf_to_base64png, get_anchor_text
 
-
+PROMPTS = {
     "default": lambda base_text: (f"Below is an image of a document page along with its dimensions. "
         f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
         f"If the document contains images, use a placeholder like dummy.png for each image.\n"
````
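For context, the `PROMPTS` entries are lambdas that fold the PDF's anchor text into the instruction string. A hedged sketch of building one prompt with the helpers imported above; the page number, `pdf_engine` value, and image dimension are assumptions, not values from the diff:

```python
# Render page 1 and extract its anchor text with the imported helpers.
# Parameter names/values here are assumptions; check typhoon_ocr.ocr_utils.
image_base64 = render_pdf_to_base64png("document.pdf", 1, target_longest_image_dim=1800)
anchor_text = get_anchor_text("document.pdf", 1, pdf_engine="pdfreport")

# Each PROMPTS entry maps anchor text -> the final instruction string.
prompt = PROMPTS["default"](anchor_text)
```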
````diff
@@ -128,7 +128,7 @@ def get_prompt(prompt_name: str) -> Callable[[str], str]:
     :param prompt_name: The identifier for the desired prompt.
     :return: The system prompt as a string.
     """
-    return
+    return PROMPTS.get(prompt_name, lambda x: "Invalid PROMPT_NAME provided.")
 
 
 
````
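With that return in place, `get_prompt` resolves a name to a callable and falls back to an error lambda for unknown names:

```python
# Directly from the hunk above: known names return the prompt builder,
# unknown names return a lambda yielding the error string.
prompt_fn = get_prompt("default")
system_prompt = prompt_fn(anchor_text)
assert get_prompt("does-not-exist")("x") == "Invalid PROMPT_NAME provided."
```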
````diff
@@ -169,8 +169,8 @@ print(text_output)
 *(Not Recommended): Local Model - Transformers (GPU Required)*:
 ```python
 # Initialize the model
-model = Qwen2_5_VLForConditionalGeneration.from_pretrained("scb10x/typhoon-ocr-
-processor = AutoProcessor.from_pretrained("scb10x/typhoon-ocr-
+model = Qwen2_5_VLForConditionalGeneration.from_pretrained("scb10x/typhoon-ocr-3b", torch_dtype=torch.bfloat16 ).eval()
+processor = AutoProcessor.from_pretrained("scb10x/typhoon-ocr-3b")
 
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 model.to(device)
````
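To round out the local path, a hedged sketch of one generation step with the model and processor initialized above. The message layout follows the stock Qwen2.5-VL chat-template usage in `transformers`; it is an assumption, not the repo's exact snippet:

```python
# Build a vision chat message from a PIL page image and the prompt.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},   # PIL.Image of the rendered page
        {"type": "text", "text": prompt},    # prompt from get_prompt("default")(...)
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=4096, repetition_penalty=1.2)
# Strip the prompt tokens and decode only the newly generated ones.
text_output = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(text_output[0])
```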
````diff
@@ -209,7 +209,7 @@ print(text_output[0])
 This model only works with the specific prompts defined below, where `{base_text}` refers to information extracted from the PDF metadata using the `get_anchor_text` function from the `typhoon-ocr` package. It will not function correctly with any other prompts.
 
 ```python
-
+PROMPTS = {
     "default": lambda base_text: (f"Below is an image of a document page along with its dimensions. "
         f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
         f"If the document contains images, use a placeholder like dummy.png for each image.\n"
````
````diff
@@ -240,7 +240,7 @@ repetition_penalty: 1.2
 We recommend to inference typhoon-ocr using [vllm](https://github.com/vllm-project/vllm) instead of huggingface transformers, and using typhoon-ocr library to ocr documents. To read more about [vllm](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)
 ```bash
 pip install vllm
-vllm serve scb10x/typhoon-ocr-
+vllm serve scb10x/typhoon-ocr-3b --max-model-len 32000 --served-model-name typhoon-ocr-preview # OpenAI Compatible at http://localhost:8000
 # then you can supply base_url in to ocr_document
 ```
 
````
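Since the README already imports `OpenAI` (see the hunk header above), calling the vLLM server directly looks roughly like this; the payload shape is the standard OpenAI vision format, and `extra_body` is how vLLM-specific sampling options pass through the client:

```python
from openai import OpenAI

# Talk to the vLLM server started above; the key is ignored unless configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="no-key")
response = client.chat.completions.create(
    model="typhoon-ocr-preview",  # the --served-model-name from `vllm serve`
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},  # built via get_prompt(...)
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
        ],
    }],
    temperature=0.1,                         # assumption; pick per the repo's params
    extra_body={"repetition_penalty": 1.2},  # vLLM-specific, per the setting above
)
print(response.choices[0].message.content)
```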