Commits · Ansemin101/Markit

Update embedding model to Google Generative AI and enhance vector store functionality

4dfec96

AnseMin committed about 1 month ago

Refactor LimitedEnsembleRetriever for improved compatibility and functionality

5da24ca

AnseMin committed on Jun 29

Enhance vector store retrieval with limited results

9e9e9ff

AnseMin committed on Jun 29

Refactor OCR configuration in DoclingParser to use EasyOCR exclusively

18e6067

AnseMin committed on Jun 29

Enhance DoclingParser for CPU-only processing and improved error handling

5e0609f

AnseMin committed on Jun 29

Implement ZeroGPU support in DoclingParser for enhanced document processing

d66e90c

AnseMin committed on Jun 29

Enhance README and parser functionality for improved document processing

4a97b0c

AnseMin committed on Jun 29

Integrate Gemini API for enhanced image processing in MarkItDown

033e4ba

AnseMin committed on Jun 27

Refactor UI components for modular architecture and enhance functionality

6ea41ec

AnseMin committed on Jun 27

Refactor document ingestion and chunking to support LaTeX content

63279a9

AnseMin committed on Jun 26

Enhance UI with new Query Ranker feature and improve document search capabilities

623ad58

AnseMin committed on Jun 25

Add advanced retrieval strategies and update dependencies for RAG implementation

21c909d

AnseMin committed on Jun 25

Enhance Docling and Mistral OCR parsers with improved response handling and logging

c61b4e2

AnseMin committed on Jun 25

Enhance multi-document processing capabilities in parsers

d437733

AnseMin committed on Jun 25

Implement multi-document processing capabilities and enhance UI

111954a

AnseMin committed on Jun 25

Add data clearing service and vector store management

f46dfbd

AnseMin committed on Jun 24

Refactor document ingestion and output file handling

3f1b4af

AnseMin committed on Jun 24

Import configuration from core module in gemini_flash_parser.py to enhance parser functionality.

63f3b68

AnseMin committed on Jun 24

Update .gitignore and enhance README with data management instructions

a4f1c9e

AnseMin committed on Jun 24

Enhance RAG (Retrieval-Augmented Generation) functionality and dependencies

575f1c7

AnseMin committed on Jun 23

Add Docling support for advanced document processing

c0c51c2

AnseMin committed on Jun 23

Implement environment-based UI launch configuration in main.py

57f6aa0

AnseMin committed on Jun 22

Refactor and enhance application structure for Markit_v2

a773878

AnseMin committed on Jun 22

Minor UI edit

55627c9

AnseMin committed on Mar 31

New feature: Mistral OCR

98482ce

AnseMin committed on Mar 31

Modifying the UI

49c5606

AnseMin committed on Mar 31

Change in UI: the bottom margin was white, which was ugly, so it's being changed

7022f7f

AnseMin committed on Mar 31

Initial Implementation of Markitdown. Implemented:

dbdd7c8

AnseMin committed on Mar 31

Approach #2 -- converting latex output from GOT OCR to markdown

5b7f920

AnseMin committed on Mar 19

restore to version 1

23ad33e

AnseMin committed on Mar 19

Tabular is not defined

34d180e

AnseMin committed on Mar 19

Latex2Markdown display changes --attempt1

33f1b65

AnseMin committed on Mar 19

Error: Error processing document with GOT-OCR: cannot pickle '_thread.lock' object

4cac30a

AnseMin committed on Mar 19

ERROR - Failed to load GOT-OCR model: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.

3415bc4

AnseMin committed on Mar 19

runtime error fix

36e49b4

Ansemin101 committed on Mar 19

Please work

4fab3b3

AnseMin committed on Mar 19

handling zerogpu usage

610b772

AnseMin committed on Mar 19

enable zerogpu

62f9c09

AnseMin committed on Mar 19

CUDA dfloat 16 issue again

dcdb7ac

AnseMin committed on Mar 18

fixing the missing path of run_ocr_2.0.py by editing got_ocr_parser.py

7d77a56

AnseMin committed on Mar 18

changes on logging for better debugging

5bb2b30

AnseMin committed on Mar 18

missing run_ocr_2.0.py file

c9c21c7

AnseMin committed on Mar 18

change in strategy --implementing github got ocr instead of hugging face model

0f5865d

AnseMin committed on Mar 18

script to convert markdown to latex, changing UI output to fit right with got ocr

ad248f7

AnseMin committed on Mar 18

restore to check point

2184c47

AnseMin committed on Mar 18

failed to load got ocr model

f89451e

AnseMin committed on Mar 18

Error: Too many output

fa54d05

AnseMin committed on Mar 18

Error: Error processing document with GOT-OCR: GOTQwenForCausalLM.chat() got an unexpected keyword argument 'format'

c4c3253

AnseMin committed on Mar 18

adding format=true

1312a63

AnseMin committed on Mar 18

complete reimplementation of got ocr

3332d94

AnseMin committed on Mar 18
