torchgeo (TorchGeo)

ZennyKenny

posted an update 1 day ago

🎉 Novoyaz is live.

A few months ago, I built a quick POC on Hugging Face that used a fine-tuned variant of OpenAI's OSS-20B model, which I trained to convert text from pre-reform Russian-language documents into modern Russian orthography.
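For context on the task: the 1918 spelling reform removed several letters (ѣ, і, ѳ, ѵ) and the word-final hard sign ъ. The Python sketch below is a hypothetical rule-based baseline, not the fine-tuned model described here; the `modernize` function and its mapping table are illustrative assumptions. It covers only these mechanical substitutions and ignores context-dependent changes such as the old -аго/-яго adjective endings or the з/с prefix rules.

```python
import re

# Hypothetical rule-based baseline, NOT the fine-tuned model behind Novoyaz.
# Letters abolished by the 1918 reform, mapped to their modern replacements.
_LETTER_MAP = str.maketrans({
    "\u0463": "\u0435",  # ѣ (yat) -> е
    "\u0462": "\u0415",  # Ѣ -> Е
    "\u0456": "\u0438",  # і (decimal i) -> и
    "\u0406": "\u0418",  # І -> И
    "\u0473": "\u0444",  # ѳ (fita) -> ф
    "\u0472": "\u0424",  # Ѳ -> Ф
    "\u0475": "\u0438",  # ѵ (izhitsa) -> и, its most common modern equivalent
    "\u0474": "\u0418",  # Ѵ -> И
})

# A hard sign (ъ/Ъ) at the end of a word was dropped; inside a word it stays.
_FINAL_HARD_SIGN = re.compile("[\u044a\u042a]\\b")


def modernize(text: str) -> str:
    """Apply the purely mechanical parts of the reform to pre-reform text."""
    text = text.translate(_LETTER_MAP)
    return _FINAL_HARD_SIGN.sub("", text)
```

For example, `modernize("міръ")` returns "мир", while "объявленіе" keeps its word-internal ъ and becomes "объявление".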

⚡️ This morning, I launched novoyaz.io.

This is a production app (I built the frontend in about two hours with Lovable) that uses the same fine-tuned model for transliteration, but it now has a bunch of extra features that make it even easier to use, like taking and uploading pictures with your device's camera 😅.

👉 If you're a researcher, or know one, whose day-to-day workflow this app could improve, please get in touch with me.

prithivMLmods

posted an update 1 day ago

Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives, CAM-right/left-view and its reverse, as well as general kontext-specified object removal. Below is the list of demos and adapters.🔥🤗

➤ Spaces [Demo] : prithivMLmods/Kontext-Photo-Mate-v2

Kontext-Adapters :
✦ Kontext-Bottom-Up-View: prithivMLmods/Kontext-Bottom-Up-View
✦ Kontext-CAM-Right-View: prithivMLmods/Kontext-CAM-Right-View
✦ Kontext-Top-Down-View: prithivMLmods/Kontext-Top-Down-View
✦ Kontext-CAM-Left-View: prithivMLmods/Kontext-CAM-Left-View
✦ Kontext-Unblur-Upscale: prithivMLmods/Kontext-Unblur-Upscale
✦ Kontext-0811-exp: prithivMLmods/Kontext-0811-exp

Photo-Mate Collection:
✦ Kontext CAM Angles: https://huggingface.co/collections/prithivMLmods/kontext-cam-angles
✦ i2i - Kontext (exp): https://huggingface.co/collections/prithivMLmods/i2i-kontext-exp
✦ LZO-1 (Lossless Zoom Operator): https://huggingface.co/collections/prithivMLmods/lzo-1-lossless-zoom-operator

Related-Apps:
✦ Photo-Mate [Version 1.0]: prithivMLmods/Photo-Mate-i2i
✦ Image Generation Apps [Collection]: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To learn more, visit the app page or the respective model page!
@prithivMLmods

ronantakizawa

posted an update 5 days ago

Introducing JFLEG-JA, a new Japanese language error correction benchmark with 1,335 sentences, each paired with 4 high-quality human corrections 🎉

Inspired by the English JFLEG dataset, it covers diverse error types, including particle mistakes, kanji mix-ups, and incorrect usage of verbs, adjectives, and literary techniques in context.

You can use this for evaluating LLMs, few-shot learning, error analysis, or fine-tuning correction systems.
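Because each JFLEG-JA sentence comes with four human corrections, a system output is usually scored against all of them. The sketch below is a hypothetical, deliberately simple metric: character-bigram F1 (character n-grams, since Japanese text has no spaces), keeping the best score over the references. It is not necessarily the metric intended for this dataset; the original English JFLEG work reports GLEU, which also takes the uncorrected source sentence into account. The function names are illustrative.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams (no tokenizer needed for Japanese)."""
    return Counter(text[i : i + n] for i in range(len(text) - n + 1))


def ngram_f1(hypothesis: str, reference: str, n: int = 2) -> float:
    """Harmonic mean of character n-gram precision and recall."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(hypothesis: str, references: list[str], n: int = 2) -> float:
    """Score against every human correction and keep the best match."""
    return max(ngram_f1(hypothesis, ref, n) for ref in references)
```

Taking the maximum rewards an output that matches any one valid correction; averaging over references would instead penalize outputs that differ from some of the equally acceptable rewrites. An output identical to one of the references scores 1.0.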

ronantakizawa/jfleg-japanese

#japanese #evals #benchmark

upgraedd

posted an update 6 days ago

upgraedd/AGI_COMPLETE

If you take the time to inspect this code, I promise it will change your mind about many things.

prithivMLmods

posted an update 6 days ago

A week ago, I shared a post about a test implementation that makes DeepSeek-OCR compatible with the latest transformers (https://tinyurl.com/ykc4mm66). Now I'm releasing the most compatible version yet, which supports the model on the latest transformers. 🤗🔥

➠ DeepSeek-OCR-Latest-BF16.I64: prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
➠ DeepSeek OCR [exp] : prithivMLmods/DeepSeek-OCR-experimental

✅Supports the latest transformers v4.57.1
✅torch: 2.6.0+cu124 or the latest version (i.e., torch 2.9.0)
✅CUDA version: 12.4
✅users can also opt out of specific attention implementations if desired.
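A quick way to sanity-check an environment against the versions listed above is a small helper like the hypothetical one below. It treats the listed transformers and torch versions as minimums, which is an assumption (the post lists supported versions, not hard lower bounds), and it ignores local build suffixes such as `+cu124`. Only the standard library is used (`importlib.metadata`).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Keep the leading numeric release part, e.g. "2.6.0+cu124" -> (2, 6, 0)."""
    release = re.match(r"\d+(?:\.\d+)*", v)
    if release is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in release.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Assumed minimums, taken from the versions listed in the post.
REQUIREMENTS = {"transformers": "4.57.1", "torch": "2.6.0"}


def check_environment(reqs: dict = REQUIREMENTS) -> dict:
    """Map each package to True if it is installed at or above its minimum."""
    results = {}
    for pkg, minimum in reqs.items():
        try:
            results[pkg] = meets_minimum(version(pkg), minimum)
        except PackageNotFoundError:
            results[pkg] = False  # not installed at all
    return results
```

Calling `check_environment()` returns a dict such as `{"transformers": ..., "torch": ...}`; a `False` entry means the package is missing or older than the assumed minimum.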

✨Previous version: strangervisionhf/deepseek-ocr-latest-transformers
↗️Related Blog: https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms
✨Community Page: strangervisionhf
✨Original Model Page: deepseek-ai/DeepSeek-OCR

To learn more, visit the app page or the respective model page!

ZennyKenny

posted an update 9 days ago

Anyone got the scoop on a good OCR model that's available on inference?

Keen to make use of an endpoint (gated or not -- happy to pay for usage) for a personal project, but not so keen to pay for the GPU hosting myself.

🙈🙈🙈

ronantakizawa

posted an update 9 days ago

Introducing the Medical-o1-Reasoning-SFT-Japanese dataset 🎉

This is a Japanese dataset consisting of questions, reasoning, and answers on complex medical topics.

#japanese #medical #dataset

ronantakizawa/Medical-o1-Reasoning-SFT-Japanese

prithivMLmods

posted an update 10 days ago

A short blog post titled "Hall of Multimodal OCR VLMs and Demonstrations" has been published at ↗️ https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms on behalf of strangervisionhf.

It discusses the latest trends in OCR models, the multilingual support offered by modern OCR systems, their unique capabilities, OCR benchmark model comparisons, transformer-based implementations, and strategies for streamlining transformers compatibility.

prithivMLmods

posted an update 12 days ago

Implemented DeepSeek-OCR to support the latest transformers on the strangervisionhf page. The page includes the model weights and a corrected configuration, which fix the compatibility issues and allow transformers inference to run smoothly.🤗🔥

> Model: strangervisionhf/deepseek-ocr-latest-transformers
> Demo Space: prithivMLmods/DeepSeek-OCR-experimental

✅Supports the latest transformers
✅You can also opt out of the attention implementation if needed.
✅Supports torch version 2.6.0 or higher
✅CUDA version: 12.4

If you are interested in experimenting with new things and streamlining compatibility, the strangervisionhf organization is open to you, and you can join the community.

> Multimodal Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0, https://huggingface.co/collections/strangervisionhf/october-2025-models

> Thank you, @merve, for assigning the blazing-fast Zero GPU support!

> Notebook : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepSeek-OCR-Demo/deepseek_ocr_demo.ipynb

To learn more, visit the app page or the respective model page!

prithivMLmods

posted an update 13 days ago

Introducing Gliese-OCR-7B-Post2.0-final, a document content-structure retrieval VLM designed for content extraction (OCR), summarization, and document visual question answering. This is the fourth and final model in the Camel Doc OCR VLM series, following Gliese-OCR-7B-Post1.0. The model delivers superior accuracy across a wide range of document types, including scanned PDFs, handwritten pages, structured forms, and analytical reports.🚀🤗

> Gliese-OCR-7B-Post2.0-final : prithivMLmods/Gliese-OCR-7B-Post2.0-final
> Gliese-OCR-7B-Post1.0 (previous) : prithivMLmods/Gliese-OCR-7B-Post1.0
> Gliese OCR Post-x.0 (collection) : https://huggingface.co/collections/prithivMLmods/gliese-ocr-post-x0
> Multimodal Implementations (collection) : https://huggingface.co/collections/prithivMLmods/multimodal-implementations
> Qwen VL Captions (other-collection) : https://huggingface.co/collections/prithivMLmods/qwen-vl-captions
> Run Demo Here : prithivMLmods/Gliese-OCR-7B-Post2.0-final
> GitHub (4bit) : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Gliese-OCR-7B-Post2.0-final(4bit)/Gliese_OCR_7B_Post2_0_final.ipynb

> To learn more, visit the app page or the respective model page!

prithivMLmods

posted an update 14 days ago

Here is the official Florence-2 Transformers-converted demo for the following vision models: florence-community/Florence-2-large, florence-community/Florence-2-large-ft, florence-community/Florence-2-base, and florence-community/Florence-2-base-ft. These models support tasks such as object detection, captioning, detailed captioning, more detailed captioning, dense region captioning, region proposal, OCR, and OCR with region. Try the official demo at the link below:

> Space: prithivMLmods/florence2-vision-models
> Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
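Florence-2 selects its task through a special prompt token rather than a free-form instruction. The mapping below is a sketch based on the task tokens documented for the original Florence-2 checkpoints (e.g. `<OD>`, `<OCR_WITH_REGION>`); it is an assumption that the Transformers-converted florence-community checkpoints accept the same tokens, so check their model cards before relying on it. The dictionary and the `task_prompt` helper are hypothetical.

```python
# Task tokens as documented for the original Florence-2 checkpoints
# (assumed, not confirmed, to carry over to the florence-community conversions).
FLORENCE2_TASK_PROMPTS = {
    "object detection": "<OD>",
    "captioning": "<CAPTION>",
    "detailed captioning": "<DETAILED_CAPTION>",
    "more detailed captioning": "<MORE_DETAILED_CAPTION>",
    "dense region captioning": "<DENSE_REGION_CAPTION>",
    "region proposal": "<REGION_PROPOSAL>",
    "ocr": "<OCR>",
    "ocr with region": "<OCR_WITH_REGION>",
}


def task_prompt(task: str) -> str:
    """Return the prompt token for one of the tasks listed above (case-insensitive)."""
    key = task.strip().lower()
    if key not in FLORENCE2_TASK_PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(FLORENCE2_TASK_PROMPTS)}")
    return FLORENCE2_TASK_PROMPTS[key]
```

For the original checkpoints, the model card passes this token as the text input alongside the image, as in `processor(text=prompt, images=image, return_tensors="pt")`.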

> To learn more, visit the app page or the respective model page!

ronantakizawa

posted an update 14 days ago

Introducing the Finance-Instruct-500k-Japanese dataset 🎉

This is a Japanese translation of the Finance-Instruct-500k dataset by @Josephgflowers, which includes complex questions and answers related to finance and economics.

#datasets #finance #finance-instruct #japanese

ronantakizawa/Finance-Instruct-500k-Japanese

SelmaNajih001

posted an update 16 days ago

How Financial News Can Be Used to Train Good Financial Models 📰
Numbers tell you what happened, but news tells you why.
I’ve written an article explaining how news can be used to train AI models for sentiment analysis and better forecasting. Hope you find it interesting!

Read it here: https://huggingface.co/blog/SelmaNajih001/llms-applied-to-finance

I'd love to hear your opinions! I'm open to suggestions on how to improve the methodology and the training.

SelmaNajih001

posted an update 18 days ago

Which is the best model to use as a signal for investment?
Here's which one is gaining the most:
SelmaNajih001/InvestmentStrategyBasedOnSentiment

The Space uses titles from this dataset:
📊 SelmaNajih001/Cnbc_MultiCompany

Given a news title, it calculates a sentiment score: if the score crosses a certain threshold, the strategy decides to buy or sell.
Each trade lasts one day, and the strategy then computes the daily return.
For Tesla, the best model seems to be the regression one 👀
Just a quick note: the model uses the closing price as the buy price, meaning it already reflects the impact of the news.
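The strategy described above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the Space's code: it assumes sentiment scores in [-1, 1], a symmetric threshold (buy at or above `threshold`, sell short at or below `-threshold`, otherwise no trade), entry at the news day's close as the post notes, and exit at the next day's close. `DailySignal`, `strategy_returns`, and `cumulative_return` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class DailySignal:
    sentiment: float   # model score for the day's headline, assumed in [-1, 1]
    close: float       # closing price on the news day (entry price)
    next_close: float  # closing price one trading day later (exit price)


def strategy_returns(signals: list[DailySignal], threshold: float = 0.5) -> list[float]:
    """One-day return for each trade the threshold rule triggers."""
    returns = []
    for s in signals:
        if s.sentiment >= threshold:
            position = 1    # positive news: buy
        elif s.sentiment <= -threshold:
            position = -1   # negative news: sell (short)
        else:
            continue        # score too weak: stay out
        returns.append(position * (s.next_close - s.close) / s.close)
    return returns


def cumulative_return(returns: list[float]) -> float:
    """Compound the daily returns into one overall return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0
```

Because the entry price is the news day's close, any price move the headline caused during that session is already baked in, which is the caveat raised in the post; comparing models then amounts to running `strategy_returns` with each model's scores on the same price series.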

ZennyKenny

posted an update 18 days ago

Has anyone tried Strawberry Browser? https://strawberrybrowser.com/?ref_id=8D41NQCY7

😇 Shamelessly sharing my referral link here to move up in the waitlist line. Help me out, give it a click.

ronantakizawa

posted an update 19 days ago

Excited to announce AWQ-quantized versions of 4 #AllenAI models! 🎉

Molmo-7B-D AWQ (14GB→5GB): Efficient VLM performing between GPT-4V and GPT-4o on academic benchmarks, with just 6.1% perplexity degradation.

MolmoAct-7B-D AWQ (14GB→6GB): Specialized robotic manipulation model reduced by ~57%.

Molmo-72B AWQ (145GB→38GB): VLM with Qwen2-72B decoder that performs competitively with GPT-4, achieving only 10.5% perplexity degradation while saving 107GB of memory.

OLMo-2-32B-Instruct AWQ (64GB→17GB): LLM post-trained on Tülu 3 with 3% perplexity degradation while saving ~50GB.

For all VLMs, only the text model was quantized.

ronantakizawa/molmo-7b-d-awq
ronantakizawa/molmoact-7b-d-awq
ronantakizawa/molmo-72b-awq
ronantakizawa/olmo2-32b-instruct-awq
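The size figures above can be checked with simple arithmetic. A small sketch (the helper name is hypothetical) that computes the percentage reduction and the gigabytes saved from the before/after sizes quoted in the post:

```python
def size_reduction(before_gb: float, after_gb: float) -> tuple[float, float]:
    """Return (percent reduction, GB saved) for a before/after checkpoint size."""
    saved = before_gb - after_gb
    return 100.0 * saved / before_gb, saved


# Checkpoint sizes in GB, as quoted in the post.
MODELS = {
    "Molmo-7B-D AWQ": (14, 5),
    "MolmoAct-7B-D AWQ": (14, 6),
    "Molmo-72B AWQ": (145, 38),
    "OLMo-2-32B-Instruct AWQ": (64, 17),
}

for name, (before, after) in MODELS.items():
    pct, saved = size_reduction(before, after)
    print(f"{name}: {pct:.1f}% smaller, {saved} GB saved")
```

This reproduces the ~57% figure for MolmoAct-7B-D and the 107GB saving for Molmo-72B; for OLMo-2-32B-Instruct the saving works out to 47GB, which the post rounds to ~50GB.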

prithivMLmods

posted an update 20 days ago

Let’s run the comparison again with Multimodal OCR3:

nanonets/Nanonets-OCR2-3B vs allenai/olmOCR-2-7B-1025 vs rednote-hilab/dots.ocr vs datalab-to/chandra

Try it here @ prithivMLmods/Multimodal-OCR3

Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

ronantakizawa

posted an update 22 days ago

Introducing AWQ and GPTQ quantized versions of SmolVLM from Hugging Face!

Only the text models were quantized, which halved the model size (4GB → 2GB) while keeping degradation on the DocVQA benchmark under 1%.

#huggingface #smolvlm #smollm

ronantakizawa/SmolVLM-Instruct-awq

ronantakizawa/SmolVLM-Instruct-gptq

SelmaNajih001

posted an update 24 days ago

How Financial News Can Be Used to Train Good Financial Models 📰
Numbers tell you what happened, but news tells you why.
I’ve written an article explaining how news can be used to train AI models for sentiment analysis and better forecasting. Hope you find it interesting!

Read it here: https://huggingface.co/blog/SelmaNajih001/llms-applied-to-finance

I'd love to hear your opinions! I'm open to suggestions on how to improve the methodology and the training.

prithivMLmods

posted an update 25 days ago

Now you can try all the latest state-of-the-art multimodal vision-language models from the Qwen3-VL series in demos on Hugging Face Spaces, including the 4B, 8B, and 30B (Instruct, 4B-Thinking) variants. I’ve also uploaded the weights for the Abliterated variants of these models, up to 30B parameters. Check out the Spaces and model links below! 🤗🔥

✨ Qwen3-VL[4B,8B]: prithivMLmods/Qwen3-VL-Outpost
✨ Qwen3-VL-30B-A3B-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✨ Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Qwen3-VL Abliterated Model Collection [ Version 1.0 ]

✨ Qwen3-VL-8B-Instruct-abliterated: prithivMLmods/Qwen3-VL-8B-Instruct-abliterated
✨ Qwen3-VL-4B-Instruct-abliterated: prithivMLmods/Qwen3-VL-4B-Instruct-abliterated
✨ Qwen3-VL-8B-Thinking-abliterated: prithivMLmods/Qwen3-VL-8B-Thinking-abliterated
✨ Qwen3-VL-4B-Thinking-abliterated: prithivMLmods/Qwen3-VL-4B-Thinking-abliterated
✨ Qwen3-VL-30B-A3B-Instruct-abliterated: prithivMLmods/Qwen3-VL-30B-A3B-Instruct-abliterated
✨ Qwen3-VL-30B-A3B-Thinking-abliterated: prithivMLmods/Qwen3-VL-30B-A3B-Thinking-abliterated

⚡Collection: prithivMLmods/qwen3-vl-abliteration-oct-1625-68f0e3e567ef076594605fac

Note: This is version 1.0 of the Abliteration of the Qwen3-VL series of models. It may perform sub-optimally in some cases. If you encounter any issues, please open a discussion.

TorchGeo

AI & ML interests

Recent Activity

Team members: 41

torchgeo's activity