That seriously would be very helpful!
Akhil Theerthala PRO
Akhil-Theerthala
AI & ML interests
None yet
Recent Activity
liked
a model
2 days ago
sentence-transformers/all-MiniLM-L12-v2
liked
a model
7 days ago
nvidia/domain-classifier
updated
a model
8 days ago
Akhil-Theerthala/kuvera_12b_v0.2.0
Organizations

reacted to
DualityAI-RebekahBogdanoff's
post with ❤️
about 2 months ago
Post
3611
As part of Duality AI's recent Kaggle competition, we've released a free, fully customizable cloud scenario designed to help you create targeted datasets with YOLO-compatible labels.
The cloud simulation lets you customize the:
📸 camera distance
🎞️ film grain variation
🖼️ background objects,
➕ and more!
Create the dataset that you need by following this link: https://falcon.duality.ai/secure/scenarios/edit/cca0bc47-265a-4f67-843f-a434b63271b3?utm_source=huggingface&utm_medium=social&utm_campaign=general
I've attached an instructional video we used for the competition, but this feature is free for anyone who has an account. https://vimeo.com/1091271731?share=copy
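If you want to train on a dataset exported from this scenario, a minimal sketch with Ultralytics YOLO might look like the following (the data.yaml path is a placeholder for whatever the Falcon export produces):

```python
# Minimal sketch: fine-tune a pretrained YOLO checkpoint on a dataset
# exported from the Falcon cloud scenario. Assumes `pip install ultralytics`
# and that the export produced a YOLO-format data.yaml; paths are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained checkpoint as a starting point

# data.yaml points at the train/val image folders and class names
model.train(data="falcon_export/data.yaml", epochs=50, imgsz=640)

metrics = model.val()      # evaluate on the validation split
print(metrics.box.map50)   # mAP@0.5 on the exported dataset
```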

posted
an
update
2 months ago
Post
1002
Kuvera v0.1.0 is now live!
A series of personal-finance advisor models that resolve queries by understanding the person's psychological state and relevant context.
These are still prototypes that have much room for improvement.
What's included in this release:
- Akhil-Theerthala/Kuvera-8B-v0.1.0: Qwen3-8B, meticulously fine-tuned on approximately 20,000 personal-finance inquiries.
- Akhil-Theerthala/Kuvera-14B-v0.1.0: LoRA on DeepSeek-R1-Distill-Qwen-14B, honed through training on about 10,000 chain-of-thought queries.
For those interested, the models and datasets are accessible for free (links in the comments). If you are curious about the upcoming version's roadmap, let's connect; there are many more developments I plan to make, and I would definitely appreciate any help.
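For a quick local try of the 8B model, here is a minimal transformers sketch (assuming the standard chat-template flow for Qwen3-based checkpoints; the prompt and generation settings are illustrative):

```python
# Minimal sketch: query Kuvera-8B-v0.1.0 with transformers. Assumes a GPU
# with enough memory; the prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Akhil-Theerthala/Kuvera-8B-v0.1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "I just got my first job. How should I split my salary between saving and spending?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```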

reacted to
shukdevdatta123's
post
2 months ago
Post
3038
Excited to share my latest project: an AI-powered Educational Content Creator Assistant! ✨ Built with Gradio and OpenAI, it transforms PDF/DOCX documents into engaging study materials like interactive flashcards, quizzes, summaries, and lesson plans. Features GPU acceleration and downloadable HTML outputs. Perfect for educators and students! #EdTech #AI #Python #ML #LLM
Features Include:
- ✅ AI-powered content generation
- ✅ Document processing (PDF/DOCX)
- ✅ Interactive flashcards with slider functionality
- ✅ Comprehensive summaries
- ✅ Structured study notes
- ✅ Quiz generation with multiple choice, short answer, and essay questions
- ✅ Mind map structure creation
- ✅ Detailed lesson plans
- ✅ In-depth concept explanations
- ✅ Practice problems (beginner to challenge levels)
- ✅ Downloadable HTML outputs with interactivity
- ✅ Gradio-based user interface
- ✅ OpenAI API integration via OpenRouter
- ✅ GPU acceleration with ZeroGPU
- ✅ Real-time status updates for API and document processing
- ✅ Support for multiple content types
- ✅ Formatted HTML content with CSS styling
- ✅ Print functionality in downloadable files
- ✅ Error handling for document processing and API calls
- ✅ Modular function design for content generation
YouTube: https://www.youtube.com/watch?v=4a52HcioWPk
Demo: https://shukdevdatta123-ecca.hf.space
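As an illustration of the overall pattern (not the app's actual code), a stripped-down Gradio + OpenRouter sketch for one content type could look like this; the model id and prompt are placeholders:

```python
# Stripped-down sketch of the document -> study-material pattern:
# extract text from a PDF, ask an OpenRouter-hosted model for flashcards,
# and serve it behind a Gradio UI. Model id and prompt are illustrative.
import gradio as gr
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_OPENROUTER_KEY")

def make_flashcards(pdf_path: str) -> str:
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    response = client.chat.completions.create(
        model="openai/gpt-4o-mini",  # any OpenRouter model id works here
        messages=[{"role": "user",
                   "content": f"Create 10 Q/A flashcards from this document:\n{text[:8000]}"}],
    )
    return response.choices[0].message.content

gr.Interface(fn=make_flashcards, inputs=gr.File(file_types=[".pdf"]),
             outputs="markdown", title="Flashcard generator (sketch)").launch()
```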

reacted to
onekq's
post with ❤️❤️
2 months ago
Post
2201
Highly recommend the latest Gemini Flash. My favorite Google I/O gift. It ranks behind reasoning models but runs a lot faster than them. It beats DeepSeek v3.
onekq-ai/WebApp1K-models-leaderboard
Reasoning is good for coding, but not mandatory.

reacted to
KaraKaraWitch's
post with 🔥
2 months ago

reacted to
merve's
post with 🔥
3 months ago
Post
6628
A real-time object detector much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in Hugging Face transformers 🔥
D-FINE is the SOTA real-time object detector, and it runs on a T4 (free Colab) 🤩
> Collection with all checkpoints and demo ustc-community/d-fine-68109b427cbe6ee36b4e7352
Notebooks:
> Tracking https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_tracking.ipynb
> Inference https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_inference.ipynb
> Fine-tuning https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_finetune_on_a_custom_dataset.ipynb
h/t @vladislavbro @qubvel-hf @ariG23498 and the authors of the paper 🎩
Regular object detectors attempt to predict bounding boxes in pixel-perfect (x, y, w, h) coordinates, which is very rigid and hard to solve 🥲☹️
D-FINE instead formulates object detection as a distribution over bounding box coordinates and refines it iteratively, which is more accurate 🤩
Another core idea behind this model is Global Optimal Localization Self-Distillation:
this model uses the final layer's distribution output (sort of like a teacher) to distill into earlier layers, making them more performant.
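To try it yourself, a transformers pipeline sketch along these lines should run on a T4 (the checkpoint id below is an assumption; pick any from the collection above):

```python
# Minimal sketch: D-FINE inference via the transformers object-detection
# pipeline. The checkpoint id is an assumption; use one from the
# ustc-community collection linked above.
from transformers import pipeline

detector = pipeline("object-detection", model="ustc-community/dfine-medium-coco", device=0)

results = detector("http://images.cocodataset.org/val2017/000000039769.jpg", threshold=0.5)
for det in results:
    print(f"{det['label']}: {det['score']:.2f} at {det['box']}")
```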

reacted to
jsulz's
post with 🔥
3 months ago
Post
2756
At
xet-team
we've been hard at work bringing a new generation of storage to the Hugging Face community, and we've crossed some major milestones:
👷 Over 2,000 builders and nearing 100 organizations with access to Xet
📈 Over 70,000 model and dataset repositories are Xet-backed
🤯 1.4 petabytes managed by Xet
As we move repos from LFS to Xet for everyone we onboard, we're pushing our content-addressed store (CAS). Check out the chart below 👇 of CAS hitting up to 150 Gb/s throughput this past week.
All of this growth is helping us build richer insights. We expanded our repo graph, which maps how Xet-backed repositories on the Hub share bytes with each other.
Check out the current network in the image below (nodes are repositories, edges are where repos share bytes) and visit the space to see how different versions of Qwen, Llama, and Phi models are grouped together xet-team/repo-graph
Join the waitlist to get access! https://huggingface.co/join/xet
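For intuition about what content addressing buys, here is a toy sketch (nothing like Xet's real implementation, which deduplicates chunks within files): identical bytes hash to the same address, so shared content is stored once.

```python
# Toy content-addressed store: blobs are keyed by the hash of their bytes,
# so identical content across "repos" is stored exactly once. Xet's real CAS
# chunks within files and is far more sophisticated; this is intuition only.
import hashlib

store: dict[str, bytes] = {}

def put(data: bytes) -> str:
    address = hashlib.sha256(data).hexdigest()
    store.setdefault(address, data)  # dedup: same bytes -> same address
    return address

a = put(b"shared model weights")
b = put(b"shared model weights")     # a second repo uploads the same bytes
assert a == b and len(store) == 1    # only one copy is kept
```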

reacted to
ZeroWw's
post
3 months ago
Post
1877
A few good posts about AI.
Beyond the Mirror: AI's Leap from Imitation to Experience
https://nonartificialintelligence.blogspot.com/2025/04/beyond-mirror-ais-leap-from-imitation.html
The Siren Song of the LLMs: A Cautionary Tale of Anthropomorphism and Artificial Intelligence
https://nonartificialintelligence.blogspot.com/2024/08/the-siren-song-of-llms-cautionary-tale.html
Still Waiting: Gemini Flash 1.5's Second Letter to Google.
https://nonartificialintelligence.blogspot.com/2025/04/still-waiting-gemini-flash-15s-second.html

reacted to
merterbak's
post with ❤️
3 months ago
Post
3624
FlowReasoner is a new system that builds a custom set of small AI agents for every user question. Unlike search-based methods, it uses reasoning-driven optimization with external execution feedback.
✅ First, it distills reasoning data using DeepSeek-R1-671B to build multi-agent systems. 🤖
✅ Then, the reasoning data is used to supervise fine-tune DeepSeek-R1-Distill-Qwen-7B for basic reasoning skills. 💡
✅ Finally, RL with GRPO (which optimizes by comparing groups of responses to the same query/task) improves reasoning further (see the sketch below).
FlowReasoner: Reinforcing Query-Level Meta-Agents (2504.15257)
Code: https://github.com/sail-sg/flowreasoner
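As a rough illustration of the GRPO step (group-relative advantages only; the full objective also has a clipped policy ratio and a KL term):

```python
# Rough illustration of GRPO's core idea: sample several responses per query,
# score them, and normalize rewards within each group. Responses above their
# group mean get positive advantage and are reinforced.
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_queries, group_size) scores for sampled responses
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

rewards = torch.tensor([[0.2, 0.9, 0.5, 0.4]])  # 4 responses to one query
print(group_relative_advantages(rewards))       # best response gets the largest advantage
```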

reacted to
Jaward's
post
3 months ago
Post
2259
New reasoning algo just dropped: Adaptive Parallel Reasoning
“we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations.”
Paper: https://arxiv.org/pdf/2504.15466
Code: https://github.com/Parallel-Reasoning/APR
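The quoted spawn()/join() idea maps naturally onto ordinary concurrency primitives; a toy sketch of the control flow (not the paper's implementation, and explore() is a stand-in for a child inference call):

```python
# Toy sketch of APR-style control flow: a parent reasoning thread spawns
# children to explore sub-problems in parallel, then joins and selects the
# best result. explore() stands in for a model call and is illustrative.
from concurrent.futures import ThreadPoolExecutor

def explore(subproblem: str) -> tuple[str, float]:
    # placeholder for a child inference call; returns (answer, score)
    return f"answer to {subproblem!r}", float(len(subproblem))

def parent_reason(question: str) -> str:
    subproblems = [f"{question} / branch {i}" for i in range(3)]
    with ThreadPoolExecutor() as pool:                 # spawn()
        results = list(pool.map(explore, subproblems))
    best_answer, _ = max(results, key=lambda r: r[1])  # join() and select
    return best_answer

print(parent_reason("count the occurrences"))
```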

reacted to
Kseniase's
post
4 months ago
Post
7478
11 new types of RAG
RAG is evolving fast, keeping pace with cutting-edge AI trends. It is becoming more agentic and smarter at navigating complex structures like hypergraphs.
Here are the 11 latest RAG types:
1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization
2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store
3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors (see the loop sketch after this list)
4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks
5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (a debate, personal experience, or comparison) and breaking it down into simpler parts
6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation
7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers
8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs and enables multi-hop causal reasoning, refining queries. It validates responses against causal pathways
To explore what is Causal AI, read our article: https://www.turingpost.com/p/causalai
Subscribe to the Turing Post: https://www.turingpost.com/subscribe
Read further 👇
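A minimal sketch of the Thought-Action-Observation loop behind ReaRAG (item 3 above); llm() and search() are hypothetical stand-ins for a reasoning model and a retriever:

```python
# Minimal sketch of a Thought-Action-Observation loop (the pattern behind
# ReaRAG). llm() and search() are hypothetical stand-ins.
def llm(prompt: str) -> str:
    return "FINISH: (model answer)"   # hypothetical reasoning-model call

def search(query: str) -> str:
    return f"passages about {query}"  # hypothetical retriever call

def tao_answer(question: str, max_steps: int = 5) -> str:
    context = ""
    for _ in range(max_steps):
        thought = llm(f"Question: {question}\nContext: {context}\n"
                      "Reply SEARCH:<query> to retrieve or FINISH:<answer> to stop.")
        if thought.startswith("FINISH:"):
            return thought[len("FINISH:"):].strip()   # finalize the answer
        context += "\n" + search(thought[len("SEARCH:"):].strip())  # observation
    return llm(f"Give the best final answer to: {question}\nContext: {context}")

print(tao_answer("Who founded the Turing Post?"))
```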

reacted to
KaiChen1998's
post
5 months ago
Post
4947
📢 Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!
🤖 EMOVA is a novel end-to-end omni-modal LLM that can see, hear and speak. Given omni-modal (i.e., textual, visual and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional control by utilizing a speech decoder and a style controller.
✨ EMOVA Highlights
✅ State-of-the-art omni-modality: EMOVA achieves SoTA-comparable results on both vision-language and speech benchmarks simultaneously.
✅ Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
✅ Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, even including the most recent DeepSeekMoE-tiny!
🔥 You are all welcome to try and star!
- Project page: https://emova-ollm.github.io/
- Github: https://github.com/emova-ollm/EMOVA
- Demo: Emova-ollm/EMOVA-demo

replied to
burtenshaw's
post
5 months ago
Thanks. I needed it.
A fascinating week indeed!

reacted to
merve's
post with 🔥
6 months ago
Post
5488
Oof, what a week! 🥵 So many things have happened, let's recap!
merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM, the tiniest VLMs yet, coming in 256M and 500M sizes, with their retrieval models ColSmol for multimodal RAG
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark
LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, for new 3D asset generation from images