That seriously would be very helpful!
Akhil Theerthala PRO
Akhil-Theerthala
AI & ML interests
None yet
Recent Activity
liked
a model
2 days ago
sentence-transformers/all-MiniLM-L12-v2
liked
a model
7 days ago
nvidia/domain-classifier
updated
a model
8 days ago
Akhil-Theerthala/kuvera_12b_v0.2.0
Organizations

reacted to
DualityAI-RebekahBogdanoff's
post with ❤️
about 2 months ago
Post
3611
As part of Duality AI's recent Kaggle competition, we've released a free, fully customizable cloud scenario designed to help you create targeted datasets with YOLO-compatible labels.
The cloud simulation lets you customize the:
📸 camera distance
🎞️ film grain variation
🖼️ background objects,
➕ and more!
Create the dataset that you need by following this link: https://falcon.duality.ai/secure/scenarios/edit/cca0bc47-265a-4f67-843f-a434b63271b3?utm_source=huggingface&utm_medium=social&utm_campaign=general
I've attached an instructional video we used for the competition, but this feature is free for anyone who has an account. https://vimeo.com/1091271731?share=copy
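If you want to train on a dataset exported from this scenario, a minimal sketch with Ultralytics YOLO might look like the following (the data.yaml path is a placeholder for whatever the Falcon export produces):

```python
# Minimal sketch: fine-tune a pretrained YOLO checkpoint on a dataset
# exported from the Falcon cloud scenario. Assumes `pip install ultralytics`
# and that the export produced a YOLO-format data.yaml; paths are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained checkpoint as a starting point

# data.yaml points at the train/val image folders and class names
model.train(data="falcon_export/data.yaml", epochs=50, imgsz=640)

metrics = model.val()      # evaluate on the validation split
print(metrics.box.map50)   # mAP@0.5 on the exported dataset
```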

posted
an
update
2 months ago
Post
1002
Kuvera v0.1.0 is now live!
A series of personal-finance advisor models that resolve queries by understanding the person's psychological state and relevant context.
These are still prototypes that have much room for improvement.
What's included in this release:
- Akhil-Theerthala/Kuvera-8B-v0.1.0: Qwen3-8B, meticulously fine-tuned on approximately 20,000 personal-finance inquiries.
- Akhil-Theerthala/Kuvera-14B-v0.1.0: LoRA on DeepSeek-R1-Distill-Qwen-14B, honed through training on about 10,000 chain-of-thought queries.
For those interested, the models and datasets are accessible for free (links in the comments). If you are curious about the upcoming version's roadmap, let's connect; there are many more developments I plan to make, and I would definitely appreciate any help.
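For a quick local try of the 8B model, here is a minimal transformers sketch (assuming the standard chat-template flow for Qwen3-based checkpoints; the prompt and generation settings are illustrative):

```python
# Minimal sketch: query Kuvera-8B-v0.1.0 with transformers. Assumes a GPU
# with enough memory; the prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Akhil-Theerthala/Kuvera-8B-v0.1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "I just got my first job. How should I split my salary between saving and spending?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```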

reacted to
shukdevdatta123's
post
2 months ago
Post
3038
Excited to share my latest project: an AI-powered Educational Content Creator Assistant! ✨ Built with Gradio and OpenAI, it transforms PDF/DOCX documents into engaging study materials like interactive flashcards, quizzes, summaries, and lesson plans. Features GPU acceleration and downloadable HTML outputs. Perfect for educators and students! #EdTech #AI #Python #ML #LLM
Features Include:
- ✅ AI-powered content generation
- ✅ Document processing (PDF/DOCX)
- ✅ Interactive flashcards with slider functionality
- ✅ Comprehensive summaries
- ✅ Structured study notes
- ✅ Quiz generation with multiple choice, short answer, and essay questions
- ✅ Mind map structure creation
- ✅ Detailed lesson plans
- ✅ In-depth concept explanations
- ✅ Practice problems (beginner to challenge levels)
- ✅ Downloadable HTML outputs with interactivity
- ✅ Gradio-based user interface
- ✅ OpenAI API integration via OpenRouter
- ✅ GPU acceleration with ZeroGPU
- ✅ Real-time status updates for API and document processing
- ✅ Support for multiple content types
- ✅ Formatted HTML content with CSS styling
- ✅ Print functionality in downloadable files
- ✅ Error handling for document processing and API calls
- ✅ Modular function design for content generation
YouTube: https://www.youtube.com/watch?v=4a52HcioWPk
Demo: https://shukdevdatta123-ecca.hf.space
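As an illustration of the overall pattern (not the app's actual code), a stripped-down Gradio + OpenRouter sketch for one content type could look like this; the model id and prompt are placeholders:

```python
# Stripped-down sketch of the document -> study-material pattern:
# extract text from a PDF, ask an OpenRouter-hosted model for flashcards,
# and serve it behind a Gradio UI. Model id and prompt are illustrative.
import gradio as gr
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_OPENROUTER_KEY")

def make_flashcards(pdf_path: str) -> str:
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    response = client.chat.completions.create(
        model="openai/gpt-4o-mini",  # any OpenRouter model id works here
        messages=[{"role": "user",
                   "content": f"Create 10 Q/A flashcards from this document:\n{text[:8000]}"}],
    )
    return response.choices[0].message.content

gr.Interface(fn=make_flashcards, inputs=gr.File(file_types=[".pdf"]),
             outputs="markdown", title="Flashcard generator (sketch)").launch()
```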

reacted to
onekq's
post with ❤️❤️
2 months ago
Post
2201
Highly recommend the latest Gemini Flash. My favorite Google I/O gift. It ranks behind reasoning models but runs a lot faster than them. It beats DeepSeek v3.
onekq-ai/WebApp1K-models-leaderboard
Reasoning is good for coding, but not mandatory.

reacted to
KaraKaraWitch's
post with 🔥
2 months ago

reacted to
merve's
post with 🔥
3 months ago
Post
6628
A real-time object detector much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in Hugging Face transformers 🔥
D-FINE is the SOTA real-time object detector, and it runs on a T4 (free Colab) 🤩
> Collection with all checkpoints and demo ustc-community/d-fine-68109b427cbe6ee36b4e7352
Notebooks:
> Tracking https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_tracking.ipynb
> Inference https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_inference.ipynb
> Fine-tuning https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_finetune_on_a_custom_dataset.ipynb
h/t @vladislavbro @qubvel-hf @ariG23498 and the authors of the paper 🎩
Regular object detectors attempt to predict bounding boxes in pixel-perfect (x, y, w, h) coordinates, which is very rigid and hard to solve 🥲☹️
D-FINE instead formulates object detection as a distribution over bounding box coordinates and refines it iteratively, which is more accurate 🤩
Another core idea behind this model is Global Optimal Localization Self-Distillation:
this model uses the final layer's distribution output (sort of like a teacher) to distill into earlier layers, making them more performant.
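To try it yourself, a transformers pipeline sketch along these lines should run on a T4 (the checkpoint id below is an assumption; pick any from the collection above):

```python
# Minimal sketch: D-FINE inference via the transformers object-detection
# pipeline. The checkpoint id is an assumption; use one from the
# ustc-community collection linked above.
from transformers import pipeline

detector = pipeline("object-detection", model="ustc-community/dfine-medium-coco", device=0)

results = detector("http://images.cocodataset.org/val2017/000000039769.jpg", threshold=0.5)
for det in results:
    print(f"{det['label']}: {det['score']:.2f} at {det['box']}")
```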

reacted to
jsulz's
post with 🔥
3 months ago
Post
2756
At
xet-team
we've been hard at work bringing a new generation of storage to the Hugging Face community, and we've crossed some major milestones:
👷 Over 2,000 builders and nearing 100 organizations with access to Xet
📈 Over 70,000 model and dataset repositories are Xet-backed
🤯 1.4 petabytes managed by Xet
As we move repos from LFS to Xet for everyone we onboard, we're pushing our content-addressed store (CAS). Check out the chart below 👇 of CAS hitting up to 150 Gb/s throughput this past week.
All of this growth is helping us build richer insights. We expanded our repo graph, which maps how Xet-backed repositories on the Hub share bytes with each other.
Check out the current network in the image below (nodes are repositories, edges are where repos share bytes) and visit the space to see how different versions of Qwen, Llama, and Phi models are grouped together xet-team/repo-graph
Join the waitlist to get access! https://huggingface.co/join/xet
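For intuition about what content addressing buys, here is a toy sketch (nothing like Xet's real implementation, which deduplicates chunks within files): identical bytes hash to the same address, so shared content is stored once.

```python
# Toy content-addressed store: blobs are keyed by the hash of their bytes,
# so identical content across "repos" is stored exactly once. Xet's real CAS
# chunks within files and is far more sophisticated; this is intuition only.
import hashlib

store: dict[str, bytes] = {}

def put(data: bytes) -> str:
    address = hashlib.sha256(data).hexdigest()
    store.setdefault(address, data)  # dedup: same bytes -> same address
    return address

a = put(b"shared model weights")
b = put(b"shared model weights")     # a second repo uploads the same bytes
assert a == b and len(store) == 1    # only one copy is kept
```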

reacted to
ZeroWw's
post
3 months ago
Post
1877
A few good posts about AI.
Beyond the Mirror: AI's Leap from Imitation to Experience
https://nonartificialintelligence.blogspot.com/2025/04/beyond-mirror-ais-leap-from-imitation.html
The Siren Song of the LLMs: A Cautionary Tale of Anthropomorphism and Artificial Intelligence
https://nonartificialintelligence.blogspot.com/2024/08/the-siren-song-of-llms-cautionary-tale.html
Still Waiting: Gemini Flash 1.5's Second Letter to Google.
https://nonartificialintelligence.blogspot.com/2025/04/still-waiting-gemini-flash-15s-second.html

reacted to
merterbak's
post with ❤️
3 months ago
Post
3624
FlowReasoner is a new system that builds a custom set of small AI agents for every user question. Unlike search-based methods, it uses reasoning-driven optimization with external execution feedback.
✅ First, it distills reasoning data using DeepSeek-R1-671B to build multi-agent systems. 🤖
✅ Then, the reasoning data is used to supervise fine-tune DeepSeek-R1-Distill-Qwen-7B for basic reasoning skills. 💡
✅ Finally, RL with GRPO (which optimizes by comparing groups of responses to the same query/task) improves reasoning further (see the sketch below).
FlowReasoner: Reinforcing Query-Level Meta-Agents (2504.15257)
Code: https://github.com/sail-sg/flowreasoner
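As a rough illustration of the GRPO step (group-relative advantages only; the full objective also has a clipped policy ratio and a KL term):

```python
# Rough illustration of GRPO's core idea: sample several responses per query,
# score them, and normalize rewards within each group. Responses above their
# group mean get positive advantage and are reinforced.
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_queries, group_size) scores for sampled responses
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

rewards = torch.tensor([[0.2, 0.9, 0.5, 0.4]])  # 4 responses to one query
print(group_relative_advantages(rewards))       # best response gets the largest advantage
```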

reacted to
Jaward's
post
3 months ago
Post
2259
New reasoning algo just dropped: Adaptive Parallel Reasoning
“we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations.”
Paper: https://arxiv.org/pdf/2504.15466
Code: https://github.com/Parallel-Reasoning/APR
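The quoted spawn()/join() idea maps naturally onto ordinary concurrency primitives; a toy sketch of the control flow (not the paper's implementation, and explore() is a stand-in for a child inference call):

```python
# Toy sketch of APR-style control flow: a parent reasoning thread spawns
# children to explore sub-problems in parallel, then joins and selects the
# best result. explore() stands in for a model call and is illustrative.
from concurrent.futures import ThreadPoolExecutor

def explore(subproblem: str) -> tuple[str, float]:
    # placeholder for a child inference call; returns (answer, score)
    return f"answer to {subproblem!r}", float(len(subproblem))

def parent_reason(question: str) -> str:
    subproblems = [f"{question} / branch {i}" for i in range(3)]
    with ThreadPoolExecutor() as pool:                 # spawn()
        results = list(pool.map(explore, subproblems))
    best_answer, _ = max(results, key=lambda r: r[1])  # join() and select
    return best_answer

print(parent_reason("count the occurrences"))
```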

reacted to
Kseniase's
post
4 months ago
Post
7478
11 new types of RAG
RAG is evolving fast, keeping pace with cutting-edge AI trends. It is becoming more agentic and smarter at navigating complex structures like hypergraphs.
Here are the 11 latest RAG types:
1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization
2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store
3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors (see the loop sketch after this list)
4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks
5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (a debate, personal experience, or comparison) and breaking it down into simpler parts
6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation
7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers
8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs and enables multi-hop causal reasoning, refining queries. It validates responses against causal pathways
To explore what is Causal AI, read our article: https://www.turingpost.com/p/causalai
Subscribe to the Turing Post: https://www.turingpost.com/subscribe
Read further 👇
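A minimal sketch of the Thought-Action-Observation loop behind ReaRAG (item 3 above); llm() and search() are hypothetical stand-ins for a reasoning model and a retriever:

```python
# Minimal sketch of a Thought-Action-Observation loop (the pattern behind
# ReaRAG). llm() and search() are hypothetical stand-ins.
def llm(prompt: str) -> str:
    return "FINISH: (model answer)"   # hypothetical reasoning-model call

def search(query: str) -> str:
    return f"passages about {query}"  # hypothetical retriever call

def tao_answer(question: str, max_steps: int = 5) -> str:
    context = ""
    for _ in range(max_steps):
        thought = llm(f"Question: {question}\nContext: {context}\n"
                      "Reply SEARCH:<query> to retrieve or FINISH:<answer> to stop.")
        if thought.startswith("FINISH:"):
            return thought[len("FINISH:"):].strip()   # finalize the answer
        context += "\n" + search(thought[len("SEARCH:"):].strip())  # observation
    return llm(f"Give the best final answer to: {question}\nContext: {context}")

print(tao_answer("Who founded the Turing Post?"))
```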

reacted to
KaiChen1998's
post
5 months ago
Post
4947
📢 Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!
🤖 EMOVA is a novel end-to-end omni-modal LLM that can see, hear and speak. Given omni-modal (i.e., textual, visual and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional control by utilizing a speech decoder and a style controller.
✨ EMOVA Highlights
✅ State-of-the-art omni-modality: EMOVA achieves SoTA-comparable results on both vision-language and speech benchmarks simultaneously.
✅ Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
✅ Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, even including the most recent DeepSeekMoE-tiny!
🔥 You are all welcome to try and star!
- Project page: https://emova-ollm.github.io/
- Github: https://github.com/emova-ollm/EMOVA
- Demo: Emova-ollm/EMOVA-demo

replied to
burtenshaw's
post
5 months ago
Thanks. I needed it.
A fascinating week indeed!

reacted to
merve's
post with 🔥
6 months ago
Post
5488
Oof, what a week! 🥵 So many things have happened, let's recap!
merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM, the tiniest VLMs yet, coming in 256M and 500M sizes, with their retrieval models ColSmol for multimodal RAG
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark
LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, for new 3D asset generation from images