AI & ML interests

Text classification

Recent Activity

codelion posted an update 1 day ago

🧬 We just published our comprehensive analysis of OpenEvolve - an open-source evolutionary coding agent that automatically optimizes algorithms using LLMs!

Our key findings from 29 experiments across 10 models:

- Gemini 2.5 Flash achieved a 2.04x speedup across 30 benchmark tasks
- Open models like Gemma 3 27B (1.63x) and Qwen3-Coder 480B (1.41x) rivaled proprietary models
- The system discovered entirely new algorithms - not just code optimizations!
- One task evolved from DFS to BFS to Union-Find approaches (see the sketch after this list)
- Specialized coding models outperformed much larger general models
- 200 iterations beat 100 iterations by 24%
- Ensembles surprisingly failed due to conflicting optimization strategies
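
The DFS → BFS → Union-Find progression is easy to picture on a connectivity-style task. Here is a hypothetical sketch of the kind of Union-Find code such an evolution can converge on (illustrative only, not the actual evolved program):

```python
# Hypothetical Union-Find (disjoint-set) solution for a connectivity task;
# illustrative of the evolved approach, not the actual OpenEvolve output.
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path compression keeps trees flat, giving near-O(1) amortized finds.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)

def connected_components(n: int, edges: list[tuple[int, int]]) -> int:
    # No graph traversal at all once every edge is unioned, which is why
    # evolution can leave DFS/BFS behind entirely.
    uf = UnionFind(n)
    for a, b in edges:
        uf.union(a, b)
    return len({uf.find(i) for i in range(n)})
```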

Most fascinating: watching models evolve code step-by-step, like transforming matrix operations from basic eigendecomposition to vectorized one-liners with 32x speedup.
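
As a hypothetical illustration of that kind of rewrite (not the actual benchmark task): for a symmetric matrix, the sum of squared eigenvalues equals the squared Frobenius norm, so a full eigendecomposition collapses into a vectorized one-liner:

```python
import numpy as np

def sum_sq_eigs_eig(A: np.ndarray) -> float:
    # Early-generation style: full eigendecomposition, then reduce.
    return float(np.sum(np.linalg.eigvalsh(A) ** 2))

def sum_sq_eigs_fast(A: np.ndarray) -> float:
    # Evolved style: for symmetric A, sum(lambda_i^2) == trace(A @ A),
    # which is just the sum of squared entries - no decomposition needed.
    return float(np.sum(A * A))

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
A = (A + A.T) / 2  # symmetrize so both paths agree
assert np.isclose(sum_sq_eigs_eig(A), sum_sq_eigs_fast(A))
```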

Our systematic experimental approach reveals that open-source evolutionary coding is becoming seriously competitive with proprietary solutions. We tested everything from temperature settings to evolution strategies to find optimal configurations.

This research shows automated code optimization is ready for real-world applications. The methodology we developed can guide anyone building evolutionary coding systems.

Full paper with code examples, detailed methodology, and all experimental results: https://huggingface.co/blog/driaforall/towards-open-evolutionary-agents

What optimization challenges could benefit from evolutionary approaches in your work?

codelion posted an update 3 days ago

Extended the ICM paper to show cross-model capability transfer - used Qwen3's mathematical reasoning to improve Gemma3 without any human supervision.

Key results:

Qwen3-0.6B: 63.2 → 66.0 on MATH-500 (+4%)
Gemma3-1B: 41.0 → 45.6 on MATH-500 (+11%)

The method extracts coherent reasoning patterns from one model via Internal Coherence Maximization, converts them to DPO training data, and uses that to improve a completely different model architecture.
This goes beyond the original ICM paper, which only improved a model using its own labels. We're showing that you can transfer capabilities between any two models - imagine extracting capabilities from strong models to improve your local ones.
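
A minimal sketch of that transfer step, assuming ICM has already labeled the source model's solutions as coherent ("chosen") vs. incoherent ("rejected"); the dataset rows and hyperparameters are illustrative, and the trainer API follows recent TRL releases:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# ICM output from the source model (Qwen3), reformatted as preference pairs.
pairs = [
    {"prompt": "Solve: 2x + 3 = 11.",
     "chosen": "2x = 8, so x = 4.",
     "rejected": "x = 7."},
    # ... more ICM-labeled pairs
]
train_dataset = Dataset.from_list(pairs)

# The *target* model is a different architecture (Gemma3).
model_name = "google/gemma-3-1b-it"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="gemma-3-1b-icm-dpo", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```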

Models available:

codelion/Qwen3-0.6B-ICM-DPO
codelion/gemma-3-1b-it-ICM-DPO

Complete collection with code and datasets:
codelion/internal-coherence-maximization-687a1bd1c1f5f1d6f76e9b3b

Full methodology and results:
https://huggingface.co/blog/codelion/internal-coherence-maximization

Planning to extend this to code generation next. The approach could enable community-driven capability sharing between different model families without expensive annotation.

codelion posted an update 10 days ago

Implemented Test-Time Diffusion Deep Researcher (TTD-DR) in OptiLLM! 🚀

Just shipped a game-changing feature that turns any LLM into a powerful research agent. TTD-DR applies diffusion-inspired techniques to iteratively refine research reports while grounding them in real web sources.

How it works (sketched in code after this list):
• Generates initial draft
• Identifies knowledge gaps
• Searches web for missing info
• Iteratively refines through "denoising" steps
• Produces comprehensive reports with 15-30+ sources
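
A high-level sketch of that loop; the function names and prompts are illustrative placeholders, not OptiLLM's internal API:

```python
def deep_research(query: str, llm, search, max_steps: int = 5) -> str:
    # llm: callable prompt -> text; search: callable query -> list of results.
    draft = llm(f"Write an initial research report on: {query}")
    sources: list[str] = []
    for _ in range(max_steps):
        gaps = llm(f"List knowledge gaps or unsupported claims in:\n{draft}")
        results = search(gaps)  # fetch web evidence for the missing info
        sources.extend(r["url"] for r in results)
        draft = llm(  # "denoising" step: refine the draft against evidence
            f"Revise the report using this evidence:\n{results}\n\nReport:\n{draft}"
        )
    return draft + "\n\nSources:\n" + "\n".join(dict.fromkeys(sources))
```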

The magic? It works with ANY model, so you can choose your favorite open-source models on HF!

Key results:
- 47 complex research queries tested
- Every report backed by real web sources
- Quality rivals human research analysts
- No more hallucinations on current events!

Try it:
pip install optillm
Then use "deep_research-your-model-name" as the model identifier
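
Since OptiLLM runs as an OpenAI-compatible proxy, the standard OpenAI client works against it. A minimal usage sketch (the base URL/port and underlying model are assumptions; adjust to your setup):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local OptiLLM proxy (assumed port)
    api_key="optillm",                    # placeholder; the proxy forwards real keys
)

response = client.chat.completions.create(
    model="deep_research-gpt-4o-mini",  # "deep_research-" prefix triggers TTD-DR
    messages=[{"role": "user", "content": "Survey open-source evolutionary coding agents."}],
)
print(response.choices[0].message.content)
```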

- Implementation: https://github.com/codelion/optillm/tree/main/optillm/plugins/deep_research
- Paper: https://arxiv.org/abs/2507.16075v1
- Sample reports: https://github.com/codelion/optillm/tree/main/optillm/plugins/deep_research/sample_reports

Special thanks to the TTD-DR paper authors for this brilliant approach!

#research #llm #opensource #inference

codelion posted an update 13 days ago

New research: Understanding how different LLMs approach reasoning through "thought anchors"

I just published a comparative study analyzing the reasoning patterns of Qwen3-0.6B vs DeepSeek-R1-Distill-1.5B using thought anchors - critical sentences that significantly impact task success probability.
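
A minimal sketch of how such a per-sentence impact score can be estimated by resampling completions with and without a given reasoning step; the sampler below is a random stand-in for a real model call, and this is not the PTS library's actual API:

```python
import random

def sample_is_correct(prompt: str, reasoning_prefix: str) -> bool:
    # Stand-in: in practice, sample a completion from the model conditioned
    # on the reasoning prefix and check it against the gold answer.
    return random.random() < 0.5

def success_rate(prompt: str, prefix_sentences: list[str], n: int = 32) -> float:
    prefix = " ".join(prefix_sentences)
    return sum(sample_is_correct(prompt, prefix) for _ in range(n)) / n

def sentence_impact(prompt: str, sentences: list[str], i: int) -> float:
    # Impact of sentence i: shift in success probability when it is included.
    return (success_rate(prompt, sentences[: i + 1])
            - success_rate(prompt, sentences[:i]))
```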

Key findings:
- DeepSeek-R1: Uses concentrated reasoning with fewer, high-impact steps (0.408 avg impact)
- Qwen3: Employs distributed reasoning spreading impact across multiple steps (0.278 avg impact)
- Different risk-reward profiles: DeepSeek more consistent (82.7% positive steps), Qwen3 more exploratory (71.6% positive)

This reveals different cognitive architectures rather than simple performance differences. The models optimize for different reasoning strategies - consistency vs exploration.

Both datasets are now available on HF:
- Qwen3 thought anchors: codelion/Qwen3-0.6B-pts-thought-anchors
- DeepSeek-R1 thought anchors: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors

Built using our open-source PTS library for mechanistic interpretability analysis. All methodology is fully reproducible.

Full article: https://huggingface.co/blog/codelion/understanding-model-reasoning-thought-anchors

What reasoning patterns have you noticed in your model experiments? Would love to hear about other architectures showing similar cognitive diversity!