🏗️ Building on HF

Mr Munk

GODELEV

AI & ML interests

High schooler by day, LLM builder by night. Driven by a deep love for both Physics and AI. Currently spending my runtime building on Hugging Face, experimenting with transformer architectures, and training custom LLMs.


Organizations

None yet

liked a model about 1 hour ago

WhirlwindAI/MetaNova-1-60M

62.7M • Updated about 18 hours ago • 4

reacted to Banaxi-Tech's post with 👀 about 21 hours ago

Post

1146

Today we are releasing BananaMind-KV1-8M-2Bit-Experimental, a KV-cache-aware trained model that stores its generation KV cache in 2-bit precision instead of the usual 16-bit precision.

Result: 5.33x smaller KV cache vs FP16, with 0.0916 mean KLD against a 16-bit KV cache reference on WikiText-2.

Model: BananaMind/BananaMind-KV1-8M-2Bit-Experimental

The important part: this is not just post-training KV cache quantization.
Instead we take the BitNet approach.

KV1 is trained with a 2-bit-aware K/V path. Instead of training a normal model and quantizing the cache afterwards, the model learns during training to operate under the low-bit KV constraint, closer in spirit to the BitNet idea of training for the low-bit regime.

During generation, each K/V vector is quantized into 4 affine levels and packed into uint8 tensors, with four 2-bit values stored per byte.
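A minimal sketch of the quantize-and-pack step described above, in plain Python. The per-vector min/max scaling, the round-to-nearest rule, and the bit order inside each byte are assumptions chosen for illustration. The post does not specify KV1's exact grouping or memory layout, and all function names here are hypothetical.

```python
from typing import List, Tuple

def quantize_2bit(vec: List[float]) -> Tuple[List[int], float, float]:
    """Map each value to one of 4 affine levels (codes 0..3).

    Assumption: one scale/offset per vector, derived from its min and max.
    """
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 3 if hi > lo else 1.0  # 4 levels -> 3 steps from lo to hi
    codes = [min(3, max(0, round((x - lo) / scale))) for x in vec]
    return codes, scale, lo

def dequantize_2bit(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate values from the 2-bit codes."""
    return [lo + c * scale for c in codes]

def pack_2bit(codes: List[int]) -> bytes:
    """Store four 2-bit codes per byte (first code in the lowest two bits)."""
    padded = codes + [0] * (-len(codes) % 4)  # pad to a multiple of 4
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        out.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(out)

def unpack_2bit(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_2bit; n drops the padding codes."""
    codes = []
    for byte in packed:
        codes.extend((byte >> shift) & 0b11 for shift in (0, 2, 4, 6))
    return codes[:n]

vec = [-1.0, -0.2, 0.3, 1.0, 0.9, -0.8]
codes, scale, lo = quantize_2bit(vec)
packed = pack_2bit(codes)
print(codes, len(packed))  # [0, 1, 2, 3, 3, 0] 2
```

The packed codes take 2 bits per value; the scale and offset are extra metadata, which is one reason a real cache shrinks by less than the ideal 8x versus FP16.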

WikiText-2 eval vs 16-bit KV cache reference:

Mean KLD: 0.0916 nats/token
Mean KLD: 0.1322 bits/token
Average KV cache shrink vs FP16: 5.33x
Evaluated positions: 372,675
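The two KLD figures and the shrink factor can be cross-checked with a little arithmetic. Converting nats to bits means dividing by ln 2. For the cache size, 16 / 5.33 is about 3 bits per cached value, i.e. roughly 1 bit of overhead on top of the 2-bit codes. One hypothetical layout that gives exactly this is an FP16 scale plus an FP16 offset shared by each group of 32 values; the post does not state KV1's actual metadata layout, so the group size below is an assumption.

```python
import math

# Mean KLD conversion: 1 nat = 1 / ln(2) bits.
kld_nats = 0.0916
kld_bits = kld_nats / math.log(2)
print(f"{kld_bits:.4f} bits/token")  # 0.1322 bits/token

def effective_bits(code_bits: int, group_size: int, meta_bits_per_group: int) -> float:
    """Average storage per cached value: the code plus its share of per-group metadata."""
    return code_bits + meta_bits_per_group / group_size

# Hypothetical layout: 2-bit codes, FP16 scale + FP16 offset (32 bits) per group of 32 values.
bits = effective_bits(2, 32, 32)
print(bits, f"{16 / bits:.2f}x vs FP16")  # 3.0 5.33x vs FP16
```

Other group sizes and metadata formats can land on the same ratio, so this only shows that 5.33x is consistent with about 3 effective bits per value.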

If this gets adopted in models like Qwen or Gemma, it may become possible to run 128K or even 256K context on a normal machine!
Try it here: BananaMind/BananaMind-KV1-8M-2Bit-Experimental

Code: https://github.com/Banaxi-Tech/kv1

reacted to Quazim0t0's post with 👀 1 day ago

Post

2299

Created research language model whose channel-mixing block is not an MLP. It is a differentiable Neighbour-Sensing fungal-colony-growth model: each token is expanded into a colony of hyphal tips that grow in a bounded latent region, sense a shared density field, and steer their own growth — the "MLP" is replaced by a few differentiable steps of colony growth, read back out into the hidden state.

Quazim0t0/Mycel-LM-79M

Also the original SpikeWhale project — the one that sparked all the other SpikeWhale related projects. Every spiking primitive here is hand-written in plain PyTorch: the leaky integrate-and-fire (LIF) neuron dynamics, the fast-sigmoid surrogate gradient, and the backprop-through-time training loop. No snntorch, no spikingjelly, no norse, no bindsnet — the network is a genuine from-scratch SNN.
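To make the LIF and surrogate-gradient terms concrete, here is a minimal plain-Python sketch. It is not the SpikeWhale code, which the post says is written in PyTorch. The decay factor beta = 0.9, the firing threshold of 1.0, reset-by-subtraction, and the fast-sigmoid slope of 25 are all illustrative assumptions.

```python
from typing import List, Tuple

def lif_forward(inputs: List[float], beta: float = 0.9,
                threshold: float = 1.0) -> Tuple[List[int], List[float]]:
    """Leaky integrate-and-fire: U[t] = beta * U[t-1] + I[t].

    Emits a spike when U reaches the threshold, then resets by subtracting it.
    """
    u = 0.0
    spikes, membrane = [], []
    for current in inputs:
        u = beta * u + current
        s = 1 if u >= threshold else 0
        membrane.append(u)   # pre-reset potential, needed for the surrogate gradient
        spikes.append(s)
        u -= s * threshold   # reset by subtraction
    return spikes, membrane

def fast_sigmoid_grad(u: float, threshold: float = 1.0, slope: float = 25.0) -> float:
    """Surrogate for dS/dU.

    The hard spike step has zero gradient almost everywhere, so backprop
    through time substitutes 1 / (1 + slope * |U - threshold|)^2.
    """
    return 1.0 / (1.0 + slope * abs(u - threshold)) ** 2

spikes, membrane = lif_forward([0.5] * 5)
print(spikes)  # [0, 0, 1, 0, 1]
```

In a PyTorch version, the fast-sigmoid expression would typically sit in the backward method of a custom torch.autograd.Function, while the forward pass keeps the hard threshold.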

Quazim0t0/SpikeWhale-SNN-216M

liked a Space 2 days ago

SLM Arena

🏟

Compact model arena with GLM and GPT OSS commentary

New activity in GODELEV/Arithmetic-XL 2 days ago

[bot] Conversion to Parquet

#1 opened 5 days ago by parquet-converter

reacted to Banaxi-Tech's post with 👀 4 days ago

Post

10640

A new model is coming!
It's going to take a long time on my 5070 Ti, so expect a release in ~1 month.
We think this model is going to be SOTA for its size.
Our Mini version will have 25M parameters and the Pro version 140M.
The Pro version has a 3072-token context window (extensible up to 6K with RoPE), and the Mini version has a 4096-token context window (up to 8K with RoPE).
Meanwhile, we are working on an Instruct version of our BananaMind 1.5 Base.

The training will start this weekend.

We are very excited to release it when it's done!
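The post does not say which RoPE scaling method is meant. One common option is linear position interpolation, which rescales positions so a longer sequence maps back into the range seen during training. A minimal sketch, assuming "6K" means 6144 tokens, a head dimension of 8, and the usual RoPE base of 10000 (all illustrative; the function names are hypothetical):

```python
def rope_angles(position: float, dim: int = 8, base: float = 10000.0) -> list:
    """RoPE rotation angle for each of the dim/2 frequency pairs at one position."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]

def interpolated_position(pos: int, trained_ctx: int, target_ctx: int) -> float:
    """Linear position interpolation: scale positions by trained_ctx / target_ctx."""
    return pos * trained_ctx / target_ctx

# Pro model, assumed numbers: trained on 3072 positions, extended to 6144.
last = interpolated_position(6143, 3072, 6144)
print(last)  # 3071.5 (vs. 6143 without interpolation)
angles = rope_angles(last)
```

The trade-off is that neighboring positions end up closer together in angle space, which is why interpolated models are usually fine-tuned briefly at the new length; NTK-aware scaling and YaRN are alternative schemes.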

12 replies

New activity in CyanMonkey/BananaMind-1.5-BB 5 days ago

Upload eval_results.json with huggingface_hub

#1 opened 5 days ago by GODELEV

New activity in AtomixLabs/AtomixS2-5M-v1.0 5 days ago

Just Curious...

#1 opened 5 days ago by GODELEV

liked 2 models 5 days ago

AtomixLabs/AtomixS2-5M-v1.0

Text Generation • 5.98M • Updated 5 days ago • 126 • 4

BananaMind/BananaMind-1.5-Base

Text Generation • 75.1M • Updated 5 days ago • 211 • 7

updated a dataset 5 days ago

GODELEV/Arithmetic-XL

Viewer • Updated 5 days ago • 24M • 71 • 1

published a dataset 5 days ago

GODELEV/Arithmetic-XL

Viewer • Updated 5 days ago • 24M • 71 • 1

replied to Banaxi-Tech's post 6 days ago

My bad, but what do you mean by "Yes"? Custom arch or Llama?

replied to Banaxi-Tech's post 6 days ago

So is the model you are training a "LlamaForCausalLM" or your own custom architecture?

replied to Banaxi-Tech's post 6 days ago

Wow!! You are absolutely on the right track.
From what I can see so far, it's going to be a really good model.

All the best 👌

liked a Space 6 days ago

Blog

🐨

Read research updates and project news

reacted to Banaxi-Tech's post with 🔥 6 days ago

Post

3865

Hello AI Community! 👋

We have just released BananaMind 1.5 Base, and it outperforms other models of its size.
It outperforms GPT-2 124M while being ~50M params smaller.

Check it out: BananaMind/BananaMind-1.5-Base

OLD POST CONTENTS EDITED:
We have a new AI model that is currently in training.
We are training it on 27B tokens and are 8% done.
Follow us to be notified when it releases 🚀
Some info:
Parameters: 75M
GPU: RTX Pro 6000
We expect to release it in the coming days: https://huggingface.co/BananaMind/BananaMind-1.5-Base