Why has VRAM usage nearly doubled from Small 3.1?

#14
by rdodev - opened

Folks,

Running this model essentially means doubling our infrastructure. Small 3.1 fit easily on a single H100 with plenty of headroom. With 3.2 we need 2xH100 because the model itself takes >55 GB of VRAM, and the KV cache and map then push it past the 80 GB of a single H100. Both use the same quant. Why the big difference?
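For anyone who wants to sanity-check the budget, here is a rough sketch of the arithmetic. Only the >55 GB model footprint and the 80 GB H100 limit come from the numbers above. The layer count, KV head count, head dimension, context length, and the 5 GB overhead are illustrative placeholders, not confirmed values for either model, so substitute the figures from your own config.

```python
GB = 1e9  # decimal gigabytes


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size for a standard attention stack.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem)


def vram_budget(weights_gb, kv_gb, overhead_gb, gpu_gb=80.0):
    """Return (total_gb, fits) for a single GPU with gpu_gb of memory."""
    total = weights_gb + kv_gb + overhead_gb
    return total, total <= gpu_gb


# Placeholder architecture values (assumptions, not taken from the model card).
kv_gb = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128,
                       context_len=131072) / GB

# 55 GB is the observed model footprint; 5 GB overhead is a guess.
total_gb, fits = vram_budget(weights_gb=55.0, kv_gb=kv_gb, overhead_gb=5.0)
print(f"KV cache: {kv_gb:.1f} GB, total: {total_gb:.1f} GB, fits on one H100: {fits}")
```

With these placeholder values the total lands just over 80 GB, which is consistent with what we are seeing, but it does not explain why the footprint grew between 3.1 and 3.2.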

Any info here, @juliendenize?
