Why has VRAM usage nearly doubled from Small 3.1?
#14, opened by rdodev
Folks,
Running this model essentially means doubling our infrastructure. Small 3.1 fit easily on a single H100 with plenty of headroom. With 3.2 we need 2xH100, because VRAM usage for the model itself is >55 GB, and the KV cache and map then push it past the 80 GB of a single H100. Both are using the same quant. Why the big difference?
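For anyone who wants to sanity-check the budget math, here is a rough sketch of the usual back-of-the-envelope estimate: weights plus KV cache compared against one card's capacity. The function names are made up for illustration, and every shape value in the example (layer count, KV heads, head dimension, context length) is a hypothetical placeholder, not the actual config of either model. Only the ~55 GB and 80 GB figures come from the numbers above. It also ignores activations, CUDA context, and allocator overhead, so real usage will be higher.

```python
def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int = 1,
                bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GB for a standard attention stack.

    The leading factor of 2 accounts for storing one K and one V tensor
    per layer.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * seq_len * batch_size * bytes_per_elem)
    return total_bytes / 1e9


def fits_on_one_gpu(total_gb: float, capacity_gb: float = 80.0) -> bool:
    """True if the estimated total fits within a single card's memory."""
    return total_gb <= capacity_gb


if __name__ == "__main__":
    # ~55 GB is the observed footprint reported above, used here as a floor.
    observed_model_gb = 55.0

    # HYPOTHETICAL shape values, chosen only to exercise the formula.
    cache_gb = kv_cache_gb(n_layers=40, n_kv_heads=8, head_dim=128,
                           seq_len=131072, batch_size=1, bytes_per_elem=2)

    total = observed_model_gb + cache_gb
    print(f"KV cache estimate: {cache_gb:.1f} GB")
    print(f"Estimated total:   {total:.1f} GB")
    print(f"Fits on one 80 GB card: {fits_on_one_gpu(total)}")
```

Plugging in the real config values and the context length and batch size actually being served should show how much of the 80 GB is left once the model itself is loaded.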
Any info here, @juliendenize?