when qwen_next? (i just tested it, and both models did well)
qwen_next is the best model; it's even better than gpt 120b imo
cuda+cpu
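For reference, a minimal sketch of what a hybrid cuda+cpu run looks like with llama.cpp's -ngl flag, offloading part of the model to the GPU and keeping the rest on CPU (the model path and layer count here are just placeholders):

# hypothetical hybrid run: put 20 layers on the GPU, keep the rest on CPU
./build/bin/llama-cli -m ./models/qwen-next.gguf -ngl 20 -p "Hello"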
It is close to being merged into mainline: https://github.com/ggml-org/llama.cpp/pull/16252
If it gets ported to ik then I might take a look!
Seeing MiniMax-M2 coming out as experimental for mainline too: https://huggingface.co/cturan/MiniMax-M2-GGUF https://github.com/ggml-org/llama.cpp/pull/16831
I cannot wait for minimax m2!
By the way guys, I have a new toy to play with:
https://youtu.be/HliRC6qCkqk
I was able to compile llama.cpp on it just fine, but ik_llama.cpp won't compile there.
@DevQuasar just released some mainline llama.cpp quants here: https://huggingface.co/DevQuasar/MiniMaxAI.MiniMax-M2-GGUF for testing pwilkin's PR linked above
oh interesting, i would think ik_llama.cpp would compile for the NVIDIA DGX Spark. you might need to specify that 121 arch explicitly; otherwise, what error are you getting?
# maybe something like this?
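# 121 should match the GB10 in the DGX Spark (compute capability 12.1, i.e. sm_121)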
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF -DCMAKE_CUDA_ARCHITECTURES="121"
cmake --build build --config Release -j $(nproc)
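If the build still fails with that, it may be worth confirming what compute capability the card actually reports before pinning the arch (assuming a recent enough driver for this query), and pasting the first compile error here:

# print the GPU's compute capability, e.g. 12.1 on the DGX Spark's GB10
nvidia-smi --query-gpu=compute_cap --format=csv,noheader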