When qwen_next? (I just tested, and both models did well)

opened by gopi87

qwen_next is the best model; it's even better than gpt 120b, IMO.

Tested with CUDA+CPU.

https://github.com/cturan/llama.cpp
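Once that fork is built, a minimal hybrid CUDA+CPU run could look like the sketch below; the model filename and the -ngl layer count are placeholders, so tune -ngl to whatever fits in your VRAM:

```bash
# offload the first 24 layers to the GPU, run the rest on CPU
# (model path and layer count are illustrative, not from this thread)
./build/bin/llama-cli \
  -m ./models/qwen3-next-Q4_K_M.gguf \
  -ngl 24 \
  -c 8192 \
  -p "Hello"
```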

It is close to being merged in mainline: https://github.com/ggml-org/llama.cpp/pull/16252
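If you'd rather test the PR branch directly instead of the fork, GitHub exposes PR heads as fetchable refs, so something like this should work (the local branch name is arbitrary):

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# fetch the qwen-next support PR into a local branch
git fetch origin pull/16252/head:pr-16252
git checkout pr-16252
```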

If it gets ported to ik then I might take a look!

Seeing MiniMax-M2 coming out as experimental support for mainline too: https://huggingface.co/cturan/MiniMax-M2-GGUF https://github.com/ggml-org/llama.cpp/pull/16831

I cannot wait for MiniMax M2!

By the way guys, I have a new toy to play with:
https://youtu.be/HliRC6qCkqk

I was able to compile llama.cpp on it just fine, but ik_llama won't compile there.

@DevQuasar just released some mainline llama.cpp quants here: https://huggingface.co/DevQuasar/MiniMaxAI.MiniMax-M2-GGUF for testing pwilkin's PR linked above.
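A quick way to pull one of those quants and spin it up for testing; the --include pattern below is an assumption, so check the repo's file list for the actual quant filenames:

```bash
# download a single quant (pattern is a guess; adjust to the real filenames)
huggingface-cli download DevQuasar/MiniMaxAI.MiniMax-M2-GGUF \
  --include "*Q4_K_M*" --local-dir ./minimax-m2

# serve it with a llama.cpp build that includes pwilkin's PR
# (the glob assumes a single matching file)
./build/bin/llama-server -m ./minimax-m2/*Q4_K_M*.gguf -c 8192 -ngl 99
```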

@mtcl

Oh interesting, I would think ik would compile for the NVIDIA DGX Spark; you might need to specify that 121 arch explicitly. Or what is the error you're getting?

```bash
# maybe something like this?
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF -DCMAKE_CUDA_ARCHITECTURES="121"
cmake --build build --config Release -j $(nproc)
```
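To sanity-check that arch value before building, recent drivers let you query the compute capability directly; a reading of 12.1 corresponds to CMAKE_CUDA_ARCHITECTURES="121":

```bash
# prints e.g. "12.1" -> use "121" for CMAKE_CUDA_ARCHITECTURES
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```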
