Enhance model card: Add metadata, paper/code links, and Transformers usage
This PR significantly enhances the model card for HRWKV7-Reka-Flash3-Preview by:
- Adding `pipeline_tag: text-generation` to the metadata, which ensures the model appears in relevant searches on the Hugging Face Hub and enables the interactive inference widget.
- Adding `library_name: transformers` to the metadata, indicating compatibility with the Hugging Face Transformers library and enabling the "Use in Transformers" widget with associated code snippets.
- Adding relevant `tags` such as `linear-attention`, `reka`, `rwkv`, `knowledge-distillation`, and specifying `languages: ['mul']` to reflect its multilingual nature, improving discoverability.
- Introducing a prominent "Paper and Project Details" section at the top, linking directly to the Hugging Face Papers page for RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale and the main project's GitHub repository (https://github.com/recursal/RADLADS-paper).
- Including a standard `transformers` code snippet for text generation, making it easier for users to get started with the model. The original `RWKV-Infer` usage is retained for completeness.
- Adding the BibTeX citation for the RADLADS paper to ensure proper attribution.
These changes collectively make the model card more informative, discoverable, and user-friendly on the Hugging Face Hub.
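For orientation, the metadata changes listed above could end up as a YAML front matter block at the top of the model card. The sketch below only renders such a block from a plain dictionary; `render_front_matter` is a hypothetical helper, not part of any library, and the result is not the PR's actual diff. The key names follow the list above, except that `language` is used in place of `languages`, on the assumption that the PR used the key name documented for Hub model card metadata.

```python
# Hypothetical helper: render a metadata dict as model-card front matter.
# This is an illustration of the described metadata, not the PR's real diff.

def render_front_matter(metadata: dict) -> str:
    """Render flat keys and lists of strings as a YAML front matter block."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists become one "- item" line per entry under the key.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Values taken from the PR description; "language" is an assumed key name.
metadata = {
    "pipeline_tag": "text-generation",
    "library_name": "transformers",
    "tags": ["linear-attention", "reka", "rwkv", "knowledge-distillation"],
    "language": ["mul"],
}

front_matter = render_front_matter(metadata)
print(front_matter)
```

The helper handles only scalars and flat lists, which is all the described metadata needs; a real card would more likely be edited by hand or through a YAML library.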
Thank you for pointing that out.
I'm currently writing HF-compatible inference code.
Thanks, feel free to remove `library_name: transformers` and the Transformers code snippet, as those seem wrong for this particular checkpoint, which does not seem to be Transformers-compatible.
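As a rough illustration of that suggestion, the sketch below drops one top-level key (here `library_name`) from a card's front matter using plain string handling. `drop_front_matter_key` and the sample card text are hypothetical; the function assumes a simple front matter block delimited by `---` lines, with flat keys and `- item` lists, and is not a general YAML editor.

```python
# Hypothetical helper: remove one top-level key from simple YAML front matter.
# Assumes "---" delimiters, flat keys, and "- item" list entries only.

def drop_front_matter_key(card_text: str, key: str) -> str:
    lines = card_text.split("\n")
    if not lines or lines[0] != "---":
        return card_text  # no front matter block to edit
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return card_text  # unterminated block: leave the text untouched
    kept = []
    skipping = False
    for line in lines[1:end]:
        # A top-level key starts at column 0 and is not a list entry.
        is_top_level = bool(line) and not line[0].isspace() and not line.startswith("-")
        if is_top_level:
            skipping = line.split(":", 1)[0].strip() == key
        if not skipping:
            kept.append(line)
    return "\n".join(["---", *kept, "---", *lines[end + 1:]])


# Made-up card text for illustration only.
card = (
    "---\n"
    "pipeline_tag: text-generation\n"
    "library_name: transformers\n"
    "tags:\n"
    "- rwkv\n"
    "---\n"
    "# Model card body\n"
)

cleaned = drop_front_matter_key(card, "library_name")
print(cleaned)
```

List entries that belong to a removed key are skipped along with it, so the same call also works for keys such as `tags`.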
I apologize for any misunderstanding.
This model is based on the RADLADS distillation method, but the training code and model architecture are different:
- RADLADS1: Modified RWKV v6 (Gated Linear Attention kernel)
- Mine: Modified RWKV v7 (RWKV kernel) + No Position Embedding GQA Hybrid
Please feel free to point out any issues. :)