llama-aviation-pte
Quantized Llama 3.2 1B Aviation model for iOS deployment
Model Details
- Base Model: Llama 3.2 1B Instruct
- Format: PTE (PyTorch ExecuTorch)
- Backend: XNNPACK (optimized for iOS)
- Quantization: 4-bit GPTQ
- Size: 1.07 GB
- Platform: iOS 16+
Files
- llama3_2_1b_aviation_spinquant.pte (1083.3 MB)
- tokenizer.json (8.7 MB)
- tokenizer_config.json (0.1 MB)
Usage
Download
wget https://huggingface.co/bluelarkaviation/llama-aviation-pte/resolve/main/llama3_2_1b_aviation_spinquant.pte -O model.pte
iOS Integration
- Add model.pte to your Xcode project
- Install the ExecuTorch framework
- Load the model:
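import ExecuTorch

// Look up model.pte in the app bundle; the resource name must match the file added to Xcode.
let modelPath = Bundle.main.path(forResource: "model", ofType: "pte")!
// Create the ExecuTorch module from the bundled .pte file.
let module = try Module(filePath: modelPath)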
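The snippet above force-unwraps the bundle lookup, so the app crashes if model.pte was not added to the target. A minimal sketch of a safer lookup follows, using only Foundation. The `ModelLocationError` type and the `locateModel` helper are hypothetical names introduced for illustration and are not part of ExecuTorch.

```swift
import Foundation

/// Hypothetical error type for this sketch: the bundled model file could not be found.
enum ModelLocationError: Error, Equatable {
    case notFound(name: String)
}

/// Returns the path of `<name>.pte` inside `bundle`, or throws if the file is not bundled.
/// The default resource name "model" is taken from the integration steps above.
func locateModel(named name: String = "model", in bundle: Bundle = .main) throws -> String {
    guard let path = bundle.path(forResource: name, ofType: "pte") else {
        throw ModelLocationError.notFound(name: name)
    }
    return path
}
```

The returned path can then be passed to `Module(filePath:)` as in the snippet above, with the `notFound` case handled in a `do`/`catch` so the app can show an error instead of crashing.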
Performance
- Optimized for: iPhone 12 and newer
- RAM Required: 2-3 GB
- Backend: XNNPACK (5-10x faster on ARM CPUs)
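Given the 2-3 GB RAM requirement, an app may want to check the device before loading the model. The sketch below compares total physical memory against a threshold using Foundation's `ProcessInfo`. The `hasEnoughMemory` helper and the 3 GiB default (the upper end of the stated range) are assumptions for illustration. Total physical memory is only a rough proxy, since iOS grants each app less than the full device RAM.

```swift
import Foundation

/// Bytes in one GiB, used to convert the raw byte count.
let bytesPerGiB: Double = 1_073_741_824

/// Returns true when the device's total physical memory meets the threshold.
/// `requiredGiB` defaults to 3.0, an assumed value based on the "2-3 GB" figure above.
/// `physicalBytes` is injectable so the check can be exercised without a real device.
func hasEnoughMemory(requiredGiB: Double = 3.0,
                     physicalBytes: UInt64 = ProcessInfo.processInfo.physicalMemory) -> Bool {
    return Double(physicalBytes) / bytesPerGiB >= requiredGiB
}
```

Calling `hasEnoughMemory()` before creating the module lets the app skip loading or warn the user on devices that fall below the threshold.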
License
Same as base Llama 3.2 model