llama-aviation-pte

Quantized Llama 3.2 1B Aviation model for iOS deployment

Model Details

  • Base Model: Llama 3.2 1B Instruct
  • Format: PTE (PyTorch ExecuTorch)
  • Backend: XNNPACK (optimized for iOS)
  • Quantization: 4-bit GPTQ
  • Size: 1.07 GB
  • Platform: iOS 16+

Files

  • llama3_2_1b_aviation_spinquant.pte (1083.3 MB)
  • tokenizer.json (8.7 MB)
  • tokenizer_config.json (0.1 MB)

Usage

Download

wget -O model.pte https://huggingface.co/bluelarkaviation/llama-aviation-pte/resolve/main/llama3_2_1b_aviation_spinquant.pte

iOS Integration

  1. Add model.pte to your Xcode project and make sure it is included in the app target's bundle resources
  2. Install the ExecuTorch framework
  3. Load the model:
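import ExecuTorch

// Look up the bundled model without force-unwrapping the path.
guard let modelPath = Bundle.main.path(forResource: "model", ofType: "pte") else {
    fatalError("model.pte is missing from the app bundle")
}
let module = Module(filePath: modelPath)
try module.load()  // creating the Module does not throw; loading the program can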
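Before handing the path to ExecuTorch, it can help to confirm that the bundled file is present and complete, since a truncated download or a Git LFS pointer file would be far smaller than the real model. The sketch below uses only Foundation. The helper name `validateModelFile`, the `ModelFileError` type, and the 1,000,000,000-byte threshold are illustrative assumptions, not part of ExecuTorch or this repository.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum ModelFileError: Error {
    case notFound(String)
    case tooSmall(path: String, bytes: UInt64)
}

/// Hypothetical helper: checks that a file exists at `path` and is at least
/// `minimumBytes` long, then returns its size in bytes.
func validateModelFile(atPath path: String, minimumBytes: UInt64) throws -> UInt64 {
    guard FileManager.default.fileExists(atPath: path) else {
        throw ModelFileError.notFound(path)
    }
    let attributes = try FileManager.default.attributesOfItem(atPath: path)
    let size = (attributes[.size] as? NSNumber)?.uint64Value ?? 0
    guard size >= minimumBytes else {
        throw ModelFileError.tooSmall(path: path, bytes: size)
    }
    return size
}

// Usage: the threshold is an assumption based on the listed file size (1083.3 MB).
if let path = Bundle.main.path(forResource: "model", ofType: "pte") {
    do {
        let bytes = try validateModelFile(atPath: path, minimumBytes: 1_000_000_000)
        print("Model file looks complete: \(bytes) bytes")
    } catch {
        print("Model file check failed: \(error)")
    }
}
```

Running this check once at launch gives a readable error instead of a failure inside the runtime when the file is missing or incomplete.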

Performance

  • Optimized for: iPhone 12 and newer
  • RAM Required: 2-3 GB
  • Backend: XNNPACK (5-10x faster on ARM CPUs)
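Given the 2-3 GB RAM requirement, an app may want to check the device's memory before attempting to load the model. This is a minimal sketch using `ProcessInfo.processInfo.physicalMemory` from Foundation. The helper name `hasEnoughMemory` and the 3 GB threshold are assumptions chosen for illustration. Note that physical memory is the device total, not the amount iOS will actually grant the app, so this is only a coarse gate.

```swift
import Foundation

/// Hypothetical helper: returns true when the reported physical RAM is at
/// least `requiredGB` gibibytes. `physicalBytes` defaults to the device value
/// and can be overridden for testing.
func hasEnoughMemory(
    requiredGB: Double,
    physicalBytes: UInt64 = ProcessInfo.processInfo.physicalMemory
) -> Bool {
    let physicalGB = Double(physicalBytes) / 1_073_741_824.0  // bytes to GiB
    return physicalGB >= requiredGB
}

// Usage: 3.0 is an assumed threshold taken from the upper end of "2-3 GB".
if hasEnoughMemory(requiredGB: 3.0) {
    print("Device memory looks sufficient; proceeding to load the model.")
} else {
    print("Device memory is below the assumed threshold; skipping model load.")
}
```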

License

Same as the base Llama 3.2 model (Llama 3.2 Community License)
