Simple Self-Distillation
Papers
TopoPrimer: The Missing Topological Context in Forecasting Models
Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
Efficient Vision Encoding for Vision Language Models
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 77
- FastVLM WebGPU
  446 • Real-time video captioning powered by FastVLM
- apple/FastVLM-0.5B
  Text Generation • 0.8B • Updated • 20.1k • 392
- apple/FastVLM-1.5B
  Text Generation • 2B • Updated • 3.13k • 80
- apple/DiffuCoder-7B-cpGRPO
  8B • Updated • 1.21k • 318
- apple/DiffuCoder-7B-Instruct
  8B • Updated • 1.99k • 63
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
  Paper • 2506.20639 • Published • 31
- apple/DiffuCoder-7B-Base
  8B • Updated • 182 • 31
- apple/coreml-depth-anything-v2-small
  Depth Estimation • Updated • 562 • 96
- apple/coreml-depth-anything-small
  Depth Estimation • Updated • 133 • 40
- apple/coreml-detr-semantic-segmentation
  Image Segmentation • Updated • 265 • 32
- apple/coreml-FastViT-T8
  Image Classification • Updated • 21 • 17
Benchmark for the design of efficient continual learning of image-text models over years.
- TiC-CLIP: Continual Training of CLIP Models
  Paper • 2310.16226 • Published • 10
- apple/TiC-DataComp
  Preview • Updated • 1.55k • 4
- apple/TiC-CLIP-basic-cumulative
  Zero-Shot Image Classification • Updated • 83 • 3
- apple/TiC-CLIP-basic-oracle
  Zero-Shot Image Classification • Updated • 8 • 1
- apple/coreml-stable-diffusion-mixed-bit-palettization
  Updated • 8 • 30
- apple/coreml-stable-diffusion-xl-base
  Text-to-Image • Updated • 139 • 70
- apple/coreml-stable-diffusion-2-1-base
  Text-to-Image • Updated • 136 • 55
- pcuenq/coreml-stable-diffusion-2-1-base
  Text-to-Image • Updated • 37 • 4
AIM: Autoregressive Image Models
CLaRa models
MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B
A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint.
- apple/aimv2-large-patch14-224
  Image Feature Extraction • 0.3B • Updated • 934 • 62
- apple/aimv2-huge-patch14-224
  Image Feature Extraction • 0.7B • Updated • 140 • 13
- apple/aimv2-1B-patch14-224
  Image Feature Extraction • 1B • Updated • 48 • 8
- apple/aimv2-3B-patch14-224
  Image Feature Extraction • 3B • Updated • 73 • 4
- apple/OpenELM-270M-Instruct
  Text Generation • 0.3B • Updated • 2.09k • 145
- apple/OpenELM-450M-Instruct
  Text Generation • 0.5B • Updated • 1.11k • 51
- apple/OpenELM-1_1B-Instruct
  Text Generation • Updated • 1.61M • 75
- apple/OpenELM-3B-Instruct
  Text Generation • 3B • Updated • 2.79k • 340
MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities.
DataCompDR: Improved datasets for training image-text SOTA models.
- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
  Paper • 2311.17049 • Published • 7
- apple/mobileclip_s0_timm
  Image Classification • Updated • 181 • 12
- apple/mobileclip_s1_timm
  Image Classification • Updated • 53 • 3
- apple/mobileclip_s2_timm
  Image Classification • Updated • 35 • 6
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
- apple/DepthPro-hf
  Depth Estimation • 1.0B • Updated • 24.3k • 102
- apple/DepthPro
  Depth Estimation • Updated • 1.83k • 510
- apple/DepthPro-mixin
  Depth Estimation • Updated • 11 • 8
- openai/clip-vit-large-patch14
  Zero-Shot Image Classification • 0.4B • Updated • 32.1M • 2.01k
CLIP Models trained using DFN-2B/DFN-5B datasets
DCLM Models + Datasets