Upload model-notes.md with huggingface_hub
model-notes.md
ADDED
@@ -0,0 +1,46 @@
# Model Training Notes

## Validation Accuracy: `train0`
*Note: "v1" = IMAGENET1K_V1, "v2" = IMAGENET1K_V2*

| Model         | Run 0    |
|---------------|----------|
| ResNet50 v1   | 0.764273 |
| ResNet50 v2   | 0.729282 |
| ResNet101 v1  | 0.775936 |
| ResNet101 v2  | 0.790055 |

---
## Validation Accuracy: `train1`
*Uses the newly labeled test set from Stanford Cars as additional training data.*

| Model         | Run 0    |
|---------------|----------|
| ResNet50 v1   | 0.848023 |
| ResNet50 v2   | 0.833607 |
| ResNet101 v1  | **0.867381** |
| ResNet101 v2  | 0.861614 |

---
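The gain from the extra training data can be read off the two tables above. The sketch below is not part of the original notes; it copies the reported accuracies into two dictionaries and computes the absolute change per model. The names `train0`, `train1`, and `accuracy_gains` are illustrative.

```python
# Validation accuracies copied from the train0 and train1 tables.
train0 = {
    "ResNet50 v1": 0.764273,
    "ResNet50 v2": 0.729282,
    "ResNet101 v1": 0.775936,
    "ResNet101 v2": 0.790055,
}
train1 = {
    "ResNet50 v1": 0.848023,
    "ResNet50 v2": 0.833607,
    "ResNet101 v1": 0.867381,
    "ResNet101 v2": 0.861614,
}


def accuracy_gains(before, after):
    """Return the absolute accuracy change per model, rounded to 6 decimals."""
    return {name: round(after[name] - before[name], 6) for name in before}


gains = accuracy_gains(train0, train1)
for name, gain in gains.items():
    print(f"{name:<13} {gain:+.6f}")
```

Every model gains between roughly 0.07 and 0.10 in validation accuracy, with ResNet50 v2 improving the most and ResNet101 v2 the least.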
## Hyperparameter Tuning: ResNet101 v1 (best `train1` model)
*Hyperparameters varied: optimizer and learning rate*

| Description    | Run 0     |
|----------------|-----------|
| Adam, lr=1e-4  | **0.867381** (baseline) ⭐ |
| Adam, lr=3e-4  | 0.717875 |
| Adam, lr=5e-5  | 0.841050 |
| SGD, lr=1e-2   | 0.691104 |
| SGD, lr=5e-3   | 0.417627 |

---
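To make the sweep easier to compare, the results can be ranked and measured against the best run. This is a minimal sketch using only the numbers in the table above; the `sweep` dictionary and `rank_configs` helper are hypothetical names, not part of the training code.

```python
# Validation accuracies copied from the sweep table (ResNet101 v1 on train1).
sweep = {
    ("Adam", 1e-4): 0.867381,  # baseline
    ("Adam", 3e-4): 0.717875,
    ("Adam", 5e-5): 0.841050,
    ("SGD", 1e-2): 0.691104,
    ("SGD", 5e-3): 0.417627,
}


def rank_configs(results):
    """Sort (optimizer, lr) configurations from best to worst accuracy."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


ranked = rank_configs(sweep)
best_config, best_acc = ranked[0]
worst_config, worst_acc = ranked[-1]
ratio = best_acc / worst_acc

for (optimizer, lr), acc in ranked:
    print(f"{optimizer:<4} lr={lr:g}  acc={acc:.6f}  vs best={acc - best_acc:+.6f}")
print(f"best / worst accuracy ratio: {ratio:.2f}")
```

Since each configuration has a single run (Run 0), the ranking says nothing about run-to-run variance.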
## Observations & Conclusions

- **More data improves accuracy:** Every model gained between 0.072 and 0.104 in validation accuracy from `train0` to `train1`.
- **Deeper models help:** ResNet101 outperforms ResNet50 in every matched comparison (same weights, same training set), by 0.012 to 0.061.
- **Optimizer matters:** Adam (`lr=1e-4`) yielded the highest accuracy; both lower and higher Adam learning rates, and both SGD settings, performed worse.
- **IMAGENET v1 vs v2:** The gap between v1 and v2 initializations (0.006 to 0.035) is small next to the gain from added data and, on average, smaller than the gain from the deeper model.
- **Performance margins:** On the same architecture, the best configuration (Adam, `lr=1e-4`, 0.867381) reaches more than twice the validation accuracy of the worst (SGD, `lr=5e-3`, 0.417627).
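The relative size of these effects can be checked directly from the `train0` and `train1` tables. The sketch below is an illustration rather than part of the original analysis: it averages the v1/v2 gap, the ResNet101 vs ResNet50 gap, and the `train1` vs `train0` gap. The `acc` dictionary layout and variable names are assumptions made for this example.

```python
from statistics import mean

# Validation accuracies copied from the train0 and train1 tables,
# keyed by (architecture, pretrained weights).
acc = {
    "train0": {
        ("ResNet50", "v1"): 0.764273,
        ("ResNet50", "v2"): 0.729282,
        ("ResNet101", "v1"): 0.775936,
        ("ResNet101", "v2"): 0.790055,
    },
    "train1": {
        ("ResNet50", "v1"): 0.848023,
        ("ResNet50", "v2"): 0.833607,
        ("ResNet101", "v1"): 0.867381,
        ("ResNet101", "v2"): 0.861614,
    },
}

# Absolute v1 vs v2 gap for the same architecture and training set.
weights_gaps = [
    abs(run[(arch, "v1")] - run[(arch, "v2")])
    for run in acc.values()
    for arch in ("ResNet50", "ResNet101")
]

# ResNet101 minus ResNet50 for the same weights and training set.
depth_gaps = [
    run[("ResNet101", w)] - run[("ResNet50", w)]
    for run in acc.values()
    for w in ("v1", "v2")
]

# train1 minus train0 for the same architecture and weights.
data_gaps = [acc["train1"][key] - acc["train0"][key] for key in acc["train0"]]

print(f"mean |v1 - v2| gap:           {mean(weights_gaps):.4f}")
print(f"mean ResNet101 - ResNet50:    {mean(depth_gaps):.4f}")
print(f"mean train1 - train0:         {mean(data_gaps):.4f}")
```

On these numbers the ordering is weights < depth < data on average, although for individual pairs (for example ResNet50 on `train0`) the v1/v2 gap exceeds the depth gap.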