---
language: zh
license: mit
tags:
- text-classification
- chinese
- sms
- travel
- bert
- pytorch
- apple-silicon-optimized
datasets:
- sms-travel-classification
widget:
- text: "【雄獅旅遊】恭喜您！日本東京賞櫻5日遊行程已確認，請準備好護照並於出發前3小時抵達機場。"
  example_title: "Travel-related SMS"
- text: "您好，這是台新銀行的通知，您的信用卡帳單已產生，請於繳款期限前完成付款。"
  example_title: "Non-travel SMS"
model-index:
- name: bert-chinese-sms-travel-classifier
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: sms-travel-classification
      name: SMS Travel Classification
    metrics:
    - type: accuracy
      value: 1.0
      name: Accuracy
---

# BERT Chinese SMS Travel Classifier

## Model Description

This is a Chinese BERT model fine-tuned from `ckiplab/bert-base-chinese` for binary classification of SMS messages as travel-related or not.

## Intended Use

- **Task**: Text classification
- **Language**: Chinese
- **Domain**: SMS messages, travel
- **Classes**: Binary (travel-related vs. non-travel-related)

## Performance

| Metric | Value |
|------|------|
| Validation accuracy | 1.0000 |
| Best accuracy | 1.0000 |

## Training Data

- **Training set size**: 6400 examples
- **Validation set size**: 800 examples
- **Test set size**: 801 examples
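
As a quick sanity check on the split sizes listed above, the following sketch computes each split's share of the total. The variable names are illustrative and are not taken from the training code.

```python
# Split sizes as reported in this model card
split_sizes = {"train": 6400, "validation": 800, "test": 801}

total = sum(split_sizes.values())  # number of examples across all splits

# Share of each split, as a fraction of the full dataset
split_shares = {name: size / total for name, size in split_sizes.items()}

for name, share in split_shares.items():
    print(f"{name}: {split_sizes[name]} examples ({share:.1%})")
```

The shares work out to roughly an 80/10/10 split.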

## Usage

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "renhehuang/bert-chinese-sms-travel-classifier"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # switch to inference mode (disables dropout)

# Prediction function
def predict_sms(text, max_length=256):
    """Return (predicted_class, confidence) for a single SMS message."""
    # Calling the tokenizer directly replaces the older encode_plus API
    encoding = tokenizer(
        text,
        add_special_tokens=True,
        max_length=max_length,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )

    # No gradients are needed for inference
    with torch.no_grad():
        outputs = model(
            input_ids=encoding['input_ids'],
            attention_mask=encoding['attention_mask']
        )

    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)

    # The predicted class is the index of the largest logit
    predicted_class = torch.argmax(logits, dim=-1).item()
    confidence = probabilities[0][predicted_class].item()

    return predicted_class, confidence

# Example usage
text = "【可樂旅遊】歐洲浪漫10日遊即將出發！"
predicted_class, confidence = predict_sms(text)
print(f"Predicted class: {predicted_class} (confidence: {confidence:.4f})")
```
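
To make the `confidence` value returned by `predict_sms` concrete, here is a dependency-free sketch of the same computation: a softmax over the two logits, followed by the probability of the argmax class. The helper name `softmax_confidence` and the example logits are made up for illustration; they are not outputs of the actual model.

```python
import math

def softmax_confidence(logits):
    """Return (predicted_class, confidence) for a list of raw logits."""
    # Subtract the max logit for numerical stability before exponentiating
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probabilities = [e / total for e in exps]

    # The predicted class is the index of the largest probability
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    return predicted_class, probabilities[predicted_class]

# Hypothetical logits for the classes [non-travel, travel]
predicted_class, confidence = softmax_confidence([-2.0, 3.0])
print(predicted_class, round(confidence, 4))
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits or of the probabilities yields the same class.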

## Training Configuration

- **Base model**: ckiplab/bert-base-chinese
- **Epochs**: 5
- **Batch size**: 8
- **Learning rate**: 3e-05
- **Maximum sequence length**: 256
- **Training device**: mps
- **Early stopping**: disabled
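
The settings above imply a rough training budget. The sketch below derives the number of optimizer steps, assuming no gradient accumulation and that a final partial batch would be kept; neither detail is stated in this card, and the `TrainConfig` class is a hypothetical container rather than the author's training code.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in this model card."""
    base_model: str = "ckiplab/bert-base-chinese"
    epochs: int = 5
    batch_size: int = 8
    learning_rate: float = 3e-5
    max_seq_length: int = 256
    device: str = "mps"

def total_optimizer_steps(config, num_train_examples):
    """Steps per epoch times epochs (assumes no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_train_examples / config.batch_size)
    return steps_per_epoch * config.epochs

config = TrainConfig()
steps = total_optimizer_steps(config, num_train_examples=6400)
print(f"{steps} optimizer steps over {config.epochs} epochs")
```

With 6400 training examples and a batch size of 8, each epoch has 800 steps under these assumptions.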

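## Special Optimizations

✅ **Apple Silicon optimization**: This model has been specifically optimized for Apple Silicon (M1/M2/M3) chips, including MPS GPU acceleration and memory optimization.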
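
For inference on Apple Silicon, the model and its inputs need to be placed on the MPS device. A minimal sketch of the device choice follows; the function names are hypothetical, and the fallback order (MPS, then CUDA, then CPU) is an assumption rather than something this card specifies. `detect_device` imports PyTorch lazily and relies on `torch.backends.mps.is_available()`, which exists in PyTorch 1.12 and later.

```python
def choose_device(mps_available, cuda_available=False):
    """Pick a device name; the MPS > CUDA > CPU order is an assumption."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"

def detect_device():
    """Query PyTorch for available backends and pick a device name."""
    import torch  # imported lazily so choose_device works without PyTorch
    return choose_device(
        mps_available=torch.backends.mps.is_available(),
        cuda_available=torch.cuda.is_available(),
    )

print(choose_device(mps_available=True))
print(choose_device(mps_available=False))
```

With PyTorch installed, `device = torch.device(detect_device())` followed by `model.to(device)` places the model on the chosen device. The input tensors must be moved to the same device before the forward pass.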

## Labels

- `0`: Not travel-related
- `1`: Travel-related
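
The integer class returned by the classifier can be mapped to a readable name. This is a small sketch; the English label strings and the `describe_prediction` helper are illustrative choices, not part of the model's published configuration.

```python
# Mapping from class index to a human-readable name (names are illustrative)
ID2LABEL = {0: "non-travel", 1: "travel"}

def describe_prediction(predicted_class, confidence):
    """Format a (class index, confidence) pair as a short string."""
    label = ID2LABEL[predicted_class]
    return f"{label} ({confidence:.2%})"

print(describe_prediction(1, 0.9876))
```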

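## Limitations and Notes

1. The model was trained specifically on Chinese SMS message text.
2. It performs best on inputs of at most 256 tokens.
3. It is mainly intended for Traditional Chinese content.
4. Performance on Simplified Chinese may differ slightly.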
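
To flag messages that might exceed the 256-token limit without loading the tokenizer, a rough heuristic can help. The sketch below assumes about one token per non-whitespace character, which is typical for Chinese text under BERT-style Chinese vocabularies but is only an approximation (digits and Latin words are often merged into fewer tokens). Both helper names are hypothetical; the real tokenizer gives the exact count.

```python
SPECIAL_TOKENS = 2  # [CLS] and [SEP] wrap a single input sequence

def estimate_token_count(text):
    """Rough estimate: one token per non-whitespace character, plus specials."""
    visible_chars = sum(1 for ch in text if not ch.isspace())
    return visible_chars + SPECIAL_TOKENS

def may_be_truncated(text, max_length=256):
    """True if the estimate exceeds max_length, so truncation is likely."""
    return estimate_token_count(text) > max_length

sms = "【可樂旅遊】歐洲浪漫10日遊即將出發！"
print(estimate_token_count(sms), may_be_truncated(sms))
```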

## Citation

If you use this model, please cite:

```
@misc{bert-chinese-sms-travel-classifier,
  title={BERT Chinese SMS Travel Classifier},
  author={renhehuang},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Model Hub},
}
```

## Contact

For questions or suggestions, please contact the author via Hugging Face or GitHub.