---
base_model:
- meta-llama/Llama-3.3-70B-Instruct
datasets:
- flexifyai/cross_rulings_hts_dataset_for_tariffs
language:
- en
library_name: transformers
license: mit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- legal
- trade
- htsus
- semiconductor
- tariffs
- hts
- cross
- cbp
pretty_name: Atlas (LLaMA-3.3-70B) — HTS Classification
authors:
- name: Pritish Yuvraj
  affiliation: Flexify.AI
  homepage: https://www.pritishyuvraj.com/
- name: Siva Devarakonda
  affiliation: Flexify.AI
---

# Atlas — LLaMA-3.3-70B fine-tuned for Harmonized Tariff Schedule (HTS) classification

This model is presented in the paper [ATLAS: Benchmarking and Adapting LLMs for Global Trade via Harmonized Tariff Code Classification](https://huggingface.co/papers/2509.18400).

Atlas is a domain-specialized LLaMA-3.3-70B model fine-tuned on U.S. Customs CROSS rulings for Harmonized Tariff Schedule (HTS) code assignment. It targets both **10-digit U.S. HTS (compliance)** and **6-digit HS (globally harmonized)** accuracy.

- **10-digit exact match:** 40.0%
- **6-digit exact match:** 57.5%

Atlas outperforms general-purpose LLMs while remaining deployable and self-hostable.

- **Model repo:** [flexifyai/atlas-llama3.3-70b-hts-classification](https://huggingface.co/flexifyai/atlas-llama3.3-70b-hts-classification)
- **Dataset:** [flexifyai/cross_rulings_hts_dataset_for_tariffs](https://huggingface.co/datasets/flexifyai/cross_rulings_hts_dataset_for_tariffs)
- **Demo:** [flexifyai/atlas-llama3_3-70b-hts-demo](https://flexifyai-atlas-llama3-3-70b-hts-demo.hf.space/?__theme=system&deep_link=auHidY8xF00)
- **Project page:** https://tariffpro.flexify.ai/

**Example (from the demo):**

**User:** What is the HTS US Code for 4\[N-(2,4-Diamino-6-Pteridinylmethyl)-N-Methylamino] Benzoic Acid Sodium Salt?
**Model:**
HTS US Code -> `2933.59.4700`
Reasoning -> Falls under heterocyclic compounds with nitrogen hetero-atom(s); specifically classified within pteridine derivatives used in pharmaceutical or biochemical applications per CROSS rulings.

---

## TL;DR

- **Task:** Assign an HTS code given a product description (and, optionally, a rationale).
- **Why it matters:** Misclassification halts shipments; the 6-digit HS code is global, while the 10-digit HTS code is U.S.-specific.
- **What’s new:** The first open benchmark plus a strong open-model baseline, focused on semiconductors and manufacturing.

---

## Intended use & limitations

### Use cases

- Automated HTS/HS pre-classification with human-in-the-loop review.
- Decision support for brokers, compliance teams, and trade workflows.
- Research on domain reasoning, retrieval, and alignment.

### Limitations

- Not legal advice; rulings change and are context-dependent.
- Training data is concentrated in semiconductors/manufacturing; performance may vary in other domains.
- The model can produce confident but incorrect codes; keep a human validator for high-stakes usage.
- Always verify against the current HTS/USITC schedule and local customs guidance.

---

## Data

- **Source:** CROSS (U.S. Customs Rulings Online Search System).
- **Splits:** 18,254 train / 200 validation / 200 test.
- Each example includes:
  - a product description
  - a chain-of-reasoning-style justification
  - the ground-truth HTS code

**Dataset card:** [flexifyai/cross_rulings_hts_dataset_for_tariffs](https://huggingface.co/datasets/flexifyai/cross_rulings_hts_dataset_for_tariffs)

---

## Training setup (summary)

- **Base:** LLaMA-3.3-70B (dense)
- **Objective:** Supervised fine-tuning (token-level NLL)
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay 0.1), cosine LR schedule, peak LR 1e-7
- **Precision:** bf16, with gradient accumulation (effective batch ≈ 64 sequences)
- **Hardware:** 16× A100-80GB, 5 epochs (~1.4k steps)

We chose a dense model for simpler fine-tuning/inference and for reproducibility under budget constraints.
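The learning-rate schedule above can be sketched as follows. This is a minimal illustration only: the warmup length and minimum LR are assumptions (not stated here), while the peak LR of 1e-7 and the ~1.4k total steps come from the summary above.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-7,
              warmup_steps: int = 100, min_lr: float = 0.0) -> float:
    """Cosine LR decay with linear warmup (warmup_steps/min_lr are assumptions)."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

For example, `cosine_lr(100, 1400)` returns the peak LR at the end of warmup, and the value decays toward zero by step 1400.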
**Future work:** retrieval, DPO/GRPO, and smaller distilled variants.

---

## Results (200-example held-out test)

| Model | 10-digit exact | 6-digit exact | Avg. digits correct |
|-------------------------|----------------|---------------|---------------------|
| GPT-5-Thinking | 25.0% | 55.5% | 5.61 |
| Gemini-2.5-Pro-Thinking | 13.5% | 31.0% | 2.92 |
| DeepSeek-R1 (05/28) | 2.5% | 26.5% | 3.24 |
| GPT-OSS-120B | 1.5% | 8.0% | 2.58 |
| LLaMA-3.3-70B (base) | 2.1% | 20.7% | 3.31 |
| **Atlas (this model)** | **40.0%** | **57.5%** | **6.30** |

💰 **Cost note:** Self-hosting Atlas on A100s can be significantly cheaper per 1k inferences than proprietary APIs.

---

## Prompting

Atlas expects an instruction of the form:

---

User: What is the HTS US Code for [product_description]?

Model: HTS US Code -> [10-digit code]
Reasoning -> [short justification]

---

### Minimal example

**User:** What is the HTS US Code for 300mm silicon wafers, polished, un-doped, for semiconductor fabrication?

**Model:**
HTS US Code -> `3818.00.0000`
Reasoning -> Classified under chemical elements/compounds doped for electronics; wafer form per CROSS precedents.

---

## Authors

- **Pritish Yuvraj** (Flexify.AI) — [pritishyuvraj.com](https://www.pritishyuvraj.com)
- **Siva Devarakonda** (Flexify.AI)

## 📖 Citation

If you find this work useful, please cite our paper:

```bibtex
@misc{yuvraj2025atlasbenchmarkingadaptingllms,
  title={ATLAS: Benchmarking and Adapting LLMs for Global Trade via Harmonized Tariff Code Classification},
  author={Pritish Yuvraj and Siva Devarakonda},
  year={2025},
  eprint={2509.18400},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2509.18400},
}
```
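---

**Metric note:** The "Avg. digits correct" column in the results table presumably counts the leading digits of the predicted code that match the ground truth; the paper's exact definition governs. A minimal sketch under that assumption:

```python
def digits_correct(pred: str, gold: str) -> int:
    """Number of leading digits of `pred` matching `gold`, ignoring dots.

    Assumption: "Avg. digits correct" is a leading-digit prefix match
    over the 10-digit code; consult the paper for the exact definition.
    """
    p, g = pred.replace(".", ""), gold.replace(".", "")
    n = 0
    for a, b in zip(p, g):
        if a != b:
            break
        n += 1
    return n

# A prediction correct at the 6-digit (HS) level but wrong at 10 digits:
digits_correct("2933.59.0000", "2933.59.4700")  # -> 6
```

Under this reading, a 10-digit exact match scores 10, a 6-digit (HS-level) match scores at least 6, and averaging over the test set yields the final column.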