LiquidAI/LFM2-1.2B finetuned on Japanese content to output JSON with PII. The model should output only single json with four fields:

{
    "full_name": "name of the person",
    "company_name": "name of the company",
    "address": "address of the plance",
    "phone_number": "phone number"
}

Coded during Liquid AI hackathon in Tokyo.

Evaluations

Evaluation on test split of stockmark/ner-wikipedia-dataset:

  • Test accuracy on raw model using wiki dataset: 0.4250 --> 1.0 after fine-tunning.

That dataset is somehow simple. We generated 64 samples of long contracts containing PII in Japanese. We used it only to evaluate the final perfomrance of the models

  • Test accuracy on raw model using OUR dataset: 0.0312 --> 1.0 after fine-tunning.

Evaluation methodology

We use an exact match on generated JSON. The output of SLM must be a valid JSON with exactly four required fields, no less, no more.

Model Details

System prompt

Use the following system prompt while extracting PIIs:

Identify and extract information matching the following schema.
Return data as a JSON object. 
For each field, select most suitable value from text
If provided text does not contain sufficient information to fill out the field, make the field empty string.
Output only JSON, and output only four fields. 

{
    "full_name": "name of the person",
    "company_name": "name of the company",
    "address": "address of the plance",
    "phone_number": "phone number"
}
Downloads last month
25
Safetensors
Model size
1B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kainoj/LiquidAI-LFM2-1.2B-ja-pii-finetuned

Base model

LiquidAI/LFM2-1.2B
Finetuned
(37)
this model

Dataset used to train kainoj/LiquidAI-LFM2-1.2B-ja-pii-finetuned