RuBERT-ruLaw

This model is the result of continued pretraining of DeepPavlov/rubert-base-cased on the RusLawOD dataset, a large corpus of Russian legal texts (court decisions and normative acts).
The goal of this training is to improve RuBERT's performance on legal-domain tasks such as classification, information extraction, and retrieval.

Repository: https://github.com/TryDotAtwo/ruBERT-ruLaw

Training Details

  • Base model: DeepPavlov/rubert-base-cased
  • Task: Masked Language Modeling (MLM)
  • Max sequence length: 512 tokens (stride 128)
  • Batch size: 160 per device
  • Gradient accumulation: 1
  • Epochs: 8 (3 in test mode)
  • Max steps: 40,000
  • Warmup steps: 2,000
  • Mixed precision: BF16 (on A100/H100)
  • Optimizer & scheduler: Default Hugging Face Trainer settings
  • Evaluation metric: eval_loss (best checkpoint loaded at end)
  • Hardware: 3× NVIDIA H200 GPUs
  • Final eval loss: < 0.3
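Taken together, these settings imply a sizable token budget. A back-of-the-envelope calculation (an upper bound, since it assumes every sequence is packed to the full 512 tokens):

```python
# Effective batch size and token budget implied by the settings above.
# Upper bound: assumes every sequence is filled to the full 512 tokens.
per_device_batch = 160
num_gpus = 3          # 3x H200
grad_accum = 1
max_steps = 40_000
seq_len = 512

effective_batch = per_device_batch * num_gpus * grad_accum  # sequences per optimizer step
token_budget = effective_batch * seq_len * max_steps        # tokens seen over training

print(effective_batch)  # 480
print(token_budget)     # 9_830_400_000, i.e. roughly 9.8B tokens
```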

Dataset

We use the RusLawOD dataset.
Before tokenization, empty or None entries are filtered out. Tokenization is performed with the RuBERT tokenizer using truncation and a sliding window (stride 128) to maximize coverage of long documents.
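Under Hugging Face `stride` semantics, consecutive 512-token windows share 128 tokens, so each new window advances by 384 tokens. A small arithmetic sketch (hypothetical helper, not part of the training code) of how many windows a long document produces:

```python
import math

WINDOW = 512   # max sequence length
OVERLAP = 128  # HF "stride": tokens shared between consecutive windows
STEP = WINDOW - OVERLAP  # each new window advances by 384 tokens

def num_windows(n_tokens: int) -> int:
    """Number of overlapping windows needed to cover a document of n_tokens."""
    if n_tokens <= WINDOW:
        return 1
    return 1 + math.ceil((n_tokens - WINDOW) / STEP)

# A 2,000-token court decision yields 5 overlapping windows:
print(num_windows(2000))  # 5
```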

Usage

from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "TryDotAtwo/rubert-rulaw"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)  # includes the MLM head used during pretraining

Evaluation Overview

Models were tested on the sud-resh-benchmark legal texts using a masked language modeling setup. Tokens were randomly masked at varying probabilities (10–40%), and models predicted them using their pre-trained heads.

Note: The ruBERT-ruLaw model was pre-trained on legal texts such as laws and statutes, but not specifically on judicial decisions. The evaluation reflects how well it generalizes to predicting masked tokens in Russian court rulings.

  • Top-1 Accuracy: fraction of masked tokens predicted exactly.
  • Top-5 Accuracy: fraction of masked tokens predicted within the top 5 candidates.

Results reflect performance across all masked tokens, aggregated over the entire dataset.
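The two metrics above can be computed directly from the model's logits at the masked positions. A minimal, framework-free sketch (`topk_accuracy` is a hypothetical helper for illustration, not the actual evaluation script):

```python
def topk_accuracy(mask_logits, true_ids, k):
    """Fraction of masked positions whose true token id is among the
    k highest-scoring vocabulary entries (Top-k accuracy)."""
    hits = 0
    for scores, true_id in zip(mask_logits, true_ids):
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += true_id in topk
    return hits / len(true_ids)

# Two masked positions over a toy 4-token vocabulary:
logits = [[0.1, 0.9, 0.0, 0.2],   # model favors token 1
          [0.5, 0.2, 0.3, 0.1]]   # model favors token 0
print(topk_accuracy(logits, [1, 2], k=1))  # 0.5 (first position hit, second missed)
print(topk_accuracy(logits, [1, 2], k=3))  # 1.0 (token 2 is within the top 3)
```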

MLM Accuracy Comparison

| MLM Probability | Metric | ruBERT-ruLaw | rubert-base-cased | legal-bert-base-uncased |
|-----------------|--------|--------------|-------------------|-------------------------|
| 10% | Top-1 | 81.0% | 73.0% | 45.3% |
| 10% | Top-5 | 92.2% | 87.0% | 77.2% |
| 15% | Top-1 | 78.8% | 67.9% | 45.3% |
| 15% | Top-5 | 90.8% | 83.2% | 76.7% |
| 20% | Top-1 | 76.3% | 53.8% | 45.0% |
| 20% | Top-5 | 89.0% | 71.5% | 75.9% |
| 25% | Top-1 | 73.6% | 18.0% | 44.4% |
| 25% | Top-5 | 87.0% | 31.9% | 75.0% |
| 30% | Top-1 | 70.4% | 5.9%  | 43.8% |
| 30% | Top-5 | 84.6% | 10.9% | 74.0% |
| 35% | Top-1 | 66.9% | 6.0%  | 42.9% |
| 35% | Top-5 | 81.9% | 9.1%  | 72.9% |
| 40% | Top-1 | 62.9% | 6.0%  | 41.9% |
| 40% | Top-5 | 78.5% | 8.5%  | 71.7% |

Citation

A paper describing the dataset and training process will be released on arXiv soon. [Link — coming soon]
