Talk2Ref
This model encodes scientific talks (transcripts, titles, and years) into dense vector representations, designed for Reference Prediction from Talks (RPT), the task of retrieving relevant cited papers for a given talk.
It was trained as part of the Talk2Ref dataset project.
The model forms the query-side encoder in a dual-encoder (DPR-style) setup, paired with the Talk2Ref Cited Paper Encoder.
Example with `transformers`:

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("s8frbroy/talk2ref_query_talk_encoder")
tokenizer = AutoTokenizer.from_pretrained("s8frbroy/talk2ref_query_talk_encoder")

# Example input: talk title, year, and transcript
title = "Attention Is All You Need"
year = 2017
query_text = (
    f"The following presentation is about the paper of the title: '{title}'. Published in {year}. "
    "In this talk, we introduce the Transformer architecture and discuss its impact on sequence modeling."
)

# Compute the talk embedding (mean pooling over token embeddings, per the model details below)
inputs = tokenizer(query_text, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

print(embedding.shape)  # (1, hidden_dim)
```
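For retrieval, the talk embedding is compared against embeddings produced by the paired cited-paper encoder. The sketch below is a minimal illustration of DPR-style scoring by dot product; the repository id `s8frbroy/talk2ref_cited_paper_encoder` and the paper text format are assumptions for illustration, not names confirmed by this card.

```python
# Minimal retrieval sketch. Assumptions: the paper-encoder repo id and the paper
# text format below are illustrative, not confirmed by the model card.
from transformers import AutoModel, AutoTokenizer
import torch

def embed(texts, model, tokenizer):
    # Mean-pooled sentence embeddings, mirroring the query example above
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# Hypothetical repo id for the paired cited-paper encoder
paper_model = AutoModel.from_pretrained("s8frbroy/talk2ref_cited_paper_encoder")
paper_tokenizer = AutoTokenizer.from_pretrained("s8frbroy/talk2ref_cited_paper_encoder")

papers = [
    "Attention Is All You Need. We propose the Transformer, an architecture based solely on attention.",
    "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.",
]
paper_embeddings = embed(papers, paper_model, paper_tokenizer)

# DPR-style scoring: dot product between the talk embedding and each candidate paper
scores = embedding @ paper_embeddings.T            # shape (1, num_papers)
ranking = scores.argsort(dim=1, descending=True)   # paper indices, most relevant first
print(ranking)
```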
| Property | Description |
|---|---|
| Architecture | Sentence-BERT (all-MiniLM-L6-v2 backbone) |
| Pooling | Mean pooling |
| Max sequence length | 512 tokens |
| Training data | Talk2Ref dataset (≈ 43 k cited papers linked to 6 k talks) |
| Objective | Contrastive binary (DPR-style) loss (see the sketch below) |
| Task | Encode talks into a shared semantic space with cited papers |
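As a rough illustration of the objective row above, the sketch below shows a DPR-style in-batch contrastive loss, where each talk is scored against all papers in the batch and its cited paper is the positive. This is an assumption about the training setup; the exact "contrastive binary" formulation is described in the Talk2Ref paper and may differ in detail.

```python
# Illustrative DPR-style in-batch contrastive loss (an assumption about the exact
# training objective; see the Talk2Ref paper for the precise formulation).
import torch
import torch.nn.functional as F

def dpr_style_loss(talk_embeddings, paper_embeddings):
    """talk_embeddings: (B, d) query-side vectors; paper_embeddings: (B, d) positives,
    where paper i is cited by talk i and the other papers in the batch act as negatives."""
    scores = talk_embeddings @ paper_embeddings.T   # (B, B) similarity matrix
    targets = torch.arange(scores.size(0))          # positives lie on the diagonal
    return F.cross_entropy(scores, targets)

# Toy usage with random vectors (384 is the MiniLM-L6-v2 hidden size)
talks = torch.randn(4, 384)
papers = torch.randn(4, 384)
print(dpr_style_loss(talks, papers))
```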
If you use this model or the Talk2Ref dataset, please cite the following paper:
```bibtex
@misc{broy2025talk2refdatasetreferenceprediction,
  title         = {Talk2Ref: A Dataset for Reference Prediction from Scientific Talks},
  author        = {Frederik Broy and Maike Züfle and Jan Niehues},
  year          = {2025},
  eprint        = {2510.24478},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2510.24478}
}
```
Base model: `sentence-transformers/all-MiniLM-L6-v2`