πŸ—£οΈ Talk2Ref Query Talk Encoder

This model encodes scientific talks (transcripts, titles, and years) into dense vector representations. It is designed for Reference Prediction from Talks (RPT), the task of retrieving relevant cited papers for a given talk.
It was trained as part of the Talk2Ref dataset project.

The model forms the query-side encoder in a dual-encoder (DPR-style) setup, paired with the Talk2Ref Cited Paper Encoder.
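As a rough illustration of how the two encoders fit together at retrieval time, the sketch below ranks candidate papers by their dot-product similarity with a talk embedding. The vectors and the `rank_papers` helper are hypothetical stand-ins, not outputs of the actual models, and dot-product scoring is an assumption based on common DPR-style practice.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_papers(
    talk_vec: List[float],
    paper_vecs: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (paper_id, score) pairs, highest score first."""
    scored = [(pid, dot(talk_vec, vec)) for pid, vec in paper_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical): in practice the talk vector would come from
# this query encoder and the paper vectors from the cited paper encoder.
talk_vec = [0.9, 0.1, 0.0]
paper_vecs = {
    "paper_a": [0.8, 0.2, 0.1],
    "paper_b": [0.0, 0.1, 0.9],
    "paper_c": [0.5, 0.5, 0.0],
}

ranking = rank_papers(talk_vec, paper_vecs, top_k=2)
print(ranking)
```

Because the paper embeddings do not depend on the talk, they can be computed once and reused across queries, which is the main practical appeal of a dual-encoder design.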


🎯 Usage

Example with transformers:

from transformers import AutoModel
import torch

# Load model
model = AutoModel.from_pretrained("s8frbroy/talk2ref_query_talk_encoder")

# Example input
title = "Attention Is All You Need"
year = 2017
query_text = f"The following presentation is about the paper of the title: '{title}'. Published in {year}. " + \
              "In this talk, we introduce the Transformer architecture and discuss its impact on sequence modeling."

# Compute embedding
with torch.no_grad():
    embedding = model([query_text])

print(embedding.shape)  # (1, hidden_dim)
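The query string in the example above follows a fixed template: a sentence naming the title, a sentence giving the year, then the transcript. A small helper can keep that format consistent across talks. `build_query_text` is a hypothetical name, and this sketch assumes the template in the example is the one the model expects.

```python
def build_query_text(title: str, year: int, transcript: str) -> str:
    """Prefix a transcript with the title/year sentences used in the example."""
    prefix = (
        f"The following presentation is about the paper of the title: '{title}'. "
        f"Published in {year}. "
    )
    return prefix + transcript.strip()


query_text = build_query_text(
    title="Attention Is All You Need",
    year=2017,
    transcript="In this talk, we introduce the Transformer architecture.",
)
print(query_text)
```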

🧩 Model Overview

| Property | Description |
| --- | --- |
| Architecture | Sentence-BERT (all-MiniLM-L6-v2 backbone) |
| Pooling | Mean pooling |
| Max sequence length | 512 tokens |
| Training data | Talk2Ref dataset (≈ 43k cited papers linked to 6k talks) |
| Objective | Contrastive binary (DPR-style) loss |
| Task | Encode talks (transcripts, titles, and years) into a shared semantic space with cited papers |
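To make the "Mean pooling" entry concrete, the sketch below averages token vectors while skipping padded positions, which is how mean pooling is usually done in Sentence-BERT-style models. It uses plain Python lists and made-up values. The `mean_pool` helper is hypothetical and is not taken from this model's code.

```python
from typing import List


def mean_pool(
    token_embeddings: List[List[float]],
    attention_mask: List[int],
) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding is ignored."""
    dim = len(token_embeddings[0])
    n_real = max(sum(attention_mask), 1)  # guard against an all-padding input
    pooled = [0.0] * dim
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                pooled[i] += value
    return [value / n_real for value in pooled]


# Two real tokens followed by one padding token (mask = 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding vector has no effect on the result because its mask entry is 0, so batching with padding does not change a text's embedding.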

Citation

If you use this model or the Talk2Ref dataset, please cite the following paper:

@misc{broy2025talk2refdatasetreferenceprediction,
  title        = {Talk2Ref: A Dataset for Reference Prediction from Scientific Talks},
  author       = {Frederik Broy and Maike Züfle and Jan Niehues},
  year         = {2025},
  eprint       = {2510.24478},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2510.24478}
}