---
pipeline_tag: text-ranking
library_name: transformers
language:
- en
base_model:
- Qwen/Qwen3-14B
tags:
- text-generation-inference
---

# Introduction

> Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval

For more details, please refer to our [paper](https://arxiv.org/pdf/2509.24869) and [GitHub repository](https://github.com/VectorSpaceLab/agentic-search).

# Usage

## Using SGLang

```python
import re

import sglang as sgl

PROMPT_TEMPLATE = """\
Here is the **relevance definition** in a retrieval task: {relevance_definition}

Now given a **query** ({query_type}) and a **document** ({doc_type}) in this retrieval task, your mission is to perform the following steps.
1. Query Analysis: Think to reason and describe what information would be most helpful in answering the query.
2. Document Analysis: Discuss how the information provided by the document fulfills or fails to fulfill the requirements implied by the query.
3. Relevance Annotation: Based on the relevance definition and the insights from the previous two steps, clearly justify your final relevance annotation result and annotate an integer score from a scale of 0 to 100. Please use the following guide:
- **80-100 (Highly Relevant):** The document directly and comprehensively addresses the query's intent. It is a core and authoritative answer.
- **60-80 (Relevant):** The document substantially addresses the query's intent, providing most of the key information, but might miss some minor details.
- **40-60 (Moderately Relevant):** The document is on-topic and addresses a part of the query's intent, but it is not a comprehensive answer.
- **20-40 (Slightly Relevant):** The document mentions keywords from the query, but its main topic is different. It offers very limited value.
- **0-20 (Irrelevant):** The document does not address the query's intent at all and is off-topic.

After providing your detailed analysis and justification for all the steps above, conclude your entire response with the final relevance score. The score must be placed strictly between the tags. There should be no other text or explanation inside the tags:

<score>
[From a scale of 0 to 100, annotate the degree of relevance between the query and the document.]
</score>

Query ({query_type}):
[Begin of Query]
{query}
[End of Query]

Document ({doc_type}):
[Begin of Document]
{doc}
[End of Document]
"""


def main():
    query = "In a party, how many guests do you need to have to ensure that either four people all know each other or four people are all complete strangers to one another?"
    doc = "\\section{Infinite Ramsey's Theorem}\nTags: Ramsey Theory, Named Theorems\n\n\\begin{theorem}\nLet $k, n \\in \\N$.\nFor any set $S$, let $S^{\\paren n}$ denote the set $\\set {\\set {s_1, \\ldots, s_n}: \\text{each } s_i \\in S}$ of cardinality $n$ subsets of $S$.\nLet $X$ be an infinite set.\nThen:\n:for every partition $P$ of $X^{\\paren n}$ into $k$ many components\n:there is an infinite subset $Y \\subseteq X$\nsuch that:\n:each member of $Y^{\\paren n}$ is in the same component of $P$.\n\\end{theorem}\n\n\\begin{proof}\nWe will prove the theorem for fixed $k$ by induction on $n$.\n\\end{proof}\n\n"
    query_type = "math problem"
    doc_type = "math-related passage"
    relevance_definition = "Given a query (math problem) and a document (math-related passage), the document is relevant to the query if the theorem described in the document can help solve the problem in the query."

    prompts = [
        PROMPT_TEMPLATE.format(
            relevance_definition=relevance_definition,
            query_type=query_type,
            doc_type=doc_type,
            query=query,
            doc=doc,
        )
    ]

    llm = sgl.Engine(
        model_path="ljw13/retro-star-qwen3-14b-0928",
        tp_size=8,
        dp_size=1,
    )
    tokenizer = llm.tokenizer_manager.tokenizer

    messages = [[{"role": "user", "content": prompt}] for prompt in prompts]
    input_texts = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )

    sampling_params = {
        "n": 1,
        "temperature": 0.6,
        "max_new_tokens": 1024,
        "skip_special_tokens": False,
        "spaces_between_special_tokens": False,
    }
    outputs = llm.generate(input_texts, sampling_params=sampling_params)
    llm.shutdown()

    scores = []
    for output in outputs:
        print(output["text"])
        print("==" * 30)
        try:
            score = int(re.search(r"<score>\s*(\d+)\s*</score>", output["text"]).group(1))
        except AttributeError:
            # re.search returned None: the output contains no well-formed <score> tag.
            score = 0
        scores.append(score)
    print("Scores:", scores)


if __name__ == "__main__":
    main()
```

# Citation

If you find this model useful, please consider giving it a like and citing our paper:

```
@article{lan2025retro,
  title={Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval},
  author={Lan, Junwei and Chen, Jianlyu and Liu, Zheng and Li, Chaofan and Bao, Siqi and Lian, Defu},
  journal={arXiv preprint arXiv:2509.24869},
  year={2025}
}
```
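The score-parsing step in the usage example above can also be factored into a small reusable helper. The sketch below is illustrative only: the `extract_score` name, the fallback heuristic, and the clamping to 0-100 are our own additions, and it assumes the model wraps its final answer in `<score>...</score>` tags as the prompt requests.

```python
import re


def extract_score(text: str, default: int = 0) -> int:
    """Extract the final relevance score from a raw model response.

    Prefers an integer wrapped in <score>...</score> tags (an assumed
    format); falls back to the last standalone 1-3 digit integer in the
    text, then to `default`. The result is clamped to [0, 100].
    """
    match = re.search(r"<score>\s*(\d+)\s*</score>", text)
    if match:
        return max(0, min(100, int(match.group(1))))
    candidates = re.findall(r"\b(\d{1,3})\b", text)
    if candidates:
        return max(0, min(100, int(candidates[-1])))
    return default
```

With a helper like this, the per-output parsing in the loop reduces to a single call such as `score = extract_score(output["text"])`, and malformed outputs degrade to a best-effort guess instead of a hard failure.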