---
language:
- en
- zh
license: mit
pipeline_tag: text-generation
tags:
- reinforcement-learning
- agentic-reasoning
- math-reasoning
- tool-use
library_name: transformers
---

# rStar2-Agent-14B: Advanced Agentic Reasoning Model

This model is part of the research presented in the paper [rStar2-Agent: Agentic Reasoning Technical Report](https://huggingface.co/papers/2508.20722). Find the official code and training recipes on the [GitHub repository](https://github.com/microsoft/rStar).

## Model Description

This is a reproduced version of rStar2-Agent, a 14B-parameter math reasoning model that achieves performance comparable to the 671B DeepSeek-R1 through pure agentic reinforcement learning. The model excels at planning, reasoning, and autonomously using coding tools to efficiently explore, verify, and reflect during complex problem-solving.

## Usage

The following is a usage example. To reproduce the math evaluation results from the technical report, please refer to [microsoft/rStar](https://github.com/microsoft/rStar).

### 1. Start SGLang Server

First, serve the model using SGLang with the following command:

```bash
python -m sglang.launch_server \
  --model-path rstar2-reproduce/rstar2-agent \
  --port 30000 \
  --tensor-parallel-size 4 \
  --tool-call-parser qwen25
```

**Parameters:**

- `--model-path`: Path to the rStar2-Agent model
- `--port`: Server port (default: 30000)
- `--tensor-parallel-size`: Number of GPUs for parallel processing
- `--tool-call-parser`: Parser for tool calls (use `qwen25` for this model)

### 2. Use with the OpenAI-Compatible API

```python
from openai import OpenAI
import json

# Initialize an OpenAI client pointing to the SGLang server
client = OpenAI(
    base_url="http://localhost:30000/v1",  # SGLang server URL
    api_key="EMPTY"  # No API key required for a local server
)

# Define the Python code execution tool for the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "execute_python_code_with_standard_io",
            "description": "Execute Python code with standard input and capture standard output. This function takes a Python code string and an input string, provides the input string through standard input (stdin) to the code, and captures and returns any output produced through standard output (stdout). If the executed code raises an exception, the error message will be captured and returned instead.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "A string containing Python code to be executed. The code can read from standard input using the input() function."
                    },
                    "input": {
                        "type": "string",
                        "description": "A string that will be provided as standard input to the code when it calls input()."
                    }
                },
                "required": ["code", "input"]
            }
        }
    }
]

# Define the Python code execution function
def execute_python_code_with_standard_io(code, input_data):
    """
    Execute Python code with standard input and capture output.
    Args:
        code (str): Python code to execute
        input_data (str): Input data to provide to the code

    Returns:
        str: Output from the executed code or an error message
    """
    import subprocess
    import sys

    try:
        # Create a subprocess to execute the Python code
        process = subprocess.Popen(
            [sys.executable, "-c", code],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True
        )

        # Send input and get output (time out so runaway code cannot hang the loop)
        stdout, stderr = process.communicate(input=input_data, timeout=60)

        if stderr:
            return f"Error: {stderr}"
        return stdout.strip()
    except subprocess.TimeoutExpired:
        process.kill()
        return "Execution error: timed out"
    except Exception as e:
        return f"Execution error: {str(e)}"

# Example: Create a math problem conversation
messages = [
    {
        "role": "user",
        "content": "You must put your answer inside <answer> </answer> tags, i.e., <answer> answer here </answer>. And your final answer will be extracted automatically by the \\boxed{} tag. Solve this math problem: Find the sum of all prime numbers less than 20."
    }
]

# Main conversation loop - handle tool calls until completion
turn_idx = 0
while True:
    print(f'========== Turn: {turn_idx} ==========')
    turn_idx += 1

    # Get a model response with tool support
    response = client.chat.completions.create(
        model="rstar2-reproduce/rstar2-agent",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # Let the model decide when to use tools
        temperature=0.6  # Adjust for creativity vs. consistency
    )

    # Add the assistant's response to the conversation history
    messages.append(response.choices[0].message)
    print(f'{response.choices[0].message.content}')

    # Check whether the model wants to use tools
    if response.choices[0].message.tool_calls:
        # Process each tool call
        for tool_call in response.choices[0].message.tool_calls:
            function_args = json.loads(tool_call.function.arguments)
            print(f">>> Executing Code: {function_args['code']}")
            input_text = function_args.get('input', '')
            print(f">>> With Input: {input_text if input_text else '(no input)'}")

            # Execute the Python code
            result = execute_python_code_with_standard_io(
                function_args["code"],
                function_args.get("input", "")
            )
            print(f">>> Tool result: {result}")
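            # (Optional hedge, not part of the original recipe.) Very long tool
            # outputs can blow up the model's context window; truncating them
            # before appending keeps the conversation manageable. The
            # 4000-character cap below is an arbitrary illustration.
            MAX_TOOL_OUTPUT_CHARS = 4000
            if len(result) > MAX_TOOL_OUTPUT_CHARS:
                result = result[:MAX_TOOL_OUTPUT_CHARS] + "\n...[output truncated]"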
            # Add the tool response to the conversation
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    else:
        # No more tool calls, conversation finished
        print("✅ No more tool calls. Conversation finished.")
        break
```

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{shang2025rstar2agentagenticreasoningtechnical,
      title={rStar2-Agent: Agentic Reasoning Technical Report},
      author={Ning Shang and Yifei Liu and Yi Zhu and Li Lyna Zhang and Weijiang Xu and Xinyu Guan and Buze Zhang and Bingcheng Dong and Xudong Zhou and Bowen Zhang and Ying Xin and Ziming Miao and Scarlett Li and Fan Yang and Mao Yang},
      year={2025},
      eprint={2508.20722},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.20722},
}
```

## License

This model is released under the MIT License.