DualTune: Decoupled Fine-Tuning for On-Device Agentic Systems Paper • 2510.00229 • Published Sep 30 • 1
Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings Paper • 2506.01435 • Published Jun 2
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices Paper • 2506.17538 • Published Jun 21 • 7
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval Paper • 2502.20969 • Published Feb 28 • 11
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation Paper • 2502.20583 • Published Feb 27 • 13
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published Aug 22, 2024 • 19
WikiSplit++: Easy Data Refinement for Split and Rephrase Paper • 2404.09002 • Published Apr 13, 2024 • 2
Comparison and Combination of Sentence Embeddings Derived from Different Supervision Signals Paper • 2202.02990 • Published Feb 7, 2022
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Paper • 2402.07033 • Published Feb 10, 2024 • 17
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence Paper • 2306.07075 • Published Jun 12, 2023 • 10