Publications

(2024). FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning. arXiv 2024.

(2024). SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification. ASPLOS 2024.

(2024). Optimal Kernel Orchestration for Tensor Programs with Korch. ASPLOS 2024.

(2024). Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models. arXiv 2024.

(2023). Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems. arXiv 2023.

(2023). Direct Telemetry Access. SIGCOMM 2023.

(2021). Zero-CPU Collection with Direct Telemetry Access. HotNets 2021.

(2021). A Probabilistic In-band network Telemetry CHeckER (PITCHER). Harvard BS Thesis.
