Jonatan Langlet, Ran Ben Basat, Gabriele Oliaro, Michael Mitzenmacher, Minlan Yu, Gianni Antichi (2023). Direct Telemetry Access. In SIGCOMM '23.
Gabriele Oliaro, Xupeng Miao, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia (2023). SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification. In arXiv.