Gabriele Oliaro
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
Gabriele Oliaro, Xupeng Miao, Xinhao Cheng, Vineeth Kada, Ruohan Gao, Yingyi Huang, Remi Delacourt, April Yang, Yingcheng Wang, Mengdi Wu, Colin Unger, Zhihao Jia
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Zikun Li, Zhuofu Chen, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xupeng Miao, Zhihao Jia
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference
Gabriele Oliaro, Zhihao Jia, Daniel Campos, Aurick Qiao