vllm-triton-backend: How to get state-of-the-art performance on NVIDIA and AMD with just triton. Burkhard Ringlein, Thomas Parnell, et al. 2025. PyTorch Conference 2025.
Accelerating Decision-Tree-based Inference through Adaptive Parallelization. Jan van Lunteren. 2023. PACT 2023.