Best AI LLM Inference Tools
17 tools ranked by community signals and data.
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs.
NVIDIA Framework for LLM Inference
NVIDIA Framework for LLM Inference (Transitioned to TensorRT-LLM)
Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces inferenc
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
Blazingly fast LLM inference.
Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all w
Fine-tune, serve, deploy, and monitor any open-source LLMs in production. Used in production at for LLMs-based applicati
MII makes low-latency and high-throughput inference possible, similar to vLLM, powered by DeepSpeed.
Inference for text-embeddings in Rust, under the HFOIL License.
Inference for text-embeddings in Python
A high-throughput and low-latency inference and serving framework for LLMs and VLMs.
Efficient Triton Kernels for LLM Training.
A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.
Easily deploy any LLM on a VM with minimal configuration, using Ansible.