Best AI Tools for LLM Inference

17 tools ranked by community signals and data.

1
SGLang

SGLang is a fast serving framework for large language models and vision language models.

Free, 8 pt
2
vLLM

A high-throughput and memory-efficient inference and serving engine for LLMs.

Free, 8 pt
3
TensorRT-LLM

NVIDIA framework for LLM inference.

Free, 8 pt
4
FasterTransformer

NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).

Free, 8 pt
5
MInference

Speeds up long-context LLM inference by computing attention approximately with dynamic sparsity, which reduces inferenc

Free, 8 pt
6
exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Free, 8 pt
7
FastChat

A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.

Free, 8 pt
8
mistral.rs

Blazingly fast LLM inference.

Free, 8 pt
9
SkyPilot

Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all w

Free, 8 pt
10
OpenLLM

Fine-tune, serve, deploy, and monitor any open-source LLMs in production. Used in production at for LLMs-based applicati

Free, 8 pt
11
DeepSpeed-MII

MII enables low-latency, high-throughput inference, similar to vLLM, powered by DeepSpeed.

Free, 8 pt
12
Text-Embeddings-Inference

Inference for text-embeddings in Rust, HFOIL Licence.

Free, 8 pt
13
Infinity

Inference for text-embeddings in Python.

Free, 8 pt
14
LMDeploy

A high-throughput and low-latency inference and serving framework for LLMs and VLMs.

Free, 8 pt
15
Liger-Kernel

Efficient Triton Kernels for LLM Training.

Free, 8 pt
16
prima.cpp

A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.

Free, 8 pt
17
deploy-llms-with-ansible

Easily deploy any LLM on a VM with minimal configuration, using Ansible.

Free, 8 pt