Best AI Tools for LLM Inference

17 tools ranked by community signals and data.

SGLang

SGLang is a fast serving framework for large language models and vision language models.

Free · 8 pts

vLLM

A high-throughput and memory-efficient inference and serving engine for LLMs.

Free · 8 pts

TensorRT-LLM

NVIDIA framework for LLM inference.

Free · 8 pts

FasterTransformer

NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).

Free · 8 pts

MInference

Speeds up long-context LLM inference by computing attention with approximate, dynamic sparse methods, which reduces inference…

Free · 8 pts

exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Free · 8 pts

FastChat

A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.

Free · 8 pts

mistral.rs

Blazingly fast LLM inference.

Free · 8 pts

SkyPilot

Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all w…

Free · 8 pts

OpenLLM

Fine-tune, serve, deploy, and monitor any open-source LLM in production. Used in production at [...] for LLM-based applicati…

Free · 8 pts

DeepSpeed-Mii

MII enables low-latency, high-throughput inference, similar to vLLM, and is powered by DeepSpeed.

Free · 8 pts

Text-Embeddings-Inference

Inference for text embeddings in Rust, released under the HFOIL license.

Free · 8 pts

Infinity

Inference for text embeddings in Python.

Free · 8 pts

LMDeploy

A high-throughput and low-latency inference and serving framework for LLMs and VLMs.

Free · 8 pts

Liger-Kernel

Efficient Triton Kernels for LLM Training.

Free · 8 pts

prima.cpp

A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.

Free · 8 pts

deploy-llms-with-ansible

Easily deploy any LLM on a VM with minimal configuration, using Ansible.

Free · 8 pts