Best AI LLM Evaluation Tools
8 tools ranked by community signals and data.
1
lm-evaluation-harness
A framework for few-shot evaluation of language models.
FREE · 8 pts
2
lighteval
A lightweight LLM evaluation suite that Hugging Face has been using internally.
FREE · 8 pts
3
simple-evals
Evaluation tools by OpenAI.
FREE · 8 pts
4
OLMo-Eval
A repository for evaluating open language models.
FREE · 8 pts
5
HELM
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models.
FREE · 8 pts
6
instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out
FREE · 8 pts
7
Giskard
Testing and evaluation library for LLM applications, in particular RAG pipelines.
FREE · 8 pts
8
Ragas
A framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.
FREE · 8 pts