Best AI LLM Evaluation Tools

8 tools ranked by community signals and data.

A framework for few-shot evaluation of language models.

A lightweight LLM evaluation suite that Hugging Face has been using internally.

Eval tools by OpenAI.

A repository for evaluating open language models.

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models.

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out

Testing & evaluation library for LLM applications, in particular RAGs.

A framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.