# llm-evaluation

Here are 148 public repositories matching this topic...

langfuse / langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

Updated Dec 25, 2024
TypeScript

promptfoo / promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

testing ci evaluation ci-cd pentesting cicd vulnerability-scanners prompts evaluation-framework red-teaming rag llm prompt-engineering llmops prompt-testing llm-eval llm-evaluation llm-evaluation-framework

Updated Dec 25, 2024
TypeScript

Giskard-AI / giskard

🐢 Open-Source Evaluation & Testing for AI & LLM systems

Updated Dec 19, 2024
Python

confident-ai / deepeval

The LLM Evaluation Framework

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics

Updated Dec 24, 2024
Python

comet-ml / opik

From RAG chatbots to code assistants to complex agentic pipelines and beyond, build LLM systems that run better, faster, and cheaper with tracing, evaluations, and dashboards.

open-source playground openai llm prompt-engineering langchain llmops llama-index llm-evaluation llm-observability

Updated Dec 25, 2024
Python

NVIDIA / garak

the LLM vulnerability scanner

ai vulnerability-assessment security-scanners llm-security llm-evaluation

Updated Dec 24, 2024
Python

Marker-Inc-Korea / AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

python open-source qa benchmarking ops pipeline analysis optimization evaluation embeddings automl document-parser rag llm retrieval-augmented-generation llm-ops llm-evaluation rag-evaluation

Updated Dec 19, 2024
Python

Helicone / helicone

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

open-source playground monitoring analytics evaluation ycombinator openai gpt large-language-models llm prompt-engineering langchain llmops llama-index prompt-management llm-evaluation llm-observability agent-monitoring llm-cost

Updated Dec 24, 2024
TypeScript

PacktPublishing / LLM-Engineers-Handbook

The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices

aws rag mlops llm llmops genai fine-tuning-llm llm-evaluation ml-system-design

Updated Dec 23, 2024
Python

Agenta-AI / agenta

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place.

prompt-engineering prompt-management llm-tools llm-framework llm-playground llm-platform llm-evaluation rag-evaluation llm-monitoring llm-as-a-judge llm-observability llmops-platform

Updated Dec 25, 2024
Python

lmnr-ai / lmnr

Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.

open-source ai monitoring analytics evaluation self-hosted rust-lang developer-tools agents observability pipeline-builder aiops rag ai-observability llmops evals llm-evaluation llm-observability llm-workflow

Updated Dec 26, 2024
TypeScript

microsoft / prompty

Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers.

promptengineering llms generative-ai llm-evaluation

Updated Dec 18, 2024
Python

relari-ai / continuous-eval

Data-Driven Evaluation for LLM-Powered Applications

information-retrieval evaluation-metrics evaluation-framework rag llmops retrieval-augmented-generation llm-evaluation

Updated Dec 20, 2024
Python

onejune2018 / Awesome-LLM-Eval

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluating LLMs. The list focuses on evaluating foundation models and aims to explore the technical boundaries of generative AI.

nlp benchmark machine-learning leaderboard evaluation dataset openai llama bert rag awsome-list gpt3 llm awsome-lists chatgpt large-language-model chatglm qwen llm-evaluation

Updated Oct 25, 2024

kimtth / awesome-azure-openai-llm

a curated list of 🌌 Azure OpenAI, 🦙Large Language Models, and references with notes.

agent awesome cheatsheet openai awesome-list gpt copilot rag azure-openai llm prompt-engineering chatgpt langchain llama-index semantic-kernel llm-agent llm-evaluation

Updated Dec 23, 2024
Python

palico-ai / palico-ai

Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework

nodejs javascript docker typescript ai full-stack openai autogen rag llm portkey langchain anthropic llamaindex langchain-js llm-agent llm-framework llm-evaluation llm-observability

Updated Nov 26, 2024
TypeScript

Value4AI / Awesome-LLM-in-Social-Science

Awesome papers involving LLMs in Social Science.

social-network simulation-environment policy economics psychology alignment social-science large-language-models llms llm-agent llm-evaluation

Updated Dec 20, 2024

athina-ai / athina-evals

Python SDK for running evaluations on LLM generated responses

evaluation evaluation-metrics evaluation-framework llmops llm-eval llm-ops llm-evaluation llm-evaluation-toolkit

Updated Dec 24, 2024
Python

Psycoy / MixEval

The official evaluation suite and dynamic data release for MixEval.

benchmark evaluation benchmarking-suite evaluation-framework benchmarking-framework foundation-models large-language-models large-language-model llm-inference llm-evaluation large-multimodal-models llm-evaluation-framework benchmark-mixture mixeval

Updated Nov 10, 2024
Python

iMeanAI / WebCanvas

Connect agents to live web environments for evaluation.

agent benchmark-framework llm-agent llm-evaluation

Updated Dec 15, 2024
Python
