Concepts

Chatbot Evaluation Benchmarks

Benchmarks for LLM chatbots: MT-Bench, MMLU, GSM8K, and related tasks for dialogue quality, reasoning, and trustworthiness.

MTEB, BEIR, STS, and related benchmarks for evaluating text embedding quality across retrieval, classification, and clustering.

Conversational ReAct builds on the original ReAct paradigm by weaving in two critical capabilities: (1) persistent memory so that ...

The Model Context Protocol (MCP) is a structured framework for designing intelligent systems powered by large language models ...

Prompt injection is a critical security vulnerability specific to AI applications that leverage large language models (LLMs). Unlike ...

OWASP Top 10 for LLM applications: prompt injection, supply chain, model theft, training data poisoning, secret management, and secure tool design.

The Reflexion Actor is a cognitive AI architecture that enhances an agent's ability to learn from its mistakes through self-reflection. It ...

Zero-Shot ReAct Description is a technique used in AI agents to make decisions dynamically without requiring prior training examples or ...