LangSmith Makes Debugging and Improving LLMs Effortless

LangSmith is a unified platform for building production-grade large language model (LLM) applications. It brings together observability, evaluation, and prompt-engineering workflows so teams can ship AI agents with confidence—whether they’re using LangChain or any other LLM framework.

At its core, LangSmith offers three pillars of functionality:

  • 🔥Observability: Capture and inspect detailed traces of every LLM call, and configure metrics, dashboards, and alerts to catch issues in real time.
  • 🔥Evals: Run automated and human-in-the-loop evaluations on production traffic to score performance, track regressions, and gather qualitative feedback.
  • 🔥Prompt Engineering: Version and collaborate on prompts directly in the platform, so you can iterate quickly and roll back changes safely.

Designed to be framework-agnostic, LangSmith integrates seamlessly with LangChain (or LangGraph) via a couple of environment variables, but it can also work with any custom tooling or direct API calls you already have in place.

This means teams can adopt end-to-end observability and testing without ripping out existing code—accelerating the journey from prototype to reliable, production-ready AI.
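
In practice, that configuration can live in the same .env file the example below loads with dotenv. A minimal sketch (variable names follow the LangSmith docs for LangChain tracing; newer SDK versions also accept LANGSMITH_-prefixed equivalents, and the project name here is arbitrary):

# .env
# Turn on tracing for every LangChain run and point it at your LangSmith account.
LANGCHAIN_TRACING_V2="true"
LANGCHAIN_API_KEY="<your-langsmith-api-key>"
# Optional: group traces under a named project.
LANGCHAIN_PROJECT="langsmith-demo"
OPENAI_API_KEY="<your-openai-api-key>"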

Here’s a rundown of the main JS/TS modules, classes, and functions you’ll find in the langsmith SDK, along with what each one does:

  • Client The core class for interacting with the LangSmith REST API. Through its methods you can create and update runs, manage datasets & examples, submit feedback, run & log Evals, handle annotation queues, version prompts, and more—all via a single, unified client instance.
  • traceable A function decorator / higher-order wrapper that instruments your entire function or pipeline. Any nested LLM calls or custom spans become children under a single trace in LangSmith, giving you end-to-end observability with zero manual span management (see the sketch after this list).
  • wrapOpenAI A plug-and-play wrapper for the OpenAI JS/TS client that automatically traces every LLM call you make—no boilerplate tracing code required.
  • runTrees Utilities for extracting, serializing, and working with hierarchical “run trees” (i.e., nested traces). Useful if you need to post-process or analyze complex multi-step pipelines programmatically.
  • evaluation namespace A collection of helpers to define and run automated and human-in-the-loop evaluations, most notably the evaluate entry point exported from langsmith/evaluation. It hooks directly into LangSmith’s Evals engine so you can score model outputs, detect regressions, and collect qualitative feedback.
  • schemas TypeScript definitions and runtime validators for core LangSmith entities (Run, Dataset, Example, EvaluationResult, etc.), ensuring strong typing and correct structure when sending or receiving data.
  • vercel (e.g. AISDKExporter / VercelExporter) OpenTelemetry exporters and helpers tailored for Vercel/Next.js serverless environments. Automatically capture and forward edge-function traces into LangSmith.
  • singletons A tiny module to manage a global Client instance so you don’t re-initialize the client across multiple files or hot-reloads.
  • utils (including anonymizer) Low-level helpers for payload sanitization, metadata extraction, and other common transformations before sending data to LangSmith.
  • wrappers Additional client wrappers for other LLM providers (beyond OpenAI), enabling transparent tracing integration for whichever SDK you use.
  • jest / vitest plugins Testing integrations that let you define LangSmith datasets and Evals as Jest or Vitest test cases—so your CI suite can automatically generate evaluation reports and feedback.
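
To make the two most common entry points concrete, here is a minimal sketch (not part of the original example) that combines traceable and wrapOpenAI. The function name, run name, and model are arbitrary; it assumes the environment variables above plus an OPENAI_API_KEY are set.

import OpenAI from "openai";
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";

// Every request made through this wrapped client is traced automatically.
const openai = wrapOpenAI(new OpenAI());

// traceable groups everything that happens inside the wrapped function
// under a single parent run in LangSmith.
const answerQuestion = traceable(
  async (question: string) => {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: question }],
    });
    return completion.choices[0].message.content;
  },
  { name: "answer-question" }
);

await answerQuestion("What does LangSmith trace?"); // top-level await assumes an ESM module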

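The example below puts this into practice with LangChain: a small CLI agent with calculator, timer, and weather tools, where every run is reported to LangSmith through LangChainTracer.
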
import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor, createOpenAIFunctionsAgent } from "langchain/agents";
import { Tool } from "langchain/tools";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { HumanMessage, AIMessage } from "@langchain/core/messages";
import { LangChainTracer } from "langchain/callbacks";
import dotenv from "dotenv";
import readline from "readline";

dotenv.config();

// === LLM Model ===
const model = new ChatOpenAI({
  temperature: 0,
  modelName: "gpt-3.5-turbo",
  apiKey: process.env.OPENAI_API_KEY,
});

// === Tools ===
class CalculatorTool extends Tool {
  name = "calculator";
  description = "Useful for when you need to answer questions about math";

  async _call(input: string) {
    try {
      // NOTE: eval() keeps this demo short, but it executes arbitrary code.
      // Never eval untrusted input in a real tool.
      return eval(input).toString();
    } catch (error) {
      console.error(error);
      return "Error: " + error;
    }
  }
}

class TimerTool extends Tool {
  name = "timer";
  description = "Useful for when you need to track time";

  async _call(input: string) {
    return new Date().toLocaleTimeString();
  }
}

class WeatherTool extends Tool {
  name = "weather";
  description = "Fetches the current weather for a given city. Provide the city name as input.";

  async _call(city: string) {
    if (!city) return "Error: Please provide a city name.";

    try {
      const response = await fetch(`${process.env.WEATHER_API_URL}?q=${encodeURIComponent(city)}&appid=${process.env.WEATHER_API_KEY}&units=metric`);
      const data = await response.json();

      if (data.cod !== 200) return `Error: ${data.message}`;

      return `The weather in ${data.name} is ${data.weather[0].description} with a temperature of ${data.main.temp}°C.`;
    } catch (error) {
      return "Error fetching weather data.";
    }
  }
}

// === Main Run Function ===
async function run() {
  const tools = [new CalculatorTool(), new TimerTool(), new WeatherTool()];
  let chat_history: (HumanMessage | AIMessage)[] = [];

  const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a helpful AI assistant with access to tools. Follow these steps: " +
      "1. Think about the user's question " +
      "2. If a tool is needed, decide which one to use " +
      "3. Call the tool and observe its result " +
      "4. Respond to the user in a structured format " +
      "Do not respond until you have observed the tool's result."
    ],
    new MessagesPlaceholder("chat_history"),
    ["human", "{input}"],
    new MessagesPlaceholder("agent_scratchpad"),
  ]);

  const agent = await createOpenAIFunctionsAgent({
    llm: model,
    tools,
    prompt,
  });

  const executor = new AgentExecutor({
    agent,
    tools,
    // LangChainTracer sends this run (and every nested LLM/tool call) to LangSmith,
    // using the API key and project configured in the environment.
    callbacks: [new LangChainTracer()],
  });

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  const askQuestion = async () => {
    rl.question("You: ", async (input) => {
      try {
        // The AgentExecutor builds agent_scratchpad from intermediate steps itself,
        // so only input and chat_history need to be supplied here.
        const result = await executor.invoke(
          { input, chat_history },
          {
            runName: "CLI Agent Run",
            tags: ["cli", "langsmith", "tools"],
          }
        );

        console.log("🤖 Agent:", result.output);

        chat_history.push(new HumanMessage(input));
        chat_history.push(new AIMessage(result.output));
      } catch (error) {
        console.error("Agent error:", error);
      }

      askQuestion();
    });
  };

  askQuestion();
}

run().catch(console.error);
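
To run it, the .env file needs the OPENAI_API_KEY, WEATHER_API_URL, and WEATHER_API_KEY values the code reads, plus the LangSmith tracing variables shown earlier so LangChainTracer knows where to send data. Each CLI turn then shows up in LangSmith as a single "CLI Agent Run" trace, with the model call and any calculator, timer, or weather tool calls nested underneath it.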

Conclusions

LangSmith unifies observability, evaluation, and prompt engineering into a single platform and SDK—so you can build, ship, and maintain LLM apps with confidence:

  • End-to-end tracing: Automatically capture every LLM call (and nested spans) with zero boilerplate, via traceable or wrapOpenAI.
  • Continuous evaluation: Define automated and human-in-the-loop Evals to score your models, detect regressions, and gather qualitative feedback right in CI or production (see the sketch after this list).
  • Prompt versioning: Collaborate on, version, and roll back prompts directly in LangSmith—no more guessing which prompt delivered that edge-case result.
  • Framework-agnostic: Works seamlessly with LangChain/LangGraph or any custom LLM code, via a single ENV variable and the flexible JS/TS client.
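
As a rough illustration of that evaluation flow, here is a sketch built around the evaluate entry point from langsmith/evaluation; the target function, dataset name, and evaluator are made up for the example.

import { evaluate } from "langsmith/evaluation";
import type { EvaluationResult } from "langsmith/evaluation";
import type { Run, Example } from "langsmith/schemas";

// Hypothetical evaluator: compares the run's output to the dataset's reference answer.
const exactMatch = async (run: Run, example?: Example): Promise<EvaluationResult> => ({
  key: "exact_match",
  score: run.outputs?.answer === example?.outputs?.answer ? 1 : 0,
});

await evaluate(
  // Hypothetical target: whatever chain or agent you want to score.
  async (inputs: { question: string }) => ({ answer: `You asked: ${inputs.question}` }),
  {
    data: "cli-agent-qa",          // name of an existing LangSmith dataset (assumed)
    evaluators: [exactMatch],
    experimentPrefix: "cli-agent-eval",
  }
);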

Whether you’re prototyping a POC or running mission-critical AI services, LangSmith gives you the visibility and feedback loop you need to iterate safely and ship faster. Next, try integrating:

  • A CI-driven Eval suite (Jest/Vitest plugin) to catch model regressions before they hit prod.
  • An OpenTelemetry exporter (AISDKExporter) in your Vercel edge functions for real-time performance dashboards (see the sketch after this list).
  • Prompt A/B testing with user feedback loops to continually optimize your conversational experiences.
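
For the Vercel integration, the exporter is typically registered in a Next.js instrumentation file. A sketch, assuming the @vercel/otel package and an arbitrary service name:

// instrumentation.ts
import { registerOTel } from "@vercel/otel";
import { AISDKExporter } from "langsmith/vercel";

export function register() {
  registerOTel({
    serviceName: "langsmith-edge-demo",   // arbitrary name for this sketch
    traceExporter: new AISDKExporter(),   // forwards AI SDK spans to LangSmith
  });
}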

External Resources

  • LangSmith is a unified observability & evals platform where teams can debug, test, and monitor AI app performance — whether building with LangChain or not.
  • What is LangSmith and why should I care as a developer?