LLM Engines
A reference list of LLM and embedding models by provider: generative models, reasoners, embeddings, and rerankers. Includes specs, use cases, deployment options, and pricing.
- text-embedding-3-small | OpenAI | Embedding | ~$0.02/1M tokens
  RAG, semantic search, classification, recommendations, clustering, near-duplicate detection
- text-embedding-3-large | OpenAI | Embedding | ~$0.13/1M tokens
  RAG, semantic search, classification, recommendations, multilingual search, complex queries
- CLIP | OpenAI | Embedding | Free (open source)
  Text–image retrieval, image search by text, zero-shot image classification
- GPT-4o | OpenAI | Generative | ~$2.50/1M input, $10/1M output
  Chat, coding, analysis, general tasks, vision
- GPT-4o mini | OpenAI | Generative | ~$0.15/1M input, $0.60/1M output
  Chat, simple tasks, high volume, cost-sensitive use
- o1 | OpenAI | Reasoning / thinker | ~$15/1M input, $60/1M output
  Math, coding contests, science, complex reasoning, chain-of-thought
- o1-mini | OpenAI | Reasoning / thinker | Lower than o1
  Lightweight reasoning, cost-sensitive reasoning tasks
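The per-token prices listed above make cost estimation simple arithmetic: tokens divided by one million, times the per-1M price, summed over input and output. A minimal sketch (the helper name and token counts are illustrative; the prices are the approximate GPT-4o mini figures listed above):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate API cost in USD from per-1M-token prices."""
    return (input_tokens / 1_000_000) * price_in_per_m + \
           (output_tokens / 1_000_000) * price_out_per_m

# GPT-4o mini at the approximate prices listed above:
cost = estimate_cost(2_000_000, 500_000, price_in_per_m=0.15, price_out_per_m=0.60)
# 2M input tokens -> $0.30, plus 0.5M output tokens -> $0.30: total $0.60
```

The same formula applies to any generative entry in this list; embedding models bill input tokens only, so the output term drops out.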
- BGE-large-en-v1.5 | BAAI (Hugging Face) | Embedding | Free (open source)
  RAG, semantic search, retrieval, classification, open-source deployments
- GTE-large-en-v1.5 | Alibaba (Hugging Face) | Embedding | Free (open source)
  RAG, search, classification, multilingual
- E5-mistral-7b-instruct | Microsoft (Hugging Face) | Embedding | Free (open source)
  High-quality retrieval, RAG, complex queries
- BGE-reranker-large | BAAI (Hugging Face) | Reranker | Free (open source)
  RAG reranking, passage–query relevance
- BGE-reranker-v2-m3 | BAAI (Hugging Face) | Reranker | Free (open source)
  Multilingual reranking
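All of the embedding entries above serve the same retrieval pattern: embed the query and the documents with the same model, then rank documents by cosine similarity. A minimal sketch with toy 4-dimensional vectors standing in for real embeddings (a model such as BGE-large-en-v1.5 actually outputs 1024-dimensional vectors; the helper name is illustrative):

```python
import numpy as np

def cosine_sim(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of doc vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return d @ q

# Toy vectors standing in for real model output.
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1],
    [0.8, 0.2, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])

scores = cosine_sim(query, docs)
top_k = np.argsort(scores)[::-1][:2]   # indices of the 2 most similar docs
```

In a real pipeline the top-k candidates from this cheap first stage would then go to a reranker such as BGE-reranker-large for a more precise second pass.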
- Gemini Embedding 2 | Google | Embedding | ~$0.20/1M tokens (text)
  Multimodal RAG, cross-modal search, semantic search over text, images, video, audio, documents
- Gemini 2.0 Flash | Google | Generative | ~$0.10/1M input, $0.40/1M output
  Chat, fast responses, high volume, cost-sensitive use
- Gemini 2.0 Pro | Google | Generative | Higher than Flash
  Complex tasks, coding, analysis, higher quality
- Gemini 2.5 Pro (Thinking) | Google | Reasoning / thinker | Higher than standard Pro
  Math, coding, multi-step reasoning, agents
- Gemini 2.5 Flash (Thinking) | Google | Reasoning / thinker | Lower than Pro thinking
  Lightweight reasoning, cost-sensitive reasoning
- Vertex AI Ranking API | Google | Reranker | Per RAG request
  RAG reranking, semantic ranking
- embed-v4 | Cohere | Embedding | ~$0.10–0.12/1M tokens
  Multimodal RAG, text + image + PDF, mixed content
- embed-english-v3.0 | Cohere | Embedding | ~$0.10/1M tokens
  English RAG, search, classification
- embed-multilingual-v3.0 | Cohere | Embedding | ~$0.10/1M tokens
  Multilingual RAG, search
- Command R+ | Cohere | Generative | ~$2.50/1M input, $10/1M output
  RAG, tool use, agents, multilingual
- Command R | Cohere | Generative | ~$0.15/1M input, $0.60/1M output
  RAG, chat, cost-sensitive use
- Command R7B | Cohere | Generative | ~$0.04/1M input, $0.15/1M output
  High volume, simple tasks
- Command A Reasoning | Cohere | Reasoning / thinker | Higher than Command R+
  Agents, tool use, complex reasoning
- rerank-v4.0-pro | Cohere | Reranker | Per search
  RAG reranking, high relevance
- rerank-v3.5 | Cohere | Reranker | ~$2/1,000 searches
  Multilingual reranking
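The reranker entries above all follow a two-stage pattern: a cheap first-stage retrieval produces candidates, then the reranker scores each query–document pair jointly and reorders them. A minimal sketch, with a toy word-overlap score standing in for a real reranker call (e.g. Cohere's rerank endpoint; its actual API is not shown here, and all names below are illustrative):

```python
def rerank(query: str, candidates: list[str], score_fn, top_n: int = 3) -> list[str]:
    """Second-stage rerank: score each (query, doc) pair and keep the best."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:top_n]

def overlap_score(query: str, doc: str) -> float:
    """Toy pairwise relevance: fraction of query words appearing in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = [
    "pricing for embedding models",
    "how to bake bread",
    "embedding model pricing and tokens",
]
best = rerank("embedding pricing", docs, overlap_score, top_n=2)
# The two embedding-related docs rank above the irrelevant one.
```

A real reranker replaces `overlap_score` with a cross-encoder that reads the query and document together, which is why it is more accurate (and more expensive per pair) than embedding similarity alone.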
- voyage-3-large | Voyage AI | Embedding | Premium tier
  High-accuracy RAG, semantic search, multilingual retrieval, enterprise retrieval quality
- voyage-3 | Voyage AI | Embedding | Mid tier
  General-purpose RAG, semantic search, recommendations, classification
- voyage-3-lite | Voyage AI | Embedding | Low-cost tier
  High-volume embedding pipelines, cost-sensitive search/RAG
- voyage-code-3 | Voyage AI | Embedding | Specialized tier
  Code search, code RAG, repository retrieval, code similarity
- rerank-2 | Voyage AI | Reranker | Per request
  Re-ranking top-k retrieved documents for higher precision in RAG
- rerank-2-lite | Voyage AI | Reranker | Lower than rerank-2
  Cost-sensitive re-ranking at larger scale
- Qwen2.5-72B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  High-quality chat, analysis, coding, complex instruction following
- Qwen2.5-32B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  Strong quality with lower cost/latency than 72B
- Qwen2.5-14B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  Mid-size production assistants, cost-aware coding/chat
- Qwen2.5-7B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  Lightweight assistants, edge/server cost-sensitive workloads
- Qwen2.5-Coder-32B-Instruct | Alibaba Qwen | Generative (code-focused) | Open weights / provider API
  Code generation, refactoring, repo Q&A, code reasoning
- Qwen2.5-VL-72B-Instruct | Alibaba Qwen | Multimodal generative | Open weights / provider API
  Vision + text understanding, document/image question answering
- DeepSeek-V3 | DeepSeek | Generative | Provider-dependent
  General chat, coding, analysis, high-quality assistant tasks
- DeepSeek-R1 | DeepSeek | Reasoning / thinker | Provider-dependent
  Reasoning-heavy tasks, math, logic, multi-step planning
- DeepSeek-R1-Distill-Llama-70B | DeepSeek | Reasoning / thinker | Open weights / provider API
  Cost-aware reasoning, self-hosted reasoning workloads
- DeepSeek-R1-Distill-Qwen-32B | DeepSeek | Reasoning / thinker | Open weights / provider API
  Mid-size reasoning deployments, balanced quality/latency
- DeepSeek-Coder-V2 | DeepSeek | Generative (code-focused) | Open weights / provider API
  Code generation, refactoring, debugging, repo-level coding assistance
- DeepSeek-Embedding | DeepSeek | Embedding | Provider-dependent
  RAG, semantic search, retrieval indexing, clustering
- Kimi (flagship) | Moonshot AI (Kimi) | Generative | Provider-dependent
  General chat, analysis, long-document QA, productivity assistant
- Kimi Long-Context | Moonshot AI (Kimi) | Generative (long-context) | Provider-dependent
  Very large documents, long conversations, report synthesis
- Kimi Lightweight / Fast | Moonshot AI (Kimi) | Generative | Lower-cost tier
  High-volume chat, latency-sensitive assistants, cost-aware production
- Kimi Reasoning-Oriented | Moonshot AI (Kimi) | Reasoning / thinker | Provider-dependent
  Multi-step reasoning, math/logic-heavy prompts, complex planning
- Kimi Vision-Capable | Moonshot AI (Kimi) | Multimodal generative | Provider-dependent
  Image + text understanding, document screenshots, charts/slides QA