LLM Engines
A reference list of LLM and embedding models by provider: generative models, reasoners, embeddings, and rerankers. Includes specs, use cases, deployment options, and pricing.
- text-embedding-3-small | OpenAI | Embedding | ~$0.02/1M tokens
  RAG, semantic search, classification, recommendations, clustering, near-duplicate detection
- text-embedding-3-large | OpenAI | Embedding | ~$0.13/1M tokens
  RAG, semantic search, classification, recommendations, multilingual search, complex queries
- CLIP | OpenAI | Embedding | Free (open source)
  Text–image retrieval, image search by text, zero-shot image classification
- GPT-4o | OpenAI | Generative | ~$2.50/1M input, $10/1M output
  Chat, coding, analysis, general tasks, vision
- GPT-4o mini | OpenAI | Generative | ~$0.15/1M input, $0.60/1M output
  Chat, simple tasks, high volume, cost-sensitive use
- o1 | OpenAI | Reasoning / thinker | ~$15/1M input, $60/1M output
  Math, coding contests, science, complex reasoning, chain-of-thought
- o1-mini | OpenAI | Reasoning / thinker | Lower than o1
  Lightweight reasoning, cost-sensitive reasoning tasks
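The per-token prices listed above make cost estimation simple arithmetic: tokens divided by one million, times the per-1M price, summed over input and output. A minimal sketch (the helper name and token counts are illustrative; the prices are the approximate GPT-4o mini figures listed above):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate API cost in USD from per-1M-token prices."""
    return (input_tokens / 1_000_000) * price_in_per_m + \
           (output_tokens / 1_000_000) * price_out_per_m

# GPT-4o mini at the approximate prices listed above:
cost = estimate_cost(2_000_000, 500_000, price_in_per_m=0.15, price_out_per_m=0.60)
# 2M input tokens -> $0.30, plus 0.5M output tokens -> $0.30: total $0.60
```

The same formula applies to any generative entry in this list; embedding models bill input tokens only, so the output term drops out.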
- BGE-large-en-v1.5 | BAAI (Hugging Face) | Embedding | Free (open source)
  RAG, semantic search, retrieval, classification, open-source deployments
- GTE-large-en-v1.5 | Alibaba (Hugging Face) | Embedding | Free (open source)
  RAG, search, classification, multilingual
- E5-mistral-7b-instruct | Microsoft (Hugging Face) | Embedding | Free (open source)
  High-quality retrieval, RAG, complex queries
- BGE-reranker-large | BAAI (Hugging Face) | Reranker | Free (open source)
  RAG reranking, passage–query relevance
- BGE-reranker-v2-m3 | BAAI (Hugging Face) | Reranker | Free (open source)
  Multilingual reranking
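All of the embedding entries above serve the same retrieval pattern: embed the query and the documents with the same model, then rank documents by cosine similarity. A minimal sketch with toy 4-dimensional vectors standing in for real embeddings (a model such as BGE-large-en-v1.5 actually outputs 1024-dimensional vectors; the helper name is illustrative):

```python
import numpy as np

def cosine_sim(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of doc vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return d @ q

# Toy vectors standing in for real model output.
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1],
    [0.8, 0.2, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])

scores = cosine_sim(query, docs)
top_k = np.argsort(scores)[::-1][:2]   # indices of the 2 most similar docs
```

In a real pipeline the top-k candidates from this cheap first stage would then go to a reranker such as BGE-reranker-large for a more precise second pass.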
- Gemini Embedding 2 | Google | Embedding | ~$0.20/1M tokens (text)
  Multimodal RAG, cross-modal search, semantic search over text, images, video, audio, documents
- Gemini 2.0 Flash | Google | Generative | ~$0.10/1M input, $0.40/1M output
  Chat, fast responses, high volume, cost-sensitive use
- Gemini 2.0 Pro | Google | Generative | Higher than Flash
  Complex tasks, coding, analysis, higher quality
- Gemini 2.5 Pro (Thinking) | Google | Reasoning / thinker | Higher than standard Pro
  Math, coding, multi-step reasoning, agents
- Gemini 2.5 Flash (Thinking) | Google | Reasoning / thinker | Lower than Pro thinking
  Lightweight reasoning, cost-sensitive reasoning
- Vertex AI Ranking API | Google | Reranker | Per RAG request
  RAG reranking, semantic ranking
- embed-v4 | Cohere | Embedding | ~$0.10–0.12/1M tokens
  Multimodal RAG, text + image + PDF, mixed content
- embed-english-v3.0 | Cohere | Embedding | ~$0.10/1M tokens
  English RAG, search, classification
- embed-multilingual-v3.0 | Cohere | Embedding | ~$0.10/1M tokens
  Multilingual RAG, search
- Command R+ | Cohere | Generative | ~$2.50/1M input, $10/1M output
  RAG, tool use, agents, multilingual
- Command R | Cohere | Generative | ~$0.15/1M input, $0.60/1M output
  RAG, chat, cost-sensitive use
- Command R7B | Cohere | Generative | ~$0.04/1M input, $0.15/1M output
  High volume, simple tasks
- Command A Reasoning | Cohere | Reasoning / thinker | Higher than Command R+
  Agents, tool use, complex reasoning
- rerank-v4.0-pro | Cohere | Reranker | Per search
  RAG reranking, high relevance
- rerank-v3.5 | Cohere | Reranker | ~$2/1,000 searches
  Multilingual reranking
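The reranker entries above all follow a two-stage pattern: a cheap first-stage retrieval produces candidates, then the reranker scores each query–document pair jointly and reorders them. A minimal sketch, with a toy word-overlap score standing in for a real reranker call (e.g. Cohere's rerank endpoint; its actual API is not shown here, and all names below are illustrative):

```python
def rerank(query: str, candidates: list[str], score_fn, top_n: int = 3) -> list[str]:
    """Second-stage rerank: score each (query, doc) pair and keep the best."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:top_n]

def overlap_score(query: str, doc: str) -> float:
    """Toy pairwise relevance: fraction of query words appearing in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = [
    "pricing for embedding models",
    "how to bake bread",
    "embedding model pricing and tokens",
]
best = rerank("embedding pricing", docs, overlap_score, top_n=2)
# The two embedding-related docs rank above the irrelevant one.
```

A real reranker replaces `overlap_score` with a cross-encoder that reads the query and document together, which is why it is more accurate (and more expensive per pair) than embedding similarity alone.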
- voyage-3-large | Voyage AI | Embedding | Premium tier
  High-accuracy RAG, semantic search, multilingual retrieval, enterprise retrieval quality
- voyage-3 | Voyage AI | Embedding | Mid tier
  General-purpose RAG, semantic search, recommendations, classification
- voyage-3-lite | Voyage AI | Embedding | Low-cost tier
  High-volume embedding pipelines, cost-sensitive search/RAG
- voyage-code-3 | Voyage AI | Embedding | Specialized tier
  Code search, code RAG, repository retrieval, code similarity
- rerank-2 | Voyage AI | Reranker | Per request
  Re-ranking top-k retrieved documents for higher precision in RAG
- rerank-2-lite | Voyage AI | Reranker | Lower than rerank-2
  Cost-sensitive re-ranking at larger scale
- Qwen2.5-72B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  High-quality chat, analysis, coding, complex instruction following
- Qwen2.5-32B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  Strong quality with lower cost/latency than 72B
- Qwen2.5-14B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  Mid-size production assistants, cost-aware coding/chat
- Qwen2.5-7B-Instruct | Alibaba Qwen | Generative | Open weights / provider API
  Lightweight assistants, edge/server cost-sensitive workloads
- Qwen2.5-Coder-32B-Instruct | Alibaba Qwen | Generative (code-focused) | Open weights / provider API
  Code generation, refactoring, repo Q&A, code reasoning
- Qwen2.5-VL-72B-Instruct | Alibaba Qwen | Multimodal generative | Open weights / provider API
  Vision + text understanding, document/image question answering
- DeepSeek-V3 | DeepSeek | Generative | Provider-dependent
  General chat, coding, analysis, high-quality assistant tasks
- DeepSeek-R1 | DeepSeek | Reasoning / thinker | Provider-dependent
  Reasoning-heavy tasks, math, logic, multi-step planning
- DeepSeek-R1-Distill-Llama-70B | DeepSeek | Reasoning / thinker | Open weights / provider API
  Cost-aware reasoning, self-hosted reasoning workloads
- DeepSeek-R1-Distill-Qwen-32B | DeepSeek | Reasoning / thinker | Open weights / provider API
  Mid-size reasoning deployments, balanced quality/latency
- DeepSeek-Coder-V2 | DeepSeek | Generative (code-focused) | Open weights / provider API
  Code generation, refactoring, debugging, repo-level coding assistance
- DeepSeek-Embedding | DeepSeek | Embedding | Provider-dependent
  RAG, semantic search, retrieval indexing, clustering
- Kimi (flagship) | Moonshot AI (Kimi) | Generative | Provider-dependent
  General chat, analysis, long-document QA, productivity assistant
- Kimi Long-Context | Moonshot AI (Kimi) | Generative (long-context) | Provider-dependent
  Very large documents, long conversations, report synthesis
- Kimi Lightweight / Fast | Moonshot AI (Kimi) | Generative | Lower-cost tier
  High-volume chat, latency-sensitive assistants, cost-aware production
- Kimi Reasoning-Oriented | Moonshot AI (Kimi) | Reasoning / thinker | Provider-dependent
  Multi-step reasoning, math/logic-heavy prompts, complex planning
- Kimi Vision-Capable | Moonshot AI (Kimi) | Multimodal generative | Provider-dependent
  Image + text understanding, document screenshots, charts/slides QA