LLM Engines
A reference list of LLM and embedding models by provider: generative models, reasoners, embeddings, and rerankers. Each entry lists modalities, parameters, API or repo id, use cases, deployment, pricing, and benchmark metrics where public numbers exist (vendor papers, MTEB, or common eval suites). Percentages are point-in-time; task splits and prompts differ across sources, so treat cross-model comparisons as rough.
- text-embedding-3-small · OpenAI · Embedding
Best for: RAG, semantic search, classification, recommendations, clustering, near-duplicate detection
Price: ~$0.02/1M tokens
Deployment: API
Modalities: Text-only
Parameters: 1,536 dims, 8,191 tokens
Pros: Strong MTEB scores, dimension reduction support
Cons: API-only (no local), text-only
Model: text-embedding-3-small
Scale: Small / distilled
Benchmarking: MTEB (English, mean, 56 tasks): 62.3%
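The dimension reduction support mentioned above (OpenAI's `dimensions` parameter, and Matryoshka-style shortening generally) amounts to truncating the embedding and re-normalizing it. A minimal stdlib-only sketch of that idea; `shorten_embedding` is a hypothetical helper, and the toy 4-dim vector stands in for a real 1,536-dim API response:

```python
import math

def shorten_embedding(vec, dims):
    """Truncate an embedding to its first `dims` components and
    L2-renormalize, mirroring what a `dimensions` parameter returns."""
    cut = vec[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

def cosine(a, b):
    # assumes both inputs are unit-norm, so the dot product is the cosine
    return sum(x * y for x, y in zip(a, b))

full = [0.6, 0.8, 0.0, 0.0]          # toy stand-in for a full-size embedding
short = shorten_embedding(full, 2)   # ~[0.6, 0.8]; still unit norm
```

Shortened vectors trade a little retrieval quality for smaller indexes and faster similarity search; the truncation must happen before indexing so stored and query vectors agree on dimensionality.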
- text-embedding-3-large · OpenAI · Embedding
Best for: RAG, semantic search, classification, recommendations, multilingual search, complex queries
Price: ~$0.13/1M tokens
Deployment: API
Modalities: Text-only
Parameters: 3,072 dims, 8,191 tokens
Pros: Higher quality than 3-small, strong MTEB scores, dimension reduction support
Cons: API-only (no local), higher cost (~$0.13/1M tokens), text-only
Model: text-embedding-3-large
Scale: Large
Benchmarking: MTEB (English, aggregate): ~64.6%
- CLIP (OpenAI / open source) · OpenAI · Embedding
Best for: Text–image retrieval, image search by text, zero-shot image classification, cross-modal similarity
Price: Free (open source)
Deployment: Local (Hugging Face, PyTorch)
Modalities: Text + image
Parameters: Depends on variant (ViT-B/32, ViT-L/14, etc.); ~224×224 images; 512–768 dims
Pros: Open source, runs locally, strong text–image alignment, widely used
Cons: Text + image only (no video/audio), older than newer multimodal models
Model: openai/clip-vit-base-patch32, openai/clip-vit-large-patch14, etc.
Scale: Small / distilled
Benchmarking: ImageNet zero-shot top-1: ~63.2% (ViT-B/32), ~76.2% (ViT-L/14@336px)
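Zero-shot classification with CLIP works by embedding the image and one text prompt per label into the same space, then softmaxing the scaled cosine similarities. A stdlib-only sketch of that scoring step; the toy vectors are hypothetical stand-ins for real CLIP embeddings (which a local model would produce):

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def zero_shot_probs(image_vec, text_vecs, temperature=100.0):
    """CLIP-style zero-shot classification: scaled cosine similarities
    between one image embedding and one text embedding per label,
    turned into label probabilities with a softmax."""
    sims = [sum(a * b for a, b in zip(image_vec, t)) for t in text_vecs]
    return softmax([temperature * s for s in sims])

image = [1.0, 0.0]                          # toy unit vector for an image
labels = [[0.98, 0.199], [0.0, 1.0]]        # e.g. "a photo of a dog" / "...cat"
probs = zero_shot_probs(image, labels)      # first label dominates
```

The large temperature (CLIP learns a logit scale around 100) is what makes the distribution sharp; with unscaled cosines the softmax would be nearly uniform.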
- GPT-4o · OpenAI · Generative
Best for: Chat, coding, analysis, general tasks, vision
Price: ~$2.50/1M input, $10/1M output
Deployment: API
Modalities: Text + image
Parameters: 128K context, multimodal (text + image)
Pros: Fast, strong performance, vision
Cons: Higher cost than mini
Model: gpt-4o
Scale: Large
Benchmarking: MMLU (5-shot, approx.): ~87%
- GPT-4o mini · OpenAI · Generative
Best for: Chat, simple tasks, high volume, cost-sensitive use
Price: ~$0.15/1M input, $0.60/1M output
Deployment: API
Modalities: Text + image
Parameters: 128K context, multimodal (text + image)
Pros: Cheaper, fast
Cons: Less capable than GPT-4o
Model: gpt-4o-mini
Scale: Small / distilled
Benchmarking: MMLU (approx.): ~82%
- GPT-5.4 mini · OpenAI · Reasoning / thinker
Best for: Lightweight reasoning, high-volume agents, coding, cost-sensitive workloads
Price: ~$0.75/1M input, $4.50/1M output
Deployment: API
Modalities: Text + image (input)
Parameters: 400K context, reasoning tokens, image input
Pros: Strong mini tier, tools (Responses API), lower cost than GPT-5.4
Cons: Less capable than GPT-5.4; not a reranker
Model: gpt-5.4-mini
Scale: Small / distilled
Benchmarking: reported; see OpenAI model card / latest model guide
- BGE-large-en-v1.5 (BAAI General Embedding, Beijing Academy of Artificial Intelligence) · Sentence Transformers (Hugging Face) · Embedding
Best for: RAG, semantic search, retrieval, classification, open-source deployments
Price: Free (open source)
Deployment: Local
Modalities: Text-only
Parameters: 1,024 dims, 512 tokens, ~335M params
Pros: Strong MTEB scores, open source, runs locally, good quality/size trade-off
Cons: Text-only, shorter context than some models
Model: BAAI/bge-large-en-v1.5
Scale: Large
Benchmarking: MTEB (English, v1): ~64.2%
- GTE-large-en-v1.5 (Alibaba General Text Embeddings) · Sentence Transformers (Hugging Face) · Embedding
Best for: RAG, search, classification, English long-context
Price: Free (open source)
Deployment: Local
Modalities: Text-only
Parameters: 1,024 dims, 8,192 tokens, ~335M params
Pros: High MTEB score, long context
Cons: English-only (multilingual GTE checkpoints available separately)
Model: Alibaba-NLP/gte-large-en-v1.5
Scale: Large
Benchmarking: MTEB (English, v1): ~64.0%
- E5-mistral-7b-instruct · Sentence Transformers (Hugging Face) · Embedding
Best for: High-quality retrieval, RAG, complex queries
Price: Free (open source)
Deployment: Local
Modalities: Text-only
Parameters: 4,096 dims, 32K tokens, 7B params
Pros: Top MTEB performance, long context
Cons: Heavy; needs more GPU memory
Model: intfloat/e5-mistral-7b-instruct
Scale: Large (7B-parameter backbone)
Benchmarking: MTEB (English, v1): ~66.5%
- BGE-reranker-large · Sentence Transformers (Hugging Face) · Reranker
Best for: RAG reranking, passage–query relevance
Price: Free (open source)
Deployment: Local
Modalities: Text-only
Parameters: Cross-encoder, query + passage → score
Pros: Strong MTEB, works with Sentence Transformers
Cons: Text-only
Model: BAAI/bge-reranker-large
Scale: Large
Benchmarking: MS MARCO / BEIR reranking: strong cross-encoder tier
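Rerankers like this sit in a two-stage pipeline: cheap vector search narrows the corpus to k candidates, then the cross-encoder scores each (query, passage) pair. A minimal sketch of that control flow with toy stand-in functions; a real system would plug an embedding model into `embed` and a cross-encoder such as BAAI/bge-reranker-large into `cross_score`:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_then_rerank(query, corpus, embed, cross_score, k=20, top_n=5):
    """Two-stage retrieval: vector search picks k candidates (bi-encoder,
    fast), then a cross-encoder re-scores pairs (slow, more accurate)."""
    qv = embed(query)
    candidates = sorted(corpus, key=lambda d: -dot(embed(d), qv))[:k]
    return sorted(candidates, key=lambda d: -cross_score(query, d))[:top_n]

# toy stand-ins so the pipeline runs without any model downloads
vocab = {"sun": [1.0, 0.0], "moon": [0.0, 1.0], "star": [0.7, 0.7]}
embed = lambda text: vocab.get(text.split()[0], [0.5, 0.5])
cross = lambda q, d: 1.0 if q.split()[0] in d else 0.0
docs = ["sun facts", "moon facts", "star facts"]
best = retrieve_then_rerank("sun", docs, embed, cross, k=2, top_n=1)
```

The cross-encoder is only run on k documents rather than the whole corpus, which is why the extra quality is affordable.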
- BGE-reranker-v2-m3 · Sentence Transformers (Hugging Face) · Reranker
Best for: Multilingual reranking
Price: Free
Deployment: Local
Modalities: Text-only (multilingual)
Parameters: Multilingual cross-encoder
Pros: 100+ languages
Cons: Heavier than base
Model: BAAI/bge-reranker-v2-m3
Scale: Small / distilled
Benchmarking: multilingual reranking (mMARCO-style): strong
- Gemini Embedding 2 · Google · Embedding
Best for: Multimodal RAG, cross-modal search, semantic search over text, images, video, audio, documents
Price: text ~$0.20/1M tokens, images ~$0.45/1M tokens, audio ~$6.50/1M tokens, video ~$12.00/1M tokens
Deployment: API
Modalities: Text, image, video, audio, documents
Parameters: 3,072 dims, 8K tokens (text), images (up to 6), video (~120s), audio (~80s)
Pros: Native multimodal, single embedding space, Matryoshka-style compression, 100+ languages
Cons: API-only (no local), higher cost than text-only models
Model: gemini-embedding-2-preview
Scale: Large
Benchmarking: reported; multimodal embedding (see Google model card)
- Gemini Embedding 2 (Vertex AI) · Google · Embedding
Best for: Multimodal RAG, cross-modal search, semantic search over text, images, video, audio, documents
Price: text ~$0.15/1M tokens vs ~$0.20/1M on the Gemini API is typical (confirm on current pricing pages); images/audio/video similar to the Gemini API
Deployment: API (Vertex AI)
Modalities: Text, image, video, audio, documents
Parameters: 3,072 dims, 8K tokens (text), images (up to 6), video (~120s), audio (~80s)
Pros: Native multimodal, single embedding space, Matryoshka-style compression, 100+ languages, enterprise support, VPC integration
Cons: API-only (no local), higher cost than text-only models
Model: Vertex AI (text-multilingual-embedding-002 or gemini-embedding-2)
Scale: Large
Benchmarking: reported; multimodal embedding (Vertex AI, same family as the Gemini API)
- Gemini 2.5 Pro (Thinking mode) · Google · Reasoning / thinker
Best for: Math, coding, multi-step reasoning, agents
Price: ~$1.25/1M in, $10/1M out (prompt ≤200k tokens); ~$2.50/1M in, $15/1M out (>200k)
Deployment: Gemini API, Vertex AI
Modalities: Text, image, audio, video
Parameters: Extended reasoning, 1M context
Pros: Strong reasoning, multimodal
Cons: Slower, higher cost
Model: gemini-2.5-pro (with thinking enabled)
Scale: Large
Benchmarking: reported; reasoning + multimodal (see Google eval tables)
- Gemini 3.1 Pro Preview · Google · Reasoning / thinker
Best for: Complex reasoning, coding, agents, multimodal analysis
Price: Paid, split by prompt length: input $2.00/1M (≤200k tokens) or $4.00/1M (>200k); output $12.00/1M or $18.00/1M
Deployment: Gemini API, Vertex AI
Modalities: Text, image, audio, video (per model doc)
Parameters: See model doc (preview; limits updated there)
Pros: Newest Pro tier in the Gemini 3.1 line, strong agentic / coding positioning
Cons: Preview (behavior/rates may change); higher cost than Flash
Model: gemini-3.1-pro-preview
Scale: Large
Benchmarking: reported; see Google evals / model doc
- Gemini 2.5 Flash (Thinking) · Google · Reasoning / thinker
Best for: Lightweight reasoning, cost-sensitive reasoning
Price: ~$0.30/1M input, $2.50/1M output (thinking counted as output; audio input $1/1M)
Deployment: Gemini API, Vertex AI
Modalities: Text, image, audio, video
Parameters: Smaller reasoning model
Pros: Cheaper than Pro thinking
Cons: Less capable than Pro
Model: gemini-2.5-flash-thinking
Scale: Small / distilled
Benchmarking: reported; reasoning tier below Pro thinking (see model card)
- Gemini 3 Flash Preview (Thinking) · Google · Reasoning / thinker
Best for: Fast Gemini 3 Flash tier, reasoning, agents, search/grounding-heavy work
Price: Paid (standard): ~$0.50/1M input (text/image/video), ~$1.00/1M (audio); ~$3.00/1M output (thinking counted as output)
Deployment: Gemini API, Vertex AI
Modalities: Text, image, audio, video, PDF
Parameters: Preview; 1M input / 65k output; thinking via API config
Pros: Newer Flash line than 2.5; strong speed + capability mix
Cons: Preview; higher $/1M than 2.5 Flash; stricter limits than stable models
Model: gemini-3-flash-preview
Scale: Small / distilled
Benchmarking: reported; see Google evals / model doc
- Gemma 4 E2B (Google / DeepMind) · Google · Generative
Best for: On-device and edge chat, agents, coding; multimodal (text, image, video, audio) on small sizes
Price: Open weights (infra cost); see AI Studio / Vertex if hosted
Deployment: Local, Hugging Face, Kaggle, Ollama, Vertex AI
Modalities: Text, image, video, audio
Parameters: E2B line, PLE architecture, 128K context (see model card for parameterization)
Pros: Tiny footprint for the Gemma 4 line, native multimodal on E2B/E4B, function calling, thinking modes
Cons: Lower ceiling than 31B / MoE variants
Model: google/gemma-4-E2B-it
Scale: Small / distilled
Benchmarking: reported; see Gemma 4 model card
- Gemma 4 E4B (Google / DeepMind) · Google · Generative
Best for: Edge and browser-class workloads, agents, coding; multimodal (text, image, video, audio)
Price: Open weights (infra cost); see AI Studio / Vertex if hosted
Deployment: Local, Hugging Face, Kaggle, Ollama, Vertex AI
Modalities: Text, image, video, audio
Parameters: E4B line, PLE architecture, 128K context (see model card for parameterization)
Pros: Strong quality for its size class, multimodal on small Gemma 4, Apache-2.0-style Gemma terms
Cons: Higher static VRAM than a naive 4B due to PLE tables
Model: google/gemma-4-E4B-it
Scale: Small / distilled
Benchmarking: reported; see Gemma 4 model card
- Gemma 4 26B A4B MoE (Google / DeepMind) · Google · Reasoning / thinker
Best for: High-throughput reasoning, coding, agents; ~4B active params per token
Price: Open weights (infra cost); full expert set loaded at inference
Deployment: Local, Hugging Face, Kaggle, Vertex AI
Modalities: Text, image, video, audio
Parameters: 26B MoE (A4B active per token), 256K context, function calling, thinking modes
Pros: Efficient per-token cost vs dense 26B-class, strong reasoning positioning
Cons: Memory footprint closer to the full 26B than to the 4B active
Model: google/gemma-4-26B-A4B-it
Scale: Large
Benchmarking: reported; see Gemma 4 model card
- Gemma 4 31B (Google / DeepMind) · Google · Reasoning / thinker
Best for: Deep reasoning, coding, enterprise and server-grade open-weight deployment
Price: Open weights (infra cost)
Deployment: Local, Hugging Face, Kaggle, Vertex AI
Modalities: Text, image, video, audio
Parameters: 31B dense, 256K context, multimodal (text, image, video, audio), function calling
Pros: Top open-weight Gemma 4 dense tier, long context, agentic tooling
Cons: Heavy GPU/TPU requirements at full precision
Model: google/gemma-4-31B-it
Scale: Large
Benchmarking: reported; see Gemma 4 model card
- Vertex AI Ranking API · Google · Reranker
Best for: RAG reranking, semantic ranking
Price: $1.00 per 1,000 ranking calls
Deployment: Vertex AI
Modalities: Text-only
Parameters: Semantic reranker, <100ms latency
Pros: Low latency, strong performance
Cons: Vertex AI only, API-only
Model: semantic-ranker-default-004, semantic-ranker-fast-004
Scale: Small / distilled
Benchmarking: reported; semantic reranker, <100ms latency
- embed-english-v3.0 · Cohere · Embedding
Best for: English RAG, search, classification
Price: ~$0.10/1M tokens
Deployment: API
Modalities: Text + image
Parameters: 1,024 dims, English; API supports text and images
Pros: Strong search, RAG
Cons: English-only
Model: embed-english-v3.0
Scale: Large
Benchmarking: MTEB / retrieval: strong English API tier
- embed-v4 · Cohere · Embedding
Best for: Multimodal RAG, text + image + PDF, mixed content
Price: ~$0.10–0.12/1M tokens
Deployment: API
Modalities: Text + image
Parameters: Multimodal (text, image), unified embedding space
Pros: Multimodal, high-res images, PDFs
Cons: API-only
Model: embed-v4
Scale: Large
Benchmarking: N/A
- embed-multilingual-v3.0 · Cohere · Embedding
Best for: Multilingual RAG, search
Price: ~$0.10/1M tokens
Deployment: API
Modalities: Text + image
Parameters: 1,024 dims, 100+ languages; API supports text and images
Pros: Multilingual
Cons: API-only
Model: embed-multilingual-v3.0
Scale: Large
Benchmarking: N/A
- Command R · Cohere · Generative
Best for: RAG, chat, cost-sensitive use
Price: ~$0.15/1M input, $0.60/1M output
Deployment: API
Modalities: Text-only
Parameters: 128K context
Pros: Cheaper than R+, solid RAG
Cons: Less capable than R+
Model: command-r
Scale: Small / distilled
Benchmarking: N/A
- Command R+ · Cohere · Generative
Best for: RAG, tool use, agents, multilingual
Price: ~$2.50/1M input, $10/1M output
Deployment: API
Modalities: Text-only
Parameters: 128K context, 23 languages
Pros: Strong RAG, tool use, multilingual
Cons: Higher cost than Command R
Model: command-r-plus
Scale: Large
Benchmarking: N/A
- Command R7B · Cohere · Generative
Best for: High volume, simple tasks
Price: ~$0.04/1M input, $0.15/1M output
Deployment: API
Modalities: Text-only
Parameters: 7B params
Pros: Fast, cheap
Cons: Less capable
Model: command-r7b
Scale: Small / distilled
Benchmarking: N/A
- Command A Reasoning · Cohere · Reasoning / thinker
Best for: Agents, tool use, complex reasoning
Price: Higher than Command R+
Deployment: API
Modalities: Text-only
Parameters: 111B params, 256K context, 23 languages
Pros: Strong reasoning, tool use, agentic
Cons: API-only, higher cost
Model: command-a-reasoning-08-2025
Scale: Large
Benchmarking: N/A
- rerank-v4.0-pro · Cohere · Reranker
Best for: RAG reranking, high relevance
Price: Priced per search
Deployment: API
Modalities: Text-only
Parameters: 32K context
Pros: High quality
Cons: Slower than the fast variant
Model: rerank-v4.0-pro
Scale: Large
Benchmarking: N/A
- rerank-v3.5 · Cohere · Reranker
Best for: Multilingual reranking
Price: ~$2/1,000 searches
Deployment: API
Modalities: Text-only (multilingual)
Parameters: 4,096 tokens
Pros: Multilingual
Cons: Shorter context than v4
Model: rerank-v3.5
Scale: Large
Benchmarking: N/A
- voyage-3-large · Voyage AI · Embedding
Best for: High-accuracy RAG, semantic search, multilingual retrieval, enterprise retrieval quality
Price: Premium tier (check latest Voyage pricing page)
Deployment: API
Modalities: Text-only
Parameters: Text embeddings, long-context input, Matryoshka-style dimension shortening support
Pros: Very strong retrieval quality, good multilingual performance, flexible embedding size
Cons: API-only (no local), higher cost than lite variants
Model: voyage-3-large
Scale: Large
Benchmarking: N/A
- voyage-3 · Voyage AI · Embedding
Best for: General-purpose RAG, semantic search, recommendations, classification
Price: Mid tier (check latest Voyage pricing page)
Deployment: API
Modalities: Text-only
Parameters: Text embeddings, long-context input, balanced quality/latency
Pros: Strong quality at better cost than large
Cons: API-only, not as accurate as large on hard retrieval sets
Model: voyage-3
Scale: Large
Benchmarking: NDCG@10: 76.72%
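NDCG@10, the retrieval metric reported above, discounts each ranked document's graded relevance by log2 of its position and normalizes by the ideal ordering's score. A self-contained sketch of the standard formula (the relevance grades below are made-up examples, not Voyage's eval data):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one query: relevances[i] is the graded relevance of the
    document ranked at position i (0-based). DCG discounts by log2(rank+1);
    dividing by the ideal (descending-sorted) DCG gives a score in [0, 1]."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 3, 2, 2, 1, 0])   # already ideal order -> 1.0
swapped = ndcg_at_k([3, 2, 3, 0, 1, 2])   # misordered -> below 1.0
```

Benchmark suites average this per-query score over many queries, which is why a single headline number like 76.72% hides per-domain variance.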
- voyage-3-lite · Voyage AI · Embedding
Best for: High-volume embedding pipelines, cost-sensitive search/RAG, near-duplicate detection
Price: Low-cost tier (check latest Voyage pricing page)
Deployment: API
Modalities: Text-only
Parameters: Lightweight text embeddings, lower latency
Pros: Cheap, fast, scalable
Cons: Lower retrieval quality than voyage-3 / voyage-3-large
Model: voyage-3-lite
Scale: Small / distilled
Benchmarking: N/A
- voyage-code-3 · Voyage AI · Embedding
Best for: Code search, code RAG, repository retrieval, code similarity
Price: Specialized tier (check latest Voyage pricing page)
Deployment: API
Modalities: Text + code
Parameters: Code-focused embeddings for natural language + code retrieval
Pros: Better code retrieval than general text embeddings
Cons: API-only, less ideal for purely non-code corpora
Model: voyage-code-3
Scale: Large
Benchmarking: N/A
- rerank-2 · Voyage AI · Reranker
Best for: Re-ranking top-k retrieved documents for higher precision in RAG
Price: Per-request/token based (check latest Voyage pricing page)
Deployment: API
Modalities: Text-only
Parameters: Cross-encoder reranker (query–document relevance scoring)
Pros: Noticeable precision boost after initial vector retrieval
Cons: Extra latency/cost step after retrieval
Model: rerank-2
Scale: Large
Benchmarking: N/A
- rerank-2-lite · Voyage AI · Reranker
Best for: Cost-sensitive re-ranking at larger scale
Price: Lower than rerank-2 (check latest Voyage pricing page)
Deployment: API
Modalities: Text-only
Parameters: Lightweight reranker optimized for speed/cost
Pros: Faster and cheaper than rerank-2
Cons: Slightly lower precision than full rerank-2
Model: rerank-2-lite
Scale: Small / distilled
Benchmarking: N/A
- Qwen2.5-72B-Instruct (Alibaba / Qwen) · Alibaba Qwen · Generative
Best for: High-quality chat, analysis, coding, complex instruction following
Price: Open weights (inference cost depends on your infra) or provider API pricing
Deployment: Local or API (provider-dependent)
Modalities: Text-only
Parameters: 72B open-weight instruct model, long-context variants available
Pros: Strong general quality, good multilingual support, open weights for self-hosting
Cons: Heavy compute for local inference, not as cheap as smaller variants
Model: Qwen/Qwen2.5-72B-Instruct
Scale: Large
Benchmarking: N/A
- Qwen2.5-32B-Instruct (Alibaba / Qwen) · Alibaba Qwen · Generative
Best for: Strong quality at lower cost/latency than 72B
Price: Open weights / provider-dependent API pricing
Deployment: Local or API
Modalities: Text-only
Parameters: 32B instruct model, long-context variants available
Pros: Good quality/performance balance
Cons: Still requires substantial resources for local serving
Model: Qwen/Qwen2.5-32B-Instruct
Scale: Large
Benchmarking: N/A
- Qwen2.5-14B-Instruct (Alibaba / Qwen) · Alibaba Qwen · Generative
Best for: Mid-size production assistants, cost-aware coding/chat
Price: Open weights / provider-dependent API pricing
Deployment: Local or API
Modalities: Text-only
Parameters: 14B instruct model
Pros: Much easier to serve than 32B/72B, solid instruction following
Cons: Lower reasoning depth than larger Qwen models
Model: Qwen/Qwen2.5-14B-Instruct
Scale: Small / distilled
Benchmarking: N/A
- Qwen2.5-7B-Instruct (Alibaba / Qwen) · Alibaba Qwen · Generative
Best for: Lightweight assistants, edge/server cost-sensitive workloads
Price: Open weights / provider-dependent API pricing
Deployment: Local or API
Modalities: Text-only
Parameters: 7B instruct model
Pros: Fast, cheaper inference, widely deployable
Cons: Lower quality on harder reasoning/coding tasks
Model: Qwen/Qwen2.5-7B-Instruct
Scale: Small / distilled
Benchmarking: N/A
- Qwen2.5-Coder-32B-Instruct (Alibaba / Qwen) · Alibaba Qwen · Generative (code-focused)
Best for: Code generation, refactoring, repo Q&A, code reasoning
Price: Open weights / provider-dependent API pricing
Deployment: Local or API
Modalities: Text-only
Parameters: 32B code-specialized instruct model
Pros: Strong coding quality vs general-only models
Cons: Large-model serving cost; smaller coder variants may suffice for simple tasks
Model: Qwen/Qwen2.5-Coder-32B-Instruct
Scale: Large
Benchmarking: N/A
- Qwen2.5-VL-72B-Instruct (Alibaba / Qwen) · Alibaba Qwen · Generative
Best for: Vision + text understanding, document/image question answering
Price: Open weights / provider-dependent API pricing
Deployment: Local or API
Modalities: Text + image
Parameters: Multimodal (text + image) model family
Pros: Strong multimodal capability in the Qwen ecosystem
Cons: Higher compute and serving complexity than text-only models
Model: Qwen/Qwen2.5-VL-72B-Instruct
Scale: Large
Benchmarking: N/A
- DeepSeek-V3 · DeepSeek · Generative
Best for: General chat, coding, analysis, high-quality assistant tasks
Price: Provider-dependent API pricing
Deployment: API (and provider-hosted endpoints)
Modalities: Text-only
Parameters: Large MoE-style foundation model line, long-context-capable variants via providers
Pros: Strong quality-to-cost, good coding and multilingual performance
Cons: API/provider availability can vary by region; behavior depends on hosting/provider tuning
Model: DeepSeek-V3
Scale: Large
Benchmarking: N/A
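Most hosted models in this list (DeepSeek, MiniMax, Qwen providers, and others) expose an OpenAI-compatible chat endpoint, so one request shape covers them. A minimal sketch that only builds the JSON payload, no network call; the model id `deepseek-chat` and the `/chat/completions` path are illustrative and should be checked against the provider's docs:

```python
import json

def build_chat_request(model, user_prompt, system_prompt=None, temperature=0.7):
    """Build an OpenAI-compatible /chat/completions payload: a model id,
    an ordered message list, and sampling parameters."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages, "temperature": temperature}

payload = build_chat_request("deepseek-chat", "Summarize MoE routing in two sentences.")
body = json.dumps(payload)
# POST `body` to the provider's /chat/completions endpoint with an
# Authorization: Bearer <API key> header.
```

Because the payload shape is shared, swapping providers usually means changing only the base URL, the API key, and the model id.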
- DeepSeek-R1 · DeepSeek · Reasoning / thinker
Best for: Reasoning-heavy tasks, math, logic, multi-step planning/problem solving
Price: Provider-dependent API pricing
Deployment: API
Modalities: Text-only
Parameters: Reasoning-oriented model family with deliberate chain-style behavior
Pros: Strong reasoning performance, useful for hard step-by-step tasks
Cons: Higher latency/token usage than non-reasoning models on simple prompts
Model: DeepSeek-R1
Scale: Large
Benchmarking: N/A
- DeepSeek-R1-Distill-Llama-70B · DeepSeek · Reasoning / thinker
Best for: Cost-aware reasoning with strong quality, self-hosted reasoning workloads
Price: Open weights (infra cost) or provider-dependent API pricing
Deployment: Local or API
Modalities: Text-only
Parameters: Distilled reasoning model based on a Llama 70B backbone
Pros: Strong reasoning at lower cost/complexity than full frontier reasoning models
Cons: Lower ceiling than full DeepSeek-R1 on the hardest tasks
Model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Scale: Small / distilled
Benchmarking: N/A
- DeepSeek-R1-Distill-Qwen-32B · DeepSeek · Reasoning / thinker
Best for: Mid-size reasoning deployments, balanced quality/latency
Price: Open weights (infra cost) or provider-dependent API pricing
Deployment: Local or API
Modalities: Text-only
Parameters: Distilled reasoning model on a Qwen 32B backbone
Pros: Good reasoning/cost balance, easier to serve than larger models
Cons: Less capable than 70B/full R1 on difficult benchmarks
Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Scale: Small / distilled
Benchmarking: N/A
- DeepSeek-Coder-V2-Instruct · DeepSeek · Generative (code-focused)
Best for: Code generation, refactoring, debugging, repo-level coding assistance
Price: Open weights (infra cost) or provider-dependent API pricing
Deployment: Local or API
Modalities: Text-only
Parameters: Code-specialized model line (various sizes/checkpoints)
Pros: Strong coding performance, practical for dev workflows
Cons: General non-code reasoning/chat can be weaker than top general models
Model: deepseek-ai/DeepSeek-Coder-V2-Instruct
Scale: Large
Benchmarking: N/A
- kimi-k2.5 · Moonshot AI (Kimi) · Generative
Best for: Multimodal (image + video + text), vision-language, agent-style workflows; use thinking mode for harder reasoning
Price: Token-based; see Moonshot pricing docs
Deployment: API
Modalities: Text + image + video
Parameters: Multimodal (image + video + text); video via upload / ms:// file refs; long context (model card: ~256K-class weights; confirm API limits for your account)
Pros: Flagship multimodal line; thinking mode where supported
Cons: Thinking vs instant modes and defaults differ from older v1 APIs; confirm latest docs
Model: kimi-k2.5
Scale: Large
Benchmarking:
- MMMU-Pro: 78.5%
- MathVision: 84.2%
- moonshot-v1-128k-vision-preview · Moonshot AI (Kimi) · Generative
Best for: Heavy multimodal context (long system + user + images) in one shot
Price: Token-based; see Moonshot pricing docs
Deployment: API
Modalities: Text + image
Parameters: Vision model id for the ~128K context tier (preview)
Pros: Largest v1 vision context tier in the lineup
Cons: Preview; most expensive/heaviest when you use the full context
Model: moonshot-v1-128k-vision-preview
Scale: Large
Benchmarking: N/A
- moonshot-v1-32k-vision-preview · Moonshot AI (Kimi) · Generative
Best for: Longer multimodal chats / more image+text context in one request
Price: Token-based (vision token accounting); see Moonshot pricing docs
Deployment: API
Modalities: Text + image
Parameters: Vision model id for the ~32K context tier (preview); same image rules as other Moonshot vision models
Pros: More room for instructions + image context than the 8K tier
Cons: Preview; higher token use and cost than 8K when you fill the context
Model: moonshot-v1-32k-vision-preview
Scale: Large
Benchmarking: N/A
- moonshot-v1-8k-vision-preview · Moonshot AI (Kimi) · Generative
Best for: Image + text in one request; short prompts and small multimodal turns
Price: Token-based chat pricing (vision uses dynamic image/video tokens); see Moonshot pricing docs
Deployment: API
Modalities: Text + image
Parameters: Vision model id for the ~8K context tier (preview); images: png, jpeg, webp, gif; see the Moonshot vision guide for request format (base64 / file id)
Pros: Lower context cost vs larger tiers when input fits in 8K
Cons: Preview name may change; long images + long text can hit limits faster
Model: moonshot-v1-8k-vision-preview
Scale: Small / distilled
Benchmarking: N/A
- MiniMax M2.7 (MiniMax) · MiniMax AI · Reasoning / thinker
Best for: Agents, coding, software engineering, office workflows, long-context reasoning; flagship text line
Price: See https://platform.minimax.io pricing (token plans / pay-as-you-go)
Deployment: API (OpenAI- or Anthropic-compatible SDKs per docs)
Modalities: Text-only
Parameters: MoE-class stack (~230B total / ~100B active per public materials); very long context (~204.8K-class); tools / Anthropic-compatible API path
Pros: Strong real-world engineering and agentic positioning; high-speed variant for latency-sensitive paths
Cons: API-only (no open weights); pricing/quotas region- and account-dependent
Model: MiniMax-M2.7
Scale: Large
Benchmarking: reported; see the MiniMax M2.7 model page and third-party tables
- MiniMax M2.7-highspeed (MiniMax) · MiniMax AI · Reasoning / thinker
Best for: The same tasks as M2.7 when you need lower latency / higher throughput
Price: See MiniMax pricing (often differs from base M2.7)
Deployment: API
Modalities: Text-only
Parameters: Same capability tier as M2.7; faster inference (vendor-tuned routing)
Pros: Significantly faster than base M2.7 for a similar quality class
Cons: API-only; throughput vs the base model may vary under load
Model: MiniMax-M2.7-highspeed
Scale: Large
Benchmarking: N/A
- MiniMax M2.5 (MiniMax) · MiniMax AI · Generative
Best for: Code generation, refactoring, polyglot coding; strong value tier below M2.7
Price: See MiniMax pricing
Deployment: API
Modalities: Text-only
Parameters: Long context (~204.8K-class per docs); code-optimized positioning
Pros: Peak value tier in the M2 text line for coding-heavy workloads
Cons: Superseded at the absolute frontier by M2.7 on vendor charts; API-only
Model: MiniMax-M2.5
Scale: Large
Benchmarking: N/A
- MiniMax M2.5-highspeed (MiniMax) · MiniMax AI · Generative
Best for: Same as M2.5 with lower latency
Price: See MiniMax pricing
Deployment: API
Modalities: Text-only
Parameters: Same performance class as M2.5; faster inference
Pros: Fast M2.5-class option for high-volume coding
Cons: API-only
Model: MiniMax-M2.5-highspeed
Scale: Large
Benchmarking: N/A
- MiniMax M2.1 (MiniMax) · MiniMax AI · Generative
Best for: Code, reasoning, refactoring; legacy M2 line still listed for stable integrations
Price: See MiniMax pricing
Deployment: API
Modalities: Text-only
Parameters: MoE-style (~230B total, ~10B activated per token per docs); code-focused
Pros: Mature tier; often cheaper than the newest flagship
Cons: Legacy relative to M2.5 / M2.7; API-only
Model: MiniMax-M2.1
Scale: Large
Benchmarking: N/A
- MiniMax M2 (MiniMax) · MiniMax AI · Reasoning / thinker
Best for: Long output and agentic text (function calling, streaming) on the older M2 generation
Price: See MiniMax pricing
Deployment: API
Modalities: Text-only
Parameters: ~200K context; up to ~128K output (incl. chain-style content per docs)
Pros: Established M2 generation; long outputs
Cons: Legacy vs M2.5 / M2.7; API-only
Model: MiniMax-M2
Scale: Large
Benchmarking: N/A
- M2-her (MiniMax) · MiniMax AI · Generative
Best for: Roleplay, multi-character dialogue, long-horizon character interaction
Price: See MiniMax pricing
Deployment: API
Modalities: Text-only
Parameters: Text chat tuned for character and emotional expression
Pros: Specialized for interactive fiction / persona use cases
Cons: Not a general coding frontier model; API-only
Model: M2-her
Scale: Large
Benchmarking: N/A
- Claude Opus 4.6 · Anthropic · Generative
Best for: Hardest tasks, agents, coding, long multimodal work
Price: ~$5/1M input, ~$25/1M output (see Anthropic pricing for batch, cache, thinking)
Deployment: Claude API, AWS Bedrock, Google Vertex AI
Modalities: Text + image in, text out
Parameters: 1M context; 128k max output; extended thinking + adaptive thinking
Pros: Strongest Claude tier in the current lineup
Cons: Highest latency and cost in the family
Model: claude-opus-4-6
Scale: Large
Benchmarking: reported; see Anthropic announcements and third-party leaderboards
- Claude Sonnet 4.6 · Anthropic · Generative
Best for: Production chat, agents, coding, vision; balance of speed and quality
Price: ~$3/1M input, ~$15/1M output (see Anthropic pricing for batch, cache, thinking)
Deployment: Claude API, AWS Bedrock, Google Vertex AI
Modalities: Text + image in, text out
Parameters: 1M context; 64k max output; extended thinking + adaptive thinking
Pros: Fast relative to Opus; strong general capability
Cons: Less capable than Opus on the hardest prompts
Model: claude-sonnet-4-6
Scale: Large
Benchmarking: reported; see Anthropic announcements and third-party leaderboards
- Claude Haiku 4.5 · Anthropic · Generative
Best for: Low-latency chat, high-volume routing, cost-sensitive workloads
Price: ~$1/1M input, ~$5/1M output (see Anthropic pricing)
Deployment: Claude API, AWS Bedrock, Google Vertex AI
Modalities: Text + image in, text out
Parameters: 200k context; 64k max output; extended thinking (no adaptive thinking per model table)
Pros: Fastest and cheapest current Claude tier in the overview
Cons: Smaller context than the Opus/Sonnet 1M tier
Model: claude-haiku-4-5
Scale: Small / distilled
Benchmarking: reported; see Anthropic announcements and third-party leaderboards
- CONTEXT-1 · Chroma · Agentic retrieval
Best for: Multi-hop retrieval, agentic search paired with a frontier reasoning model
Price: Open weights (inference cost on your infra)
Deployment: Hugging Face, local
Modalities: Text-only
Parameters: ~20B params; query decomposition, iterative corpus search, in-loop context editing
Pros: Built for complex multi-hop retrieval as a sub-agent
Cons: Specialized workflow; not a general chat or coding model
Model: chromadb/context-1
Scale: Large
Benchmarking: reported; multi-hop retrieval (see Chroma research)
- Tiny Recursive Model (TRM) · Samsung (SAIL Montréal) · Reasoning / thinker
Best for: Structured puzzle reasoning (ARC-AGI, Sudoku, mazes), recursive-reasoning research
Price: Free (open source)
Deployment: Local
Modalities: Structured grids / puzzles (not free-form conversational text)
Parameters: ~7M parameters, iterative latent and answer refinement (paper: arXiv:2510.04871)
Pros: Orders of magnitude smaller than LLMs on comparable reasoning tasks, MIT-licensed repo
Cons: Not a general chat model, embedding, or reranker; no official hosted API; GPU stack needed for train/eval
Model: SamsungSAILMontreal/TinyRecursiveModels
Scale: Small / distilled
Benchmarking:
- ARC-AGI-1 (reported): 44.6%
- ARC-AGI-2 (reported): 7.8%