Tools
The AI tools we'd actually use.
40 tools across 8 categories. Each with a one-line take on what it's good for. Curated, opinionated, no affiliate links.
LLM APIs (6)
Anthropic Claude API
[paid] Frontier models (Opus / Sonnet / Haiku) with prompt caching, tool use, and extended thinking. Best for serious production work.
OpenAI API
[paid] GPT-5 family plus o-series reasoning models. Native structured output, broad tool/function-calling ecosystem.
Google Gemini API
[paid] 2.5 family. Cheap Flash tier, long context on Pro, native multimodal. Vertex AI for enterprise.
AWS Bedrock
[paid] Hosted Claude / Llama / Titan / Mistral / Cohere on AWS. Best when the rest of your stack is already on AWS.
Together AI
[paid] Hosted open-source models (Llama, Mistral, DeepSeek). Cheaper than building your own GPU stack.
Groq
[paid] Specialized inference hardware. Famously fast on open-source models. Great for low-latency demos.
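Switching among these providers is easier than it looks, because they all accept roughly the same chat-style payload: a model name, a token budget, and a list of role-tagged messages. A minimal sketch of that shared shape (the `build_chat_request` helper and the model name are hypothetical, not any vendor's actual SDK; field names vary per provider):

```python
# Sketch of the chat-request shape these APIs share. Hypothetical helper,
# not a real SDK call; each vendor differs in exact field names.

def build_chat_request(model: str, system: str, user: str,
                       max_tokens: int = 1024) -> dict:
    """Assemble a provider-agnostic chat request payload."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

req = build_chat_request("claude-sonnet", "You are terse.", "Define RAG.")
print(req["messages"][1]["content"])  # → Define RAG.
```

Keeping this assembly step in one place is what makes it cheap to A/B providers later.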
Open models (4)
Llama 4
[open-source] Meta's open-weight mixture-of-experts models (Scout / Maverick) with a permissive license. Long context, native tool use.
Mistral / Mixtral
[open-source] European frontier-tier open models. Strong multilingual coverage, especially European languages.
Qwen 3
[open-source] Alibaba's open models. Wide size range, strong multilingual coverage, great math performance.
DeepSeek
[open-source] Open reasoning models with frontier-tier quality at small sizes. The distilled series fits on consumer GPUs.
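Whether an open model fits your hardware is mostly arithmetic: parameter count times bytes per parameter, before any KV-cache overhead. A rough rule-of-thumb sketch (the helper is illustrative, not from any library):

```python
def approx_weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Rough VRAM needed just for the weights: params × bytes per param.
    Ignores KV cache and activation memory, so treat it as a floor."""
    bytes_per_param = bits_per_param / 8
    return params_billions * 1e9 * bytes_per_param / 1e9  # GB

# A 70B model: ~140 GB in fp16, ~35 GB at 4-bit quantization —
# why quantized and distilled variants fit on consumer GPUs.
print(approx_weight_gb(70, 16))  # 140.0
print(approx_weight_gb(70, 4))   # 35.0
```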
Inference / serving (7)
vLLM
[open-source] The default high-throughput LLM serving engine. PagedAttention, continuous batching, multi-LoRA. Start here.
Hugging Face TGI
[open-source] HF's production inference server. Slightly lower throughput than vLLM, but integrates well with HF tooling.
NVIDIA Triton + TensorRT-LLM
[open-source] Lowest p99 latency on NVIDIA hardware. More complex to operate; reach for it after maxing out vLLM.
SGLang
[open-source] Optimized for structured outputs and agent traces. Worth a look for tool-use-heavy workloads.
Modal
[paid] Serverless GPU. Define a function, get an endpoint. Best for variable workloads and fast cold starts.
Replicate
[paid] One-click deployment of open-source models. Great for prototypes and low-volume production.
Baseten
[paid] Production model serving with autoscaling. A good middle ground between roll-your-own and Replicate.
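Continuous batching, the technique vLLM popularized, means the server backfills a freed batch slot the moment any sequence finishes instead of waiting for the whole batch to drain. A toy scheduler sketch of that idea (pure simulation, not vLLM's API):

```python
from collections import deque

def continuous_batching(jobs: list[int], batch_size: int) -> int:
    """Simulate decode steps when finished sequences are replaced
    immediately. `jobs` = tokens left to generate per request.
    Returns total decode steps until every request completes."""
    waiting = deque(jobs)
    running: list[int] = []
    steps = 0
    while waiting or running:
        # Backfill any free batch slots from the waiting queue.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1  # one decode step advances every running sequence
        running = [t - 1 for t in running if t > 1]  # drop finished ones
    return steps

# With batch_size=2 the 8-token job pins one slot while the short jobs
# cycle through the other, so everything finishes in the minimum 8 steps.
print(continuous_batching([8, 1, 1, 1, 4], batch_size=2))  # → 8
```

Static batching on the same workload would wait for the 8-token job before admitting anything new, wasting the second slot for most of that time.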
Agent frameworks (4)
LangGraph
[open-source] Stateful agent orchestration. The serious successor to LangChain for production agents.
CrewAI
[open-source] Multi-agent orchestration. Good for genuinely multi-agent cases (research swarms, etc.).
Anthropic Agent SDK
[free-tier] Native tool-use loop. Often the simplest path: no framework needed.
OpenAI Agents SDK
[free-tier] OpenAI's opinionated agent stack. Hand-offs, guardrails, and tracing built in.
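The "native tool-use loop" these SDKs wrap is small enough to write directly: call the model, run whatever tool it requests, append the result, repeat until it answers in text. A framework-free sketch with a stubbed model (every name here is illustrative, not any SDK's API):

```python
def fake_model(messages: list[dict]) -> dict:
    """Stub standing in for a real LLM call: requests the weather
    tool once, then answers using the tool result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "get_weather",
                "args": {"city": "Oslo"}}
    result = next(m for m in messages if m["role"] == "tool")["content"]
    return {"type": "text", "content": f"It is {result} in Oslo."}

TOOLS = {"get_weather": lambda city: "4°C and raining"}

def agent_loop(user_msg: str, model=fake_model, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = model(messages)
        if reply["type"] == "text":  # model is done talking to tools
            return reply["content"]
        tool_result = TOOLS[reply["name"]](**reply["args"])  # run the tool
        messages.append({"role": "tool", "content": tool_result})
    raise RuntimeError("agent did not finish within max_turns")

print(agent_loop("Weather in Oslo?"))  # → It is 4°C and raining in Oslo.
```

Swap `fake_model` for a real API call and you have the core of what the lighter SDKs do; the frameworks earn their keep once you add state, hand-offs, and retries.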
Vector databases (6)
pgvector
[open-source] Postgres extension. Already on Postgres? You probably don't need a dedicated vector DB until 10M+ chunks.
Pinecone
[paid] Managed vector DB. Easiest to operate at scale. Pricing matters above ~10M vectors.
Weaviate
[open-source] Open-source vector DB with strong hybrid search. Good if you need self-hosting with hybrid out of the box.
Qdrant
[open-source] Fast, Rust-based vector DB. Great hybrid search and filtering. Self-hosted or cloud.
LanceDB
[open-source] Embedded vector DB. Fits in a single file. Good for local-first / desktop apps.
Chroma
[open-source] Developer-friendly embedded vector DB. Pythonic API, great for prototypes.
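Every entry above ultimately answers the same query: nearest neighbors by similarity. A brute-force sketch of that core operation in plain Python (the real DBs add ANN indexes, metadata filtering, and persistence on top; the toy vectors are made up):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2):
    """Exact nearest-neighbor search: O(n·d) per query, which is why
    dedicated DBs use approximate indexes past a few million vectors."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

docs = {
    "cats":   [0.9, 0.1, 0.0],
    "dogs":   [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs))  # → ['cats', 'dogs']
```

This is also the honest benchmark for "do I need a vector DB yet": if brute force over your corpus is fast enough, pgvector or an embedded option will be too.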
Eval / testing (4)
Promptfoo
[open-source] Open-source prompt eval. Side-by-side model comparison, regression tests in CI.
RAGAS
[open-source] RAG-specific eval. Faithfulness, answer relevance, context recall, with LLM-as-judge.
Braintrust
[paid] Eval + observability platform. Good for teams that want a managed eval workflow.
TruLens
[open-source] Tracking and eval for LLM apps. Strong on RAG-specific metrics.
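The pattern all four tools share is simple: a fixed case set, a grader, and a pass rate you can gate CI on. A minimal harness sketch (the stub app and substring grader are placeholders; the real tools add LLM-as-judge grading, diffing, and dashboards):

```python
from typing import Callable

def run_eval(cases: list[dict], app: Callable[[str], str],
             grade: Callable[[str, str], bool]) -> float:
    """Run each case through the app, grade the output, return pass rate."""
    passed = sum(grade(app(c["input"]), c["expected"]) for c in cases)
    return passed / len(cases)

# Stub app + naive substring grader, just to make the harness runnable.
app = lambda question: "Paris is the capital of France."
grade = lambda output, expected: expected.lower() in output.lower()

cases = [
    {"input": "Capital of France?", "expected": "Paris"},
    {"input": "Capital of Spain?",  "expected": "Madrid"},
]
rate = run_eval(cases, app, grade)
print(f"pass rate: {rate:.0%}")  # → pass rate: 50%
```

Fail the build when the rate drops below your threshold and you have prompt regression testing; everything the platforms add is refinement of this loop.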
Observability (4)
LangSmith
[paid] Tracing, evals, and prompt management. Pairs naturally with LangChain/LangGraph.
Helicone
[free-tier] Drop-in observability proxy. Logs, costs, and latency for a one-line base-URL change.
PostHog (LLM Observability)
[free-tier] Already use PostHog? It now has LLM-specific traces and cost tracking. One stack.
OpenTelemetry GenAI
[open-source] Vendor-neutral GenAI tracing standard. Worth instrumenting against if you anticipate switching tools.
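Not ready to adopt a platform? The minimum viable version of all of these is a wrapper that records latency and token counts per call. A sketch with a stubbed model (the whitespace token count is a crude proxy; real tools read usage from the API response):

```python
import time

LOG: list[dict] = []  # in-memory stand-in for a real trace sink

def observed(model_fn):
    """Decorator that records latency and rough token counts per call."""
    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        output = model_fn(prompt)
        LOG.append({
            "latency_s": time.perf_counter() - start,
            "prompt_tokens": len(prompt.split()),      # crude proxy
            "completion_tokens": len(output.split()),  # crude proxy
        })
        return output
    return wrapper

@observed
def fake_llm(prompt: str) -> str:
    return "stubbed model response"

fake_llm("why is the sky blue")
print(LOG[0]["prompt_tokens"], LOG[0]["completion_tokens"])  # → 5 3
```

Emitting these records as OpenTelemetry GenAI spans instead of a local list is what makes the data portable across the vendors above.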
Training / fine-tuning (5)
Hugging Face Transformers
[open-source] The de facto Python library for loading and fine-tuning models. The foundation of most pipelines.
PEFT
[open-source] LoRA, QLoRA, IA³: parameter-efficient fine-tuning. Use with Transformers.
TRL
[open-source] SFT, DPO, PPO, KTO: alignment training that actually works. From HF.
Unsloth
[open-source] Roughly 2x faster fine-tuning on consumer GPUs. Drop-in optimization for HF Transformers.
Axolotl
[open-source] YAML-driven fine-tuning. Easier than writing scripts, more flexible than no-code.
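The "parameter-efficient" in PEFT is quantifiable: LoRA replaces the full d×k update of a weight matrix with two rank-r factors, so trainable parameters drop from d·k to r·(d+k). The arithmetic, with an illustrative layer size:

```python
def lora_trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """Full fine-tuning trains d*k params per weight matrix; LoRA trains
    only the low-rank factors B (d×r) and A (r×k), i.e. r*(d+k)."""
    return d * k, r * (d + k)

# An illustrative 4096×4096 attention projection at rank 8:
full, lora = lora_trainable_params(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")  # → 16777216 65536 0.39%
```

That ~0.4% per matrix is why LoRA adapters train on a single consumer GPU and ship as megabyte-scale files.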