Modules

14 independent labs. Each one teaches a single LLM evaluation concept through runnable code and tests that pass without an API key.

#	Module	Tests	Key concept
01	primer-eval	8	First `LLMTestCase` · AnswerRelevancy · Faithfulness
02	ragas-basics	10	RAGAS pipeline · faithfulness · context_precision · recall
03	llm-as-judge	11	G-Eval · DAG Metric · position bias · verbosity bias
04	multi-turn	10	ConversationalTestCase · KnowledgeRetention · 8 turns
05	prompt-regression	11	PromptRegistry · RegressionChecker · z-test
06	hallucination-lab	9	Claim extraction · groundedness · negations
07	redteam-garak	10	42 attack prompts · DAN · many-shot · token manipulation
08	redteam-deepteam	8	OWASP Top 10 LLM 2025 · prompt injection · excessive agency
09	guardrails	11	PII detection · output validation · I/O pipeline
10	agent-testing	9	Tool selection · trajectories · AST-safe eval
11	playwright-streaming	8	SSE streaming · E2E chatbot UI · FastAPI mock
12	observability	8	OTel spans · `@trace` · latency · error tracking
13	drift-monitoring	13	PSI · AlertHistory · trends · alert rules
14	embedding-eval	15	Cosine similarity · centroid shift · semantic regression
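
To give a feel for what a test that needs no API key can look like, here is a minimal sketch of the PSI (Population Stability Index) check named in module 13. It is not the lab's actual code: the function name `psi`, its `bins` and `eps` parameters, and the equal-width binning are assumptions made for illustration. It only shows that a drift metric like this is deterministic and can be asserted on offline.

```python
import math
from typing import Sequence


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between a baseline and a current sample.

    Hypothetical helper: bins are equal-width over the baseline's range,
    and empty bins are floored at `eps` so the logarithm stays defined.
    """
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when every baseline value is identical.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic scores standing in for a metric logged over time.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

assert psi(baseline, baseline) == 0.0   # identical distributions: no drift
assert psi(baseline, shifted) > 0.0     # shifted distribution: positive PSI
```

Because every term of the sum is non-negative, any difference between the two binned distributions pushes the score above zero, which makes the metric easy to cover with plain assertions.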
