Learn AI

AI Concepts Workshop

© 2026 Cloudy Software Ltd

AI Evaluations (Evals)

How to measure whether your AI is actually getting better.

LLM-as-a-Judge Evaluation

We can't manually check every answer an AI gives, so we use a stronger model (the "Judge") to grade a smaller model's answers, comparing each answer against a reference truth.
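A minimal sketch of the grading step, assuming the judge is asked for a one-word PASS/FAIL verdict. The prompt wording, the PASS/FAIL convention and the `call_judge` function are all hypothetical: `call_judge` stands in for whatever code sends a prompt to your judge model and returns its text reply.

```python
import re
from typing import Callable

# Hypothetical grading prompt; the wording is an assumption, not a standard template.
JUDGE_TEMPLATE = """You are grading another model's answer.
Question: {question}
Reference answer: {reference}
Model answer: {answer}

Does the model answer agree with the reference answer?
Reply with exactly one word: PASS or FAIL."""


def build_judge_prompt(question: str, answer: str, reference: str) -> str:
    """Fill the grading template for a single example."""
    return JUDGE_TEMPLATE.format(question=question, reference=reference, answer=answer)


def parse_verdict(judge_reply: str) -> bool:
    """Map the judge's free-text reply to pass/fail; an unclear reply counts as FAIL."""
    match = re.search(r"\b(PASS|FAIL)\b", judge_reply.upper())
    return match is not None and match.group(1) == "PASS"


def judge_answer(question: str, answer: str, reference: str,
                 call_judge: Callable[[str], str]) -> bool:
    """Send one grading prompt to the (hypothetical) judge model and return its verdict."""
    prompt = build_judge_prompt(question, answer, reference)
    return parse_verdict(call_judge(prompt))
```

Treating an unparseable reply as FAIL is a deliberately conservative choice: a judge that rambles should not inflate the score.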

Golden Dataset v1.0

Input Prompt | Model Output | Reference Truth | Judge's Verdict
What is the capital of France? | The capital of France is Paris. | Paris | (not yet graded)
Sum 2 + 2 | It is 5. | 4 | (not yet graded)
Who wrote Hamlet? | William Shakespeare wrote Hamlet. | Shakespeare | (not yet graded)
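A sketch of running a judge over the three rows of Golden Dataset v1.0 and reporting a pass rate. To keep it runnable offline, `stub_judge` is a hypothetical placeholder that only checks whether the reference text appears in the output; it is not an LLM and would misgrade cases such as "14" containing "4". In practice you would swap in a real judge call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    prompt: str
    model_output: str
    reference: str


# The three rows of Golden Dataset v1.0.
GOLDEN_V1: List[Example] = [
    Example("What is the capital of France?", "The capital of France is Paris.", "Paris"),
    Example("Sum 2 + 2", "It is 5.", "4"),
    Example("Who wrote Hamlet?", "William Shakespeare wrote Hamlet.", "Shakespeare"),
]

# A judge takes (prompt, model_output, reference) and returns True for PASS.
Judge = Callable[[str, str, str], bool]


def stub_judge(prompt: str, output: str, reference: str) -> bool:
    """Hypothetical placeholder judge: case-insensitive substring match, NOT an LLM."""
    return reference.lower() in output.lower()


def run_eval(dataset: List[Example], judge: Judge) -> Tuple[List[bool], float]:
    """Grade every example and return the per-row verdicts plus the overall pass rate."""
    verdicts = [judge(ex.prompt, ex.model_output, ex.reference) for ex in dataset]
    pass_rate = sum(verdicts) / len(verdicts) if verdicts else 0.0
    return verdicts, pass_rate


if __name__ == "__main__":
    verdicts, rate = run_eval(GOLDEN_V1, stub_judge)
    for ex, ok in zip(GOLDEN_V1, verdicts):
        print(f"{ex.prompt!r}: {'PASS' if ok else 'FAIL'}")
    print(f"Pass rate: {rate:.0%}")  # prints "Pass rate: 67%"
```

Keeping the dataset fixed and re-running the same loop after each model change gives comparable pass rates, which is what lets you tell whether the model is actually getting better.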