
✅ Evaluation Criteria

πŸ“ Scoring Rubric (1–5 Scale)

Each project output in this lab journal is evaluated on a 1 to 5 scale across key quality dimensions. These dimensions may vary slightly by project type (e.g., writing, code, design, AI output), but the core rubric remains consistent:

| Metric | Score 5 (Excellent) | Score 3 (Acceptable) | Score 1 (Poor) |
|---|---|---|---|
| Clarity | Crystal clear, unambiguous, and immediately understandable. | Understandable but contains minor ambiguity or complex phrasing. | Confusing, incomplete, or requires significant re-reading or interpretation. |
| Accuracy | Fully correct, aligns perfectly with all constraints and expectations. | Partially correct; contains small factual errors or misses minor constraints. | Incorrect, factually wrong, or off-target from the intended goal. |
| Tone | Perfectly matched to the intended audience, context, and purpose. | Generally appropriate but may be uneven, inconsistent, or slightly generic. | Inappropriate, mismatched, or unprofessional. |
| Creativity | Highly original, compelling, and demonstrates thoughtful or novel execution. | Some novelty or unique phrasing, but relies mostly on standard patterns. | Generic, stale, or a predictable regurgitation of common ideas. |
| Structure | Impeccably organized with logical flow and helpful formatting (e.g., headings, lists). | Has a general structure but suffers from minor formatting or flow issues. | Disorganized, hard to follow, or presented as a wall of text or code. |

ℹ️ Additional dimensions (e.g., Efficiency, Robustness, Modularity) may be added for technical or domain-specific projects.
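As an illustration only, one rubric entry could be recorded and sanity-checked with a small helper like the one below. The function name `validate_scores`, the `extra_dimensions` parameter, and the restriction to whole-number scores are assumptions made for this sketch; the journal itself assigns scores by manual review and does not prescribe any tooling.

```python
# Core dimensions taken from the rubric table above.
CORE_DIMENSIONS = ("Clarity", "Accuracy", "Tone", "Creativity", "Structure")


def validate_scores(scores, extra_dimensions=()):
    """Check one rubric entry and return a copy of it.

    `scores` maps dimension name -> score. `extra_dimensions` lists any
    project-specific dimensions (e.g. "Efficiency") added to the core set.
    Both the helper and the integer-only rule are assumptions of this sketch.
    """
    expected = set(CORE_DIMENSIONS) | set(extra_dimensions)
    missing = expected - set(scores)
    unknown = set(scores) - expected
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    for name, value in scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 1 <= value <= 5:
            raise ValueError(f"{name}: expected an integer from 1 to 5, got {value!r}")
    return dict(scores)


entry = validate_scores(
    {"Clarity": 5, "Accuracy": 4, "Tone": 4, "Creativity": 3, "Structure": 5,
     "Efficiency": 4},
    extra_dimensions=("Efficiency",),
)
```

The helper only checks that an entry is complete and in range. It does not combine the scores into a total, since the rubric does not define an aggregation rule.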


✅ Compliance Checks

In addition to the 1–5 scoring, each experiment includes a Compliance section to track adherence to project-specific constraints, such as:

  • Word count or token limits
  • Required or forbidden content (e.g., “Do not add extra details”)
  • Formatting rules (e.g., Markdown only, no images)
  • Style or persona constraints
  • Technical specifications (e.g., compile-ready code, test coverage)

Outputs that violate these constraints may receive deductions or be marked as non-compliant, even if their qualitative scores are high.
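Some of the constraints listed above, such as word limits and required or forbidden content, lend themselves to simple automated checks. The sketch below is a hypothetical helper, not part of the journal's process: `check_compliance`, `ComplianceResult`, and all parameter names are invented for illustration, and it uses whitespace-separated words and case-insensitive substring matching as simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class ComplianceResult:
    """Outcome of a compliance check (hypothetical structure)."""
    compliant: bool
    violations: list


def check_compliance(text, max_words=None, forbidden_phrases=(), required_phrases=()):
    """Check an output against a few project-specific constraints.

    Assumptions of this sketch: words are whitespace-separated tokens,
    and phrase matching is a case-insensitive substring test.
    """
    violations = []

    word_count = len(text.split())
    if max_words is not None and word_count > max_words:
        violations.append(f"word count {word_count} exceeds limit {max_words}")

    lowered = text.lower()
    for phrase in forbidden_phrases:
        if phrase.lower() in lowered:
            violations.append(f"forbidden content present: {phrase!r}")
    for phrase in required_phrases:
        if phrase.lower() not in lowered:
            violations.append(f"required content missing: {phrase!r}")

    return ComplianceResult(compliant=not violations, violations=violations)


result = check_compliance(
    "A short summary of the experiment.",
    max_words=50,
    forbidden_phrases=["as an AI"],
)
```

The result is kept separate from the 1–5 scores, reflecting that an output can score well qualitatively and still be marked non-compliant.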


🧪 Evaluation Notes

  • All scores are assigned through manual review based on the rubric above.
  • Evaluations are subjective but criteria-driven; applying the same rubric to every output helps keep scores consistent across experiments.
  • This framework supports a wide range of creative, technical, and analytical projects.