Evaluation Criteria
Scoring Rubric (1–5 Scale)
Each project output in this lab journal is evaluated on a 1 to 5 scale across key quality dimensions. These dimensions may vary slightly by project type (e.g., writing, code, design, AI output), but the core rubric remains consistent:
| Metric | Score 5 (Excellent) | Score 3 (Acceptable) | Score 1 (Poor) |
|---|---|---|---|
| Clarity | Crystal clear, unambiguous, and immediately understandable. | Understandable but contains minor ambiguity or complex phrasing. | Confusing, incomplete, or requires significant re-reading or interpretation. |
| Accuracy | Fully correct, aligns perfectly with all constraints and expectations. | Partially correct; contains small factual errors or misses minor constraints. | Incorrect, factually wrong, or off-target from the intended goal. |
| Tone | Perfectly matched to the intended audience, context, and purpose. | Generally appropriate but may be uneven, inconsistent, or slightly generic. | Inappropriate, mismatched, or unprofessional. |
| Creativity | Highly original, compelling, and demonstrates thoughtful or novel execution. | Some novelty or unique phrasing, but relies mostly on standard patterns. | Generic, stale, or a predictable regurgitation of common ideas. |
| Structure | Impeccably organized with logical flow and helpful formatting (e.g., headings, lists). | Has a general structure but suffers from minor formatting or flow issues. | Disorganized, hard to follow, or presented as a wall of text or code. |
Note: Additional dimensions (e.g., Efficiency, Robustness, Modularity) may be added for technical or domain-specific projects.
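As an illustration of how rubric scores might be recorded and checked, here is a minimal sketch. The function names (`validate_scores`, `summarize`) are hypothetical, and the unweighted mean is an assumed aggregation: the rubric itself does not say how, or whether, dimension scores are combined.

```python
# Core dimensions taken from the rubric table.
CORE_DIMENSIONS = ("Clarity", "Accuracy", "Tone", "Creativity", "Structure")


def validate_scores(scores: dict[str, int]) -> None:
    """Raise ValueError if a core dimension is missing or a score is not an integer 1-5."""
    missing = [d for d in CORE_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name, value in scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")


def summarize(scores: dict[str, int]) -> float:
    """Return the unweighted mean of all scores (an assumption, not part of the rubric)."""
    validate_scores(scores)
    return round(sum(scores.values()) / len(scores), 2)


example = {"Clarity": 5, "Accuracy": 4, "Tone": 3, "Creativity": 3, "Structure": 5}
print(summarize(example))  # 4.0
```

Extra dimensions such as Efficiency can be added to the dictionary; they are range-checked and included in the mean like the core ones.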
Compliance Checks
In addition to the 1–5 scoring, each experiment includes a Compliance section to track adherence to project-specific constraints, such as:
- Word count or token limits
- Required or forbidden content (e.g., "Do not add extra details")
- Formatting rules (e.g., Markdown only, no images)
- Style or persona constraints
- Technical specifications (e.g., compile-ready code, test coverage)
Outputs that violate these constraints may receive deductions or be marked as non-compliant, even if their qualitative scores are high.
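Some of these constraints can be checked mechanically before the manual review. The sketch below is one possible approach, not the journal's actual tooling: `check_compliance`, `ComplianceResult`, and the three checks it performs (word limit, forbidden phrases, Markdown images) are hypothetical examples chosen to match the list above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ComplianceResult:
    """Collects human-readable violations; an empty list means compliant."""
    violations: list[str] = field(default_factory=list)

    @property
    def compliant(self) -> bool:
        return not self.violations


def check_compliance(
    text: str,
    max_words: int,
    forbidden: list[str],
    allow_images: bool = False,
) -> ComplianceResult:
    """Check a text output against a few illustrative constraints."""
    result = ComplianceResult()

    # Word limit: whitespace-separated tokens, a rough approximation.
    word_count = len(text.split())
    if word_count > max_words:
        result.violations.append(f"word count {word_count} exceeds limit {max_words}")

    # Forbidden content: case-insensitive substring match.
    lowered = text.lower()
    for phrase in forbidden:
        if phrase.lower() in lowered:
            result.violations.append(f"forbidden phrase present: {phrase!r}")

    # Formatting rule: detect Markdown image syntax ![alt](url).
    if not allow_images and re.search(r"!\[[^\]]*\]\([^)]*\)", text):
        result.violations.append("images are not allowed")

    return result


report = check_compliance(
    "A short summary. ![chart](chart.png)",
    max_words=50,
    forbidden=["lorem ipsum"],
)
print(report.compliant)   # False
print(report.violations)  # ['images are not allowed']
```

Keeping violations as a list, rather than a single pass/fail flag, lets the Compliance section record exactly which constraint was broken alongside the qualitative scores.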
Evaluation Notes
- All scores are assigned through manual review based on the rubric above.
- Evaluations are subjective, but anchoring each score to the rubric criteria helps keep them consistent across experiments.
- This framework supports a wide range of creative, technical, and analytical projects.