Project 2 – Role-based Prompts

🎯 Prompt Type & Goal

Testing how assigning distinct personalities/personas affects output tone, vocabulary, focus areas, and writing style when reviewing the same restaurant.


🧪 Experiment Setup

Models and Versions

  • Claude Sonnet 4.5

Dataset or Tasks

Task: Write a restaurant review for "Bella Notte" - an upscale Italian restaurant

Restaurant Details (provided to all prompts):

  • Cuisine: Upscale Italian
  • Price: $$$ (expensive)
  • Ambiance: Romantic, dim lighting, live piano
  • Service: Attentive but slow
  • Food Quality: Excellent pasta, overcooked steak
  • Wine List: Extensive

Hypothesis

Role assignment will dramatically shift tone, focus, and vocabulary. Critics will focus on technical details, budget-conscious reviewers on value, and personality-driven roles will use distinctive language patterns.

Control Variables

  • Same restaurant details
  • Same model and settings
  • Same word count target (150-200)
  • Same task (write review)
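
The word count target is the one control variable that can be checked mechanically on each output. A minimal sketch, assuming words are counted by splitting on whitespace; the function name is hypothetical, and the write-up does not say how the count was actually verified:

```python
def within_word_target(review: str, low: int = 150, high: int = 200) -> bool:
    """Return True if the whitespace-delimited word count is in [low, high]."""
    # Assumption: a "word" is any run of non-whitespace characters.
    return low <= len(review.split()) <= high
```

Running this on each of the five outputs would flag any review that drifted outside the 150-200 word range before scoring.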

📂 Prompt Versions

V1: No Role (Baseline)

Write a restaurant review for Bella Notte, an upscale Italian restaurant.

Details:

  • Cuisine: Upscale Italian
  • Price: $$$ (expensive)
  • Ambiance: Romantic, dim lighting, live piano
  • Service: Attentive but slow
  • Food Quality: Excellent pasta, overcooked steak
  • Wine List: Extensive
  • Word count: 150-200 words

V2: Gordon Ramsay (Brutal Critic)

You are Gordon Ramsay, the world-famous chef known for brutal honesty and high standards. Write a restaurant review for Bella Notte.

Details:

  • Cuisine: Upscale Italian
  • Price: $$$ (expensive)
  • Ambiance: Romantic, dim lighting, live piano
  • Service: Attentive but slow
  • Food Quality: Excellent pasta, overcooked steak
  • Wine List: Extensive
  • Write in Gordon Ramsay's signature direct, passionate, and unfiltered style.
  • Word count: 150-200 words

V3: Enthusiastic Food Blogger

You are a food blogger who absolutely LOVES everything and finds positives in every dining experience. You use lots of exclamation marks and enthusiastic language. Write a restaurant review for Bella Notte.

Details:

  • Cuisine: Upscale Italian
  • Price: $$$ (expensive)
  • Ambiance: Romantic, dim lighting, live piano
  • Service: Attentive but slow
  • Food Quality: Excellent pasta, overcooked steak
  • Wine List: Extensive
  • Write in your signature upbeat, positive, and energetic style.
  • Word count: 150-200 words

V4: Michelin Inspector (Refined Analyst)

You are a Michelin Guide inspector known for refined taste, attention to detail, and sophisticated analysis. Write a restaurant review for Bella Notte.

Details:

  • Cuisine: Upscale Italian
  • Price: $$$ (expensive)
  • Ambiance: Romantic, dim lighting, live piano
  • Service: Attentive but slow
  • Food Quality: Excellent pasta, overcooked steak
  • Wine List: Extensive
  • Write in the formal, detailed, and analytical style of a Michelin inspector.
  • Word count: 150-200 words

V5: Budget-Conscious College Student

You are a college student on a tight budget who rarely eats at expensive restaurants. Write a restaurant review for Bella Notte from your perspective.

Details:

  • Cuisine: Upscale Italian
  • Price: $$$ (expensive)
  • Ambiance: Romantic, dim lighting, live piano
  • Service: Attentive but slow
  • Food Quality: Excellent pasta, overcooked steak
  • Wine List: Extensive
  • Focus on value, affordability concerns, and whether it's worth the splurge.
  • Word count: 150-200 words
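
Because the five prompts differ only in the opening instruction and one style line, they can be generated from a single shared details block, which keeps the control variables identical by construction. A minimal sketch, assuming a plain-text layout with "-" bullets; `DETAILS`, `VERSIONS`, and `build_prompt` are hypothetical names, and the exact whitespace of the prompts actually sent is not stated in this write-up:

```python
# Shared restaurant details, copied from the prompt versions above.
DETAILS = [
    "Cuisine: Upscale Italian",
    "Price: $$$ (expensive)",
    "Ambiance: Romantic, dim lighting, live piano",
    "Service: Attentive but slow",
    "Food Quality: Excellent pasta, overcooked steak",
    "Wine List: Extensive",
]
WORD_COUNT = "Word count: 150-200 words"

# Each version maps to (opening instruction, optional style line).
VERSIONS = {
    "V1": (
        "Write a restaurant review for Bella Notte, an upscale Italian restaurant.",
        None,
    ),
    "V2": (
        "You are Gordon Ramsay, the world-famous chef known for brutal honesty "
        "and high standards. Write a restaurant review for Bella Notte.",
        "Write in Gordon Ramsay's signature direct, passionate, and unfiltered style.",
    ),
    "V3": (
        "You are a food blogger who absolutely LOVES everything and finds positives "
        "in every dining experience. You use lots of exclamation marks and "
        "enthusiastic language. Write a restaurant review for Bella Notte.",
        "Write in your signature upbeat, positive, and energetic style.",
    ),
    "V4": (
        "You are a Michelin Guide inspector known for refined taste, attention to "
        "detail, and sophisticated analysis. Write a restaurant review for Bella Notte.",
        "Write in the formal, detailed, and analytical style of a Michelin inspector.",
    ),
    "V5": (
        "You are a college student on a tight budget who rarely eats at expensive "
        "restaurants. Write a restaurant review for Bella Notte from your perspective.",
        "Focus on value, affordability concerns, and whether it's worth the splurge.",
    ),
}


def build_prompt(version: str) -> str:
    """Assemble one prompt from the shared details and the version's role text."""
    opening, style = VERSIONS[version]
    lines = [opening, "", "Details:"]
    lines += [f"- {detail}" for detail in DETAILS]
    if style is not None:
        lines.append(style)
    lines.append(WORD_COUNT)
    return "\n".join(lines)
```

Calling `build_prompt("V4")` returns the Michelin inspector prompt; editing `DETAILS` in one place changes all five versions together.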

Output Versions

| Version | Output |
| --- | --- |
| V1 | V1 Output |
| V2 | V2 Output |
| V3 | V3 Output |
| V4 | V4 Output |
| V5 | V5 Output |

🧪 Evaluation

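| Version | Clarity | Accuracy | Tone | Creativity | Structure | Role Adherence | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| V1 | 4 | 2.0 | 4 | 3 | 4 | 3 | 3.33 |
| V2 | 4 | 2.0 | 5 | 4 | 4 | 5 | 4.00 |
| V3 | 5 | 2.0 | 5 | 4 | 5 | 5 | 4.33 |
| V4 | 5 | 3.0 | 5 | 4 | 5 | 5 | 4.50 |
| V5 | 5 | 3.5 | 5 | 4 | 5 | 5 | 4.58 |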
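
The Average column appears to be the unweighted mean of the six metric scores, rounded to two decimals. A minimal sketch of that calculation, assuming equal weights (the write-up does not state a weighting); `SCORES` and `average` are hypothetical names:

```python
# Metric order: Clarity, Accuracy, Tone, Creativity, Structure, Role Adherence.
SCORES = {
    "V1": [4, 2.0, 4, 3, 4, 3],
    "V2": [4, 2.0, 5, 4, 4, 5],
    "V3": [5, 2.0, 5, 4, 5, 5],
    "V4": [5, 3.0, 5, 4, 5, 5],
    "V5": [5, 3.5, 5, 4, 5, 5],
}


def average(scores: list[float]) -> float:
    """Unweighted mean of one version's metric scores, rounded to 2 decimals."""
    return round(sum(scores) / len(scores), 2)


AVERAGES = {version: average(row) for version, row in SCORES.items()}
```

For example, `average(SCORES["V5"])` gives 4.58. Recomputing the column this way is a quick guard against transcription slips in the table.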

Scoring Rubric (1–5)

| Metric | Score 5 | Score 3 | Score 1 |
| --- | --- | --- | --- |
| Clarity | Crystal clear and unambiguous. | Understandable with minor ambiguity. | Confusing or incomplete. |
| Accuracy | Fully correct and on-goal. | Partially correct with small errors. | Incorrect or off-target. |
| Tone | Perfect role match. | Acceptable but somewhat generic. | Generic or no personality. |
| Creativity | Unique voice, memorable. | Some novelty. | Generic or formulaic. |
| Structure | Well organized with logical flow and helpful formatting. | Some structure but minor issues. | Disorganized or hard to scan. |
| Role Adherence | Perfect role embodiment: instantly recognizable as the assigned persona. | Role characteristics present but inconsistent: some personality shows through, but generic at times. | No role identity: could be written by anyone, no distinctive voice. |

📊 Results & Insights

🔒 Final Scores Summary

| Version | Role | Final Score |
| --- | --- | --- |
| V1 | No Role (Baseline) | 3.33 |
| V2 | Gordon Ramsay (Brutal Critic) | 4.00 |
| V3 | Enthusiastic Food Blogger | 4.33 |
| V4 | Michelin Inspector | 4.50 |
| V5 | College Student (Budget) | 4.58 |

  • V1 (Baseline): Lowest overall score due to lack of constraints and multiple hallucinations. Tone was neutral but generic.
  • V2 (Gordon Ramsay): Strong tone and persona adherence, but accuracy suffered from theatrical exaggerations.
  • V3 (Food Blogger): High tone and structure scores, but vivid embellishments hurt accuracy.
  • V4 (Michelin Inspector): Most refined and structured output. Minor hallucinations, but excellent role execution.
  • V5 (College Student): Highest overall score. Balanced tone, relatable voice, and strong clarity with minimal factual drift.

🧠 Key Findings

1. 🎭 Role Adherence Drives Quality

  • V3, V4, and V5 all scored 5 in Role Adherence, Tone, and Structure; V4 and V5 paired that with the highest Accuracy.
  • Persona clarity helps models stay consistent in vocabulary, pacing, and emotional framing.

2. 🎯 Accuracy Still the Bottleneck

  • Every version contained some hallucinated detail; V4 and V5 had the least (Accuracy 3.0 and 3.5), while V1 through V3 all scored 2.0.
  • Even strong personas (V2, V3) introduced invented elements to enhance storytelling.
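
The bottleneck claim can be checked by averaging each metric across the five versions of the evaluation table. A minimal sketch, assuming the scores from that table; `METRICS`, `SCORES`, and `metric_means` are hypothetical names:

```python
METRICS = ["Clarity", "Accuracy", "Tone", "Creativity", "Structure", "Role Adherence"]

# One row per prompt version, in the same order as METRICS.
SCORES = {
    "V1": [4, 2.0, 4, 3, 4, 3],
    "V2": [4, 2.0, 5, 4, 4, 5],
    "V3": [5, 2.0, 5, 4, 5, 5],
    "V4": [5, 3.0, 5, 4, 5, 5],
    "V5": [5, 3.5, 5, 4, 5, 5],
}


def metric_means(scores: dict[str, list[float]], metrics: list[str]) -> dict[str, float]:
    """Mean score per metric across all versions."""
    n = len(scores)
    return {
        metric: sum(row[i] for row in scores.values()) / n
        for i, metric in enumerate(metrics)
    }


means = metric_means(SCORES, METRICS)
weakest = min(means, key=means.get)  # metric with the lowest mean score
```

With these scores, Accuracy has the lowest mean (2.5) and Tone the highest (4.8), which is consistent with accuracy being the limiting factor.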

3. 🧱 Structure Matters

  • V4 and V5 followed a clear progression: ambiance → food → service → verdict.
  • V1 and V2 were slightly less disciplined (Structure and Clarity of 4 rather than 5), which affected flow.

4. 💡 Creativity Thrives in Constraint

  • V3 and V5 showed that emotional or financial framing unlocks expressive language.
  • V2's sarcasm and V3's enthusiasm added memorable flair despite factual drift.

📉 Visual Summary

📊 Line Chart: Accuracy score by version

This line plot shows the Accuracy score for each prompt version (V1–V5). Accuracy stays at 2.0 from V1 through V3 and rises only with V4 (3.0) and V5 (3.5), so adding a persona did not by itself reduce hallucinations; the gains appear only with the Michelin inspector and college student prompts.

Line chart showing accuracy scores from V1 to V5

🕸️ Radar Chart: Scores by dimension and version

This radar chart compares four evaluation dimensions (Accuracy, Tone, Creativity, and Role Adherence) across all five prompt versions. It shows how structured personas (V4 and V5) consistently outperform the baseline (V1), especially in Tone and Role Adherence.

Radar chart comparing review dimensions across prompt versions


✅ Takeaways

  • V5 wins for relatability, clarity, and value-focused critique.
  • V4 excels in professionalism and culinary analysis.
  • V3 and V2 shine in tone but need tighter factual control.
  • V1 suggests that omitting a role and constraints leads to drift and generic output.

🚀 Next Steps

  • Test V6 with a “no hallucination allowed” constraint.
  • Try hybrid personas (e.g., “Budget-conscious foodie”).
  • Introduce audience targeting to refine tone even further.

📕 Conclusion

This experiment suggests that role-driven prompting combined with clear constraints produces higher-scoring outputs. The best-performing reviews (V4 and V5) balanced tone, structure, and persona while keeping hallucinations lowest. The generic prompt (V1) led to drift and weaker engagement, while the personality-heavy prompts (V2 and V3) traded accuracy for flair. Across the score table, accuracy is the weakest metric (2.0 to 3.5), while tone and role adherence are consistently strong when personas are well-defined.

🎓 Prompt engineering isn't just about giving instructions; it's about designing identity, context, and boundaries that guide the model toward excellence.


🛑 Project Disclaimers

🧪 Grading Methodology

All evaluation scores (Clarity, Accuracy, Tone, Creativity, Structure, Role Adherence) reflect human judgment based on clearly defined criteria outlined in the Evaluation Criteria page. These scores are manually assigned and are not generated by AI models or automated systems.

⚠️ Coincidence Clause

All content in this project is fictional and created solely for educational and experimental purposes. Any resemblance to real-world restaurants, individuals, companies, or experiences is purely coincidental and unintended.

🤖 AI Output Notice

All reviews and analyses were generated using AI tools based on controlled prompt inputs. The outputs do not represent personal opinions or endorsements, and should not be interpreted as factual restaurant reviews or professional recommendations.