How Structured JSON Inputs Eliminated Hallucinations in a Fine-Tuned 7B Code Model
Products & LaunchesBreakthroughScore: 90

How Structured JSON Inputs Eliminated Hallucinations in a Fine-Tuned 7B Code Model

A developer fine-tuned a 7B code model on consumer hardware to generate Laravel PHP files. Hallucinations persisted until prompts were replaced with structured JSON specs, which eliminated ambiguous gap-filling errors and reduced debugging time dramatically.

GAla Smith & AI Research Desk·2h ago·5 min read·4 views·AI-Generated
Share:
Source: pub.towardsai.netvia towards_ai, medium_fine_tuningCorroborated

What Happened: A Case Study in Structured Input

A developer documented a months-long experiment in fine-tuning a 7B parameter code generation model (Qwen2.5-Coder-7B-Instruct) to produce Laravel PHP files on an Apple M2 Pro with 16GB RAM. Despite eight rounds of training with 308 examples, the model consistently produced a specific class of error: it would invent framework relationships, methods, or patterns that weren't specified in the natural language prompt. These weren't random syntax errors but systematic "gap-filling"—the model applying its pretraining priors to ambiguous instructions.

The breakthrough came from abandoning natural language prompts altogether. Instead, the developer created a structured JSON format called BuildSpec that explicitly defined every artifact attribute: relationship types, method names, foreign keys, boolean flags for framework traits, and exact field lists. This format left no room for interpretation. When the model was fine-tuned on these structured specs (using just 54 examples versus the previous 308), the hallucination-type errors disappeared entirely.

Technical Details: The BuildSpec Approach

The core innovation was treating the specification as data rather than instruction. A BuildSpec for a Laravel model looks like this:

{
  "artifact": "model",
  "class": "Book",
  "namespace": "App\\Models",
  "table": "books",
  "has_factory": true,
  "soft_deletes": true,
  "fillable": ["title", "isbn", "year", "author_id"],
  "casts": {"year": "integer"},
  "relationships": [
    {
      "type": "BelongsTo",
      "model": "Author",
      "method": "author",
      "foreign_key": "author_id"
    }
  ]
}

Key technical components:

  1. Explicit Enumeration: Every possible decision point (relationship type, foreign key name, trait inclusion) is explicitly specified as data.
  2. Validation Compiler: A 530-line Python compiler validates specs before generation, catching invalid Laravel patterns in <1ms.
  3. Reduced Output Space: By constraining the input format, the model's possible output space is dramatically reduced, making the mapping from spec to code more learnable.

The experiment compared two pipelines:

  • Pipeline A: 308 natural language examples, 300+ training iterations
  • Pipeline B: 54 structured JSON examples, 225 training iterations

Both were evaluated on three Laravel applications (26 PHP files total). While both achieved 100% syntax validity, the nature of remaining bugs differed fundamentally.

Pipeline A bugs (5 total): All were "incorrect domain assumptions"—the model inserting patterns from its pretraining that didn't match the developer's intent. Examples included generating non-existent Laravel methods (->withHttpStatus()) or dropping specified relationships. Debugging required understanding the model's mistaken intent and averaged 15-30 minutes per fix.

Pipeline B bugs (3 total): All were "mechanical issues"—visible typos or omissions (like forgetting to import a base class). These were obvious from reading the file and took under 2 minutes to fix.

The structured approach didn't eliminate all errors, but it eliminated the most expensive class: those requiring semantic debugging of the model's internal representation versus the developer's intent.

Retail & Luxury Implications: From Code to Commerce

While this case study focuses on PHP code generation, the underlying principle—using structured data formats to eliminate ambiguity in LLM tasks—has direct parallels in retail and luxury AI applications.

1. Product Description & Catalog Generation

Luxury brands generating product descriptions face similar hallucination risks. A prompt like "write a description for a silk evening gown" leaves countless decisions to the model: which heritage elements to highlight? Which technical fabrics to mention? What tone (exclusive vs. accessible)?

A structured ProductSpec could enforce brand voice consistency:

{
  "product_type": "evening_gown",
  "material_composition": {"silk": 100},
  "heritage_elements": ["hand-stitched hem", "mother-of-pearl buttons"],
  "target_tone": "exclusive_heritage",
  "required_keywords": ["couture", "atelier", "limited edition"],
  "prohibited_phrases": ["affordable luxury", "mass-produced"]
}

This would prevent the model from inventing fabric blends or production methods that don't exist, while ensuring consistent brand messaging across thousands of SKUs.

2. Personalized Client Communication

When generating personalized emails for VIP clients, ambiguity in client profiles leads to generic or inappropriate recommendations. A structured ClientProfile format could ensure precision:

{
  "purchase_history": [
    {"category": "handbags", "brands": ["Hermès", "Chanel"], "avg_price_point": 15000},
    {"category": "ready_to_wear", "styles": ["evening", "business"]}
  ],
  "communication_preferences": {"formality": "high", "length": "detailed"},
  "known_aversions": ["animal prints", "oversized logos"],
  "upcoming_events": [{"type": "gala", "date": "2024-09-15"}]
}

3. Visual Merchandising & Space Planning

Generating store layout recommendations from natural language ("create a welcoming fragrance section") invites misinterpretation. A structured SpaceSpec could define exact constraints:

{
  "section_type": "fragrance",
  "available_sqft": 240,
  "required_fixtures": ["lighting_track", "glass_display_cases"],
  "brand_hierarchy": {"primary": "Dior", "secondary": ["Chanel", "Guerlain"]},
  "traffic_flow": "circular",
  "adjacency_requirements": ["near_entrance", "away_from_direct_sunlight"]
}

Technical Implementation Considerations

For retail teams considering this approach:

  1. Schema Design is Critical: The JSON schema becomes your domain ontology. Invest time in getting it right—it defines what your model can and cannot "know."

  2. Validation Layer Required: Like the compiler in the case study, you need validation to catch specification errors before generation. This is especially important for compliance (e.g., ensuring prohibited claims aren't made).

  3. Data Transformation Pipeline: Existing product data must be transformed into the structured format. This may require initial manual work or a separate extraction model.

  4. Model Selection: The case study used a 7B model on consumer hardware. For complex retail domains, you might need larger models, but the structured approach should improve performance at any scale.

  5. Hybrid Approach: Natural language could still be used for creative tasks, with structured specs for factual precision. The key is knowing when each is appropriate.

The fundamental insight: Ambiguity in input creates space for the model's pretraining biases to manifest as hallucinations. Structure eliminates that space. For luxury brands where precision, consistency, and brand integrity are non-negotiable, this represents a more controllable path to AI adoption than purely prompt-based approaches.

AI Analysis

This case study provides concrete evidence for what many AI practitioners in retail have suspected: fine-tuning alone cannot eliminate certain classes of LLM errors when the input space is ambiguous. The structured JSON approach essentially implements a **type system for LLM inputs**, constraining both what the model receives and what it can produce. For luxury retail, where brand voice consistency, product accuracy, and client personalization precision are paramount, this approach offers a path to automation without sacrificing control. The most immediate application would be in **product catalog generation and maintenance**—where hallucinated product features or inconsistent descriptions could damage brand credibility. A structured `ProductSpec` format could be derived from existing PIM (Product Information Management) systems, creating a natural bridge between legacy data and AI generation. This aligns with our recent coverage of the **prompt vs. RAG vs. fine-tuning decision framework** (March 30). That article emphasized choosing the right tool for the job. This case study introduces a fourth option: **structured specification-driven generation**, which sits between prompt engineering and fine-tuning in complexity but offers unique precision benefits. The use of **Apple's MLX framework** on consumer hardware is particularly relevant given Apple's increased AI focus. Following their March 27 leak about Private Cloud Compute infrastructure and hiring of former Google executive Lilian Rincon, Apple is clearly positioning itself as a serious AI player. The ability to run precise, structured generation locally on Apple Silicon could enable boutique luxury brands to implement AI without cloud dependencies—important for protecting client data and proprietary product information. However, the approach has limitations. It requires upfront investment in schema design and data transformation. It's best suited for **repeatable, structured tasks** rather than creative ones. And as the author notes, this was a self-evaluation—production deployment would require more rigorous testing. But as a proof-of-concept for reducing hallucination in domain-specific generation, it's compelling evidence that sometimes the solution isn't a bigger model, but a better-defined problem.
Enjoyed this article?
Share:

Related Articles

More in Products & Launches

View all