Key Takeaways
- A new paper details FRAGATA, a system enabling semantic search over two decades of technical support tickets at a supercomputing center.
- It uses hybrid retrieval-augmented generation (RAG) to find relevant past incidents despite typos, language differences, or variations in wording. Preliminary qualitative assessments show an improvement over the legacy keyword search.
What Happened
A research team from the Galician Supercomputing Center (CESGA) has published a paper on arXiv detailing the development and deployment of FRAGATA, a semantic search system for their 20-year archive of technical support tickets. The tickets, managed in the Request Tracker (RT) system, represent a vast repository of solved incidents and operational knowledge. The team identified that RT's native keyword search was a significant bottleneck, hindering support staff from efficiently finding relevant past solutions.
FRAGATA was built to solve this by applying modern information retrieval techniques, specifically a hybrid Retrieval-Augmented Generation (RAG) architecture. The system is designed to understand the intent behind a support query, retrieving relevant tickets regardless of the language used, the presence of spelling errors, or the specific phrasing. The architecture is deployed on CESGA's infrastructure, supports incremental updates without downtime, and offloads computationally expensive embedding and indexing tasks to the center's FinisTerrae III supercomputer. Preliminary qualitative assessments indicate the system provides a substantial improvement over the previous search capability.
Technical Details
The core innovation of FRAGATA lies in its application of a hybrid RAG approach to a long-tail, domain-specific corpus of unstructured text. While the paper's full technical details are on arXiv, the abstract indicates a system that likely combines:
- Semantic Embedding & Indexing: Converting 20 years of ticket text (subject lines, descriptions, solutions) into vector embeddings to enable similarity search beyond keywords.
- Hybrid Retrieval: Possibly blending dense vector search (for semantic meaning) with sparse lexical search (for exact term matching) to improve recall and precision.
- RAG Pipeline: Using the retrieved relevant ticket histories as context for a large language model (LLM) to generate concise answers or summaries for support agents.
- Production Architecture: The system is built for operational use, featuring incremental updates to incorporate new tickets and leveraging HPC resources for batch processing of the embedding workload.
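The hybrid retrieval idea in the list above can be sketched in a few lines. This is a minimal illustration, not the FRAGATA implementation: character-trigram cosine similarity stands in for a learned embedding model, a simple token-overlap count stands in for a sparse lexical scorer, and the two rankings are merged with reciprocal rank fusion (RRF), a common fusion method that the source does not confirm FRAGATA uses. All function names and the sample tickets are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character trigram counts; a crude stand-in for a learned embedding."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def lexical_score(query, doc):
    """Number of exact query tokens found in the document (sparse signal)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def rank(scores):
    """Order document ids best-first by score, breaking ties by id."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking contributes 1 / (k + position)."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))

def hybrid_search(query, tickets, top_k=3):
    """Fuse the 'dense' (trigram) and sparse (token) rankings of all tickets."""
    query_vec = char_ngrams(query)
    dense = {tid: cosine(query_vec, char_ngrams(text)) for tid, text in tickets.items()}
    sparse = {tid: lexical_score(query, text) for tid, text in tickets.items()}
    return rrf([rank(dense), rank(sparse)])[:top_k]

# Hypothetical tickets; the query contains a typo ("pasword").
tickets = {
    1: "job fails with out of memory error on compute node",
    2: "cannot login via ssh password rejected",
    3: "quota exceeded on home filesystem",
}
print(hybrid_search("ssh pasword rejected", tickets))
```

The trigram signal tolerates the misspelling, while the token signal rewards exact matches such as "rejected". A production system would replace both stand-ins with a real embedding model and a lexical index such as BM25.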
This work is a concrete example of RAG moving from a conceptual framework to a deployed production system. It tackles the real-world challenges of legacy data silos and noisy, multilingual text.
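The incremental updates mentioned under Production Architecture can be pictured as a diff between what is already indexed and what the ticket system currently holds. The sketch below assumes each ticket is fingerprinted by a hash of its text, so only new or changed tickets are sent for embedding. The source does not describe how FRAGATA detects changes, so this mechanism and the names `fingerprint` and `plan_incremental_update` are assumptions made for illustration.

```python
import hashlib

def fingerprint(text):
    """Stable content hash used to detect changed ticket text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_incremental_update(indexed, current):
    """Decide which tickets to (re)embed and which to drop from the index.

    indexed: {ticket_id: fingerprint} for tickets already in the index
    current: {ticket_id: text} as exported from the ticket system
    """
    to_embed = sorted(
        tid for tid, text in current.items()
        if indexed.get(tid) != fingerprint(text)  # new or modified ticket
    )
    to_delete = sorted(tid for tid in indexed if tid not in current)
    return to_embed, to_delete

# Hypothetical state: ticket 2 was edited, ticket 3 was removed, ticket 4 is new.
indexed = {1: fingerprint("disk full"), 2: fingerprint("vpn down"), 3: fingerprint("old")}
current = {1: "disk full", 2: "vpn down again", 4: "license server unreachable"}
print(plan_incremental_update(indexed, current))
```

Only the `to_embed` batch would need to go to the expensive embedding step, which is the kind of workload the article says is offloaded to HPC resources.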
Retail & Luxury Implications
While FRAGATA is built for high-performance computing support, its underlying architecture is directly applicable to a critical, yet often overlooked, function in retail and luxury: customer service and internal knowledge management.

Every major brand and retailer sits on decades of unstructured data that mirrors the HPC ticket corpus:
- Customer Service Tickets: Historical records from CRM systems like Salesforce Service Cloud or Zendesk, containing queries about product defects, sizing, care instructions, and policy clarifications.
- Internal IT & Operations Logs: Tickets for POS system failures, warehouse management issues, or e-commerce platform bugs.
- Employee Knowledge Bases: Manually curated FAQs, training documents, and process guides that become outdated and difficult to search.
A FRAGATA-like system could transform these archives from passive records into active intelligence. A customer service agent facing a novel complaint about a sustainable material's care could quickly surface related past cases and resolutions. A retail operations manager troubleshooting a new inventory sync error could find similar incidents from five years prior. Searching semantically, by problem intent rather than exact keywords, could reduce mean time to resolution (MTTR) and limit the knowledge loss that comes with staff turnover.
The technical requirements—vector databases, embedding models, and orchestration frameworks—are now commodity components. The primary challenge for retail would be the data engineering effort to unify and clean historical data from disparate legacy systems, a task analogous to what the CESGA team undertook with their RT history.
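To make the unification step concrete, here is a minimal sketch of mapping records from two legacy systems into one common shape before embedding. The two input layouts (`id`/`subject`/`body` and `ref`/`summary`/`notes`) are hypothetical and do not correspond to the export format of any specific product. The cleaning rules (strip HTML tags, decode entities, collapse whitespace) are assumptions about what such data typically needs.

```python
import html
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class TicketRecord:
    """Common schema that every source system is normalized into."""
    source: str
    ticket_id: str
    text: str

def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    unescaped = html.unescape(no_tags)
    return re.sub(r"\s+", " ", unescaped).strip()

def from_helpdesk(row: dict) -> TicketRecord:
    """Hypothetical customer-service export: id / subject / body."""
    return TicketRecord("helpdesk", str(row["id"]),
                        clean_text(f'{row["subject"]}\n{row["body"]}'))

def from_ops_log(row: dict) -> TicketRecord:
    """Hypothetical operations log export: ref / summary / notes."""
    return TicketRecord("ops", str(row["ref"]),
                        clean_text(f'{row["summary"]}\n{row["notes"]}'))

records = [
    from_helpdesk({"id": 17, "subject": "Care label", "body": "<p>Wash &amp; dry</p>\n cold"}),
    from_ops_log({"ref": "POS-9", "summary": "Sync  error", "notes": "retry\tfailed"}),
]
for record in records:
    print(record)
```

Once every source produces the same record type, the downstream embedding and indexing code does not need to know which legacy system a ticket came from.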









