FRAGATA: A Hybrid RAG System for Semantic Search Over 20 Years of HPC

A new paper details FRAGATA, a system enabling semantic search over two decades of technical support tickets at a supercomputing center. It uses hybrid retrieval-augmented generation (RAG) to find relevant past incidents despite typos, language differences, or varied wording, and preliminary assessments show a qualitative improvement over the legacy search.

Ala SMITH & AI Research Desk · Apr 16, 2026 · AI-Generated

Source: arxiv.org, via arxiv_ir (corroborated)

TL;DR

Researchers built a semantic search system for a supercomputing center's 20-year ticket history, using hybrid RAG to overcome keyword search limitations.

What Happened

A research team from the Galician Supercomputing Center (CESGA) has published a paper on arXiv detailing the development and deployment of FRAGATA, a semantic search system for their 20-year archive of technical support tickets. The tickets, managed in the Request Tracker (RT) system, represent a vast repository of solved incidents and operational knowledge. The team identified that RT's native keyword search was a significant bottleneck, hindering support staff from efficiently finding relevant past solutions.

FRAGATA was built to solve this by applying modern information retrieval techniques, specifically a hybrid Retrieval-Augmented Generation (RAG) architecture. The system is designed to understand the intent behind a support query, retrieving relevant tickets regardless of the language used, the presence of spelling errors, or the specific phrasing. The architecture is deployed on CESGA's infrastructure, supports incremental updates without downtime, and offloads computationally expensive embedding and indexing tasks to the center's FinisTerrae III supercomputer. Preliminary qualitative assessments indicate the system provides a substantial improvement over the previous search capability.
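
The abstract does not spell out how the incremental updates are implemented. As a minimal sketch of one common approach, assuming each ticket carries a last-modified timestamp, an index can keep a watermark plus a content hash per ticket and re-embed only what changed. The names `Ticket`, `IncrementalIndex`, and `embed_fn` are hypothetical and do not come from the paper.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Ticket:
    ticket_id: int
    text: str
    last_updated: float  # Unix timestamp (assumed field, not from the paper)


@dataclass
class IncrementalIndex:
    # embed_fn is a placeholder for whatever embedding model is used.
    embed_fn: Callable[[str], List[float]]
    vectors: Dict[int, List[float]] = field(default_factory=dict)
    hashes: Dict[int, str] = field(default_factory=dict)
    watermark: float = 0.0  # newest timestamp already processed

    def update(self, tickets: Iterable[Ticket]) -> int:
        """Embed only new or changed tickets; return how many were embedded."""
        changed = 0
        newest = self.watermark
        for t in tickets:
            # Skip anything at or before the last processed timestamp.
            if t.last_updated <= self.watermark:
                continue
            digest = hashlib.sha256(t.text.encode("utf-8")).hexdigest()
            # Re-embed only if the text actually differs from what is stored.
            if self.hashes.get(t.ticket_id) != digest:
                self.vectors[t.ticket_id] = self.embed_fn(t.text)
                self.hashes[t.ticket_id] = digest
                changed += 1
            newest = max(newest, t.last_updated)
        self.watermark = newest
        return changed
```

Because existing vectors stay in place while new ones are added, queries can keep being served during an update, which is one plausible way to read "without downtime".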

Technical Details

The core innovation of FRAGATA lies in its application of a hybrid RAG approach to a long-tail, domain-specific corpus of unstructured text. While the paper's full technical details are on arXiv, the abstract indicates a system that likely combines:

Semantic Embedding & Indexing: Converting 20 years of ticket text (subject lines, descriptions, solutions) into vector embeddings to enable similarity search beyond keywords.
Hybrid Retrieval: Possibly blending dense vector search (for semantic meaning) with sparse lexical search (for exact term matching) to improve recall and precision.
RAG Pipeline: Using the retrieved relevant ticket histories as context for a large language model (LLM) to generate concise answers or summaries for support agents.
Production Architecture: The system is built for operational use, featuring incremental updates to incorporate new tickets and leveraging HPC resources for batch processing of the embedding workload.
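
The hybrid retrieval idea in the list above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: it fuses a dense ranking (cosine similarity over precomputed vectors) with a sparse ranking (a crude term-count score standing in for something like BM25) using reciprocal rank fusion. All function names, and the choice of fusion method, are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lexical_score(query: str, doc: str) -> float:
    """Occurrences of query terms in the document (a stand-in for BM25)."""
    counts = Counter(doc.lower().split())
    return float(sum(counts[t] for t in set(query.lower().split())))


def rank(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by descending score, ties broken by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def rrf(rankings: Sequence[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over all rankings."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + pos)
    return fused


def hybrid_search(
    query: str,
    query_vec: Sequence[float],
    docs: Dict[str, str],
    doc_vecs: Dict[str, Sequence[float]],
    top_n: int = 3,
) -> List[str]:
    """Fuse dense and sparse rankings and return the top document ids."""
    dense = rank({d: cosine(query_vec, v) for d, v in doc_vecs.items()})
    lexical = {d: lexical_score(query, text) for d, text in docs.items()}
    # Documents with no lexical match are left out of the sparse ranking.
    sparse = rank({d: s for d, s in lexical.items() if s > 0})
    return rank(rrf([dense, sparse]))[:top_n]
```

A ticket that shares exact terms with the query is boosted by both rankings, while a ticket that describes the same problem in different words can still surface through the dense ranking alone.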

This work is a concrete example of moving RAG from a conceptual framework to a deployed, mission-critical application, a trend noted in the broader AI community. It tackles real-world challenges of legacy data silos and noisy, multilingual text.

Retail & Luxury Implications

While FRAGATA is built for high-performance computing support, its underlying architecture is directly applicable to a critical, yet often overlooked, function in retail and luxury: customer service and internal knowledge management.

Figure 1: Ticket processing pipeline: from SQL extraction of the RT history to the generation of embeddings for semantic

Every major brand and retailer sits on decades of unstructured data that mirrors the HPC ticket corpus:

Customer Service Tickets: Historical records from CRM systems like Salesforce Service Cloud or Zendesk, containing queries about product defects, sizing, care instructions, and policy clarifications.
Internal IT & Operations Logs: Tickets for POS system failures, warehouse management issues, or e-commerce platform bugs.
Employee Knowledge Bases: Manually curated FAQs, training documents, and process guides that become outdated and difficult to search.

A FRAGATA-like system could transform these archives from passive records into active intelligence. A customer service agent facing a novel complaint about a sustainable material's care could quickly surface related past cases and resolutions. A retail operations manager troubleshooting a new inventory sync error could find similar incidents from five years prior. Searching semantically, by problem intent rather than exact keywords, could reduce mean time to resolution (MTTR) and limit knowledge loss from staff turnover.

The technical requirements—vector databases, embedding models, and orchestration frameworks—are now commodity components. The primary challenge for retail would be the data engineering effort to unify and clean historical data from disparate legacy systems, a task analogous to what the CESGA team undertook with their RT history.
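
To make the data engineering step concrete, here is a minimal sketch of unifying exports from two legacy systems into one schema before embedding. The source names, field names in `FIELD_MAPS`, and the `UnifiedTicket` type are all hypothetical; real CRM or helpdesk exports will differ.

```python
import html
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List

_TAG = re.compile(r"<[^>]+>")
_WS = re.compile(r"\s+")


@dataclass(frozen=True)
class UnifiedTicket:
    source: str
    ticket_id: str
    subject: str
    body: str


def clean(text: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    no_tags = _TAG.sub(" ", text or "")
    return _WS.sub(" ", html.unescape(no_tags)).strip()


# Hypothetical field names for two legacy exports.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "crm": {"ticket_id": "CaseNumber", "subject": "Subject", "body": "Description"},
    "helpdesk": {"ticket_id": "id", "subject": "title", "body": "content"},
}


def unify(source: str, rows: Iterable[dict]) -> List[UnifiedTicket]:
    """Map raw rows to UnifiedTicket, dropping exact subject/body duplicates."""
    mapping = FIELD_MAPS[source]
    seen = set()
    out: List[UnifiedTicket] = []
    for row in rows:
        ticket = UnifiedTicket(
            source=source,
            ticket_id=str(row[mapping["ticket_id"]]),
            subject=clean(row.get(mapping["subject"], "")),
            body=clean(row.get(mapping["body"], "")),
        )
        key = (ticket.subject, ticket.body)
        if key in seen:
            continue
        seen.add(key)
        out.append(ticket)
    return out
```

Keeping the `source` field on every record makes it possible to trace a search hit back to the system it came from.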

Source: gentic.news · Apr 16, 2026 · Ala SMITH

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

For AI practitioners in retail, FRAGATA is a compelling case study in **applied RAG for operational efficiency**, not just customer-facing chatbots. It underscores that some of the highest-ROI AI projects may not be flashy consumer applications, but internal tools that make expert knowledge accessible without friction. This aligns with a broader industry shift towards using AI to augment employee capabilities, a theme discussed in our recent coverage of Ethan Mollick's perspectives on AI as a tool for continual learning.

The paper's focus on a **hybrid retrieval** approach is particularly relevant. Luxury and retail queries often mix specific product names (SKUs, collection names) with subjective descriptions ("drape," "sheen," "fit"). A hybrid system that can match exact product references while also understanding the semantic context of a customer's issue would be far more robust than a purely semantic search.

However, the maturity gap between a research deployment in a controlled HPC environment and a global retail operation is significant. Key considerations for production include data privacy (especially for tickets containing PII), multilingual support at a global scale, and integrating the semantic search layer seamlessly into existing agent desktops (e.g., Salesforce). The recent publication of a framework for moving RAG systems from proof-of-concept to production, noted in our Knowledge Graph on April 6th, provides an essential checklist for teams considering this path. FRAGATA demonstrates the value proposition; the production frameworks show how to build it reliably.
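
The data privacy consideration can be made concrete with a small sketch: redacting obvious PII from ticket text before it is embedded or indexed. The two regular expressions below are illustrative assumptions, not a complete PII solution and not something described in the paper; the phone pattern in particular is crude and will also match other long digit runs such as ISO dates.

```python
import re

# Simplified patterns for illustration only; real PII detection needs more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Running redaction before embedding means the vectors and any retrieved context never contain the raw values, which is simpler than trying to filter them out at query time.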
