
Connecting siloed data to unlock new value for a major construction TechCo

Hoppa automatically classified free-text invoice items to standardised product codes, unlocking new revenue streams for the customer by sidestepping 250,000 hours of manual data cleansing each year.

Sam Rees
1 Jun 2025 · 4 min read

The explosion of digital tooling has changed how we work forever. Many tools solve specific tasks within workflows but are rarely designed to connect seamlessly into broader systems. Scheduling tools often do not communicate with cost management systems, BIM modelling software rarely links directly to materials catalogues, and surveying apps frequently operate in isolation.

Having an API doesn’t mean applications are ready to be integrated. Often, the underlying data itself cannot be joined or reconciled. Because information is entered inconsistently – with free text, abbreviations, and varied terminology – it remains incompatible with other datasets. Even with technical integration, data stays fragmented, preventing organisations from generating connected insights.

The challenge

A major construction technology company offers an invoicing application widely used across the UK construction sector. Users type invoice line items, such as the products or materials the invoice refers to, into free-text fields. Separately, the company holds extensive structured pricing and carbon factor datasets that could unlock new insights for the entire supply chain. The challenge was to join invoice entries to the structured reference datasets.

With 4 million rows of invoicing data generated in the past year alone, and 600,000 new rows created each month, this was not a task that could realistically be tackled by hand. Without a solution, the opportunity to connect invoicing data to pricing and carbon insights – and to generate value from it – would remain unrealised.
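To put those volumes in perspective, the short calculation below annualises the monthly figure and works out how many minutes per row the 250,000-hour manual estimate would imply. It is only an illustrative sketch: the article does not state which row volume the 250,000 hours was based on, so both volumes are shown, and the function names are hypothetical.

```python
# Figures quoted in this case study.
ROWS_LAST_YEAR = 4_000_000
NEW_ROWS_PER_MONTH = 600_000
MANUAL_HOURS_PER_YEAR = 250_000


def annualised_rows(rows_per_month: int) -> int:
    """Project a monthly row count over twelve months."""
    return rows_per_month * 12


def implied_minutes_per_row(hours: float, rows: int) -> float:
    """Minutes of manual effort per row if `hours` covered `rows` rows."""
    return hours * 60 / rows


if __name__ == "__main__":
    projected = annualised_rows(NEW_ROWS_PER_MONTH)
    print(f"Projected rows per year at the current rate: {projected:,}")
    # Assumption: which volume underlies the hours estimate is not stated.
    for label, rows in [("last year's rows", ROWS_LAST_YEAR),
                        ("projected rows", projected)]:
        minutes = implied_minutes_per_row(MANUAL_HOURS_PER_YEAR, rows)
        print(f"Implied effort using {label}: {minutes:.2f} min per row")
```

Under either assumption the implied effort is only a few minutes per row, which suggests the obstacle is the sheer number of rows rather than the difficulty of any single entry.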

Hoppa's solution

Following a successful pilot, the major construction TechCo commissioned Hoppa to solve this challenge using our proprietary HARDR algorithm (Hierarchically Aware Row-To-Description Retriever). HARDR specialises in taking unstructured inputs, such as invoice line items, and classifying them against a hierarchical taxonomy of the customer's choosing.

For this client, this meant mapping invoice line items to product and material categories defined by Uniclass and ETIM. By normalising the plain language descriptions to a structured taxonomy, the invoicing data could then be connected to the company's pricing and carbon factor datasets. This enabled entirely new value propositions, such as serving real-time price and carbon information to frontline teams directly within the invoicing workflow.
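The general pattern described above (classify a free-text line against a hierarchical taxonomy, then use the resulting code as a join key) can be sketched in a few lines. This is not HARDR, which is proprietary; it is a deliberately naive token-overlap stand-in. The taxonomy codes, labels, prices, and carbon factors are all made-up placeholders rather than real Uniclass or ETIM entries, and the function names are hypothetical.

```python
import re

# Hypothetical two-level taxonomy: code -> (parent code, label).
# Placeholder codes only; these are NOT real Uniclass or ETIM entries.
TAXONOMY = {
    "PR-10": (None, "structural products"),
    "PR-10-01": ("PR-10", "concrete blocks"),
    "PR-10-02": ("PR-10", "steel reinforcement bars"),
    "PR-20": (None, "covering and finishing products"),
    "PR-20-01": ("PR-20", "plasterboard sheets"),
}

# Hypothetical structured reference data keyed by the same codes.
# Values are illustrative only.
REFERENCE = {
    "PR-10-01": {"unit_price_gbp": 1.0, "kg_co2e_per_unit": 2.0},
    "PR-10-02": {"unit_price_gbp": 5.0, "kg_co2e_per_unit": 8.0},
    "PR-20-01": {"unit_price_gbp": 7.0, "kg_co2e_per_unit": 3.0},
}

# Leaf codes are those that no other entry names as its parent.
PARENTS = {parent for parent, _ in TAXONOMY.values() if parent}
LEAVES = [code for code in TAXONOMY if code not in PARENTS]


def tokens(text):
    """Lower-case alphabetic tokens; digits and punctuation are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def path_tokens(code):
    """Tokens from a leaf's label and every ancestor label above it."""
    found = set()
    while code is not None:
        parent, label = TAXONOMY[code]
        found |= tokens(label)
        code = parent
    return found


def classify(description):
    """Return the leaf code sharing the most tokens, or None if none match."""
    desc = tokens(description)
    best_code, best_score = None, 0
    for code in LEAVES:
        score = len(desc & path_tokens(code))
        if score > best_score:
            best_code, best_score = code, score
    return best_code


def enrich(description):
    """Classify a line item and join it to the reference data by code."""
    code = classify(description)
    return {"description": description, "code": code,
            "reference": REFERENCE.get(code)}


if __name__ == "__main__":
    for line in ["100mm concrete blocks x 40", "12.5mm plasterboard",
                 "Delivery charge"]:
        print(enrich(line))
```

A production classifier would need to cope with abbreviations, misspellings, and varied terminology, which simple token overlap cannot. The point of the sketch is only that once a line item carries a standard code, joining it to pricing or carbon data becomes a plain key lookup.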

Deployment and impact

Recognising the strategic importance of HARDR's capabilities, the major construction TechCo opted to license the algorithm and deploy Hoppa's technology directly onto their own infrastructure. This wasn't simply a technology transfer – Hoppa embedded our code into the very fabric of how they work, stitching our algorithm directly into their backend systems and operational processes.

The partnership went beyond pure technology deployment. Working with a product-centric organisation, Hoppa shared our deep data expertise and know-how, helping the client understand not just how to implement the solution, but how to maximise its value across their business. This knowledge transfer ensured HARDR could operate seamlessly within their digital ecosystem, processing data at scale and delivering structured, actionable outputs in real time.

Proven results

Customer subject matter experts conducted a comprehensive analysis of HARDR's output, assessing classification accuracy across a representative sample. The results demonstrated exceptional performance: 80% of classifications were assessed as perfect matches, 11% as good matches, and only 9% as low confidence matches. Importantly, the small percentage of imperfect matches was predominantly attributed to two factors: underlying data quality issues (not all data entered into the invoicing platform was realistically classifiable) and the inherent limitations of taxonomies such as Uniclass, which are not always able to capture the breadth and nuance of construction terminology.

To validate HARDR's effectiveness, a control study was conducted using two human subject matter expert classifiers. The human experts reached consensus on only 39% of invoice entries, owing to gaps in their own knowledge and differences in interpretation. This stark contrast highlighted Hoppa's strength in this type of classification activity, demonstrating that our algorithm not only processes data at unprecedented scale and pace, but also delivers more consistent and accurate results than human experts.
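For readers who want to run a similar control study on their own data, a consensus figure like the 39% above can be computed as the share of entries on which two classifiers assign the same code. The sketch below assumes that simple definition; the article does not specify how consensus was measured in the study, and the labels shown are made-up placeholders.

```python
def agreement_rate(labels_a, labels_b):
    """Share of items on which two classifiers assigned the same code."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both classifiers must label the same items")
    if not labels_a:
        raise ValueError("At least one labelled item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


if __name__ == "__main__":
    # Hypothetical codes assigned by two experts to the same five entries.
    expert_a = ["PR-10-01", "PR-10-02", "PR-20-01", "PR-10-01", "PR-20-01"]
    expert_b = ["PR-10-01", "PR-20-01", "PR-20-01", "PR-10-02", "PR-10-01"]
    print(f"Agreement: {agreement_rate(expert_a, expert_b):.0%}")
```

Raw agreement measures how consistent two classifiers are with each other, not whether either one is correct, so it is best read alongside an expert review of accuracy such as the one described in the previous paragraph.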

Beyond classification

In addition to mapping descriptions to structured categories, Hoppa's solution also extracted other key information, such as manufacturer names, enhancing dataset richness and utility. Our adaptable algorithms can be applied across a wide range of use cases, joining systems that were never explicitly designed to work together. This breaks down data silos and unlocks new opportunities to improve efficiency, insight, and value across an organisation's digital ecosystem.

Looking ahead

If you are facing similar challenges with unstructured data trapped within operational tools – whether invoices, asset records, site diaries, or material requests – Hoppa can help you turn fragmented inputs into structured, connected knowledge. Our team specialises in bridging data gaps to drive smarter decisions and better outcomes, combining our licensed technology with deep data expertise to empower your teams to achieve more with the data you already hold.
