
Connecting siloed data to unlock new value for a major construction TechCo

Hoppa auto-classified free-text invoice items to standardised product codes, unlocking new revenue streams for the customer by side-stepping 250,000 hours of manual data cleansing each year.

Sam Rees
1 Jun 2025 · 5 min read

Major construction technology companies sit on vast reservoirs of data. But data entered inconsistently - with free text, abbreviations, and varied terminology - remains incompatible with other datasets. Even with technical integration in place, information stays fragmented, preventing organisations from generating the connected insights their products could deliver.

A major construction TechCo faced exactly this problem. Their invoicing application is widely used across the UK construction sector, with users typing invoice line items into free-text fields. Separately, the company held extensive structured pricing and carbon factor datasets that could unlock entirely new value for the supply chain. The challenge: how to bridge the two.

The challenge

The TechCo's invoicing platform held a backlog of 4 million rows of unstructured data, with 600,000 new rows added each month. Each row contained free-text descriptions of products and materials - inconsistent, abbreviated, and unstandardised.

This created two interconnected problems. Without a way to map historic invoice entries to standardised product codes, years of accumulated pricing data could not be transformed into a usable analytics database - the commercial opportunity was locked away in plain sight. And as new invoices were created, there was no mechanism to classify entries on the fly and feed pricing or carbon insights back to users within the workflow itself.

Solving both required the same underlying capability: a classification engine that could handle millions of existing rows at scale, and operate fast enough to enrich new entries in real time. Three barriers needed to be overcome:

  • Inconsistent inputs. Free-text entry meant the same product could be described dozens of different ways, making automated classification impossible through conventional means.
  • Incompatible datasets. Without a common taxonomy, invoicing data could not be joined to structured pricing or carbon factor datasets, leaving significant commercial opportunity unrealised.
  • Overwhelming scale. With millions of rows of existing data and hundreds of thousands added each month, manual classification was never a viable option.

Hoppa's solution

Following an evaluation process that compared the performance of several providers, the TechCo commissioned Hoppa to solve this challenge using our proprietary HARDR algorithm (Hierarchically Aware Row-To-Description Retriever). HARDR specialises in taking unstructured inputs, such as invoice line items, and classifying them against a hierarchical taxonomy of the customer's choosing.
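HARDR is proprietary and its internals are not described here, so the following is only a toy sketch of what "classifying against a hierarchical taxonomy" can look like. It swaps in a deliberately simple technique (top-down descent by token overlap) as a stand-in. The taxonomy codes, labels, and the `classify` helper are all hypothetical and are not real Uniclass or ETIM entries.

```python
import re

# Toy taxonomy: hypothetical codes and labels, NOT real Uniclass or ETIM entries.
TAXONOMY = {
    "code": "ROOT", "label": "", "children": [
        {"code": "T", "label": "timber boards sheets", "children": [
            {"code": "T-1", "label": "plywood ply sheet", "children": []},
            {"code": "T-2", "label": "softwood timber joist", "children": []},
        ]},
        {"code": "S", "label": "steel metal reinforcement", "children": [
            {"code": "S-1", "label": "rebar reinforcement bar", "children": []},
            {"code": "S-2", "label": "steel mesh fabric", "children": []},
        ]},
    ],
}


def tokens(text):
    """Lower-case alphabetic tokens, so '18mm' contributes only 'mm'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def subtree_tokens(node):
    """All label tokens found in a node and its descendants."""
    found = tokens(node["label"])
    for child in node["children"]:
        found |= subtree_tokens(child)
    return found


def classify(description, node=TAXONOMY):
    """Walk down the tree, choosing at each level the child whose subtree
    shares the most tokens with the description. Returns the path of codes,
    or an empty list when nothing matches at the top level."""
    words = tokens(description)
    path = []
    while node["children"]:
        scored = [(len(words & subtree_tokens(c)), c) for c in node["children"]]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score == 0:
            break  # no evidence for any child: stop at the current level
        path.append(best["code"])
        node = best
    return path


print(classify("18mm ply sheet 2440x1220"))
print(classify("rebar 12mm"))
print(classify("delivery charge"))
```

Returning the whole path rather than only the leaf code means a partial match still yields a usable parent category. A production classifier would need far more than token overlap to cope with the abbreviations and varied terminology described above.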

For this client, this meant mapping invoice line items to product and material categories defined by industry-standard unified classification systems: Uniclass and ETIM. By normalising the plain-language descriptions to a structured taxonomy, the invoicing data could then be connected to the company's pricing and carbon factor datasets. This enabled entirely new value propositions, such as serving real-time price and carbon information to frontline teams directly within the invoicing workflow.
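To illustrate why a shared code matters, here is a minimal sketch of the join that becomes possible once each line item carries a standardised product code. The field names, codes, prices, and carbon factors are invented for illustration and do not come from the customer's datasets.

```python
# Hypothetical data: codes, prices and carbon factors are invented examples.
classified_lines = [
    {"description": "25mm ply sheet x10", "product_code": "CODE-A", "quantity": 10},
    {"description": "plywd 25 mm", "product_code": "CODE-A", "quantity": 4},
    {"description": "rebar 12mm", "product_code": "CODE-B", "quantity": 50},
    {"description": "misc sundries", "product_code": None, "quantity": 1},
]

# Structured reference data keyed by the same (hypothetical) product codes.
reference_data = {
    "CODE-A": {"unit_price": 32.50, "kg_co2e_per_unit": 11.0},
    "CODE-B": {"unit_price": 6.20, "kg_co2e_per_unit": 9.5},
}


def enrich(line, reference):
    """Attach benchmark cost and carbon estimates to one classified line."""
    ref = reference.get(line["product_code"])
    if ref is None:
        # Unclassified or unknown code: leave the enrichment fields empty.
        return {**line, "benchmark_cost": None, "kg_co2e": None}
    return {
        **line,
        "benchmark_cost": round(ref["unit_price"] * line["quantity"], 2),
        "kg_co2e": round(ref["kg_co2e_per_unit"] * line["quantity"], 2),
    }


enriched = [enrich(line, reference_data) for line in classified_lines]
for row in enriched:
    print(row["description"], row["benchmark_cost"], row["kg_co2e"])
```

The two differently worded plywood lines resolve to the same reference record, which is exactly what free-text descriptions alone could not achieve.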

In addition to mapping descriptions to structured categories, Hoppa's solution also extracted other key information, such as manufacturer names, enhancing dataset richness and utility.

Deployment and impact

Recognising the strategic importance of HARDR's capabilities, the major construction TechCo opted to license the algorithm and deploy Hoppa's technology directly onto their own infrastructure. This wasn't simply a technology transfer: Hoppa embedded itself into the way our customer needed to work, stitching our capability directly into their backend systems and operational processes.

The partnership went beyond pure technology deployment. Working with a product-centric organisation, Hoppa shared our deep data expertise and know-how, helping the client understand not just how to implement the solution, but how to maximise its value across their business. This knowledge transfer ensured HARDR could operate seamlessly within their digital ecosystem, processing data at scale and delivering structured, accurate outputs in real time.

Hoppa by numbers

Customer subject matter experts assessed classification accuracy across a representative sample of HARDR's output. In parallel, a blind control study was run with two domain experts independently classifying the same sample.

  • 91% of classifications usable without further intervention. 80% were assessed as perfect matches and 11% as good matches. The remaining 9% of low-confidence matches were predominantly attributable to underlying data quality issues and the inherent limitations of taxonomies like Uniclass, which do not always capture the full breadth of construction terminology.
  • 2× greater coverage than domain experts working independently. The two human experts reached consensus on only 39% of invoice entries, due to knowledge gaps and differences in interpretation. Hoppa not only processed data at a scale no human team could match, but delivered greater consistency and accuracy than domain experts working manually.
  • 250,000 hours of manual effort avoided. Scaled across the full 4-million-row data backlog, Hoppa's classification engine eliminated the equivalent of 250,000 hours of human effort.
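As a quick sanity check on the figures above, the arithmetic can be worked through directly. The per-row effort is implied by the stated totals rather than quoted in the study. The monthly figure assumes the same effort per row would apply to new entries. Reading "2×" as the ratio of the 91% usable share to the 39% expert consensus share is also an assumption.

```python
# Figures quoted in the case study.
backlog_rows = 4_000_000
hours_avoided = 250_000
new_rows_per_month = 600_000
usable_share = 0.91
expert_consensus_share = 0.39

# Implied manual effort per row, in minutes.
minutes_per_row = hours_avoided * 60 / backlog_rows

# Assumption: new rows would take the same manual effort as backlog rows.
implied_monthly_hours = new_rows_per_month * minutes_per_row / 60

# Assumption: "coverage" compares usable output with expert consensus.
coverage_ratio = usable_share / expert_consensus_share

print(f"{minutes_per_row:.2f} minutes per row")
print(f"{implied_monthly_hours:,.0f} hours per month of new rows")
print(f"{coverage_ratio:.2f}x coverage ratio")
```

On these numbers the implied manual effort is 3.75 minutes per row, and the 600,000 new rows each month would represent roughly 37,500 further hours if classified by hand at the same rate.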

From fragmented records to new revenue streams

Hoppa's classification engine connected datasets that were never designed to work together, transforming the TechCo's invoicing platform from a data entry tool into an intelligence layer. With invoice items now mapped to standardised product codes, the company could serve real-time pricing and carbon information to frontline teams directly within the invoicing workflow - unlocking entirely new commercial propositions.

Recognising the strategic importance of this capability, the TechCo licensed Hoppa's HARDR algorithm and deployed it directly onto their own infrastructure - embedded into their backend systems and operational processes, not bolted on as a third-party tool.

Tom Goldsmith
Co-Founder & CTO at Hoppa
This engagement was a fantastic opportunity to demonstrate Hoppa's ability to make sense of complex, high-volume construction finance data. Key to the success of this engagement was how seamlessly Hoppa could be embedded directly into an existing technology platform - delivering structured intelligence without disrupting the workflows our customer already relied on.

Looking ahead

For construction companies sitting on large volumes of unstructured operational data, Hoppa can bridge the gap between fragmented inputs and connected intelligence. Whether the challenge is invoice classification, asset record enrichment, or materials cataloguing, our adaptable algorithms can be licensed and embedded directly into your existing technology stack - turning data that was previously unusable into a source of competitive advantage.
