AI Factsheet

Last updated: January 2026

Background

This factsheet was prepared using the Responsible Handover Framework, an artificial intelligence (AI) governance approach developed by Sense About Science.

Purpose

AI regulation – including the EU Artificial Intelligence Act – requires AI providers to publish documentation that others (including downstream deployers) can refer to when conducting their own due diligence into the capabilities and limitations of the AI system.

The purpose of the factsheet is to help all stakeholders – from code developers to end users – ask the right questions about applying or building with Hoppa’s tooling. By providing a clearer understanding of where the technology is applicable, this factsheet improves safety, helps avoid misuse and enables better outcomes.

Scope

This factsheet covers the complete Hoppa ecosystem (the system), including: Hoppa Workbench, Hoppa Cloud analysis services and the Hoppa Web management portal.

What is the purpose of the system?

What problem does it address?

The system assists built-environment professionals in organising and/or reconstituting unstructured data (reports, drawings, spreadsheets, etc.).

Some (non-exhaustive) examples of where this would be required are:

  1. Tender evaluation
  2. Construction handover and regulatory compliance (CDM 2015, Building Safety Act 2022)
  3. Structural condition monitoring and pre-demolition audits

Unstructured data management is particularly challenging for built environment professionals because:

  • Data is often held in proprietary formats (RVT, XMCD) that require specialist software.
  • Data is created and then needs to be re-used multiple times over the asset lifecycle (50+ years), often in a new context and by users unaware of the information’s original purpose and dependencies.
  • Situationally specific language, taxonomies and processes impede common understanding across complex, geographically distributed supply chains.

Whilst the system was developed for use in the built environment, it could also be used in other contexts. Instead of leveraging sector-specific data connectors, converters and classifiers, the system would default to general-purpose processing components.

Risk Assessment A

What is the system aiming to achieve?

The system aims to achieve human-equivalent (or near-equivalent) comprehension of an input unstructured dataset in a fraction of the time a human would need to open and read each file.

System operation is broadly understood as a three-step process:

  1. Configure the workflow engine using the Hoppa Workbench kit-of-parts
  2. Connect to data at source to automatically run the workflow for each file or line item
  3. Assure the outputs in the system user interface, using AI scoring and edit-mode to layer in additional context
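
For illustration, a minimal sketch of this three-step flow using the Hoppa Workbench Python package is shown below. The module, class and method names (and the example source URI) are assumptions for illustration only; the supported interface is documented at https://workbench-docs.hoppa.ai/.

  # Hypothetical sketch; names below are assumptions, not the documented SDK.
  from hoppa_workbench import Workflow  # module and class name assumed

  # Step 1: configure the workflow engine from a metadata specification
  workflow = Workflow.from_spec("handover_spec.wwdl")  # file name illustrative

  # Step 2: connect to data at source and run the workflow for each file
  results = workflow.run(source="sharepoint://project-docs/")  # URI illustrative

  # Step 3: a human assures the outputs in Hoppa Web; here we simply
  # surface low-confidence items for review
  for item in results:
      if item.confidence < 0.7:
          print(item.file_name, item.confidence)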

In most cases today, the alternative would be to assign data labelling and query extraction tasks to humans. This is both slower and more expensive, and is usually time- or cost-prohibitive once the number of documents exceeds several hundred (depending on use case).

In other cases, specialist tools exist for certain use cases, e.g. finance receipt filing. However, their customisability is often limited, and they may not fully support industry or customer-specific taxonomies (e.g. NBS Uniclass). Equally, they cannot be re-purposed for other unstructured data cataloguing use cases a business or project may face.

General-purpose workflow automation tools could be configured for simple use cases. However, they are unlikely to contain the tooling necessary to crack and analyse sector-specific file types. Furthermore, the interface (if there is one) for quality assuring the analysis outputs is unlikely to be optimised for unstructured data cataloguing – with AI confidence scoring, edit-mode and other features.

The system has been developed to be ‘ethical-by-design’ to satisfy legal or ethical requirements across global geographies and jurisdictions:

  1. NOT a black box: Hoppa’s proprietary, plain-language Workbench Workflow Definition Language (WWDL) ensures developers have precise control over each stage of the workflow and users have a reference to understand what the system is configured to do
  2. Human-in-the-loop: the Hoppa Web interface ensures a human operator always has final control over the analysis outputs. Confidence scores, edit-mode and other tools make human moderation tasks more efficient and accurate.
  3. Data custody: Separation controls across physical, infrastructure, application and logical layers ensure customer data is protected from theft or loss.
  4. Data rights: Customers retain exclusive rights to their data; Hoppa will never use customer data to train AI models without express customer permission.

What level of autonomy will the system have?

The system is designed to be used in a gated process, whereby:

  1. A human user selects the dataset to be analysed
  2. The system analyses the data according to the workflow and metadata specification supplied by the user
  3. The user assures results, applying additions and corrections where needed.

The system is therefore only autonomous within the controlled task environment. The system is unable to take actions that would affect the operation of other systems.

Do the FAIR principles apply?

The system’s transparent, user-centred approach to data cataloguing promotes the FAIR principles for data.

However, since the system is designed to be used in a closed, business context, it remains the responsibility of the system adopter to decide how or if data is made discoverable and usable by humans.

Where has the data come from?

For system calibration, data has been obtained from the following publicly available sources:

This data was used to evaluate performance of the system under representative conditions. No data has been used for AI model training (e.g. foundation model fine-tuning).

The data was wholly sourced from UK organisations, and all documents were written in English. The system providers can make no direct claim on the system’s performance in other, non-English-speaking settings. However, 3rd party AI technologies included in the solution, for example OpenAI LLMs, have been developed and tested for multi-language support (see OpenAI’s help article ‘How can I use the OpenAI API with text in different languages?’).

Additionally, as the system is configurable, alternative locally-optimised AI models could be substituted. System performance is therefore likely to be robust (or able to be made robust) for use in new settings.

What assumptions are being made?

What assumptions have been made in the model?

The system is stochastic; running the analysis twice or more with the same inputs will not necessarily yield identical outputs. The assumption is that stochasticity is acceptable since the system is intended as a guide to assist human operators in organising and reconstituting information.

In system use, the main assumption is that human operators are qualified to interpret and apply system outputs. Technical and/or project and business-specific terminology is likely to be present.

The system will inherit any assumptions made when creating the underlying 3rd party foundation LLMs and other AI (e.g. OCR) models. The system uses widely distributed and tested models, minimising the likelihood of development assumptions impacting system performance. Should such issues become known, the system is designed so that models can easily be swapped for alternatives on a customer or use case basis.
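
As a sketch of how such a substitution might look in configuration, the snippet below pins models per task and swaps one out. The keys and model identifiers are assumptions, not the system’s actual configuration schema.

  # Hypothetical sketch; keys and model identifiers are illustrative only.
  models = {
      "classification": "openai/gpt-4o",  # default foundation LLM (assumed)
      "ocr": "tesseract",                 # default OCR engine (assumed)
  }

  # If an issue became known with the default, substitute a locally hosted
  # alternative on a per-customer or per-use-case basis:
  models["classification"] = "local/llama-3-70b-instruct"  # identifier assumed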

What’s necessary for the system to work properly and produce useful results?

The system requires a metadata specification that defines:

  1. The workflow operations to be performed, under what conditions and in what sequence
  2. The metadata fields that will be populated for each input data item when executing the workflow

Improper or inadequate configuration of either part of the metadata specification could result in sub-optimal performance, including an increased risk of hallucination. Care should be taken when specifying metadata fields to ensure conditions such as the following are met (an illustrative sketch follows the list):

  • Picklist options are MECE (Mutually Exclusive, Collectively Exhaustive)
  • Taxonomy filters are tightly scoped (e.g. Uniclass table is Ss Systems)
  • Business / project-specific acronyms and jargon are properly explained
  • Escape options (e.g. ‘All Areas’, ‘Other’) are provided
  • Search queries encourage response context retention (i.e. instructing the system to reference, quote or explain its reasoning)
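
A minimal sketch of a single metadata field definition reflecting these conditions is given below. The structure and key names are hypothetical illustrations in Python; actual specifications are written in WWDL and will differ in form.

  # Hypothetical field definition; structure and key names are illustrative, not WWDL.
  discipline_field = {
      "name": "discipline",
      "type": "picklist",
      # MECE options, plus an explicit escape option
      "options": ["Architecture", "Structures", "MEP", "Civils", "Other"],
      "taxonomy_filter": "Uniclass:Ss",  # tightly scoped to the Ss Systems table
      "glossary": {"MEP": "Mechanical, Electrical and Plumbing"},  # jargon explained
      # encourage context retention in responses
      "instruction": "Classify the document and quote the passage that justifies your choice.",
  }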

Each metadata specification should be tested on a sample dataset and care should be taken when applying the specification in a new context. Performance criteria will depend on the use case. It is the user’s responsibility to satisfy themselves on performance acceptability. Hoppa personnel can provide expert guidance to customers applying templated specifications or developing their own.

The hardware and software required to run the system will depend on the system configuration. Most users will interact with Hoppa’s cloud-hosted version of the system, meaning no additional hardware or software is required to use the system.

More advanced users may elect to use all or part of the system in a self-hosted capacity. For example, Hoppa Workbench can be installed as a Python package for use in other applications, and Hoppa’s local MCP server can be integrated with a user’s desktop copilot (e.g. Anthropic Claude). Further information can be provided upon request.
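
As an indicative example, registering Hoppa’s local MCP server with a desktop copilot such as Claude Desktop typically involves adding an entry to the client’s MCP configuration file. The sketch below writes such an entry from Python on macOS; the server command name and its arguments are assumptions.

  # Hypothetical sketch; the "hoppa-mcp" command and its arguments are assumed.
  import json, pathlib

  # Default Claude Desktop configuration location on macOS
  config_path = pathlib.Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
  config = json.loads(config_path.read_text()) if config_path.exists() else {}
  config.setdefault("mcpServers", {})["hoppa"] = {
      "command": "hoppa-mcp",                     # command name assumed
      "args": ["--workspace", "./project-docs"],  # arguments assumed
  }
  config_path.write_text(json.dumps(config, indent=2))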

Is there a statement on the unknowns and is it explained?

As a multi-purpose data cataloguing system for the built environment, the system has no prior knowledge of the use case the user has configured it for or the input dataset to be analysed. Hoppa personnel possess extensive knowledge of past system usage and can advise on how the system should be configured and used to guard against sub-optimal performance.

The quality of the input dataset is unknown. While the system employs several safeguards against hallucination, it cannot claim to always correctly disambiguate conflicting or missing information – for example, where the embedded metadata in a Microsoft Word document specifies the Security Classification as ‘Baseline’ but the document’s title page specifies ‘Official Sensitive’.

Is there other software or equipment the tool needs to interact with?

The system can operate independently – users upload datasets directly into Hoppa Cloud – or in conjunction with other systems, whereby users delegate system access to read files and their contents from 3rd party document management systems.

If Hoppa Workbench is used as part of a custom package deployment, the user is expected to configure the necessary app registrations, secret keys and tokens for interacting with 3rd party APIs and services (e.g. Microsoft Azure, Autodesk Platform Services).
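
By way of illustration, secrets for such integrations are typically supplied through environment variables rather than hard-coded values. The variable names below are assumptions; consult each provider’s documentation for the credentials actually required.

  # Hypothetical sketch; variable names are illustrative only.
  import os

  azure_client_id = os.environ["AZURE_CLIENT_ID"]          # Microsoft Azure app registration
  azure_client_secret = os.environ["AZURE_CLIENT_SECRET"]
  aps_token = os.environ["APS_ACCESS_TOKEN"]               # Autodesk Platform Services

  # These would then be passed to the relevant data connector during
  # workflow configuration.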

Is the system good enough for this use?

How do you measure how accurate the system is (performance), and is it being monitored?

Performance measurement will depend on the use case the system is being applied to. The system automatically generates quantitative metrics on workflow run time, data volumes, confidence scoring and more that are monitored to measure and calibrate performance.

What is the system being compared to (benchmarking)?

Performance benchmarking exercises have measured the system at between 30 and 500 times faster than a human operator (depending on use case).

Performance is judged acceptable if the net time to achieve the same outcome is less than the equivalent human operator time. For example, even if the final user QA/QC time is longer, the net task time still improves provided the analysis time is reduced by a greater amount.
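
A worked example of this comparison, using assumed figures rather than measured data:

  # Worked example with assumed figures (hours); not measured performance data.
  human_analysis, human_qaqc = 40.0, 2.0    # fully manual baseline
  system_analysis, system_qaqc = 0.5, 6.0   # system run plus longer human QA/QC

  human_total = human_analysis + human_qaqc     # 42.0 hours
  system_total = system_analysis + system_qaqc  # 6.5 hours
  assert system_total < human_total             # net task time is improved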

System logs and performance metrics are provided in Hoppa Web, allowing performance to be measured. Tools for searching and filtering results and grouping by AI confidence score are provided to maximise user QA/QC.

Users’ manual edits to results are stored, and users are encouraged to self-report unexpected behaviour or sub-optimal performance so this can be improved by adapting the metadata specification or extending the Hoppa Workbench kit-of-parts.

If the performance of the system drops, who will fix it?

Users understand their business context and use case most clearly and are therefore encouraged to optimise the metadata specification in Hoppa Web to address known performance gaps.

For more involved system fixes, Hoppa personnel will assume responsibility in line with the Service Level Agreement (SLA), raising support requests with 3rd party service providers as appropriate.

If the performance can't be fixed, is there a plan in place for rollback?

Yes. The system can be rolled back to an earlier stable version by mutual agreement.

What is the procedure for releasing new versions of the system?

Hoppa applies continuous integration and continuous deployment (CI/CD) practices to release regular feature upgrades and security patches. Users would be notified of major version updates (and any required actions) in advance via email.

Hoppa personnel apply customer feedback to improve the system. Customers retain exclusive rights to their data; Hoppa will never use customer data to train AI models without express customer permission.

Is it possible to explain which parameters the tool is using and does this matter?

The metadata specification is the blueprint that explains what actions the system will take. The metadata specification is written in plain text, allowing the user to infer why the system may have taken a course of action. Whilst LLM responses within a single workflow step are stochastic, the workflow is deterministic, meaning the user has a high degree of control over the order in which operations are performed.

As standard, all metadata classifications are accompanied with an AI confidence score and qualitative justification. These justifications offer insight into why the system has assigned a certain classification, allowing the user to evaluate performance and act if necessary.
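
An illustrative (not actual) shape for a single classification output is sketched below; the field names and values are assumptions for illustration.

  # Hypothetical output record; field names and values are illustrative only.
  classification = {
      "file": "B1-ST-DR-0042.pdf",
      "field": "discipline",
      "value": "Structures",
      "confidence": 0.91,
      "justification": "Title block reads 'Structural General Arrangement'; "
                       "drawing code 'ST' matches the Structures convention.",
  }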

Risk Assessment B

What are the likely negative outcomes for individuals and the tool user?

Users are delegating manual, time-consuming tasks to the system with the intention of freeing capacity for other, higher-value tasks. This is a positive outcome, particularly where it was previously unfeasible for humans to perform this work due to its scale or nature – for example, auditing a national built asset portfolio to identify installed instances of a faulty component.

However, care should be taken that this does not negatively impact staff professional development or organisational knowledge. Delegating many tasks to the system could lead to teams losing understanding of how end-to-end processes operate.

This skills reduction could in turn impede users’ ability to safely and effectively QA/QC system outputs and diminish organisational resilience in the event of a system outage.

Do you have a plan to mitigate these risks?

The human-in-the-loop design philosophy of the system ensures users can always assure system outputs and control the flow of information into downstream processes.

Hoppa supports customers and their employees with hands-on training when applying AI systems in their role. Upskilling staff in this way ensures they can take an active role in the AI re-tooling of the industry.

Finally, Hoppa advises users on how the system can be used as a risk substitution control within an over-arching risk management strategy. This involves substituting a high risk with a lower risk to keep overall risk as low as reasonably practicable (ALARP).

For example, if users are reviewing incoming documentation to identify substances hazardous to health (COSHH), Hoppa could be used as an additional screening layer or as surge capacity during high-demand periods. Improvements to the speed and/or coverage of COSHH identification lower the overall risk score, even though the risks of the system flagging false positives/negatives, or of the business losing skills, may increase.

Transparency, licenses, availability & documentation

Is the code accessible?

The code is not publicly accessible. Hoppa Workbench SDK documentation can be found at: https://workbench-docs.hoppa.ai/.

What software license applies?

Hoppa’s standard terms and conditions of business: https://hoppa.ai/terms-and-conditions

Are there any IP and ownership considerations?

The system is the sole property of Hoppa Technologies Limited.