Two billion words, 200 years' worth of data: how a Glasgow-based firm unlocked Hansard
A big data analytics company has developed a powerful tool that aims to help researchers sift huge volumes of text-based material and quickly find the information they need.
Software engineers at Nalanda Technology worked on a platform that would make searching much easier, letting people get to the precise information they were looking for without having to trawl through endless documents.
The Nalytics tool offers up a ‘results tree’ of information which can give an instant glimpse of all the available text based on the search terms.
Its engineers pointed the tool at the UK Parliament Official Report, Hansard, which is available online. A quick glance at the search screen reveals that the 200-year-old tomes contain more than two billion words; many of those would no doubt be attributable to the filibustering of Jacob Rees-Mogg, MP.
“In normal searches, if it’s a public record like Hansard, or the British Newspaper Archives, you often have to find that document, find that article and find the information within the page of that article and decide whether that’s relevant or not,” explains David Rivett, the company’s Chief Operating Officer.
“But if you had that text available as part of the search experience, then it’ll make it more efficient. So I think we could add some value to those services.”
The company incubated the concept within its parent OLM Group, which specialises in technology solutions for the health and social care sector. It is working on a project to mine health and social care data for predictive patterns that might flag up advance warnings to healthcare professionals. But could that logic be applied to geopolitical events? Could free-text technology be developed to predict where global conflicts might break out?
“Learned people make predictions around a build-up of certain types of behaviour,” says Rivett. “I can see how the technology can certainly help do that. Whether it’s machine learning or artificial intelligence, it’s certainly the start of something within that space.”