Iris the AI scientist

Andrew Wade, senior reporter

The morning after Theresa May called her snap election, I found myself sitting in the lobby of Northcliffe House, home of the Daily Mail. Surrounded by headlines of ‘saboteurs’ and giant flatscreens displaying the latest from MailOnline, a foreboding sense of unease began to set in.

Thankfully, a warm welcome awaited when I emerged from the lift at Founders Factory, the technology incubator that shares the building with Dacre and co. Set up by some of the brains behind lastminute.com, Founders Factory is backed by companies including Aviva, easyJet, L’Oréal and Guardian Media Group. According to its website, its mission is to launch 200 businesses over the next five years across a range of sectors, covering everything from fintech and healthcare to beauty and publishing.

As well as building up teams from scratch, Founders Factory also works alongside more established startups through its accelerator arm. One such company is Iris, an AI-based research tool that promises to change the way businesses and academics approach R&D. The international team behind Iris is led by Norwegian CEO Anita Schjøll Brede, with co-founders hailing from Sweden, Finland and Spain. The four came together for the first time in 2015 at the Singularity University, a Silicon Valley think tank and incubator that aims to use ‘exponential technologies’ – such as AI and genomics – to solve some of humanity’s big problems.

“We got the team together and sat down and looked at what problems do we think we can solve in the world,” Schjøll Brede told me.

Initially looking at everything from world hunger to health issues, the team started searching for established research in the various areas. However, they quickly realised that across these disparate topics, they lacked the domain expertise to properly get to grips with them. It was from this realisation that the idea for Iris was born. The web-based tool allows users to input a research paper’s web address, then explores the text from that paper and suggests other scientific research that’s likely to be relevant.

The AI technology at the heart of the process is known as non-semantic neural topic modelling. Iris uses a TF-IDF (term frequency–inverse document frequency) algorithm to assess the importance of certain words in the context of the document they sit in.
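For readers unfamiliar with TF-IDF, the idea can be sketched in a few lines of Python. This is a minimal illustration of the textbook weighting only, not Iris's actual code: the function name `tf_idf`, the whitespace tokenisation and the three toy "papers" are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    # Naive tokenisation: lower-case and split on whitespace (an assumption).
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency multiplied by inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, purely for illustration.
papers = [
    "graphene battery anode study",
    "graphene sensor study",
    "wheat yield study",
]
scores = tf_idf(papers)
```

In this toy corpus, "study" appears in every paper and so scores zero, while "battery", which appears in only one, outweighs "graphene", which appears in two. Words that are common everywhere carry little information about any single document.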

“We don’t work with citations, we don’t work with who the researchers are. We just look at the text,” said Schjøll Brede.

The Iris team

“We vectorise those words and put them into a multidimensional vector space. Then we go out and train our model on about 18 million other research papers.”

Algorithms then seek out contextual synonyms and hypernyms, words that have similar meanings or which are closely related to those found in the text. This results in word clusters of up to 150 words that are correlated to the original document, providing a fingerprint that can be used to cross-reference against other research papers.
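The article does not say how Iris compares one fingerprint with another. One common approach, shown here purely as an assumed stand-in, is to treat each fingerprint as a sparse vector of word weights and rank candidate papers by cosine similarity. The names `cosine_similarity` and `rank_papers`, the paper titles and all the weights below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {word: weight} fingerprints."""
    shared = set(a) & set(b)
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty fingerprint matches nothing.
    return dot / (norm_a * norm_b)


def rank_papers(query_fingerprint, corpus_fingerprints):
    """Return (title, score) pairs, most similar first."""
    scored = [
        (title, cosine_similarity(query_fingerprint, fingerprint))
        for title, fingerprint in corpus_fingerprints.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: a query paper and three candidates.
query = {"graphene": 0.8, "anode": 0.5, "battery": 0.3}
corpus = {
    "Paper A": {"graphene": 0.7, "battery": 0.6},
    "Paper B": {"wheat": 0.9, "yield": 0.4},
    "Paper C": {"anode": 0.9},
}
ranking = rank_papers(query, corpus)
```

Here Paper A, which shares two weighted words with the query, ranks first; Paper C, which shares one, comes second; and Paper B, which shares none, scores zero. No citations or author names are involved, only the text-derived weights.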

“We use all that information to create this hierarchy, and feed that back to the user,” said Schjøll Brede. “Our goal is to bypass the need to know the terminology of the field, the need to understand the full scope of what you’re looking at.”

“Teams that use it outperform teams using say, Google Scholar, by more than 2X, and we actually kind of enable people who aren’t necessarily domain experts to perform the literature search as well as domain experts.”

Anita on stage at TechCrunch Disrupt

As well as learning on its own, Iris has a user base of 5,000 ‘trainers’ who manually help the technology improve. While the AI’s evolution has been open-sourced to a degree, much of the work it has been designed to feed off remains hidden behind paywalls. But there is still a wealth of material in the public domain, and Schjøll Brede believes the prevailing trend is for more and more research to be openly published.

“Our free tool has access to about 69 million research papers today, which is good,” she said. “But of course, there’s 50+ million – maybe more, maybe as much as 100 – that is paywalled.”

The hope is that a micropayment system, combined with an increasingly open publishing landscape, could help extend Iris’s reach beyond some of those barriers. When the company works with major clients, they often already have access to a lot of paywalled research, and the tool can easily integrate with that material, as well as internal documentation and archives.

Understandably, big companies can sometimes be reluctant about handing over the keys to their document troves. To help overcome this, the Iris team has been hosting science hackathons, or scithons, where researchers are tasked with using the technology to try and solve a company’s R&D problem over the course of a day. Over time, Schjøll Brede says Iris could even be trained to become an independent researcher, prompting scientists to investigate links found between certain papers and test new theories.

“Once we’ve done that, we can actually connect her to a robotic lab, or a simulation environment, she can create that hypothesis or a hundred different hypotheses, use the methods she’s found, test those methods in a robotic lab and then actually come back to you with the results.”

That sort of scenario may be about a decade away, however. In the meantime, the Iris team will be working with Founders Factory to develop the tool further, raise funding and gain exposure. It’s a six-month programme designed to speed up both the product and business development of the company, helping Iris reach a state of maturity. One wonders just what she’ll be capable of when she’s all grown up.