Biological search engine

Berkeley Lab researchers have developed an innovative search engine that simulates the way scientists think.

It may take weeks for a biologist to comb through a stack of journal articles and discover that one gene is functionally related to another. This relationship could lead to a new way to fight a disease – but not if it remains hidden.

Berkeley Lab researchers hope to accelerate this needle-in-a-haystack hunt with a search engine called GenoPharm. Rather than searching through data by keyword, the way Google does, it searches by association, as scientists do.

“GenoPharm mimics the way biologists search through biomedical literature for connections between genes,” says Kasian Franks of Berkeley Lab’s Life Sciences Division, who developed the software with Life Sciences biologists Mina Bissell and Connie Myers. “It could enable a biologist to do in minutes what now takes them days.”

To use GenoPharm, a person enters a gene symbol and selects a context, such as “molecular function” or “therapeutics.” The result is a web of relationships in which genes that are mentioned closer together in the scientific literature sit closer together in the web. Plug in “BRCA1,” for example, a gene that plays a role in breast cancer, and a GenoPharm search yields a sprawling network of associations.

Some connections are known; others are not. By following one thread of relationships, a researcher can learn that BRCA1 is linked to a gene that performs DNA-binding functions, which is related to another gene, which in turn is the target of a drug that slows the growth of cancer cells.

“We are able to find indirect connections between genes and therapies that haven’t been noticed before,” says Franks.

The idea for a search engine that maps associations came to Franks by way of his three young children. He noticed how each child processed information by taking two pieces of knowledge, combining them, and coming up with something new. Franks wondered whether he could get a computer to do the same thing: help a biologist connect two genes in a previously unknown way.

He turned to the Geneva Development System, something akin to a search engine factory developed at Berkeley Lab to find contextual relationships in biomedical databases. The system measures the proximity of every word to every other word in millions of documents, and, when asked, reveals how a specific word is related to others. In developing the system, the team drew its inspiration from the way a person’s brain works when asked to list the words associated with the word “sky.” Nearly always, a person will immediately respond with “blue” and “cloud,” largely because they are accustomed to seeing these words very near “sky” in text.
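
The “sky” example can be illustrated with a few lines of Python. This is only a rough sketch of proximity-based association, not the Geneva Development System itself: the window size, the 1/distance weighting, the stop-word list, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed stop-word list; filler words would otherwise dominate the scores.
STOP_WORDS = {"a", "in", "the", "was", "with"}

def association_scores(documents, window=5):
    """Score word pairs by how near they appear to each other in text."""
    scores = defaultdict(lambda: defaultdict(float))
    for text in documents:
        words = [w for w in text.lower().split() if w not in STOP_WORDS]
        for i, word in enumerate(words):
            # Look only at words within `window` positions to the right.
            for j in range(i + 1, min(i + 1 + window, len(words))):
                other = words[j]
                if other == word:
                    continue
                weight = 1.0 / (j - i)  # assumed: closer words count more
                scores[word][other] += weight
                scores[other][word] += weight
    return scores

def related_words(scores, word, top_n=2):
    """Return the words most strongly associated with `word`."""
    ranked = sorted(scores[word].items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]

documents = [
    "the sky was blue",
    "a cloud in the sky",
    "blue sky with a single cloud",
]
scores = association_scores(documents)
print(related_words(scores, "sky"))  # ['blue', 'cloud']
```

On this toy corpus, “blue” and “cloud” rank highest for “sky” simply because they sit nearest to it most often, which is the intuition the team describes.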

“We are literally mimicking the process of auto-association, which is a cognitive principle that describes how a human stores and recollects information,” says Franks.

In this manner, GenoPharm focuses the Geneva Development System’s powers on a database of 70,000 gene descriptions and PubMed functional references. Once an associative network surrounding a gene is generated, a separate database maps relevant diseases and therapies to each gene, creating an interlinking web of genes, diseases, and their therapies.
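
How an associative network and a separate therapy lookup might combine to surface indirect gene-to-drug connections can be sketched as follows. The gene and drug names are placeholders, the data is invented, and the breadth-first search is an assumed technique chosen for illustration; the article does not say how GenoPharm traverses its network.

```python
from collections import deque

# Hypothetical gene-to-gene association network (placeholder names).
GENE_LINKS = {
    "GENE_A": ["GENE_B"],
    "GENE_B": ["GENE_A", "GENE_C"],
    "GENE_C": ["GENE_B"],
}

# Hypothetical separate lookup of therapies that target each gene.
GENE_THERAPIES = {
    "GENE_C": ["DRUG_X"],
}

def find_therapy_paths(start, links, therapies, max_hops=3):
    """Breadth-first search from a gene to therapies targeting linked genes."""
    results = []
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        gene = path[-1]
        # Record any therapy attached to the gene at the end of this path.
        for drug in therapies.get(gene, []):
            results.append(path + [drug])
        # Stop expanding once the path has used up its allowed gene-to-gene hops.
        if len(path) - 1 >= max_hops:
            continue
        for neighbor in links.get(gene, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return results

paths = find_therapy_paths("GENE_A", GENE_LINKS, GENE_THERAPIES)
print(paths)  # [['GENE_A', 'GENE_B', 'GENE_C', 'DRUG_X']]
```

Each returned path is one thread of relationships: a starting gene, the intermediate genes it is associated with, and a therapy that targets the last gene in the chain.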

The system is still in the developmental stage. As Franks says, it isn’t easy getting a computer to do what comes naturally to a child, but his goal is to narrow the gap between how computers and people process information.