Andrew Mitchell, University of Natal, Pietermaritzburg
Dr Andrew Mitchell is a senior lecturer at the University of Natal specialising in molecular phylogenetics. In this curtain-raiser article he outlines the role the young science of bioinformatics plays in his field of research. In future articles we will highlight where and how this science is already changing the lives of people on our continent.
An unconscious patient lies on her back on a plastic-looking bed. The puzzled doctor standing next to her speaks: "Computer, scan her DNA for contamination." When I first saw this scene in an old episode of Star Trek I scoffed at its implausibility. But researchers from Harvard University's Dana-Farber Cancer Institute brought this scene one step closer to reality this month as they unveiled a new computer-based method for detecting non-human genes (potentially those of infectious microbes) within the human body. Already they have found thousands of "alien" genes that are expressed in diseased human tissues, suggesting fruitful avenues of research into finding cures.
Bacteria, viruses and other microbes are now suspected of playing a role in many enigmatic yet common human diseases. Examples are coronary artery disease, multiple sclerosis, rheumatoid arthritis, and even various cancers. While not necessarily the biggest killers, these diseases have been among the most frustrating for medical researchers, resisting all attempts to find their cause, never mind a cure. The idea of microbial involvement is a relatively new one, probably spurred on by the surprising recent discovery that stomach ulcers are caused by bacterial infection, rather than by stress and excess stomach acid. But which microbes are involved in these diseases? If the culprits were easily cultured they would probably already be known to medical researchers. The problem is that some microbes are not easily cultured, and therefore are not readily identified, or even detected. Searching for these agents is worse than looking for the proverbial needle in a haystack: it is like sifting through the haystack without even knowing whether you are looking for a needle or a particular grain of sand. This is where computing power is useful, in identifying the target and homing in on it.
Although inconceivable just a few years ago, the Dana-Farber researchers' method is conceptually quite simple. It compares all the genes found in diseased tissue with the genes of healthy tissue; any genes left over are likely to be from an invading microbe. But conceptually simple does not mean easy. Given the size and complexity of the human genome, such comparisons are no mean feat.
Specifically, the new method uses "expressed sequence tag" (EST) libraries, which are sets of short DNA sequences representing all messenger RNA molecules found in a particular tissue. ESTs thus represent the genes that are switched on in that tissue at the time the library was made. The researchers compare the EST library of a diseased tissue with all the genes found in the human genome and use a computer to remove every sequence that matches the human genome, a method they call computational subtraction. What remains is a set of genes that don't belong there and are presumably of non-human origin. If you repeat this process for many samples of diseased tissue from different people and always find the same set of genes left over at the end, you have found a microbe associated with that disease, and perhaps causing it. These genes are then checked against existing DNA sequence databases for known microbes, and any that are left over are likely to be from unknown organisms.
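For readers who like to see an idea in code, the subtraction step can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' software: real pipelines use sequence-alignment tools and the full human genome, whereas this sketch assumes that a crude test of shared short substrings ("k-mers") is good enough to decide whether an EST looks human. The function names, the k-mer length and the 50% threshold are all invented for the example.

```python
from typing import Iterable, List, Set

K = 8  # hypothetical k-mer length, chosen only to suit the toy data


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return every substring of length k in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_human_index(human_seqs: Iterable[str], k: int = K) -> Set[str]:
    """Collect all k-mers seen in the (toy) human reference sequences."""
    index: Set[str] = set()
    for seq in human_seqs:
        index |= kmers(seq, k)
    return index


def looks_human(est: str, human_index: Set[str], k: int = K,
                min_fraction: float = 0.5) -> bool:
    """Assumed stand-in for alignment: an EST counts as human if at least
    min_fraction of its k-mers occur in the human index."""
    est_kmers = kmers(est, k)
    if not est_kmers:
        return True  # too short to judge, so discard it with the human set
    return len(est_kmers & human_index) / len(est_kmers) >= min_fraction


def subtract(est_library: Iterable[str], human_index: Set[str],
             k: int = K) -> Set[str]:
    """Computational subtraction: keep only ESTs that do not look human."""
    return {est.upper() for est in est_library
            if not looks_human(est, human_index, k)}


def recurrent_candidates(libraries: List[Iterable[str]],
                         human_index: Set[str], k: int = K) -> Set[str]:
    """Leftover sequences that turn up in every patient's library."""
    leftovers = [subtract(lib, human_index, k) for lib in libraries]
    return set.intersection(*leftovers) if leftovers else set()


if __name__ == "__main__":
    human = "ACGTACGTTAGCCGATAGGCTTAACCGGTT"   # made-up "genome"
    human_est = "TAGCCGATAGGCTT"                # fragment of the above
    microbe = "GGGGGGGGCCCCCCCCAAAA"            # made-up foreign sequence
    other = "TTTTTTTTTTTTAAAAAAAA"              # foreign, one patient only
    index = build_human_index([human])
    patients = [[human_est, microbe, other], [human_est, microbe]]
    print(sorted(recurrent_candidates(patients, index)))
```

In the toy data only the made-up "microbe" sequence survives subtraction in both patients, which is the pattern that would flag it for follow-up against databases of known microbes.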
This method has been tested and proven worthy. The results were formally published in this month's edition of the prestigious journal Nature Genetics. The researchers filtered the 3.2 million ESTs in the GenBank database against the complete human genome sequence, and found 65 000 non-human genes. Approximately one-fifth of these were identified as belonging to known pathogens or likely laboratory contaminants, but more than 50 000 genes had no known representatives in any biological database; they are new to science.
The researchers' next step will be to apply their method to tissues from patients suffering from lupus and Crohn's disease, two more mysterious diseases with potentially infectious causes. If and when novel genes are detected in the affected tissues, investigators will have to switch to more traditional methods, such as making antibodies to the proteins made by those genes, in order to identify the microbes.
It is both fascinating and frightening to think that so many genes from unidentified species have been found within our bodies... and the search has only just begun. How many "alien" life forms are living within you right now? Are they friendly? If not, how will your doctor know whether they are there? And how can we kill them if that turns out to be what we need to do? These are the sorts of questions that we now have some hope of answering because of the power of the fast-growing field of bioinformatics.
Bioinformatics is a very young interdisciplinary field that has gone from 0 to 100 in the last 3 seconds and is still accelerating. In science it is usually theory that advances ahead of practice, suggesting experiments to be done in the laboratory or the field. But that is not how things have worked in bioinformatics. Just last year a draft sequence of the human genome was published, and soon all 3 billion bases (the familiar As, Cs, Gs and Ts that make up DNA) will be known, though not completely deciphered. Complete genome sequences have already been published for about 60 pathogenic microbes and a dozen or so plants and animals, and hundreds of additional projects are in progress. Theory has lagged behind the surge of data flowing from both privately and publicly funded research, though some would argue that the two now race neck and neck. Who knows what theoretical advances await? These are exciting times to be a biologist!