The Computer Inside You

By Team Digit Published Date
01 - Nov - 2005
| Last Updated
01 - Nov - 2005
The Computer Inside You
Silicon has been singularly successful as a computing material. So much so that there's even an area nicknamed after the element-which, as some have half-jokingly pointed out, should be called "iron oxide valley" instead of "silicon valley" because of the disk industry. Every desktop computer in the world has silicon in it; and we can't imagine computing except in the context of silicon.

Is there something fundamental about silicon that makes it impossible to live without in the computing world? Nothing, really. Silicon and its properties just happen to be extremely suitable for computers to be built upon. But it's not that other materials-such as DNA, for example-cannot be explored.

DNA? When one first hears about "DNA Computing," the natural reaction is bewilderment. It needn't be: one just needs to know what "computing" really means. "Computing" does not mean browsing the Web or playing an MP3 song. Computing means the manipulation of numbers or other symbols, while having a means of feeding inputs to, and extracting outputs from, the system that is doing the manipulation.

We decided a long time ago that the best things to manipulate were ones and zeroes (binary), and that the ideal devices to do that were to be silicon-based. That's about the hardware. What about the software-the stuff that does the manipulation? John von Neumann-the pioneer of the stored-program concept-decided that the software, too, should be put into the silicon, and there you have it.

Is there something fundamental about silicon that makes it impossible to live without in the computing world? Nothing, really

Now substitute A, C, G and T for the ones and zeroes, substitute enzymes for the software, and you have a DNA computing system. To expand this a little bit: A, C, G and T are the nucleotides (or bases) in DNA molecules, one of the fundamental building blocks of life. The enzymes we're talking about are chemicals that do things to DNA in much the same way they do in your body. And since software needs to 'operate on' hardware, we need to have a basic set of operations. That, too, is provided by DNA: the molecules can split, copy themselves, recombine, and so on, which gives us an adequate set of operations-just as adding, XORing, and so on give us an adequate set of operations for ones and zeroes.

Note that we haven't mentioned a "DNA computer" so far. There's a good reason for that: DNA computers don't exist. DNA computing does. How? Well, we think of computers as things that produce results 'on their own,' given a set of inputs. Like when you type in "" into a browser, you don't need much further inspection of the system before your browser takes you there. On the other hand, when we talk about "DNA computing," we're talking about solving problems using DNA-with or without human intervention-and this was Leonard Adleman's idea. Adleman is a theoretical computer scientist and professor of computer science and molecular biology at the University of Southern California.

The Travelling Salesman Problem
Adleman solved a broad version of the Travelling Salesman Problem (the TSP) using DNA. The TSP is quite simple to understand, really (but difficult to solve in most cases). Think of a salesman who needs to do a tour of some cities-say five, including the origin city. He needs to visit each city once, and return to his origin city. He shouldn't visit any city twice or more-just once.

The problem is to find the shortest route by which he can complete his tour, so he can save on the airfare. For example, say he starts from Mumbai; he needs to visit Delhi, Kolkata, Chennai and Bangalore, with Kolkata being his last leg, and return to Mumbai. What is the shortest route? You can see there are several possibilities. He could do M-D-B-C-K, or he could do M-C-D-B-K, or he could do…

Since we know some Indian geography, it's easy to see that our first option is the best. When the problem is limited to a few cities, it can be solved by hand; when it involves, say, 20 cities, a computer would take a while to do it. When there are, say, 200 cities, a supercomputer is needed, doing it the 'brute force' way-looking up each route. And when it involves 2,00,000 cities, the fastest supercomputer on earth probably can't do it in a hundred years!

So did Adleman do it? Did he use his DNA-rather, the DNA in his test tube-to solve what supercomputers can't? Actually, no. He solved something like our example above, which we can do by hand. It was groundbreaking because he illustrated what could be done using DNA. That's needed because, as you've probably noticed from the figures above, the complexity of the problem explodes as the number of cities goes up. That's because of the massively parallel nature of the problem-meaning that the computer (or computing device) cannot work out a neat, systematic way of exploring routes. It has to look at many, many paths at the same time, one of which will turn out to be the answer. (There happen to be algorithms that simplify the process heavily, but that, again, depends on human ingenuity-change the problem and it'll take people years to discover a cooler way to solve it.)

That's where DNA computing comes in. The hardware-the DNA molecules themselves-is used to encode the operation, and in one droplet of DNA solution are more than a trillion molecules. They all work in parallel, doing the same kind of thing, and then some molecules morph into the answer.

The Key Insight
Like we said, Adleman kickstarted the field of DNA computing by using DNA to solve a broad version of the TSP. What he did was pour a solution with some DNA into a test tube, then add some enzymes to it, then add more enzymes, and finally filter out the molecules in the test tube. Those DNA molecules that emerged gave him the answer to the seven-city version of the TSP.

If that sounds way out, it's only because we don't understand how advanced biotechnology is today. It's relatively easy to understand the 'hardware' part of it-that DNA molecules, with A, C, G and T, play around and recombine and split and so on. What seems difficult to understand is how they play around the way we want them to. It turns out that using the appropriate biochemical techniques, one can 'tell' the DNA molecules what to do, and one can also 'read out' the results of what went on in the test tube.

It is beyond our scope here to explain how exactly these biochemical agents work; what we can do is explain what manipulations are done. And what better example than Adleman's experiment?

How The Mixture Can Produce Random Routes

We can obtain the route encodings from the city encodings. These two mixed together result in complementary bonding-producing molecules that represent routes

To understand how a mathematical problem can be solved by a bubbling solution in a scientist's test tube requires us to understand a few things. First, the structure of the molecules in the solution. Second, what goes on inside the test tube and how. And third, how the scientist gets his answers from the test tube.

Now let's get our hands dirty and see how Adleman's experiment worked.

Adleman's Experiment (Or Something Like It)
DNA molecules consist, like we said, of the four nucleotides A, T, G and C. These can be considered its building blocks. The familiar double-helix you see in all the pictures comes about this way: A (Adenine) bonds with T (Thiamine), and C (Cytosine) bonds with G (Guanine), to produce what is called a complementary strand. So if you have a strand like ACCTCA, the complement would be TGGAGT. Those two strands entwine in double-helix fashion.

We'll restrict the problem a little so there's only one correct answer. (Let's take it easy on Adleman, shall we?) So let's say there are no flights between some cities, such that the only answer is M-D-B-C-K. So what we hope to achieve with this DNA computation is DNA strands that encode this particular route.

Our first step would, naturally, be to encode the cities and the routes. Let's use four bases to encode a city, so Mumbai could be, say, AGTC, Delhi could be GATT, and so on. Our first task is to produce a test tube full of these molecules. That's easy to do with today's techniques-all that's needed is a DNA synthesiser. (OK, we're skipping a lot of details, but those aren't important.)

Now how would we produce molecules that encode the routes between cities? We use a property of DNA for this-the fact that a strand of DNA entwines ('hybridises') with its complementary sequence. So, suppose we took the last two bases of Mumbai (TC), and the first two bases of Delhi (GA). We get TCGA. Now we take the complement of this, which is AGCT. This can represent the path between Mumbai and Delhi. How? Because AGCT would hybridise with TGCA, linking the two strands together.

So if we had lots of copies of Mumbai (AGTC) and Delhi (GATT), and lots of copies of the route between them (AGCT), we'd get several molecules that strung together Mumbai and Delhi. The molecules that string together Mumbai and Delhi would represent the route between them.

The same holds for more than two cities-you could have routes consisting of any number of cities. After a sufficient period of time, if you'd taken enough DNA in the test tube, you'd have all the paths possible-including paths that make no sense for our purposes, such as Mumbai-Bangalore-Delhi. These answers are nonsensical because we're assuming there's only one answer to the problem.

We need to weed out the paths that make no sense, so our next step is to make the test tube full of paths that start at the correct city (Mumbai) and end at the correct city (Kolkata). Like we said, there are several operations possible with DNA. Let's use copying. There's an enzyme called polymerase involved. And a reaction called the Polymerase Chain Reaction (PCR). Well,what happens?

PCR can be used to copy sections of DNA you're interested in. The polymerase actually does the copying, and PCR means many iterations of the copying process. You can dictate to the polymerase what segment of DNA you want to start at and where you want it to end. So if we said we wanted it to start at Mumbai and end at Kolkata, those sections of DNA would be copied over and over via PCR. And after a while, you end up with a mix of DNA molecules that encodes several routes, some of them still nonsensical, but all of them starting at Mumbai and ending at Kolkata.

Now we find those paths that are exactly five cities long. Since each city is encoded by four bases, we know the final path should be exactly 20 base pairs long-4 base pairs, multiplied by 5 cities. The technique used to find the lenght of base pairs is called gel electrophoresis (GE). Using GE, the DNA gets segregated into different lengths, and you can filter out the DNA molecules that have the length you want. So into the next test tube goes all the DNA that is 20 base pairs long. And what does this encode? To reiterate, this new test tube contains all paths that start at Mumbai and end at Kolkata, and which have a total of five cities encoded.

We now need to select those itineraries that have all the cities in them. (Remember we have DNA molecules that are five cities long, but in which cities may be repeated or missing). Now to pick out the molecules that contain a specific city, we use a process called affinity purification. We do this using, of all things, magnetic beads: the complement of the sequence you're interested in is attached to a magnetic bead. This complement hybridises with the sequence you're after. (Again, we're using the hybridising property of DNA.) Then you simply retrieve the bead, and you're left with DNA that does contain the sequence you're after.

It should be patent that you need to do affinity purification five times. For example, Mumbai must be in the pool of candidates; so you attach the complement of Mumbai to the magnetic beads, it hybridises with all the molecules that have Mumbai in them, and you pull out all the beads. The ones that get left behind are the sequences that don't have Mumbai in them. Then, in the next test tube, you affinity-purify for Delhi, and so on. In the last test tube, you'll be left with itineraries that visit each city exactly once, and which start at Mumbai and end at Kolkata!

Here, we 'know' the answer: it should be Mumbai-Delhi-Bangalore-Chennai-Kolkata. So we need to find if our 'solution' test tube has the correct path. The technique used is PCR again, combined with the technique for measuring the length of a DNA sequence, called gel electrophoresis, which we've used before. You amplify (using PCR) the section between Mumbai and, say, Bangalore. If these molecules are 12 base pairs long (as they should be if all went well), you know Bangalore is the third city in the itinerary! Similarly, you PCR-amplify for all the four remaining cities and find their positions in the itinerary. We're done!

Of course, in our example, we haven't really solved a problem, but we've shown (like Adleman did) that DNA and the procedures mentioned above can be used to solve a mathematical problem. Unfortunately, even what Adleman did has not convinced everyone. Calculations have been done to show that if the problem were to be increased to a 100 or so cities, the amount of DNA required would be utterly impossible; here, it's a restriction in terms of volume, rather than a restriction in terms of time as with a traditional computer.

So Is DNA Computing Any Good?
Yes. It's only that we haven't gotten there yet. One must realise that DNA computing is, at the current state of the art, best applicable to problems that are massively parallel in nature. This includes problems of the sort we described above. Then, there's encryption, where pretty much the same thing applies: to crack a code, one needs to try a whole lot of possibilities at the same time. There also happen to be several computer science issues that can be explored better by a DNA-based computing system, by virtue of its "stochastic" (based on probability and statistics) nature.

A Finite Automaton

A simple two-state, two-symbol finite automaton (FA). The FA is a 'machine' that has a state. When it's in state 0 and the input is b, the machine goes to state 1. When it's in state 1 and the input is b, it remains in state 1, and so on

At the same time, advances in biotechnology are about as staggering as those happening in the silicon world. We're just not aware of them! We've mentioned techniques, involving enzymes and magnetic beads, in our sample 'problem': these would have been impossible to do in, say, 1970.

There's a lot of the word 'self' associated with DNA. DNA molecules self-replicate; in so doing, they self-refer, with proteins as the intermediary; and it turns out that DNA's biggest promise lies in its ability to self-assemble. Much research in recent years has focused on this property of the DNA molecule-rather, of the structures that can be built at the nanoscale from the DNA molecule itself.

So what is self-assembly? And what is the relation between self-assembly and computation? These are important questions, because we're asking if DNA computing is just a fad or if it is the future. It turns out that DNA, in its self-assembling nature, could be a universal computer. In other words, it might be used someday to build any kind of computer at all!

Now that's a possible happy ending. But if you'd like the gory details, read on…

"A Trillion Computers In A Drop Of Water"
Such headlines make for exciting news. This one was in November 2001, when several Web sites reported that a group of scientists led by Prof Ehud Shapiro at the Weizmann Institute of Science in Israel had created a programmable FA (finite automaton; see figure) in a test tube. It was a two-state, two symbol FA, which is not very complex as FAs go, but it is an achievement. The news went: "This nanocomputer is so small that a trillion such computers compute in parallel, in a drop the size of a tenth of a millilitre. Collectively, the computers perform a billion operations per second…

"This study may lead to future computers that can operate within the human body, interacting with its biochemical environment to yield far-reaching biological and pharmaceutical applications."

Not to downplay the stupendous achievement, we should emphasise that such "computers" are not very different from the one we discussed-Adleman's DNA computer solving the TSP. The future implications for DNA computing that such news items predict are typically vague and broad. They say nothing about what DNA computing will be like, and whether you'll see a DNA desktop anytime. But that's what we really want to know-enough of the test tubes!

According to a Web definition, self-assembly is "the fundamental principle that generates structural organisation on all scales-from molecules to galaxies. It is defined as a reversible process in which pre-existing parts or disordered components of a pre-existing system form structures of patterns." According to another definition, self-assembly is "a branch of nanotechnology in which objects, devices, and systems form structures without external prodding."

Adleman's work was just the beginning. His 1994 paper basically explained how one-dimensional self-assembly of DNA-remember how the molecules arranged themselves physically according to the inputs?-could work like a FA. That was the foundation for connecting computation and self-assembly of DNA.

His work inspired Erik Winfree, in 1996, to propose that two-dimensional self-assembly of DNA can perform universal computation, with a set of 'molecular Wang tiles' as the 'program'. Winfree's proposal used Hao Wang's idea that computation had something to do with tiling. If Winfree's proposal was right, DNA would be a material that could, well, be used in a DNA computer.

We need to get our hands dirty again. What is a "universal computation"? What are molecular Wang tiles? How did Wang relate computation to tiling? What is tiling, anyway? There are straight answers to all these.

Winfree And Wang
A universal computation means any computation at all-just as the phrase implies. The full form is "Turing-universal," which refers to Turing machines. A Turing machine is the underlying conceptual 'machine' behind all computers today-whatever a Turing machine is capable of, any computing machine is capable of. To put it simply, think of a Turing machine as something that can do everything a general-purpose computer can do.

A Set Of 13 Wang Tiles-Do They Tile The Plane?

Wang tiles are equal-sized squares with a colour on each edge. To tile the plane using a set of Wang tiles, the colours on the edge must match. This is a set of 13 Wang tiles that can tile the plane aperiodically-meaning the tiling pattern doesn't repeat

A tiling is an arrangement of a few basic shapes, called tiles, that fit perfectly together and stretch any area you can imagine. Think of bathroom tiles-they're all square, and they fit perfectly together for any extent you can imagine. Bathroom tiles, however, aren't very interesting, because they're all square! Think of, say, a rectangle and two different kinds of triangles. If you had an infinite supply of these, would you be able to tile any area? Of course, that depends on the shape of the triangles; if you can, that set (of the rectangle and the triangles) is said to "tile the plane," if not, then the set does not tile the plane. And if the tiling pattern can repeat, it's called a periodic tiling.
Given a set of tiles, it was expected by scientists that one should be able to say whether they can periodically tile the plane. When Wang investigated the problem-called the "tiling problem"-in the 1960s, he discovered it was unsolvable. That is, given the set of polygons, it could be proved that there was no way of telling whether the set would tile the plane or not.

The tiling problem thus 'reduced' to the Halting Problem of Turing Machines. That's the computer-science way of saying that if you could solve the tiling problem, you could achieve the impossible. However, it also means that a machine (or whatever) that could go about trying to tile the plane is theoretically as powerful as a regular computer. For example, your computer, given a set of tiles that does tile the plane, will be able to find that it tiles the plane. In other words, dealing with tiles was proved to be the equivalent of computing in general!

Now, there is a close relationship between the way the plane is tiled and the way crystals grow. (Crystals start forming when a seed is put into a gel, for example: the gel starts crystallising from the seed outwards. The seed can even be a sepck of dust.) Winfree asked if crystal growth had the potential to compute as powerfully as tiling did. To answer that question, one needed the ability to design molecular Wang tiles. (Wang tiles are square tiles coloured on the edge, which pose the question whether the plane can be tiled using them.)

Winfree's Conclusions
It turned out that Wang tiles on the molecular scale could indeed be constructed. Nadrian Seeman, a pioneer of DNA nanotechnology, envisioned the use of DNA as an architectural element. DNA, it turns out, can morph into structures other than the double helix. Seeman and his students built a variety of nanostructures-a wire-frame cube, a truncated octahedron, and more-all with DNA; and most especially, a four-armed 'brick' known as a double-crossover (DX) DNA molecule. This DX molecule was the molecular Wang tile Winfree was looking for. The DX molecule's four arms could be given sequences of DNA bases (A, G, C or T) corresponding to the colour labels on the four sides of the Wang tile it represented.

In a 1998 paper, Simulations of Computing by Self-Assembly, Winfree argued it was plausible to perform Turing-universal computation by crystallisation, or crystal growth. He summarised at the end of the paper: "Our results lend credence to proposals for computation by self-assembly of DNA: we have found that 2D self-assembly can theoretically support computation with arbitrarily low error rates."

"With arbitrarily low error rates" is science-speak for "perfectly, depending on your implementation." So, DNA can indeed self-assemble in 2D, and this self-assembly means DNA is complete computing material!

If things are a little fuzzy at this point, a little summary should make it clear:
  • DNA can be nano-engineered into DX molecules, or molecular Wang tiles
  • Molecular Wang tiles self-assemble into 2D crystal lattices
  • Crystal lattice growth is similar to tiling the plane
  • Tiling the plane means universal computing
  • Hence (if Winfree's conclusions are correct) DNA can be used for universal computing!
DNA Nets
More on the self-assembly and nanotechnology front: Duke University scientists used, in 2003, self-assembling DNA molecules to build molecular meshes that could expand and contract, according to "Switchable net woven from DNA," published by Nature. Such a molecular mesh "might even be the basis of a computer that has as its components lumps of metals or semiconductors just a few nanometres wide. The switchable DNA net could turn electrical communication between these devices on and off by altering the distance between them." On and off-ones and zeroes! And once we build something that can manipulate ones and zeroes, we can, in theory, build a computer around it.

Today, Tomorrow, And The Day After
There are many visions for DNA computing out there. If you view it in the short term, you could say it's limited to massively parallel problems; that it is "an interesting idea, but…" If you're looking a few years ahead, you'll factor in that biotechnology is, strictly speaking, only a few years old, and that DNA manipulation techniques will take us well beyond where we are today. And if you look even further-at the work of Winfree, Seeman and others-you might see that nanotechnology is where it's at and that along with quantum computing, DNA computing is the future.

You won't see a desktop DNA computer soon, but whether you'll see one in your lifetime is an open question. If Moore's Law holds good for all of technology, you probably will.

Team DigitTeam Digit

All of us are better than one of us.