The Lizard Of Oz

We speak in awe of the Artificially Intelligent being. People have come and gone thinking that they would see the ultimate Computer in their lifetime: one that is indistinguishable from a human being. In the past and even today, many researchers have tried to build such a computer.

Alan M Turing, in 1950, proposed the “imitation game” to test the intelligence of a machine. The Turing Test for Artificial Intelligence basically states that if you were chatting with a human and a machine and couldn’t tell the difference, then the machine is intelligent; it has “reached AI.”

To achieve the technological Nirvana of an intelligent computer, AI researchers use two approaches. The first, the top-down approach, is the creation of an all-knowing machine. Machines are great at storing knowledge, but they are awful at little things like basic common sense. So rather than programming the machine with all the knowledge in the world, program it with all the common sense in the world, and let it loose upon civilisation. This is the approach Douglas Lenat, CEO of Cycorp, is taking with Cyc (pronounced “psych”), a vast database of common-sense knowledge. There are problems with this approach, though.

First, there is the horrifyingly large size of the world’s collective common sense. Second, most people would not be able to tell the difference between such a machine and a human, and it would pass the Turing Test. However, there will always be the odd person who would figure out it was a machine he was talking to, and it would then be easy for him to poke holes in the software.

It’s nice to know a lot, but realistically, you can’t know everything. To be regarded as human, software cannot take shortcuts that we aren’t privy to; it will have to go through life just as we do. This is the reasoning Rodney Brooks of MIT uses for his robot Cog. Rather than attempting to teach it everything, Brooks will let Cog find its own way in the world and learn on its own. This way, Cog would be able to respond to unexpected happenings in a more human-like manner: stumbling at first, but slowly learning from its mistakes.

It is difficult to say which approach will finally create the first truly intelligent machine, but the second approach might just win. The software of the future would be able to modify itself to behave the way users want it to. What’s more, it would do this and still survive the big bad world it lives in.

The real world needs Adaptive Software, which can alter itself to meet users’ demands better, and can do so even in the face of chaos like failing hardware

It’s Scary Out There
Life was easy for programmers when their end users were just computer professionals. There were fewer of them, and this meant fewer letters asking for new enhancements. Today, though, the number of users is staggering, including not just professionals but laypeople as well.

Add to this the complexity of today’s systems: a company that used to have its employees connected to a single mainframe now has a hornet’s nest of desktop PCs and servers spread throughout its network.

And the cherry on the cake is the number of resources that need consideration: hardware, system load, network bandwidth, security and many more. Software needs to be able to trade off some of these resources in order to meet its goals.

Thus far, software has lived in a Utopia where every situation is planned for and every type of scenario is anticipated. The bitter truth is that programmers can’t write applications that will tackle situations they can’t foresee, and can’t anticipate the new and creative ways in which users can cause software to misbehave.

In the real world, then, we need software that can react to and plan for sudden changes in its environment. Software that can alter itself to meet users’ demands better, and can do so even in the face of chaos like failing hardware. The real world needs Adaptive Software.

Er… What Software?

In 1997, DARPA (Defense Advanced Research Projects Agency), an agency of the United States Department of Defense, defined Self-Adaptive Software as software that “evaluates its own behaviour and changes its behaviour when the evaluation indicates that it is not accomplishing what it is intended to do, or that better functionality or performance is possible”.

In nature’s model of Life, the Universe and Everything, creatures have responded to changes in their environment by evolving to suit it; the intention is the same with adaptive software: it alters itself according to its environment. The “environment” for a software application is complex, just like nature’s. There is hardware to live in, the ‘society’ of the operating system, and other software and humans to interact with. But how does software know what to do to adapt to this environment?

Put simply, Adaptive or Self-Adaptive Software “knows itself.” This means that in addition to knowing what it has to do, the software also knows its own structure (how it does what it does) and the many different ways to achieve its purpose. This way, if it isn’t doing its job well, the software will automatically alter itself to do it better.

What Would You Do?
Let’s take a look at how we interact with our environments, and how adaptive software follows the same behaviour in its own environment.

What if you were on your way to the cinema hall and your car broke down? Simple, really. Get out, take a cab, and carry on to your destination. Today’s software is the equivalent of someone who’d sit in the car and cry about being unable to proceed: decidedly unpleasant company. Adaptive software, however, will be able to keep going despite such hardware failures. Like us, it would try to find a different method before giving up altogether. This is called System Level Adaptivity.

Suppose, now, that you are back home, still wondering how to go to that movie theatre. In Mumbai, you could take the train, which would be fast, but crowded and smelly. You could also take the bus, but that’s just uncomfortable. Taxis are too expensive, and you have the sudden premonition that your car is going to break down.

Now that you’ve spent so much time mulling over what to do, you realise it’s really late and that you should be getting on. The best way, then, is the train. In quite a similar manner, adaptive software looks at the status of the system and decides the best approach out of many possible approaches. This is called Algorithm Level Adaptivity.

Now picture yourself budgeting for that shiny new PC you’re going to build. What you could do is find out the exact price of everything right down to the xx99.95 plus taxes. Accurate down to the last paisa, no doubt, but it would take you ages to figure out what you need.

What you could do instead is quickly just find out the approximate prices of everything and be on your way. You’ve traded in accuracy for time. Adaptive software will be able to decide how to make its calculations depending on how much accuracy is really needed for a purpose. If it isn’t a big concern, it would use a less accurate method to save time and system resources. This is called Application Level Adaptivity.

The best adaptive software will be adaptable at all three of these levels.
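To make the three levels concrete, here is a minimal sketch in Python. Every function name, file name and threshold in it is hypothetical, invented purely to illustrate the idea rather than taken from any real adaptive system.

```python
import zlib

def read_config(primary="config.json", backup="config.backup.json"):
    """System Level Adaptivity: if the primary resource fails, fall back to another."""
    for path in (primary, backup):
        try:
            with open(path) as f:
                return f.read()
        except OSError:
            continue                    # that route is blocked; try the next one
    raise RuntimeError("no working copy of the configuration found")

def compress_log(data, cpu_busy):
    """Algorithm Level Adaptivity: pick the approach that suits the system's current state."""
    level = 1 if cpu_busy else 9        # quick-and-light when busy, thorough when idle
    return zlib.compress(data, level)   # `data` is expected to be bytes

def estimate_budget(prices, exact_needed):
    """Application Level Adaptivity: trade accuracy for time when precision isn't critical."""
    if exact_needed:
        return sum(prices)              # accurate to the last paisa
    sample = prices[:10] or [0.0]       # a quick ballpark from a small sample
    return len(prices) * sum(sample) / len(sample)
```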

So How Do They Work?
It’s all very well to talk about what adaptive software is capable of, but we now need more than just science fiction. We need to know just how this “revolution” proposes to deliver what it promises.

We expect software to behave like humans; the first step, then, is to have it think like a human. One of the ways in which adaptive software works is much like how we work. Instead of rushing headlong into a task, it plans its approach, executes the plan and then evaluates the result. If the evaluation shows any negative results, it re-plans the approach and repeats.
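A bare-bones sketch of that loop might look like the following Python fragment; the plan list, the execute and evaluate callables, and the “good enough” threshold are all stand-ins for whatever a real application would supply.

```python
def adapt(goal, candidate_plans, execute, evaluate, good_enough=0.9):
    """Plan, execute, evaluate; if the result disappoints, re-plan and try again."""
    for plan in candidate_plans:          # re-planning here just means moving to the next candidate
        result = execute(plan)            # act on the current plan
        score = evaluate(result, goal)    # how well did that go?
        if score >= good_enough:
            return result                 # the evaluation is positive, so stop adapting
    return None                           # out of ideas; admit defeat
```

In a real system the re-planning step would generate new plans from what the evaluation revealed, rather than simply working through a fixed list.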

Such software could also work like a factory, with inputs and outputs, and a monitor that manages the factory. In case the output is not the best, the input can be tweaked to make it so.

The key to adaptive software is the agent: a piece of software designed to help you meet your goals

Send In The Agents
Today’s software works by reacting to users. To get an application to work just the way you want it to, you need to give it specific instructions. The average human, however, is quite averse to such activity. When we give instructions to other humans, we hardly spell out a detailed plan of action to follow. Instead, we tell them what would make us happiest, and then leave them to their own devices. And while they set out to get us what we want, they do so independently.

The key to adaptive software is the agent: a piece of software designed to help you meet your goals. Just like human agents, all they need is an idea of your preferences rather than a step-by-step plan. They will then act autonomously on your behalf and interact with other agents for an outcome that best suits your preferences. In addition, they keep an eye out for what’s going on in their environment, and can take the initiative to correct the consequences of, say, a network failure.


The bus’s dilemma: which way is the most rewarding?

One thing that might concern us, then, is how to prevent this agent from running wild and wreaking havoc: we need to know that the agent is doing what we want, and not just what it thinks we want. The answer sounds quite simple: teach it! Once we’ve told the agent our preferences, we need to let it have a few practice runs to see how it’s doing, and tell it where it’s going wrong. To do this, we apply a technique called Reinforcement Learning.

Teach Me, Master
Reinforcement Learning (RL) relies on the concept of rewards to tell agents whether they have hit upon an optimal solution or not. The goal of the agent, like all of us, is to maximise the rewards of performing a task.

To understand RL, let us build ourselves a city, and throw in an automated bus system. Passengers would approach a bus-stand, press a button, and soon a bus would be on its way to pick them up.

Let us look at just one of these buses. This bus now has to pick up various passengers across the city and drop them to their destinations. Only once the bus is empty can it turn back and go the other way. What is the best way to go about it? Nobody really knows yet. What we need to do is establish a policy that will define the best way for the bus to fulfil all its duties.

To define our policy, we now need to decide why the “best” way would be the best. For this case, let’s say that our criteria are fuel efficiency (because we are all environmentally conscious) and ensuring that the maximum number of passengers reach their destinations without spending too much time waiting for the bus. Now let’s set up a reward system. Let’s say that the bus gets -5 for running empty (wasting valuable fuel), -1 for each bus-stop with a waiting passenger, and 5 for each passenger who reaches his destination.
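Expressed as code, that reward scheme is a one-screen affair. The dictionary keys below are invented for the sketch, but the numbers are the ones just described.

```python
def reward(bus_state):
    """Score one time-step of the bus using the -5 / -1 / +5 scheme described above."""
    r = 0
    if bus_state["moving"] and bus_state["passengers_on_board"] == 0:
        r -= 5                                     # running empty wastes valuable fuel
    r -= bus_state["stops_with_waiting"]           # -1 for every stop with someone still waiting
    r += 5 * bus_state["dropped_off_this_step"]    # +5 for every passenger delivered
    return r

# An empty bus trundling past two stops full of waiting passengers scores -7
print(reward({"moving": True, "passengers_on_board": 0,
              "stops_with_waiting": 2, "dropped_off_this_step": 0}))
```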

Now, writing software to prepare for every possible scenario above is a mammoth task for programmers. And even after that, if the environment changes (the aging bus might get slower), the programmers need to be called in again. And if we were then to set up another futuristic city with this same system, we’d have to turn to the programmers yet again.

RL is the answer to this problem because it trains the system to do its job rather than having it pre-programmed. This means that if any part of the environment changes, the system will retrain itself without programmer intervention.

But before we send our software agent off to train itself, we need to equip it with some vital information. First, it needs to know the reward function (how rewards are calculated) so that it can decide which course of action would get it the maximum reward.

Still, merely knowing the immediate rewards of an action is not enough. The agent also needs to know the long-term rewards of a series of actions. Consider this: suppose there are passengers at bus-stops A, B and C and each of them wants to go to bus-stop D.

The agent might just calculate that there are high rewards in taking A to D. But then, it would have to return for B. The logical choice, we can see, is to pick up all three and then head to D. The long-term consequences of the agent’s actions are captured in the utility function. So the best solution for the agent will involve high immediate and long-term rewards. In this way, Reinforcement Learning starts with an initial guess at the utility function, and refines it as it explores the system.
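One standard way to refine that guess is a temporal-difference update, which nudges the utility of a state towards the reward just received plus the discounted utility of the state that follows. The snippet below is a generic illustration of the idea rather than a recipe from the article; the states and reward numbers are made up.

```python
def update_utility(U, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Nudge U[state] towards reward + discounted utility of the next state."""
    U.setdefault(state, 0.0)
    U.setdefault(next_state, 0.0)
    U[state] += alpha * (reward + gamma * U[next_state] - U[state])

# Toy run: small immediate penalties for the detours via B and C,
# but a big payoff once everyone is dropped at D.
U = {}
for _ in range(200):   # many simulated trips
    for state, next_state, r in [("A", "B", -1), ("B", "C", -1), ("C", "D", 15)]:
        update_utility(U, state, next_state, r)
print(U)   # the payoff at D has propagated back: picking everyone up looks worthwhile
```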

And finally, when our agent has explored the system enough and trained itself for the best utility function, it can be deployed into the real network.

To implement adaptive code, we need a Dynamic Language: one that allows software to be changed even at runtime

What we’ve described here is a single controlling agent for our system. In a lot of applications, though, we would need many agents interacting with each other. The RL algorithm in such cases would need to be modified to incorporate ideas from fields such as game theory and economic market theory.

All is not hunky-dory with RL, though. As any human agent will tell you, you can’t always get what you want. This holds true for the software agent as well. Everything cannot be optimised at once, so the software agent will make a few trade-offs, and you will need to tell it how much importance you attach to resources and to results. You also never know what is going to happen: the software may not be quick enough to respond to sudden changes in its environment, and performance would suffer for a while.

Is It… Human?
Another way for agents to learn is the Genetic Algorithm, which is designed to achieve the same goals as Reinforcement Learning, but uses a different approach. Instead of learning a utility function and deriving the best policy from it as time goes on, the Genetic Algorithm learns the best policy itself.

Genetic Algorithms are so called because of their close resemblance to biological evolution. The Genetic Algorithm creates a pool of possible policies and evaluates each one to see what rewards it earns. By combining successful policies, a new pool of policies is created. This is akin to biological reproduction and the process of natural selection: by combining the best of the gene pool, a newer, stronger gene pool is created, and the rest just die. This process continues until the best possible policy is obtained, the software version of “Survival of the Fittest.”
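A toy version of that loop in Python might look like this. It assumes a policy is simply a fixed-length list of numbers and that `fitness` is whatever reward function the problem defines; both assumptions are made purely for the sake of the sketch.

```python
import random

def evolve(policies, fitness, generations=50, keep=0.2, mutation_rate=0.1):
    """Keep the fittest policies, breed them, mutate a little, repeat."""
    for _ in range(generations):
        policies.sort(key=fitness, reverse=True)                 # survival of the fittest
        survivors = policies[: max(2, int(len(policies) * keep))]
        children = []
        while len(survivors) + len(children) < len(policies):
            mum, dad = random.sample(survivors, 2)
            cut = random.randrange(1, len(mum))
            child = mum[:cut] + dad[cut:]                        # crossover: combine two parents
            if random.random() < mutation_rate:                  # the occasional random mutation
                child[random.randrange(len(child))] = random.random()
            children.append(child)
        policies = survivors + children
    return max(policies, key=fitness)                            # the best policy found

# Example: evolve a 5-number policy whose "fitness" is simply the sum of its entries
best = evolve([[random.random() for _ in range(5)] for _ in range(20)], fitness=sum)
```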

Deciding between an RL and a Genetic Algorithm approach is difficult. Reinforcement Learning would work better in bigger, more complex systems, while Genetic Algorithms would be the preferred choice for a small system with limited scope: the fewer policies they have to sift through, the more efficient they become.

The Viable Systems Model 
The Viable Systems Model specifies a generalised model for agent-based systems. It breaks down self-adaptive software into six major systems:
System One (S1) – Operations: The dirty workers; one or more S1s perform the basic operations of the application.
System Two (S2) – Co-ordination: The foremen; S2s co-ordinate the activities of the S1s.
System Three (S3) – Control: The management guys; resource planning and strategic planning.
System Three* (S3*) – Audit: The inspector; monitors the progress of the S1s and sees whether all is going according to plan.
System Four (S4) – Intelligence: The navigator; surveys the environment and plans the way ahead.
System Five (S5) – Policy: The Big Boss; defines the purpose of the entire system, based on the ‘world view’ provided by S4.

Decisions, Decisions
Thus far, we’ve talked about agents making decisions, but how do they “decide”? They are, after all, computer programs, and all they know is zero or one, true or false.

The answer is a Probabilistic Network, which selects an action based on the information presented to it by the surrounding environment. Decisions are made based on the probability of a favourable outcome from an action, given all that is happening in the surroundings. The term “network” is a little misleading: probabilistic networks are a way of depicting probability problems, rather than actual, physical networks.

We now need to talk about the Bayesian or Belief network, which makes decisions based on Bayes’ Theorem, developed by the Rev Thomas Bayes. The gist of the theorem is that the strength of our belief in a hypothesis, given some additional evidence, depends on the prior probability of the hypothesis, as well as on the probability of observing that evidence if the hypothesis were true.
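In symbols, the belief in hypothesis H after seeing evidence E is P(H|E) = P(E|H)P(H) / P(E). Here is a tiny numerical example in Python; the scenario and probabilities are invented purely for illustration.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Bayes' Theorem: updated belief in H after observing evidence E."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # total probability of seeing E
    return p_e_given_h * p_h / p_e

# Suppose the agent thinks there is a 30% chance the network is congested (H),
# and a slow response (E) is 90% likely under congestion but only 20% likely otherwise.
print(posterior(0.30, 0.90, 0.20))   # about 0.66: the slow response strengthens the belief
```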

Starting from an initial hypothesis, the software goes on to evaluate the possible Bayesian networks that would maximise the probability of the hypothesis being true, finally settling on a structure that gives the maximum reward.

So we now have an agent using Bayesian networks to predict the utility function that will maximise its rewards. After it has made its initial prediction, it selects the best method to achieve this function and sets about executing it. The agent then monitors the rewards of the system and keeps modifying its idea of the utility function to reap greater rewards.

The Tools Of The Trade
To implement adaptive code, we need a Dynamic Language: one that allows software to be changed even at runtime. Languages like C demand that the programmer have a clear idea of the structure that the program is going to take. Having a complete design for adaptive software before getting down to building it is a horrifying, if not completely impossible, task.

What we need to do is write these programs in such a way that their structure can be modified as and when necessary. The two preferred languages that support this are Apple’s Dylan and the Common Lisp Object System (CLOS).

Unlike regular programming, AI programming doesn’t use the “if-this-then-that” approach. What it relies on is the encoding of information about the “world” it runs in. When an event occurs in the system, the program uses the information it has about the world to draw its own conclusions about what happens.

Lisp and Dylan have always been favourites with AI programmers because they allow the programmer to add more and more information about the world at runtime, essentially teaching the program new things.
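The article’s languages of choice are Dylan and CLOS, but the same trick can be sketched in any dynamic language. The Python fragment below, with entirely made-up names, shows a running program acquiring both a new fact and a new behaviour without being stopped or recompiled.

```python
class WorldModel:
    """Starts out knowing nothing; facts and behaviours are added while it runs."""
    def __init__(self):
        self.facts = {}

    def learn(self, name, value):
        self.facts[name] = value

model = WorldModel()
model.learn("buses_in_city", 12)          # a new fact about the world

def busiest_hour(self):                   # a brand-new behaviour, written "later"
    return self.facts.get("peak_hour", "unknown")

WorldModel.busiest_hour = busiest_hour    # attach it to the class at runtime
print(model.busiest_hour())               # existing objects pick it up immediately
```
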
True intelligence can only emerge if a program goes beyond what the programmer foresaw

What Is It Good For?
The goal of developing adaptive software is to make software more like living beings. Just as in a living society, software agents will observe, learn from and co-operate with their environment as well as with other agents. The possibilities for such software are endless.

The aim, of course, is to finally reach that Holy Grail of computing, a HAL 9000: a computer that could converse with humans as if it were human itself. It could learn about you and speak to you only about things that you like talking about.

A HAL 9000 built without adaptability would behave the same whether it was interacting with a 40-year-old nuclear physicist or a two-year-old child. An adaptive HAL 9000, however, would discover that the interaction with the two-year-old wasn’t really working out so well, and would soon alter its approach to get the best results, eventually reaching the level of baby-talk.

Software will also be more resilient. Server software running on multi-processor systems would quickly adapt itself if one of those processors failed (provided, of course, that the hardware is built to allow it), keeping the server alive.

Software could be developed that would monitor your PC’s performance, and give up some of its resource-hungry tasks to ensure that it is always responsive to your actions. Adaptive software would also have enough “self-awareness” to detect a malfunctioning algorithm within itself and be able to still run, albeit a little crippled.

Robots today are built for specific tasks, and programming them to take on new responsibilities is cumbersome. But with adaptive software, robots can take on new roles easily. Adaptive software can also be used in robot teams: say, a robotic football team where each team member can take on a new role depending on its position on the field.

Gaming AI is at an incredible level today. Your opponents will choose fantastic strategies to cook your goose, and you need to push your abilities to the limit to figure out how to beat them. But once you have figured out their shortcomings, the game becomes a snap.

Now imagine this: a game that realises that its old strategies aren’t going to work anymore. It’s being defeated, and there are no rewards in being defeated, so it sets about figuring out how to beat you. It observes you, gets to know you, and learns your favourite moves and weaknesses. It then uses this knowledge to hit upon the best way to turn you to dust. No corner would be safe; you would have to give up your standard moves and trade them in for new approaches to the game every time. The agents in the game would be as dangerous as real human players, perhaps even more so.

When Will This Happen?
We are already seeing little instances of adaptive software in our lives. Microsoft’s Office Assistant is one of the earliest examples of software that tries to anticipate user needs. Today, Google’s AdSense program has some semblance of adaptivity-it shows you advertisements it thinks you are most likely to click. Adaptability becomes more difficult to implement as the purpose gets more complex, but the Agents are already among us. You never know where one might pop up.

We should see the first examples of truly adaptive software appearing in places where the efficiency and survival of the system are foremost: server applications that can adapt to the changing load on the system, and are immune to minor hardware failures.

After this, we should see adaptive software extending itself to more complex (read: more fun) applications such as gaming. Game developers are always trying to make more challenging games; soon every computer-generated opponent will be an agent.

Perhaps most importantly, adaptive software will take us along the road to true AI. A system can exhibit intelligent behaviour even if it’s dumb, but true intelligence can only emerge if a program goes beyond what the programmer foresaw.

Intelligent software explores its surroundings, learns from them and thus evolves beyond what it was originally intended for. In other words, it should behave like a living being. Adaptive software aims precisely at that. Think about it: today’s programs entirely reflect the intelligence of the programmer, who has to baby-sit the program for its lifetime. Tomorrow’s programmers will be able to let their creations loose-to let them go forth, prosper and multiply!


Team Digit
