MACHINE LEARNING ================ Introduction ------------ The most difficult and time comsuming step in developing an expert system is extracting the knowledge from the expert and putting it into the form(s) used by the expert system shell (usually rules). This is called the "Feigenbaum bottleneck" after the person who initially drew attention to it. The ultimate solution to this problem is the development and use of systems capable of learning. Machine learning is still very much a research area, and a very active one in AI, with a journal specializing in it; however, there is one area which is already well understood: inducing rules from examples. ---------- Rule Induction -------------- The algorithm shown below learns new rules from experience: loop Build up a state description -- this will act as the condition of a production; If there is a production whose condition unifies the state description, then fire it; otherwise, select an action; Evaluate the results of the production just fired; Induce new rules by finding generalizations of rules end loop Building up a state description can be done in a number of ways, the weakest and most general involving generate and test: select a ground literal (atomic formula without variables) relevant to the domain. Neither the ground literal nor its negation should be deducible from the ground literals already in the state description by standard theorem proving techniques. This is repeated until a suitable number of items are present in the state description -- as many as can be expected to be contained in WM. Selecting an action can be done by finding the production whose conditions most closely match WM, and making its action the action of a new production whose conditions are the contents of WM. If no existing production matches, select an action at random. After a goal has been achieved, or at some other convenient time, the system should apportion credit to the productions used in attempting to achieve the goal. Credit may be in the form of deleting the production if it is particularly bad, or attaching an explicit priority to the rule or (equivalently) changing its position in PM. This step is sometimes referred to as "post mortem" or "credit assignment". Generalizations can be made by two basic methods: (1) removing a condition of a production (2) replacing a constant by a variable Removing a condition can be done if several productions have the same action, and the same conditions except one, thus: p ^ q ^ x -> a p ^ q ^ y -> a ==> p ^ q -> a p ^ q ^ z -> a The production with the generalized condition should then be checked to ensure that it does not clash with other productions (two productions clash if one has a subset of the conditions of the other, in which case they could both be in the conflict set simultaneously. In this case, the least general production would apply. In the worst case, if two productions have exactly the same conditions but different actions, there is an inconsistency.) Replacing a constant by a variable can be done if several productions have the same condition except for a single atom p(a) -> a p(b) -> a => p(X) -> a p(c) -> a This basic learning process can be improved by the use of heuristics -- special metarules for guiding the learning process, or with expert guidance from the user. ---------- Concept Learning ---------------- Concept learning is achievable using a similar method to rule induction. A concept represented by a frame can equally be represented in FOPC as attribute(objectI, value1) ^ ... ^ attributeN(objectI, valueN) -> instance(objectI, concept) Learning of the concept is achieved by obtaining descriptions of examples of it which are "near misses", and of examples which only just satisfy the concept. People often learn new concepts from a single example. This suggests that they use knowledge to decide which attributes are significant and which are not. For example, even if you have never seen a caddisfly before, you could use analogy, combined with general knowedge about insects to guess that typical caddisflies have four wings, but that their colour is unimportant -- after being shown a single caddisfly, and noticing that it looks a little like a moth, since moths (and other insects) come in different colours, but the same type of insect generally has the same number of wings, e.g. flies have two, moths have four, fleas have none. ---------- Generalization, Specialization and Analogy ------------------------------------------ When a rule is generalized, it becomes applicable in more situations, but its "hit rate" of successful applications often diminishes. The opposite effect is true of specializing rules. Specialization can either be performed by adding new conditions or replacing a variable with a constant. A specialized rule is often more successful than the more general rule it was created from, but over a smaller area of applicability. Rules need not be specialized in exactly the opposite way from the way they were generalized, and thus new rules can be created which perhaps have a constant replaced by a different constant, or with a single condition replaced by another. This process of generalization and subsequent specialization in a different direction is equivalent to analogy. ---------- Analogical Problem Solving -------------------------- Macrops -- short for "macro operators", can be generated as a by product of the planning process. They are simply sequences of productions which can be used consecutively for achieving a particular plan, or part of a plan. Example: goal = on(a, b) initial state = on(b, a) ^ ontable(a) ^ clear(b) ^ handempty solution sequence = unstack(b, a), putdown(b), pickup(a), stack(a, b) The macrop can be retrieved whenever the current goal matches the goal of the macrop, in which case the new goal is to achieve the initial state in which the macrop applies. A major problem with this approach is that the number of macrops can become so large that checking each of them in turn for appropriateness can in fact be slower than regenerating the plan from scratch. If this happens, heuristic knowledge again becomes essential. Another problem is that the path constraints (goals that must be satisfied throughout the path) may vary, even though the initial state and the goal be the same. Because of these problems, analogical reasoning may be valuable where the new problem varies only slightly from the original problem. For example, supposing that the initial state contains on(a, c) instead of ontable(a). Then, the original macrop would not be valid. However, simply substituting unstack(a, c) for pickup(a) would enable a slight variant of the original macrop to apply. This change could be effected by applying the metarule IF there is a macrop whose initial state has ontable(a) where the initial state of the current problem contains on(a, X) AND the initial state does not otherwise differ from the current state AND the macrop has a solution sequence containing pickup(a) THEN replace pickup(a) by unstack(a, X) Carbonell (in Machine Learning, an Artificial Intelligence Approach), advocates the use of MEA in an "analogy transform problem space" or "T-space" as a method of analogical reasoning. T-space consists of nodes which represent solution sequences, connected by arcs representing T-operators. Carbonell gives the following list of T- operators: Insert a new operator. Delete an operator. Splice in a solution to a new subproblem. Replace an operator. Add operators to either the start or end of the solution sequence. Merge two solutions. Reorder operators. Substitute values for parameters. Eliminate unnecessary operators. Invert the sequence. The last one could be contained in the metarule IF there is a macrop whose goal matches the current state AND the initial state of the macrop matches the current goal THEN invert the solution sequence by reversing it and then substituting each operator by its inverse It should be noted that the nodes in T-space do not necessarily correspond to possible solutions to any problems. Application of MEA in T-space involves the reduction of a difference vector consisting of the four components: (1) the difference between initial states in the original space, (2) the difference between goal states in the original space, (3) the difference between the path constraints, and (4) the applicability of the old solution in the new situation, i.e. the degree to which the initial state of the old solution matches the initial state in the current problem. ---------- LEX -- A Program that Learns from Experience -------------------------------------------- LEX is a program which learns problem-solving heuristics in integral calculus. It was developed by Tom Mitchell. It learns by generating problems to solve, solving them, and examining its solution. The algorithm it uses is: loop generate problem (heuristics) -> a problem; solve problem (problem) -> a search tree; critique problem solution (search tree) -> +ve and -ve instances; refine heuristics (heuristics, +ve and -ve instances) -> new or better heuristics end loop; The heuristics take the form version space -> action The version space contains possible conditions for which the action is appropriate, ranging from the most specific (originally the situation where the heuristic first worked), to the most general (originally, all situations). In practice, the version space is completely specified by just the most specific and most general instances, e.g. most specific | 3x cos(x) dx most general | f(x) g(x) dx -> use integration by parts The problem solver accepts a problem as input, and uses search, guided by the heuristics, to produce a search tree. The search tree will contain a successful path, with unsuccessful branches leading off this. The problem solver has resource limits (the most important being run time) imposed upon it. The critic accepts the search tree produced by the problem solver, and assigns credit (or blame) to the heuristics used in solving the problem. The output of the critic is the set of positive and negative instances of heuristic application. Rather than simply attaching blame to the unexpanded branches, the critic has the option of calling upon the problem solver to check whether their expansion leads to a solution. The generalizer uses the positive and negative instances to make the heuristics used more exact, i.e. to try to make the most general instance of the heuristic more specific, and the most specific instance more general, until the two instances converge. The problem generator generates problems which can then be used by the rest of the system to produce new heuristics or reduce the version space of existing ones. LEX was developed beyond the program described here. In particular, a later version was goal-directed. It was found that the system's performance improved when it had explicit knowledge of its learning goal. ----------