Discovery Learning -- AM and Eurisko ==================================== Introduction ------------ AM and Eurisko are frame-based systems which learn by discovery. Both systems were developed by Doug Lenat. AM was developed in 1976 in Interlisp, and ran on a Digital PDP-10. Its longest run was 1 CPU hour. It carried out research in elementary mathematics, and discovered, along with many others, the following concepts: Subsets Disjoint sets Sets with the same number of elements Numbers Addition, subtraction, multiplication Doubling, halving Squaring, square roots Exponentiation Factors Unique prime factorization Goldbach's conjecture Pythagorean triples Maximally divisible numbers ---------- Design of AM ------------ AM's concepts were represented as frames. AM was initially supplied with 100 concepts, most of whose slots (Lenat calls them facets) were unfilled. AM was to fill in the other slots, and create new concepts. AM decided what to do next by maintaining and using an agenda of promising tasks. The task with the highest priority was worked on until its resources were used up, after which it was removed from the agenda. Tasks contained an activity, such as "check" or "fill in", and referred to a facet of a concept. In addition, they contained a set of reasons, which determined their priority, which in turn determined their place on the agenda. When a task was selected for execution, the facet of the concept it mentioned, together with the same facet on the concept's generalizations (which formed a hierarchy), were inspected for a subfacet mentioning the activity of the task. The subfacets contained heuristics (forward chaining IF-THEN rules) which were coded directly in LISP. (Each heuristic had been attached to the most general concept to which it was potentially relevant.) The heuristics which were found during the search of the generalization hierarchy were sorted in order of increasing generality (i.e. the search was breadth first). They were then run, as a production system, until the task's resources (calculated from its priority) were used up. AM had about 250 heuristics. When a heuristic was executed, all of its actions were performed. The actions either (1) Suggested a new task to add to the agenda, (2) Dictated how some new concept was to be defined, or (3) Added some entry to some facet of some concept. New tasks were added to the agenda along with the reasons for proposing them. If the task was already on the agenda but for different reasons, its priority was increased. ---------- A Sample Run ------------ Most accounts of AM given in AI textbooks print a "sanitized" version of a sample run. Here, part of an actual run is printed, warts and all. Here, AM is generalizing an existing concept -- equality -- into two new concepts, "same size" and "same first element", by modifying the LISP code for equality. Same size turned out to be important, and led to AM discovering numbers. The user interface was not considered important. (Source: Knowledge-Based Systems in Artificial Intelligence, by Randall Davis and Douglas B. Lenat) Beginning 9th cycle. Considering genlizing a recursive defn of OBJ-equal Will try to remove a conjunct. 2 possible conjuncts to choose from. AM generalizes OBJ-EQUAL into the new concept GENL-OBJ-EQUAL, by not recursing on the CAR of each arg. i.e. GENL-OBJ-EQUAL will not have a recursive check like this one, which is present in OBJ-EQUAL: (APPLYB (QUOTE OBJ-EQUAL) (QUOTE DEFN) (CAR BA1) (CAR BA2)) AM generalizes OBJ-EQUAL into the new concept GENL-OBJ-EQUAL-1, by not recursing on the CDR of each arg. i.e. GENL-OBJ-EQUAL-1 will not have a recursive check like this one, which is present in OBJ-EQUAL: (APPLYB (QUOTE OBJ-EQUAL) (QUOTE DEFN) (CDR BA1) (CDR BA2)) If any of (GENL-OBJ-EQUAL GENL-OBJ-EQUAL-1) ever seems to be too specialized, AM will consider disjoining it with other members of that set. Filled in generalizations of OBJ-EQUAL. 0 generalizations existed originally on OBJ-EQUAL. 2 potential entries were just proposed. Eliminating duplicates, the newly constructed entries are: GENL-OBJ-EQUAL GENL-OBJ-EQUAL-1 After eliminating duplicate and already-known entries, AM finds that all 2 new, distinct generalizations of OBJ-EQUAL had to be added. Do-thresh raised from 264 to 355. This cand used 6.667 cpu seconds. The top 3 Cands are: 1: Fill in some examples of Genl-obj-equal-1 2: Fill in some examples of Genl-obj-equal 3: Fill in some examples of Oset-struc I choose first Cand. OK? YES The reason is: (The generalization GENL-OBJ-EQUAL-1 of OBJ-EQUAL is relatively new and has no exs of its own yet, excepting those of OBJ-EQUAL) Beginning 9th cycle. ?: N Rename which existing concept? GENL-OBJ-EQUAL What is its new name? SAME-SIZE Done. ---------- A Typical AM Concept -------------------- NAME: Prime Numbers DEFINITIONS: ORIGIN: Number-of-divisors-of(x) = 2 PREDICATE-CALCULUS: Prime(X) = (VZ)(Z | X -> Z = 1 # Z = X) ITERATIVE: (for x > 1): for i from 2 to x - 1, ~(i | x) EXAMPLES: 2, 3, 5, 7, 11, 13, 17 BOUNDARY: 2, 3 BOUNDARY-FAILURES: 0, 1 FAILURES: 12 GENERALIZATIONS: Numbers, Numbers with an even number of divisors, Numbers with a prime number of divisors SPECIALIZATIONS: Odd-primes, prime-pairs, Prime-uniquely-addables CONJECS: Unique factorization, Goldbach's conjecture, Extremas of divisors-of ANALOGIES: Maximally divisible numbers are converse extremes of divisors-of INTEREST: Conjec's tying prime to times, to divisors of, to closely related operations WORTH: 800 ---------- A Typical Entry on the Agenda ----------------------------- ACTIVITY: FILL-IN some entries FACET: for the GENERALIZATIONS facet CONCEPT: of the PRIMES concept REASONS: because (1) There is only one known generalization of Primes so far. (2) The worth rating on Primes is now very high. (3) Focus of attention: AM just worked on Primes. (4) Very few numbers are Primes; a slightly more plentiful concept may be more interesting. PRIORITY: 350 [on a scale 0-1000] ---------- A Typical AM Heuristic ---------------------- IF the current task was "Fill in examples of X" for some predicate X AND over 100 items are known in the domain of X AND at least 10 CPU seconds have been spent so far on this task AND X has returned True at least once AND X has returned False over 20 times as often as True THEN add the following task to the agenda: "Fill in generalizations of X" for the following reason: "X is rarely satisfied; a slightly less restrictive predicate might be much more interesting" This has a rating which is the False/True results ratio. ---------- Limitations of AM ----------------- AM would, on a good run, execute about 200 tasks before running out of memory. During a good run, about 200 new concepts would get created, of which 25 were estimated by Lenat to be good, and a further 100 acceptable. The performance of the program (the proportion of concepts created which were good) degraded towards end of a run. AM received no help from the user, other than occasionally renaming a concept (which affected the readability of the program's output, but not the concepts it discovered. The reason why AM's performance eventually degraded was because it was unable to create new heuristics, which meant that it was forced to use heuristics further and further away from the concepts it was creating and filling in. For example, when it worked on prime- pairs (pairs of primes separated by 2, e.g. 17 and 19), it was forced to use heuristics originally developed for set union. While these heuristics were still valid, they were too general to deal with concepts such as primes and arithmetic. PRIME-PAIRS | +---------+---------+ | | PRIMES | | | DIVISORS-OF | | | MULTIPLICATION | | | ^ +---------+---------+ | | | ADDITION (Synthesized by AM) | UNION (supplied by Lenat) The other limitations pertained to missing mathematical knowledge. AM never developed the concepts "infinity" or "proof". ---------- Eurisko ------- Eurisko is the successor of AM. It learns new domain concepts and heuristics -- in fact, heuristics are considered to be another domain. It has been applied to the following domains (topics): Wargames Elementary Mathematics LISP Programming VLSI Design Biological Evolution Games (Noughts and Crosses, Go) Heuristics Knowledge Representation Oil Spill Cleanups It is written in a mixture of RLL (a frame language) and Interlisp-D, and runs on a Xerox 1100 LISP Machine. Eurisko differs from AM in the following respects: (1) Each kind of slot has a unit describing it. (2) The control structure of Eurisko is represented as part of its knowledge base -- Eurisko can modify its own code. (3) Eusisko has a separate agenda for each topic, and can create, merge and eliminate agendas. (4) Each heuristic is itself a concept, with different types of IF and THEN slots, e.g. IF-POTENTIALLY-RELEVANT IF-RESOURCES-AVAILABLE THEN-ADD-TASK-TO-AGENDA THEN-CONJECTURE (5) Eurisko always has the initiative -- the user can request, but never demand. ---------- Domains Suitable for Discovery Learning Programs ------------------------------------------------ The domain should have the following characteristics: (1) It should be as little explored as possible. (2) There must be a way to simulate -- or directly carry out -- experiments. (3) The search space should be too immense for other methods to work. (4) There should be many objects, operators, kinds of objects and kinds of operators. They should be related hierarchically and in other ways. (5) The task domain should be rich in heuristic structure. (6) There must be ways to generate, to prune, and to evaluate. (7) The language used to represent concepts must be appropriate, given the set of objects and operators. ----------