.control characters
.page size 60
.left margin 8
.right margin 73
.subtitle_ _ _ _ _ _ _ _ Anderson, Parallel System
.no flags accept *"1
.c 80
Cognitive Capabilities of a Parallel System
.b
.c 80
James A. Anderson
.b
.c 80
Center for Neural Science,
.c 80
Department of Psychology,
.c 80
and Center for Cognitive Science
.b
.c 80
Brown University
.b
.c 80
Providence, RI 02912
.b
.c 80
U.S.A.
.b
.c 80
March 3, 1985
.b2
.c 80
Paper presented at
.c 80
NATO Advanced Research Workshop
.b
.c 80
^&Disordered Systems and Biological Organization\&
.b
.c 80
Centre de Physique des Houches
.b
.c 80
74310 Les Houches, France
.b4
.c 80
Abstract
.p
A number of parallel information processing systems have been proposed which are loosely based on the architecture of the nervous system. I will describe a simple model of this kind that we have studied over the past decade. Perhaps surprisingly, the major testable predictions of these systems fall in the realm of cognitive science: parallel, distributed, associative systems seem to have pronounced 'psychologies'. They perform some 'computations' very well and some very poorly. It then becomes a psychological question as to whether humans show the same pattern of errors and capabilities. I will briefly describe the theory behind the models, discuss some of the psychological predictions they generate, and describe the results of large simulations of them. Specifically, I will discuss psychological concept formation, generation of semantic networks in distributed systems, and use of the systems as distributed data bases and somewhat simple-minded expert systems.
.page
.c 80
Any mental process must lead to error.
.i34
Huang Po (9th c.)
.p
This paper is about a psychological model that makes contact with some current work in parallel computation and network modelling. Most of its applications have been to psychological data.
It attempts to be an interesting psychological model in that it learns, retrieves what it learned, processes what it learns, and shows hesitations, mistakes, and distortions. The scientific interest in the approach lies in the claim that the patterns of errors shown by the model bear a qualitative similarity to those shown by humans.
.p
Many of the talks at this conference will discuss similar models, viewed from a different orientation. There has been a recent burst of enthusiasm for parallel, distributed, associative models as ways of organizing powerful computing systems and of handling noisy and incomplete data. There is no doubt that such systems are effective at doing some extremely interesting kinds of computations; almost certainly they are intrinsically better suited to many kinds of computations than traditional computer architectures.
.p
However, such architectures have very pronounced 'psychologies': though they do some things well, they do many things extremely poorly, and they can produce 'errors' as a result of satisfactory operation. When one talks about psychological systems, the idea of error becomes rather problematic: one of the tasks of a biological information processing system is to simplify (i.e. distort) the world so that complex and highly variable events fall into equivalence classes and can be joined with other events to generate appropriate responses. Psychological ideas like concepts can be viewed as essential simplifications: deciding what data can be ignored and what is essential.
.p
The brain can be viewed as an engineering solution to a series of practical problems posed by nature. It must be fast and right much of the time. Solutions can be 'pretty good': a pretty good fast solution often makes more biological sense than an optimal slow solution. There is a strong bias toward action, as many have noted.
.p
If one were able to construct a computing system that mimicked human cognition, one might not be too pleased with the results.
It is possible that a brain-like computing system would show many of the undesirable features of our own minds: gross errors, unpredictability, instability, and even complete failure. However, such a system might be a formidable complement to a traditional computer because it could then have the ability to make the good guesses, the hunches, and the suitable simplifications of complex systems that are lacking in traditional computer systems but at which humans seem to excel.
.p
I will describe below a consistent approach to building a parallel, distributed, associative model and point out some aspects of its psychology that should interest those who approach such systems from a different perspective. Several examples of cognitive computations using the system will be given: a distributed antibiotic data base, an example of qualitative physics, and an example of a distributed system that acts like a semantic network.
.p
^&Stimulus Coding and Representation.\& We have many billions of neurons in our cerebral cortex. The cortex is a layered two-dimensional system which is divided up into a moderate number (say 50) of subregions. The subregions project to other subregions over pathways which are physically parallel, so one large group of neurons projects to another large group of neurons.
.p
It is often not appreciated how much of what we can perceive depends on the details of the way the nervous system converts information from the physical world into discharges of nerve cells. If it is important for us to be able to see colors, or line segments, or bugs (if we happen to be a frog), then neurons in parts of the nervous system will respond to color, edges, etc. Many neurons will respond to these properties, and the more important the property, the more neurons potentially will have their discharges modified by that stimulus property.
.p
My impression is that much, perhaps most, of the computational power of the brain is in the details of the neural codes, i.e. the biologically proven representation of the stimulus. Perhaps the brain is not very smart. It does little clever computation; instead it performs powerful, brute force operations on information that has been so highly processed that little needs to be done to it. However, the pre-processing is so good, and the number of elements so large, that the system becomes formidable indeed.
.p
Our fundamental modelling assumption is that information is carried by the set of activities of many neurons in a group of neurons. This set of activities carries the meaning of whatever the nervous system is doing. Formally, we represent these sets of activities as state vectors. Percepts, or mental activity of any kind, are similar if their state vectors are similar. Our basic approach is to consider the state vectors as the primitive entities and try to see how state vectors can lawfully interact, grow, and decay. The elements in the state vectors correspond to the activities of the moderately selective neurons described above: in the language of pattern recognition, we are working with state vectors composed of great numbers of rather poor features. Information is represented as state vectors of large dimensionality.
.p
^&The Linear Associator.\& It is easy to show that a generalized synapse of the kind first suggested by Donald Hebb in 1949, and called a 'Hebb' synapse, realizes a powerful associative system. Given two sets of neurons, one projecting to the other, and connected by a matrix of synaptic weights A, we wish to associate two activity patterns (state vectors) f and g. We assume A is composed of a set of modifiable 'synapses' or connection strengths. We can view this as a sophisticated stimulus-response model.
.p
We make two quantitative assumptions. First, the neuron acts to a first approximation like a linear summer of its inputs.
That is, the i^&th\& neuron in the second set of neurons will display activity g(i) when a pattern f is presented to the first set of neurons, according to the rule
.literal
g(i) = \sum_j A(i,j) f(j),
.end literal
where A(i,j) is the connection between the i^&th\& neuron in the second set of neurons and the j^&th\& neuron in the first set. Then we can write g as the simple matrix multiplication
.literal
g = A f.
.end literal
.p
Our second fundamental assumption involves the construction of the matrix A, with elements A(i,j). We assume that these matrix elements (connectivities) are modifiable according to the generalized Hebb rule, that is, the change in an element of A, \delta A(i,j), is given by
.literal
\delta A(i,j) \propto f(j) g(i).
.end literal
.p
Suppose initially A is all zeros. If we have a column input state vector f, and response vector g, we can write the matrix A as
.test page 5
.literal
A = \eta g f^T
.end literal
where \eta is a learning constant. Suppose that after A is formed, vector f is input to the system. A pattern g' will be generated as the output of the system according to the simple matrix multiplication rule discussed before. This output, g', can be computed as
.test page 6
.literal
g' = A f = \eta g f^T f = \eta [f,f] g \propto g,
.end literal
since the square of the length of f, [f,f], is simply a constant. Subject to a multiplicative constant, we have generated a vector in the same direction as g. This model and its variants have been discussed in many places (Anderson, 1970; see especially Kohonen, 1977, 1984). It is powerful, but has some severe limitations.
.p
^&Categorization\& The model just discussed can function as a simple categorizer by making one assumption. Let us make the coding assumption that the activity patterns representing similar stimuli are themselves similar, that is, their state vectors are correlated. This means the inner product between two similar patterns is large. Now consider the case described above where the model has made the association f \to g.
Let us restrict our attention to the magnitude of the output vector that results from various input patterns. With an input pattern f',
.literal
(output pattern) = g [f,f']
.end literal
If f and f' are not similar, their inner product [f,f'] is small. If f is similar to f', the inner product will be large. The model responds to input patterns based on their similarity to f. This suggests that the perceived similarity of two stimuli should be systematically related to the inner product [f,f'] of the two neural codings. This is a testable prediction in some cases. Knapp and Anderson (1984) discuss an application of this simple approach to psychological concept formation, specifically the learning of 'concepts' based on patterns of random dots.
.p
The form a model for concept learning takes depends on an underlying model of memory structure. Two important classes of simple concept models exist in psychology: 'exemplar' models, where details of single presentations of items are stored, and 'prototype' models, where a new item is classified according to its closeness to the 'prototype' or best example of a category.
.p
Consider a situation where a category contains many similar items. Here, a set of similar activity patterns (representing the category members) becomes associated with the same response, for example, the category name. It is convenient to discuss such a set of vectors with respect to their mean. Let us assume the mean is taken over all potential members of the category.
.p
Specifically, consider a set of correlated vectors, {f}, with mean p. Each individual vector in the set can be written as the sum of the mean vector and an additional noise vector, d, representing the deviation from the mean, that is,
.literal
f_i = p + d_i .
.end literal
.p
If there are n different patterns learned and all are associated with the same response g, the final connectivity matrix will be
.test page 5
.literal
A = \sum_{i=1}^{n} g f_i^T
.end literal
.test page 4
.literal
  = n g p^T + g \sum_{i=1}^{n} d_i^T
.end literal
.p
Suppose that the term containing the sum of the noise vectors is relatively small, as could happen if the system learned many randomly chosen members of the category (so the d's cancel on the average and their sum is small) and/or if d is not very large. In that case, the connectivity matrix is approximated by
.test page 5
.literal
A = n g p^T .
.end literal
.i0
The system behaves as if it had repeatedly learned only one pattern, p, the mean of the set of vectors it was exposed to. Under these conditions, the simple association model extracts the prototype just like an average response computer. In this respect the distributed memory model behaves like a psychological 'prototype' model, because the most powerful response will be to the pattern p, which may never have been seen. This result is seen experimentally under appropriate conditions.
.p
However, if the sum of the d's is not relatively small, as might happen if the system only sees a few patterns from the set and/or if d is large, the response of the model will depend on the similarities between the novel input and each of the learned patterns, that is, the system behaves like a psychological 'exemplar' model. This result can also be demonstrated experimentally. We can predict when one or the other result can be seen.
.p
Next, consider what happens when members of more than one category can occur. Suppose the system learns items drawn from three categories with means of p_1, p_2, and p_3 respectively, and responses g_1, g_2, and g_3, with n exemplars presented from each category.
Then, if a pattern f is input to A and the distortions of the prototypes presented during learning are small, the output can be approximated by
.literal
A f = n ([p_1,f] g_1 + [p_2,f] g_2 + [p_3,f] g_3)
.end literal
Due to superposition (this is a linear system) the actual response pattern is a sum of the three responses, weighted by the inner products. If the p are dissimilar, the inner product between an exemplar of one prototype and the other prototypes is small on the average, and the admixture of outputs associated with the other categories will also be small. We describe a non-linear categorizer (the BSB model) below which will allow us to suppress the other responses entirely. Again, observe that the details of the neural codings determine the practical categorization ability of the system.
.p
We can also begin to see how the system can use partial information to reason 'cooperatively'. Suppose we have formed a simple memory which has associated an input f_1 with two outputs, g_1 and g_2, and an input f_2 with two outputs, g_2 and g_3, so that
.literal
A f_1 = g_1 + g_2   and   A f_2 = g_2 + g_3.
.end literal
Suppose we then present f_1 and f_2 together. Then we have
.literal
A (f_1 + f_2) = g_1 + 2 g_2 + g_3,
.end literal
with the largest weight for the common association. This perfectly obvious consequence of superposition lets us pick out the common association of f_1 and f_2, provided we can suppress the spurious responses.
.p
The cooperative effects described in several contexts above depend critically on the linearity of the memory, since things 'add up' in memory. We will suggest below that it is very easy to remove the extra responses due to superposition. We want to emphasize that it is the ^&linearity\& that gives rise to most of the easily testable psychological predictions (many of which can be shown to be present, particularly in relation to simple stimuli) and it is the ^&non-linearity\& that has the job of cleaning up the output.
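.p
The outer-product learning rule and the two superposition effects discussed above (prototype extraction and cooperative retrieval) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the tiny vectors, and the choice of a learning constant of 1 are assumptions made for the example and are not taken from the simulations reported in this paper.

```python
def dot(u, v):
    """Inner product [u, v]."""
    return sum(x * y for x, y in zip(u, v))


def matvec(a, f):
    """Output of the linear associator: g = A f."""
    return [dot(row, f) for row in a]


def hebb_learn(pairs, eta=1.0):
    """Start from A = 0 and add eta * g f^T for every (f, g) pair."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    a = [[0.0] * n_in for _ in range(n_out)]
    for f, g in pairs:
        for i in range(n_out):
            for j in range(n_in):
                a[i][j] += eta * g[i] * f[j]  # generalized Hebb rule
    return a


# Prototype extraction: two exemplars f_i = p + d_i whose deviations cancel.
p = [1.0, 1.0, 1.0, 1.0]
deviations = [[0.5, -0.5, 0.0, 0.0], [-0.5, 0.5, 0.0, 0.0]]
exemplars = [[pi + di for pi, di in zip(p, d)] for d in deviations]
g = [1.0, -1.0]
a_cat = hebb_learn([(f, g) for f in exemplars])
# a_cat now equals n g p^T with n = 2, although p itself was never presented.

# Cooperative retrieval: f1 -> g1, f1 -> g2, f2 -> g2, f2 -> g3.
f1, f2 = [1.0, 0.0], [0.0, 1.0]
g1, g2, g3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
a_coop = hebb_learn([(f1, g1), (f1, g2), (f2, g2), (f2, g3)])
both_cues = matvec(a_coop, [x + y for x, y in zip(f1, f2)])
# both_cues weights the shared association g2 twice as heavily as g1 or g3.
```

The inputs in the cooperative example are orthonormal, so the weights come out exactly as 1, 2, 1; with correlated inputs the same pattern holds only approximately.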
.p
^&Error Correction.\& The simple linear associator works, and is effective in making some predictions about concept formation and cooperativity. However, it generates too many errors for some applications: that is, given a learned association f \to g, and many other associations learned in the same matrix, the pattern generated when f is presented to the system may not be close enough to g to be satisfactory. By using an error correcting technique related to the Widrow-Hoff procedure, also called the 'delta method', we can force the system to give us correct associations. Suppose information is represented by vectors associated by f_1 \to g_1, f_2 \to g_2, ... We wish to form a matrix A of connections between elements to accurately reconstruct the associations. The matrix can then be formed by the following procedure: First, a vector, f, is selected at random. Then the matrix, A, is incremented according to the rule
.b
.literal
\Delta A = \eta (g - A f) f^T
.end literal
where \Delta A is the change in the matrix A and where the learning coefficient, \eta, is chosen so as to maintain stability. The learning coefficient can either be 'tapered' so as to approach zero when many vectors are learned, or it can be constant, which builds in a 'short term memory' because recent events will be recalled more accurately than past events. The method is called the delta method because it learns the difference between desired and actual responses. As long as the number of vectors is small (less than roughly 25% of the dimensionality of the state vectors), this procedure is fast and converges in the sense that, after a period of learning,
.literal
A f = g.
.end literal
New information can be added at any time by running the algorithm for a while with the new information vectors added to the vector set.
.p
If f = g, the association of a vector with itself is referred to by Kohonen as an 'autoassociative' system.
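.p
As a hedged illustration of the error-correcting rule above, the following Python sketch applies the increment \Delta A = \eta (g - A f) f^T to three small patterns, two of which are correlated. Several choices here are assumptions made only for the example: patterns are presented in a fixed cycle rather than at random (so the run is reproducible), the learning coefficient is set to 1/[f,f] for each pattern, and the vectors and function names are invented for the demonstration.

```python
def dot(u, v):
    """Inner product [u, v]."""
    return sum(x * y for x, y in zip(u, v))


def matvec(a, f):
    """Matrix-vector product A f."""
    return [dot(row, f) for row in a]


def widrow_hoff(pairs, sweeps=50):
    """Repeatedly apply Delta A = eta (g - A f) f^T, starting from A = 0."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    a = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(sweeps):
        for f, g in pairs:  # fixed cyclic order (assumption; the text selects f at random)
            eta = 1.0 / dot(f, f)  # assumed step size, normalized by the squared length of f
            out = matvec(a, f)
            for i in range(n_out):
                err = g[i] - out[i]  # difference between desired and actual response
                for j in range(n_in):
                    a[i][j] += eta * err * f[j]
    return a


# f_a and f_b are correlated ([f_a, f_b] = 4); f_c is orthogonal to both.
f_a = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
f_b = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0]
f_c = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
pairs = [(f_a, [1.0, 0.0, 0.0]), (f_b, [0.0, 1.0, 0.0]), (f_c, [0.0, 0.0, 1.0])]

a = widrow_hoff(pairs)
worst_error = max(
    abs(o - t) for f, g in pairs for o, t in zip(matvec(a, f), g)
)
# worst_error is close to zero: every learned f now reproduces its own g,
# although the plain outer-product rule would mix the responses to f_a and f_b.
```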
One way to view the autoassociative system is that it forces the system to develop a particular set of eigenvectors. Suppose we are interested in looking at autoassociative systems,
.literal
A = \eta f f^T
.end literal
where \eta is some constant.
.p
We can use feedback to reconstruct a missing part of an input state vector. To show this, suppose we have a normalized state vector f, which is composed of two parts, say f' and f'', i.e. f = f' + f''. Suppose f' and f'' are orthogonal. One way to accomplish this would be to have f' and f'' be subvectors that occupy different sets of elements -- say f' is non-zero only for elements [1..n] and f'' is non-zero only for elements [(n+1)..Dimensionality].
.p
Then consider a matrix A storing only the autoassociation of f, that is,
.literal
A = (f' + f'') (f' + f'')^T,
.end literal
(Let us take \eta = 1.)
.p
The matrix is now formed. Suppose at some future time a sadly truncated version of f, say f', is presented at the input to the system.
.p
The output is given by
.literal
(output) = A f'
         = (f' f'^T + f' f''^T + f'' f'^T + f'' f''^T) f'.
.end literal
Since f' and f'' are orthogonal,
.literal
(output) = (f' + f'') [f',f']
         = c f
.end literal
.b
where c is some constant, since the inner product [f',f'] is simply a number. The autoassociator can reconstruct the missing part of the state vector. Of course, if a number of items are stored, the problem becomes more complex, but with similar qualitative properties.
.p
Let us use this technique practically. When the matrix, A, is formed, one way information can be retrieved is by the following procedure. It is assumed that we want to get associated information that we currently do not have, or we want to make 'reasonable' generalizations about a new situation based on past experience. We must always have some information to start with.
The starting information is represented by a vector constructed according to the rules used to form the original vectors, except that missing information is represented by zeros. Intuitively, the memory, that is, the other learned information, is represented in the cross connections between vector elements, and the initial information is the key to get it out. The retrieval strategy will be to repeatedly pass the information through the matrix A and to reconstruct the missing information using the cross connections. Since the state vector may grow in size without bound, we limit the elements of the vector to some maximum and minimum value.
.p
We will use the following nonlinear algorithm. Let f(i) be the current state vector of the system; f(0) is the initial vector. Then let f(i+1), the next state vector, be given by
.b
.literal
f(i+1) = LIMIT [ \alpha A f(i) + \gamma f(i) + \delta f(0) ].
.end literal
The first term, \alpha A f(i), passes the current state through the matrix and adds more information reconstructed from cross connections. The second term, \gamma f(i), causes the current state to decay slightly. This term has the qualitative effect of causing errors to eventually decay to zero as long as \gamma is less than 1. The third term, \delta f(0), can keep the initial information constantly present if this is needed to drive the system to a correct final state. Sometimes \delta is zero and sometimes it is non-zero, depending on the requirements of the task.
.p
Once the element values for f(i+1) are calculated, the element values are 'limited'. This means that element values cannot be greater than an upper bound or lower than a lower bound. If the element values of f(i+1) are larger than the upper bound or smaller than the lower bound, they are replaced with the upper and lower bounds respectively. This process contains the state vector within a set of limits, and we have called this model the 'brain state in a box' or BSB model.
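.p
The retrieval iteration just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the examples in this paper: the autoassociative matrix is built by a simple normalized outer-product sum rather than by the Widrow-Hoff procedure, the stored vectors are two small orthogonal patterns, and the parameter values (\alpha = 1, \gamma = 0.9, \delta = 0, limits of -1 and +1) and all function names are chosen only for the demonstration.

```python
def matvec(a, f):
    """Matrix-vector product A f."""
    return [sum(x * y for x, y in zip(row, f)) for row in a]


def autoassociator(vectors):
    """A = (1/n) * sum of v v^T over the stored vectors (assumed construction)."""
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(n)]
            for i in range(n)]


def limit(x, lo=-1.0, hi=1.0):
    """Clip one element to the walls of the 'box'."""
    return max(lo, min(hi, x))


def bsb(a, f0, alpha=1.0, gamma=0.9, delta=0.0, max_iter=100):
    """Iterate f(i+1) = LIMIT[alpha A f(i) + gamma f(i) + delta f(0)].

    Returns the final state and the number of updates that changed it.
    """
    f = list(f0)
    for step in range(1, max_iter + 1):
        af = matvec(a, f)
        nxt = [limit(alpha * x + gamma * y + delta * z)
               for x, y, z in zip(af, f, f0)]
        if nxt == f:  # stable: a further pass changes nothing
            return f, step - 1
        f = nxt
    return f, max_iter


v1 = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
v2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
a = autoassociator([v1, v2])

# Probe with the first half of v1; the missing half is coded as zeros.
probe = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
final_state, steps = bsb(a, probe)
# final_state is the corner v1: the missing elements have been filled in.
```

In this toy case the state reaches a corner of the box after two updates; with many stored, correlated vectors more passes are generally needed.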
As is typical of neural net models in general, the actual computations are simple, but the computer time required may be formidable. If one likes sigmoidal functions, then this is a sigmoid with sharp corners: a linear region between limits.
.p
Because the system is in a positive feedback loop but is limited, eventually the system will become stable and will not change. This may occur when all the elements are saturated or when a few are still not saturated. This final state will be the output of the system. The final state can be interpreted according to the rules used to generate the stimuli. This state will contain the directed conclusions of the information system. It will have filled in missing information, or suggested information based on what it has learned in the past, using the cross connections represented in the matrix. The dynamics of this system are closely related to the 'power' method of eigenvector extraction.
.p
It is at this point that the connection of this model with Boltzmann type models becomes of interest. We have shown in the past (Anderson, Silverstein, Ritz, and Jones, 1977) that in the simple case where the matrix is fully connected (and hence symmetric, by the learning rule in the autoassociative system) and there is no decay, the vector will monotonically lengthen. We would like to point out that the dynamics of this system are nearly identical to those used by Hopfield (1984) for continuous valued systems. It is one member of the class of functions he discusses, and can be shown to be minimizing an energy function, if that is a useful way to analyze the system. In the more general autoassociative case, where the matrix is not symmetric because of limited connectivity (i.e. some elements are identically zero) and/or there is decay, the system can be shown computationally to be minimizing a quadratic energy function (Golden, 1985).
In the simulations to be described, the Widrow-Hoff technique is used to 'learn the corners' of the system, thereby ensuring that the local energy 'minima' and the associated responses will coincide.
.p
The information storage and retrieval system just described can be used to realize a data base system that hovers on the fringes of practicality. It is important to emphasize that this is not an information storage system as conventionally implemented. It is poor at handling precise data. It also does not make efficient use of memory in a traditional serial computer. There are several parameters which must be adjusted. Also, the output may not be 'correct', in that it may not be a valid inference or it may contain noise. This is the penalty that one must pay for the inferential and 'creative' aspects of the system.
.p
^&Example One: A Data Base.\& In the state vectors used for the examples, English words and sets of words are coded as concatenations of the bytes of their ASCII representation. A parity bit is used. Zeros are replaced with minus ones. (E.g. an 's', ASCII 115, is represented by -1 1 1 1 -1 -1 1 1 in the state vector.) A 200-dimensional vector would represent 25 alphanumeric characters. This is a 'distributed' coding because a single letter or word is determined by a pattern of many elements. It is arbitrary, but it gives useful demonstrations of the power of the approach. In the outputs from the simulations the underline, '_', corresponds to all zeros or to an uninterpretable character whose amplitude is below an interpretation threshold. That is, the output strings presented are only those of which the system is 'very sure', because their actual element values were all above a high threshold. The threshold is only for our convenience in interpreting outputs; the full values are used in the computations.
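.p
The character coding described above can be sketched as a pair of small Python functions. The sketch rests on assumptions that the text does not state: the parity element is placed first and chosen for odd parity (which is at least consistent with the single 's' example given), unused character positions are filled with zeros, and the threshold value and function names are hypothetical.

```python
def encode_char(ch):
    """Code one character as 8 elements of +1/-1: a parity element, then 7 ASCII bits."""
    code = ord(ch)
    bits7 = [(code >> k) & 1 for k in range(6, -1, -1)]  # most significant bit first
    parity = 1 - (sum(bits7) % 2)  # odd parity (assumption consistent with the 's' example)
    return [1 if b else -1 for b in [parity] + bits7]


def encode(text, n_chars=25):
    """Concatenate character codes; unused positions are zeros (assumed padding)."""
    vector = [x for ch in text[:n_chars] for x in encode_char(ch)]
    return vector + [0] * (8 * n_chars - len(vector))


def decode(vector, threshold=0.5):
    """Read back characters, writing '_' wherever any element is below threshold."""
    out = []
    for k in range(0, len(vector), 8):
        group = vector[k:k + 8]
        if any(abs(x) < threshold for x in group):
            out.append("_")  # all zeros or an uninterpretable character
            continue
        code = 0
        for x in group[1:]:  # skip the parity element
            code = (code << 1) | (1 if x > 0 else 0)
        out.append(chr(code))
    return "".join(out)


s_code = encode_char("s")  # ASCII 115
state = encode("penicillin")  # 200-element state vector
readout = decode(state)  # the word followed by underlines for the empty positions
```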
Vectors using distributed codings formed by a technique that Hinton calls 'coarse coding' would be a little more reasonable biologically, but outputs would be more difficult to interpret.
.p
Information in AI systems is often represented as collections of atomic facts, relating pairs or small sets of items together. However, as William James commented in 1890,
.b
.left margin 18
.right margin 63
 ... ^&the more other facts a fact is associated with in the mind, the better possession of it our memory retains.\& Each of its associates becomes a hook to which it hangs, a means to fish it up by when sunk beneath the surface. Together, they form a network of attachments by which it is woven into the entire tissue of our thought.
.right
William James (1890), p. 301.
.left margin 8
.right margin 73
.p
As the quotation suggests, information is usefully represented as large state vectors containing large sets of correlated information. Each state vector contains a large number of 'atomic facts' together with their connections, so it is hard to specify the exact information capacity of the system.
.p
As a simple example of a distributed data base, a small (200-dimensional) autoassociative system was taught a series of connected facts about antibiotics and diseases. (See Figures Drugs-1 to Drugs-5.) This is a complex, real world data base in that one bacterium causes many diseases, the same disease is caused by many organisms, and a single drug may be used to treat many diseases caused by many organisms.
.p
Figures Drugs-2 and Drugs-3 show simple retrieval of stored information. The data base also 'guesses'. When it was asked what drug should be used to treat a meningitis caused by a Gram positive bacillus, it responded penicillin, even though it never actually learned about a meningitis caused by a Gram positive bacillus. (Figure Drugs-4) It had learned about several other Gram positive bacilli and that the associated diseases could be treated with penicillin.
The final state vector contained penicillin as the associated drug. The other partial information cooperated to suggest that this was the appropriate output. This inference may or may not be correct, but it is reasonable given the past of the system. These inferential properties are expected, given the previous discussion.
.p
As a more complex example, the antibiotic test system was taught that hypersensitivity is a side effect of cephalosporin and that some kinds of urinary tract infection are caused by an organism that responds to cephalosporin. However, it also learned that other organisms not responding to cephalosporin cause urinary tract infections and that other antibiotics cause hypersensitivity. (See Figure Drugs-5) If the system was asked about either the side effect or the disease alone, it gave one set of answers. If, however, it was asked about both pieces of information together, it correctly suggested cephalosporin as one antibiotic satisfying both bits of partial information.
.p
The number of iterations required to reconstruct the appropriate answer is a measure of certainty: large numbers of iterations suggest that the information is not strongly represented or that the inference is weak; small numbers of iterations suggest that the information is well represented or that the inference is certain.
.p
This system behaves a little like an 'expert system' in that it can be applied to new situations. However, it does not have a formal codification of sets of rules. It potentially can learn from experience by extracting commonalities from a great deal of learned information, essentially (to emphasize this point again) due to the ^&linear\& interactions between stored information. The retrieval of information must contain non-linearities to suppress spurious responses.
.p
These systems are highly parallel and would be very fast if implemented on parallel computers. Because information is stored as a matrix, two potentially useful side effects occur.
First, the data is necessarily 'encrypted' in that it is not available in a meaningful form, and each 'fact' is spread over many or all matrix elements and mixed together with other facts. Second, the learning phase makes by far the greatest CPU demands on the computer. Retrieval and inference are simply a small number of vector and matrix computations. It would be quite sensible to learn on a large machine, generate the matrix containing the information, and then use the matrix as a retrieval system on a much smaller computer.
.p
^&Example Two: Qualitative Physics.\& There is considerable interest among cognitive scientists in the generation of systems capable of 'intuitive' reasoning about physical systems. This is for several reasons. First, much human real world knowledge is of this kind: i.e. information is not stored in 'propositional' form but in a hazy 'intuitive' form generated by extensive experience with real systems. (It is almost certain that much human reasoning, even about highly structured abstract systems, is not of a propositional type, but of an 'a-logical' spatial, visual, or kinesthetic nature. See Davis and Anderson (1979) and, in particular, Hadamard (1949) for examples.) Second, this kind of reasoning is particularly hard to handle with traditional AI systems because of its highly inductive and ill-defined nature. It would be important to be able to model it. Third, it is an area where distributed neural systems may be very effective as part of the system. Riley and Smolensky (1984) have described a 'Boltzmann' type model for reasoning about a simple physical system, and below we describe another. Fourth, I believe that the ideal model for reasoning about complicated real systems will be a hybrid: partly rule driven and partly 'intuitive'.
.p
For an initial test of these ideas we constructed a set of state vectors representing the functional dependencies found in Ohm's Law, for example, what happens to E when I increases and R is held constant. These vectors were in the form of quasi-analog codings. (Figure Ohms-1) The system was taught according to our usual techniques. The parameters of the system were unchanged from the drug data base simulation.
.p
The figures show that the system is capable of making the correct responses to novel combinations of parameters (Ohms-3) if these combinations agree on their effects, another example of consensus reasoning, but that it cannot handle inconsistent situations (Ohms-4).
.p
^&Example Three: Semantic Networks\& A useful way of organizing information is as a network of associated information. The five Network figures show a simple example of a computation of this type. Information is represented as 200-dimensional state vectors, constructed as strings of alphanumeric characters as before.
.p
By making associations between state vectors, one can realize a simple semantic network, an example of which is presented in Figure Network-1. Here each node of the network corresponds to a state vector which contains related information, i.e. simultaneously present at one node (the leftmost, under 'Subset') is the information that a canary is medium sized, flies, is yellow, and eats seeds. This is connected by an upward and a downward link to the BIRD node, which essentially says that 'canary' is an example of the BIRD concept, which has the name 'bird'. A strictly upward connection informs us that birds are ANMLs (with name 'animal'). The network contains three examples of fish, birds and animal species and several examples of specific creatures. For example, Charlie is a tuna, Tweetie is a canary, and both Jumbo and Clyde are elephants. The specific set of associations that together are held to realize this simple network are given in Figure Network-2.
The sets of assertions in Figure Network-2 were learned using the Widrow-Hoff error correction rule. Two matrices were formed, one corresponding to the associations of the state vectors with themselves (auto association) and one corresponding to the association of a state vector with a different state vector (true association). The matrices used were partially (about 50%) connected. .p When the matrix is formed and learning has ceased, the system can then be interrogated to see if it can traverse the network and fill in missing information in an appropriate way. Figures Network-3 and -4 show simple disambiguation, where the context of a probe input ('gry') will lead to output of elephant or pigeon. (Alan Kawamoto (1985) has done extensive studies of disambiguation in networks of this kind, and made some comparisons with relevant psychological data. Kawamoto has generalized the model by adding adaptation as a way of destabilizing the system so it moves to new states as time goes on.) Another property of a semantic network is sometimes called 'property inheritance'. Figure Network-5 shows such a computation. We ask for the color of a large creature who works in the circus, who we find out is Jumbo. Jumbo is an elephant. Elephants are gray. .p The parameters are not critical: they were unchanged for all three examples presented here. In the network calculation, Mx 2 corresponds to the autoassociative matrix and Mx 1 corresponds to the true associative matrix. When the autoassociative system has reached a stable state, the true associator is applied for 5 iterations. This untidy assumption can easily be done away with by assuming proper time delays as part of the description of a synapse, but at present it is more convenient to keep it because it separates two distinct operations. Eventually this mechanism will be eliminated. .p This work is sponsored primarily by the National Science Foundation under grant BNS-82-14728, administered by the Memory and Cognitive Processes section. 
.left margin 8 .right margin 73 .flags accept .b2 .c 80 ^&References\& .b .p Anderson, J.A. Two models for memory organization using interacting traces. ^&Mathematical Biosciences.\&, &8, 137-160, 1970. .p Anderson, J.A. Cognitive and psychological computation with neural models. ^&IEEE Transactions on Systems, Man, and Cybernetics\&, ^&SMC-13\&, 799-815, 1983. .p Anderson, J.A. _& Hinton, G.E. Models of information processing in the brain. In G.E. Hinton _& J.A. Anderson (Eds.), ^&Parallel Models of Associative Memory.\& Hillsdale, N.J.: Erlbaum Associates, 1981. .p Anderson, J.A., Silverstein, J.W., Ritz, S.A. _& Jones, R.S. Distinctive features, categorical perception, and probability learning: Some applications of a neural model. ^&Psychological Review.\&, &8&4, 413-451, 1977. .p Davis, P.J. _& Anderson, J.A. Non-analytic aspects of mathematics and their implications for research and education. ^&SIAM Review.\&, &2&1, 112-127, 1979. .p Geman, S. _& Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. ^&IEEE Transactions on Pattern Analysis and Machine Intelligence\&, &6, 721-741, November, 1984. .p Golden, R. Identification of the BSB neural model as a gradient descent technique that minimizes a quadratic cost function over a set of linear inequalities. Submitted for publication. .p Goodman, A.G., Goodman, L.S., _& Gilman, A. ^&The Pharmacological Basis of Therapeutics. Sixth Edition.\& New York: Macmillan, 1980. .p Hadamard, J. ^&The Psychology of Invention in the Mathematical Field.\& Princeton, N.J.: Princeton University Press, 1949. .p Hinton, G.E. _& Sejnowski, T.J. Optimal pattern inference. ^&IEEE Conference on Computer Vision and Pattern Recognition.\&, 1984. .p Hopfield, J.J. Neurons with graded response have collective computational properties like those of two-state neurons. ^&Proc. Natl. Acad. Sci. U.S.A.\&, &8&1, 3088-3092, 1984. .p Huang Po. ^&The Zen Teaching of Huang Po.\& (Trans. J. Blofield). 
New York: Grove Press, 1958. .p James, W. ^&Briefer Psychology.\& (Orig. ed. 1890). New York: Collier, 1964. .p Kawamoto, A. Dynamic Processes in the (Re)Solution of Lexical Ambiguity. Ph.D. Thesis, Department of Psychology, Brown University. May, 1985. .p Knapp, A.G. _& Anderson, J.A. Theory of categorization based on distributed memory storage. ^&Journal of Experimental Psychology: Learning, Memory, and Cognition.\&, &1&0, 616-637, 1984. .p Kohonen, T. ^&Associative Memory.\& Berlin: Springer, 1977. .p Kohonen, T. ^&Self Organization and Associative Memory.\& Berlin: Springer, 1984. .p Riley, M. S. _& Smolensky, P. A parallel model of sequential problem solving. ^&Proceedings of Sixth Annual Conference of the Cognitive Science Society.\& Boulder, Colorado: 1984. .page .left margin 3 .right margin 72 .no flags accept .page size 62 .c 80 Figure Drugs-1 .b .c 80 Database Information .b .literal F[ 1]. Staphaur+cocEndocaPenicil G[ 1]. Staphaur+cocEndocaPenicil F[ 2]. Staphaur+cocMeningPenicil G[ 2]. Staphaur+cocMeningPenicil F[ 3]. Staphaur+cocPneumoPenicil G[ 3]. Staphaur+cocPneumoPenicil F[ 4]. Streptop+cocScarFePenicil G[ 4]. Streptop+cocScarFePenicil F[ 5]. Streptop+cocPneumoPenicil G[ 5]. Streptop+cocPneumoPenicil F[ 6]. Streptop+cocPharynPenicil G[ 6]. Streptop+cocPharynPenicil F[ 7]. Neisseri-cocGonorhAmpicil G[ 7]. Neisseri-cocGonorhAmpicil F[ 8]. Neisseri-cocMeningPenicil G[ 8]. Neisseri-cocMeningPenicil F[ 9]. Coryneba+bacPneumoPenicil G[ 9]. Coryneba+bacPneumoPenicil F[10]. Clostrid+bacGangrePenicil G[10]. Clostrid+bacGangrePenicil F[11]. Clostrid+bacTetanuPenicil G[11]. Clostrid+bacTetanuPenicil F[12]. E.Coli -bacUrTrInAmpicil G[12]. E.Coli -bacUrTrInAmpicil F[13]. Enteroba-bacUrTrInCephalo G[13]. Enteroba-bacUrTrInCephalo F[14]. Proteus -bacUrTrInGentamy G[14]. Proteus -bacUrTrInGentamy F[15]. Salmonel-bacTyphoiChloram G[15]. Salmonel-bacTyphoiChloram F[16]. Yersinap-bacPlagueTetracy G[16]. Yersinap-bacPlagueTetracy F[17]. 
TreponemspirSyphilPenicil G[17]. TreponemspirSyphilPenicil F[18]. TreponemspirYaws Penicil G[18]. TreponemspirYaws Penicil F[19]. CandidaafungLesionAmphote G[19]. CandidaafungLesionAmphote F[20]. CryptocofungMeningAmphote G[20]. CryptocofungMeningAmphote F[21]. HistoplafungPneumoAmphote G[21]. HistoplafungPneumoAmphote F[22]. AspergilfungMeningAmphote G[22]. AspergilfungMeningAmphote F[23]. SiEfHypersensOralVPenicil G[23]. SiEfHypersensOralVPenicil F[24]. SiEfHypersensInjeGPenicil G[24]. SiEfHypersensInjeGPenicil F[25]. SiEfHypersensInjeMPenicil G[25]. SiEfHypersensInjeMPenicil F[26]. SiEfHypersensOralOPenicil G[26]. SiEfHypersensOralOPenicil F[27]. SiEfHypersensInje Cephalo G[27]. SiEfHypersensInje Cephalo F[28]. SiEfOtotoxic Inje Gentamy G[28]. SiEfOtotoxic Inje Gentamy F[29]. SiEfAplasticAInje Chloram G[29]. SiEfAplasticAInje Chloram F[30]. SiEfKidneys++Inje Amphote G[30]. SiEfKidneys++Inje Amphote F[31]. SiEfHypersensOral Ampicil G[31]. SiEfHypersensOral Ampicil .end literal .left margin 12 .right margin 68 .b .p A strictly autoassociative system can be used as a database. Here, state vectors correspond to information about antibiotics, bacteria, side effects, and other bits of information. The detailed information is taken from Goodman and Gilman (1980). Because only 25 characters are available, the codings are somewhat terse. If each pairwise fact relation in a single state vector is considered an 'atomic fact' there are several hundred facts in this database, though only 31 state vectors. .p The matrix used in the simulation used random presentation of state vectors for an average of about 40 presentations per item. The matrix was 50% connected: i.e. half the matrix elements were identically zero. .page .left margin 16 .right margin 72 .c 80 Drugs-2 .b .c 80 'Tell about fungal meningitis.' .b .left margin 15 .right margin 68 .literal Mx 2. 1. ________fungMening_______ Check: 80 ... Mx 2. 11. ________fungMening_m__ote Check: 85 Mx 2. 12. 
________fungMening_m_hote Check: 90 Mx 2. 13. ________fungMeningAm_hote Check: 104 Mx 2. 14. ________fungMeningAm_hote Check: 126 Mx 2. 15. _s______fungMeningAm_hote Check: 131 Mx 2. 16. _s______fungMeningAm_hote Check: 137 Mx 2. 17. _s______fungMeningAm_hote Check: 147 Mx 2. 18. _s______fungMeningAm_hote Check: 152 Mx 2. 19. _s______fungMeningAmphote Check: 158 Mx 2. 20. _s______fungMeningAmphote Check: 162 Mx 2. 21. _s______fungMeningAmphote Check: 166 Mx 2. 22. _s______fungMeningAmphote Check: 170 Mx 2. 23. _s______fungMeningAmphote Check: 171 Mx 2. 24. _s______fungMeningAmphote Check: 171 Mx 2. 25. _s______fungMeningAmphote Check: 172 Mx 2. 26. _s______fungMeningAmphote Check: 172 Mx 2. 27. _s______fungMeningAmphote Check: 173 Mx 2. 28. _s______fungMeningAmphote Check: 174 Mx 2. 29. _s______fungMeningAmphote Check: 173 Mx 2. 30. _s______fungMeningAmphote Check: 173 Mx 2. 31. _s____i_fungMeningAmphote Check: 173 Mx 2. 32. _s____i_fungMeningAmphote Check: 173 Mx 2. 33. _s____i_fungMeningAmphote Check: 173 Mx 2. 34. _s____i_fungMeningAmphote Check: 173 Mx 2. 35. _s____i_fungMeningAmphote Check: 173 Mx 2. 36. _s____i_fungMeningAmphote Check: 173 Mx 2. 37. As____i_fungMeningAmphote Check: 174 Mx 2. 38. As____i_fungMeningAmphote Check: 176 Mx 2. 39. As____i_fungMeningAmphote Check: 178 Mx 2. 40. As__p_i_fungMeningAmphote Check: 179 Mx 2. 41. As__p_i_fungMeningAmphote Check: 181 Mx 2. 42. As__p_i_fungMeningAmphote Check: 182 Mx 2. 43. As__p_i_fungMeningAmphote Check: 182 Mx 2. 44. As__pgi_fungMeningAmphote Check: 185 Mx 2. 45. AspepgimfungMeningAmphote Check: 185 Mx 2. 46. AspepgimfungMeningAmphote Check: 186 Mx 2. 47. AspepgimfungMeningAmphote Check: 186 Mx 2. 48. Aspepgi_fungMeningAmphote Check: 188 Mx 2. 49. Aspe_gi_fungMeningAmphote Check: 190 Mx 2. 50. Aspe_gi_fungMeningAmphote Check: 191 Mx 2. 51. Aspe_gi_fungMeningAmphote Check: 193 Mx 2. 52. Aspe_gi_fungMeningAmphote Check: 194 Mx 2. 53. Aspe_gi_fungMeningAmphote Check: 195 Mx 2. 54. 
Aspe_gi_fungMeningAmphote Check: 197 Mx 2. 55. Aspe_gi_fungMeningAmphote Check: 198 Mx 2. 56. Aspe_gi_fungMeningAmphote Check: 198 Mx 2. 57. Aspe_gi_fungMeningAmphote Check: 198 Mx 2. 58. Aspe_gi_fungMeningAmphote Check: 198 Mx 2. 59. AspergilfungMeningAmphote Check: 198 Mx 2. 60. AspergilfungMeningAmphote Check: 198 .end literal .page .left margin 8 .right margin 68 .no flags accept .c 80 Caption for Figure Drugs-2 .b .p We use partial information combined with the reconstructive properties of the autoassociative system to get more information out of the system. The usual way we do this is to put in partial information and let the information at a node be reconstructed using feedback and the autoassociator. In the first stimulus (1) above, the '_' indicates zeros in the input state vector. Once the feedback starts working, '_' indicates a byte with one or more elements below interpretation threshold. .p The Mx notation indicates which matrix is in use by the program. In this case, only the autoassociative matrix is used. The number refers to the iteration number, i.e. how often the state vector has passed through the matrix. 'Check' refers to the number of elements in the state vector that are saturated at that iteration. It is a rough measure of length. It cannot get larger than 200. .p The appropriate antibiotic for fungal meningitis emerges early, because Amphotericin is used to treat all fungal diseases the system knows about. The specific organism takes longer, but is eventually reconstructed. Note the errors corrected in the later iterations (Aspepgim becomes Aspergil). .page .left margin 24 .right margin 80 .c Figure Drugs-3 .b .c 'What are the side effects of Amphotericin?' .b .left margin 15 .right margin 68 .literal Mx 2. 1. SiEf______________Amphote Check: 88 Mx 2. 2. SiEf______________Amphote Check: 88 Mx 2. 3. SiEf______________Amphote Check: 88 Mx 2. 4. SiEf______________Amphote Check: 88 Mx 2. 5. SiEf______________Amphote Check: 89 Mx 2. 6. 
SiEf______________Amphote Check: 91 Mx 2. 7. SiEf______________Amphote Check: 95 Mx 2. 8. SiEf______________Amphote Check: 98 Mx 2. 9. SiEf______________Amphote Check: 109 Mx 2. 10. SiEf______________Amphote Check: 124 Mx 2. 11. SiEf______________Amphote Check: 126 Mx 2. 12. SiEf______________Amphote Check: 131 Mx 2. 13. SiEf______________Amphote Check: 135 Mx 2. 14. SiEf_____y________Amphote Check: 137 Mx 2. 15. SiEf_____y________Amphote Check: 140 Mx 2. 16. SiEf_____y________Amphote Check: 144 Mx 2. 17. SiEf_____y___K__e_Amphote Check: 146 Mx 2. 18. SiEf__d__y__+K__e_Amphote Check: 150 Mx 2. 19. SiEf__d__y__+K__e_Amphote Check: 155 Mx 2. 20. SiEf__d_ey__+K__e_Amphote Check: 158 Mx 2. 21. SiEf__dney_++K__e_Amphote Check: 160 Mx 2. 22. SiEf_idneys++K__e Amphote Check: 164 Mx 2. 23. SiEf_idneys++K__e Amphote Check: 169 Mx 2. 24. SiEf_idneys++K__e Amphote Check: 174 Mx 2. 25. SiEf_idneys++K__e Amphote Check: 175 Mx 2. 26. SiEf_idneys++K__e Amphote Check: 178 Mx 2. 27. SiEfKidneys++K__e Amphote Check: 183 Mx 2. 28. SiEfKidneys++K__e Amphote Check: 187 Mx 2. 29. SiEfKidneys++K__e Amphote Check: 189 Mx 2. 30. SiEfKidneys++K__e Amphote Check: 190 Mx 2. 31. SiEfKidneys++___e Amphote Check: 190 Mx 2. 32. SiEfKidneys++_n_e Amphote Check: 191 Mx 2. 33. SiEfKidneys++_n_e Amphote Check: 191 Mx 2. 34. SiEfKidneys++_n_e Amphote Check: 192 Mx 2. 35. SiEfKidneys++_n_e Amphote Check: 192 Mx 2. 36. SiEfKidneys++_nje Amphote Check: 193 Mx 2. 37. SiEfKidneys++_nje Amphote Check: 193 Mx 2. 38. SiEfKidneys++_nje Amphote Check: 194 Mx 2. 39. SiEfKidneys++_nje Amphote Check: 195 Mx 2. 40. SiEfKidneys++_nje Amphote Check: 195 Mx 2. 41. SiEfKidneys++_nje Amphote Check: 196 Mx 2. 42. SiEfKidneys++_nje Amphote Check: 197 Mx 2. 43. SiEfKidneys++Inje Amphote Check: 199 Mx 2. 44. SiEfKidneys++Inje Amphote Check: 199 Mx 2. 45. SiEfKidneys++Inje Amphote Check: 199 Mx 2. 46. SiEfKidneys++Inje Amphote Check: 199 Mx 2. 47. SiEfKidneys++Inje Amphote Check: 199 Mx 2. 48. 
SiEfKidneys++Inje Amphote Check: 199 .end literal .left margin 12 .right margin 68 .p A prudent therapist checks side effects. Amphotericin has serious ones, involving the kidneys among other organs. .page .left margin 24 .right margin 80 .c Figure Drugs-4 .b .c 80 'Tell about Meningitis caused by Gram + bacilli.' .b .left margin 15 .right margin 68 .literal Mx 2. 1. ________+bacMening_______ Check: 80 Mx 2. 2. ________+bacMening_______ Check: 80 Mx 2. 3. ________+bacMening_______ Check: 80 Mx 2. 4. ________+bacMening_______ Check: 80 Mx 2. 5. ________+bacMening_______ Check: 80 Mx 2. 6. ________+bacMening_______ Check: 81 Mx 2. 7. ________+bacMening_______ Check: 82 Mx 2. 8. ________+bacMening_______ Check: 84 Mx 2. 9. ________+bacMening_______ Check: 85 Mx 2. 10. ________+bacMening_______ Check: 88 Mx 2. 11. ________+bacMening_______ Check: 90 Mx 2. 12. ________+bacMening_______ Check: 102 Mx 2. 13. _o_____`+bacMening______m Check: 125 Mx 2. 14. _o_____`+bacMening_e____m Check: 133 Mx 2. 15. _o_____`+bacMening_e____m Check: 135 Mx 2. 16. _o_____`+bacMening_e____m Check: 136 Mx 2. 17. Co_____`+bacMening_en__im Check: 139 Mx 2. 18. Co_____`+bacMening_en__im Check: 143 Mx 2. 19. Co_____`+bacMening_en__i_ Check: 145 Mx 2. 20. Co_____`+bacMening_en__i_ Check: 152 Mx 2. 21. Co_____`+bacMening_en__i_ Check: 155 Mx 2. 22. Co_____`+bacMening_eni_i_ Check: 156 Mx 2. 23. Co_____`+bacMening_enici_ Check: 160 Mx 2. 24. Co_____`+bacMening_enici_ Check: 163 Mx 2. 25. Co_____`+bacMening_enici_ Check: 163 Mx 2. 26. Co_____`+bacMening_enici_ Check: 165 Mx 2. 27. Co_____`+bacMening_enici_ Check: 168 Mx 2. 28. Co_____`+bacMening_enici_ Check: 171 Mx 2. 29. Co_____`+bacMeningPenici_ Check: 174 Mx 2. 30. Co___e_`+bacMeningPenicil Check: 177 Mx 2. 31. Co_y_e_`+bacMeningPenicil Check: 178 Mx 2. 32. Co_y_e_`+bacMeningPenicil Check: 181 .end literal .left margin 12 .p This Figure demonstrates the use of the system for generalization. 
The data base the system learned contains no information about Meningitis caused by Gram positive bacilli. However, it does 'know' that other Gram positive bacilli are treated with penicillin. Therefore it 'guesses' that the right drug is penicillin. This may or may not be correct! But it is a sensible suggestion based on past experience. Notice that the number of iterations needed to get the answer is fairly large, indicating that the system is not totally sure of the answer. Note there is no internal record of the 'reasoning' used by the system, so errors may be quite hard to correct, unlike rule driven expert systems. .page .left margin 24 .right margin 80 .c 80 Figure Drugs-5 .b .c 80 Use of Converging Information: Consensus .b .c 80 Part I: Urinary Tract Infections .b .left margin 15 .right margin 68 .literal Mx 2. 1. ____________UrTrIn_______ Check: 48 ... Mx 2. 21. _______ -__cUrTrIn_______ Check: 108 ... Mx 2. 31. ___d___ -bacUrTrInC__lamm Check: 147 ... Mx 2. 41. _r____q -bacUrTrIn_e__am_ Check: 157 ... Mx 2. 51. _ro_e_q -bacUrTrIn_e__am_ Check: 162 ... Mx 2. 61. Prote__ -bacUrTrInGe_tamy Check: 185 ... Mx 2. 71. Proteus -bacUrTrInGe_tamy Check: 195 ... Mx 2. 80. Proteus -bacUrTrInGentamy Check: 200 .end literal .b2 .c 80 Part II: Hypersensitivity .b .left margin 15 .right margin 68 .literal Mx 2. 1. ____Hypersen_____________ Check: 64 ... Mx 2. 11. _i__Hypersens______e_____ Check: 81 ... Mx 2. 21. SiEfHypersensIj____e_____ Check: 161 ... Mx 2. 31. SiEfHypersensIn____e_____ Check: 171 ... Mx 2. 41. SiEfHypersensInj___e_____ Check: 174 ... Mx 2. 51. SiEfHypersensInje_Penicil Check: 181 ... Mx 2. 61. SiEfHypersensInje_Penicil Check: 196 .end literal .b2 .c 80 Part III. Hypersensitivity + Urinary Tract Infection .b .left margin 15 .right margin 68 .literal Mx 2. 1. ____HypersenUrTrIn_______ Check: 112 ... Mx 2. 11. Q__dHypersenUrTrInC______ Check: 126 ... Mx 2. 21. Q__dHypersenUrTrInCe__alo Check: 174 ... Mx 2. 31. 
Q__dHypersenUrTrInCephalo Check: 188 .end literal .page .left margin 8 .right margin 68 .c 80 Caption for Figure Drugs-5 .b .p Suppose we need to use 'converging' information, that is, find a drug that is a 'second best' choice for two requirements, but the best choice for both requirements together. This Figure demonstrates such a situation. Suppose a nasty medical school pharmacology instructor asked, 'What is a drug causing hypersensitivity and which is used to treat Urinary tract infections.' .p If the data base is told 'Urinary Tract Infection', it picks a learned vector, probably the most recent one it saw due to the short term memory effects of the decay term combined with error correction. (This effect is illustrated in Part I. of this Figure.) The drug in this case is gentamycin, whose side effect is ototoxicity. .p Hypersensitivity, used as a probe in Part II, indicates a penicillin family drug. (This is the penicillin 'allergy'.) Since penicillin is the most common drug in the data base, penicillin is the drug most strongly associated with Hypersensitivity. Penicillin is not used (in this data base) to treat urinary tract infections. .p One drug that does both is cephalosporin, and given both requirements, as in Part III, this is the choice of the system, which integrated information from both probes and gave a satisfactory answer. Ampicillin would also be a satisfactory answer. Notice that the form of this vector, where a side effect and a disease occur simultaneously never occurs in the vectors forming the data base. .page .control characters .page size 64 .left margin 6 .right margin 72 .no flags bold .no flags accept *"1 .c 80 Figure Ohms-1 .b .c 80 Stimulus Set for 'Qualitative Physics' Demonstration .b .c 80 Functional Dependencies in Ohms Law .b2 .c 80 Stimulus Set .literal F[ 1]. E__***__I_____**R**______ G[ 1]. E__***__I_____**R**______ F[ 2]. E__***__I____***R***_____ G[ 2]. E__***__I____***R***_____ F[ 3]. E__***__I___***_R_***____ G[ 3]. 
E__***__I___***_R_***____ F[ 4]. E__***__I__***__R__***___ G[ 4]. E__***__I__***__R__***___ F[ 5]. E__***__I_***___R___***__ G[ 5]. E__***__I_***___R___***__ F[ 6]. E__***__I***____R____***_ G[ 6]. E__***__I***____R____***_ F[ 7]. E__***__I**_____R_____**_ G[ 7]. E__***__I**_____R_____**_ F[ 8]. E**_____I**_____R__***___ G[ 8]. E**_____I**_____R__***___ F[ 9]. E***____I***____R__***___ G[ 9]. E***____I***____R__***___ F[10]. E_***___I_***___R__***___ G[10]. E_***___I_***___R__***___ F[11]. E__***__I__***__R__***___ G[11]. E__***__I__***__R__***___ F[12]. E___***_I___***_R__***___ G[12]. E___***_I___***_R__***___ F[13]. E____***I____***R__***___ G[13]. E____***I____***R__***___ F[14]. E_____**I_____**R__***___ G[14]. E_____**I_____**R__***___ F[15]. E**_____I__***__R**______ G[15]. E**_____I__***__R**______ F[16]. E***____I__***__R***_____ G[16]. E***____I__***__R***_____ F[18]. E__***__I__***__R__***___ G[18]. E__***__I__***__R__***___ F[19]. E___***_I__***__R___***__ G[19]. E___***_I__***__R___***__ F[20]. E____***I__***__R____***_ G[20]. E____***I__***__R____***_ F[21]. E_____**I__***__R_____**_ G[21]. E_____**I__***__R_____**_ .end literal .left margin 12 .right margin 68 The three asterisks in these stimuli should be viewed as an image of a broad meter pointer. The 'E', 'I', and 'R' are for convenience of the reader. If the 'pointer' deflects to the left, the value decreases; in the middle, there is no change; to the right, the value increases. .p We are trying to teach the system the functional dependencies in Ohm's Law: .literal E = I R .end literal The learning set is simply the pattern observed by holding one parameter fixed and letting the others vary. .p The autoassociative matrix generated was 45% connected and received about 25 presentations of each stimulus in random order. .page .c 80 Figure Ohms-2 .b .c 80 Response to a Learned Pattern .b .left margin 12 .literal Mx 2. 1. E***____I__***__R________ Check: 0 Mx 2. 2. 
E***____I__***__R________ Check: 0 Mx 2. 3. E***____I__***__R________ Check: 14 Mx 2. 4. E***____I__***__R________ Check: 48 Mx 2. 5. E***____I__***__R________ Check: 66 Mx 2. 6. E***____I__***__R*_______ Check: 69 Mx 2. 7. E***____I__***__R*_______ Check: 69 Mx 2. 8. E***____I__***__R*_______ Check: 70 Mx 2. 9. E***____I__***__R***_____ Check: 70 Mx 2. 10. E***____I__***__R***_____ Check: 71 Mx 2. 11. E***____I__***__R***_____ Check: 72 Mx 2. 12. E***____I__***__R***_____ Check: 72 Mx 2. 13. E***____I__***__R***_____ Check: 73 Mx 2. 14. E***____I__***__R***_____ Check: 76 Mx 2. 15. E***____I__***U_R***_____ Check: 79 Mx 2. 16. E***____I__***U_R***_____ Check: 85 .end literal .left margin 8 .right margin 68 .p This input pattern simply indicates that the matrix can respond appropriately to a learned pattern. It is a test that learning was adequate. Note that noise starts to appear in the last two iterations. Spurious associations will appear in the blank positions as the system continues to cycle. Note the region of stability (which displays the correct answer) from iteration 9 to 14. .page .c 80 Figure Ohms-3 .b .c 80 Response to Unlearned but Consistent Set of Inputs .b .c 80 Case 1. .b .left margin 10 .literal Mx 2. 1. E_______I***____R***_____ Check: 0 Mx 2. 2. E_______I***____R***_____ Check: 0 Mx 2. 3. E_______I***____R***_____ Check: 2 Mx 2. 4. E_______I***____R***_____ Check: 24 Mx 2. 5. E**_____I***____R***_____ Check: 26 Mx 2. 6. E***____I***____R***_____ Check: 40 Mx 2. 7. E***____I***____R***_____ Check: 51 Mx 2. 8. E***____I***____R***_____ Check: 63 Mx 2. 9. E***____I***____R***_____ Check: 70 Mx 2. 10. E***____I***____R***_____ Check: 80 Mx 2. 11. E***____I***___*R***_____ Check: 93 Mx 2. 12. E***____I***___*R***_____ Check: 95 Mx 2. 13. E***____I***___*R***___*_ Check: 95 Mx 2. 14. E***____I***_*_*R***___*_ Check: 96 Mx 2. 15. E***____I***_*_*R***___*_ Check: 96 Mx 2. 16. 
E****___I***_*_*R***___*_ Check: 96 .end literal .left margin 0 .b .c 80 Case 2. .b .left margin 10 .literal Mx 2. 1. E____***I***____R________ Check: 0 Mx 2. 2. E____***I***____R________ Check: 0 Mx 2. 3. E____***I***____R________ Check: 6 Mx 2. 4. E____***I***____R________ Check: 24 Mx 2. 5. E____***I***____R_____*__ Check: 27 Mx 2. 6. E____***I***____R_____**_ Check: 40 Mx 2. 7. E____***I***____R____***_ Check: 50 Mx 2. 8. E____***I***____R____***_ Check: 59 Mx 2. 9. E____***I***____R____***_ Check: 66 Mx 2. 10. E*___***I***___*R____***_ Check: 76 Mx 2. 11. E*___***I***___*R____***_ Check: 80 Mx 2. 12. E*___***I***___*R____***_ Check: 93 Mx 2. 13. E*___***I***___*R____***_ Check: 96 Mx 2. 14. E*_*_***I***___*R____***_ Check: 96 Mx 2. 15. E*_*_***I***___*R____***_ Check: 96 Mx 2. 16. E*_*_***I***___*R___****_ Check: 96 .end literal .left margin 8 .right margin 72 .p In these two tests, the system sees a pattern it never saw explicitly and it must respond with the 'most appropriate' answer. Note that although the problem is ill defined, there is a consensus answer. If we look at Ohm's Law in both the first and second cases, the equation suggests a consistent interpretation: .b First Case, I and R both are down, therefore .b .i20 NR I NR R ==> NR E .b Second Case, E is up and I is down, therefore .b .i22 NE E .i20 ------ ==> NE R .i22 NR I .page .c 80 Figure Ohms-4 .b .c 80 Inconsistent Stimulus Set .b .left margin 10 .literal Mx 2. 1. E***____I***____R________ Check: 0 Mx 2. 2. E***____I***____R________ Check: 0 Mx 2. 3. E***____I***____R________ Check: 10 Mx 2. 4. E***____I***____R________ Check: 70 Mx 2. 5. E***____I***____R________ Check: 72 Mx 2. 6. E***____I***____R________ Check: 72 Mx 2. 7. E***____I***____R________ Check: 72 Mx 2. 8. E***____I***____R________ Check: 72 Mx 2. 9. E***____I***____R*_______ Check: 72 Mx 2. 10. E***____I***____R*_______ Check: 72 Mx 2. 11. E***____I***____R*____**_ Check: 72 Mx 2. 12. E***____I***____R*___***_ Check: 72 Mx 2. 
13. E***____I***____R*_*_***_ Check: 72 Mx 2. 14. E***____I***__U_R***_***_ Check: 72 Mx 2. 15. E***____I***__U_R***_***_ Check: 72 Mx 2. 16. E***____I***__U_R***_***_ Check: 72 .end literal .left margin 8 .right margin 68 .p There is no such consistency in this case, and there is no consensus. Note the answer is 'confused' and shows many possible answers. .p In this case, E is down and I is down. If we look at the equation, .b .i22 NR E .i20 ------ ==> NRNE R .i22 NR I .b the top and bottom of the equation 'fight' each other and there is no agreement. .page .no flags accept .page size 62 .right margin 79 .left margin 0 .c Figure Network-1 .b .c A Simple 'Semantic' Network .b .literal Superset |------------------------------> ANML <------------------| | | | | (gerbil) <--> animal <--> (elephant) | | small ^ large | | dart v walk | | skin (raccoon) skin | | brown medium gray | | climb ^ | skin | | Subset BIRD black | | | | | (canary) <--> bird <--> (robin) (examples) | | medium ^ medium Clyde ----------->| | fly v fly Fahlman | | seed (pigeon) worm | | yellow medium red | | ^ fly Jumbo ----------->| | | junk large | | gray circus | | | | | |-----------------------------------------Tweetie | small | cartoon | | | |---------------------------------------------------------- FISH | (guppy) <--> fish <--> (tuna) <-------------Charlie small ^ large StarKist swim v swim inadequate food (trout) fish transparent medium silver swim bugs silver .end literal .left margin 12 .right margin 68 .p The network simulation will realize a system that acts as if it was described by this network. The material and structure of the simulation was inspired by the network made famous by Collins and Quillian. One (of many) ways of realizing this network in terms of pairs of associations is given in Figure Network-2. .page .c Figure Network-2 .b .c Stimulus Set .left margin 5 .right margin 72 .b .literal F[ 1]. BIRD_*_bird___fly_wormred G[ 1]. _____*_robin__fly_wormred F[ 2]. 
_____*_robin__fly_wormred G[ 2]. BIRD_*_bird___fly_wormred F[ 3]. BIRD_*_bird___fly_junkgry G[ 3]. _____*_pigeon_fly_junkgry F[ 4]. _____*_pigeon_fly_junkgry G[ 4]. BIRD_*_bird___fly_junkgry F[ 5]. BIRD_*_bird___fly_seedylw G[ 5]. _____*_canary_fly_seedylw F[ 6]. _____*_canary_fly_seedylw G[ 6]. BIRD_*_bird___fly_seedylw F[ 7]. ANML*__animal_dartskinbrn G[ 7]. ____*__gerbil_dartskinbrn F[ 8]. ____*__gerbil_dartskinbrn G[ 8]. ANML*__animal_dartskinbrn F[ 9]. ANML_*_animal_clmbskinblk G[ 9]. _____*_raccoonclmbskinblk F[10]. _____*_raccoonclmbskinblk G[10]. ANML_*_animal_clmbskinblk F[11]. ANML__*animal_walkskingry G[11]. ______*elephanwalkskingry F[12]. ______*elephanwalkskingry G[12]. ANML__*animal_walkskingry F[13]. BIRD_____________________ G[13]. ANML_____________________ F[14]. _______Clyde___Fahlman___ G[14]. ______*elephanwalkskingry F[15]. ____*__Tweetie_cartoon___ G[15]. _____*_canary_fly_seedylw F[16]. ______*Jumbo____circus___ G[16]. ______*elephanwalkskingry F[17]. FISH_____________________ G[17]. ANML_____________________ F[18]. FISH*__fish___swimfoodxpr G[18]. ____*__guppy__swimfoodxpr F[19]. ____*__guppy__swimfoodxpr G[19]. FISH*__fish___swimfoodxpr F[20]. FISH_*_fish___swimbugsslv G[20]. _____*_trout__swimbugsslv F[21]. _____*_trout__swimbugsslv G[21]. FISH_*_fish___swimbugsslv F[22]. FISH__*fish___swimfishslv G[22]. ______*tuna___swimfishslv F[23]. ______*tuna___swimfishslv G[23]. FISH__*fish___swimfishslv F[24]. StarKistCharlieinadequate G[24]. ______*tuna___swimfishslv .end literal .left margin 12 .right margin 68 .p This is one set of pairs of stimuli that realize the simple 'semantic' network in Figure Network-1. Two matrices were involved in realizing the network, an autoassociative network, where every allowable state vector is associated with itself, and a true associator, where f was associated with g. The Widrow-Hoff learning procedure was used. Pairs were presented randomly for about 30 times each. Both matrices were about 50% connected. 
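.p The learning procedure described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for the simulations: the function name widrow_hoff, the normalized step size, the random connection mask, and the random +1/-1 stand-in vectors are all hypothetical choices, and the sketch omits any decay term.

```python
import numpy as np

def widrow_hoff(pairs, connectivity=0.5, epochs=30, seed=0):
    """Learn a partially connected matrix A with A @ f close to g for each (f, g)."""
    rng = np.random.default_rng(seed)
    n_out, n_in = len(pairs[0][1]), len(pairs[0][0])
    # Fixed random pattern of permitted connections; the rest stay identically zero.
    mask = rng.random((n_out, n_in)) < connectivity
    A = np.zeros((n_out, n_in))
    for _ in range(epochs):
        for k in rng.permutation(len(pairs)):       # random order within each pass
            f, g = pairs[k]
            error = g - A @ f                       # what the matrix still gets wrong
            # Error-correcting step (assumed step size 1 / |f|^2), applied
            # only to the permitted connections.
            A += np.outer(error, f) * mask / (f @ f)
    return A

# Stand-in data: random +1/-1 vectors in place of the coded strings of Figure Network-2.
rng = np.random.default_rng(1)
n = 200
f_vecs = rng.choice([-1.0, 1.0], size=(3, n))
g_vecs = rng.choice([-1.0, 1.0], size=(3, n))
pairs = list(zip(f_vecs, g_vecs))

# True associator: f is associated with g.
true_assoc = widrow_hoff(pairs)
# Autoassociator: every allowable state vector is associated with itself.
auto_assoc = widrow_hoff([(v, v) for v in np.vstack([f_vecs, g_vecs])])
```

In this sketch each pair is presented once per pass in random order for 30 passes, which approximates the 'about 30 times each' of the caption. The mask keeps roughly half the matrix elements at zero throughout learning.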
.page .c Figure Network-3 .b .c 'Tell me about gray animals' .left margin 15 .right margin 68 .b .literal Mx 2. 1. ANML___animal_________gry Check: 0 Mx 2. 2. ANML___animal_________gry Check: 5 ... Mx 2. 12. ANML___animal_________gry Check: 107 Mx 2. 13. ANML__*animal_________gry Check: 122 Mx 2. 14. ANML__*animal_________gry Check: 128 Mx 2. 15. ANML__*animal_________gry Check: 128 Mx 2. 16. ANML__*animal_________gry Check: 129 Mx 2. 17. ANML__*animal_______i_gry Check: 131 Mx 2. 18. ANML__*animal_______i_gry Check: 132 Mx 2. 19. ANML__*animal_______i_gry Check: 133 Mx 2. 20. ANML__*animal___l___i_gry Check: 133 ... Mx 2. 26. ANML__*animal___lk_kingry Check: 149 Mx 2. 27. ANML__*animal__alk_kingry Check: 150 Mx 2. 28. ANML__*animal_walkskingry Check: 154 Mx 2. 29. ANML__*animal_walkskingry Check: 157 Mx 2. 30. ANML__*animal_walkskingry Check: 163 Mx 2. 31. ANML__*animal_walkskingry Check: 165 Mx 2. 32. ANML__*animal_walkskingry Check: 167 Mx 2. 33. ANML__*animal_walkskingry Check: 168 Mx 2. 34. ANML__*animal_walkskingry Check: 169 Mx 2. 35. ANML__*animal_walkskingry Check: 172 Mx 2. 36. ANML__*animal_walkskingry Check: 176 Mx 2. 37. ANML__*animal_walkskingry Check: 176 Mx 2. 38. ANML__*animal_walkskingry Check: 176 Mx 1. 39. ANML__*______nwalkskingry Check: 128 Mx 1. 40. ______*elephanwalkskingry Check: 136 Mx 1. 41. ______*elephanwalkskingry Check: 150 Mx 1. 42. ______*elephanwalkskingry Check: 152 Mx 1. 43. ______*elephanwalkskingry Check: 152 Mx 2. 44. ______*elephanwalkskingry Check: 152 Mx 2. 45. ______*elephanwalkskingry Check: 152 Mx 2. 46. ______*elephanwalkskingry Check: 152 Mx 1. 47. ANML__*______nwalkskingry Check: 128 Mx 1. 48. ANML__*ani_a_nwalkskingry Check: 160 Mx 1. 49. ANML__*animal_walkskingry Check: 170 Mx 1. 50. ANML__*animal_walkskingry Check: 173 .end literal .p .left margin 12 Once the system has learned satisfactorily, and the matrices are formed, the matrices can be used to extract stored information. 
First, the autoassociative matrix is used to reconstruct information at a node. When the number of limited elements in the state vector stabilizes, the true association matrix is used, and the state of the system changes nodes. (See iterations 39 and 47.) The color, 'gry', appears in several different stimuli, but is disambiguated by the other information. (See Figure Network-4.) Note that the simulation will endlessly move back and forth between these two nodes unless jarred loose by some other mechanism such as adaptation.
.page
.c
Figure Network-4
.b
.c
'Gray birds'
.b
.left margin 15
.literal
Mx 2. 1. BIRD___bird___________gry Check: 0
Mx 2. 2. BIRD___bird___________gry Check: 0
Mx 2. 3. BIRD___bird___________gry Check: 26
Mx 2. 4. BIRD___bird___________gry Check: 43
Mx 2. 5. BIRD___bird___________gry Check: 46
Mx 2. 6. BIRD___bird___________gry Check: 51
Mx 2. 7. BIRD___bird___________gry Check: 58
Mx 2. 8. BIRD___bird___________gry Check: 67
Mx 2. 9. BIRD___bird___f_______gry Check: 71
Mx 2. 10. BIRD___bird___f_______gry Check: 76
...
Mx 2. 20. BIRD_**bird___f___j__kgry Check: 127
Mx 2. 21. BIRD_**bird___f_y_ju_kgry Check: 128
Mx 2. 22. BIRD_**bird___fly_junkgry Check: 129
Mx 2. 23. BIRD_**bird___fly_junkgry Check: 134
Mx 2. 24. BIRD_**bird___fly_junkgry Check: 140
Mx 2. 25. BIRD_**bird___fly_junkgry Check: 141
Mx 2. 26. BIRD_**bird___fly_junkgry Check: 144
Mx 2. 27. BIRD_**bird___fly_junkgry Check: 144
Mx 2. 28. BIRD_**bird___fly_junkgry Check: 146
Mx 2. 29. BIRD_**bird___fly_junkgry Check: 147
Mx 2. 30. BIRD_**bird___fly_junkgry Check: 149
Mx 2. 31. BIRD_**bird___fly_junkgry Check: 149
Mx 2. 32. BIRD_**bird___fly_junkgry Check: 149
Mx 1. 33. BIRD_**_i__on_fly_junkgry Check: 112
Mx 1. 34. B____**pi_eon_fly_junkgry Check: 120
Mx 1. 35. _____**pigeon_fly_junkgry Check: 122
Mx 1. 36. _____**pigeon_fly_junkgry Check: 125
Mx 1. 37. _____*_pigeon_fly_junkgry Check: 129
Mx 2. 38. _____**pigeon_fly_junkgry Check: 132
Mx 2. 39. _____**pigeon_fly_junkgry Check: 137
Mx 2. 40. ___L_**pigeon_fly_junkgry Check: 139
Mx 2. 41. ___L_**pigeon_fly_junkgry Check: 142
Mx 2. 42. ___L_**pigeon_fly_junkgry Check: 143
Mx 2. 43. ___L_**pigeon_fly_junkgry Check: 146
Mx 2. 44. ___L_**pigeon_fly_junkgry Check: 149
Mx 2. 45. ___L_**pigeon_fly_junkgry Check: 149
.end literal
.left margin 12
.p
We now use 'gry' in the context of birds to demonstrate disambiguation, among other things. The system now tells us about pigeons rather than elephants. Note the confusion where the simulation is not sure whether pigeons are medium sized or large. Note also the intrusion of the 'L' (iteration 40), probably from ANML, which is the upward association of BIRD.
.page
.c
Figure Network-5
.b
.c
'Large circus creature.'
.b
.left margin 15
.literal
Mx 2. 1. ______*_________circus___ Check: 0
Mx 2. 2. ______*_________circus___ Check: 0
Mx 2. 3. ______*_________circus___ Check: 1
Mx 2. 4. ______*_________circus___ Check: 6
Mx 2. 5. ______*_________circus___ Check: 19
Mx 2. 6. ______*_________circus___ Check: 39
Mx 2. 7. ______*_________circus___ Check: 49
Mx 2. 8. ______*_________circus___ Check: 51
Mx 2. 9. ______*J________circus___ Check: 58
Mx 2. 10. ______*J________circus___ Check: 65
Mx 2. 11. ______*J__bo____circus___ Check: 68
Mx 2. 12. ______*J__bo____circus___ Check: 72
Mx 2. 13. ______*J__bo____circus___ Check: 73
Mx 2. 14. ______*Ju_bo____circus___ Check: 76
Mx 2. 15. ______*Jumbo____circus___ Check: 80
Mx 2. 16. ______*Jumbo____circus___ Check: 82
Mx 2. 17. ______*Jumbo____circus___ Check: 86
Mx 2. 18. ______*Jumbo____circus___ Check: 88
Mx 2. 19. ______*Jumbo____circus___ Check: 92
Mx 2. 20. ______*Jumbo____circus___ Check: 93
Mx 2. 21. ______*Jumbo____circus___ Check: 94
Mx 2. 22. ______*Jumbo____circus___ Check: 94
Mx 2. 23. ______*Jumbo____circus___ Check: 97
Mx 2. 24. ______*Jumbo____circus___ Check: 97
Mx 2. 25. ______*Jumbo____circus___ Check: 97
Mx 1. 26. ______*_____anw__________ Check: 67
Mx 1. 27. ______*el_phanwa_ksk_ngr_ Check: 105
Mx 1. 28. ______*elephanwalksk_ngr_ Check: 136
Mx 1. 29. ______*elephanwalkskingr_ Check: 145
Mx 1. 30. ______*elephanwalkskingry Check: 148
Mx 2. 31. ______*elephanwalkskingry Check: 149
Mx 2. 32. ______*elephanwalkskingry Check: 149
Mx 2. 33. ______*elephanwalkskingry Check: 149
Mx 1. 34. ANML__*______nwalkskingry Check: 133
Mx 1. 35. ANML__*ani_a_nwalkskingry Check: 160
Mx 1. 36. ANML__*ani_al_walkskingry Check: 165
Mx 1. 37. ANML__*ani_al_walkskingry Check: 171
Mx 1. 38. ANML__*ani_al_walkskingry Check: 173
Mx 2. 39. ANML__*ani_al_walkskingry Check: 172
Mx 2. 40. ANML__*ani_al_walkskingry Check: 173
Mx 2. 41. ANML__*animal_walkskingry Check: 174
Mx 2. 42. ANML__*animal_walkskingry Check: 174
.end literal
.left margin 12
.p
How to answer the perennially interesting question, 'What color is Jumbo?' Or, if you wish, how to do straightforward property inheritance with distributed models.
.page
.c
Figure Network-6
.b
.c
'Tell me about Tweetie.'
.b
.left margin 15
.literal
Mx 2. 1. _______Tweetie___________ Check: 0
Mx 2. 2. _______Tweetie___________ Check: 0
Mx 2. 3. _______Tweetie___________ Check: 6
Mx 2. 4. _______Tweetie___________ Check: 9
Mx 2. 5. _______Tweetie___________ Check: 13
Mx 2. 6. _______Tweetie___________ Check: 16
Mx 2. 7. _______Tweetie___________ Check: 22
Mx 2. 8. _______Tweetie_car_______ Check: 26
Mx 2. 9. _______Tweetie_car_______ Check: 32
Mx 2. 10. _______Tweetie_cart______ Check: 37
Mx 2. 11. ____*__Tweetie_cart______ Check: 42
Mx 2. 12. ____*__Tweetie_cart______ Check: 54
Mx 2. 13. ____*__Tweetie_cartoon___ Check: 63
Mx 2. 14. ____*__Tweetie_cartoon___ Check: 84
Mx 2. 15. ____*__Tweetie_cartoon___ Check: 92
Mx 2. 16. ____*__Tweetie_cartoon___ Check: 99
Mx 2. 17. ____*__Tweetie_cartoon___ Check: 101
Mx 2. 18. ____*__Tweetie_cartoon___ Check: 104
Mx 2. 19. ____*__Tweetie_cartoon___ Check: 108
Mx 2. 20. ____*__Tweetie_cartoon___ Check: 112
Mx 2. 21. ____*__Tweetie_cartoon___ Check: 113
Mx 2. 22. ____*__Tweetie_cartoon___ Check: 115
Mx 2. 23. ____*__Tweetie_cartoon___ Check: 116
Mx 2. 24. ____*__Tweetie_cartoon___ Check: 117
Mx 2. 25. ____*__Tweetie_cartoon___ Check: 119
Mx 2. 26. ____*__Tweetie_cartoon___ Check: 120
Mx 2. 27. ____*__Tweetie_cartoon___ Check: 120
Mx 2. 28. ____*__Tweetie_cartoon___ Check: 120
Mx 1. 29. ____**_______ef__r_____lw Check: 68
Mx 1. 30. _____*__anary_fly_seedylw Check: 103
Mx 1. 31. _____*_canary_fly_seedylw Check: 127
Mx 1. 32. _____*_canary_fly_seedylw Check: 133
Mx 1. 33. _____*_canary_fly_seedylw Check: 134
Mx 2. 34. _____*_canary_fly_seedylw Check: 135
Mx 2. 35. _____*_canary_fly_seedylw Check: 135
Mx 2. 36. _____*_canary_fly_seedylw Check: 135
Mx 1. 37. BIRD_*_____ry_fly_seedylw Check: 112
Mx 1. 38. BIRD_*_b_rd_y_fly_seedylw Check: 141
Mx 1. 39. BIRD_*_bird___fly_seedylw Check: 143
Mx 1. 40. BIRD_*_bird___fly_seedylw Check: 151
Mx 1. 41. BIRD_*_bird___fly_seedylw Check: 152
Mx 2. 42. BIRD_*_bird___fly_seedylw Check: 150
.end literal
.left margin 12
.p
Information about Tweetie is generated by the name. Note that Tweetie is small, but canaries are in general medium sized and yellow, so 'small' is stored at the Tweetie node.
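.p
The retrieval cycle described for Figures Network-3 through Network-6 (settle with the autoassociative matrix, then change nodes with the true association matrix) can be sketched in a few lines of Python. This is a toy illustration and not the simulation that produced the traces. The eight-element patterns, the update rule (add the matrix output to the state, then clip at a limit of 1), and every function name are assumptions made for the example. The stopping test (count of limited elements unchanged over three iterations) and the five applications of the association matrix are also assumptions, chosen only because they resemble the pattern visible in the Check column of the traces.

```python
# Toy sketch of the two-matrix retrieval cycle. All names and parameter
# values are assumptions for illustration, not the original simulation.

LIMIT = 1.0  # assumed saturation limit for every element


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def hebbian(pairs, n):
    """Sum of outer products g f^T / n over (f, g) pattern pairs."""
    m = [[0.0] * n for _ in range(n)]
    for f, g in pairs:
        for i in range(n):
            for j in range(n):
                m[i][j] += g[i] * f[j] / n
    return m


def clip(v):
    """Hold every element inside [-LIMIT, LIMIT]."""
    return [max(-LIMIT, min(LIMIT, a)) for a in v]


def count_limited(v):
    """Number of elements sitting at a limit."""
    return sum(1 for a in v if abs(a) >= LIMIT)


def feedback(matrix, x):
    """One iteration: add the matrix output to the state, then clip."""
    return clip([a + b for a, b in zip(x, matvec(matrix, x))])


def settle(auto, x, patience=3, max_iter=100):
    """Iterate with the autoassociative matrix until the count of limited
    elements is the same for `patience` consecutive iterations."""
    counts = []
    for _ in range(max_iter):
        x = feedback(auto, x)
        counts.append(count_limited(x))
        if len(counts) >= patience and len(set(counts[-patience:])) == 1:
            break
    return x, counts


def associate(assoc, x, steps=5):
    """Apply the association matrix for a fixed number of iterations."""
    for _ in range(steps):
        x = feedback(assoc, x)
    return x


# Two orthogonal toy patterns standing in for two nodes.
f1 = [1, 1, 1, 1, -1, -1, -1, -1]
f2 = [1, -1, 1, -1, 1, -1, 1, -1]
n = len(f1)
auto = hebbian([(f1, f1), (f2, f2)], n)   # each pattern linked to itself
assoc = hebbian([(f1, f2)], n)            # f1 linked to f2

probe = [0.5, 0.5, 0, 0, 0, 0, 0, 0]      # partial cue for f1
node, counts = settle(auto, probe)
moved = associate(assoc, node)
next_node, next_counts = settle(auto, moved)
print(counts, node)
print(next_counts, next_node)
```

.p
With these toy patterns the partial probe settles onto the first stored pattern. The association step, followed by a second settling phase, then carries the state to the pattern linked to it, which is the same alternation of Mx 2 and Mx 1 seen in the traces.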