DART1--A Possible Ideal Building Block of Memory Systems

C. J. Wu

P.O. Box 2825 c/s

Department of Computer Science

New Mexico Tech.

Socorro, NM 87801

Abstract - In this paper, a bidirectional associative memory, the dual adaptive resonance theory 1 (DART1) network, is studied. It consists of two simplified ART1 (SART1) networks that share the same category layer, called the conceptual layer, and use only top-down weight matrices. The DART1 meets the most important requirement of a neural network used as a memory device: the ability to store arbitrary patterns and retrieve them correctly. Other advantages of this model include the ability to associate one pattern with another arbitrary pattern, as MLP (multilayer perceptron) networks do; elimination of input noise as a nearest-neighbor classifier in terms of the Hamming distance; the ability to handle both spatial and temporal patterns; full memory capacity, like the ART1 networks; fast construction of weights; easy updating of the memory content; and self-organization in response to arbitrary binary patterns. In addition, the DART1 can handle 1-to-many and many-to-1 mappings in the domains formed by the trained patterns. These features make the DART1 very attractive as a building block of memory systems; it may therefore be possible to build neural-network-based database or expert systems, which are left for future research.

I. INTRODUCTION

Without memories, man is a creature of the instant. To interpret input information and respond properly, the past experience related to that information must be retrieved correctly from the associative memory system; the memory system therefore plays a crucial role in an integrated artificial neural network. In this work, the following criteria are used to evaluate various memory systems.

1. Guaranteed recall of all trained patterns

2. High memory capacity

3. No spurious memories

4. Incremental ability, that is, ease of updating the content of the system

5. Short training time and fast retrieval

6. The ability to correct input noise

7. The ability to handle both spatial and temporal patterns

8. The ability to handle 1-to-1, many-to-1, and 1-to-many mappings

For example, the performance analysis of the multilayer perceptron (MLP) with the back-propagation (BP) learning algorithm [1] indicates that the MLP performs satisfactorily with respect to criteria 2, 3, 6, and 8 (except for 1-to-many mappings), but fails on criteria 4, 5, and 7. In theory, the MLP can meet criterion 1, though in practice the initial weights and learning rate must be set correctly to succeed. The analysis of another famous paradigm, the bidirectional associative memory (BAM) [2], shows that the BAM can meet criteria 5-7; however, the BAM usually suffers from no guaranteed recall of trained patterns, many spurious memories, and limited memory capacity even for the non-homogeneous BAM [3]. As for incremental ability, in theory one can add or delete a pair of memories from the system by

$W_{new} = W_{old} \pm X_k^T Y_k$

where $(X_k, Y_k)$ is the target memory pair and $W_{new}$, $W_{old}$ are the new and old correlation weight matrices, respectively. However, there is no guarantee that the previous attractors will still be attractors afterwards. In addition, only 1-to-1 mappings are allowed in the input domain; that is, there can be no 1-to-many mapping, like $(X, Y_1)$ and $(X, Y_2)$, and no many-to-1 mapping, like $(X_1, Y)$ and $(X_2, Y)$.
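As a side illustration (not part of the original paper), the following NumPy sketch shows the standard Kosko-style correlation encoding and the incremental add/delete update described above; the function names are hypothetical.

```python
import numpy as np

def to_bipolar(v):
    """Map a unipolar {0,1} vector to a bipolar {-1,+1} vector, as in Kosko's BAM."""
    return 2 * np.asarray(v, dtype=int) - 1

def bam_add_pair(w_old, x, y):
    """Add the pair (x, y) to the correlation matrix: W_new = W_old + X^T Y."""
    return w_old + np.outer(to_bipolar(x), to_bipolar(y))

def bam_delete_pair(w_old, x, y):
    """Delete the pair (x, y) from the correlation matrix: W_new = W_old - X^T Y."""
    return w_old - np.outer(to_bipolar(x), to_bipolar(y))

# Start from an empty memory and add two pairs.
W = np.zeros((5, 4), dtype=int)
W = bam_add_pair(W, [1, 0, 0, 1, 1], [1, 1, 0, 0])
W = bam_add_pair(W, [0, 1, 1, 1, 1], [1, 0, 1, 0])
# Nothing in this update rule guarantees that previously stored pairs
# remain attractors of the bidirectional recall dynamics.
```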

Many models have been proposed to solve these problems since Kosko introduced the BAM, particularly for improving retrieval correctness and memory capacity [4-10]; however, none of them achieves both full memory capacity and the association of two arbitrary patterns. In [11], a multilayer perceptron (MLP) like neural network was proposed to improve the capacity of Hopfield nets. Despite its significant improvement in capacity, some input patterns are ruled out under that model; hence, there is no guarantee that arbitrary patterns can be stored in the network. Other disadvantages of that model are unidirectional association and the inability to handle temporal patterns, 1-to-many mappings, and many-to-1 mappings. The MBDS [12] can achieve guaranteed recall of trained patterns by adding two more layers and four more weights; however, it is only suitable for batch learning and lacks incremental ability, since previously stored patterns need to be rechecked whenever a new pair of inputs is added to the system. The ARTMAP [13], which consists of two independent ART1 units and an unbalanced map field, can meet most of the criteria except bidirectional association, 1-to-many mappings, and temporal patterns.

In this work, the dual ART1 (DART1), consisting of two simplified ART1 (SART1) networks that share the same category layer, called the conceptual layer, is proposed to overcome these disadvantages of the ARTMAP at lower cost, since the DART1 uses only two feature layers, one category layer, and two weight matrices instead of the four layers, one map field, and at least five weight matrices of the ARTMAP. In short, the DART1 can satisfy all the criteria mentioned above and can serve as an ideal building block for future memory systems. The rest of the paper is organized as follows. Section II describes the architecture and learning algorithms of the DART1 network. The features of the DART1 are discussed in Section III. Finally, conclusions and discussions are given in Section IV.

II. THE ARCHITECTURE AND LEARNING ALGORITHMS OF DART1

The adaptive resonance theory 1 (ART1) network [14], [15] has many advantages, such as full memory capacity and no spurious memories. Unfortunately, the ART1 network cannot handle bidirectional associations as the BAM does. The DART1 is introduced to overcome this shortcoming. The architecture of the DART1 can be seen in Fig. 1. The system consists of two simplified ART1 (SART1) networks, with feature layers $F_A$ and $F_B$, that share the same conceptual layer $F_C$, which runs in a competitive mode implemented by an on-center, off-surround feedback network. In addition, each SART1 uses only one (top-down) weight matrix, $W^A$ or $W^B$, instead of the two used in the original ART1 network; the j-th columns $W_j^A$ and $W_j^B$ hold the prototypes stored at node j of $F_C$. The advantages of the SART1 are less processing time and lower implementation cost.
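To make the data held by this architecture concrete, a minimal sketch of a DART1 container is given below, using the notation just introduced ($W^A$ and $W^B$ stored column by column); the class and attribute names are this sketch's own, not the paper's.

```python
class DART1:
    """Two simplified ART1 (SART1) halves sharing one conceptual layer.

    Each committed node j of the conceptual layer owns one column of
    W^A (an M-dimensional A-side prototype) and one column of W^B
    (an N-dimensional B-side prototype).
    """

    def __init__(self, dim_a, dim_b, max_nodes):
        self.M = dim_a              # dimension of A-side patterns (layer F_A)
        self.N = dim_b              # dimension of B-side patterns (layer F_B)
        self.max_nodes = max_nodes  # number of nodes in the conceptual layer F_C
        self.W_A = []               # stored A-side prototypes, one per committed node
        self.W_B = []               # stored B-side prototypes, one per committed node

    @property
    def committed(self):
        """Number of committed nodes P in the conceptual layer."""
        return len(self.W_A)
```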


Other modifications include using the Hamming distance to measure similarity, so that recall of all stored patterns is guaranteed [16], and allowing the vigilance parameter to be greater than 1 so that the DART1 can accommodate 1-to-many and many-to-1 mappings in the input domains. For many-to-1 mappings, such as $(A_1, B)$ and $(A_2, B)$, the patterns $A_1$ and $A_2$ are stored in $W_{j_1}^A$ and $W_{j_2}^A$, where $j_1$ and $j_2$ are the winning nodes of the two input pairs, respectively, but $B$ is copied to both $W_{j_1}^B$ and $W_{j_2}^B$. For 1-to-many mappings, such as $(A, B_1)$ and $(A, B_2)$, the operations are similar except that $A$ is copied to both $W_{j_1}^A$ and $W_{j_2}^A$, and $B_1$ and $B_2$ are stored in $W_{j_1}^B$ and $W_{j_2}^B$, respectively.

A. The Encoding Algorithm Of DART1

We assume that there are no repeated input pairs; that is, for P pairs of inputs $(A_p, B_p)$, $p = 1, \ldots, P$, if $p \neq q$ then $(A_p, B_p) \neq (A_q, B_q)$. In practice, repeated pairs can be found by running the retrieval algorithm first. Please note that there is a trade-off between guaranteed recall of all trained patterns and adaptive ability: by setting the vigilance parameters greater than 1, it is guaranteed that all trained patterns can be retrieved, but the network loses its adaptive ability. To encode the input pair $(A, B)$ from the $F_A$ side, the algorithm is as follows.

Step 1A. Read the unipolar inputs $A = (a_1, \ldots, a_M)$ and $B = (b_1, \ldots, b_N)$ and initialize the weight vectors associated with an uncommitted node j in the conceptual layer as follows:

$w_{ij}^A = 1, \quad i = 1, \ldots, M; \qquad w_{ij}^B = 1, \quad i = 1, \ldots, N$   (1)

For the vigilance parameter $\rho_A$, there are two cases:

Case 1: guaranteed recall of all trained patterns

$\rho_A > 1$   (2)

Case 2: noisy environment, or features need to be extracted from the patterns

$0 < \rho_A \leq 1$   (3)

where $\rho_A$ is the vigilance parameter on the $F_A$ side.

Step 2A. Present the binary (unipolar) input vector $A$ to $F_A$. The winning node in $F_C$, say j, is the node whose weight vector $W_j^A$ is most similar to $A$ in terms of the Hamming distance in $\{0, 1\}^M$, where M is the dimension of $A$. In case of a tie, one of the tied nodes is selected arbitrarily.

Step 3A. Test the similarity between the prototype of the winner, $W_j^A$, and $A$ by computing the following ratio:

$S = \dfrac{M - HD(A, W_j^A)}{M}$   (4)

where $HD(\cdot, \cdot)$ denotes the Hamming distance.

Step 4A. If $S \geq \rho_A$, go to Step 5A; otherwise, a new category node (with a new pair of prototype vectors) is created and the input pair stored there, unless there is no uncommitted node available in $F_C$; in that case, the operation goes to Step 6A.

Step 5A. Update only the weight vectors associated with the winning node j as follows:

$w_{ij}^A = a_i, \quad i = 1, \ldots, M$   (5)

$w_{ij}^B = b_i, \quad i = 1, \ldots, N$   (6)

where $B = (b_1, \ldots, b_N)$ is the counterpart vector of $A$ presented to the $F_B$ layer, and M, N are the dimensions of $A$ and $B$, respectively. Thus, the input pair $(A, B)$ is stored in $W_j^A$ and $W_j^B$, respectively.

Step 6A. If there is no new input vector, the operation is terminated; otherwise, the system gets the next input vector and goes back to Step 2A.
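A minimal sketch of Steps 1A-6A is given below, assuming the DART1 container sketched earlier and the Hamming-distance similarity ratio of (4); with a vigilance greater than 1 the similarity test always fails, so every distinct input pair is committed to a fresh node, which is what gives guaranteed recall. The function names and the exact tie-breaking rule are this sketch's assumptions.

```python
import numpy as np

def hamming(u, v):
    """Hamming distance between two unipolar vectors."""
    return int(np.sum(np.asarray(u) != np.asarray(v)))

def similarity(a, prototype):
    """Similarity ratio of Eq. (4): fraction of components that match."""
    return (len(a) - hamming(a, prototype)) / len(a)

def dart1_encode(net, pairs, rho_a):
    """Encode (A, B) pairs from the F_A side (Steps 1A-6A)."""
    for a, b in pairs:
        if net.committed > 0:
            # Step 2A: the winner is the committed node whose A-side prototype
            # is closest to A in Hamming distance (ties broken by lowest index).
            j = min(range(net.committed), key=lambda k: hamming(a, net.W_A[k]))
            # Steps 3A-4A: vigilance test against the winner's prototype.
            if similarity(a, net.W_A[j]) >= rho_a:
                # Step 5A (Eqs. (5)-(6)): store the pair at the winning node.
                net.W_A[j] = np.asarray(a, dtype=int)
                net.W_B[j] = np.asarray(b, dtype=int)
                continue
        # Otherwise commit a new node and store the pair there,
        # unless the conceptual layer is already full (Step 4A).
        if net.committed < net.max_nodes:
            net.W_A.append(np.asarray(a, dtype=int))
            net.W_B.append(np.asarray(b, dtype=int))
    return net
```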

Please note that one iteration is enough for each input presented to the network, instead of the several iterations possibly needed in the original ART1 algorithm. For example, consider the following 4 pairs of inputs:

$A_1$ = (1, 0, 0, 1, 1),   $B_1$ = (1, 1, 0, 0),

$A_2$ = (0, 1, 1, 1, 1),   $B_2$ = (1, 1, 0, 0),

$A_3$ = (1, 0, 0, 1, 1),   $B_3$ = (1, 0, 1, 0),

$A_4$ = (0, 0, 1, 1, 1),   $B_4$ = (0, 0, 1, 1).

After encoding with the vigilance parameters set greater than 1, the weight matrices are as follows:

$W^A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \qquad W^B = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

where the columns of $W^A$ and $W^B$ are the input patterns copied from the $A_p$ and $B_p$, respectively. Please note that the first and third columns of $W^A$ are the same, as are the first and second columns of $W^B$; hence, we have 1-to-many and many-to-1 mappings in both directions.
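Under the sketches above, encoding the four pairs with a vigilance greater than 1 (the value 1.2 is an arbitrary choice, not taken from the paper) reproduces this column structure:

```python
import numpy as np

pairs = [
    ((1, 0, 0, 1, 1), (1, 1, 0, 0)),
    ((0, 1, 1, 1, 1), (1, 1, 0, 0)),
    ((1, 0, 0, 1, 1), (1, 0, 1, 0)),
    ((0, 0, 1, 1, 1), (0, 0, 1, 1)),
]
net = dart1_encode(DART1(dim_a=5, dim_b=4, max_nodes=10), pairs, rho_a=1.2)
W_A = np.column_stack(net.W_A)   # 5 x 4; columns 1 and 3 are identical
W_B = np.column_stack(net.W_B)   # 4 x 4; columns 1 and 2 are identical
```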

B. The Retrieval Algorithm Of DART1

There are two modifications in this algorithm. One is to use the Hamming distance to measure the similarity between the input pattern and the existing prototypes. The other is that, unlike ordinary ART networks, retrieval does not cease after the first successful search; instead, the network tries to retrieve all possible answers one by one. If $A$ is used to retrieve $B$ and the vigilance parameter is set to 1, only the prototypes (of $A$) identical to the input will be selected; otherwise, all prototypes (of $A$) similar enough to the input will be picked out. Retrieval in the other direction, from $B$ to $A$, is done in a similar way.

Step 1B. Initialize the vigilance parameter as follows:

$0 < \rho_A \leq 1$   (7)

Step 2B. Same as Step 2A

Step 3B. Same as Step 3A

Step 4B. If $S \geq \rho_A$, output the counterpart prototype of $W_j^A$, namely $W_j^B$, as the answer; otherwise, the operation terminates. For each node m in layer $F_B$, the output value is decided as follows:

$y_k = \begin{cases} 1, & k = j \\ 0, & k \neq j \end{cases} \qquad k = 1, \ldots, P$   (8)

$b_m = \sum_{k=1}^{P} w_{mk}^B\, y_k, \quad m = 1, \ldots, N$   (9)

$B^{out} = (b_1, b_2, \ldots, b_N)$   (10)

where P and $y_j$ are the number of committed nodes and the output value of node j in $F_C$, respectively, $W^B = [w_{mk}^B]$ is the weight matrix between $F_C$ and $F_B$, and $B^{out}$ is the output vector associated with $A$.

Step 5B. Node j is deactivated by setting its output $y_j = 0$ and excluding it from further competition. Then the node with the maximum similarity among the remaining committed nodes is picked as the new winning node, and the operation goes back to Step 3B to search for other possible answers. If all the committed nodes have been deactivated, the operation terminates.

Please note that, for each retrieval, several iterations of Steps 3B-5B may be needed to find all qualified answers.
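A corresponding sketch of Steps 1B-5B follows, reusing the hamming and similarity helpers from the encoding sketch: committed nodes are examined in order of decreasing similarity to the probe, and every node that passes the vigilance test contributes its counterpart prototype, so all qualified answers are returned.

```python
def dart1_retrieve(net, a, rho_a=1.0):
    """Retrieve every B prototype associated with probe A (Steps 1B-5B)."""
    answers = []
    active = list(range(net.committed))
    while active:
        # Step 2B: winner = most similar remaining node (smallest Hamming distance).
        j = min(active, key=lambda k: hamming(a, net.W_A[k]))
        # Steps 3B-4B: vigilance test; the first failure ends the search.
        if similarity(a, net.W_A[j]) < rho_a:
            break
        # Eqs. (8)-(10): a one-hot activation of node j reads out column j of W^B.
        answers.append(net.W_B[j])
        # Step 5B: deactivate node j and continue looking for further answers.
        active.remove(j)
    return answers

# With the four-pair example and rho_a = 1, the probe (1, 0, 0, 1, 1)
# returns both (1, 1, 0, 0) and (1, 0, 1, 0): a 1-to-many retrieval.
```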

III. THE FEATURES OF DART1

The DART1 has the following properties:

1. Like the ART1, it has full memory capacity, limited only by the number of nodes in the conceptual layer. This property has been demonstrated in [15]. Since adding a new node to the conceptual layer does not affect the weights connected to other nodes, a new node can be added at any time when needed; hence, the system can be updated at any time, and the real limit on capacity is the number of nodes in the conceptual layer.

2. It has the ability to associate one pattern with another arbitrary pattern. To encode the inputs and deal with 1-to-many and many-to-1 mappings, we can set the vigilance greater than 1. For encoding, if node j is the winner, the input pair $(A, B)$ is stored in $W_j^A$ and $W_j^B$, respectively; therefore, the association of $A$ and $B$ is established through node j in the conceptual layer, which is shared by both SART1 networks. For retrieval, we let the network continuously search for all possible answers until the first failure of the similarity test occurs.

3. The weight matrices can be constructed very quickly. Although two SART1 networks are used in a DART1 network, the time needed to construct the weights is almost the same as for a single SART1 network; therefore, there is little overhead for the DART1. Since the inputs are compared with the existing prototypes directly, only one iteration is needed to encode each input pair, instead of the several iterations possibly needed in the original algorithm.

4. It uses the Hamming distance to correct input noise, or to achieve generalization, by pushing the input point to the nearest (most similar) stored pattern in the Hamming-distance space; a small illustration follows this list.

5. There are no spurious memories in the DART1. In contrast to the correlation matrices and energy functions used in the BAM, no local minima (spurious memories) exist in this methodology, since the DART1 always responds to a retrieval inquiry with one of the stored memories.

6. It is capable of handling both spatial and temporal patterns.
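As promised under property 4, here is a small, self-contained illustration of the nearest-neighbor behavior in Hamming distance (the stored patterns and the noisy probe are hypothetical, not taken from the paper):

```python
import numpy as np

def nearest_prototype(probe, prototypes):
    """Return the stored prototype closest to the probe in Hamming distance."""
    probe = np.asarray(probe)
    distances = [int(np.sum(probe != np.asarray(p))) for p in prototypes]
    return prototypes[int(np.argmin(distances))]

stored = [(1, 0, 0, 1, 1), (0, 1, 1, 1, 1), (0, 0, 1, 1, 1)]
noisy = (1, 0, 0, 1, 0)                  # last bit of the first pattern flipped
print(nearest_prototype(noisy, stored))  # -> (1, 0, 0, 1, 1)
```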

IV. CONCLUSIONS AND DISCUSSIONS

In this work, the DART1 has been studied as a possible ideal memory building block. It is shown that the DART1 guarantees the bidirectional association of two arbitrary patterns and is capable of handling both 1-to-many and many-to-1 mappings in the input domains. In addition, the DART1 has other advantages inherited from the ART1 [17], such as full memory capacity, easy updating of the memory content, no spurious memories, fast weight construction, and adaptive ability. A possible expansion of the DART1 to accommodate multiple spatial and temporal patterns is to hook up more feature layers to the central category layer. These features make the DART1 very attractive as a building block of memory systems; therefore, it may be possible to build neural-network-based database or expert systems, which is left for future research. Other important criteria that have not been discussed are the system's behavior under limited memory capacity and its robustness. For example, the fact that the ART1 simply stops learning after saturation (running out of nodes) is not what we want. Also, the ART1 is very vulnerable to node failure. These issues will be further research topics as well.

REFERENCES

[1] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. Cambridge, MA: MIT Press, 1986.

[2] B. Kosko, "Bidirectional Associative Memories," IEEE Trans. on Systems, Man, and Cybernetics, vol. 18, no. 1, pp. 49-60, Jan./Feb. 1988.

[3] K. Haines and R. Hecht-Nielsen, "A BAM with Increased Information Storage Capacity," in Proc. IEEE/INNS International Joint Conference on Neural Networks, 1988, pp. 181-190.

[4] Y. F. Wang, J. B. Cruz, and J. H. Mulligan, "Two Coding Strategies for Bidirectional Associative Memory," IEEE Trans. on Neural Networks, vol. 1, no. 1, pp. 81-91, March 1990.

[5] Y. F. Wang, J. B. Cruz, and J. H. Mulligan, "Guaranteed Recall of All Training Pairs for Bidirectional Associative Memory," IEEE Trans. on Neural Networks, vol. 2, no. 6, pp. 559-567, Nov. 1991.

[6] P. K. Simpson, "Higher-Ordered and Intraconnected Bidirectional Associative Memories," IEEE Trans. on Systems, Man, Cybernetics, vol. 20, no. 3, pp. 637-653, 1990.

[7] B. L. Zhang, B. Z. Xu, and C. P. Kwong, "Stability and Attractivity Analysis of Bidirectional Associative Memory from the Matched-Filtering Viewpoint," in Proc. IEEE/INNS International Joint Conference on Neural Networks, 1991, pp. 2277-2281.

[8] J. Sun and T. Nagata, "Associative Memory on the Basis of High Correlation," in Proc. IEEE/INNS International Joint Conference on Neural Networks, vol. 1, 1992, pp. 482-486.

[9] C. C. Wang and H. S. Don, "High-Capacity Discrete Exponential BAM," in Proc. IEEE/INNS International Joint Conference on Neural Networks, vol. 2, 1992, pp. 178-183.

[10] H. Oh and S. C. Kothari, "A Pseudo-Relaxation Learning Algorithm for Bidirectional Associative Memory," in Proc. IEEE/INNS International Joint Conference on Neural Networks, vol. 2, 1992, pp. 208-213.

[11] J. Hao and J. Vandewalle, "A New Model of Neural Associative Memories," in Proc. IEEE/INNS International Joint Conference on Neural Networks, vol. 2, 1992, pp. 116-171.

[12] W.-J. Wang and D.-L. Lee, "A Modified Bidirectional Decoding Strategy Based on the BAM Structure," IEEE Trans. on Neural Networks, vol. 4, no. 4, pp. 710-717, Jul. 1993.

[13] G. A. Carpenter, S. Grossberg, and J. H. Reynolds, "ARTMAP: Supervised Real-Time Learning and Classification of Nonstationary Data by a Self-Organizing Neural Network," Neural Networks, vol. 4, no. 5, pp. 565-588, 1991.

[14] S. Grossberg, "Adaptive Pattern Classification and Universal Recoding: I. Parallel Development and Coding of Neural Feature Detectors," Biological Cybernetics, vol. 23, pp. 121-134, 1976.

[15] G. A. Carpenter and S. Grossberg, "A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine," Computer Vision, Graphics, and Image Processing, vol. 37, pp. 54-115, 1987.

[16] C. J. Wu, A. H. Sung, and H. S. Soliman, "Image Data Compression Using ART Networks," in Proc. Artificial Neural Networks In Engineering, 1993, pp. 417-422.

[17] G. A. Carpenter and S. Grossberg, "The ART of Adaptive Pattern Recognition by a Self-Organizing Neural Network," Computer, vol. 21, pp. 77-88, 1988.

C.J. Wu, " DART1--A Possible Ideal Building Block of Memory Systems," Proc. of IEEE Int. Conf. on Neural Networks (Orlando, FL), pp. 1086-1091, June 1994.