# A Fine Grain Microprocessor Design Education considering Situated Nature of Learning

## Ryuichi TAKAHASHI<sup> $\dagger$ </sup>, Hajime OHIWA<sup> $\dagger\dagger$ </sup> and Yoshiyasu TAKEFUJI<sup> $\dagger\dagger$ </sup>

<sup>†</sup>Hiroshima City University, Hiroshima Japan, <sup>††</sup> Keio University, Kanagawa Japan

#### Summary

This paper proposes a new training method for instructing newcomers in the field of microprocessor design. With this method, which extends the legitimate peripheral participation (LPP) proposed by Lave and Wenger, newcomers can develop creativity that goes beyond the teaching materials in the course. In our microprocessor design education for junior students, we used pipelining as the subject for the first 5 years. This resulted in a failure, since pipelining must be tackled at the first step of the design, which is not an appropriate place from which to observe a microprocessor. After we started to use the instruction issue logic of superscalar microprocessors as a "way-in" in the sense of LPP, many original devices appeared among the junior students' designs. Instruction issue logic has worked as a very good way-in for the observation, since it is the heart of the superscalar microprocessor and is designed at the last stage of the design phases. We expect that showing the central part of the product, designed at the last stage, will be effective in many other cases of microelectronic systems design education.

#### Key words:

Pipelining, superscalar microprocessor, design education, legitimate peripheral participation, way-in

## 1. Introduction

Effective science training has become critically important as the information technology society grows. We have been searching for an effective method for training students in the skill of designing fine grain parallel processors, which exploit instruction level parallelism and are widely used in commercially available microprocessors as pipelined microprocessors, superscalar microprocessors and very long instruction word (VLIW) processors. The course for teaching computer organization and architecture [1] is one of the prior attempts to train undergraduate students to be educated members of this information technology society. In our educational program, junior students are expected to be creative through a course in which they design their own original architectures, implemented by their own original organizations, which are in turn implemented on an FPGA with appropriate random logic and I/O. Although students were expected to design their own original architectures and organizations, the first 5 years of this course, in which pipelining was used as the subject, resulted in a failure. The students just solved the easiest assignment we had prepared, without any original devices. This situation was greatly improved by introducing a new method based on the theory known as legitimate peripheral participation (LPP) [2] proposed by Lave and Wenger. Our new program, run since 2001, in which the instruction issue logic of superscalar microprocessors is used as the way-in for the LPP, has worked very well. More original devices than we expected have appeared among the designs by the junior students. We believe this method can be applied to many other cases where we have to initiate young scientists into creative design work in our information technology society.

The next section describes the prior pipelining design education, which resulted in a failure. Section 3 describes the LPP with our extension. Section 4 describes the latest result of the successful superscalar microprocessor design education, which takes the situated nature of learning into account.

## 2. Pipelining design education

Instruction level parallelism is widely exploited by fine grain microprocessors implemented as pipelined microprocessors, superscalar microprocessors and very long instruction word (VLIW) processors. Pipelining [3] is the most basic implementation of the fine grain microprocessors.

Fig. 1 Reservation Tables related to CISC-3.

Manuscript received June 5, 2008. Manuscript revised June 20, 2008.

In our educational environment City-1, pipelining was used as the subject for the first 5 years from 1996. A reservation table is a two-dimensional tabular description representing the stage utilization of a pipeline. One of the authors wrote an example description of a pipelined microprocessor named CISC-3 in 639 lines of Verilog HDL, together with some (micro)architectural assignments.

Table 1: Records in 2000 (Pipelining)

| Lines | Gates | AO    | Features | ID     |
|-------|-------|-------|----------|--------|
| 1,261 | 3,500 | R13P3 | B,I      | E09    |
| 1,149 | 4,250 | R15P3 | IE       | E28    |
| 1,137 | 5,400 | R19P3 | IE       | E27    |
| 1,085 | 7,375 | R20P3 | B,IE     | E54    |
| 1,037 | 5,750 | R16P3 | IE       | E31    |
| 971   | 3,225 | R15P3 | IE       | E25    |
| 884   | 4,675 | R19P3 | IE       | E45    |
| 871   | 3,925 | R15P3 | IE       | E14    |
| 870   | 3,550 | C14P3 | MD       | E16    |
| 803   | 2,625 | R13P3 | -        | E46    |
| 799   | 3,775 | R12P3 | IE       | E41    |
| 757   | 1,100 | C10P3 | -        | E19    |
| 753   | 2,450 | C14P3 | I        | E44    |
| 752   | 1,775 | R12P3 | -        | E36    |
| 695   | 2,325 | C11P3 | E        | E53    |
| 689   | 1,800 | C13P3 | IE       | E52    |
| 689   | 3,500 | C11P3 | -        | D49    |
| 686   | 2,150 | R12P3 | -        | E32    |
| 685   | 2,100 | R14P3 | -        | E40    |
| 682   | 1,125 | C12P3 | -        | E22    |
| 672   | 2,150 | R08P3 | -        | E37    |
| 671   | 1,700 | C13P3 | -        | E55    |
| 666   | 2,475 | R14P3 | E        | E48    |
| 663   | 1,500 | C12P3 | -        | E07    |
| 663   | 1,450 | C09P3 | -        | E02    |
| 657   | 2,075 | R08P3 | -        | E21    |
| 657   | 1,775 | C13P3 | -        | E08    |
| 655   | 1,125 | R16P3 | -        | E05    |
| 652   | 1,025 | C10P3 | -        | E34    |
| 652   | 3,500 | C10P3 | -        | E03,06 |
| 650   | 1,125 | C08P3 | -        | E20    |
| 649   | 2,050 | R06P3 | -        | E38    |
| 649   | 1,300 | C10P3 |          | D10    |
| 641   | 1,675 | C12P3 |          | E04    |
| 632   | 1,050 | C09P3 |          | E29    |
| 629   | 1,275 | C09P3 |          | E01    |
| 547   | 1,725 | C09P3 |          | E18    |
| 531   | 2,000 | C09P3 | -        | E13    |

Figure 1 (a) illustrates the original reservation table for the example description named CISC-3. Since it was a complex instruction set computer (CISC), the execution phase took twice as long as the fetch and decode phases.

The easiest assignment for the junior students was to modify the design into a reduced instruction set computer (RISC), which does not require the operand fetch. The reservation table for the new RISC is illustrated in Figure 1 (b). A more difficult assignment was to introduce a special stage for the operand fetch while preserving the CISC architecture, as illustrated in Figure 1 (c). Since both of these modifications double the throughput, we expected many students to try both challenges with great enthusiasm and to gain a deep understanding of fine grain parallel processors, with many original devices.
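The throughput claim can be checked with a small reservation-table model. The sketch below is illustrative only: the stage names and cycle offsets are assumptions chosen to match the description above (single-cycle fetch and decode, with the CISC execution phase occupying two cycles), not the actual contents of Figure 1. It derives the forbidden latencies of each table and the smallest constant interval at which new instructions can enter the pipeline without two of them needing the same stage in the same cycle.

```python
# Hypothetical reservation tables: stage name -> cycle offsets in which
# one instruction occupies that stage. Stage names and offsets are
# assumptions for illustration, not the actual contents of Figure 1.
CISC_ORIGINAL = {"fetch": [0], "decode": [1], "execute": [2, 3]}
RISC_MODIFIED = {"fetch": [0], "decode": [1], "execute": [2]}
CISC_OPERAND_STAGE = {"fetch": [0], "decode": [1], "operand": [2], "execute": [3]}


def forbidden_latencies(table):
    """Return the initiation latencies that make two instructions collide."""
    forbidden = set()
    for cycles in table.values():
        for first in cycles:
            for second in cycles:
                if second > first:
                    forbidden.add(second - first)
    return forbidden


def min_constant_latency(table):
    """Smallest fixed issue interval such that no multiple of it is forbidden."""
    forbidden = forbidden_latencies(table)
    latency = 1
    while any(gap % latency == 0 for gap in forbidden):
        latency += 1
    return latency


if __name__ == "__main__":
    for name, table in [("CISC (a)", CISC_ORIGINAL),
                        ("RISC (b)", RISC_MODIFIED),
                        ("CISC with operand stage (c)", CISC_OPERAND_STAGE)]:
        print(name, "accepts one instruction every",
              min_constant_latency(table), "cycle(s)")
```

Under these assumed tables, the original organization accepts a new instruction every 2 cycles, while both modified organizations accept one every cycle, which corresponds to the doubled throughput.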

Table 1 shows the result in 2000, which was the last year of the pipelining design education using the example description CISC-3. In this year, 39 out of 49 junior students succeeded in completing the entire design and fabrication phases. The last column of Table 1 indicates the student IDs. One of the designs was a collaboration. The first column indicates the number of lines of Verilog HDL for each design. The second column gives the roughly estimated scale in number of gates. The letters AO stand for architecture and organization. For example, R13P3 indicates a pipelined RISC having 13 instructions and 3 stages, and C14P3 is a pipelined CISC having 14 instructions and 3 stages. In the column "Features," the letter "B" stands for branch prediction. The letters "I" and "E" stand for internal and external interruption handling, respectively. The letter "M" stands for multiplication, and the letter "D" for division.
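The AO notation is compact enough to decode mechanically. The following is a minimal sketch, assuming only the two forms explained above (`R` or `C`, an instruction count, then `P` and a stage count); the names `AO` and `parse_ao` are hypothetical helpers introduced here for illustration and are not part of City-1.

```python
import re
from typing import NamedTuple


class AO(NamedTuple):
    architecture: str  # "RISC" or "CISC"
    instructions: int
    stages: int


# Pattern for the Table 1 style of code only: R or C, instruction count,
# then P and the number of pipeline stages.
_AO_PATTERN = re.compile(r"^([RC])(\d+)P(\d+)$")


def parse_ao(code):
    """Decode a Table 1 AO code; raise ValueError for anything else."""
    match = _AO_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognised AO code: {code!r}")
    kind, instructions, stages = match.groups()
    architecture = "RISC" if kind == "R" else "CISC"
    return AO(architecture, int(instructions), int(stages))


if __name__ == "__main__":
    print(parse_ao("R13P3"))
    print(parse_ao("C14P3"))
```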

The pipelining education was a failure. According to the first column of Table 1, the average number of lines modified was 127 and the standard deviation was only 162 lines. The column AO indicates that none of the students tried the second assignment, which was to introduce a special stage for the operand fetch. The column for the features indicates that only 3 students succeeded in introducing original devices other than internal/external interruption handling. The columns for lines and AO indicate that 20 out of 39 students used the example CISC-3 as it was, with few modifications. Moreover, the completion ratio was about 80% (39 out of 49 students).
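These two statistics can be approximately reproduced from the first column of Table 1. The sketch below rests on assumptions that the text does not state: the size of a modification is taken as the absolute difference between a design's line count and the 639-line CISC-3 example, the collaborative design (E03,06) is counted once per student, and the population standard deviation is used. Under that reading the mean comes out at about 127 lines and the standard deviation at a little under 163 lines, close to the reported values.

```python
from statistics import mean, pstdev

BASE_LINES = 639  # size of the CISC-3 example description

# "Lines" column of Table 1, top to bottom.
TABLE1_LINES = [
    1261, 1149, 1137, 1085, 1037, 971, 884, 871, 870, 803,
    799, 757, 753, 752, 695, 689, 689, 686, 685, 682,
    672, 671, 666, 663, 663, 657, 657, 655, 652, 652,
    650, 649, 649, 641, 632, 629, 547, 531,
]
# Assumption: the joint design "E03,06" (652 lines) counts for two students,
# giving the 39 students mentioned in the text.
PER_STUDENT_LINES = TABLE1_LINES + [652]


def modification_stats(lines, base):
    """Mean and population std of |lines - base| (an assumed metric)."""
    changes = [abs(n - base) for n in lines]
    return mean(changes), pstdev(changes)


if __name__ == "__main__":
    avg, std = modification_stats(PER_STUDENT_LINES, BASE_LINES)
    print(f"mean change: {avg:.1f} lines, std: {std:.1f} lines")
```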

## 3. Legitimate peripheral participation (LPP)

Our problem was the result described above: we failed to initiate junior students into creative work in our fine grain microprocessor design education. The standard deviation of the number of lines modified by the students, which was only 162 in 2000, is one barometer of creativity. The students only modified the given example description into RISCs, following the scaffolding we gave them as the easiest assignment. There were no original devices.

To put it simply, the pipelining was too difficult. For the pipelining, we have to divide the computer organization into modules for the pipeline stages before tuning the behavior of each stage. The steps we taught were the design steps themselves for the professional designers, which were too difficult to learn. We looked for a new method to improve this situation.

We found a theory known as legitimate peripheral participation (LPP), introduced by Lave and Wenger [2], who noticed the importance of the situated nature of learning. They investigated tailors in West Africa. The steps of the apprenticeship were the production steps in reverse, which has the effect of focusing the apprentices' attention first on the broad outline of the product construction. The apprentices begin by learning the finishing stages of producing a garment, go on to learn to sew it, and only later learn to cut it out. Each step offers the unstated opportunity to consider how the previous step contributes to the present one. In addition, this ordering minimizes experiences of failure, and especially of serious failure.

The learning of each operation is subdivided into "way-in" and "practice." "Way-in" refers to a period of observation and attempts to construct a first approximation of the garment. In the practice phase, apprentices reproduce a production segment from beginning to end.

We paid attention to the fact that the learning steps are the production steps in reverse. This gives the learners an opportunity to grasp the broad outline of the product. Dividing the design into modules for the pipelining was very difficult for the junior students, since it corresponds to "cutting," the first step of production. We noticed that we should use a subject treated at the final stage of the design phases. We also noticed that if we could find a subject which is the heart of the product, the subject would be a very good way-in for the LPP. This is our extension: the subject for the way-in should preferably be the central part of the product.

The answer was the instruction issue logic of superscalar microprocessors. The logic is tuned at the last stage of the design phases and is the central part of the superscalar microprocessor. The "practice" is expected to be done through the effort to run the application program on the students' machines. In our educational environment City-1, the specification is given only by showing an application program that should run on their machines. We had been using the Euclidean algorithm, which calculates the greatest common measure (GCM) and the least common multiple (LCM) of the given inputs. We decided to continue to use the same application program in the new program, expecting the completion ratio to increase through the practice phase.
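For reference, the arithmetic of this application program can be sketched in a few lines. This is only an illustration of the Euclidean algorithm in Python, with function names chosen here for convenience; it is not the program the students ran on their machines, whose instruction-level form is not given in the text.

```python
def gcm(a, b):
    """Greatest common measure of two positive integers (Euclidean algorithm)."""
    while b != 0:
        # Replace (a, b) by (b, a mod b) until the remainder vanishes.
        a, b = b, a % b
    return a


def lcm(a, b):
    """Least common multiple, derived from the GCM."""
    return a // gcm(a, b) * b


if __name__ == "__main__":
    x, y = 48, 18
    print("GCM:", gcm(x, y), "LCM:", lcm(x, y))
```

The loop needs only a remainder (or repeated subtraction), a comparison with zero and a branch, which keeps the instruction set required to run it small.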

## 4. Superscalar processor design education

We started superscalar microprocessor design education in 2001 to turn the students' attention from the module organization for the pipelining to the instruction issue logic of superscalar microprocessors [4]. The first 3 years were a trial, which appeared promising.

Figure 2 illustrates the organization of the new example description named RISC-3FB4, which has a FIFO buffer as the instruction window between the decoders and the functional units. RISC-3FB4 was written by one of the authors in 3,725 lines of Verilog HDL in 2005 for use in the following 3 years. The instruction issue logic, which had been specified incompletely on purpose, is beside the FIFO buffer.



Fig. 2 RISC-3FB4 organization.
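To give a rough idea of what instruction issue logic beside a FIFO instruction window has to decide, the following sketch models in-order issue in Python. It is not RISC-3FB4 and not the logic the students completed: the issue width, the instruction format with one destination and a tuple of source registers, and the rule of stalling on any register dependence within the same cycle are all assumptions made for illustration.

```python
from collections import deque
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # destination register (hypothetical format)
    srcs: Tuple[str, ...]   # source registers


def issue_schedule(program, width=2):
    """Group instructions into issue cycles, in order, from a FIFO window.

    Assumed rule: an instruction cannot issue in the same cycle as an
    earlier instruction that writes one of its sources (RAW) or its
    destination (WAW); issue then stops until the next cycle.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    window = deque(program)
    schedule = []
    while window:
        issued = []
        written = set()
        while window and len(issued) < width:
            head = window[0]
            if written & set(head.srcs) or head.dest in written:
                break  # dependence on an instruction issued this cycle
            issued.append(window.popleft())
            written.add(head.dest)
        schedule.append(issued)
    return schedule


if __name__ == "__main__":
    demo = [
        Instr("r3", ("r1", "r2")),
        Instr("r4", ("r5", "r6")),
        Instr("r7", ("r3", "r4")),
        Instr("r8", ("r7", "r1")),
    ]
    for cycle, group in enumerate(issue_schedule(demo, width=2), start=1):
        print("cycle", cycle, [i.dest for i in group])
```

The head of the window always issues because nothing has been written yet in that cycle, so the loop terminates; everything beyond that, such as out-of-order issue or functional-unit conflicts, is left out of this sketch.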

Table 2 shows the result of 2007, which is the 3rd year after we introduced RISC-3FB4 together with the 200,000-gate FPGA (Xilinx XC2S200-5PQ208) required for the new program, in place of the prior 10,000-gate FPGA (Xilinx XC4010E-PG191) used for the pipelining design education. In 2007, 50 out of 53 junior students succeeded in completing the entire design and fabrication phases. The letters "ea" in the last column stand for "et al.," which means that those designs were collaborations. The first column again indicates the number of lines of Verilog HDL for each design. The second column gives the number of gates. The letters AO again stand for architecture and organization. "V48P2" indicates that the machine was a pipelined VLIW having 48 instructions and 2 stages. In this column, the letters "S3" indicate a pipelined superscalar processor having 3 stages. In the column for the features, "I" is for internal interruption and "E" is for external interruption, as in Table 1.

Table 2: Records in 2007 (Superscalar)

| Lines  | Gates  | AO          | Features | ID    |
|--------|--------|-------------|----------|-------|
| 10,238 | 29,310 | R181V44S3V3 | IEMDAC   | L29   |
| 4,576  | 12,319 | R14S3       | EAS      | L34   |
| 4,423  | 27,621 | R14S3       | IEMDA    | L12   |
| 4,392  | 10,506 | R16S3       | IEMDAC   | L30   |
| 4,377  | 16,937 | R16S3       | IEA      | L50   |
| 4,315  | 9,679  | R15S3       | IEMDA    | L11   |
| 4,252  | 9,645  | R13S3       | -        | L41ea |
| 4,235  | 9,221  | R14S3       | -        | L02   |
| 4,139  | 12,286 | R17S3       | IEMDAW   | L19   |
| 4,095  | 11,961 | R15S3       | IEMDA    | L03   |
| 4,028  | 8,575  | R14S3       | -        | L31   |
| 4,004  | 9,613  | R13S3       | IE       | L36   |
| 3,982  | 8,596  | R14S3       | -        | L28   |
| 3,974  | 8,424  | R13S3       | -        | L24   |
| 3,942  | 12,373 | R13S3       | IEA      | L51   |
| 3,933  | 8,492  | R12S3       | IM       | L05   |
| 3,925  | 8,513  | R15P3       | 24bit    | L21ea |
| 3,920  | 14,644 | R34S3       | IEMDC    | L38   |
| 3,912  | 8,217  | R13S3       | -        | L39   |
| 3,905  | 6,305  | R13S3       | IE       | L37   |
| 3,868  | 8,721  | R16S3       | IEGL     | L54   |
| 3,842  | 8,690  | R16S3       | IEGL     | L18   |
| 3,826  | 6,351  | R14S3       | IE       | L52   |
| 3,822  | 23,254 | R16S3       | IEMDGL   | L23   |
| 3,817  | 14,834 | R15S3       | IEMA     | L43   |
| 3,806  | 6,633  | R08S3       | EMD      | J14   |
| 3,801  | 6,633  | R13S3       | MD       | L35   |
| 3,785  | 7,121  | R11S3       | IE       | L14   |
| 3,778  | 8,060  | R12S3       | I        | K03   |
| 3,727  | 8,191  | R13S3       | IE       | L49   |
| 3,713  | 9,173  | R13S3       | IE       | L04   |
| 3,713  | 8,755  | R15S3       | IE       | L08   |
| 3,709  | 3,651  | R11S3       | -        | L25   |
| 3,694  | 8,669  | R08S3       | D        | L13ea |
| 3,677  | 6,856  | R12S3       | -        | L42   |
| 3,665  | 8,737  | R15S3       | IE       | L33   |
| 3,664  | 6,351  | R10S3       | -        | L06   |
| 3,659  | 10,101 | R13S3       | D        | L47ea |
| 3,645  | 8,915  | R11S3       | -        | L01   |
| 3,629  | 6,816  | R12S3       | M        | L45   |
| 3,612  | 6,971  | R11S3       | IE       | L09   |
| 3,605  | 6,362  | R09S3       | I        | L07   |
| 3,596  | 6,944  | R11S3       | -        | L15   |
| 3,042  | 6,303  | R12S3       | -        | L20   |
| 798    | 24,108 | V48P2       | W        | L17   |

"M" is for multiplication and "D" is for division. These students introduced special instructions for multiplication and division. "A" is for a special instruction for the Euclidean algorithm. The machines having the feature indicated by the letter "A" were implemented by introducing special units to calculate GCM and LCM. The letter "G" is for a separate algorithm for GCM, and the letter "L" is for the LCM. The letter "C" in the column "Features" indicates that the machine can handle subroutine call and return by using an appropriate stack in the memory. The letter "S" indicates that the machine can handle speculation to reduce the branch penalties. The letter "W" indicates that the instruction memory bandwidth is doubled in comparison with other implementations by using appropriate clock signals.

According to the first column of Table 2, the average number of lines modified was 407 and the standard deviation was 975 lines. The column AO indicates that some of the students modified the pipelining. The column for the features indicates that 20 students succeeded in introducing original devices in addition to simple internal/external interruption handling.

The results of the prior 3 years, since 2004, were similar to the result described above. In 2006, one student completed a pipelined superscalar CISC having 4 stages, which was far beyond the second assignment of the past pipelining design education.

The remarkable point is that the students have begun to use their heads to create their own original designs. They had an opportunity to understand the very mechanism of the superscalar microprocessor, since the way-in was the central part of the product, and with this understanding 50 out of 53 (94%) students succeeded in passing the practice phase and completing their designs.

Figure 3 shows an example of a 200,000-gate FPGA computer created by a student in 2007.



Fig. 3 An FPGA computer by a student in 2007.

## 5. Conclusion

A new training method for instructing newcomers in the field of fine grain microprocessor design is proposed, considering the situated nature of learning. The key point is to find the heart of the product that is designed at the last stage of the design phases. The idea of using the reversed production steps for education comes from the LPP proposed by Lave and Wenger. We extended the idea to the choice of the way-in, which should be the central part of the product. If such a component can be found in a product, similarly fruitful results are expected in the field of microelectronic systems design, including that of embedded systems.

#### References

- [1] Ney Laert Vilar Calazans and Fernando Gehm Moraes, "Integrating the Teaching of Computer Organization and Architecture with Digital Hardware Design Early in Undergraduate Courses," IEEE Trans. Educ., vol.44, No.2, pp.109-119, May 2001.
- [2] Jean Lave and Etienne Wenger, *Situated Learning: Legitimate Peripheral Participation*, Cambridge University Press, 1991.
- [3] Peter M. Kogge, *The Architecture of Pipelined Computers*, McGraw-Hill Book Company, 1981.
- [4] Mike Johnson, *Superscalar Microprocessor Design*, P T R Prentice Hall, Inc., 1991.



**Ryuichi Takahashi** received the B.S. degree in physics from Waseda University in 1978 and the M.E. degree in information processing from Tokyo Institute of Technology in 1981. During 1981-1991, he worked for NEC Corp. as a researcher as well as a VLSI engineer. In 1991, he moved to Tokyo Institute of Technology, where he taught a class on microcomputer design using TTL as an assistant professor. He joined Hiroshima City University in 1994, where he is currently an associate professor on the faculty of information sciences. He received the excellent educator award from the Information Processing Society of Japan in 2004 for his educational activity known as City-1.



**Hajime Ohiwa** is a professor at Teikyo Heisei University and an emeritus professor of Keio University, where he was a professor of Environmental Information. Before joining Keio, he was a faculty member of Toyohashi University of Technology from 1978 to 1992. He received his BS (1965), MS (1967), and DSc (1971) in physics from the University of Tokyo. He was a British Council Scholar at the Cavendish Laboratory of Cambridge University (1976-78) and a visiting associate professor at Cornell University (1979-80). His research interests were charged particle optics with its application to micro-fabrication, and nonlinear optimization. After joining Toyohashi University of Technology, he started research on software and cognitive engineering, including keyboard training, teaching programming from novices to professionals, and requirement acquisition methodology.



**Yoshiyasu Takefuji** has been a tenured professor on the faculty of environmental information at Keio University since April 1992 and was on the tenured faculty of Electrical Engineering at Case Western Reserve University from 1988. Before joining Case, he taught at the University of South Florida for 2 years and the University of South Carolina for 3 years. He received his BS (1978), MS (1980), and Ph.D. (1983) in Electrical Engineering from Keio University. His research interests focus on neural computing, security, and electronic toys. He received the National Science Foundation Research Initiation Award in 1989, the distinct service award from IEEE Trans. on Neural Networks in 1992, the TEPCO research award in 1993, the Takayanagi research award in 1995, the Kanagawa Academy of Science and Technology research award in 1993, the best courseware award from Asia multimedia forum in 1999, the best paper award of Information Processing Society of Japan in 1980, a special research award from the US air force office of scientific research in 2003, and the chairman award from JICA in 2004. He has authored 25 books, including Neural Network Parallel Computing in 1992, and has published more than 200 papers.