32/64-bit Microprocessors Timeline

Last Updated: 3-Apr-2023

(Best Viewed on Wide Monitor; Not so Good on Mobile)


Categories: CISC-y | RISC-y | ARM | Other
AMD 29000 Family

Motorola 680x0 Family
Motorola was late to the 16-bit microprocessor party, so it decided to arrive in style. The hybrid 16-bit/32-bit MC68000 packed in 68,000 transistors, more than double the number of Intel's 8086. Internally it was a 32-bit processor, but a 32-bit address and/or data bus would have made it prohibitively expensive, so the 68000 used 24-bit address and 16-bit data lines.

— Chip Hall of Fame: Motorola MC68000 Microprocessor (IEEE Spectrum, 30 June 2017)
In 1979, the 16-bit microprocessor competition included:
Year  Vendor  Chip     Transistors  Comment
1976  TI      TMS9900  8,000
1978  Intel   8086     29,000
1979  Intel   8088     29,000       8086 with 8-bit bus
1979  Zilog   Z8000    17,500       Not Z80 compatible
Motorola decided to build a 32-bit architecture to compete in the 16-bit microprocessor space. The first physical implementation would be 16-bit in places, but the base would be laid for a true 32-bit chip in the near future. And all the software written for the 68000 would (in theory) be able to run on future fully 32-bit implementations.

The 68000 was a success, but it came with tradeoffs. The chip shipped in a 64-pin package, which was more expensive than the 40-pin package that the Intel 8088 used. This mattered when IBM was selecting a CPU for its IBM PC product.

Additionally, the followup 68020 added a number of addressing modes that would create challenges for high performance implementations later on.

At the "Oral History Panel on the Development and Promotion of the Motorola 68000" panel session held on July 23, 2007 in Austin, Texas, Tom Gunter says this when asked what decisions could have been made differently:
Well the main thing that we did is we added complexity to the machine, especially in transitions. As we went up the chain a little bit to 68010, 68020, we created a monster in terms of all the addressing modes that we had. We thought that adding more addressing modes was the way you made a machine more powerful, totally contrary to the principle of RISC later.
Motorola 88000 Family Motorola introduced the 68000 family of CPUs in 1979, but by the mid-1980s scaling its performance competitively against Intel's x86 family and the newer RISC chips was becoming a problem. DEC would face a similar problem scaling VAX performance at around the same time, and DEC's solution was to build a RISC CPU, the DEC Alpha. Motorola's solution was the same: the 88000 RISC architecture.

Unfortunately for Motorola, most of the 680x0 workstation vendors had already switched to other RISC chips by the time Motorola delivered the first 88000 CPUs. Apple considered the 88000 (as well as many other RISC CPUs) and even developed an engineering prototype around the chip, but eventually decided to go with an IBM POWER derivative, the PowerPC, to be developed jointly by IBM, Apple and Motorola. This effectively ended the 88000 family.
DEC Alpha Family DEC was extremely late to the RISC CPU game but eventually delivered the Alpha architecture. Richard L. Sites provides a nice writeup (in Alpha AXP Architecture, with a copy here) discussing the architecture and how architectural choices were made for the Alpha:
This paper discusses the architecture from a number of points of view. It begins by making the distinction between architecture and implementation. The paper then states the overriding architectural goals and discusses a number of key architectural decisions that were derived directly from these goals. The key decisions distinguish the Alpha AXP architecture from other architectures. The remaining sections of the paper discuss the architecture in more detail, from data and instruction formats through the detailed instruction set. The paper concludes with a discussion of the designed-in future growth of the architecture. An Appendix explains some of the key technical terms used in this paper. These terms are highlighted with an asterisk in the text.
It is interesting to compare the DEC choices for Alpha as described in "Alpha AXP Architecture" with HP's choices for PA-RISC in "Hewlett-Packard Precision Architecture: The Processor."
MCST Elbrus-2000 Family The Elbrus name has been used for a number of Soviet (and now Russian) computers, not all of which have the same ISA or architecture.

Starting in the mid to late 1980s the Soviet Institute of Fine Mechanics and Computer Engineering began working on a VLIW computer. This computer 'arrived' as the 512-bit word Elbrus-3 in the early 1990s, but the collapse of the Russian economy following the dissolution of the Soviet Union meant that there was no customer, and the computer never made it into production.

Work on the architecture continued starting in 1992 at the descendant company "Moscow Center of SPARC Technologies" (though this architecture had nothing to do with SPARC), and the Elbrus-2000 CPU was essentially an Elbrus-3 implemented as a microprocessor.

Over time further chips were released with higher frequencies, higher core counts, more bandwidth, etc.
Intel i860 Family Intel had long been aware that CPUs without the x86 architecture's legacy could be simpler and cleaner than those required to be backward compatible with x86.

The i432 was Intel's first attempt to replace x86 that made it to market. The i432 failed, and Intel shipped the 80286 followed by the 32-bit 80386. But the x86 architecture still had a lot of legacy bits that would have been nice to jettison.

By the mid-to-late 1980s Intel could see RISC chips such as SPARC and MIPS providing high performance without being encumbered by x86 legacy support. Intel could make a RISC chip too, and Intel's next attempt to get away from x86 was the 80860, a 64-bit RISC chip whose first implementation was superscalar and which came with an on-chip FPU. On paper, the chip looked like a formidable competitor to the 80486 and also to the contemporaneous SPARC and MIPS CPUs.

However, dropping the legacy x86 cruft meant that the 80860 was not backward compatible with x86 and thus could not run existing DOS software. Intel was (wisely) unwilling to abandon x86 and bet everything on the new 80860, so it allowed the 80486 to compete with the 80860. The 80486 was fast enough and could run existing software, and the 80860 thus saw only two generations of implementation.
Intel i960 Family

ARM Family Acorn Computer, founded in 1978, produced a number of personal computers, primarily sold in the United Kingdom. Initial computers were built around the 8-bit 6502 CPU, the same CPU used in the Apple II and Commodore 64 computers.

Intel Itanium Family Intel had created several potential replacements for x86 before 1990. All of them failed to displace the x86, though some found niche success.

The i432 architecture in the late 1970s failed on launch, delivering a bad combination of expensive and slow.

In the mid-1980s Intel decided to try again and produced the i960 chip in early 1986.

Then, in 1989, with RISC architectures all the rage, Intel released its own RISC chip, the i860.

The i860 wound up competing with Intel's own 80486, and the 80486 had a huge installed base and pushed clock speeds much higher than the i860 ever managed.

Intel did not give up on replacing the x86 architecture. In the late 1990s several concerns from HP and Intel converged:
  • Hewlett-Packard realized that future PA-RISC development would get more and more expensive, but the PA-RISC market was not growing. Eventually, further development costs for a niche CPU would become economically non-viable.
  • There was growing concern that further Out-of-Order performance scaling would be poor. A variation of VLIW was seen as a way to get around this. HP was exploring VLIW architectures.
  • Intel was unhappy about the existence of x86 clone vendors, especially AMD.
  • The x86 architecture was limited to 32-bit addresses and this was expected to be a problem in the near future. A 64-bit machine would address this problem.
One solution to many of these problems was:
  • For Intel and HP to jointly develop a new CPU architecture. Intel would be responsible for manufacturing the chips.
  • The architecture would be VLIW instead of Out-of-Order Superscalar.
  • The architecture would be 64-bit rather than 32-bit.
  • The instruction set could be protected by patents that would prevent clones.
The result was Itanium. The joint HP-Intel effort was announced in 1994, and the first chip, Merced, was to ship in 1999 but eventually slipped to 2001. In addition, the first chip delivered substantially worse performance than desired.

Meanwhile, AMD had been working on a 64-bit extension to x86. AMD had no other realistic choice if it wanted to remain in the high-end CPU business: Intel would not license the Itanium architecture, since one of the points of Itanium was to prevent clones, and a brand-new 64-bit architecture from AMD would not generate enough industry support. This 64-bit extension work was announced in 1999, the specification was released in 2000, and the first chip implementing the 64-bit x86 architecture, the Opteron, was released in 2003.

The combination of poor Itanium performance and good Opteron performance created a large disincentive to Itanium adoption. For customers, x86 backward compatibility was a desirable feature, and having multiple vendors (Intel and AMD) was also a good thing.

Without a performance advantage, Itanium offered existing x86 customers nothing that 64-bit x86 chips did not, and Itanium made substantial sales only into the Unix workstation and server market. Eventually HP was the only substantial Itanium customer, and the sales volume was not large enough for Intel to want to remain in the business. The last Itanium shipments were in 2021.
MIPS Family From 1981 to 1984 John L. Hennessy's research group at Stanford worked on creating a RISC (Reduced Instruction Set Computer) CPU that would be competitive with commercial CPU offerings. An early paper on the Stanford MIPS implementation ("MIPS: A Microprocessor Architecture", 1982) provided the following comparison against a Motorola 68000 CPU when running the Puzzle benchmark:

                          Motorola 68000  MIPS
Transistor Count          65,000          25,000
Clock Speed               8 MHz           8 MHz
Data Path Width           16 bits         16 bits
Static Instruction Count  1300            647
Static Instruction Bytes  5360            2588
Execution Time (sec)      26.5            6.5

This research led to the formation of MIPS Computer Systems and the MIPS commercial CPUs, beginning with the R2000 in 1986. The chip's primary competition at the time of introduction was Intel's 80386 and Motorola's 68020. The R2000 and future MIPS chips were used in numerous workstations, but as the workstation market consolidated MIPS lost customers.

In 1992 MIPS Computer Systems was acquired by SGI (Silicon Graphics). The New York Times reported on March 13, 1992:
Silicon Graphics Inc., the leading maker of the computer work stations that engineers, architects and movie artists use to fashion three-dimensional images, said today that it was buying MIPS Computer Systems Inc. in a stock swap valued at about $406.1 million.

The merger will unite two of Silicon Valley's leading electronics companies to create an enterprise with revenues approaching $1 billion.

MIPS designs the microprocessors that serve as the brains for many of the most powerful desktop computers, including the work stations made by Silicon Graphics. Once hailed as the next Intel Corporation for its advanced designs, MIPS has suffered from inconsistent profits and employee defections, including the departure last month of the company's president, Charles M. Boesenberg.

Customers have been departing as well. Prime Computer and Groupe Bull of France recently stopped making computers based on MIPS chips. And Digital Equipment, MIPS's largest customer, has said it will begin producing a competing microprocessor of its own.

Silicon Graphics is so dependent on the MIPS microprocessor that it has already gone so far as to develop its own version of the chip, and was reported to be in talks with Toshiba of Japan about manufacturing it.

...

The close relationship between Silicon Graphics and MIPS dates to the early 1980's, when the founders of both companies were all professors at Stanford University. Silicon Graphics was the first customer for MIPS's chip designs.
In addition to SGI workstations, MIPS chips saw some success in the game console (the Sony PlayStation 2 and Nintendo 64 were built on MIPS CPUs) and embedded markets.

At the end of the 1990s SGI moved away from the MIPS architecture in favor of Intel's Itanium, which ended the MIPS architecture's future as a high-end workstation and server chip. SGI went bankrupt in 2009, and the MIPS business, by then almost entirely focused on embedded markets, was sold off.

Except ... that China had picked up the MIPS architecture with the Loongson family of chips, beginning with the Godson-1 in 2002. Loongson is still being developed today for Chinese internal use.
NEC V Family

National Semiconductor 32000 Family National Semiconductor had produced microprocessors since the early 1970s. The multi-chip 16-bit IMP-16 came out in 1973. In 1974 National released a single-chip implementation, the PACE (and in later years the INS8900). Neither sold well, and the major 16-bit microprocessors in the late 1970s were the TMS9900, 8086/8088 and Z8000.

National was not ready to give up and in 1982 produced the 32016. The 32016 was not backward compatible with the earlier IMP-16 and PACE chips, though with little installed base for those chips this was not important.

Unfortunately for National, the 68000 had been out for a few years before the 32016 and the 32016 sold poorly. Things did not improve with later chips in the family and National discontinued the line after the 32532.

As the 680x0 family petered out in the early 1990s, Motorola simplified it into the Coldfire family of chips and targeted the embedded market with the Coldfire. National did something similar with the 32000 family, and the result was the Swordfish family of CPU/DSP chips. Showing some awareness of the CPU environment at the time, National advertised these chips as RISC, and the Swordfish chips turned 32000 instructions into internal VLIW instructions (much like Intel eventually turned x86 instructions into something much closer to RISC internal instructions).
HP PA-RISC Family HP delivered its first minicomputer, the HP 2116A, in 1966. The machines were initially intended to support HP instruments, but the company discovered that businesses were purchasing them for normal business applications, and HP found itself to be a general minicomputer vendor.

Over time, HP found itself with a number of incompatible computer architectures (much like IBM before the System/360 project and much like DEC before VAX) and HP decided to consolidate them into a single, new architecture. This was PA-RISC.
IBM POWER Family

Sun SPARC Family SPARC Microprocessor Oral History Panel, Session One: Origin and Evolution

Inmos Transputer Family

Japan TRON Family

DEC VAX Family DEC (Digital Equipment Corporation) introduced its 32-bit VAX minicomputer in 1977. The first member of the computer family, the VAX 11/780, had a CPU consisting of 21 distinct cards (and so was not a microprocessor). In 1985 DEC introduced the first microprocessor implementation of the VAX architecture with the MicroVAX 78032. Eventually even the mainstream VAX minicomputers were built with microprocessor CPUs.

VAX performance failed to scale competitively against RISC CPUs and even against x86 CPUs and DEC eventually replaced the VAX architecture with DEC's homegrown Alpha RISC architecture.
Western Electric 32000 Family

Fairchild Clipper Family Clipper was not Fairchild's first CPU — that would be the 8-bit Fairchild F8, released in 1975. Fairchild does not seem to have produced a 16-bit CPU, and in the mid-1980s it released the Clipper C100. The chip did not have commercial success, competing with established chips such as the 80386 and 68030 as well as RISC chips such as SPARC and MIPS.

The single large Clipper customer, Intergraph, purchased the Clipper division from Fairchild after the C100 had shipped. A few more generations of Clipper were developed before the architecture was abandoned.
Intel 80x86 CPUs The Intel 80386 was a 32-bit microprocessor backward compatible with the 16-bit 80286 (introduced in 1982). The 80286 was backward compatible with the 16-bit 8086 chip (introduced in 1978). And the 8086 was assembly source compatible with the earlier 8-bit 8080, which meant that 8080 assembly programs could be re-assembled to run on the 8086 even though 8080 object code would not run on the 8086.

The 8080 was introduced in 1974, so by the mid-1980s Intel was building a 32-bit microprocessor that had binary compatibility with 16-bit CPUs from 1978 and assembly compatibility with one of the earliest 8-bit processors ever.

Some of the weirdness and limitations can be explained by this lineage.

Through the last years of the 1980s the 80386 and its descendant chip the 80486 were competing — to the extent that they were competing with any other chips — with the Motorola 680x0 family of chips. This competition manifested itself as IBM PCs running DOS vs. Apple Macintosh computers.

By 1990 the 680x0 family had failed to scale performance competitively with the x86 family and in 1991 Apple announced that it was switching the Macintosh computer line away from the 680x0 to the (new-ish) PowerPC architecture.

Also, by 1990 RISC CPUs had appeared — PowerPC was one such RISC — and Intel had to worry about RISC CPUs such as MIPS and SPARC as well as PowerPC competing for x86 chip sales.

The Intel Pentium Pro, introduced in 1995, showed that Intel could ship an x86-compatible CPU whose integer performance was competitive with the best RISC chips available. The Pentium Pro pretty much ended the hopes of RISC partisans that the x86 would be replaced with a cleaner/prettier architecture: x86 performance parity, combined with lower prices and backward compatibility with the entire DOS and Windows codebase, made switching to RISC not viable.

Intel still had competition, however, because AMD was shipping x86-compatible CPUs of its own. In 1999 AMD shipped its Athlon line of chips, which was more than competitive with Intel's then-current Pentium III family.
AMD 80x86 CPUs Intel had a second-source agreement with AMD which allowed AMD to manufacture the 8086, 8088 and 80286 Intel CPU designs. Beginning with the 80386, Intel refused to deliver chip designs to AMD.

AMD eventually reverse engineered the 80386; the result was the Am386. Because the reverse engineering was done at the transistor level, the chips behaved identically. AMD did the same thing with the 80486.

TODO ...
Other 80x86 CPUs

Motorola 68000 The Motorola 68000 was one of the first commercially available 32-bit microprocessors. Logically it was a 32-bit machine (32-bit integer registers, 32-bit addresses), though physically it was not. Only 24 bits of physical addressing were provided, so no more than 16 MB of physical DRAM could be addressed. In 1979 this was not a practical limit — the IBM 3033 mainframe of the time limited users to no more than 8 MB of physical DRAM (and most 64-bit processors don't support 64 bits' worth of physical DRAM). More important to performance, the data bus to DRAM was only 16 bits wide (which kept costs down, though 32-bit loads took two cycles) and the ALU was physically only 16 bits wide. Motorola would sometimes claim that it was a 16/32-bit CPU.
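The address-space and bus-cycle figures above are easy to sanity-check. A quick illustrative sketch (only the 24-bit address and 16-bit data bus widths come from the text; the rest is plain arithmetic):

```python
# 68000 physical addressing and bus-cycle arithmetic (illustrative only).
ADDRESS_PINS = 24       # 24-bit physical address, per the text
DATA_BUS_BITS = 16      # 16-bit external data bus, per the text

addressable_bytes = 2 ** ADDRESS_PINS
print(addressable_bytes // (1024 * 1024), "MB")   # 16 MB physical limit

# A 32-bit load must cross the 16-bit bus in two transfers:
transfers_per_32bit_load = 32 // DATA_BUS_BITS
print(transfers_per_32bit_load, "bus cycles per 32-bit load")  # 2
```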

The 68000 was the CPU selected for many inexpensive Unix workstations including the Sun-1 (1982), the HP Series 200 workstations, the Apollo DN100 and the SGI Iris 1000 series workstations. The first Apple Macintosh computer used a 68000 as did the Commodore Amiga.
  • Introduced: 1979 (shipments in late 1980)
  • In-order
  • Integer Pipeline: 3 stages
  • User Visible Integer Registers: 16
  • On-chip Caches: No
  • FPU: No
  • 1979: 4, 6, 8 MHz
    1981: 10 MHz
    1982: 12.5 MHz
  • 68,000 transistors
  • Die Size: ~50 mm² (6.24 by 7.14 mm) @ 3.5 µm process
Motorola 68010 The 68010 was a slight tweak to the 68000. It mostly appeared the same to programmers. It was not popular.
  • Introduced: 1982
Motorola 68020 The 68020 was a fully 32-bit CPU: 32-bit addresses (both logical and physical) and 32-bit data paths. The ALU was 32 bits wide and thus twice as fast per clock as the 68000's.

The Macintosh II family used the 68020, as did the Sun-3 family of workstations.

The 68020 introduced more addressing modes for instructions (thus, in some sense, making the chip even more CISC-y than the original 68000). The more complicated addressing modes are often blamed for the 680x0 family's failure to scale clock speed as well as the 80x86 family did, and also for making a super-scalar implementation of the 680x0 family (needed to compete with the Intel Pentium) excessively challenging. By late 1994 Intel had pushed the clock speed of the 2-issue super-scalar Pentium to 100 MHz. In 1994, when Motorola released the first 2-issue super-scalar 680x0 CPU — the 68060 — the chip ran at 50 MHz, and the family never exceeded 75 MHz.
  • Introduced: 1984
  • In-order
  • Integer Pipeline: 3 stages
  • User Visible Integer Registers: 16
  • On-chip Caches: 256 bytes
  • FPU: 68881 co-processor chip
  • 12.5 MHz - 33 MHz
  • 200,000 transistors
  • Die Size: ~50 mm² @ 2.25 µm process
Motorola 68030 Basically a 68020 with an on-chip memory manager and a faster clock rate.

The Macintosh II family used the 68030, as did the Sun-3 family of workstations (both also used the 68020). From the New York Times, Sept. 19, 1986:
Motorola Inc. stepped up its increasingly heated race with the Intel Corporation yesterday, introducing a new generation of microprocessors that it said could put the power of a mainframe computer on a single chip. The new chip, called the Motorola 68030, comes only two and a half years after the introduction of the company's current top-of-the-line microprocessor, the 68020. Motorola said the new chip would offer nearly twice the performance of its predecessor but would not be delivered until March 1987. Motorola's new entry comes just a week after Intel's top-of-the line microprocessor was incorporated for the first time in an I.B.M.-compatible personal computer, made by the Compaq Computer Corporation. "Our effort is to maintain a performance advantage of 300 to 400 percent over the competition," said Murry Goldman, a Motorola senior vice president.
The Intel 80386 adequately matched up against the 68030 (transistor count, frequency, external bus widths, both chips had 3-stage in-order integer pipelines), though the 68030 did have the advantage of a pair of small on-chip caches. This was not good for Motorola given the 80386's installed base advantage with DOS.
  • Introduced: 1987
  • In-order
  • Integer Pipeline: 3 stages
  • User Visible Integer Registers: 16
  • On-chip Caches: 256 bytes I-cache; 256 bytes D-cache
  • FPU: 68881 co-processor chip
  • 16 MHz - 50 MHz
  • 273,000 transistors
Motorola 68040 The 68040 was a new 680x0 micro-architecture, fully pipelined and thus able, at peak, to sustain one instruction per clock cycle. Apple used the 68040 in the Macintosh Quadra. Sun had introduced the SPARC-based Sun-4 by this time and never released a 68040-based computer. The NeXTstation shipped in 1990 with a 68040.

The 68040 matched up against the Intel 80486 in the Mac vs PC competition and while the two chips were evenly matched in 1990, the 80486 had been pushed to 66 MHz (with a 33 MHz bus) by 1992 while the 68040 never exceeded 40 MHz.
  • Introduced: 1990
  • Pipelined, In-order
  • Integer Pipeline: 6 stages
  • User Visible Integer Registers: 16
  • On-chip Caches: 4 KB I-cache; 4 KB D-cache
  • FPU: On-chip
  • 25 MHz - 40 MHz
  • ~1.2M transistors
Wikipedia: Motorola 68040
NXP: User Manual
Motorola 68060 This was the final Motorola 68K microprocessor. Apple, the sole remaining large-volume desktop computer vendor using the 68K family, replaced the 68K with PowerPC rather than move to the 68060. Microcontroller applications using 68K CPUs slowly migrated to the Coldfire family. Coldfire was basically a simplified 68K (e.g., BCD instructions removed and supported addressing modes reduced).

Interestingly, a few decades after the 68060, an out-of-order 68K implementation, the Apollo 68080, was produced by a single individual.
  • Introduced: 1994
  • Super-scalar (2-issue), Pipelined, In-order
  • Integer Pipeline: 6 stages
  • User Visible Integer Registers: 16
  • On-chip Caches: 8 KB I-cache; 8 KB D-cache
  • FPU: On-chip
  • 50 MHz - 75 MHz
  • ~2.5M transistors
AT&T Bellmac 32A This chip was supposed to come out in 1980 as the Bellmac 32, but the frequency was an unacceptable 2 MHz. The 32A revision had acceptable performance but arrived two years later. It had 18 addressing modes and 169 'basic instructions'.
  • Introduced: 1982
  • User Visible Integer Registers: 16
  • 84 pins (63 used)
  • 2.5 µm domino CMOS
  • 10 MHz (2 MHz on the Bellmac 32)
  • 146,000 transistors
  • Die Size: 103 mm² (160,000 mil²)
Western Electric WE32100 From the AT&T WE® 32-Bit Microprocessors and Peripherals Data Book (1987):
The WE 32100 Microprocessor (CPU) is a high- performance, single-chip, 32-bit central processing unit designed for efficient operation in a high-level language environment. It performs all the system address generation, control, memory access, and processing functions required in a 32-bit microcomputer system. It has separate 32-bit address and data buses. System memory is addressed over the 32-bit address bus by using either physical or virtual addresses. Data is read or written over the 32-bit bidirectional data bus in word (32-bit), halfword (16-bit), or byte (8-bit) widths. Extensive addressing modes result in a symmetric, versatile, and powerful instruction set. The WE 32100 Microprocessor is available in 10-, 14-, and 18-MHz versions...

Features:
  • Powerful and versatile instruction set
  • On-chip instruction cache
  • 4 Gbytes (2³²) of direct memory addressing
  • Physical and virtual addressing
  • Coprocessor interface
  • DMA and multiprocessor environment support
  • 15 levels of interrupts
  • Autovector and nonmaskable interrupt facilities
  • Four levels of execution privilege: kernel, executive, supervisor, and user
  • Synchronous or asynchronous interfacing to external devices
  • Nine 32-bit general-purpose registers
  • Seven 32-bit special-purpose registers
  • Memory-mapped I/O
  • Complete floating-point support via the WE 32106 Math Acceleration Unit
SAN FRANCISCO — AT&T's Bell Laboratories recently announced a second-generation, 32-bit microprocessor said to take advantage of the Unix operating system and planned to be used in a series of AT&T 3B computers.

The chip, called the WE 32100, is the successor to the WE 32000, which was formerly known as the Bellmac-32A. It reportedly contains 180,000 transistors, about 30,000 more than the earlier chip, and uses Cmos technology.

"We have integrated many support functions onto the chip," said Hing C. So, head of the Very Large Scale Integration Department at Bell Labs. "We have also added features such as an internal cache memory and have increased the speed of the chip."

The cache memory reportedly can hold 64 32-bit words, and repeatedly used program instructions can be stored there so there is no waiting time, a spokesman said.

The WE 32100 reportedly operates at speeds up to 14 MHz, is packaged in a 132-pin package and dissipates approximately 1.9W in worst-case conditions.

— Computerworld, March 19, 1984, p104
  • On-chip Caches: 64-word (32-bit) instruction cache
Western Electric WE32200 From the AT&T WE® 32-Bit Microprocessors and Peripherals Data Book (1987):
The WE 32200 Microprocessor (CPU) is a high-performance, single-chip, 32-bit central processing unit designed for efficient operation in a high-level language environment. It is protocol and upward object code compatible with the WE 32100 Microprocessor. The WE 32100 CPU runs object code without modification on the WE 32200 CPU. The WE 32200 CPU performs all the system address generation, control, memory access, and processing functions required in a 32-bit microcomputer system. It has separate 32-bit address and data buses. System memory is addressed over the 32-bit address bus by using either physical or virtual addresses. Data is read or written over the 32-bit bidirectional data bus in word (32-bit), halfword (16-bit), or byte (8-bit) widths, using arbitrary byte alignment for data and instructions. Dynamic bus sizing allows the WE 32200 CPU to communicate with both 16-bit and 32-bit memories in the same system. Extensive addressing modes result in a symmetric, versatile, and powerful instruction set. The WE 32200 Microprocessor is available in 24-MHz and higher frequency ...

Features:
  • 32-bit virtual memory microprocessor with 4 Gbytes (2³²) virtual memory space and up to 4 Gbytes physical addressing space
  • Efficient execution of high-level language programs
  • Extensive and orthogonal instruction set with 25 addressing modes
  • Arbitrary byte alignment for data and instructions
  • Direct support for process-oriented operating systems, such as UNIX System V
  • WE 32100 Microprocessor object code and protocol compatible
  • Four levels of execution privilege: kernel, executive, supervisor, and user
  • Fifteen levels of interrupt
  • High-performance, on-chip, 64 x 32 bit instruction cache
  • Byte replication on writes
  • Dynamic bus sizing for 16-bit and 32-bit data and instructions
  • Seventeen 32-bit general-purpose registers
  • Eight 32-bit general-purpose privileged registers
  • Seven 32-bit special-purpose registers
  • Memory-mapped I/O
  • Complete ANSI/IEEE Standard 754 floating-point support via the WE 32206 MAU Coprocessor
  • Development system support signals
  • General-purpose coprocessor interface
  • Synchronous or asynchronous interfacing to external devices
  • High priority two-wire bus arbitration through relinquish and retry
From the Dec 2, 1986 New York Times:
The American Telephone and Telegraph Company introduced a family of computer chips yesterday that includes a faster microprocessor than any yet brought to market.

Industry analysts said that, despite the chip's speed, its prospects were somewhat limited because A.T.&T. lacked relationships with computer makers that would be likely to incorporate the chip into their computer designs.

The WE 32200 microprocessor and related chips would be used to make top-of-the-line microcomputer systems that could handle several users doing several jobs at once, A.T.&T. said. A microprocessor is the brain inside personal computers and engineering work stations. The chips are taking work away from minicomputers and mainframes because they do some of the same jobs far more cheaply.

The WE 32200 will cost $500 each in quantities of 100 and is scheduled to be available in production quantities in the second half of 1987, A.T.&T. said. The chip will run at speeds starting at 24 million cycles a second and will be available at 30 million cycles a second by the end of 1987, the company said.
National Semiconductor Corporation (NSC) 32016

National Semiconductor Corporation (NSC) 32032

National Semiconductor Corporation (NSC) 32332 This shipped in 1985 and lost out to the 68020 and 80386.

National Semiconductor Corporation (NSC) 32532 By the time the 32532 shipped, in 1987, it had lost in the market. The NSC 32K family evolved into the Swordfish CPU family. From the May-1991 32532 data manual:
NS32532-20/NS32532-25/NS32532-30
High-Performance 32-Bit Microprocessor

General Description
The NS32532 is a high-performance 32-bit microprocessor in the Series 32000 family. It is software compatible with the previous microprocessors in the family but with a greatly enhanced internal implementation.

The high-performance specifications are the result of a four-stage instruction pipeline, on-chip instruction and data caches, on-chip memory management unit and a significantly increased clock frequency. In addition, the system interface provides optimal support for applications spanning a wide range, from low-cost, real-time controllers to highly sophisticated, general purpose multiprocessor systems.

The NS32532 integrates more than 370,000 transistors fabricated in a 1.25 µm double-metal CMOS technology. The advanced technology and mainframe-like design of the device enable it to achieve more than 10 times the throughput of the NS32032 in typical applications.

In addition to generally improved performance, the NS32532 offers much faster interrupt service and task switching for real-time applications.

Features
  • Software compatible with the Series 32000 family
  • 32-bit architecture and implementation
  • 4-GByte uniform addressing space
  • On-chip memory management unit with 64-entry translation look-aside buffer
  • 4-Stage instruction pipeline
  • 512-Byte on-chip instruction cache
  • 1024-Byte on-chip data cache
  • High-performance bus
    • Separate 32-bit address and data lines
    • Burst mode memory accessing
    • Dynamic bus sizing
  • Extensive multiprocessing support
  • Floating-point support via the NS32381 or NS32580
  • 1.25 µm double-metal CMOS technology
  • 175-pin PGA package
  • 370,000 transistors
  • Die Size: 11.5 mm x 14 mm
Transputer T414 & 425 Transputers were unusual CPUs because they were expected to run in (inexpensive) clusters. Each transputer came with 4 serial links that could communicate with 4 other transputers, and each chip also included a simple hardware scheduler. The upshot was that transputers were meant to be programmed as highly concurrent systems: the same code could run on a single transputer, with the scheduler arranging for the concurrent portions to share the chip, or on a cluster of transputers connected via the serial links. Greater performance was to be achieved by linking larger clusters of transputers together.
  • Introduced: 1984
  • In-order stack based CPU
  • 2 KB on-chip RAM (not cache)
  • FPU: No
  • 15, 20 MHz
Transputer T800 The T800 increased the on-chip RAM of the T414 from 2 KB to 4 KB and added an on-chip floating point unit (as Intel added a floating point unit to the 80486 in 1989).
  • Introduced: 1987
  • in-order stack based CPU
  • Full 32-bit CPU
  • 20, 25 MHz
  • 4 KB on-chip RAM (not cache)
  • On chip FPU
Transputer T9000 The T9000 added a cache and was intended to be 10× faster than the T800. It missed its performance target and was eventually cancelled. I am unclear if it was ever commercially released.
  • Introduced (?): 1993
  • Pipelined superscalar CPU
  • Full 32-bit CPU
  • 50 MHz
  • 16 KB cache
  • On chip FPU
Intel 80386 The 80386 was not Intel's first 32-bit CPU — that honor belongs to the i432. The 80386 was, however, a 32-bit extension of the 8086/8088/80286 line of 16-bit CPUs. Because the 80386 was backward compatible with those CPUs, and because those CPUs were used in the IBM PC and PC clones, the 80386 came with a huge (for the time) built-in installed base.

In addition to having a built-in installed base, Intel also benefited from unilaterally ending the second-source agreement that was in place with AMD for earlier x86 CPUs. This meant that Intel was the sole supplier of 80386 chips until AMD successfully reverse engineered the part (around 1991, with the Am386). Being the sole source was financially very valuable for Intel.
  • Introduced: 1985
  • First 32-bit x86 CPU
  • In-order
  • Integer Pipeline: 3 stages
  • General Registers: 8 (EAX, EBX, ECX, EDX, EBP, ESP, ESI, and EDI)
  • On-chip Caches: No
  • FPU: Off-chip 80387
  • 12 MHz - 33 MHz
  • 275,000 transistors
  • Die Size: 104 mm2 @ 1.5 µm process
Intel 80486 The pipelined 80486 enabled Intel to compete with the integer performance of the lower cost RISC chips of the time: R3000 and various SPARC chips. It was also competitive against the Motorola 68040 (released in 1990). The Intel 32-bit x86 architecture was uglier than 68K, MIPS or SPARC but the integer performance was competitive, the price was lower than the RISC chips and it was backward compatible with existing DOS software. This was enough for Intel to maintain their IBM PC and clone dominance.
  • Introduced: 1989
  • Pipelined, In-order
  • Integer Pipeline: 5 stages
  • User Visible Registers: 8
  • On-chip Caches: 8 KB L1 (later, 16 KB)
  • FPU: On-chip
  • 1989: 20 MHz & 25 MHz
    1990: 33 MHz
    1991: 50 MHz
    1992: 66 MHz (33 MHz external bus)
    1994: 100 MHz (33 MHz external bus)
  • 1.2M transistors
  • Die Size: ~165 mm2 @ 1 µm process
Intel Pentium x86 clone vendors such as AMD, Cyrix, Rise, and Centaur began naming their chips after the Intel chips they intended to match; Intel had discovered that numbers (e.g. 80486) could not be trademarked. Cyrix, for example, produced the Cyrix Cx486SLC in 1992, with the '486' signalling that the chip was intended to be used where Intel 80486 chips would be used.

To make this chip-to-chip matching more difficult, Intel didn't give the successor to the 80486 the obvious 80586 name but instead called it 'Pentium'. This was only partially successful, as everyone knew that '586' was the logical successor to 486, and so Cyrix eventually released the Cyrix 5x86 and Cyrix 6x86, whose intended Intel competitors were obvious. NexGen produced the Nx586. AMD produced the K5 and then the K6.

The 80486 showed that a CISC chip could be pipelined and execute one instruction per clock cycle, just like the RISC chips. Pentium showed that a CISC chip could also be super-scalar by executing multiple instructions per clock cycle. The 680x0 family was no longer a competitor as the workstation vendors had all shifted to RISC chips and Apple was switching to PowerPC. Pentium allowed Intel to continue to match up competitively with RISC.

It is interesting to note that the Pentium die size of almost 300 mm2 is about 3× that of the 80386. Intel may have had little choice if it wanted to remain performance competitive, but also Intel began producing chips on 200 mm wafers in 1992 so the cost of ~300 mm2 Pentium chips compared to ~100 mm2 80386 chips manufactured on 150 mm wafers might be less bad than it appears.
  • Introduced: 1993
  • Super-scalar (2-issue), Pipelined, In-order
  • Integer Pipeline: 5 stages
  • User Visible Registers: 8
  • On-chip Caches:
    1993 - 1995: 16 KB cache
    1996: 32KB cache
  • FPU: On-chip
  • 1993: 60 MHz & 66 MHz
    1994: 75 MHz - 120 MHz
    1995: 133 MHz - 200 MHz
    1997: 120 MHz - 300 MHz
  • 3.1M transistors
  • Die Size: ~293 mm2 (18 mm × 16 mm) @ 0.8 µm process
Intel Pentium Pro With the Pentium Pro Intel demonstrated that it could compete with ALL the microprocessor competition. The Pentium Pro briefly held the integer performance crown, even against workstation RISC CPUs that sold in much more expensive computers.

Vendor Chip Speed SpecInt 95
Intel Pentium Pro 200 MHz 8.09
DEC Alpha 21164 300 MHz 7.43

DEC released a 350 MHz Alpha 21164 to retake the performance crown in early 1996, but the Pentium Pro had effectively put an end to the dream that the (more elegant) RISC CPUs would maintain and grow a performance lead over the (ugly) x86 CISC architecture CPUs. By mid-2000 the top integer performance chips by architecture were:

Vendor Chip Speed SpecInt 95
DEC Alpha 21164 833 MHz 50.0
Intel Pentium III 1000 MHz 46.8
AMD Athlon 1000 MHz 42.9
HP PA RISC 8600 552 MHz 42.6

The Pentium Pro was expensive for an x86 CPU because the large L2 cache was on a second chip. Intel corrected this with the lower-cost Pentium II, introduced in 1997.
  • Introduced: 1995
  • Super-scalar (3-issue), Pipelined, Out-of-order
  • Integer Pipeline: 10-14 stages (up from five for the Pentium)
  • User Visible Registers: 8
  • On-chip Caches:
    8+8 KB L1 D+I cache
    256, 512, 1024 KB L2 cache (separate die, though)
  • FPU: On-chip
  • 150, 180, 200 MHz
  • 5.5M transistors
  • Die Size: [Consider both CPU + L2 cache dies]
Intel Pentium II The Pentium II was a slightly upgraded Pentium Pro. The upgrades included:
  • Slower but less expensive off-chip L2 caches
  • A larger L1 cache (16+16 KB vs 8+8 KB for Pentium Pro)
  • Better 16-bit performance (important for lots of existing code)
Additionally, the Pentium-II clocked faster than the Pentium Pro.
  • Introduced: 1997
  • Pipelined, super-scalar (3-issue) out-of-order CPU
  • 7.5M transistors
  • 16+16 KB L1 D+I cache
    512 KB L2 cache
  • 233 MHz - 450 MHz
Intel Pentium III The Pentium III could easily have been marketed as just another Pentium II; Intel called it Pentium III mostly for branding reasons.
  • Introduced: 1999
  • Pipelined, super-scalar (3-issue) out-of-order CPU
  • 16+16 KB L1 D+I cache
    256 or 512 KB L2 cache
  • 450 MHz - 1,000 MHz (failed at 1,133, then eventually 1,400)
  • SSE vector instructions
Intel Pentium 4 Pentium 4 was a new x86 micro-architecture, NetBurst, rather than an extension of the Pentium Pro. Pentium 4 was designed to eventually hit 10 GHz (it didn't!) and as a result of this the chip was very good at running branchless code and code that could fit inside its small 8 KB L1 data cache. For branchy code, including many legacy applications, the Pentium 4 was only competitive with contemporaneous AMD chips if the Pentium 4 had a substantial clock speed advantage.

To achieve the high frequency required for acceptable performance the Pentium 4 had many more pipeline stages than previous Intel CPUs. Initially 20 stages (in Willamette and Northwood), then 31 stages (in Prescott and Cedar Mill).

Rather than an L1 instruction cache, the Pentium 4 had a Trace Cache. This Trace Cache seems to be a hardware implementation of the traces constructed by the compiler for Multiflow's VLIW Trace computer systems.
  • Introduced: 2000
AMD Am386 The Am386 was a transistor-for-transistor reverse engineered version of the Intel 80386. Prior to the 80386 Intel and AMD had a cross-licensing agreement (insisted on by customers to mitigate the risk of a fab just 'losing' the ability to successfully manufacture chips). With the 80386 Intel refused to provide AMD with the necessary information to manufacture the chip. In 1987 AMD filed a lawsuit against Intel. A year later AMD began reverse engineering Intel's 80386 and in 1991 AMD was able to ship reverse engineered 80386 chips.

Because of how AMD reverse engineered the chip, it is an exact copy of Intel's 80386, but AMD was able to clock its 80386 up to 40 MHz while the Intel 80386 topped out at 33 MHz.
  • Introduced: 1991
  • Up to 40 MHz (vs Intel 80386 33 MHz)
AMD Am486 As with the Am386, the Am486 was a transistor-for-transistor reverse engineered version of an Intel CPU, in this case the Intel 80486.

AMD shipped their Am486 in 1993, four years after Intel's 80486.
  • Introduced: 1993
  • 25 MHz - 120 MHz (vs Intel 80486 20 - 100 MHz)
AMD Am5x86 In spite of the name, the AMD Am5x86 was essentially an upclocked Am486 with more cache.
  • Introduced: 1995
  • 133 MHz max frequency
AMD K5
AMD K6
AMD K6-2
AMD Athlon
An out-of-order, three-way superscalar x86 microprocessor with a 15-stage pipeline, organized to allow 600-MHz operation, can fetch, decode, and retire up to three x86 instructions per cycle to independent integer and floating-point schedulers. The schedulers can simultaneously dispatch up to nine operations to seven integer and three floating-point execution resources. The cache subsystem and memory interface minimize effective memory latency and provide high bandwidth data transfers to and from these execution resources. The processor contains separate instruction and data caches, each 64 KB and two-way set associative. The data cache is banked and supports concurrent access by two loads or stores, each up to 64-b in length. The processor contains logic to directly control an external L2 cache. The L2 data interface is 64-b wide and supports bit rates up to 2/3 the processor clock rate. The system interface consists of a separate 64-b data bus. The die, shown in Fig. 1, is 1.84 cm² and contains 22 million transistors. Table I shows the technology features. C4 solder-bump flip-chip technology is used to assemble the die into a ceramic 575-pin ball grid array (BGA). Measurements are from initial silicon evaluation unless otherwise stated.
  • A Seventh-Generation x86 Microprocessor from IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 11, NOVEMBER 1999
AMD Athlon64
AMD K8 (Hammer)
AMD K10 (Barcelona)
AMD Bulldozer
AMD Piledriver
AMD Steamroller
AMD Excavator
AMD Zen
AMD Zen2
AMD Zen3
AMD Zen4
Chips and Technologies Super386
Cyrix Cx486
NexGen Nx586
Cyrix Cx586
Cyrix Cx686
Rise mP6
Transmeta Crusoe Transmeta was a company that built an x86 compatible CPU using a VLIW processor as the base hardware with software to translate x86 instructions to instructions for the native VLIW. Because the external software doesn't see the VLIW, Transmeta could change the underlying hardware between generations as long as the new hardware also shipped with new translation software.

Transmeta targeted the low-power x86 market (e.g. tablets and power-optimized laptops). It eventually failed when, in 2003, Intel began shipping Pentium M x86 chips explicitly targeting low power.

Crusoe was the first generation Transmeta chip.
  • Introduced: 2000
  • TM3200: 333 MHz - 400 MHz
  • TM5400: 500 MHz - 800 MHz
  • TM3200: 64 KB L1 I-cache, 32 KB L1 D-cache
  • TM5400: 64 KB L1 I-cache, 64 KB L1 D-cache, 256 KB L2 cache
  • Microprocessor Report article: Transmeta Breaks x86 Low-Power Barrier
Via C3
Transmeta Efficeon
Via C7
Via Nano
MIPS R2000 MIPS produced the R2000, their first CPU, in 1986. Unlike Sun, which built both SPARC chips and SPARC workstations, MIPS produced chips but not products that used those chips.
  • Introduced: 1986
  • 110,000 transistors
  • 80 mm2
  • 8.3, 12.5 and 15 MHz
  • Scalar
MIPS R3000 This chip was used by a number of vendors including DEC (for the DECstation line ... which at the time was competing with the DEC VAXstation line and would soon also be competing with the AXP line of products built around the DEC Alpha chip) and Silicon Graphics.
  • Introduced: 1988
  • 115,000 transistors
  • 48 mm2
  • 20 - 33.3 MHz
  • Scalar
MIPS R4000 With the R4000, MIPS extended the MIPS architecture to 64 bits and lengthened the instruction pipeline to eight stages, which allowed higher clock speeds.
  • Introduced: 1991
  • 1.2 million transistors
  • 8 KB I-cache, 8 KB D-Cache
  • 100 MHz
  • On-chip IEEE754 FPU
MIPS R8000 MIPS R10000 The MIPS R10000 came out in 1996; SGI's revenue peaked the following year, in FY 1997, at $3.7B (as a comparison point, Sun Microsystems had $8.6B in revenue that year; by 2000 Sun revenue had grown to $15.7B).

In 1998 SGI decided that it could no longer compete with x86 and decided that newer machines would be based on Intel's Itanium chips (expected to be available in 1999). CNet's Michael Kanellos reported this on April 9, 1998:
Silicon Graphics scraps MIPS plans
Silicon Graphics (SGI) has quietly scrapped ambitious plans for its MIPS processors and is now following a much more limited road map calling for fewer design improvements, according to sources close to SGI.

The last major MIPS release for servers and workstations is scheduled for 1999, sources said.
MIPS roadmap
Model Speed Status
R10000 250 MHz available now
R12000 300 MHz due mid 1998
R14000 400 MHz due 2H 1999
"Beast" N/A canceled
"Capitan" N/A canceled
SGI will continue to boost processor speeds and incorporate the chip in its high-end computers, but will likely start phasing out the processor in 2001 as 64-bit chips from Intel become more pervasive and powerful.
The R10000 was the last new SGI designed MIPS core. Derivative cores were released in later years:
Year Chip
1998 R12000 0.25 µm; 270, 300 and 360 MHz
2000 R12000A 0.18 µm; 400 MHz
2001 R14000 0.13 µm; 500 MHz
2002 R14000A 0.13 µm; 600 MHz
2003 R16000 0.11 µm; 600 - 700 MHz
2004 R16000A 800 - 900 MHz
In May 2001 SGI released its first Itanium based workstation, the SGI 750. The SGI 750 had terrible performance (because the first Itanium chip had terrible performance) and was discontinued by the end of 2001. For all of 2002 SGI had no Itanium product to sell, though it had made clear to customers that the MIPS line was dead. In January 2003 SGI released the Itanium2 based Altix 3300 and Altix 3700 computers.

In 2003 SGI left its Mountain View headquarters and leased the buildings to Google.

In 2006 SGI filed for bankruptcy.

Year Revenue
FY 1995 $2.2B
FY 1996 $2.8B
FY 1997 $3.7B
FY 1998 $3.1B
FY 1999 $2.7B
FY 2000 $2.3B
FY 2001 $1.9B
FY 2002 $1.3B ($46.3)
FY 2003 $1.0B ($129.7)
FY 2004 $0.8B
FY 2005 $0.7B
Fujitsu MB86900 The first SPARC microprocessor, used in the Sun-4 workstation. The implementation was built on two 20,000 gate Fujitsu gate array chips!
  • Introduced: 1986
LSI L64801 SPARC microprocessor, used in the SPARCStation workstation.
  • Introduced: 1989
Cypress CY7C601 SPARC microprocessor, used in the SPARCStation workstation.
  • Introduced: 1990
  • Scalar, Pipelined, In-order
  • Integer Pipeline: 4 stages
  • User Visible Registers: 32 integer registers
  • FPU: Off-chip
  • 25, 33, 40 MHz
SuperSPARC I This CPU was used in the SPARCstation 10 and 20. It was Sun's first superscalar CPU, but Sun screwed up the design and it clocked MUCH lower than expected/desired. The chip came out one year before Intel's 2-issue super-scalar Pentium, but the Pentium clocked much faster.

Also, DEC released the 100+ MHz 2-issue superscalar Alpha 21064 in 1992, but, fortunately for Sun, the Alpha had no installed base.

Sun marketing had quite a challenge for a few years because of the low SuperSPARC frequency (and thus performance).
  • Introduced: 1992
  • Super-scalar (3-issue), Pipelined, In-order
  • Integer Pipeline: 8 stages (very high for frequency ...)
  • On-chip Caches:
    16+20 KB L1 D+I cache
  • FPU: On-chip
  • 33 - 40 MHz @ 0.8 µm (this was a disappointment)
  • 3.1M transistors
  • Die Size: 256 mm2 (16 mm × 16 mm)
HP NS1 With the NS1, HP got a PA-RISC CPU implementation on a single chip, though the CPU needed support chips.

Before the NS1 came the TS1, in 1986. The TS1 was the first PA-RISC CPU, but it wasn't a microprocessor. From openpa.net:
The TS-1 was the first PA-RISC production processor, introduced in 1986. It integrated version 1.0 of PA-RISC on six 8.4×11.3" boards of TTL and was used in HP 9000 840 servers, the first PA-RISC computers.
The TS1 was an 8 MHz CPU with a 3-stage integer pipeline, more comparable to the VAX minicomputers of the time than to the 68020, 80386 or MIPS R2000.

HP's Precision architecture is RISC-y, but less 'ideological' about RISC-ness than MIPS and SPARC. This may be because HP had more experience designing commercial CPU architectures than the MIPS and SPARC folks did. Or it may be because HP had some very specific/concrete workloads in mind. In any event, from "Hewlett-Packard Precision Architecture: The Processor":
The basic types of operations in most instruction sets fall into three categories: data transformation operations, data movement operations, and control operations. In general, one instruction performs one of these operations. A combined instruction performs more than one of these operations in one instruction. In HP Precision Architecture, almost every instruction performs a combination of two of these operations in a single cycle, with relatively simple hardware.

(emphasis added)
and ...
HP Precision Architecture is frequently referred to as a reduced instruction set computer (RISC) architecture. Indeed, the execution model of the architecture is RISC-based, since it exhibits the features of single-cycle execution and register-based execution, where load and store instructions are the only instructions for accessing the memory system. The architecture also uses the RISC concept of cooperation between software and hardware to achieve simpler implementations with better overall performance.

HP Precision Architecture, however, goes beyond RISC in many ways, even in its execution model. For example, RISC machines emphasize reducing the number of instructions in the instruction set to simplify the implementation and improve execution time. Only the most frequently used, basic operations are encoded into instructions. However, frequency alone is not sufficient, since some instructions may occur frequently because of inefficient code generation, arbitrary software conventions, or an inefficient architecture.

In designing the next-generation architecture for Hewlett-Packard computers, the intrinsic functions needed in different computing environments like data base, computation intensive, real-time, network, program development, and artificial intelligence environments were determined. These intrinsic functions are supported efficiently in the architecture. Minimizing the actual number of instructions is not as important as choosing instructions that can be executed in a single cycle with relatively simple hardware. Complex, but necessary, operations that take more than one cycle to execute are broken down into more primitive operations, each operation to be executed in one instruction. If it is not practical to break these complex operations into more primitive operations, they are defined as assist instructions, by means of the architecture's instruction extension capabilities. If more than one useful operation can be executed in one cycle, HP Precision Architecture defines combined operations in a single instruction, resulting in a more efficient use of the execution resources and in improved code compaction.

HP Precision Architecture's execution model has other noteworthy features like its heavy use of maximal-length immediates as operands for the execution engine, and its efficient address modification mechanisms for the rapid access of data structures. The architecture also includes some uncommon functions for efficiently supporting the movement and manipulation of unaligned strings of bytes or bits, and primitives for the optimization of high-level language programs.
HP NS2 HP PCX HP PA-7000 HP PA-7100 2-issue superscalar 5 stage pipeline 100 MHz (125 MHz on the PA-7150) HP PA-7200 140 MHz HP PA-8000 By 1995 most of the major workstation CPU vendors were developing out-of-order CPUs. Intel had even delivered one (which had surprisingly good integer performance):
Architecture ChipYear
Intel x86 Pentium Pro1995
HP PA-RISC PA-8000 1996
MIPS R10000 1996
SPARC SPARC64 1996
IBM POWER POWER3 1998
DEC Alpha 21264 1998

The sole exception was Sun-developed SPARC CPUs. HAL Computer Systems delivered an out-of-order SPARC CPU for Fujitsu in 1995, but Sun didn't have an out-of-order CPU in its lineup until the SPARC T4 in 2011.

In many respects the PA-8000 appears similar to Intel's Pentium Pro:
  • Both are out-of-order (and the 1st out-of-order CPUs in their families)
  • Three instructions/clock max for the Pentium Pro and four for the PA-8000
  • 180 - 200 MHz
  • Large off-chip caches
One interesting difference is that the Pentium Pro shipped with small on-chip caches to complement the off-chip cache, while the PA-8000 shipped with a larger off-chip cache, no on-chip caches, and a deeper out-of-order reorder queue. HP explains this in "The HP PA-8000 RISC CPU" from 1997:
Why did we design the processor without on-chip caches? The main reason is performance. Competing designs incorporate small on-chip caches to enable higher clock frequencies. Small on-chip caches support benchmark performance but fade in large applications, so we decided to make better use of the die area. The sophisticated instruction reorder buffer allowed us to hide the effects of a pipelined two-state cache latency. In fact, our simulations demonstrated only a 5% performance improvement if the cache was on chip and had a single-cycle latency. A flat cache hierarchy also eliminates the design complexity associated with a two-level cache.
  • Introduced: 1996
  • Super-scalar (4-issue), Pipelined, Out-of-order
  • On-chip caches: None
    Off-chip cache: Up to 4 MB L1 (separate die)
  • FPU: On-chip
  • Up to 180 MHz
  • 3.8M transistors
  • Die Size: 338 mm2 (not counting separate cache die)
HP PA-8200 HP PA-8500 HP PA-8600 HP PA-8700 HP PA-8800 HP PA-8900 Intel i960MX Intel i960CA Motorola 88100 Motorola 88110 AMD 29000 AMD 29030 AMD 29040 AMD 29050 Intel i860XR Intel i860XP DEC Alpha 21064 (EV4) In 1977 DEC launched the 32-bit VAX family of computers. VAX was wildly successful and drove DEC's growth over the next decade. In the mid-1980s, however, single chip RISC CPUs from Sun Microsystems and MIPS demonstrated that simple (and much more importantly, inexpensive!) chips could rival expensive VAX CPUs.

Intel faced similar threats to its x86 line of chips, but Intel was able to compete on integer performance: the 80486 demonstrated that x86 could be successfully pipelined, the Pentium demonstrated that x86 could be made super-scalar, and the Pentium Pro demonstrated that x86 could implement an out-of-order CPU.

VAX was not so fortunate. The complex VAX addressing modes led DEC engineers to conclude that VAX could not be made performance competitive with RISC CPUs and DEC concluded that VAX must be replaced with a RISC chip for DEC to remain performance competitive.

DEC had previously explored RISC CPUs with the internal Prism project (1985 - 1988) and with the DECstation product line, released in 1989 and built around MIPS CPUs. DEC eventually decided that designing its own CPU was the thing to do, and in 1992 DEC delivered the first Alpha (named DECchip at the time) CPU, the 21064.
  • Introduced: 1992
  • 0.75 micron design process
  • Pipelined (super-pipelined)
  • 2-issue Superscalar
  • 8 KB L1 I-cache, 8 KB L1 D-cache
  • 1992: 100 MHz - 150 MHz
  • 1993: 200 MHz
  • 1995: 300 MHz (21064A)
DEC Alpha 21164 (EV5)
  • Introduced: 1995
  • 4-issue Superscalar
  • 1995: 266 MHz - 333 MHz
DEC Alpha 21264 (EV6)
  • Introduced: 1998
  • Out-of-Order Execution
DEC Alpha 21364 (EV 7) In 1998 DEC was acquired by Compaq. In 2002 Compaq was acquired by Hewlett Packard. Intel Merced Intel McKinley Intel Madison Intel Montecito Intel Montvale Intel Itanium 9300 (Tukwila) Intel Itanium 9500 (Poulson) Intel Itanium 9700 (Kittson) Intel iAPX 432 In the late 1970s several companies were working on "object oriented processors." One was the minicomputer company Data General. The "Fountainhead Project" CPU that was the antagonist to the Eclipse project in Tracy Kidder's "Soul of a New Machine" was an "object oriented processor." Intel's i432 was another (and, in fact, Data General went to the trouble of writing a paper explaining how the Data General computer was much faster than Intel's i432 CPU).

The i432 was intended to replace the 8086/8088 with a modern 32-bit architecture, but the x86 architecture took off when IBM selected the 8088 for the IBM PC. The 80286 was a reasonable successor to that chip, with backward compatibility that the i432 did not have, and the 80286 was eventually succeeded by the even more successful 32-bit 80386.

The i432 is 'object oriented' in the sense that data is expected to be in well defined regions and access checks are done on every data access — per-object data scoping is enforced by hardware!

The i432 had hardware support for garbage collection because the expectation was that the code running on the i432 would use garbage collection rather than explicit memory management.

To pack instructions efficiently, the i432 implemented a variable-length instruction encoding and made the most common instructions the shortest. Insanely, instructions were sized in bits rather than bytes: they ranged from 6 bits to 344 bits in length and could begin on any bit boundary. This would have made parallel instruction decoding a challenge had the i432 architecture survived long enough for that to be an issue.

A useful model to think about the i432 is that the i432 is the spiritual opposite of the RISC CPUs that became common at the end of the 1980s.

The i432 combined high price (3 chips with up to 1 million transistors) and poor performance. In a 1982 paper, "A Performance Evaluation of the Intel iAPX 432", David Patterson and his co-authors write this:
The bottom performance line as measured by these four small programs is that the newest version of the 432 (8 MHz with 4 wait states) is almost as fast as a 5 MHz 8086, while the 80286 leads the 432 by almost an order of magnitude.
Underlining by me.

Things began more optimistically. From the June 1981 issue of Byte Magazine (page 210), as the i432 neared coming to market:
Update On 32-Bit Microprocessors: The International Solid-State Circuits Conference (ISSCC) met in New York last February and heard presentations on two 32-bit microprocessors and some disclosures on a third.

Intel released further details on its 32-bit iAPX432 processor. It is Intel's first departure from previous architecture and instruction sets, so there is no software compatibility with its 8086 (16-bit) and 8085 (8-bit) microprocessors. Each of the iAPX432's three integrated circuits has four lines of sixteen pins. There are two general processors and an I/O (input/output) processor. The iAPX432 can link to 8086s and existing peripheral and memory integrated circuits. Intel is boasting performance of up to 2 MIPS (million instructions per second).

It took five years to engineer the iAPX432, and the company estimates that $25 million was spent on the project. Intel expects to sell at least 10,000 sets in the first year of production, which is projected for 1982. The initial price for the set will be $1500. Intel started shipping evaluation sets in February and is offering a board-level evaluation kit for $4250.

Intel claims that each of the three integrated circuits contains about 200,000 transistors. Two chips operate as a pipeline pair: the 43201 processor, which contains the instruction decoder, and the 43202, which is the microexecution unit. The 43203 is the I/O processor. It provides an interface from the I/O subsystem to the protected-access environment of the central system. Each I/O subsystem uses an 8- or 16-bit microprocessor to control I/O, independent of the central system. An address space of more than 4 gigabytes (4×10⁹ bytes) and a virtual memory-address space of a terabyte (10¹² bytes) is supported.

A protection scheme is provided to limit access to programs. The iAPX432 can perform floating-point operations on 32-, 64-, and 80-bit numbers. Hardware failures can be detected by interconnecting identical iAPX432 processors in a self-checking arrangement.

The system uses compiled Ada code as its machine language. The language interpreter is contained in a 64 K-byte microcode ROM (read-only memory).

Intel has also released an Ada cross-compiler for the iAPX432. The compiler runs on a DEC (Digital Equipment Corporation) VAX-11/780 or an IBM 370. It costs $30,000. A $50,000 hardware link is needed to download the compiled code to Intel's $4250 development board.

With the iAPX432, Intel appears to have a two-year jump on its competition. At the conference, Hewlett-Packard (HP) disclosed that it is in the early stages of development on a 32-bit microprocessor. HP claims to have built and tested a single chip with 450,000 transistors (which is about what Intel has in its set of three integrated circuits).
It was not to be, however, and the combination of high cost and low performance prevented the CPU from getting any market traction at all. The architecture came to an end in the mid-1980s. From Byte Magazine, June 1985:
Intel has also stopped all manufacturing, marketing, and support activities for its 432 microprocessor. The 432 was Intel's first 32-bit chip set, but it was never used in any large volume computers. Intel is reportedly working on two other 32-bit chip designs, including the Intel 80386, which will be compatible with its 80286 and earlier designs. Intel will begin shipping samples of the 80386 late this year.
  • Introduced: 1981
  • 32-bit CPU (Intel's first 32-bit CPU)
  • Multi-chip CPU
HP Focus
NCR/32
The NCR/32 was unusual because it was intended to be used to implement other, existing CPU architectures such as the IBM System/370. This turned out not to be a large market.

From the NCR Microelectronics Short-Form Catalog-1985:
NCR/32 Processor Family Features
  • 32-bit system architecture
  • 13.3 Megahertz frequency
  • Effective emulation of mid-range mainframes
  • Externally microprogrammable
  • Real and virtual memory operation
  • Large direct memory addressing
  • Interface provided to slower peripherals
  • On-chip error check and correction
Functional Description
The NCR/32 VLSI Processor family combines the latest advances in semi-conductor technology with experience gained in three generations of computer mainframe design to provide a comprehensive microprogrammable 32-bit system architecture. With external microprogram capability, an extremely flexible microinstruction set, and a powerful set of internal registers, the NCR/32 offers flexibility and high performance advantages not available with other microprocessors.
Along with an existing set of VLSI family support devices, the NCR/32 offers effective emulation of register, stack and descriptor-based system architectures, as well as execution of high-level languages directly from microcode. The NCR/32 is well suited for applications requiring direct addressing of a large memory space, high numeric precision, and very-high-speed execution such as bit-mapped graphics, robotics, artificial intelligence, and relational databases.
An example of using this chip as the base for a specific actual architectural implementation is Barry Fagin's research project that configured an NCR/32000 with custom microcode to implement a Prolog engine.

And from the January 1984 issue of Byte Magazine article "1984, the Year of the 32-bit Microprocessor," by Richard Mateosian:
The NCR NCR/32. This microprocessor chip set is quite different from all of the other microprocessors discussed in this article. It is designed to be externally microprogrammed to emulate other computers, principally medium-sized IBM mainframes like the System 370. The chip set consists of:
*) the NCR 32-000 CPC, the central processing unit. It contains 40,000 transistors and is fabricated in a 3-micron silicide NMOS process. It runs with a 13.3-MHz clock, with internal machine cycles occupying two clock cycles (150 nanoseconds). The 16-bit microinstructions, read from a 128K-byte external storage unit, select 95-bit words from an internal ROM to control 179 operations, mostly register-to-register arithmetic and logical operations on 4-bit, 8-bit, 16-bit, 32-bit, and field data types. Microinstructions are executed in a three-stage pipeline (fetch, interpret, execute). Eight 16-bit jump registers support a rich set of conditional operations at the microcode level, and special set-up microinstructions facilitate IBM System 370 emulation.
*) the NCR 32-010 ATC, the memory management unit. In addition to address translation and access protection, this chip provides memory-refresh control, error-checking and correction (ECC) logic, a time-of-day register, an interval timeout interrupt, and an interrupt on writes to one specified virtual address. Sixteen translation registers support mapping of 32-bit or 24-bit virtual addresses into 24-bit physical addresses, using page sizes of 1K, 2K, or 4K bytes.
*) the NCR 32-020 EAC, the "booster" chip for arithmetic operations. It supports IBM-compatible single- and double-precision binary and floating point arithmetic, packed and unpacked decimal storage, and format conversions. A single-precision floating-point addition takes approximately 1.6 microseconds.
*) the NCR 32-500 SIC, which interfaces the 24-megabyte/second processor memory bus to slower peripherals and to other systems. The configuration of an NCR/32 system is shown in figure 3. No benchmark data has been published, but NCR estimates performance of the NCR/32 at approximately four times that of a 10-MHz 68000.
If the estimate of 4× the performance of a 10 MHz 68000 is correct, then the NCR/32 would have roughly the same performance as the contemporary 16 MHz 68020.

The market for the chip was small, however, and no successor chips were made.
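The ATC-style address translation described above (sixteen translation registers mapping virtual pages to 24-bit physical addresses, with 1K, 2K, or 4K byte pages) can be sketched roughly as follows. The (vbase, pbase, size) register layout and the linear lookup are illustrative assumptions, not NCR's actual register format:

```python
# Rough sketch of ATC-style address translation: a small set of
# translation registers maps virtual page bases to physical page bases.
# The triple layout here is an assumption for illustration only.

PAGE_SIZES = (1024, 2048, 4096)  # 1K, 2K, or 4K byte pages

def translate(vaddr, regs):
    """Return the 24-bit physical address for vaddr, or -1 on a miss.
    regs holds at most 16 (vbase, pbase, size) triples."""
    assert len(regs) <= 16
    for vbase, pbase, size in regs:
        assert size in PAGE_SIZES
        if vbase <= vaddr < vbase + size:
            return (pbase + (vaddr - vbase)) & 0xFFFFFF  # 24-bit physical
    return -1  # no mapping: translation fault

# One 4K page mapping virtual 0x10000.. onto physical 0x3000..
regs = [(0x10000, 0x3000, 4096)]
```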
Zilog Z80000 The Zilog Z80000 was intended to be a 32-bit extension of the 16-bit Z8000. Implementation problems prevented the chip from shipping.

From the "Z80,000 CPU Preliminary Technical Manual" (1984):
1.1 INTRODUCTION

The Z80,000 CPU is an advanced 32-bit microprocessor that integrates the architecture of a mainframe computer into a single chip. A subset of the Z80,000 architecture was originally implemented in a 16-bit version, the Z8000 microprocessor. The Z80,000 bus structure permits the use of Z8000 family peripherals, such as the Z8030 SCC and Z8036 CIO. While maintaining compatibility with Z8000 family software and hardware, the Z80,000 CPU offers greater power and flexibility in both its architecture and interface capability. Operating systems and compilers are easily developed in the Z80,000 CPU's sophisticated environment, and the hardware interface provides for connection in a wide variety of system configurations.

Memory management is integrated in the CPU, providing access to more than 4 billion bytes of logical address space without external support components. The Z80,000 CPU also includes a cache memory, which complements the pipelined design to achieve high performance with moderate memory speeds.

This chapter presents an overview of the features of the Z80,000 CPU that offer extraordinary flexibility to microprocessor system designers in tailoring the power of the CPU to their specialized applications. The chapters that follow describe these features in detail.

1.2 ARCHITECTURE

The CPU features a general-purpose register file with sixteen 32-bit registers. The instruction set offers a regular combination of nine general addressing modes with operations on numerous data types, including bits, bit fields, bytes (8 bits), words (16 bits), long words (32 bits), and variable-length strings. The memory management, exception handling, and system and normal mode features support the development of reliable software systems.

1.2.1 Registers

The Z80,000 CPU includes sixteen 32-bit general-purpose registers. The registers can be used as data accumulators, index values, or memory pointers. Two of the registers, the Frame Pointer and Stack Pointer, are used for procedure linkage with the Call, Enter, Exit, and Return instructions.

The Z80,000 registers also include the 32-bit Program Counter and 16-bit Flag and Control Word. These two registers, together called the Program Status, are automatically saved during trap and interrupt processing. Nine other special-purpose registers are used for memory management, system configuration, and other CPU control.

1.2.2 Address Spaces

The CPU uses 32-bit logical addresses, permitting direct access to 4G bytes of memory. The logical addresses are translated by the memory management mechanism to the physical addresses used to access memory and peripherals.

The CPU supports three modes of address representation — compact, segmented, and linear — selected by two control bits in the Flag and Control Word register. Applications with an address space smaller than 64K bytes can take advantage of the dense code and efficient use of base registers with the 16-bit compact addresses. Although programs executing in compact mode can only manipulate 16-bit addresses, the logical address is extended to 32 bits by concatenating the 16 most-significant bits of the Program Counter register. Compact mode is equivalent to the Z8000 non-segmented mode.

Segmented mode supports two segment sizes — 64K bytes and 16M bytes. Up to 32,768 of the small segments and 128 of the large segments are available. In segmented mode, address calculations do not affect the segment number, only the offset within the segment. Allocating individual objects such as program modules, stacks, or large data structures to separate segments allows applications to benefit from the logical structure of a segmented memory space.

The 32-bit addresses in linear mode provide uniform and unstructured access to 4G bytes of memory. Some applications benefit from the flexibility of linear addressing by allocating objects to arbitrary positions in the address space.
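The three address representations quoted above can be sketched in a few lines. The bit manipulations follow the manual's prose (compact addresses borrow the top 16 bits of the Program Counter; segmented arithmetic never carries into the segment number); the exact field layouts are assumptions for illustration:

```python
# Sketch of the Z80,000's three address representations as described in
# the manual text above. Field placements are assumptions, not the
# actual Z80,000 encodings.

def compact(pc, addr16):
    """Compact mode: a 16-bit address is extended to 32 bits by
    concatenating the 16 most-significant bits of the Program Counter."""
    return (pc & 0xFFFF0000) | (addr16 & 0xFFFF)

def segmented_add(addr, disp, large):
    """Segmented mode: address arithmetic wraps within the segment and
    never changes the segment number (64K small / 16M large segments)."""
    offset_bits = 24 if large else 16
    mask = (1 << offset_bits) - 1
    return (addr & ~mask) | ((addr + disp) & mask)

def linear_add(addr, disp):
    """Linear mode: flat, unstructured 32-bit addressing; wraps at 4G."""
    return (addr + disp) & 0xFFFFFFFF
```

Note how `segmented_add(0x0003FFFF, 2, large=False)` wraps to `0x00030001`: the offset overflows within the 64K segment while the segment number stays 3.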
And from Byte Magazine, June 1985:
Problems in debugging complex microprocessor chips have caused new problems at Zilog and Intel. Zilog admitted that sampling of its Z80000 32-bit processor, announced in the summer of 1983, has been delayed until early 1986. Zilog had originally planned to start shipping the Z80000 in late 1984.
  • Introduced: not really
  • In-order
  • Integer Pipeline: 6 stages
  • User Visible Integer Registers: 16
  • On-chip Cache: 256 bytes (16 entries × 16 bytes; fully associative)
  • FPU: No
  • 25 MHz (intended)
  • 91,000 transistors
  • Die Size: ~100 mm2 (10 by 10 mm) @ 2.0 µm nMOS process
ShBoom
AT&T Hobbit
The AT&T Hobbit processor was created because Apple liked the AT&T Research CRISP CPU and wanted a commercial version. The performance of a 16 MHz CRISP CPU against a 5 MHz VAX 11/780 and an 8 MHz MIPS R2000 M/500 Development System, as reported in "The Hardware Architecture of the CRISP Microprocessor" (1987) by David R. Ditzel, Hubert R. McLellan and Alan D. Berenbaum, was this:
Benchmark       VAX-780   R2000    CRISP    CRISP/VAX  CRISP/R2000
ackerman        20.9 sec  1.6 sec  1.1 sec  19.0       1.5
word count      55.0 sec  5.2 sec  4.2 sec  13.1       1.2
quicksort       36.2 sec  4.0 sec  3.4 sec  10.6       1.2
tty driver      17.4 sec  2.2 sec  1.2 sec  14.5       1.8
symbol table    14.6 sec  1.3 sec  1.2 sec  12.2       1.1
buffer release   9.9 sec  0.9 sec  0.8 sec  12.4       1.1
arithmetic      12.8 sec  2.7 sec  1.6 sec   8.0       1.7
This is not entirely fair to the R2000, as the M/500 was a development system clocked slower than production R2000 systems would be. It was, however, the competing RISC CPU that AT&T could obtain for benchmarking at the time.

The CRISP CPU was 172,163 transistors implemented in a 1.75 µm CMOS process, while the R2000 was around 110,000 transistors in a 2.0 µm process.

CRISP did not have explicit registers. Instead, all operations were performed against memory. A stack cache ensured that access to values that would have been in registers on a register machine were cached on-chip rather than requiring DRAM access. Another cache for decoded instructions allowed many instructions to execute in one clock cycle even though the instructions were logically accessing 'memory'.

Finally, CRISP provided instructions in only three lengths: 2 bytes, 6 bytes and 10 bytes:
The instruction encoding is designed with two primary considerations. First, the instruction length must be easily determined. Therefore, the length is encoded in the first two bits of each instruction. Since all instructions are multiples of two bytes, this unit is referred to as an instruction parcel. Second, static and dynamic code size should be made as small as possible without interfering with performance issues. Instructions that require two 32-bit addresses or operands, can use the five parcel form shown in Figure 1. The three parcel form can be used to provide a single 32-bit operand or two 16-bit operands. The single parcel format has a 5-bit opcode field which defines the most frequent combinations of operations and addressing modes occurring in the three and five parcel forms. This highly encoded single parcel form typically accounts for 80 percent of all instructions.

Five Parcel
  11 opcode(6) smode(4) dmode(4)
  src(32) dst(32)
Three Parcel
  10 opcode(6) smode(4) dmode(4)
  src(32)
  10 opcode(6) smode(4)   1111
  src(16) dst(16)
One Parcel
  0 opcode(5) src(5) dst(5)
  0 opcode(5) src(10)
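A toy decoder for the length encoding described in the quote: the leading bit(s) of the first parcel select the one, three, or five parcel forms. The bit positions assume the length bits sit at the top of the first 16-bit parcel, which the paper does not spell out:

```python
# Toy CRISP instruction-length decoder. Per the quoted paper, the length
# is encoded in the first two bits of each instruction, and instructions
# come in 1, 3, or 5 two-byte parcels. Placing the length bits in the
# top bits of the first parcel is an assumption for illustration.

def parcel_count(first_parcel):
    """Return instruction length in 2-byte parcels."""
    if (first_parcel >> 15) & 1 == 0:
        return 1   # 0.......: highly encoded single-parcel form
    if (first_parcel >> 14) & 1 == 0:
        return 3   # 10......: one 32-bit operand, or two 16-bit operands
    return 5       # 11......: two 32-bit operands

# Instruction lengths in bytes are therefore 2, 6, or 10:
lengths = sorted({2 * parcel_count(p) for p in (0x0000, 0x8000, 0xC000)})
```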
Hobbit was basically CRISP and was released in 1992. The product for which Apple had intended to use the Hobbit CPU was also released in 1992 — but it came out with an ARM CPU instead. Apple had found the Hobbit CPUs to be “rife with bugs, ill-suited for our purposes, and overpriced,” according to Larry Tesler, then at Apple and in charge of the Newton when it switched to ARM.

AT&T released a second version of the original Hobbit, but it was mostly a tweaked version of the original chip. From page 105 of the January 1994 issue of Byte Magazine:
The AT&T Hobbit Enters Its Second Generation

The AT&T Hobbit chip sets betray their corporate heritage. These are chips designed first and foremost for telecommunications applications. AT&T Microelectronics first offered a set of chips for PDAs (personal digital assistants) in 1992. The 92K Hobbit family, the chips that are used in the Eo Personal Communicator, has five parts: a CPU, a system controller, a bus controller, a video-display controller, and a peripheral-bus controller.

The price seemed high at $99 for the chip set, but it was complete. Late last year, AT&T introduced two new chip sets designed to broaden the line, with trade-offs in performance, system size, cost, battery life, and feature sets.

The ATT92020S processor provides higher performance — it uses a 6-KB prefetch buffer as opposed to the 3-KB buffer on the 92010 — and requires less power than the original 92010 CPU. It also works with all the existing 92010 support chips except for the ISA controller. ISA support doesn’t figure very highly in the new Hobbit offerings.
The Hobbit was competing against ARM chips in the PDA market and quickly lost.
Mitsubishi Gmicro/100 Several vendors produced various TRON CPU implementations. This was similar to how SPARC chips were not just manufactured, but designed by multiple companies to a common specification.

Mitsubishi was one of those vendors and the Mitsubishi family of TRON chips also went by the M32 moniker. The Mitsubishi Gmicro/100 was followed up with the Gmicro/400.

Eventually, Mitsubishi moved its development efforts to the M32R family of CPUs.
Hitachi Gmicro/200 Similarly to the SPARC architecture, several vendors produced various TRON CPU implementations of a common specification. In addition, some of the vendors came together to work jointly on the Gmicro family of TRON CPUs.

Hitachi was one of those vendors and the Hitachi family of TRON chips also went by the H32 moniker. The Hitachi Gmicro/200, released in 1988, was the first TRON CPU. Hitachi followed it up with the Gmicro/500.

One problem that the TRON CPU project faced, especially in the desktop and workstation markets, was competition from the x86 chips (e.g. 80486) and the emerging inexpensive RISC chips (e.g. SPARC and MIPS). In 1990 the competitive situation looked like this (if the TX3 met its 4Q90 ship date):
CPU            MHz  Cache
Toshiba TX3    33   8+8 KB
Intel 80486    33   8 KB
MIPS R3000     33
SPARC CY7C601  40
Toshiba's TX3 appears quite competitive here, but only just. Remaining competitive would require on-going investment in development. And it is unclear if the TX3 ever shipped! In 1991 MIPS introduced the R4000, which pushed the clock speed to 100 MHz and added 8+8 KB of on-chip cache. In 1991 Intel released a 50 MHz 80486. And in 1992 DEC produced the first Alpha chip at 100+ MHz. Without a commitment to continued aggressive development, the TRON chips would rapidly become uncompetitive. And much of the TRON focus was on embedded applications anyway (note that several TRON CPUs lack an MMU).

Momentum behind the TRON CPU faded fairly quickly and the major vendors eventually supported (different) non-TRON RISC chips. Hitachi moved its development efforts to the SH (or Super-H) family which was focused on embedded systems.

From "Microprocessors and Microsystems" Vol 13 No 8 October 1989:
TRON microprocessors will also be in competition with new RISC chips, for unlike RISC architectures the TRON CPU has a compiler-oriented CISC structure that allows it to compile high-level program code efficiently.

TRON microprocessor manufacturers include Toshiba, developing its TX series, and the GMICRO group of Hitachi, Fujitsu and Mitsubishi. Oki Electric, which is now participating in development of the GMICRO series, is also working on its own TRON CPU. Matsushita similarly is producing its own TRON chip (see Table 1 over).

The most ambitious design is Toshiba's TX3, a 1.2M transistor, 33 MIPS (peak) microprocessor due late in 1990, intended for high-end workstations. Already available is the TX1 embedded controller, which is pin compatible with the TX3. The TX1 can also be used as an ASIC macro cell.

Toshiba is also developing three peripheral chips for the TX series: a 50 MHz clock generator, an interrupt controller/timer that can handle up to eight interrupts, and a four channel direct memory access controller (DMAC) with 50 and 25 Mbyte s-1 block and single transfer modes.

The GMICRO group plans to unveil a 900k transistor, 20 MIPS TRON CPU, the GMICRO/300, in the current quarter. Unlike the TX3, which will basically have a floating-point unit (FPU) built into it, the GMICRO/300 will use an external FPU. To allow for efficient interaction with the FPU, and easy development of software, the GMICRO/300 will have 22 coprocessor instructions, in addition to 11 decimal instructions. The other peripheral chips in the GMICRO series are a cache controller/memory, a four channel DMAC, an interrupt request controller that can handle up to seven interrupts, a tag memory with a 27 µs access time, and a 40-48 MHz clock pulse generator.

The GMICRO series began in 1987 with the appearance of the 730k transistor, 10 MIPS GMICRO/200, followed recently by the GMICRO/100 embedded controller with 330k transistors (operating speed 10 MIPS (max.)). High-speed versions of all three GMICRO microprocessors are planned by equipping them with 33 MHz clocks.

Oki Electric has joined the GMICRO group to market the series products and to undertake the development of GMICRO evaluation tools that operate in the BTRON environment.

Prior to joining GMICRO, Oki was reported to be developing the 032 chip, but no launch date has been announced. Unlike the other developers who are planning different chips for different applications, Oki intends to use its CPU across a variety of applications from embedded controllers to communications terminals and lower-performance PC devices. Similarly, Matsushita anticipates wide application of its MN10400, a 400 k transistor, 20 MIPS chip due this quarter.
              TX1    TX3     Gmicro/100  Gmicro/200  Gmicro/300  MN10400  O32
MIPS
  avg         5      15      5           7           12          8        10
  peak        12.5   33      10          10          20          20       15
Clock (MHz)   25     33      20          20          20          20       33
Instructions
  basic       92     102     91          100         100         95       102
  fp                 25
  decimal                                11          11
  coproc                                 22          22
MMU           No     Yes     No          Yes         Yes         No       Yes
Cache
  I                  8 KB    1 KB        2 KB        1 KB        1 KB
  D                  8 KB                2 KB        1 KB
  stack                                  128 B
Transistors   450K   1,200K  330K        730K        900K        400K     700K
CMOS          1 µm   0.8 µm  1 µm        1 µm        1 µm        1.2 µm   0.8 µm
Samples       4Q88   4Q90    2Q89        3Q88        3Q89        3Q89
Fujitsu Gmicro/300 Several vendors produced various TRON CPU implementations. This was similar to how SPARC chips were not just manufactured, but designed by multiple companies to a common specification.

Fujitsu was one of those vendors and the Fujitsu family of TRON chips also went by the F32 moniker. Eventually Fujitsu moved its development efforts to SPARC, developing the SPARC64 line of high performance SPARC CPUs.
Gmicro/400
Gmicro/500
Toshiba TX1
Toshiba's first TRON CPU implementation. This was followed by the TX2.
Toshiba TX2
Oki Electronics O32
Oki Electronics' TRON CPU implementation(s).
ARM Arm1
Arm1 was the first ARM CPU. It never shipped in a product. Only one chip design implementing Arm1 was fabbed, and only a few hundred chips were actually produced.

The Arm1 did not include integer hardware divide or multiply (!). Comparing this chip to earlier 16-bit chips such as the 8086 is tricky because, while the 8086 had a similar transistor count, the 8086:
  • Was constrained to a 40-pin package (for cost reasons) rather than an 82-pin package.
  • Was assembly source compatible with the 8080
  • Had hardware integer multiply
Comparing a clean-sheet 1985 design with no hardware integer multiply, twice the pin count, and no hardware floating-point option (as the 8087 provided for the 8086) against a chip that shipped in 1978 with that extra functionality and those constraints is tough. The 8086 was designed without CAD tools. The Arm1 was designed with VLSI Technology's custom design tools.
  • Introduced: 1985 (evaluation systems only; VC2588 (Autumn) chip)
  • In-order, pipelined
  • Integer Pipeline: 3 stages
  • User Visible Integer Registers: 25 (16 user; 9 supervisor)
  • On-chip Caches: No
  • FPU: No
  • 6 MHz
  • 24,800 transistors
  • Die Size: ~50 mm2 (7 mm × 7 mm) @ 3.0 µm process
  • Pins: 82
ARM Arm2 This was the first ARM chip that shipped in products. A (slightly) improved revision of the Arm1, this chip added an integer hardware multiply instruction as well as some instructions to support a generic co-processor interface. The chips were produced on a smaller technology node than the Arm1 (2 µm vs 3 µm) and clocked higher (8 - 12 MHz vs 6 MHz). The British Acorn Archimedes personal computer was designed around the Arm2 CPU.
  • Introduced: 1986
  • In-order, pipelined
  • Integer Pipeline: 3 stages
  • User Visible Integer Registers: 27 (18 user; 9 supervisor)
  • On-chip Caches: No
  • FPU: No
  • 1986: 8 MHz (1 W)
    1987: 10, 12 MHz (2 W)
  • 27,000 transistors
  • Die Size: ~34 mm2 (5.8 mm × 5.8 mm) @ 2.0 µm process
  • Pins: 82
Apple A6 (Swift) This was Apple's first custom designed ARM core.

Dual core, 1.3 GHz, 32 KB + 32 KB L1, 2 MB shared L2, 32 nm
Apple A7 (Cyclone) This was Apple's first 64-bit ARM core. Apple was one of the first ARM vendors to ship a 64-bit ARMv8 CPU.

Dual core, 1.3 - 1.4 GHz, 64 KB + 64 KB L1, 1 MB shared L2, 4 MB L3 shared with the SoC, 28 nm
Apple A8 (Typhoon)
Apple A10 (Hurricane)
Apple A11 (Monsoon)
A12 (Vortex)
Apple A13 (Lightning)
Apple A14, M1 (Firestorm)
Apple A15, M2 (Avalanche)
Elbrus 2000
Elbrus 2C+
TSMC 90 nm, 0.5 GHz, 2-core
Elbrus 4C
TSMC 65 nm, 380 mm^2, 0.5 GHz, 38.4 GB/sec 3×DDR3, 8 MB L2 cache, 4-core
Elbrus 8C
TSMC 28 nm, 321 mm^2, 1.3 GHz, 8-core, 512 KB L2/core, 16 MB L3 shared
Elbrus 8SV
PowerPC 750 (IBM and Motorola)
"A 250MHz 5-W PowerPC microprocessor with on-chip L2 cache controller"
Exponential X704
In 1997 Exponential Technology released a BiCMOS CPU seemingly designed to answer the question:

Can a high frequency CPU with a tiny cache avoid being performance limited by DRAM bandwidth and latency?

The answer was, "Well, this CPU cannot."

Comparing the X704 to two contemporaneous CPUs we find this:

Attribute   Exponential x704  PowerPC 750    Pentium-II
Frequency   533 MHz           266 MHz        300 MHz
Technology  0.5 µm BiCMOS     0.25 µm CMOS   0.35 µm CMOS
Transistors 2.7M              6.35M          7.5M
Area        150 mm2           67 mm2         113 mm2
TDP         85 Watts          6-8 Watts      18 Watts
L1 I-Cache  2 KB (direct)     32 KB (8-way)  16 KB (4-way)
L1 D-Cache  2 KB (direct)     32 KB (8-way)  16 KB (4-way)
L2 Cache    32 KB             1 MB off-chip  ½ MB off-chip
L3 Cache    1 MB off-chip
Inst Issue  3-issue           ?-issue        3-issue
            Super-Scalar      Out-of-Order   Out-of-Order
SpecInt95   ~12               ~12            12.2

Intel and IBM made slightly different trade-offs when allocating transistors for on-chip caches and Out-of-order instruction depth (Intel's is deeper, though it doesn't show here) but got to roughly the same performance. Intel has a much larger chip (113 mm2 vs 67 mm2), mostly due to Intel's larger design rule (0.35 µm vs 0.25 µm is a 1.96× advantage for the smaller transistors) and a bit due to more transistors (7.5M vs 6.35M). Some of the excess transistors are no doubt due to the "x86 tax."
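The 1.96× figure above is just the linear design-rule ratio squared, since transistor area scales roughly with the square of the feature size:

```python
# Quick check of the 1.96× density figure: area advantage is the square
# of the linear design-rule ratio (0.35 µm Intel vs 0.25 µm IBM).
intel_rule_um = 0.35  # Pentium-II process
ibm_rule_um = 0.25    # PowerPC 750 process
density_advantage = (intel_rule_um / ibm_rule_um) ** 2  # ≈ 1.96
```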

Exponential designed for frequency above all else and selected BiCMOS to achieve this. The BiCMOS process required 10× the power and allowed less than ½ the transistors. The result was that the x704 did not have the transistor (or power!) budget for large set-associative L1 caches, nor the transistor budget for Out-of-order execution. In addition, the instructions lost to memory access stalls on a cache miss were 2× those of the PowerPC 750 (or Pentium-II), so the smaller and less associative caches hurt Exponential more. The result was that the 533 MHz Exponential x704 performed roughly the same as the 266 MHz PowerPC 750, but with 10× the power requirement and over 2× the die size. With the larger die size came a higher price.

The LA Times had a short writeup of Exponential's chip unveiling:
Exponential Unveils Fast Chip: A San Jose start-up company announced the speediest microprocessor yet, a chip able to run Macintosh software at up to 533 megahertz, more than twice as fast as current chips. Exponential Technology Inc. said its X-704 chip should be available in volume next spring. The company is one of several chip makers to unveil new products at the Microprocessor Forum this week in San Jose. The four-day conference marks the 25th anniversary of the microprocessor. Exponential started in 1993, with financial help from Apple Computer Inc. George Taylor, Exponential's founder and chief technology officer, says the new chips will cost about $1,000 each, which would put them into high-end computers used mostly by graphic designers and creators of multimedia. Industry analysts said Exponential's chip could give Apple's Macintosh computers a boost.
Apple released the G3 Macintoshes in 1997 with a starting price of $1,999 for a 233 MHz CPU. A $1,000 chip was not going to be designed in at that price point. Exponential was out of business by May of 1997.
A 533-MHz BiCMOS Superscalar RISC Microprocessor
Fairchild Clipper C100 Clipper was not Fairchild's first CPU — that would be the 8-bit Fairchild F8 released in 1975. Fairchild does not seem to have produced a 16-bit CPU and in the mid-1980s released the Clipper C100. The chip did not have commercial success, competing with established chips such as the 80386 and 68030 as well as RISC chips such as SPARC and MIPS.

The single large Clipper customer, Intergraph, purchased the Clipper division from Fairchild after the C100 had shipped. A few more generations of Clipper were developed before the architecture was abandoned.
Intergraph Clipper C300 Fairchild was purchased by National Semi in 1987 and the Clipper was sold to Intergraph.
The National Semiconductor Corporation has agreed to sell the rights to the Clipper microprocessor product line to the Intergraph Corporation. The sale, for what industry officials said was about $10 million, indicates that National is quickly moving to sell parts of the Fairchild Semiconductor Corporation, which it agreed to acquire from Schlumberger Ltd. last month for $122 million in stock.

National was expected to divest itself of Fairchild's microprocessor business because it already had its own. Intergraph, a Huntsville, Ala., company that makes systems for computer-aided design, uses the high-speed Clipper chip in a work station. The company is so dependent on the chip that it had considered buying a stake in Fairchild as part of a management buyout effort. Intergraph is expected to offer jobs to about 100 Fairchild employees.

— New York Times, Sept. 18, 1987
Intergraph Clipper C400
The effective demise of the Intergraph Corp Clipper RISC chip is reported by our sister publication ClieNT Server News. Intergraph turned its California-based Advanced Processor Division over to Sun Microsystems Inc on January 1 and the 70-employee unit along with general manager Howard Sachs has become part of Sparc Technology Business unit which is working on marrying the Windows NT operating system to the Sparc chip. No further Clipper development is planned.
http://bitsavers.trailing-edge.com/components/fairchild/clipper/Design_and_Implementation_Trade-offs_in_the_Clipper_C400_Architecture.pdf
DEC MicroVAX 78032 The first VAX minicomputer, the VAX 11/780, was announced in October of 1977. The 11/780 CPU (KA780) was built out of TTL logic on a number (29?) of individual boards. It was not, in any way, a microprocessor.

In 1985 DEC released the first VAX CPU implemented as a microprocessor. The 78032 was intended for the MicroVAX line of VAX computers and implemented only a subset of the full VAX architecture.

Around the same time, the V-11 CPU shrank the entire VAX instruction set to a single board (down from the many boards of the 11/780), but the V-11 required many chips for a functioning CPU.

The 78032 and V-11 were followed up by the CVAX CPU which implemented the full VAX instruction set, though the floating point required a separate FPU.
DEC CVAX 78034
DEC Rigel
DEC NVAX