AMD 29000 Family
Motorola 680x0 Family
Motorola was late to the 16-bit microprocessor party, so it decided to
arrive in style. The hybrid 16-bit/32-bit MC68000 packed in 68,000
transistors, more than double the number of Intel's 8086. Internally
it was a 32-bit processor, but a 32-bit address and/or data bus would
have made it prohibitively expensive, so the 68000 used 24-bit address
and 16-bit data lines.
— Chip Hall of Fame: Motorola MC68000 Microprocessor (IEEE Spectrum, 30 June 2017)
In 1979, the 16-bit microprocessor competition included:
Year  Vendor  Chip     Transistors  Comment
1976  TI      TMS9900  8,000
1978  Intel   8086     29,000
1979  Intel   8088     29,000       8086 with 8-bit bus
1979  Zilog   Z8000    17,500       Not Z80 compatible
Motorola decided to build a 32-bit architecture to compete in the 16-bit
microprocessor space. The first physical implementation would be
16-bit in places, but the base would be laid for a true 32-bit chip in the
near future. And all the software written for the 68000 would (in theory)
be able to run on future fully 32-bit implementations.
The 68000 was a success, but came with tradeoffs. The chip came on a 68-pin package,
which was more expensive than the 40-pin package that the Intel 8088 used. This
mattered when IBM was selecting a CPU for its IBM PC product.
Additionally, the followup 68020 added a number of addressing modes that would
create challenges for high performance implementations later on.
At the "Oral History Panel on the Development and Promotion of the Motorola 68000"
panel session held on July 23, 2007 in Austin, Texas, Tom Gunter says this when asked
what decisions could have been made differently:
Well the main thing that we did is we added complexity to the machine, especially in transitions.
As we went up the chain a little bit to 68010, 68020, we created a monster in terms of all the addressing
modes that we had. We thought that adding more addressing modes was the way you made a machine
more powerful, totally contrary to the principle of RISC later.
Motorola 88000 Family
Motorola introduced the 68000 family of CPUs in 1979, but by the mid-1980s
scaling the performance competitively against Intel's x86 family and the
newer RISC chips was becoming a problem. DEC would face a similar problem
scaling VAX performance at around the same time and DEC's solution was to
build a RISC CPU, the DEC Alpha. Motorola's solution was the same, and Motorola
produced the 88000 RISC architecture.
Unfortunately for Motorola, most of the 680x0 workstation vendors had already
switched to other RISC chips by the time Motorola delivered the first 88000
CPUs and while Apple considered the 88000 (as well as many other RISC CPUs)
and even developed an engineering prototype around the chip, Apple eventually
decided to go with an IBM POWER derivative, the PowerPC, to be developed jointly
by IBM, Apple and Motorola. This effectively ended the 88000 family.
DEC Alpha Family
DEC was extremely late to the RISC CPU game but eventually
delivered the Alpha architecture. Richard L. Sites provides
a nice writeup (in
Alpha AXP Architecture with
a copy here) discussing the architecture and how architectural
choices were made for the Alpha:
This paper discusses the architecture from a number
of points of view. It begins by making the distinction
between architecture and implementation. The paper then
states the overriding architectural goals and discusses
a number of key architectural decisions that were derived
directly from these goals. The key decisions distinguish
the Alpha AXP architecture from other architectures. The
remaining sections of the paper discuss the architecture
in more detail, from data and instruction formats through
the detailed instruction set. The paper concludes with a
discussion of the designed-in future growth of the
architecture. An Appendix explains some of the key technical
terms used in this paper. These terms are highlighted with
an asterisk in the text.
It is interesting to compare the DEC choices for Alpha as
described in "Alpha AXP Architecture" with HP's choices for
PA-RISC in "Hewlett-Packard Precision Architecture: The Processor."
MCST Elbrus-2000 Family
The Elbrus name has been used for a number of Soviet (and now Russian) computers,
not all of which have the same ISA or architecture.
Starting in the mid to late 1980s the Soviet Institute of Fine Mechanics and Computer Engineering
organization began working on a VLIW computer. This computer 'arrived' as
the 512-bit word Elbrus-3 in the early 1990s, but the collapse of the
Russian economy following the dissolution of the Soviet Union meant that
there was no customer and the computer never made it into production.
Work on the architecture continued from 1992 at the descendant company
"Moscow Center of SPARC Technologies" (though this architecture had nothing to
do with SPARC) and the Elbrus-2000 CPU was essentially an Elbrus-3 implemented
as a microprocessor.
Over time further chips were released with higher frequencies, higher core counts,
more bandwidth, etc.
Intel i860 Family
Intel has long been aware that CPUs without the x86 architecture's legacy
can be simpler and cleaner than those required to be backward
compatible with x86.
The i432 was Intel's first attempt to replace x86 that made it to market.
The i432 failed and Intel shipped the 80286 followed by the 32-bit 80386.
But the x86 architecture still had a lot of legacy bits that would be nice
to jettison.
By the mid-to-late 1980s Intel could see RISC chips such as SPARC and MIPS
providing high performance without being encumbered by x86 legacy support.
Intel could make a RISC chip too, and Intel's next attempt to get away from
x86 was the 80860, a 64-bit RISC chip whose first implementation
was superscalar and which came with an on-chip FPU. On paper, the chip looked
like a formidable competitor to the 80486 and also to the contemporaneous SPARC
and MIPS CPUs.
However, dropping the legacy x86 cruft meant that the 80860 was not backward
compatible with x86 and thus could not run existing DOS software. Intel was
(wisely) unwilling to abandon x86 and bet everything on the new 80860, and so allowed
the 80486 to compete with the 80860. The 80486 was fast enough and could run
existing software, so the 80860 saw only two generations of implementation.
Intel i960 Family
ARM Family
Acorn Computer, founded in 1978, produced a number of personal computers, primarily
sold in the United Kingdom. Initial computers were built around the 8-bit 6502 CPU,
the same CPU used in the Apple II and Commodore 64 computers.
Intel Itanium Family
Intel had created several potential replacements for x86 before 1990. All of
them failed to displace the x86, though some found niche
success.
The i432 architecture in the late 1970s failed on launch, delivering a bad combination
of expensive and slow.
In the mid-1980s Intel decided to try again, and the resulting i960 chip shipped in 1988.
Then, in 1989, with RISC architectures all the rage, Intel released its own RISC chip, the i860.
The i860 wound up competing with Intel's own 80486, and the 80486 had a huge installed
base and pushed the clock speed much higher than the i860 ever managed.
Intel did not give up on replacing the x86 architecture. In the late 1990s several
concerns from HP and Intel converged:
Hewlett-Packard was realizing the future PA-RISC development would get
more and more expensive, but the PA-RISC market was not growing. Eventually
further development costs for a niche CPU would be economically non-viable.
There was growing concern that further Out-of-Order performance scaling would
be poor. A variation of VLIW was seen as a way to get around this. HP
was exploring VLIW architectures.
Intel was unhappy about the existence of x86 clone vendors, especially AMD.
The x86 architecture was limited to 32-bit addresses and this was expected
to be a problem in the near future. A 64-bit machine would address this problem.
One solution to many of these problems was:
For Intel and HP to jointly develop a new CPU architecture. Intel would
be responsible for manufacturing the chips.
The architecture would be VLIW instead of Out-of-Order Superscalar.
The architecture would be 64-bit rather than 32-bit.
The instruction set could be protected by patents that would prevent clones.
The result was Itanium. The joint effort was announced in 1994; the first chip, Merced,
was to ship in 1999 but eventually slipped to 2001. In addition, the first chip delivered
substantially worse performance than desired.
Meanwhile, AMD had been working on a 64-bit extension to x86. AMD had no other
realistic choice if it wanted to remain in the high-end CPU business: Intel would
not license the Itanium architecture since one of the points to Itanium was to prevent
clones and a brand new 64-bit architecture from AMD would not generate enough industry
support. This 64-bit extension work was announced in 1999, the specification was
released in 2000, and the first chip implementing the 64-bit x86 architecture, the Opteron,
was released in 2003.
The combination of poor Itanium performance and good Opteron performance created
a large disincentive to Itanium adoption. For customers, x86 backward compatibility
was a desirable feature, and having multiple vendors (Intel and AMD) was also a good thing.
Without a performance advantage, Itanium offered nothing to existing x86 customers
that 64-bit x86 chips did not and Itanium only made substantial sales into the Unix
workstation and server market. Eventually, HP was the only substantial Itanium
customer and the sales volume was not large enough for Intel to want to remain in
the business. The last Itanium shipments were in 2021.
MIPS Family
From 1981 to 1984 John L. Hennessy's research group at Stanford worked
at creating a RISC (Reduced Instruction Set Computer) CPU that would
be competitive with commercial CPU offerings. An early paper on the Stanford
MIPS implementation ("MIPS: A Microprocessor Architecture," 1982) provided
the following comparison against a Motorola 68000 CPU when running the
Puzzle Benchmark:
                          Motorola 68000  MIPS
Transistor Count          65,000          25,000
Clock speed               8 MHz           8 MHz
Data path width           16 bits         16 bits
Static Instruction Count  1300            647
Static Instruction Bytes  5360            2588
Execution Time (sec)      26.5            6.5
This research led to the formation of MIPS Computer Systems and the
MIPS commercial CPUs beginning with the R2000 in 1986. The chip's
primary competition at the time of introduction was Intel's 80386
and Motorola's 68020. The R2000 chip and future MIPS chips were used
in numerous workstations, but as the workstation market consolidated,
MIPS lost customers.
In 1992 MIPS Computer Systems was acquired by SGI (Silicon Graphics). The
New York Times reported on March 13, 1992:
Silicon Graphics Inc., the leading maker of the computer work stations that
engineers, architects and movie artists use to fashion three-dimensional
images, said today that it was buying MIPS Computer Systems Inc. in a
stock swap valued at about $406.1 million.
The merger will unite two of Silicon Valley's leading electronics companies
to create an enterprise with revenues approaching $1 billion.
MIPS designs the microprocessors that serve as the brains for many of the
most powerful desktop computers, including the work stations made by
Silicon Graphics. Once hailed as the next Intel Corporation for its
advanced designs, MIPS has suffered from inconsistent profits and
employee defections, including the departure last month of the company's
president, Charles M. Boesenberg.
Customers have been departing as well. Prime Computer and Groupe Bull of
France recently stopped making computers based on MIPS chips. And Digital
Equipment, MIPS's largest customer, has said it will begin producing a
competing microprocessor of its own.
Silicon Graphics is so dependent on the MIPS microprocessor that it has
already gone so far as to develop its own version of the chip, and was
reported to be in talks with Toshiba of Japan about manufacturing it.
...
The close relationship between Silicon Graphics and MIPS dates to the
early 1980's, when the founders of both companies were all professors
at Stanford University. Silicon Graphics was the first customer for
MIPS's chip designs.
In addition to SGI workstations, MIPS chips saw some success in the
game console market (the Sony PlayStation 2 and Nintendo 64 were built on MIPS CPUs)
and in embedded markets.
At the end of the 1990s SGI moved away from the MIPS architecture in favor
of Intel's Itanium, which ended the MIPS architecture's future as a high-end
workstation and server chip. SGI went bankrupt in 2009, and the MIPS
business, by then almost entirely focused on embedded markets, was sold off.
Except ... that China had picked up the MIPS architecture with the Loongson
family of chips, beginning with the Godson-1 in 2002. Loongson is still being
developed today for Chinese internal use.
NEC V Family
National Semiconductor 32000 Family
National Semiconductor had produced microprocessors since the early 1970s.
The multi-chip 16-bit IMP-16 came out in 1973. In 1974 National
released a single-chip implementation, the PACE (and in later years the
INS8900). Neither sold well and the major 16-bit microprocessors in the
late 1970s were the TMS9900, 8086/8088 and Z8000.
National was not ready to give up and in 1982 produced the 32016. The 32016
was not backward compatible with the earlier IMP-16 and PACE chips, though
with little installed base for those chips this was not important.
Unfortunately for National, the 68000 had been out for a few years before the
32016 and the 32016 sold poorly. Things did not improve with later chips
in the family and National discontinued the line after the 32532.
As the 680x0 family petered out in the early 1990s, Motorola simplified the
680x0 family into the Coldfire family of chips and targeted the embedded market
with the Coldfire. National did something similar with the 32000 family
and the result was the Swordfish family of CPU/DSP chips. Showing some awareness
of the CPU environment at the time, National advertised these chips as RISC and
the Swordfish chips turned the 32000 instructions into internal VLIW instructions
(much like Intel eventually turned x86 instructions into something much closer
to RISC internal instructions).
HP PA-RISC Family
HP delivered its first minicomputer, the HP 2100, in 1966. Initially intended
to support HP instruments, the company discovered that businesses were purchasing
the computers for ordinary business applications and HP found itself
to be a general minicomputer vendor.
Over time, HP found itself with a number of incompatible computer architectures
(much like IBM before the System/360 project and much like DEC before VAX) and
HP decided to consolidate them into a single, new architecture. This was
PA-RISC.
IBM POWER Family
Sun SPARC Family
SPARC Microprocessor Oral History Panel, Session One: Origin and Evolution
Inmos Transputer Family
Japan Tron Family
DEC VAX Family
DEC (Digital Equipment Corporation) introduced its 32-bit VAX minicomputer
in 1977. The first member of the computer family, the VAX 11/780, had
a CPU consisting of 21 distinct cards (and so was not a microprocessor).
In 1985 DEC introduced the first microprocessor implementation of the VAX
architecture with the MicroVAX 78032. Eventually even the mainstream
VAX minicomputers were built with microprocessor CPUs.
VAX performance failed to scale competitively against RISC CPUs and even
against x86 CPUs and DEC eventually replaced the VAX architecture with
DEC's homegrown Alpha RISC architecture.
Western Electric 32000 Family
Fairchild Clipper Family
Clipper was not Fairchild's first CPU — that would be the 8-bit Fairchild F8
released in 1975. Fairchild does not seem to have produced a 16-bit CPU and in
the mid-1980s released the Clipper C100. The chip was not a commercial success,
competing as it did with established chips such as the 80386 and 68030 as well as with RISC
chips such as SPARC and MIPS.
The single large Clipper customer, Intergraph, purchased the Clipper division from
Fairchild after the C100 had shipped. A few more generations of Clipper were developed
before the architecture was abandoned.
Intel 80x86 CPUs
The Intel 80386 was a 32-bit microprocessor backward compatible with
the 16-bit 80286 (introduced in 1982). The 80286 was backward compatible
with the 16-bit 8086 chip (introduced in 1978). And the 8086 was
assembly source compatible with the earlier 8-bit 8080, which meant
that 8080 assembly programs could be re-assembled to run on the 8086
even though 8080 object code would not run on the 8086.
The 8080 was introduced in 1974, so by the mid-1980s Intel was building
a 32-bit microprocessor that had binary compatibility with 16-bit CPUs
from 1978 and assembly compatibility with one of the earliest 8-bit processors
ever.
Some of the weirdness and limitations can be explained by this lineage.
Through the last years of the 1980s the 80386 and its descendant chip the
80486 were competing — to the extent that they were competing with any other
chips — with the Motorola 680x0 family of chips. This competition manifested
itself as IBM PCs running DOS vs Apple Macintosh computers.
By 1990 the 680x0 family had failed to scale performance competitively with
the x86 family and in 1991 Apple announced that it was switching the Macintosh
computer line away from the 680x0 to the (new-ish) PowerPC architecture.
Also, by 1990 RISC CPUs had appeared — PowerPC was one such RISC —
and Intel had to worry about RISC CPUs such as MIPS and SPARC as well as PowerPC
competing for x86 chip sales.
The Intel Pentium Pro, introduced in 1995, showed that Intel could ship an x86
compatible CPU that was integer performance competitive with the best RISC chips
available. The Pentium Pro pretty much ended the hopes of RISC partisans that
the x86 would be replaced with a cleaner/prettier architecture: the Intel x86
performance parity combined with lower prices and backward compatibility to
all of the DOS and Windows codebase made switching to RISC not viable.
Intel still had competition, however, because AMD was shipping x86-compatible
CPUs of its own. In 1999 AMD shipped its Athlon line of chips, which was more than
competitive with Intel's then-current Pentium III family of chips.
AMD 80x86 CPUs
Intel had a second source agreement with AMD which allowed AMD
to manufacture 8086, 8088 and 80286 Intel CPU designs. Beginning
with the 80386, Intel refused to deliver chip designs to AMD.
AMD eventually reverse engineered the Intel chip and the Am386 was
the AMD reverse engineered version of the 80386. Because the
reverse engineering was at the transistor level, the chips behaved
identically. AMD did the same thing to reverse engineer the 80486.
TODO ...
Other 80x86 CPUs
Motorola 68000
The Motorola 68000 was one of the first commercially available 32-bit
microprocessors. Logically it was a 32-bit machine (32-bit integer registers,
32-bit addresses) though physically it was not. Only 24 bits of physical
addressing were provided, so no more than 16 MB of physical DRAM could be addressed.
In 1979 this was not a practical limit — the IBM 3033 mainframe of the time limited users
to no more than 8 MB of physical DRAM (and most 64-bit processors don't support 64 bits'
worth of physical DRAM). More important to performance, the data bus to DRAM was only
16 bits wide (which kept costs down, though 32-bit loads would take two cycles) and the
ALU was physically only 16 bits wide. Motorola would sometimes describe it as a 16/32-bit
CPU.
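The 24-bit limit is simple arithmetic, and can be sketched in a few lines of Go (the masking constant models the address lines the 68000 package simply does not bring out; the specific example address is made up):

```go
package main

import "fmt"

func main() {
	// The 68000 drives only 24 address lines, so the usable
	// physical address space is 2^24 bytes = 16 MiB.
	space := 1 << 24
	fmt.Println(space/(1024*1024), "MiB") // 16 MiB

	// A 32-bit logical address is truncated to 24 bits on the bus,
	// so addresses that differ only in the top byte alias each other.
	logical := uint32(0xFF00_1234)
	physical := logical & 0x00FF_FFFF
	fmt.Printf("%#x maps to bus address %#x\n", logical, physical)
}
```

This top-byte aliasing is why some early Macintosh software that stashed flags in the unused upper 8 bits of pointers broke on later, fully 32-bit 680x0 chips.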
The 68000 was the CPU selected for many inexpensive Unix workstations including the
Sun-1 (1982), the HP Series 200 workstations, the Apollo DN100 and
the SGI Iris 1000 series workstations. The first Apple Macintosh computer
used a 68000 as did the Commodore Amiga.
Introduced: 1979 (shipments in late 1980)
In-order
Integer Pipeline: 3 stages
User Visible Integer Registers: 16
On-chip Caches: No
FPU: No
1979: 4, 6, 8 MHz
1981: 10 MHz
1982: 12.5 MHz
68,000 transistors
Die Size: ~50 mm2 (6.24 by 7.14 mm) @ 3.5 µm process
Motorola 68010
The 68010 was a slight tweak to the 68000; its main change was support for recovering
from bus faults, which made demand-paged virtual memory possible. It mostly appeared
the same to programmers and was not popular.
Introduced: 1982
Motorola 68020
The 68020 was a fully 32-bit CPU: 32-bit addresses (both logical and physical) and 32-bit
data paths. The ALU was 32 bits wide and thus twice as fast per clock as the 68000's.
The Macintosh II family used the 68020, as did the Sun-3 family of workstations.
The 68020 introduced more addressing modes for instructions (thus, in some sense,
making the chip even more CISC-y than the original 68000). The more complicated
addressing modes are often blamed for the 680x0 family's failure to scale its clock
speed as well as the 80x86 family did, and also for making a super-scalar implementation
of the 680x0 family (needed to compete with the Intel Pentium) excessively challenging.
By 1994 Intel had pushed the clock speed of the 2-issue super-scalar Pentium to 120
MHz. In 1994, when Motorola released the first 2-issue super-scalar 680x0 CPU —
the 68060 — it had a chip that ran at 50 MHz and never exceeded 75 MHz.
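To see why the 68020's addressing modes were hard on implementers, consider its memory-indirect mode, in which a single operand requires two dependent memory reads plus address arithmetic. The Go sketch below is illustrative only (the function name, word-addressed memory model, and values are assumptions, not Motorola's definitions):

```go
package main

import "fmt"

// memoryIndirect sketches what a 68020 memory-indirect operand like
// ([bd,An],Xn*scale,od) asks the hardware to do: add a base displacement
// to a register, READ a pointer from memory, add a scaled index and an
// outer displacement, then READ again. Two dependent loads for one
// operand is painful for a pipelined or super-scalar design.
func memoryIndirect(mem []uint32, an, bd, xn, scale, od uint32) uint32 {
	intermediate := mem[(an+bd)/4]            // first memory access: fetch pointer
	return mem[(intermediate+xn*scale+od)/4]  // second access: fetch the operand
}

func main() {
	mem := make([]uint32, 64)
	mem[2] = 32              // pointer stored at byte address 8
	mem[(32+2*4+0)/4] = 7    // operand at the doubly-computed address
	fmt.Println(memoryIndirect(mem, 0, 8, 2, 4, 0)) // 7
}
```

A simple RISC load, by contrast, is one register add and one memory access, which is far easier to pipeline at high clock rates.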
Motorola 68030
Basically a 68020 with an on-chip memory manager and a faster clock rate.
The Macintosh II family used the 68030, as did the Sun-3 family of workstations
(both also used the 68020).
From the New York Times, Sept. 19, 1986:
Motorola Inc. stepped up its increasingly heated race with the Intel
Corporation yesterday, introducing a new generation of microprocessors
that it said could put the power of a mainframe computer on a single
chip. The new chip, called the Motorola 68030, comes only two and a half
years after the introduction of the company's current top-of-the-line
microprocessor, the 68020.
Motorola said the new chip would offer nearly twice the performance of
its predecessor but would not be delivered until March 1987.
Motorola's new entry comes just a week after Intel's top-of-the line
microprocessor was incorporated for the first time in an I.B.M.-compatible
personal computer, made by the Compaq Computer Corporation.
"Our effort is to maintain a performance advantage of 300 to 400 percent
over the competition," said Murray Goldman, a Motorola senior vice president.
The Intel 80386 adequately matched up against the 68030 (transistor count,
frequency, external bus widths, both chips had 3-stage in-order integer
pipelines), though the 68030 did have the advantage of a pair of small
on-chip caches. This was not good for Motorola given the 80386's installed
base advantage with DOS.
Motorola 68040
The 68040 was a new 680x0 micro-architecture and was fully pipelined and
thus could, at peak, sustain one instruction per clock cycle. Apple
used the 68040 in the Macintosh Quadra. Sun had introduced the
SPARC based Sun-4 by this time and never released a 68040 based
computer. The NeXTstation shipped in 1990 with a 68040.
The 68040 matched up against the Intel 80486 in the Mac vs PC competition
and while the two chips were evenly matched in 1990, the 80486 had been
pushed to 66 MHz (with a 33 MHz bus) by 1992 while the 68040 never exceeded
40 MHz.
Introduced: 1990
Pipelined, In-order
Integer Pipeline: 6 stages
User Visible Integer Registers: 16
On-chip Caches: 4 KB I-cache; 4 KB D-cache
FPU: On-chip
25 MHz - 40 MHz
~1.2M transistors
Wikipedia: Motorola 68040
NXP: User Manual
Motorola 68060
This was the final Motorola 68K microprocessor. Apple, the sole remaining large-volume desktop
computer maker using the 68K family, replaced the 68K with PowerPC rather than move to the 68060.
Microcontroller applications using 68K CPUs slowly migrated to the
Coldfire family.
Coldfire was basically a simplified 68K (e.g. remove BCD instructions, reduce the
supported addressing modes).
Interestingly, a few decades after the 68060 an out-of-order 68080
called the Apollo 68080 was produced by a single individual.
AT&T Bellmac 32A
This chip was supposed to come out in 1980 as the Bellmac 32, but
the frequency was an unacceptable 2 MHz. The 32A revision had
acceptable performance but arrived two years later.
18 addressing modes and 169 'basic instructions'.
Western Electric WE32100
From the AT&T WE® 32-Bit Microprocessors and Peripherals Data Book (1987):
The WE 32100 Microprocessor (CPU) is a high-
performance, single-chip, 32-bit central processing
unit designed for efficient operation in a high-level
language environment. It performs all the system
address generation, control, memory access, and
processing functions required in a 32-bit
microcomputer system. It has separate 32-bit
address and data buses. System memory is
addressed over the 32-bit address bus by using
either physical or virtual addresses. Data is read
or written over the 32-bit bidirectional data bus in
word (32-bit), halfword (16-bit), or byte (8-bit)
widths. Extensive addressing modes result in a
symmetric, versatile, and powerful instruction set.
The WE 32100 Microprocessor is available in 10-,
14-, and 18-MHz versions...
Features:
Powerful and versatile instruction set
On-chip instruction cache
4 Gbytes (2^32 bytes) of direct memory addressing
Physical and virtual addressing
Coprocessor interface
DMA and multiprocessor environment support
15 levels of interrupts
Autovector and nonmaskable interrupt facilities
Four levels of execution privilege: kernel, executive, supervisor, and user
Synchronous or asynchronous interfacing to external devices
Nine 32-bit general-purpose registers
Seven 32-bit special-purpose registers
Memory-mapped I/O
Complete floating-point support via the WE 32106 Math Acceleration Unit
SAN FRANCISCO — AT&T's Bell Laboratories recently announced a
second-generation, 32-bit microprocessor said to take advantage of
the Unix operating system and planned to be used in a series of
AT&T 3B computers.
The chip, called the WE 32100, is the successor to the WE 32000,
which was formerly known as the Bellmac-32A. It reportedly contains
180,000 transistors, about 30,000 more than the earlier chip, and uses
Cmos technology.
"We have integrated many support functions onto the chip," said
Hing C. So, head of the Very Large Scale Integration Department at Bell
Labs. "We have also added features such as an internal cache memory
and have increased the speed of the chip."
The cache memory reportedly can hold 64 32-bit words, and repeatedly
used program instructions can be stored there so there is no waiting
time, a spokesman said.
The WE 32100 reportedly operates at speeds up to 14 MHz, is packaged
in a 132-pin package and dissipates approximately 1.9W in worst-case
conditions.
Western Electric WE32200
From the AT&T WE® 32-Bit Microprocessors and Peripherals Data Book (1987):
The WE 32200 Microprocessor (CPU) is a high-performance, single-chip, 32-bit central
processing unit designed for efficient operation
in a high-level language environment. It is
protocol and upward object code compatible
with the WE 32100 Microprocessor. The WE
32100 CPU runs object code without
modification on the WE 32200 CPU. The WE
32200 CPU performs all the system address
generation, control, memory access, and
processing functions required in a 32-bit
microcomputer system. It has separate 32-bit
address and data buses. System memory is
addressed over the 32-bit address bus by using
either physical or virtual addresses. Data is
read or written over the 32-bit bidirectional data
bus in word (32-bit), halfword (16-bit), or byte
(8-bit) widths, using arbitrary byte alignment for
data and instructions. Dynamic bus sizing
allows the WE 32200 CPU to communicate with
both 16-bit and 32-bit memories in the same
system. Extensive addressing modes result in a
symmetric, versatile, and powerful instruction
set. The WE 32200 Microprocessor is available
in 24-MHz and higher frequency ...
Features:
32-bit virtual memory microprocessor with 4 Gbytes
(2^32 bytes) virtual memory space and up to 4
Gbytes physical addressing space
Efficient execution of high-level language programs
Extensive and orthogonal instruction set with 25 addressing modes
Arbitrary byte alignment for data and instructions
Direct support for process-oriented operating systems, such as UNIX System V
WE 32100 Microprocessor object code and protocol compatible
Four levels of execution privilege: kernel, executive, supervisor, and user
Fifteen levels of interrupt
High-performance, on-chip, 64 x 32 bit instruction cache
Byte replication on writes
Dynamic bus sizing for 16-bit and 32-bit data and instructions
Seventeen 32-bit general-purpose registers
Eight 32-bit general-purpose privileged registers
Seven 32-bit special-purpose registers
Memory-mapped I/O
Complete ANSI/IEEE Standard 754 floating-point support via the WE 32206 MAU Coprocessor
Development system support signals
General-purpose coprocessor interface
Synchronous or asynchronous interfacing to external devices
High priority two-wire bus arbitration through relinquish and retry
From the Dec 2, 1986 New York Times:
The American Telephone and Telegraph Company introduced a family
of computer chips yesterday that includes a faster microprocessor
than any yet brought to market.
Industry analysts said that, despite the chip's speed, its prospects
were somewhat limited because A.T.&T. lacked relationships with computer
makers that would be likely to incorporate the chip into their computer designs.
The WE 32200 microprocessor and related chips would be used to make
top-of-the-line microcomputer systems that could handle several users
doing several jobs at once, A.T.&T. said. A microprocessor is the brain
inside personal computers and engineering work stations. The chips are
taking work away from minicomputers and mainframes because they do some
of the same jobs far more cheaply.
The WE 32200 will cost $500 each in quantities of 100 and is scheduled to
be available in production quantities in the second half of 1987, A.T.&T.
said. The chip will run at speeds starting at 24 million cycles a second
and will be available at 30 million cycles a second by the end of 1987,
the company said.
National Semiconductor Corporation (NSC) 32032
National Semiconductor Corporation (NSC) 32332
This shipped in 1985 and lost out to the 68020 and 80386.
National Semiconductor Corporation (NSC) 32532
By the time the 32532 shipped, in 1987, it had lost in the market.
The NSC 32K family evolved into the Swordfish CPU family.
From the May-1991 32532 data manual:
General Description
The NS32532 is a high-performance 32-bit microprocessor
in the Series 32000 family. It is software compatible with
the previous microprocessors in the family but with a greatly
enhanced internal implementation.
The high-performance specifications are the result of a four-stage
instruction pipeline, on-chip instruction and data caches,
on-chip memory management unit and a significantly increased clock
frequency. In addition, the system interface provides optimal
support for applications spanning a wide range, from low-cost,
real-time controllers to highly sophisticated, general purpose
multiprocessor systems.
The NS32532 integrates more than 370,000 transistors fabricated
in a 1.25 µm double-metal CMOS technology. The
advanced technology and mainframe-like design of the device
enable it to achieve more than 10 times the throughput
of the NS32032 in typical applications.
In addition to generally improved performance, the
NS32532 offers much faster interrupt service and task
switching for real-time applications.
Features
Software compatible with the Series 32000 family
32-bit architecture and implementation
4-GByte uniform addressing space
On-chip memory management unit with 64-entry translation look-aside buffer
4-Stage instruction pipeline
512-Byte on-chip instruction cache
1024-Byte on-chip data cache
High-performance bus
Separate 32-bit address and data lines
Burst mode memory accessing
Dynamic bus sizing
Extensive multiprocessing support
Floating-point support via the NS32381 or NS32580
1.25 µm double-metal CMOS technology
175-pin PGA package
370,000 transistors
Die Size: 11.5 mm x 14 mm
Transputer T414 & 425
Transputers were unusual CPUs because they were expected
to run in (inexpensive) clusters. Each transputer came with 4 serial
links that could communicate with 4 other transputers. In addition,
the chips came with a simple scheduler as part of the chip. The upshot
was that transputers were expected to be programmed as highly concurrent
systems: code could run on a single transputer, with the scheduler
arranging for the concurrent portions to share the chip, and the same
code was intended to also run on a cluster of transputers connected via
the serial links. Greater performance was to be achieved by linking
larger clusters of transputers together.
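The programming model can be sketched with ordinary threads and blocking channels. This is a loose analogy only: Python threads and a queue stand in for occam processes and a transputer link, and all names here are invented.

```python
import threading
import queue

# A transputer "link" behaves like a blocking channel. queue.Queue(maxsize=1)
# is a loose stand-in; real occam channels rendezvous with no buffering.
link = queue.Queue(maxsize=1)

def producer():
    # One "process": computes values and sends them over the link.
    for n in range(5):
        link.put(n * n)
    link.put(None)  # end-of-stream marker (a convention, not occam)

def consumer(results):
    # A second "process": receives values until the stream ends.
    while (item := link.get()) is not None:
        results.append(item)

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 1, 4, 9, 16]
```

The same two functions could, in principle, run on two different transputers with the link realized by a serial wire; that is the portability the design was aiming for.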
Transputer T800
The T800 increased the on-chip RAM of the T414 from 2 KB to 4 KB
and added an on-chip floating point unit (as Intel added a floating
point unit to the 80486 in 1989).
Introduced: 1987
in-order stack based CPU
Full 32-bit CPU
20, 25 MHz
4 KB on-chip RAM (not cache)
On chip FPU
Transputer T9000
The T9000 added a cache and was intended to be 10×
faster than the T800. It missed its performance target
and was eventually cancelled. I am unclear if it was
ever commercially released.
Introduced (?): 1993
Pipelined superscalar CPU
Full 32-bit CPU
50 MHz
16 KB cache
On chip FPU
Intel 80386
The 80386 was not Intel's first 32-bit CPU — that honor belongs to the i432.
The 80386 was, however, a 32-bit extension of the 8086/8088/80286 line of 16-bit
CPUs. Because the 80386 was backward compatible with those CPUs, and because
those CPUs were used in the IBM PC and PC clones, the 80386 came with a huge
(for the time) built-in installed base.
In addition to having a built-in installed base, Intel also benefitted from
unilaterally ending the 2nd source agreement that was in place with AMD for
earlier x86 CPUs. This meant that Intel was the sole supplier of 80386 chips until
AMD successfully reverse engineered the part (around 1991 with the Am386).
Being the sole source was financially very valuable for Intel.
Introduced: 1985
First 32-bit x86 CPU
In-order
Integer Pipeline: 3 stages
General Registers: 8 (EAX, EBX, ECX, EDX, EBP, ESP, ESI, and EDI)
Intel 80486
The pipelined 80486 enabled Intel to compete with the integer performance of the lower cost
RISC chips of the time: R3000 and various SPARC chips. It was also competitive against
the Motorola 68040 (released in 1990). The Intel 32-bit x86 architecture was uglier
than 68K, MIPS or SPARC but the integer performance was competitive, the price
was lower than the RISC chips and it was backward compatible with existing DOS software.
This was enough for Intel to maintain their IBM PC and clone dominance.
Intel Pentium
x86 clone vendors such as AMD, Cyrix, Rise, Centaur began naming their chips
after the Intel chips they were intending to match. Intel discovered that numbers
(e.g. 80486) could not be trademarked. Cyrix, for example, produced
the Cyrix Cx486SLC in 1992 with the '486' signalling that the chip was intended
to be used where Intel 80486 chips would be used.
To make this chip-to-chip matching more difficult Intel didn't give the
successor to the 80486 the obvious 80586 name but instead called it a 'Pentium'.
This was only partially successful as everyone knew that '586' was the logical
successor to 486 and so Cyrix eventually released the Cyrix 5x86 and Cyrix 6x86.
Which Intel chips they were supposed to compete with was obvious. NexGen produced
the Nx586. AMD produced the K5 and then the K6.
The 80486 showed that a CISC chip could be pipelined and execute one instruction
per clock cycle, just like the RISC chips. Pentium showed that a CISC chip could
also be super-scalar by executing multiple instructions per clock cycle. The
680x0 family was no longer a competitor as the workstation vendors had all shifted
to RISC chips and Apple was switching to PowerPC. Pentium allowed Intel to continue
to match up competitively with RISC.
It is interesting to note that the Pentium die size of almost 300 mm2
is about 3× that of the 80386. Intel may have had little choice if it wanted
to remain performance competitive, but also Intel began producing chips on 200
mm wafers in 1992 so the cost of ~300 mm2 Pentium chips compared to
~100 mm2 80386 chips manufactured on 150 mm wafers might be less bad
than it appears.
Intel Pentium Pro
With the Pentium Pro Intel demonstrated that it could compete with ALL
the microprocessor competition. The Pentium Pro briefly held the integer
performance crown, even against workstation RISC CPUs that sold in much
more expensive computers.
Vendor
Chip
Speed
SpecInt 95
Intel
Pentium Pro
200 MHz
8.09
DEC
Alpha 21164
300 MHz
7.43
DEC released a 350 MHz Alpha 21164 to retake the performance crown in early
1996, but the Pentium Pro had effectively put an end to the dream
that the (more elegant) RISC CPUs would maintain and grow a performance lead
over the (ugly) x86 CISC architecture CPUs. By mid-2000 the top integer
performance chips by architecture were:
Vendor
Chip
Speed
SpecInt 95
DEC
Alpha 21164
833 MHz
50.0
Intel
Pentium III
1000 MHz
46.8
AMD
Athlon
1000 MHz
42.9
HP
PA RISC 8600
552 MHz
42.6
The Pentium Pro was expensive for an x86 CPU because the large L2 cache
was in a second chip. Intel corrected this with the cheaper
Pentium II, which shipped two years later.
Introduced: 1995
Super-scalar (3-issue), Pipelined, Out-of-order
Integer Pipeline: 10-14 stages (up from five for the Pentium)
Intel Pentium II
The Pentium II was a slightly upgraded Pentium Pro. The upgrades included:
Slower but less expensive off-chip L2 caches
A larger L1 cache (16+16 KB vs 8+8 KB for Pentium Pro)
Better 16-bit performance (important for lots of existing code)
Additionally, the Pentium II clocked faster than the Pentium Pro.
Introduced: 1997
Pipelined, super-scalar (3-issue) out-of-order CPU
7.5M transistors
16+16 KB L1 D+I cache
512 KB L2 cache
233 MHz - 450 MHz
Intel Pentium III
The Pentium III could easily have been just another Pentium II.
Intel called it Pentium III mostly for branding reasons.
Introduced: 1999
Pipelined, super-scalar (3-issue) out-of-order CPU
16+16 KB L1 D+I cache
256 or 512 KB L2 cache
450 MHz - 1,000 MHz (a 1,133 MHz part was recalled; later parts eventually reached 1,400 MHz)
SSE vector instructions
Intel Pentium 4
Pentium 4 was a new x86 micro-architecture, NetBurst, rather than an extension
of the Pentium Pro. Pentium 4 was designed to eventually hit 10 GHz
(it didn't!) and as a result of this the chip was very good at running
branchless code and code that could fit inside its small 8 KB L1 data
cache. For branchy code, including many legacy applications, the Pentium
4 was only competitive with contemporaneous AMD chips if the Pentium 4 had a substantial
clock speed advantage.
To achieve the high frequency required for acceptable performance the
Pentium 4 had many more pipeline stages than previous Intel CPUs. Initially
20 stages (in Willamette and Northwood), then 31 stages (in Prescott and Cedar Mill).
Rather than an L1 instruction cache, the Pentium 4 had a Trace Cache. This
Trace Cache seems to be a hardware implementation of the traces constructed
by the compiler for Multiflow's VLIW Trace computer systems.
Introduced: 2000
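A trace cache can be modeled loosely as a cache of already-decoded micro-op sequences, keyed by a starting address plus the predicted branch directions, so that a predicted-taken branch does not end the fetch block. The sketch below is a toy model with invented names; it does not reflect the actual NetBurst implementation.

```python
# Toy model of a trace cache: decoded micro-op sequences are cached keyed
# by (start address, predicted branch path), so a predicted-taken branch
# does not terminate the block the way it terminates an I-cache fetch.

def decode(pc, program):
    """Pretend decoder: the toy 'program' maps pc -> micro-op count."""
    return [f"uop_{pc}_{i}" for i in range(program[pc])]

trace_cache = {}

def fetch_trace(start_pc, branch_path, program, step):
    """Return the micro-ops for a trace, building and caching it on a miss."""
    key = (start_pc, branch_path)
    if key not in trace_cache:
        uops, pc = [], start_pc
        for taken in branch_path:
            uops += decode(pc, program)
            pc = step(pc, taken)  # follow the predicted direction
        trace_cache[key] = uops
    return trace_cache[key]

program = {0: 2, 4: 1, 8: 3}                       # pc -> micro-op count
step = lambda pc, taken: pc + (8 if taken else 4)  # toy next-pc rule
trace = fetch_trace(0, (False, True), program, step)
print(len(trace))  # 3: two uops from the block at 0, one from the block at 4
```

On a repeat visit with the same predicted path, the decoded trace is returned without re-decoding, which is the point: decode bandwidth stops being the bottleneck as long as predictions hold.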
AMD Am386
The Am386 was a transistor-for-transistor reverse engineered version
of the Intel 80386. Prior to the 80386 Intel and AMD had a cross-licensing
agreement (insisted on by customers to mitigate the risk of a fab
just 'losing' the ability to successfully manufacture chips). With the
80386 Intel refused to provide AMD with the necessary information to
manufacture the chip. In 1987 AMD filed a lawsuit against Intel. A year
later AMD began reverse engineering Intel's 80386 and in 1991 AMD was able
to ship reverse engineered 80386 chips.
Because of how AMD reverse engineered the chip it is an exact copy of Intel's
80386, but AMD was able to clock their 80386 up to 40 MHz
while the Intel 80386 topped out at 33 MHz.
Introduced: 1991
Up to 40 MHz (vs Intel 80386 33 MHz)
AMD Am486
As with the Am386, the Am486 was a transistor-for-transistor reverse engineered version
of an Intel CPU, in this case the Intel 80486.
AMD shipped their Am486 in 1993, four years after Intel's 80486.
Introduced: 1993
25 MHz - 120 MHz (vs Intel 80486 20 - 100 MHz)
AMD Am5x86
In spite of the name, the AMD Am5x86 was essentially an upclocked Am486
with more cache.
Introduced: 1995
133 MHz max frequency
AMD K5
AMD K6
AMD K6-2
AMD Athlon
An out-of-order, three-way superscalar x86 microprocessor
with a 15-stage pipeline, organized to allow 600-MHz operation,
can fetch, decode, and retire up to three x86 instructions per
cycle to independent integer and floating-point schedulers.
The schedulers can simultaneously dispatch up to nine operations
to seven integer and three floating-point execution resources.
The cache subsystem and memory interface minimize effective memory
latency and provide high bandwidth data transfers to and from these
execution resources. The processor contains separate instruction and
data caches, each 64 KB and two-way set associative. The data
cache is banked and supports concurrent access by two loads or
stores, each up to 64-b in length. The processor contains logic
to directly control an external L2 cache. The L2 data interface
is 64-b wide and supports bit rates up to 2/3 the processor
clock rate. The system interface consists of a separate 64-b
data bus.
The die, shown in Fig. 1, is 1.84 cm² and contains 22
million transistors. Table I shows the technology features. C4
solder-bump flip-chip technology is used to assemble the die
into a ceramic 575-pin ball grid array (BGA). Measurements
are from initial silicon evaluation unless otherwise stated.
— A Seventh-Generation x86 Microprocessor (IEEE Journal of Solid-State Circuits, Vol. 34, No. 11, November 1999)
AMD Athlon64
AMD K8 (Hammer)
AMD K10 (Barcelona)
AMD Bulldozer
AMD Piledriver
AMD Steamroller
AMD Excavator
AMD Zen
AMD Zen2
AMD Zen3
AMD Zen4
Chips and Technologies Super386
Cyrix Cx486
NexGen Nx586
Cyrix Cx586
Cyrix Cx686
Rise mP6
Transmeta Crusoe
Transmeta was a company that built an x86 compatible CPU using a VLIW
processor as the base hardware with software to translate x86 instructions
to instructions for the native VLIW. Because external software
never sees the VLIW, Transmeta could change the underlying hardware between
generations as long as the new hardware also shipped with new translation
software.
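The translate-once, cache-and-reuse idea can be sketched as follows. This is a toy model with an invented guest "ISA" and invented names; Transmeta's actual Code Morphing Software was far more sophisticated (profiling, retranslation, speculation).

```python
# Toy model of translate-and-cache execution. The guest "ISA" here is
# invented; Python closures stand in for native VLIW code.
GUEST = {0: ("mov", "a", 7), 1: ("add", "a", 5), 2: ("halt",)}

translations = {}  # guest pc -> list of "native" operations

def translate(pc):
    """Translate one guest instruction into native ops (closures here)."""
    opcode, *operands = GUEST[pc]
    if opcode == "mov":
        reg, val = operands
        def mov(regs): regs[reg] = val
        return [mov]
    if opcode == "add":
        reg, val = operands
        def add(regs): regs[reg] = regs[reg] + val
        return [add]
    return []

def run():
    regs, pc = {}, 0
    while GUEST[pc][0] != "halt":
        if pc not in translations:  # translate once, reuse on re-execution
            translations[pc] = translate(pc)
        for native_op in translations[pc]:
            native_op(regs)
        pc += 1
    return regs

print(run())  # {'a': 12}
```

Because the guest only ever sees the effects on `regs`, the "native" side can be swapped out freely, which is the property that let Transmeta change the underlying VLIW between generations.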
Transmeta targeted the low power x86 market (e.g. tablets and power optimized
laptops). It eventually failed when, in 2003, Intel began shipping
Pentium M x86 chips explicitly targeting low power.
MIPS R2000
MIPS produced the R2000, their first CPU, in 1986. Unlike Sun with
SPARC chips and SPARC workstations, MIPS produced chips but not products
that used those chips.
Introduced: 1986
110,000 transistors
80 mm2
8.3, 12.5 and 15 MHz
Scalar
MIPS R3000
This chip was used by a number of vendors including DEC (for the DECstation
line ... which at the time was competing with the DEC VAXstation line and
would soon also be competing with the AXP line of products built around the
DEC Alpha chip) and Silicon Graphics.
Introduced: 1988
115,000 transistors
48 mm2
20 - 33.3 MHz
Scalar
MIPS R4000
With the R4000, MIPS extended the MIPS architecture to 64 bits and
lengthened the instruction pipeline to eight stages, which allowed higher
clock speeds.
Introduced: 1991
1.2 million transistors
8 KB I-cache, 8 KB D-Cache
100 MHz
On-chip IEEE754 FPU
MIPS R8000
MIPS R10000
The MIPS R10000 came out in 1996. SGI's year of highest
revenue was fiscal 1997: $3.7B (as a comparison point, Sun Microsystems had
$8.6B in revenue that year. By 2000 Sun revenue had grown to
$15.7B).
In 1998 SGI decided that it could no longer compete with x86 and
decided that newer machines would be based on Intel's (expected to be
available in 1999) Itanium chips. CNet author Michael Kanellos contributed
this on April 9, 1998:
Silicon Graphics scraps MIPS plans
Silicon Graphics (SGI) has quietly scrapped ambitious plans for its MIPS
processors and is now following a much more limited road map calling for
fewer design improvements, according to sources close to SGI.
The last major MIPS release for servers and workstations is
scheduled for 1999, sources said.
MIPS roadmap
Model
Speed
Status
R10000
250 MHz
available now
R12000
300 MHz
due mid 1998
R14000
400 MHz
due 2H 1999
"Beast"
N/A
canceled
"Capitan"
N/A
canceled
SGI will continue to boost processor speeds and incorporate the chip in
its high-end computers, but will likely start phasing out the processor
in 2001 as 64-bit chips from Intel become more pervasive and powerful.
The R10000 was the last new SGI designed MIPS core. Derivative cores were
released in later years:
Year
Chip
1998
R12000
0.25 µm; 270, 300 and 360 MHz
2000
R12000A
0.18 µm; 400 MHz
2001
R14000
0.13 µm; 500 MHz
2002
R14000A
0.13 µm; 600 MHz
2003
R16000
0.11 µm; 600 - 700 MHz
2004
R16000A
800 - 900MHz
In May 2001 SGI released its first Itanium based workstation, the SGI 750.
The SGI 750 had terrible performance (because the first Itanium chip had
terrible performance) and was discontinued by the end of 2001. For all of
2002 SGI had no Itanium product to sell, though it had made clear to customers
that the MIPS line was dead. In January 2003 SGI released the Itanium2 based
Altix 3300 and Altix 3700 computers.
In 2003 SGI left its Mountain View headquarters and leased the buildings to Google.
In 2006 SGI filed for bankruptcy.
Year
Revenue
FY 1995
$2.2B
FY 1996
$2.8B
FY 1997
$3.7B
FY 1998
$3.1B
FY 1999
$2.7B
FY 2000
$2.3B
FY 2001
$1.9B
FY 2002
$1.3B
($46.3)
FY 2003
$1.0B
($129.7)
FY 2004
$0.8B
FY 2005
$0.7B
Fujitsu MB86900
The first SPARC microprocessor, used in the Sun-4 workstation. The implementation
was built on two 20,000 gate Fujitsu gate array chips!
Introduced: 1986
LSI L64801
SPARC microprocessor, used in the SPARCstation workstation.
Introduced: 1989
Cypress CY7C601
SPARC microprocessor, used in the SPARCstation workstation.
Introduced: 1990
Scalar, Pipelined, In-order
Integer Pipeline: 4 stages
User Visible Registers: 32 integer registers
FPU: Off-chip
25, 33, 40 MHz
SuperSPARC I
This CPU was used in the SPARCstation 10 and 20. This was Sun's first
superscalar CPU, but Sun screwed up the design and it clocked MUCH
lower than expected/desired. This chip came out one year before Intel's
2-issue super-scalar Pentium, but the Pentium clocked much faster.
Also, DEC released the 100+ MHz 2-issue superscalar Alpha 21064 in 1992, but,
fortunately for Sun, the Alpha had no installed base.
Sun marketing had quite a challenge for a few years because of the low SuperSparc
frequency (and thus performance).
Introduced: 1992
Super-scalar (3-issue), Pipelined, In-order
Integer Pipeline: 8 stages (a deep pipeline for such a low resulting clock ...)
On-chip Caches:
16+20 KB L1 D+I cache
FPU: On-chip
33 - 40 MHz @ 0.8 µm (this was a disappointment)
3.1M transistors
Die Size: 256 mm2 (16 mm x 16 mm)
HP NS1
With the NS1, HP got a PA-RISC CPU implementation onto a single chip, though the
CPU still needed support chips.
Before the NS1 came the TS1 in 1986. The TS1 was the first PA-RISC CPU,
but it wasn't a microprocessor. From openpa.net:
The TS-1 was the first PA-RISC production processor, introduced in 1986.
It integrated version 1.0 of PA-RISC on six 8.4×11.3" boards of TTL and
was used in HP 9000 840 servers, the first PA-RISC computers.
The TS1 was an 8 MHz CPU with a 3-stage integer pipeline more comparable to
the VAX minicomputers of the time than to the 68020, 80386 or MIPS R2000.
HP's Precision architecture is RISC-y, but less 'ideological' about
RISC-ness than MIPS and SPARC. This may be because HP had more experience
designing commercial CPU architectures than the MIPS and SPARC folks did.
Or it may be because HP had some very specific/concrete loads in mind.
In any event, from "Hewlett-Packard Precision Architecture: The Processor":
The basic types of operations in most instruction sets fall into three
categories: data transformation operations, data movement operations,
and control operations. In general, one instruction performs one of
these operations. A combined instruction performs more than one of
these operations in one instruction. In HP Precision Architecture,
almost every instruction performs a combination of two of these operations
in a single cycle, with relatively simple hardware.
(emphasis added)
and ...
HP Precision Architecture is frequently referred to as a reduced instruction
set computer (RISC) architecture. Indeed, the execution model of the architecture
is RISC-based, since it exhibits the features of single-cycle execution and
register-based execution, where load and store instructions are the only
instructions for accessing the memory system. The architecture also uses the
RISC concept of cooperation between software and hardware to achieve simpler
implementations with better overall performance.
HP Precision Architecture, however, goes beyond RISC in many ways, even in
its execution model. For example, RISC machines emphasize reducing the number
of instructions in the instruction set to simplify the implementation and
improve execution time. Only the most frequently used, basic operations
are encoded into instructions. However, frequency alone is not sufficient,
since some instructions may occur frequently because of inefficient code
generation, arbitrary software conventions, or an inefficient architecture.
In designing the next-generation architecture for Hewlett-Packard computers,
the intrinsic functions needed in different computing environments like data
base, computation intensive, real-time, network, program development, and
artificial intelligence environments were determined. These intrinsic functions
are supported efficiently in the architecture. Minimizing the actual number
of instructions is not as important as choosing instructions that can be
executed in a single cycle with relatively simple hardware. Complex, but
necessary, operations that take more than one cycle to execute are broken
down into more primitive operations, each operation to be executed in one
instruction. If it is not practical to break these complex operations into
more primitive operations, they are defined as assist instructions, by means
of the architecture's instruction extension capabilities. If more than one
useful operation can be executed in one cycle, HP Precision Architecture
defines combined operations in a single instruction, resulting in a more
efficient use of the execution resources and in improved code compaction.
HP Precision Architecture's execution model has other noteworthy features
like its heavy use of maximal-length immediates as operands for the execution
engine, and its efficient address modification mechanisms for the rapid access
of data structures. The architecture also includes some uncommon functions
for efficiently supporting the movement and manipulation of unaligned strings
of bytes or bits, and primitives for the optimization of high-level language programs.
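As an illustration of a combined operation in the spirit described above, here is toy semantics for an add-and-branch style instruction. This is a sketch only; the mnemonics and encoding are invented, not real PA-RISC.

```python
# Toy semantics of a combined "add and branch" operation: one instruction
# both transforms data and makes a control-flow decision in a single step.

def add_and_branch(regs, rd, rs, target, pc):
    """regs[rd] += regs[rs]; branch to `target` if the result is zero,
    otherwise fall through to the next instruction."""
    regs[rd] = regs[rd] + regs[rs]
    return target if regs[rd] == 0 else pc + 1

regs = {"r1": 3, "r2": -3}
print(add_and_branch(regs, "r1", "r2", 40, 10))  # 40: sum was zero, branch taken
```

A pure RISC split of the same work would be two instructions (an add, then a compare-and-branch); the combined form does both in one cycle with the same simple hardware, which is exactly the trade the HP paper describes.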
HP NS2
HP PCX
HP PA-7000
HP PA-7100
2-issue superscalar
5 stage pipeline
100 MHz (125 MHz on the PA-7150)
HP PA-7200
140 MHz
HP PA-8000
By 1995 most of the major workstation CPU vendors were developing
out-of-order CPUs. Intel had even delivered one (which had surprisingly
good integer performance):
Architecture
Chip
Year
Intel x86
Pentium Pro
1995
HP PA-RISC
PA-8000
1996
MIPS
R10000
1996
SPARC
SPARC64
1996
IBM POWER
POWER3
1998
DEC Alpha
21264
1998
The sole exception was Sun's own SPARC CPUs. HAL Computer Systems
delivered an out-of-order SPARC CPU for Fujitsu in 1995, but Sun didn't
have an out-of-order CPU in its lineup until the SPARC T4 in 2011.
In many respects the PA-8000 appears similar to Intel's Pentium Pro:
Both are out-of-order (and the 1st out-of-order CPUs in their families)
Three instructions/clock max for the Pentium Pro and four for the PA-8000
180 - 200 MHz
Large off-chip caches
One interesting difference is that the Pentium Pro shipped with small on-chip
caches to complement the off-chip cache while the PA-8000 shipped with a larger
off-chip cache, no on-chip caches and a deeper out-of-order reorder queue.
HP explains this in "The HP PA-8000 RISC CPU" from 1997:
Why did we design the processor without on-chip caches?
The main reason is performance. Competing designs incorporate
small on-chip caches to enable higher clock frequencies.
Small on-chip caches support benchmark performance but fade
in large applications, so we decided to make better use of
the die area. The sophisticated instruction reorder buffer
allowed us to hide the effects of a pipelined two-cycle cache
latency. In fact, our simulations demonstrated only a 5%
performance improvement if the cache was on chip and had
a single-cycle latency. A flat cache hierarchy also eliminates
the design complexity associated with a two-level cache.
Introduced: 1996
Super-scalar (4-issue), Pipelined, Out-of-order
On-chip caches: None
Off-chip cache: Up to 4 MB L1 (separate die)
FPU: On-chip
Up to 180 MHz
3.8M transistors
Die Size: 338 mm2 (not counting separate cache die)
Intel i860XP
DEC Alpha 21064 (EV4)
In 1977 DEC launched the 32-bit VAX family of computers. VAX was wildly
successful and drove DEC's growth over the next decade. In the mid-1980s,
however, single chip RISC CPUs from Sun Microsystems and MIPS demonstrated that
simple (and much more importantly, inexpensive!) chips could rival expensive
VAX CPUs.
Intel faced similar threats to its x86 line of chips, but Intel was able to
compete on integer performance with the 80486, which demonstrated that
x86 could be successfully pipelined; the Pentium, which demonstrated that
x86 could be made super-scalar; and finally the Pentium Pro, which demonstrated
that x86 could implement an out-of-order CPU.
VAX was not so fortunate. The complex VAX addressing modes led DEC engineers to
conclude that VAX could not be made performance competitive with RISC CPUs and
DEC concluded that VAX must be replaced with a RISC chip for DEC to remain
performance competitive.
DEC had previously explored RISC CPUs with the internal Prism project (1985 - 1988)
and with the DECstation product line released in 1989 that was built around MIPS
CPUs. DEC eventually decided that a DEC CPU was the thing to do and in 1992 DEC
delivered the first Alpha (named DECchip at the time) CPU, the 21064.
Intel McKinley
Intel Madison
Intel Montecito
Intel Montvale
Intel Itanium 9300 (Tukwila)
Intel Itanium 9500 (Poulson)
Intel Itanium 9700 (Kittson)
Intel iAPX 432
In the late 1970s several companies were working on "object oriented processors."
One was the minicomputer company Data General. The "Fountainhead Project" CPU
that was the antagonist to the Eclipse project in Tracy Kidder's "Soul of a New Machine"
was an "object oriented processor." Intel's i432 was another (and, in fact, Data
General went to the trouble of writing a paper explaining how the Data General computer
was much faster than Intel's i432 CPU).
The i432 was intended to replace the 8086/8088 with a modern 32-bit architecture
but the x86 architecture took off when IBM selected the 8088 for the IBM PC
and the 80286 was a reasonable successor to that chip with backward compatibility
that the i432 did not have. The 80286 was eventually succeeded by the even more
successful 32-bit 80386.
The i432 is 'object oriented' in the sense that data is expected to be in well defined
regions and access checks are done on every data access — per-object data
scoping is enforced by hardware!
The i432 had hardware support for garbage collection because the expectation was
that the code running on the i432 would use garbage collection rather than explicit
memory management.
The i432 encoded instructions to be packed efficiently by implementing a variable
length instruction encoding and making the most common instructions the shortest.
Insanely, the instructions were sized by bits rather than bytes. Instructions
ranged from 6 bits to 344 bits in length and instructions could begin on any bit boundary.
This would have made parallel instruction decoding a challenge had the i432
architecture survived long enough for this to be an issue.
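The serial nature of bit-aligned, variable-length decode can be seen in a toy sketch. The encoding below is invented (not the real i432 format): a 2-bit tag selects the instruction length in bits. The key point is that instruction N+1's position is unknown until instruction N's length has been decoded.

```python
# Toy bit-granular variable-length decode (invented encoding, not the
# real i432 format): a 2-bit tag selects the instruction length in bits,
# and instructions may start on any bit boundary.
LENGTHS = {0b00: 6, 0b01: 10, 0b10: 16, 0b11: 24}

def bits(value, lo, width):
    """Extract `width` bits whose least significant bit is at `lo`."""
    return (value >> lo) & ((1 << width) - 1)

def decode_stream(stream, nbits):
    """Walk the stream MSB-first, yielding (bit offset, length) pairs.
    Each instruction's start depends on the previous instruction's
    length, so decode is inherently serial."""
    out, pos = [], 0
    while pos + 2 <= nbits:
        tag = bits(stream, nbits - pos - 2, 2)
        length = LENGTHS[tag]
        if pos + length > nbits:
            break
        out.append((pos, length))
        pos += length  # the next instruction begins on an arbitrary bit
    return out

# A 6-bit instruction (tag 00) followed by a 10-bit one (tag 01):
stream = (0b001010 << 10) | 0b0100000000
print(decode_stream(stream, 16))  # [(0, 6), (6, 10)]
```

A parallel decoder would have to guess instruction boundaries at every bit position and throw most of that work away, which is why this kind of encoding is hostile to wide front ends.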
A useful model to think about the i432 is that the i432 is the spiritual opposite
of the RISC CPUs that became common at the end of the 1980s.
The i432 combined high price (3 chips with up to 1 million transistors) and poor
performance. In his paper "A Performance Evaluation of the Intel iAPX 432" (1982)
David Patterson writes this:
The bottom performance line as measured by these four small programs is
that the newest version of the 432 (8 MHz with 4 wait states) is almost as fast as
a 5 MHz 8086, while the 80286 leads the 432 by almost an order of magnitude.
Underlining by me.
Things began more optimistically. From the June 1981 issue of Byte Magazine
(page 210), as the i432 neared market:
Update On 32-Bit Microprocessors: The International
Solid-State Circuits Conference (ISSCC) met in New York last
February and heard presentations on two 32-bit microprocessors
and some disclosures on a third.
Intel released further details on its 32-bit iAPX432
processor. It is Intel's first departure from previous architecture
and instruction sets, so there is no software compatibility with its 8086
(16-bit) and 8085 (8-bit) microprocessors. Each of the iAPX432's three integrated
circuits has four lines of sixteen pins. There are two general processors and an I/O
(input/output) processor. The iAPX432 can link to 8086s and existing peripheral and
memory integrated circuits. Intel is boasting performance of up to 2 MIPS (million
instructions per second).
It took five years to engineer the iAPX432, and the company estimates that $25
million was spent on the project. Intel expects to sell at least 10,000 sets in the first
year of production, which is projected for 1982. The initial price for the set will be
$1500. Intel started shipping evaluation sets in February and is offering a board-level
evaluation kit for $4250.
Intel claims that each of the three integrated circuits contains about 200,000
transistors. Two chips operate as a pipeline pair: the 43201 processor, which contains
the instruction decoder, and the 43202, which is the microexecution unit. The
43203 is the I/O processor. It provides an interface from the I/O subsystem to the
protected-access environment of the central system. Each I/O subsystem uses an
8- or 16-bit microprocessor to control I/O, independent of the central system. An
address space of more than 4 gigabytes (4×10⁹ bytes) and
a virtual memory-address space of a terabyte (10¹² bytes) is supported.
A protection scheme is provided to limit access to programs. The iAPX432 can
perform floating-point operations on 32-, 64-, and 80-bit numbers. Hardware failures
can be detected by interconnecting identical iAPX432 processors in a self-checking
arrangement.
The system uses compiled Ada code as its machine language. The language
interpreter is contained in a 64 K-byte microcode ROM (read-only memory).
Intel has also released an Ada cross-compiler for the iAPX432. The compiler runs
on a DEC (Digital Equipment Corporation) VAX-11/780 or an IBM 370. It costs $30,000.
A $50,000 hardware link is needed to download the compiled code to Intel's
$4250 development board.
With the iAPX432, Intel appears to have a two-year jump on its competition. At
the conference, Hewlett-Packard (HP) disclosed that it is in the early stages of
development on a 32-bit microprocessor. HP claims to have built and tested a
single chip with 450,000 transistors (which is about what Intel has in its set of
three integrated circuits).
It was not to be, however, and the combination of high cost and low performance
prevented the CPU from getting any market traction at all. The architecture came
to an end in the mid-1980s. From Byte Magazine, June 1985:
Intel has also stopped all manufacturing, marketing, and support activities
for its 432 microprocessor. The 432 was Intel's first 32-bit chip set, but
it was never used in any large volume computers. Intel is reportedly working
on two other 32-bit chip designs, including the Intel 80386, which will be
compatible with its 80286 and earlier designs. Intel will begin shipping
samples of the 80386 late this year.
NCR/32
The NCR/32 was unusual because it was intended to be used
to implement other, existing CPU architectures such as the
IBM System/370. This turned out to not be a large market.
From the NCR Microelectronics Short-Form Catalog-1985:
NCR/32 Processor Family
Features
32-bit system architecture
13.3 Megahertz frequency
Effective emulation of mid-range mainframes
Externally microprogrammable
Real and virtual memory operation
Large direct memory addressing
Interface provided to slower peripherals
On-chip error check and correction
Functional Description
The NCR/32 VLSI Processor family combines the latest advances in
semi-conductor technology with experience gained in three generations of
computer mainframe design to provide a comprehensive microprogrammable
32-bit system architecture. With external microprogram capability, an
extremely flexible microinstruction set, and a powerful set of internal
registers, the NCR/32 offers flexibility and high performance advantages not
available with other microprocessors.
Along with an existing set of VLSI
family support devices, the NCR/32 offers effective emulation of register,
stack and descriptor-based system architectures, as well as execution of
high-level languages directly from microcode. The NCR/32 is well suited
for applications requiring direct addressing of a large memory space,
high numeric precision, and very-high-speed execution such as bit-mapped
graphics, robotics, artificial intelligence, and relational databases.
An example of using this chip as the base for a specific architecture
is Barry Fagin's research project, which configured an NCR/32000
with custom microcode to implement a Prolog engine.
And from the January 1984 issue of Byte Magazine article "1984, the Year
of the 32-bit Microprocessor," by Richard Mateosian:
The NCR NCR/32. This microprocessor chip set is quite different from
all of the other microprocessors discussed in this article. It is designed
to be externally microprogrammed to emulate other computers, principally
medium-sized IBM mainframes like the System 370. The chip set consists of:
*) the NCR 32-000 CPC, the central processing unit. It contains 40,000
transistors and is fabricated in a 3-micron silicide NMOS process. It
runs with a 13.3-MHz clock, with internal machine cycles occupying two
clock cycles (150 nanoseconds). The 16-bit microinstructions, read from a
128K-byte external storage unit, select 95-bit words from an internal ROM
to control 179 operations, mostly register-to-register arithmetic and
logical operations on 4-bit, 8-bit, 16-bit, 32-bit, and field data types.
Microinstructions are executed in a three-stage pipeline (fetch, interpret,
execute). Eight 16-bit jump registers support a rich set of conditional
operations at the microcode level, and special set-up microinstructions
facilitate IBM System 370 emulation.
*) the NCR 32-010 ATC, the memory management unit. In addition to address
translation and access protection, this chip provides memory-refresh control,
error-checking and correction (ECC) logic, a time-of-day register, an
interval timeout interrupt, and an interrupt on writes to one specified
virtual address. Sixteen translation registers support mapping of 32-bit
or 24-bit virtual addresses into 24-bit physical addresses, using page
sizes of 1K, 2K, or 4K bytes.
*) the NCR 32-020 EAC, the "booster" chip for arithmetic operations. It
supports IBM-compatible single- and double-precision binary and floating
point arithmetic, packed and unpacked decimal storage, and format
conversions. A single-precision floating-point addition takes approximately
1.6 microseconds.
*) the NCR 32-500 SIC, which interfaces the 24-megabyte/second processor
memory bus to slower peripherals and to other systems. The configuration
of an NCR/32 system is shown in figure 3. No benchmark data has been
published, but NCR estimates performance of the NCR/32 at approximately
four times that of a 10-MHz 68000.
If the estimate of 4× the performance of a 10 MHz 68000 is correct, then
the NCR/32 would have had roughly the same performance as the contemporary 16 MHz 68020.
The market for the chip was small, however, and no successor chips were made.
Zilog Z80000
The Zilog Z80000 was intended to be a 32-bit extension of the 16-bit Z8000.
Implementation problems prevented the chip from shipping.
From the "Z80,000 CPU Preliminary Technical Manual" (1984):
1.1 INTRODUCTION
The Z80,000 CPU is an advanced 32-bit microprocessor that integrates the architecture of a
mainframe computer into a single chip. A subset of the Z80,000 architecture was originally
implemented in a 16-bit version, the Z8000 microprocessor. The Z80,000 bus structure permits the
use of Z8000 family peripherals, such as the Z8030 SCC and Z8036 CIO. While maintaining
compatibility with Z8000 family software and hardware, the Z80,000 CPU offers greater power
and flexibility in both its architecture and interface capability. Operating systems and
compilers are easily developed in the Z80,000 CPU's sophisticated environment, and the
hardware interface provides for connection in a wide variety of system configurations.
Memory management is integrated in the CPU, providing access to more than 4 billion bytes of
logical address space without external support components. The Z80,000 CPU also includes a cache
memory, which complements the pipelined design to achieve high performance with moderate memory
speeds.
This chapter presents an overview of the features of the Z80,000 CPU that offer extraordinary
flexibility to microprocessor system designers in tailoring the power of the CPU to their
specialized applications. The chapters that follow describe these features in detail.
1.2 ARCHITECTURE
The CPU features a general-purpose register file with sixteen 32-bit registers. The instruction
set offers a regular combination of nine general addressing modes with operations on numerous data
types, including bits, bit fields, bytes (8 bits), words (16 bits), long words (32 bits), and
variable-length strings. The memory management, exception handling, and system and normal mode
features support the development of reliable software systems.
1.2.1 Registers
The Z80,000 CPU includes sixteen 32-bit general-purpose registers. The registers can be used as
data accumulators, index values, or memory pointers. Two of the registers, the Frame Pointer
and Stack Pointer, are used for procedure linkage with the Call, Enter, Exit, and Return instructions.
The Z80,000 registers also include the 32-bit Program Counter and 16-bit Flag and Control Word.
These two registers, together called the Program Status, are automatically saved during trap and
interrupt processing. Nine other special-purpose registers are used for memory management, system
configuration, and other CPU control.
1.2.2 Address Spaces
The CPU uses 32-bit logical addresses, permitting direct access to 4G bytes of memory. The logical
addresses are translated by the memory management mechanism to the physical addresses used to access
memory and peripherals.
The CPU supports three modes of address representation — compact, segmented, and linear —
selected by two control bits in the Flag and Control Word register. Applications with an address space
smaller than 64K bytes can take advantage of the dense code and efficient use of base registers
with the 16-bit compact addresses. Although programs executing in compact mode can only manipulate
16-bit addresses, the logical address is extended to 32 bits by concatenating the 16 most-significant
bits of the Program Counter register. Compact mode is equivalent to the Z8000 non-segmented mode.
Segmented mode supports two segment sizes — 64K bytes and 16M bytes. Up to 32,768 of the small
segments and 128 of the large segments are available. In segmented mode, address calculations do
not affect the segment number, only the offset within the segment. Allocating individual
objects such as program modules, stacks, or large data structures to separate segments allows
applications to benefit from the logical structure of a segmented memory space.
The 32-bit addresses in linear mode provide uniform and unstructured access to 4G bytes of
memory. Some applications benefit from the flexibility of linear addressing by allocating objects to
arbitrary positions in the address space.
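The compact-mode address extension described in the manual above can be sketched in a few lines. This is an illustrative reconstruction, not vendor code; the function name is made up here:

```python
def compact_to_logical(pc: int, addr16: int) -> int:
    """Z80,000 compact-mode sketch: the program manipulates only 16-bit
    addresses, and the CPU forms the 32-bit logical address by
    concatenating the 16 most-significant bits of the Program Counter
    with the program's 16-bit address."""
    return (pc & 0xFFFF0000) | (addr16 & 0xFFFF)
```

For example, with the Program Counter at 0x00041000, the 16-bit address 0x2345 yields the logical address 0x00042345, so a compact-mode program always runs within the 64K-byte region selected by the upper half of the PC.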
And from Byte Magazine, June 1985:
Problems in debugging complex microprocessor chips have caused new
problems at Zilog and Intel. Zilog admitted that sampling of its
Z80000 32-bit processor, announced in the summer of 1983, has been
delayed until early 1986. Zilog had originally planned to start
shipping the Z80000 in late 1984.
AT&T Hobbit
The AT&T Hobbit processor was created because Apple liked the
AT&T Research CRISP CPU and wanted a commercial version. The
performance of a 16 MHz CRISP CPU against a 5 MHz VAX 11/780 and an
8 MHz MIPS R2000 M/500 Development System, as reported in
"The Hardware Architecture of the CRISP Microprocessor" (1987) by
David R. Ditzel, Hubert R. McLellan and Alan D. Berenbaum was this:
Benchmark        VAX-780    R2000     CRISP     CRISP/VAX   CRISP/R2000
ackerman         20.9 sec   1.6 sec   1.1 sec   19.0        1.5
word count       55.0 sec   5.2 sec   4.2 sec   13.1        1.2
quicksort        36.2 sec   4.0 sec   3.4 sec   10.6        1.2
tty driver       17.4 sec   2.2 sec   1.2 sec   14.5        1.8
symbol table     14.6 sec   1.3 sec   1.2 sec   12.2        1.1
buffer release    9.9 sec   0.9 sec   0.8 sec   12.4        1.1
arithmetic       12.8 sec   2.7 sec   1.6 sec    8.0        1.7
This is not entirely fair to the R2000, as the M/500 was a development system that
was clocked slower than the production R2000 would be. It was, however, the
competing RISC CPU that AT&T could acquire for benchmarking at the time.
The CRISP CPU was 172,163 transistors implemented on a 1.75 µm CMOS process, while
the R2000 was around 110,000 transistors on a 2.0 µm process.
CRISP did not have explicit registers. Instead, all operations were performed against memory.
A stack cache ensured that access to values that would have been in registers
on a register machine were cached on-chip rather than requiring DRAM access. Another
cache for decoded instructions allowed many instructions to execute in one clock
cycle even though the instructions were logically accessing 'memory'.
Finally, CRISP provided instructions in only three lengths: 2 bytes, 6 bytes and 10 bytes:
The instruction encoding is designed with two primary considerations.
First, the instruction length must be easily determined. Therefore,
the length is encoded in the first two bits of each instruction. Since
all instructions are multiples of two bytes, this unit is referred to
as an instruction parcel. Second, static and dynamic code size should
be made as small as possible without interfering with performance issues.
Instructions that require two 32-bit addresses or operands, can use the five parcel
form shown in Figure 1. The three parcel form can be used to
provide a single 32-bit operand or two 16-bit operands. The single parcel
format has a 5-bit opcode field which defines the most frequent combinations
of operations and addressing modes occurring in the three and five parcel forms. This highly
encoded single parcel form typically accounts for 80 percent of all instructions.
Five Parcel:
  11 | opcode(6) | smode(4) | dmode(4) | src(32) | dst(32)
Three Parcel:
  10 | opcode(6) | smode(4) | dmode(4) | src(32)
  10 | opcode(6) | smode(4) | 1111     | src(16) | dst(16)
One Parcel:
  0 | opcode(5) | src(5) | dst(5)
  0 | opcode(5) | src(10)
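The length rule quoted above (length encoded in the leading bits, with the 16-bit parcel as the unit) can be sketched as a small decoder. The function name and bit tests are illustrative, based only on the encodings shown in the figure:

```python
def parcel_count(first_parcel: int) -> int:
    """Return a CRISP instruction's length in 16-bit parcels, decoded
    from the top bits of its first parcel:
      11... -> five parcels (two 32-bit operands)
      10... -> three parcels (one 32-bit or two 16-bit operands)
      0.... -> one parcel (highly encoded common cases)"""
    if (first_parcel >> 15) & 1 == 0:   # top bit 0: single parcel
        return 1
    return 5 if (first_parcel >> 14) & 1 else 3
```

Because the length is determined entirely by the first parcel, the fetch stage can find instruction boundaries without decoding the opcode, which matters for keeping the decoded-instruction cache fed.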
Hobbit was basically CRISP and was released in 1992. The product
for which Apple had intended to use the Hobbit CPU was also
released in 1992 — but it came out with an ARM CPU instead.
Apple had found the Hobbit CPUs to be “rife with bugs, ill-suited for our purposes,
and overpriced,” according to Larry Tesler, who was then at Apple and
in charge of the Newton when it switched to ARM.
AT&T released a second version of the original Hobbit, but it
was mostly a tweaked version of the original chip. From page 105 of
January, 1994 issue of Byte Magazine:
The AT&T Hobbit Enters Its Second Generation
The AT&T Hobbit chip sets betray their corporate heritage. These are
chips designed first and foremost for telecommunications applications. AT&T
Microelectronics first offered a set of chips for PDAs (personal digital
assistants) in 1992. The 92K Hobbit family, the chips that are used in the
Eo Personal Communicator, has five parts: a CPU, a system controller, a bus
controller, a video-display controller, and a peripheral-bus controller.
The price seemed high at $99 for the chip set, but it was complete. Late last
year, AT&T introduced two new chip sets designed to broaden the line, with
trade-offs in performance, system size, cost, battery life, and feature sets.
The ATT92020S processor provides higher performance — it uses a
6-KB prefetch buffer as opposed to the 3-KB buffer on the 92010 —
and requires less power than the original 92010 CPU. It also works with
all the existing 92010 support chips except for the ISA controller. ISA
support doesn’t figure very highly in the new Hobbit offerings.
The Hobbit was competing against ARM chips in the PDA market and quickly lost.
Mitsubishi Gmicro/100
Several vendors produced various TRON CPU implementations. This was
similar to how SPARC chips were not just manufactured, but designed
by multiple companies to a common specification.
Mitsubishi was one of those vendors and the Mitsubishi family of TRON
chips also went by the M32 moniker. The Mitsubishi Gmicro/100 was
followed up with the Gmicro/400.
Eventually, Mitsubishi moved its development efforts to the M32R
family of CPUs.
Hitachi Gmicro/200
Similarly to the SPARC architecture, several vendors produced various TRON
CPU implementations of a common specification. In addition, some of the
vendors came together to work jointly on the Gmicro family of TRON CPUs.
Hitachi was one of those vendors and the Hitachi family of TRON
chips also went by the H32 moniker. The Hitachi Gmicro/200, released
in 1988, was the first TRON CPU. Hitachi followed it up with the Gmicro/500.
One problem that the TRON CPU project faced, especially in the desktop and
workstation markets, was competition from the x86 chips (e.g. 80486) and the
emerging inexpensive RISC chips (e.g. SPARC and MIPS).
In 1990 the competitive situation looked like this (if the TX3 met
the Q490 ship date):
CPU             MHz   Cache
TRON T3         33    8+8 KB
Intel 80486     33    8 KB
MIPS R3000      33    —
SPARC CY7C601   40    —
Toshiba's T3 appears competitive, but only just. Remaining
competitive would have required on-going investment in development. And it is unclear
whether the T3 ever shipped! In 1991 MIPS introduced the R4000, which pushed the clock
speed to 100 MHz and added 8+8 KB of on-chip cache. In 1991 Intel released an 80486
at 50 MHz. And in 1992 DEC produced the first Alpha chip at 100+ MHz. Without a commitment
to continued aggressive development, the TRON chips would rapidly have become uncompetitive.
And much of the TRON focus was on embedded applications anyway (note that several TRON
CPUs lack an MMU).
Momentum behind the TRON CPU faded fairly quickly and the major vendors
eventually supported (different) non-TRON RISC chips. Hitachi moved its development
efforts to the SH (or Super-H) family which was focused on embedded systems.
From "Microprocessors and Microsystems" Vol 13 No 8 October 1989:
TRON microprocessors will also be
in competition with new RISC chips,
for unlike RISC architectures the
TRON CPU has a compiler-oriented
CISC structure that allows it to compile
high-level program code efficiently.
TRON microprocessor manufacturers include
Toshiba, developing its TX series, and the GMICRO group of
Hitachi, Fujitsu and Mitsubishi. Oki
Electric, which is now participating in
development of the GMICRO series, is
also working on its own TRON CPU.
Matsushita similarly is producing its
own TRON chip (see Table 1 over).
The most ambitious design is
Toshiba's TX3, a 1.2M transistor, 33
MIPS (peak) microprocessor due late
in 1990, intended for high-end
workstations. Already available is the
TX1 embedded controller, which is
pin compatible with the TX3. The TX1
can also be used as an ASIC macro cell.
Toshiba is also developing three
peripheral chips for the TX series: a
50 MHz clock generator, an interrupt
controller/timer that can handle up
to eight interrupts, and a four channel
direct memory access controller
(DMAC) with 50 and 25 Mbyte/s
block and single transfer modes.
The GMICRO group plans to unveil a
900k transistor, 20 MIPS TRON CPU,
the GMICRO/300, in the current quarter.
Unlike the TX3, which will
basically have a floating-point unit
(FPU) built into it, the GMICRO/300
will use an external FPU. To allow for
efficient interaction with the FPU,
and easy development of software,
the GMICRO/300 will have 22
coprocessor instructions, in addition
to 11 decimal instructions. The other
peripheral chips in the GMICRO series
are a cache controller/memory, a four
channel DMAC, an interrupt request
controller that can handle up to
seven interrupts, a tag memory with a
27 µs access time, and a 40-48 MHz
clock pulse generator.
The GMICRO series began in 1987
with the appearance of the 730k transistor,
10 MIPS GMICRO/200, followed recently by the GMICRO/100
embedded controller with 330k transistors (operating speed 10 MIPS
(max.)). High-speed versions of all
three GMICRO microprocessors are
planned by equipping them with
33 MHz clocks.
Oki Electric has joined the GMICRO
group to market the series products
and to undertake the development
of GMICRO evaluation tools that operate
in the BTRON environment.
Prior to joining GMICRO, Oki was
reported to be developing the 032
chip, but no launch date has been
announced. Unlike the other developers
who are planning different
chips for different applications, Oki
intends to use its CPU across a variety
of applications from embedded controllers
to communications terminals
and lower-performance PC devices.
Similarly, Matsushita anticipates wide
application of its MN10400, a 400 k
transistor, 20 MIPS chip due this quarter.
Fujitsu Gmicro/300
Several vendors produced various TRON CPU implementations. This was
similar to how SPARC chips were not just manufactured, but designed
by multiple companies to a common specification.
Fujitsu was one of those vendors and the Fujitsu family of TRON
chips also went by the F32 moniker. Eventually Fujitsu moved its
development efforts to SPARC, developing the SPARC64 line of high
performance SPARC CPUs.
Gmicro/400
Gmicro/500
Toshiba TX1
Toshiba's first TRON CPU implementation. This was followed
by the TX2
Toshiba TX2
Oki Electronics O32
Oki Electronics' TRON CPU implementation(s).
ARM Arm1
Arm1 was the first ARM CPU. It never shipped with a product. Only one chip design
was fabbed implementing Arm1 and only a few hundred chips actually produced.
The Arm1 did not include hardware integer divide or multiply (!). Comparing
this chip to earlier 16-bit chips such as the 8086 is tricky because,
while the 8086 had a similar transistor count, the 8086:
Was constrained to a 40-pin package (for cost reasons) rather than an 82-pin package.
Was assembly source compatible with the 8080
Had hardware integer multiply
Comparing a clean-sheet 1985 design (no hardware integer multiply,
twice the pin count, and no hardware floating point option, as the 8087 was
for the 8086) with a chip that shipped in 1978 with that extra functionality
and under those constraints is tough. The 8086 was designed without CAD tools.
The Arm1 was designed with VLSI Technology's custom design tools.
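On a CPU with no multiply instruction, integer multiplication falls to software. A minimal shift-and-add sketch of the idea (illustrative only, not actual Arm1-era library code):

```python
def mul_shift_add(a: int, b: int) -> int:
    """Multiply two non-negative integers using only shifts and adds,
    the kind of loop a compiler or runtime emits when the CPU (like
    the Arm1) has no hardware multiply."""
    acc = 0
    while b:
        if b & 1:      # low multiplier bit set: add shifted multiplicand
            acc += a
        a <<= 1        # a × 2 for the next bit position
        b >>= 1        # consume one bit of the multiplier
    return acc
```

The loop runs once per multiplier bit, so a 32-bit software multiply costs on the order of dozens of instructions, which is why the Arm2 adding a hardware multiply instruction was a meaningful improvement.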
Introduced: 1985 (evaluation systems only; VC2588 (Autumn) chip)
In-order, pipelined
Integer Pipeline: 3 stages
User Visible Integer Registers: 25 (16 user; 9 supervisor)
On-chip Caches: No
FPU: No
6 MHz
24,800 transistors
Die Size: ~50 mm2 (7 mm × 7 mm) @ 3.0 µm process
Pins: 82
ARM Arm2
This was the first ARM chip that shipped in products. A (slightly) improved
revision of the Arm1, this chip added an integer hardware multiply instruction
as well as some instructions to support a generic co-processor interface. The
chips were produced in a smaller technology node than the Arm1 (2 µm
vs 3 µm) and clocked higher (8 - 12 MHz vs 6 MHz). The British
Acorn Archimedes
personal computer was designed around the Arm2 CPU.
Introduced: 1986
In-order, pipelined
Integer Pipeline: 3 stages
User Visible Integer Registers: 27 (18 user; 9 supervisor)
On-chip Caches: No
FPU: No
1986: 8 MHz (1 W) 1987: 10, 12 MHz (2 W)
27,000 transistors
Die Size: ~34 mm2 (5.8 mm × 5.8 mm) @ 2.0 µm process
Pins: 82
Apple A6 (Swift)
This was Apple's first custom designed ARM core.
Dual core, 1.3 GHz, 32KB+32KB L1, 2MB shared L2, 32 nm
Apple A7 (Cyclone)
This was Apple's first 64-bit ARM core. Apple was one of the first
ARM vendors to ship a 64-bit ARMv8 CPU.
Dual core, 1.3 - 1.4 GHz, 64 KB+64 KB L1, 1MB shared L2, 4 MB shared w/SoC L3 28 nm
Apple A8 (Typhoon)
Apple A10 (Hurricane)
Apple A11 (Monsoon)
Apple A12 (Vortex)
Apple A13 (Lightning)
Apple A14, M1 (Firestorm)
Apple A15, M2 (Avalanche)
Elbrus 2000
Elbrus 2C+
TSMC 90nm
0.5 GHz
2-core
Elbrus 4C
TSMC 65nm
380 mm^2
0.5 GHz
38.4 GB/sec 3xDDR3
8 MB L2 cache
4-core
Elbrus 8C
TSMC 28nm
321 mm^2
1.3 GHz
8-core
512 KB L2/core
16 MB L3 shared
Elbrus 8SV
PowerPC 750 (IBM and Motorola)
A 250MHz 5-W PowerPC microprocessor with on-chip L2 cache controller
Exponential X704
In 1997 Exponential Technology released a BiCMOS CPU
seemingly designed to answer the question:
Can a high frequency CPU with a tiny cache avoid being
performance limited by DRAM bandwidth and latency?
The answer was, "Well, this CPU cannot."
Comparing the X704 to two contemporaneous CPUs we find this:
Attribute      Exponential x704       PowerPC 750            Pentium-II
Frequency      533 MHz                266 MHz                300 MHz
Technology     0.5 µm BiCMOS          0.25 µm CMOS           0.35 µm CMOS
Transistors    2.7M                   6.35M                  7.5M
Area           150 mm2                67 mm2                 113 mm2
TDP            85 Watts               6-8 Watts              18 Watts
L1 I-Cache     2 KB (direct)          32 KB (8-way)          16 KB (4-way)
L1 D-Cache     2 KB (direct)          32 KB (8-way)          16 KB (4-way)
L2 Cache       32 KB                  1 MB off-chip          ½ MB off-chip
L3 Cache       1 MB off-chip          —                      —
Inst Issue     3-issue Super-Scalar   ?-issue Out-of-Order   3-issue Out-of-Order
SpecInt95      ~12                    ~12                    12.2
Intel and IBM made slightly different trade-offs when allocating
transistors between on-chip caches and out-of-order instruction depth (Intel's is
deeper, though it doesn't show here) but got to roughly the same performance.
Intel has a much larger chip (113 mm2 vs 67 mm2)
and this is mostly due to Intel's larger design rule (0.35 µm vs
0.25 µm is 1.96× advantage for the smaller transistors) and
a bit due to more transistors (7.5M vs 6.35M). Some of the excess
transistors are no doubt due to the "x86 tax."
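The design-rule advantage quoted above is just the linear feature-size ratio squared; a quick check of the arithmetic:

```python
# Area/density advantage of the 0.25 µm process over the 0.35 µm process:
# the linear feature-size ratio, squared.
ratio = 0.35 / 0.25          # Pentium-II vs. PowerPC 750 design rules
print(round(ratio ** 2, 2))  # prints 1.96
```

So roughly half of Intel's extra die area (113 mm2 vs 67 mm2, a 1.69× difference) is explained by the coarser process alone, before counting the extra transistors.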
Exponential designed for frequency above all else and selected BiCMOS
to achieve this. The BiCMOS required 10× the power and allowed
less than ½ the transistors. The result was that the
x704 did not have the transistor (or power!) budget for large
set associative L1 caches nor the transistor budget for Out-of-order
execution. In addition, the instructions lost on memory access
stalls due to cache miss were 2× that of the PowerPC 750
(or Pentium-II) so the smaller and less associative caches hurt
Exponential more. The result was the 533 MHz Exponential x704
performed roughly the same as the 266 MHz PowerPC 750 but with 10×
the power requirement and over 2× the die size. With the larger
die size came a higher price.
The LA Times had a short writeup of Exponential's chip unveiling:
Exponential Unveils Fast Chip: A San Jose start-up company announced
the speediest microprocessor yet, a chip able to run Macintosh software
at up to 533 megahertz, more than twice as fast as current chips.
Exponential Technology Inc. said its X-704 chip should be available
in volume next spring. The company is one of several chip makers to
unveil new products at the Microprocessor Forum this week in San Jose.
The four-day conference marks the 25th anniversary of the microprocessor.
Exponential started in 1993, with financial help from Apple Computer
Inc. George Taylor, Exponential's founder and chief technology officer,
says the new chips will cost about $1,000 each, which would put them
into high-end computers used mostly by graphic designers and creators
of multimedia. Industry analysts said Exponential's chip could give
Apple's Macintosh computers a boost.
Apple released the G3 Macintoshes in 1997 with a starting price of $1,999
for a 233 MHz CPU.
A $1,000 CPU was not going to be designed into a computer at that price point. Exponential
was out of business by May of 1997.
A 533-MHz BiCMOS Superscalar RISC Microprocessor
Fairchild Clipper C100
Clipper was not Fairchild's first CPU — that would be the 8-bit Fairchild F8
released in 1975. Fairchild does not seem to have produced a 16-bit CPU and in
the mid-1980s released the Clipper C100. The chip did not have commercial success,
competing with established chips such as the 80386 and 68030 as well as RISC chips
such as SPARC and MIPS.
The single large Clipper customer, Intergraph, purchased the Clipper division from
Fairchild after the C100 had shipped. A few more generations of Clipper were developed
before the architecture was abandoned.
Intergraph Clipper C300
Fairchild was purchased by National Semi in 1987 and the Clipper was
sold to Intergraph.
The National Semiconductor Corporation has agreed to sell the rights to
the Clipper microprocessor product line to the Intergraph Corporation.
The sale, for what industry officials said was about $10 million,
indicates that National is quickly moving to sell parts of the Fairchild
Semiconductor Corporation, which it agreed to acquire from Schlumberger
Ltd. last month for $122 million in stock.
National was expected to divest itself of Fairchild's microprocessor business
because it already had its own. Intergraph, a Huntsville, Ala., company that
makes systems for computer-aided design, uses the high-speed Clipper chip in
a work station. The company is so dependent on the chip that it had considered
buying a stake in Fairchild as part of a management buyout effort. Intergraph
is expected to offer jobs to about 100 Fairchild employees.
— New York Times, Sept. 18, 1987
Intergraph Clipper C400
The effective demise of the Intergraph Corp Clipper RISC chip is reported
by our sister publication ClieNT Server News. Intergraph turned its
California-based Advanced Processor Division over to Sun Microsystems
Inc on January 1 and the 70-employee unit along with general manager
Howard Sachs has become part of Sparc Technology Business unit which
is working on marrying the Windows NT operating system to the Sparc
chip. No further Clipper development is planned.
http://bitsavers.trailing-edge.com/components/fairchild/clipper/Design_and_Implementation_Trade-offs_in_the_Clipper_C400_Architecture.pdf
DEC MicroVAX 78032
The first VAX minicomputer, the VAX 11/780, was announced in October of 1977.
The 11/780 CPU (KA780) was built out of TTL logic on a number (29?) of individual boards.
It was not, in any way, a microprocessor.
In 1985 DEC released the first VAX CPU implemented as a microprocessor. The 78032
was intended for the MicroVAX line of VAX computers and implemented only a subset
of the full VAX architecture.
Around the same time, the V-11 CPU shrank the entire VAX instruction set to a single
board (down from the many boards of the 11/780), but the V-11 required many chips
for a functioning CPU.
The 78032 and V-11 were followed up by the CVAX CPU which implemented the full VAX
instruction set, though the floating point required a separate FPU.
DEC CVAX 78034
DEC Rigel
DEC NVAX