Designed to Wear Out / Designed to Last – Pt. 3

Here we are going to take a look at current technologies’ strengths and weaknesses in relation to the past.

Increasing Active Elements

For perspective, there has been a monumental increase in the number of active elements available in an integrated circuit compared to decades ago. This has allowed an increase in device complexity and speed.

Because of this complexity, there has been a necessary increase in the complexity of testing and test tools. Modern chips have many redundant operating sections and a JTAG test bus, which allows every section on a chip to be 100% functionally tested. Sections found to be bad are mapped out. This has become necessary because, at current densities, not all of the elements on a newly manufactured IC are functional. Failed functional units are mapped out using read-only-memory elements or other means. Without this practice, the cost of a working chip would be too high. Above a certain threshold of failed sections, the chip is simply discarded. This practice started as densities rose in memory chips. There were always failed banks of memory elements, so building redundant elements and mapping them in or out made sense; otherwise the chip yield would be too low. Chips with mapped-out sections could be sold as lower-density chips, assuring some revenue, especially when a product was new and yields were low.
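
To make the yield argument concrete, here is a minimal sketch using a simple Poisson defect model; the defect density, die area, and section counts are assumed values chosen only to show the effect of mapping out bad sections, not figures from any real process.

    import math

    # Illustrative numbers only: a hypothetical die split into 16 equal sections,
    # of which at least 12 must test good for the chip to be sellable.
    defects_per_cm2 = 0.5      # assumed average defect density
    die_area_cm2 = 6.0         # assumed die area
    sections = 16
    min_good_sections = 12

    def p_good(area_cm2):
        # Poisson model: probability that a region of this area has zero defects
        return math.exp(-defects_per_cm2 * area_cm2)

    # No redundancy: the entire die must be defect-free.
    yield_monolithic = p_good(die_area_cm2)

    # With redundancy: sections fail independently, and the chip still works
    # (with the bad sections mapped out) if enough sections test good.
    p_sec = p_good(die_area_cm2 / sections)
    yield_redundant = sum(
        math.comb(sections, k) * p_sec**k * (1 - p_sec)**(sections - k)
        for k in range(min_good_sections, sections + 1)
    )

    print(f"Yield, no redundancy:        {yield_monolithic:.0%}")   # about 5%
    print(f"Yield, with section map-out: {yield_redundant:.0%}")    # about 88%

With those assumed numbers, the same wafer goes from roughly one sellable chip in twenty to nearly nine in ten, which is the economics driving the practice.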

Increasing Pin Count

An increase in complexity also means an increase in die size and in the number of pins on a single IC package. When I started in this business, 40 pins was the largest count. The largest these days is the FCLGA-3647 package. That is indeed 3,647 pins!

Decreasing Repairability

Inevitably, this pin count increase means that replacement of a single XSI (extreme scale integration) chip is nearly impossible, and certainly not economically feasible. As for chip lifetime and reliability, little is known beyond the MTBF claimed in the datasheet. Electromigration is the likely cause of long-term chip failure in these devices. The circuit board on which you find high-pin-count ICs is a throwaway item.

The Pause in Moore's Law

No doubt, something will come along to allow Moore's Law to again provide speed and density improvements, but for the last 5 years (as of April 2020), we have been stuck at a particular minimum feature size for the transistors on a modern processor chip. There is a good deal of work on more efficient designs of logic elements, but currently nothing close to the multiplicative improvements of Moore's Law.

I mention Moore's Law in the faint hope that cooler heads will prevail upon us to, at least, make complex ICs reusable/reprogrammable.

Note: A short list of possible improvements

  1. 3D Stacking of elements
  2. New materials
  3. Innovative logic configurations

Designed to Wear Out / Designed to Last – Pt. 2

Designed to Last

There are many examples of technologies which are designed to function for decades, even centuries.

What goes into an item that can make it last decades or even centuries? Here at LCM, a lot of the original logic we are running in our collection was not rated for long life when our machines were built. Expected-lifetime data had the best of them lasting a few decades, maximum.

This has generated a set of rules we abide by: If it looks like stone or amber, it can last a long time. If it looks like plastic (or is plastic), it is not going to last. One particular exception is epoxy packages. Certain epoxy resins are similar in structure to tree resins, which so far hold the record (millions of years) for preserving ancient insects, pollen, and plants.

  1. Semiconductors and Integrated Circuits – Expected-lifetime data in databooks from the era in which these ICs were created typically have them failing more than a decade ago. Yet here, they are still performing their function. Staying functional tends to favor ceramic packages (stone) and specific epoxy packages (amber). (See “Notes on Epoxy Packages for Semiconductors” below.)
  2. How Much Heat – Systems designed to run at a higher internal ambient operating temperature tended to have a much higher failure rate. Datasheets at the time (and today) show that operating life falls directly as ambient operating temperature rises. (A rough worked example follows this list.)
  3. Cooling Fans – These sit right in the middle of the life curve. Although the bearing has a low wear rate, 30 years seems to be the upper limit. (It would be cool if someone could come up with a frictionless magnetic bearing.)
  4. A Special Note About Fans and Heat: A lot of our power supplies have an intimate relationship with their fans, meaning that if the fan fails, the power supply fails as well. When we re-engineer a power supply to replace an older or failed unit, we specify that the new supply must keep operating even if the fan has failed. This is possible because the replacement components are more efficient (generating less heat doing their job) and tolerate heat better than the old power supply components. (This has the added advantage of lower air conditioning costs.)
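
As a rough worked example of the heat point in item 2, here is a sketch using the Arrhenius acceleration model that reliability datasheets commonly lean on; the activation energy, temperatures, and nominal life are assumed round numbers, not values from any particular datasheet.

    import math

    EA_EV = 0.7              # assumed activation energy in eV, a common rule-of-thumb value
    BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

    def acceleration_factor(t_cool_c, t_hot_c):
        """How many times faster failures accumulate at t_hot_c than at t_cool_c."""
        t1 = t_cool_c + 273.15
        t2 = t_hot_c + 273.15
        return math.exp((EA_EV / BOLTZMANN_EV) * (1.0 / t1 - 1.0 / t2))

    # A chassis running at a 45 C internal ambient versus one held at 25 C:
    af = acceleration_factor(25, 45)
    print(f"Failures accumulate about {af:.1f}x faster")            # roughly 5.5x
    print(f"A nominal 20-year life shrinks to about {20 / af:.0f} years")

The exact factor depends heavily on the assumed activation energy, but the direction is the point: a hotter box ages its components several times faster.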

Winners In The Longevity Game

The absolute winners we have found and utilized in our systems are what are known as “bricks”. These are power supply modules which are fully integrated. They come in various sizes which determine their power ratings: full bricks top out around a kilowatt, half-bricks around 500 watts, and quarter-bricks around 250 watts.

Lifetimes (MTBF) at full load and temperature for “bricks” range from around 40 years to an astounding 500 years. (That is not a typo. The part in question is a Murata UHE-5/5000-Q12-C; the whole UHE series carries this rating. Price: $61.90.) These devices, as you may have already guessed, are epoxy encapsulated.
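
To put numbers like that in perspective, here is a quick sketch of what an MTBF figure implies under the usual constant-failure-rate assumption (an MTBF is a statistical failure rate, not a promise that any single unit lasts that long):

    import math

    def survival(years, mtbf_years):
        # Standard exponential model behind MTBF figures: constant failure rate.
        return math.exp(-years / mtbf_years)

    for mtbf in (40, 500):
        print(f"MTBF {mtbf:>3} years: {survival(30, mtbf):.0%} "
              f"chance of still running after 30 years")
    # MTBF  40 years: 47% chance of still running after 30 years
    # MTBF 500 years: 94% chance of still running after 30 years

For a museum planning to run hardware for decades, that difference between the low end and the high end of the brick range is the whole game.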

Designed-In Longevity

This refers to what we have encountered while restoring the machines in our collection. There is a definite intent at work when one examines the component choices. For example, DEC's PDP-10 KL series power supplies have filter capacitors at four times the necessary capacitance. Capacitance declines with age pretty linearly until end of life (around 14 years). This means these particular components will still allow power supply function roughly 4 times longer. That is at least 3 times beyond the machine's commercial life rating (5-7 years). We got these machines 15 to 20 years after their last turn-on, and they ran for most of a year before we had cascading failures of the filter capacitors.
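
A back-of-the-envelope version of that capacitor arithmetic, with assumed values; the required capacitance and the linear loss rate are my own illustrative choices, not DEC's derating data:

    # Assumed, illustrative numbers; not DEC's actual derating data.
    c_required_uf = 10_000    # capacitance the supply needs to filter adequately
    c_installed_uf = 40_000   # roughly 4x the requirement, as found in the KL supplies
    loss_uf_per_year = 700    # linear decline, chosen so an unmargined bank would be
                              # out of spec in roughly the 14-year component life

    years_of_adequate_filtering = (c_installed_uf - c_required_uf) / loss_uf_per_year
    print(f"Adequate filtering for roughly {years_of_adequate_filtering:.0f} years")
    # about 43 years, several times the 5-7 year commercial life of the machine

Under those assumptions, the extra capacitance buys decades of margin, which is consistent with machines that still powered up after 15 to 20 years in storage and then failed as the margin finally ran out.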

Notes on Epoxy Packages for Semiconductors

Epoxy packaging for semiconductors became popular in the mid-1960s. It replaced ceramic because it was less expensive. Epoxy, unfortunately, can be made with different resins and other ingredients that give it different material characteristics. There is a correlation between cost and moisture intrusion: lower cost, more moisture. This gave epoxy a bad name, as it was used to make ICs more competitive in the market. It also led to a number of market losses for certain manufacturers, as the moisture intrusion occurred at a predictable rate depending on the ambient humidity of a particular region of the country.

It was a multi-faceted problem. The moisture intrusion occurred where the IC lead connects with the package. Moisture intrusion into the epoxy, combined with the poor metal quality (a tin alloy) of the lead frame, causes corrosion of the lead, which lets moisture into the IC cavity and changes the bulk resistivity of the IC die. The electrical specs go off a cliff and the IC fails.

(Note: If the lead frame had been made from a different alloy, or the epoxy had been a higher grade, this failure would have had little or no chance of occurring. I find it hard to justify the cost differential given the ultimate cost to the end users and the manufacturer.)

(I was a field engineer in the mid-to-late 1970s and spent many an hour replacing ICs with this problem.)

The only epoxy packages that have made it to the present day used better materials and thus are still functional. There are thousands of functioning ICs on circuit boards in the Living Computers collection heading toward their 50th anniversary, along with a spares inventory of thousands more.

Mechanical Switches

Whether toggle, pushbutton, slide, micro, or rotary, mechanical switches are all over the lifetime map. In the commercial world, switches are rated for a maximum number of actuations at a specified current. For the most part, the switches I have encountered meet or exceed the actuation specification. There is a limitation, though.

Aging of Beryllium Copper

If the internal mechanical design uses a beryllium copper flat or coil spring, the switch has almost surely failed by the time we at the museum encounter it. Beryllium copper goes from supple and springy to brittle after 30 years or so.

The result, as you may have guessed, is a non-functioning machine due to switches that operate intermittently or not at all. (We had a whole line of memory cabinets, with hundreds of bright shiny toggle switches for memory mapping, that wouldn't function until we replaced the switches.)

Circuit Breakers

These fall into the beryllium copper spring family, so they have pretty much failed. In addition to not closing, their trip point has typically shifted, so you get a premature trip or a trip well above the rated point (sometimes no trip at all, and the protected circuit burns up).

Slide Switches

We’ve had one surprising winner in the switch longevity department, and that is slide switches. With few exceptions ( usually due to mechanical damage to the sliding element ), an intact slide switch can be quickly resurrected with cleaning and a little light oil.

Rotary Switches

The runner-up in the longevity game is the rotary switch. Typically, all a wonky rotary switch needs is a spray of alcohol cleaner and contact lubricant, and it functions, no matter what its age. (We have hardware going back to the 1920s whose rotary switches are still functional.) You can consider the switch failed if the contact wafer is cracked or broken. My guess as to the longevity is that the phenolic wafer gains significant mechanical strength from being riveted.

Relays

These devices are essentially electrically actuated switches. Their most common configuration uses some kind of leaf spring. If the spring is beryllium copper based, you have, of course, a failed relay 30 years hence. Rotary relays tend to be fairly reliable, but require a little more maintenance to keep them going. Mercury-wetted relays are fairly reliable (we still have some running in a couple of pieces of hardware), but are not recommended because of their mercury content and because the mercury is contained in a fragile glass envelope.

Designed to Wear Out / Designed to Last – Intro

This is an article series whose purpose is to shine a comprehensive light on one important aspect of technology that only gets passing mention: our ability to determine how long a component or system can function based on the engineering decisions made in the interests of monetary considerations and/or reliability. This is especially germane today, as decisions about discarding a technology item at “end of life” now determine how much toxic waste we are loading into the environment. (I have yet to find an “obsolete” cellphone or computer that didn't work perfectly when discarded, barring mechanical or liquid-immersion damage.) The definition of obsolete depends on who you ask. The new smartphone you buy today is only a few percent actual new technology compared to the old smartphone you are discarding.

So we seem to have uncovered the operating model that most companies producing and selling technology have adopted:

The model for tech from approximately the 1920s to today involves having key components in your product with a known MTBF (mean time between failures). Thus you can predict a known replacement rate for your product. Then start a second product stream offering replacement parts for the ones you know are going to predictably fail.
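
The arithmetic behind that replacement stream is simple; a sketch with made-up numbers:

    units_in_field = 100_000   # made-up fleet size
    mtbf_years = 8             # made-up MTBF of the key wear-out component

    # With a roughly constant failure rate, the fleet generates a steady,
    # predictable stream of replacement-part sales:
    replacements_per_year = units_in_field / mtbf_years
    print(f"about {replacements_per_year:,.0f} replacement parts per year")   # 12,500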

Unless or until the majority of your users are technically savvy enough to realize this is a way to keep selling products in a saturated market, one is assured of continuing and predictable sales and profit.

Vacuum Tubes

We start with old technology.

One could argue that vacuum tubes were inherently unreliable, so they made them as robust as possible and just accepted their limitations. This turns out to be a bit wide of the truth.

In a step-wise fashion, the manufacturing process evolution for the vacuum tube went something like this:

  1. The first vacuum tubes were handmade with limited production.
  2. As soon as production ramped up to feed demand, it was inevitable that the manufacturer would tweak the processes used to make a vacuum tube, minimizing the time it takes to go from raw materials to ready-to-assemble parts to a finished product (using hands, jigs, and other production equipment).
  3. As soon as all of the relevant costs (process tweaks and labor input reductions) have been driven out of the system of producing vacuum tubes, the manufacturer rightly looks to other means to make a profit. Sooner or later the ecology of manufacturers who produce vacuum tubes reaches an equilibrium which, despite their best efforts, doesn't allow any greater profitability.
  4. Product lifetime now looms large. Sell more product over time, make more profit.

( Note: this describes the manufacturing process evolution for most “tech” products. )

Ways To Sell More Vacuum Tubes

  1. I only learned about this particular lifetime determinant recently. It seems that in order to get the filament temperature high enough for electron emission, the tungsten had to be alloyed with thorium, which raises the melting point to that of a tungsten/thorium alloy (well above the temperature where the alloy emits electrons). The amount of alloying is directly proportional to the DC current-over-time applied in a plating tank. Tube life is directly proportional to the time it takes for the thorium to boil off: more thorium, more lifetime. When the final boil-off occurs, the filament temperature rises above the melting point of the tungsten, and the filament overheats and burns out. Vacuum tubes manufactured for the US Military were larded up with thorium, and thus met the extended lifetime specifications the US military demanded.
  2. A getter is a deposit of reactive material that is placed inside a vacuum system for the purpose of completing and maintaining the vacuum ( https://en.wikipedia.org/wiki/Getter ). The quality and amounts of the various elements are adjustable by the manufacturer and will determine vacuum tube life. In addition, you can get a short-lifetime tube just by eliminating the getter. (We have plenty of examples without getters.)
  3. In order to assure that parts were available as tubes wore out “normally”, the tube manufacturers put a tube tester (stocked with replacement tubes) in every convenient location throughout a geographic area. (Drug and hardware stores, typically.)
  4. Make and sell more end products using the same (or slightly improved) technology. This is accomplished by the “Longer, Lower, Wider” paradigm used by the auto industry (and then applied to radios, televisions, etc.) starting around the 1940s. This involves making a greater variety of the same (or slightly modified) product in a different package year to year. Anyone who has read those old ads can see this methodology in action.

Some of the environmental results of the vacuum tube era were:

  1. Mounds of glass, along with refined metals (tungsten, thorium, steel, and others) and minerals (mica), were added to landfills. It is unclear if any of the glass was recycled.
  2. Some of the metals and minerals were leached into the ground by rain.
  3. Large amounts of scrap metal (chassis) and wood (cabinets) were either recycled or ended up in landfills and rotted into the ground.

The XKL Toad-1 System

The XKL Toad-1 System (hereafter “the Toad”) is an extended clone of the DECSYSTEM-20, the third generation of the PDP-10 family of computers from Digital Equipment Corporation. What does that mean? To answer that requires us to step back into the intertwined history of DEC, BBN,1 SAIL,2 and other parts of Stanford University’s computing community and environs.

It’s a long story. Get comfortable. I think it will be worth your time.

The PDP-10

The PDP-10 family (which includes the earlier PDP-6) is a typical mainframe computer of the mid-1960s. Like many science oriented computers prior to IBM’s System/360 line, the PDP-10 architecture addressed binary words which were 36 bits long, rather than individual characters as was common in business oriented systems. In instructions, memory addresses took up half of the 36 bit word; 18 bits is enough to address 262,144 locations, or 256KW, a very large memory in the days when each bit of magnetic core cost about $1.00.3 Typical installations had 64KW or 96KW attached. The KA-10 CPU in the first generation PDP-104 could not handle any more memory than that.
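
For a sense of scale, a small worked example using the figures above; the dollar-per-bit number comes from the text, and the rest is straightforward arithmetic:

    WORD_BITS = 36
    COST_PER_BIT = 1.00        # dollars per bit of magnetic core, per the figure above

    def core_cost(words):
        return words * WORD_BITS * COST_PER_BIT

    full_memory = 2**18        # 262,144 words, the most an 18-bit address can reach
    typical = 96 * 1024        # a typical 96KW installation

    print(f"Full 256KW of core: ${core_cost(full_memory):,.0f}")   # about $9.4 million
    print(f"Typical 96KW:       ${core_cost(typical):,.0f}")       # about $3.5 million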

Another important feature of the PDP-10 was timesharing, a facility by which each of multiple users of the computer was given the illusion of being alone in interacting with the system. The PDP-6 was in fact the first commercially available system to feature interactive timesharing as a standard facility rather than as an added-cost item.

TENEX

In the late 1960s, virtual memory was an important topic of research: How to make use of the much larger, less expensive capacity of direct access media such as disks and drums to extend the address space of computers, instead of the very expensive option of adding more core.5 One company which was looking at the issue was Bolt, Beranek, & Newman, who were interested in demand paged virtual memory, that is, viewing memory as made up of chunks, or pages, accessed independently, and what an operating system which had access to such facilities would look like.

To facilitate this research, BBN created a pager which they attached to a DEC PDP-10, and began writing an operating system which they called TENEX, for “PDP-10 Executive”.6 TENEX was very different from Tops-10, the operating system provided by DEC, but was interactive in the same way as the older OS. The big difference was that more programs could run at the same time, because only the currently executing portions of each program needed to be present in the main (non-virtual) memory of the computer.

TENEX was a popular operating system, especially in university settings, so many PDP-10s had the BBN pager attached. In fact, the BBN pager was also used on a PDP-10 system which ran neither TENEX nor Tops-10, to wit, the WAITS system at SAIL.7

The DECsystem-10

The second generation of the PDP-10 underwent a name change, to the DECsystem-10, as well as gaining a faster new processor, the KI-10. This changed the way memory was handled, by adding a pager which divided memory up into 512 word blocks (“pages”). Programs were still restricted to 18 bits of address like previous generations, but the CPU could now handle 22 bits of address in the pager, so the physical memory could be up to four megawords (4MW), which is 16 times as much as the KA-10.

This pager was not compatible with, and was much less capable than, the BBN device, although DEC provided a version of TENEX modified to work with the KI pager for customers willing to pay extra. Some customers considered this to be too little, too late.

SAIL and the Super FOONLY

In the late 1960s, computer operating systems were an object of study in the broader area of artificial intelligence research. This was true of the Stanford Artificial Intelligence Laboratory, for example, where the PDP-6 timesharing monitor8 had been heavily modified to make it more useful for AI researchers. When the PDP-10 came out three years later, SAIL acquired one, attached a BBN pager, and connected it to the PDP-6, modifying the monitor (now named Tops-10) to run on both CPUs, with the 10 handing jobs off to the 6 if they called for equipment attached to the latter. By 1972, the monitor had diverged so greatly from Tops-10 that it received a new name, WAITS.

But the hardware was old and slow, and a faster system was desired. The KI-10 processor was underpowered from the perspective of the SAIL researchers, so they began designing their own PDP-10 compatible system, the Super FOONLY.9 This design featured a BBN-style pager and used very fast semiconductors10 in its circuitry. It also expanded the pager address to 22 bits, like the KI-10, so it was capable of addressing up to 4MW of memory. Finally, unlike the DEC systems, this system was built around a fast microcoded processor which implemented the PDP-10 architecture as firmware rather than as special-purpose hardware.

DECSYSTEM-20 and TOPS-20

DEC was aware of the discontent with their new system among customers; to remedy the situation, they purchased the design of the SuperFOONLY from Stanford, and hired a graduate student from SAIL to install and maintain the SUDS drawing system at DEC’s facilities in Massachusetts. The decision was made to keep the KI-10 pager design in the hardware, and implement the BBN style pager in microcode.

Because of the demand for TENEX from a large part of their customer base, DEC also decided to port the BBN operating system to the new hardware based on the SAIL design. DEC added certain features to the new operating system which had been userland code libraries in TENEX, such as command processing, so that a single style of command handling was available to all programmers.

When DEC announced the new system as the DECSYSTEM-20, with its brand new operating system called TOPS-20, they fully expected customers who wanted to use the new hardware would flock to it, and would port all of their applications from Tops-10 to TOPS-20, even though the new OS did not support many older peripherals on which the existing applications relied. The customers rebelled, and DEC was forced to port Tops-10 to the new hardware, offering different microcode to support the older OS on the new KL-10 processor.

Code Name: Jupiter

DEC focused on expanding the capabilities of their flagship minicomputer line, the PDP-11 family, for the next few years, with a planned enhancement to take the line from 16 bit mini to 32 bit supermini. The end result was an entirely new family, the VAX, which offered virtual memory like the PDP-10 mainframes in a new lower cost package.

But DEC did not forget their mainframe customer base. They began designing a new PDP-10 system, intended to include enhanced peripherals, support more memory, and run much faster than the KL-10 in the current Dec-10/DEC-20 systems. As part of the design, codenamed “Jupiter”, the limited 18 bit address space of the older systems was upgraded to 30 bits, that is, a memory size of one gigaword (1GW = 1024MW), which was nearly 2.5 times the size of the equivalent VAX memory, and far larger than the memory sizes available in the IBM offerings of the period.
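
The address-width progression across the generations discussed here reduces to powers of two (all counts are 36-bit words, not bytes):

    # Address width -> addressable words for the generations discussed here.
    generations = [
        (18, "KA-10/KI-10 program addresses (256KW)"),
        (22, "KI-10/KL-10 physical memory (4MW)"),
        (30, "Jupiter address space (1GW)"),
    ]
    for bits, label in generations:
        print(f"{bits:2d} bits -> {2**bits:>13,} words   {label}")

    # 18 bits ->       262,144 words   KA-10/KI-10 program addresses (256KW)
    # 22 bits ->     4,194,304 words   KI-10/KL-10 physical memory (4MW)
    # 30 bits -> 1,073,741,824 words   Jupiter address space (1GW)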

Based on the promise of the Jupiter systems, customers made do with the KL-10 systems which were available, often running multiple systems to make up for the lack of horsepower. Features were added to the KL, by changes to the microcode as well as by adding new hardware. The KL-10 was enhanced with the ability to address the new 30-bit address space, although the implementation was limited to addressing 23 bits (where the hardware only handled 22); thus, although a system maxed out at 4MW, virtual memory could make it look like 8MW.

DEC also created a minicomputer sized variant of the PDP-10 family, which they called the DECSYSTEM-2020. This was intended to extend the family into department sized entities, rather than the corporation sized mainframe members of the family.11 There was also some interest in creating a desktop variant; one young engineer was well known for pushing the idea of a “10 on a desk”, although his idea was never prototyped at DEC.

DEC canceled the Jupiter project, apparently destined to be named the DECSYSTEM-40, in May 1983, with an announcement to the Large Systems customers at the semiannual DECUS symposia. Customer outrage was so great that DEC agreed to continue hardware development on the KL-10 until 1988, and software development across the family until 1993.

Stanford University Network

In 1980, there were about a dozen sites at Stanford University which housed PDP-10 systems, mostly KL-10 systems running TOPS-20 but also places like SAIL, which had attached a KL-10 to the WAITS dual processor. Three of the TOPS-20 sites were the Computer Science Department (“CSD”), the Graduate School of Business (“GSB”), and the academic computing facility called LOTS.12

At this time, local-area networking was seen as a key element in the future of computing, and the director of LOTS (whom we'll call “R”) wrote a white paper on the future of Ethernet13 on the campus. R also envisioned a student computer, what today we would call a workstation, which featured a megabyte of memory, a million pixels on the screen, a processor capable of executing a million instructions per second, and an Ethernet connection capable of transferring a million bits of data per second, which he called the “4M machine”.

Networking also excited the director of the CSD computer facility, whom we'll call “L”.14 L designed an Ethernet interface for the KL-10 processors in the DEC-20s which were ubiquitous at Stanford. This was dubbed the Massbus-Ethernet Interface Subsystem, or MEIS,15 pronounced “maze”.

The director of the GSB computer facility, whom we’ll call “S”, was likewise interested in networking, as well as being a brilliant programmer herself. (Of some importance to the story is the fact that she was eventually married to L.) S assigned one of the programmers working for her to add code to the TOPS-20 operating system to support the MEIS, using the PUP protocols created at PARC for the Alto personal computer.16

The various DEC-20 systems were scattered across the Stanford campus, each one freestanding in a computer room. R, L, and S ran miles of 50-ohm coaxial cable, the medium of the original Ethernet, through the so-called steam tunnels under the campus, connecting all the new MEISes together. Now it was possible to transfer files between DEC-20s from the command line rather than by writing them to a tape and carrying it from one site to another. It was also possible to log in from one DEC-20 to another, but using one mainframe to connect to another seemed wasteful of resources on the source system, so L came up with a solution.

R's dream of a 4M machine had borne fruit: While still at CSD, he had a graduate student create the design for the Stanford University Network processor board. L repurposed the SUN-1 board17 as the processor in a terminal interface processor (“EtherTIP”), in imitation of the TIPs used by systems connected to the ARPANET and to commercial networks like Tymnet and Telenet. Now, instead of wiring terminals directly to a single mainframe and using the mainframe to connect from one place to another, the terminals could be wired to an EtherTIP and could freely connect to any system on the Ethernet.

A feature of the PUP protocols invented at PARC was the concept of internetworking: connecting two or more Ethernets together to make a larger network. This is done by using a computer connected to both networks to forward data from each to the other. At PARC, a dedicated Alto acted as the router for this purpose; L designated some of the SUN-1 based systems as routers rather than as EtherTIPs, and the Stanford network was complete.

Stanford University also supported a number of researchers who were given access to the ARPANET as part of their government-sponsored research, so several of the PDP-10s on campus were connected to the ARPANET. When the ARPANET converted to the TCP/IP protocols, which had been developed for the purpose of bringing internetworking to wide area networks, our threesome were ready, and assigned programmers from CSD, GSB, and LOTS to make L's Ethernet routers speak TCP/IP as well as PUP. TOPS-20 was also updated to use TCP/IP, by Stanford programmers as well as by DEC.

S and L saw a business opportunity in all this, and began a small company to sell the MEIS and the associated routers and TIPs to companies and universities who wanted to add Ethernet to their facilities. They saw this as a way to finance the development of L’s long-cherished dream of a desktop PDP-10. They eventually left Stanford as the company grew, as it had tapped the exploding networking market at just the right time. The company grew so large in fact that the board of directors discarded the plan to build L’s system, and so the founders left Cisco Systems to pursue other opportunities.

XKL

L moved to Redmond in 1990, where he founded XKL Systems Corporation. The company's business plan was to build the “10 on a desk”. The product was codenamed “TOAD”, which is what L had been calling his idea for a decade and a half, because “Ten On A Desktop” is a mouthful. He hired a small team of engineers, including his old friend R from Stanford, to build a system which implemented the full 30-bit address space which DEC had abandoned with the cancelled Jupiter project, and which included modern peripherals and networking capabilities.18 R was assigned as Chief Architect; his job was to ensure that the TOAD was fully compatible with the entire PDP-10 family, without necessarily replicating every bug in the earlier systems.

R also oversaw the port of TOPS-20 to the new hardware, although some boards19 had a pair of engineers assigned: One handled the detailed design and implementation of the board, while the other worked on the changes to the relevant portion of the operating system. R was responsible for the changes which related to the TOAD’s new bus architecture, as well as those relating to the much larger memory which the TOAD supported and the new CPU.20

The TOAD was supposed to come to market with a boring name, the “TD-1”, but ran into trademark issues. By that time, I was working at XKL, officially doing pre- and post-sales customer advocacy, but also working on the TOPS-20 port.21 Part of my customer advocacy duties was some low-key marketing; when we lost the proposed name, I pointed out that people had been hearing about L’s TOAD for years, and we should simply go with it; S, considered the unofficial “Arbiter of Taste” at XKL, agreed with me.22 We officially introduced the XKL Toad-1 System at a DECUS trade show in the spring of 1995.