Introducing Darkstar: A Xerox Star Emulator

Star History and Development

The Xerox 810 Information System (“Star”)

In 1981, Xerox released the Xerox 8010 Information System (codenamed “Dandelion” during development) and commonly referred to as the Star. The Star took what Xerox learned from the research and experimentation done with the Alto at Xerox PARC and attempted to build a commercial product from it.  It was envisioned as center point of the office of the future, combining high-resolution graphics with the now-familiar mouse, Ethernet networking for sharing and collaborating, and Xerox’s Laser Printer technology for faithful “WYSIWYG” document reproduction.  The Star’s operating system (called “Star” at the outset, though later renamed “Viewpoint”) introduced the Desktop Metaphor to the world.  In combination with the Star’s unique keyboard it provided a flexible, intuitive environment for creating and collaborating on documents and mail in a networked office environment.

The Star’s Keyboard

Xerox later sold the Star hardware as the “Xerox 1108 Scientific Information Processor” – In this form it competed with Lisp workstations from Symbolics, LMI, and Texas Instruments in the burgeoning AI workstation market and while it wasn’t quite as powerful as any of their offerings it was considerably more affordable – and sometimes much smaller.  (The Symbolics 3600 workstation, c. 1983 was the size of a refrigerator and cost over $100,000).

The Star never sold well – it was expensive ($16,500 for a single workstation and most offices would need far more than just one) and despite being flexible and powerful, it was also quite slow. Unlike the IBM PC, which also made its debut in 1981 and would eventually sell millions, Xerox ended up selling somewhere in the neighborhood of 25,000 systems, making the task of finding a working Star a challenge these days.

Given its history and relationship to the Alto, the Star seemed appropriate for my next emulation project. (You can find the Alto emulator, ContrAlto, here). As with the Alto a substantial amount of detailed hardware documentation had been preserved and archived, making it possible to learn about the machine’s inner workings… except in a few rather important places:


From the March 1982 edition of the Dandelion Hardware Manual.  Still waiting for these sections to be written…

Fortunately, Al Kossow at Bitsavers was able to provide extra documentation that filled in most of the holes.  Cross-referencing all of this with the available schematics, it looked like there was enough information to make the project possible.

The Dandelion Hardware

The Star’s Central Processor (CP). Note the ALU (4xAM2901, top middle) and 4KW microcode store (bottom)

Much like the Alto, the Dandelion’s Central Processor (referred to as the “CP”) is microcoded, and, again like the Alto, this microcode is responsible for controlling various peripherals, including the display, Ethernet, and hard drive.  The CP is also responsible for executing bytecode macroinstructions.  These macroinstructions are what the Star’s user programs and operating systems are actually compiled to.  The CP is sometimes referred to as the “Mesa” processor because it was designed to efficiently execute Mesa bytecodes, but it was in no way limited to implementing just the Mesa instruction set: The Interlisp-D and Smalltalk systems defined their own microcode for executing their own bytecodes, custom-tailored and optimized to their environments.

Mesa was a strongly-typed “high-level language.” (Xerox hackers loved their puns…) It originated on the Alto but quickly grew too large for it (a smaller, stripped-down Mesa called “Butte” (i.e. “a small Mesa”) existed for the Alto but was still fairly unwieldy.)  The Star’s primary operating system was written in Mesa, which allowed a set of very sophisticated tools to be developed in a relatively short period of time.

The Star architecture offloaded the control of lower-speed devices (the keyboard and mouse, serial ports, and the floppy drive) to an 8-bit Intel 8085-based I/O processor board, referred to as the IOP.  The IOP is responsible for booting the system: it runs basic diagnostics, loads microcode into the Central Processor and starts it running.  Once the CP is running, it takes over and works in tandem with the IOP.

Emulator Development

The Star’s I/O Processor (IOP). Intel 8085 is center-right.

Since the IOP brings the whole system up, it seemed that the IOP was the logical place to begin implementing the emulator.  I started with an emulation of the 8085 processor and hooked up the IOP ROMs and RAMs.  Since the first thing the IOP does at power up or reset is execute a vigorous set of self-tests, the IOP was, in effect, testing my work as I progressed which was extremely helpful.  This is one important lesson Xerox learned from the Alto and applied to the Star: on-board diagnostics are a good thing.  The Alto had no diagnostic facilities built in so if anything failed that prevented the system from running the only way to determine the fault was to get out the oscilloscope and the schematics and start probing.  On the Star, diagnostics and status are reported through a 4-digit LED display, the “Maintenance Panel” (or MP for short).  If the IOP finds a fault during testing, it presents a series of codes on this panel.  During a normal system boot, various codes are displayed to indicate progress.  The MP was the first I/O device I emulated on the IOP, for obvious reasons.

Development on the IOP progressed nicely for several weeks (and the codes reported in the emulated MP kept increasing, reflecting my progress in a quantitative way) and during this time I implemented a source-level debugger for the IOP’s 8085 code to help me along.  This was invaluable in working out what the IOP was trying to do and why it was failing to do so.  It allowed me to step through the original code, place breakpoints, and investigate the contents of the IOP’s registers and memory while the emulated system was running.

The IOP Debugger

Once the IOP self-tests were passing, the IOP emulation was running to the point where it attempted to actually boot the Central Processor!  This meant I had to shift gears and switch over to implementing an emulation of the CP and make it talk to the IOP. This is where the real fun began.

For the next couple of months I hunkered down and implemented a rough emulation of the CP, starting with system’s 16-bit ALU (implemented with four 4-bit AM2901 ALU chips chained together).  The 2901 (see top portion of the following diagram) forms the nexus of the processor; in addition to providing the processor’s 16 registers and basic arithmetic and logical operations, it is the primary data path between the “X bus” and “Y bus.”  The X Bus provides inputs to the ALU from various sources: I/O devices, main memory, a handful of special-purpose register files and the Mesa stack and bytecode buffer.  The ALU’s output connects to the Y bus, providing inputs back into these same components.

The Star Central Processor Data Paths

One of the major issues I was confronted with nearly immediately when writing the CP emulation was one of fidelity: how faithful to the hardware does this emulation need to be? This issue arose specifically because of two hardware details related to the ALU and its inputs:

  1. The AM2901 ALU has a set of flags that get raised based on the result of an ALU operation (for example, the “Carry” flag gets raised if the result of an operation causes a carry out from the most-significant bits). For arithmetic operations these flags make sense but the 2901 also sets these flags as the result of logical operations. The meaning of the flags in these cases is opaque and of no real use to programmers (what does it mean for a “carry” flag to be set as a result of a logical OR?) and exist only as a side-effect of the ALU’s internal logic. But they are documented in the spec sheet (see the picture below).
  2. With a 137ns clock cycle time, the CP pushes the underlying hardware to its limits. As a result, some combinations of input sources requested by a microinstruction will not produce valid results because the data simply cannot all make it to its destination on time. Some combinations will produce garbage in all bits, but some will be correct only in the lower nibble or byte of the result, with the upper bits being undefined. (This is due to the ALU in the CP being comprised of four 4-bit ALUs chained together.)
Logic equations for the “NOT R XOR S” ALU operation’s flags. What it means is an exercise left to the reader.

I spent a good deal of time pondering and experimenting. For #1, I decided to implement my ALU emulation with the assumption that Xerox’s microcode would not make use of the condition flags for non-arithmetic operations, as I could see no reason to make use of them for logical ops and implementing the equations for all of them would be computationally expensive, making the emulation slower. This ended up being a valid assumption for all logical ops except for OR — as it turns out, some microcode assumed that the Carry flag would be set appropriately for this class of operation. When this issue was found, I added the appropriate operations to my ALU implementation.

For #2 I assumed that if Xerox’s microcode made use of any “invalid” combinations of input sources, that it wouldn’t depend on the garbage portion of the results. (That is, if code made use of microinstructions that would only produce valid results in the lower 4 or 8 bits, the microcode would also only depend on the lower 4 or 8 bits generated.) Thus the emulated ALU always produces a complete, correct result across all 16-bits regardless of input source. This assumption appears to have held — I have encountered no real-world microcode that makes assumptions about undefined results thus far.

The above compromises were made for reasons of implementation simplicity and efficiency. The downside is that it is possible to write microcode that will behave differently on the emulation than on the real hardware. However, going through the time, trouble, and expense of a 100% accurate emulation did not seem worth it when no real microcode would ever require this level of accuracy. Emulation is full of trade-offs like this. It would be great to provide an emulation that is perfect in every respect, but sometimes compromises must be made.

I implemented a debugger and disassembler for the CP similar to the one I put together when emulating the IOP.  Emulation of the various X bus-related registers and devices followed, and slowly but surely the CP started passing boot diagnostics as I fixed bugs and implemented missing hardware.  Finally it reached the point where it moved from the diagnostic stage to executing the first Mesa bytecodes of the operating system – the Star was now executing real code!  At that time it seemed appropriate to implement the Star’s display controller so I could see what the Star was trying to tell me – and a few days and much debugging of the central processor later I was greeted with this display from the install floppy (and there was much rejoicing):

The emulated Star says “Hello” for the very first time

Following this I spent two weeks of late nights hacking — implementing the hard disk controller and fixing bugs.  The Star’s hard drive controller doesn’t use an off-the-shelf controller chip as this wasn’t an option at the time the Star was being developed in the late 1970s. It’s a very clever, minimal design with most of the heavy lifting being done in microcode rather than hardware. Thus the emulation has to work at a very low level, simulating (in a sense) the rotation of the platters and providing data from the disk as it moves under the heads, one word at a time (and at just the right time.)

During this period I also got to learn how Xerox’s hard disk formatting and diagnostic tools worked.  This involved some reverse engineering:  Xerox didn’t want end-users to be able to do destructive things with their hard disks so these tools were password protected.  If you needed your drive reformatted you called a Xerox service engineer and they came out to take care of it (for a minor service charge).  These days, these service engineers are in short supply for some reason.

Luckily, the passcodes are stored in plaintext on the floppy disk so they were easy to unearth.  For future reference, the password is “wizard” or “elf” (if you’re so inclined):

Having solved The Mystery of the Missing Passwords I was at last able to format a virtual hard disk and install Viewpoint, and after waiting nervously for the installation to finish I was rewarded with:

Viewpoint, at long last!

Everything looked good, until the hard disk immediately corrupted itself and the system crashed!  It was very encouraging to see a real operating system running (or nearly so), and over the following weeks I hammered out the remaining issues and started on a design for a real user interface for the emulator. 

I gave it a name: Darkstar.  It starts with a “D” (thus falling in line with the rest of the “D-Machines” produced by Xerox) contains “Star” in the name, and is also a nerdy reference to a cult-classic sci-fi film.  Perfect. 

Getting Darkstar

Darkstar is available for download on our Github site and is open source under the BSD 2-Clause license.  It runs on Windows and on Unix systems using the Mono runtime.  It is still very much a work in progress.  Feedback, bug reports, and contributions are always welcome.

Fun with the Star

You’ve downloaded and installed Darkstar and have perused the documentation – now what?  Darkstar doesn’t come with any Xerox software, but pre-built hard disk images are available on Bitsavers (and for the more adventurous among you, piles of floppy disk images are available if you want to install something yourself).  Grab http://bitsavers.org/bits/Xerox/8010/8010_hd_images.zip — this contains hard disk images for Viewpoint 2.0, XDE 5.0, and The Harmony release of Interlisp-D. 

You’ll probably want to start with Viewpoint; it’s the crowning achievement of the Star and it invented the desktop metaphor, with icons representing documents and folders. 

To boot Viewpoint successfully you will need to set the emulated Star’s time and date appropriately – Xerox imposed a very strict licensing scheme (referred to as Product Factoring) typically with licenses that expired monthly.  Without a valid license code, Viewpoint grants users a 6-day grace period, after which all programs are deactivated. 

Since this is an emulation, we can control everything about the system so we can tell the emulated Star that it’s always just a few hours after the installation took place, bypassing the grace period expiration and allowing you to play with Viewpoint for as long as you like.  Set the date to Nov. 10, 1990 and start the system running.

Now wait.  The system is running diagnostics.

Keep waiting.  Viewpoint is loading.

Go get a coffee.

Seriously, it takes a while for Viewpoint to start up.  Xerox didn’t intend for users to reboot their Stars very often, apparently.  Once everything is loaded a graphic of a keyboard will start bouncing around the screen:

The Bouncing Keyboard

Press any key or click the mouse to get started and you will be presented with the Viewpoint Logon Option Sheet:

The Logon Option Sheet

You can login with user name “user” with password “password”.  Hit the “Next” key (mapped to “Home” on your computer’s keyboard) to move between fields, or use the mouse to click in them.  Click on the “Start” button to log in and in a few moments, there you are:

Initial Viewpoint Desktop

The world is your oyster.  Some things work as you expect – click on things to select them, double-click to open them.  Some things work a little differently – you can’t drag icons around with the mouse as you might expect: if you want to move them, use the “Move” key (F6) on your keyboard; if you want to copy them, use the “Copy” key (F4).  These two keys apply to most objects in the system: files, folders, graphical objects, you name it.  The Star made excellent use of the mouse, but it was also very keyboard-centric and employed a keyboard designed to work efficiently with the operating system and tools.  Documentation for the system is available online – check out the PDFs at http://bitsavers.org/pdf/xerox/viewpoint/VP_2.0/, as they’re worth a read to familiarize yourself with the system. 

If you want to write a new document, you can open up the “Blank Document” icon, click the “Edit” button and start writing your magnum opus:

Plagiarism?

One can change text and paragraph properties – font type, size, weight and all other sorts of groovy things by selecting text with the mouse (use the left mouse to select the starting point, and the right to define the end) and pressing the “Prop’s” key (F8):

Mad Props

If you’re an artist or just want to pretend that you are one, open up the “Blank Canvas” icon on the desktop:

MSPaint.exe’s Great Grandpappy

Need to do some quick calculations?  Check out the Calculator accessory:

I’m the Operator with my Pocket Calculator
Help!

There are of course many more things that can be done with Viewpoint, far too many to cover here.  Check out the extensive documentation as linked previously, and also look at the online training and help available from within Viewpoint itself (check the “Help” tab in the upper-right corner.)

Viewpoint is only one of a handful of systems that can be run on Darkstar. Stay tuned for future installments, covering XDE and Interlisp-D!

Life with a 51 year old CDC 6500

The CDC 6500 has led a rough life over the last 6 months or so: way back on the afternoon of July 2, 2018, I got an email from the CDC’s Power Control PLC telling me that it had to turn off the computer because the cooling water was too hot! A technician came out and found that the chiller was low on refrigerant. He brought it back up to the proper level, and went away. Next morning it was down again.

After much gnashing of teeth and tearing of hair, it was determined that the compressor was bad in the chiller. “We’ll have a new one in 5 weeks!” The new one turned out to be bad too, and so another was ordered, that was easier to get. Only about 3 weeks, instead of the 8 that the official one took. That worked for a few weeks, and the CDC went down again because the water was too hot.

This time it was very puzzling because as long as the technician was here, it worked fine. He spent most of a morning watching it, decided it was OK, and left, but he didn’t make it to the freeway before it went down again. He came back, and watched for the rest of the afternoon, and found that the main condenser fan would overheat, and shut down, causing the backup fan to come on. The load wasn’t very high, so the backup fan had to cycle on and off, while the main fan motor cooled off. This would go on for a while, till both motors were off at the same time, and then the compressor would go over pressure because the condenser fans were off, and the chiller would stop cooling, resulting in the “Water too HOT” computer shutdown.

Another week went by waiting for replacement fan motors from the chiller manufacturer, with no luck. Eventually we gave up and got new fan motors locally, installed them and the chiller has been working since. While the CDC didn’t seem to mind being off for 102 days for the compressor problem, it didn’t like being off for 3 weeks while we fiddled with the fans.

Both when it was off for 102 days, and this time, we found that Bay 1 was low on refrigerant. The first time we just filled it up, but the second time we looked closer, and found that there is a small leak where the power wires go into the Bay 1 compressor. The compressor manufacturer, the same guys that made the chillers compressor, will gladly sell us a new compressor, but the parts for the 50 year old, R12 compressor, are no longer available. We are working on that, but I haven’t heard that we found the parts yet.

Back to more recent times: now that the chiller is chillin’, and the CDC’s cooling system is coolin’, why isn’t the computer computing?

Let’s run some diagnostics and see what happens: I try to run my CDC diagnostic tape, but the machine complains that Central Memory doesn’t seem to be available. No, I didn’t run the real tape drives, I ran the imaginary one that uses a disk file on a PC to pretend to be  a tape drive. Anyway, that didn’t work, so I flip the zillion or so Dead Start switches in my emulated Dead Start panel, to fire up my Central processor based memory test, and get no display at all! This is distinctly unusual. Let’s try my PP based Central Memory test: That seems to work till it finishes writing all of memory, then the display goes blank. Is there a pattern here?

I put a scope probe on the memory request line inside the memory controller in Chassis 3, and find that someone is requesting memory every possible cycle. There are four possible requests: the Peripheral Processors as a group get a request line, each Central Processor gets a request line, and the missing Extended Core Memory gets a request. Let’s find out who it is: the PP’s aren’t doing it, neither of the CP’s are doing it, and the non-existent ECM isn’t doing it. Huh? Nobody is wants it, but ALL of it is getting requested!

I am going to step back a little bit, and try to explain why it sometimes takes me a while to fix this beast. This machine was designed before there were any standards about logic diagrams. Every manufacturer had to come up with their own scheme for schematics. Here is one where I found a problem, but we will get to that in a bit.

Now when there are two squares with one above the other, and arrows from each going to the other, those are flip-flops. When you have a square, or a circle with multiple arrows going into it, that is a gate. Which one is an “or” gate, and which one is an “and” gate? Sorry, you have to figure out that for your self, because the CDC documentation says either one can be either one. The triangle with a number in it would be a test point on the edge of the module. The two overlapping circles, kind of like an elipsis, indicate that is a coax cable receiver, as opposed to a regular twisted pair signal. A “P” followed by a number indicates a pin of the module.

This module receives the PP read and write signals from the PP’s in chassis 1, on pins P19 and P24. On the right side of the diagram, you can see where all the pins connect. If we look at pin 24, we can see it connects to W07 wire 904, and pin 19 is connected to W07-903. The W “Jack”s are coax cables, the other letter signals go somewhere inside this chassis.

Really, what we are looking at here, is that a circle, or a square is the collector pull-up resistor of one or more silicon NPN transistors. the arrow heads are the base of the transistors, and the line coming into the head has a base resistor in it. If there are three arrows coming into a square, like at the bottom, those three 2n2369 transistors all have their collectors tied together, with one pull-up resistor. I could be slow, because it took about 6 months before I felt I was at all fluent in reading these logic diagrams.

Now we have to talk about the Central Memory Architecture a bit. The CDC has 32 banks of 16K words of memory. Each of these banks is separate, and they can be interleaved any way the 4 requestors ask for them. At the moment, I am only running half of them, because there is something wrong with the other half. Each of these banks does a complete cycle in 1uS. The memory controller in chassis 3 can put out an address every 100nS, along with whether it is for a read or a write. This request goes to all banks in parallel. If the bank that it points to is available, he will send back an “accept” pulse, and that requestor is free to ask for something else. If the controller doesn’t get an “accept” he will ask again in about 400nS. There is a bunch of modules involved in this dance, and it is a big circle.

A little more background: This machine was designed before there was such a thing as plated through holes on printed circuit boards. The two boards in each module were double sided. What they did when they needed to get to the other side of a PCB, was they would put a tiny brass rivet in the via hole, and solder both sides.

What I eventually found was that the signal from P23 of the module in 3L34, wasn’t making it to pin 15! There was a via rivet that wasn’t making its connection to the other side of the board. I re-soldered all the vias on that module, and now we were only requesting memory when someone wanted it!

Now that we can request memory and have a reasonable chance of it responding correctly, it is on to testing memory. I loaded up my CP based test, and it ran… for a while. Then it quit, with  a very strange error. The test uses a single bit, and its complement to check the existence of every location of memory. It will read a location, and compare it with what should be there, and put the difference in a second register. Normally I would expect a single bit error, or maybe 12 bits if a module failed that way. The result looked like 59 bad bits, or the error being exactly the same as what it read. Usually this is because the CPU that is running the test is miss-executing the compare instruction.

While I was thinking about that, I ran Exchange Jump Test to see what that said. A PP can cause a CP to swap all its registers, including the Program Counter with the contents of some memory that the PP points to. This is called an Exchange Jump. The whole process happens in about 2.6uS as it requests 16 banks of memory in a sequence. This works the memory pretty hard. Exchange Jump Test (EJT) would fail after a while, and as I looked at the results, I noticed that it was usually failing a certain bit in bank 7. I checked, and noticed it was an original memory module, so I looked at my bench and found I didn’t have any new ones assembled, so I had to put the sides on a couple of finished PCB assemblies, and test them. I then swapped out the old memory in bank 7 with a new semiconductor memory, and EJT passed!

I then checked to see if my CP based memory test worked, and it did too. We are back in Business after over 5 months. I am keeping my fingers crossed in the hope that the chiller stays alive for a while.

Bruce Sherry

Detecting Nothing

The IBM360/30 gets stuck in a microcode loop.  The documentation indicates that a branch should be taken if the Z-bus is zero, and the branch should be taken.  The branch is not being taken.

A previous annoyance was that the microcode would stop at address 0xB46.  As the documentation indicates for that location, it is checking that a register is zero.  Hmm… checking for zero?  That is the problem with the loop not stopping.  So I dug deeper into this stop.  There was a stuck bit!  And here is what I found:

The circuit is an AND-OR-Invert gate and the output of the AND was high.  The above circuit is the AND gate.  If any of the inputs on the bottom go low the output should go low.  The output was not going low.  However, there is nothing on this circuit to force it low, but rather it allows the output to go low.  So, the problem was the input to the OR gate:

Aha!  That transistor with an X on it is not good.  Fortunately, we have spares of this SLT module, and replacing it fixed the problem with the first microcode stop.  However, the microcode loop with the non-taken branch is still not working as documented.  Dig deeper…

New peripherals for old Computers

Five years ago, when we were getting done with restoring our PDP10-KI, we were running out of working disk drives to run it from. We were down to one set of replacement heads, two working drives, and we didn’t have a source for new ones. We found some folks that said they could rebuild the packs, but it turned out that they couldn’t re-write the servo surface, so if we lost that we were in trouble.

Alright, what else can be done? The Digital RP06, the drive of choice for the KI, has lots of registers available from the MASBUS. The MASBUS is kind of a UNIBUS, with a synchronous data channel for moving the actual data. We had been having difficulty keeping track of everything on an existing project, so I looked into doing things a different way.

My idea was to use an FPGA, (Field Programmable Gate Array) to emulate the behavior of the control unit inside the RP06. This is like the ease of writing software, but for hardware. No wires to change, no cuts, when I mess up the logic. The PC would be responsible for handling the actual data for the disk, or possibly tape.

I spent a while poking around the Internet looking for an FPGA card that would plug into a PC. There were a lot of expensive, and some less expensive evaluation boards out there. I eventually happened across http://mesanet.com/. I had heard of these folks before from my experiences with LinuxCNC, which runs my milling machine at home. These folks have been doing this for a long time, and since they have to deal with industrial environments and big motors, their products are very robust.

For the MASBUS Disk Emulator, (MDE), I chose the Mesa 5i22 card, which has 96 I/O’s I can play with, and a Spartan3 Xilinx FPGA. The 5i22 doesn’t remember what the Xilinx configuration is, so the PC pours in the correct bits each time.

Bob Armstrong, down in the Silicon Valley, wrote all the software for the PC, and we eventually emulated RP06’s, RP07’s which hold twice the data, and TU77 tape drives. Here is a picture of our main collection of MDE’s.

There are 3 Industrial PCs, and 8 MASBUS Cable Driver/Receiver boxes. These are running both PDP10-KL’s, and the PDP10-KS’s. There is a real TU78 tape drive, to the left of the MDE rack. The PDP10-KI, and the PDP11/70 each have their own located elsewhere in the museum.

I also used the 5i22 for all of the emulations that we needed for the CDC6500:

 

Here we have one Industrial PC, and 6 6000 series channel attach Driver/Receivers, along with one 3000 series channel Driver/Receiver. We emulate the dead start panel, the DD60 Display, Tape drives, disk drives, printers, card readers, card punches, the serial terminal interface and the 6681 channel converter so we could talk to the real 405 card reader. Bob Armstrong also wrote the PC code for all these emulations.

Jeff Kaylin has also used 5i22 cards on the Sigma 9, with Bob doing the software, and Craig Arno and Glen Hermannsfeldt used one to emulate the card reader and punch on the IBM 360/20.

All is not sweetness and light however, after making the 5i22 for over 10 years, the parts are getting hard to obtain, so Mesa Electronics has stopped production of that board. We ordered all their remaining stock of the 5i22s. They no longer make the 5i22, but they make lots of other similar boards, so we ordered some 7i61’s and 7i80’s to play with.

The 7i61 uses USB to talk to a PC, and has 96 I/O’s to play with. The 7i80 uses Ethernet to talk to the PC, but only has 72 I/O’s. To conserve 5i22’s, I converted my CDC 6681 6000 to 3000 channel converter to the 7i61 because it needs all 4 cables for 96 I/Os. I used my code, along with open source code from Peter Wallace, of Mesa Electronics, to load my code into the serial flash chip on the 7i61, so the PC is no longer involved in the 6681 emulation. After turning on the power to the Mesa card, it knows how to be a 6681 automagicly! I no longer have to remember to type the proper incantation into the PC to get it loaded up, this is a GOOD thing!

After getting the CDC 6500 working, I had several broken modules that I wanted to fix, so I built a Module tester, using the 7i80 card:

You can see the 7i80-HD card under the cables down from the test card.

It took me a while to collect the appropriate bits of Peter Wallace’s code, so that I could have the Ethernet interface live in with my test code, all at the same time, but perseverance pays off, and it all works now. I have fixed 4 out of 5 broken UA modules, and I know why I am not going to fix the other UA, and the ED module that I replaced in CP1.

I have to confess: the module plugged into the tester, is not really from the 6500, but it is the same form factor, and technology.

I love these Mesa Electronics “Anything I/O” cards! As their name says, I can teach them to be most Anything!

Bruce Sherry 20180418.

That Pesky PS Module!

When we last left our hero, he had re-soldered all the Via rivets on one of the 510 “PS”, core memory sense amplifier modules in the CDC6500, and the machine was working.

That lasted about a day, and the memory went away again. What was wrong this time? You guessed it, bit 56 in bank 36 was bad again. Third time is the charm: I am going to replace this module! I head off looking for a spare PS module. Where did we put all those spare parts we got with the machine? Oh wait, we didn’t get any spares with the machine. Bummer!

This is where I get to practice my “MAD Skillz”, and make some spare PS modules. What does a PS module look like? What does CDC give me?

On the left side we have an actual schematic of one of the 4 amplifiers on the module, YAY! Having been around this block before, I take apart the offending module and check to see if it matches. Wiring wise, yes it matches, but the values have been changed to protect the innocent.

After a while playing with the newest version of Eagle, I ended up with this:

The easy part is done, now the fun starts! The circuitry for the module is split between two printed circuit boards, one that connects to the odd pins on the connector, and has odd numbered transistors, and one for the even bits. Eagle really doesn’t understand this, so I have to fool it. First I put test points on each side of all components that go between the boards. I have to then add in the wire jumpers that also go between the boards, and I end up with another schematic:

I take this schematic and duplicate it into odd and even sides, then I write an Eagle script to delete all the between boards components, and all the test points that belong on the other board. Here is what the schematic for the odd board looks like, pretty terrible:

Now I have two schematics: PS_O and PS_E, and I do the PC layout thing. I have the original boards to use as an example, which I follow very closely so that timing and signal integrity will be as close as I can to the original module. Here are pictures of the odd and even layouts:

But wait, I’m not done yet! Remember Eagle doesn’t understand the whole module. I now have to verify that the two boards, together with all the components that go between, match that original schematic I started with.

I go over every line on the board pairs, and the schematic highlighting them as I go, until EVERYTHING is highlighted.

Done yet? Grumble, grumble: No! I have forgotten to identify which end of the diodes and polarized capacitors have the band on them! If I want them assembled properly, I guess I should do that before they go out to FAB!

Back to the PC mines…

Bruce Sherry 20180201 10:57AM

Chasing the Pesky Ratio!

It seems like I did something really silly! I had to come up with some goals for 2018. I hate this time of year, I think everybody does. OK, what can I put down that is measurable and achievable? How about keeping the CDC6500 running more than 50% of the time? That might work. Oops, did I hit the send button?

“Hey, Daiyu: How do I tell what users have been on the machine?” Daiyu Hurst is my systems programmer, who lives Back East somewhere. If it is on the other side of Montana from Seattle, it is just “Back East” to me. She lives in one of those “I” states, Indiana, or Illinois, not Idaho, I know where that is. After a short pause, she found the appropriate incantations for me to utter, and we have a list of who was on the machine, and when they logged in. I had to use Perl to filter out those lines, but that was pretty easy.

What is all this other gobble-de-gook in this file:

 03.14.06.AAAI005T. UCCO, 4.096KCHS. 
 03.16.09.AAAI005T. UCCO, 4.096KCHS. 
 03.18.11.AAAI005T. UCCO, 4.096KCHS. 
 03.20.12.AAAI005T. UCCO, 4.096KCHSUCCO, 4.096KCHS. 
 05.00.30.SYSTEM S. ARSY, 1, 98/01/24.
 05.00.30.SYSTEM S. ADPM, 11, NSD.
 05.00.30.SYSTEM S. ADDR, 01, LCM, 40.
 05.00.44.SYSTEM S. SDCI, 46.603SECS.:
 05.00.44.SYSTEM S. SDCA, 0.032SECS.:
 07.32.30.SYSTEM S. ARSY, 1, 98/01/24.
 07.32.30.SYSTEM S. ADPM, 11, NSD.
 07.32.30.SYSTEM S. ADDR, 01, LCM, 40.
 07.33.07.AAAI005T. ABUN, BRUCE, LCM.:
 07.33.37.SYSTEM S. SDCI, 116.108SECS.:
 07.33.37.SYSTEM S. SDCA, 0.078SECS.:
 07.33.37.SYSTEM S. SDCM, 0.005KUNS.:
 07.33.37.SYSTEM S. SDMR, 0.004KUNS.:

The line with “ARSY” in it is when I booted the machine, at 5:00 this morning, from home. It crashed before I got in, and I booted it again at 7:32. Then we get to 7:33:07, and the “ABUN” line, where I login from telnet.

From the first few lines we can see that the machine appeared to still be running and putting things in its accounting log at 3:20, but it crashed before it could print a message about 3:22.

OK from this, I can mutter a few incantations at PERL, and come up with something like:

1054 Booted on 98/01/23 @ 07.39.30
 Previous uptime: 0 days 5 hours 59 minutes
 Down time: 0 days 17 hours 28 minutes
 1065 Booted on 98/01/23 @ 13.38.30
 Previous uptime: 0 days 1 hours 23 minutes
 Down time: 0 days 4 hours 35 minutes
 1068 Booted on 98/01/23 @ 14.12.30
 Previous uptime: 0 days 0 hours 0 minutes
 Down time: 0 days 0 hours 33 minutes
 1392 New Date:98/01/24
 1498 Booted on 98/01/24 @ 05.00.30
 Previous uptime: 0 days 13 hours 7 minutes
 Down time: 0 days 1 hours 40 minutes
 1503 Booted on 98/01/24 @ 07.32.30
 Previous uptime: 0 days 0 hours 0 minutes
 Down time: 0 days 2 hours 31 minutes

Last uptime: 0 days 0 hours 1 minutes

Total uptime: 2 days, 1 hours 37 minutes in: 0 months 7 days 0 hours 14 minutes
Booted 15 times, upratio = 0.29

Here is where the hunt for the Pesky Ratio comes in: See that last line? In the last week, the CDC has been running 29% of the time. That isn’t even close to 50%. I KNOW the 6000 series were not the most reliable machines of their time, but really: 29%?

What has been going on? A week ago, I was having trouble keeping the machine going for more than a couple of minutes. Finally, it occurred to me I might see how the memory was doing, and it wasn’t doing well. It took me a while to find why bit 56 in bank 36 was bad. I had to explore the complete wrong end of the word for a while, before I realized that end worked, and I should have been looking at the other end. I chased it down to Sense Amplifier (PS) module 12M40. When I put it on the extender, the signal would come and go, as I probed different places. I noticed that I had re-soldered a couple of via rivets before, so I re-soldered ALL the via rivets on the module.

What do I mean “via rivets”? In those days, either one of two things were true: either they didn’t have plated through holes in printed circuit boards, or they were too expensive. None of the CDC 6500 modules I have looked at have plated through holes. Most of the modules do have traces on both sides of the two PCBs that the module is made with. How did they get a signal from one side to the other? They put in a tiny brass rivet! Near as I can tell, all the soldering was done from the outside of the module, and most of the time the solder would flow to the top of the rivet somehow. Since I have found many of these rivets not conducting, I have to assume that the process wasn’t perfect.

After soldering all the rivets on this module, I put it back in the machine, and we were off and running. Monday, I booted the machine at 8:11, and it ran till 2:11. When I got in yesterday, the machine wouldn’t boot. Testing memory again found bit 56 in bank 36 bad again! I put module 12M40 on the extender, and the signal wasn’t there. I poked a spot with the scope, and it was there. I poked, prodded, squeezed, twisted and tweaked, and I couldn’t get it to fail.

This is three times for This Module! I like to keep the old modules if I can, but my Pesky Ratio is suffering here! I took the machine back down, and brought it back up with only 64K of memory, and pulled out the offending module:

There are 510 of these PS modules in the machine, three for each of the 170 storage modules, or about 10% of all the modules in the machine. Having a spare would be nice. My next task will be to make about 10 new PS modules.

In the time I have been writing this post, the display on the CDC has gone wonky again. This appears to happen when the Perpheral Processors (PP’s) forget how to skip on zero for a while. Once this happens, I can’t talk coherently to channel 10 or PP11. I have a few little tests that copy themselves to all the PP’s, and they will all work, except the last one: PP11.

I have yet to write a diagnostic that can catch the PP’s making the mistake that I can see on the logic analyzer once a day or so. Right now the solution seems to be to wait a while, and the problem will go away again. This is another reason while the Pesky Ratio is so difficult to hunt, but I fix what I can, when I can.

Onward: One bug at a time!

 

 

Surface-Mount Prototyping

I thought I’d glue a surface-mount 7414 onto a resistor pack, just to make it easier to prototype. Then I thought, hmm… a resistor pack would be useful to pull down signals when the driving logic hasn’t be initialized.
So, here is the picture:

I soldered wires onto the resistor pack first, then submerged them in water so they wouldn’t come undone when I soldered the other end.
Works like a charm (0.1″ prototyping spacing).

Fixing 40-year-old Software Bugs, Part One

The museum had a big event a few weeks ago, celebrating the 45th anniversary of the 1st “Intergalactic Spacewar Olympics.”  Just a couple of weeks before said event, the museum acquired a beautiful Digital Equipment Corporation Lab-8/e minicomputer and I thought it would be an interesting challenge to get the system restored and running Spacewar in time for the event.

As is fairly obvious to you DEC-heads out there, the Lab-8/e was a PDP-8/e minicomputer in a snazzy green outfit.  It came equipped with scads of analog hardware for capturing and replaying laboratory data, and a small Tektronix scope for displaying information.  What makes this machine perfect for the PDP-8 version of Spacewar is the inclusion of the VC8E Point Plotting controller and the KE8E Extended Arithmetic Element (or EAE).  The VC8E is used by Spacewar to draw the game’s graphics on a display; the EAE is used to make the various rotations and translations done by the game’s code fast enough to be fun.

The restoration was an incredibly painless process.  I started with the power supply which worked wonderfully after replacing 40+ year old capacitors, and from there it was a matter of testing and debugging the CPU and analog hardware.  There were a few minor faults but in a few days everything was looking good, so I moved on to getting Spacewar running.

But which version to choose?  There are a number of Spacewar variants for the PDP-8, but I decided upon this version, helpfully archived on David Gesswein’s lovely PDP-8 site.  It has the advantage of being fairly advanced with lots of interesting options, and the source code is adaptable for a variety of different configurations — it’ll run on everything from a PDP-12 with a VR12 to a PDP-8/e with a VC8E.

I was able to assemble the source file into a binary tape image suited for our Lab-8/e’s hardware using the Palbart assembler.  The Lab-8/e has a VC8E display and the DK8-EP programmable clock installed.  (The clock is used to keep the game running at a constant frame-rate, without it the game speed would vary depending on how much stuff was onscreen and how much work the CPU has to do.)  These are selected by defining VC8E=1 and DKEP=1 in the source file

Loading and running the program yielded an empty display, though the CPU was running *something*.  This was disappointing, but did I really think it’d be that easy?  After some futzing about I noticed that if I hit a key on the Lab-8/e’s terminal, the Tektronix screen would light up briefly for a single frame of the game, and then go dark again.  Very puzzling.

My immediate suspicion was that the DK8-EP programmable clock wasn’t interrupting the CPU. The DK8-EP’s clock can be set to interrupt after a specified interval has elapsed, and Spacewar uses this functionality to keep the game running at a steady speed — every time the clock interrupts, the screen is redrawn and the game’s state is updated.  (Technically, due to the way interrupts are handled by the Spacewar code, an interrupt from any device will cause the screen to be redrawn — which is why input from the terminal was causing the screen flash.)

I dug out the DK8-EP diagnostics and loaded them onto the Lab-8/e.  The DK8-EP passed with flying colors, but Spacewar was still a no go.  I decided to take a closer look at the Spacewar code, specifically the code that sets up the DK8-EP.  That code looks like this (with PDP-12 conditional code elided):

/SUBROUTINE TO START UP CLOCK
/MAY BE HARDWARE DEPENDENT
/THIS IS FOR KW12A CLOCK - PDP12
/OR PROGRAMABLE PDP8E CLOCK DK8EP
   CLSK=6131      /SKIP IF CLOCK
   CLLR=6132      /LOAD CONTROL
   CLAB=6133      /AC TO BUFFER PRESET
   CLEN=6134      /LOAD ENABLE
   CLSA=6135      /BIT RESET FLAGS

STCLK, 0
   CLA CLL        /JUST IN CASE   
   TAD (-40       /ABOUT 30CPS
   CLAB           /LOAD PRSET
   CLA CLL
   TAD (5300      /INTR ON CLOCK - 1KC
   CLLR
   CLA CLL
   JMP I STCLK

The bit relevant to our issue is in bold above; the CLLR IOT instruction is used to load the DK8-EP’s clock control register with the contents of the 8’s Accumulator register (in this case, loaded with the value 5300 octal by the previous instruction).  The comments suggest that this sets a 1 Khz clock rate, with an interrupt every time the clock overflows.

I dug out the a copy of the programming manual for the DK8-EP from the 1972 edition of the “PDP-8 Small Computer Handbook” (which you can find here if you’re so  inclined).  Pages 7-28 and 7-29 reveal the following information:

DK8-EP Nitty Gritty

 

The instruction we’re interested in is the CLDE (octal 6132) instruction: (the Spacewar code defines this as CLLR) “Set Clock Enable Register Per AC.”  The value set in the AC by the Spacewar code (from the octal value 5300) decodes as:

  • Bit 0 set: Enables clock overflow to cause an interrupt.
  • Bits 1&2 set to 01: Counter runs at selected rate.
  • Bits 3,4&5 set to 001: 1Khz clock rate.

(Keep in mind that the PDP-8, like many minicomputers from the era, numbers its bits in the opposite order of today’s convention, so the MSB is bit 0, and the LSB is bit 11.)  So the comments in the code appear to be correct: the code sets up the clock to interrupt, and it should be enabled and running at a 1Khz rate.  Why wasn’t it interrupting?  I wrote a simple test program to verify the behavior outside of Spacewar, just in case it was doing something unexpected that was affecting the clock.  It behaved identically.  At this point I was beyond confused.

But wait: The diagnostic was passing — what was it doing to make interrupts happen?

DK8E-EP Diagnostic Listing

The above is a snippet of code from the DK8E family diagnostic listing, used to test whether a clock overflow causes an interrupt as expected.  The JMS I XIOTF instruction at location 2431 jumps to a subroutine that executes a CLOE IOT to set the Clock Enable Register with the contents in AC calculated in the preceding instruction.  (Wait, CLOE?  I thought the mnemonic was supposed to be CLDE?)  The three TAD instructions at locations 2426-2430 define the Clock Enable Register bits.  The total sum is 4610 octal, which means (again referring to the 1972 Small Computer Handbook):

  • Bit 0 set: Enables clock overflow to cause an interrupt
  • Bits 1+2 unset: Counter runs at selected rate, and overflows every 4096 counts.
  • Bit 3, 4+5 set to 110: 1Mhz clock rate
  • Bit 8 set:  Events in Channels 1, 2, or 3 cause an interrupt request and overflow.

So this seems pretty similar to what the Spacewar code does (at a different clock rate) with one major difference:  Bit 8 is set.  Based on the description in the Small Computer Handbook having bit 8 set doesn’t make a lot of sense — this test isn’t testing channels 1, 2, or 3 and this code doesn’t configure these channels either.  Also, the CLOE vs CLDE mnemonic difference is odd.

All the same, the bit is set and the diagnostic does pass.  What happens if I set that Clock Enable Register bit in the Spacewar code?  Changing the TAD (5300 instruction to TAD (5310 is a simple enough matter (why, I don’t even need to reassemble it, I can just toggle the new bits in via the front panel!) and lo and behold… it works.

But why doesn’t the code make any sense?  I thought perhaps there might have been a different revision of the hardware or a different set of documentation so I took a look around and finally found the following at the end of the DK8-EP engineering drawings:

The Real Instructions

Oh hey look at that why don’t you.  Bit 8’s description is a bit more elaborate here: “Enabled events in channels 1, 2, or 3 or an enabled overflow (bit 0) cause an interrupt request when bit 0 is set to a one.” And per this manual, setting bit 0 doesn’t enable interrupts at all! To add insult to injury, on the very next page we have this:

More Real Info

That’s definitely CLOE, not CLDE.  The engineering drawings date from January 1972 (first revision in 1971), while the 1972 edition of the PDP-8 Small Computer Handbook has a copyright of 1971, so they’re from approximately the same time period.  I suspect that the programming information given in the Small Computer Handbook was simply poorly transcribed from the engineering documentation…  and then Spacewar was written using it as a reference.  There is a good chance given that this version of Spacewar supports a multitude of different hardware (including four different kinds of programmable clocks) that it was never actually tested with a DK8-EP.  Or perhaps there actually was a hardware change removing the requirement for bit 8 being set, though I can find no evidence of one.

So with that bug fixed, all’s well and our hero can ride off into the sunset in the general direction of the 2017 Intergalactic Spacewar Olympics, playing Spacewar all the way.  Right?  Not so fast, we’re not out of the woods yet.  Stay tuned for PART TWO!

The Hunt!

The CDC 6500 has been down since last Friday, so that will be a week in 3 hours. What have I been doing during that time? Let me tell you:

The first thing I noticed was that my PP memory test, called March, wasn’t working. The first real thing it does, after getting loaded into PP0, is copy itself to the next PP in line. In order to do that, it increments 3 instructions to point to the next channel from the one that got loaded in from the deadstart system. After it has self-modified its program properly, it runs those instructions to do the actual copy. The very first OAN instruction it tried to execute hung, this is not supposed to happen.

I spent 3 days looking at this problem before I started drawing timing diagrams of the channel address being selected by the various PPs. The PPs each have their own memory, but they all share the same execution hardware in chassis 1. This makes it a little hard to look at, as a PP is running 1uS cycles, the hardware is running 100nS cycles, and each PP gets a 100nS Slot to do his thing. As I was looking at PP0s slot time, and what channel he was trying to push some data to, it looked like it was getting done at the wrong time. When I plotted out 1uS of all the channel address bits, I finally noticed that PP0 was addressing channel 0, PP1 was addressing channel 1… and PP11 was addressing channel 11, and back to PP0.

The strange thing about that was that that is the way the system starts at deadstart time. Every PP sucks on the channel with his number. The deadstart panel lives off of the end of channel 0, PP0 sucks up everything the deadstart panel put on channel 0, stores it into memory, and when the panel disconnects, because he has run out of program to send, the PP starts executing the program.

Wait a minute here: the program was supposed to have incremented the 3 channel instructions, so they would be pointing to channel 1, why is PP0 still looking at channel 0? Rats: the channel hardware is doing fine, but the increment isn’t working! 3 days to prove something wasn’t the problem!

OK, so the increment isn’t working, what is it doing? I spent a while writing little bits of code to test various ways of incrementing a location of memory, and then Daiyu Hurst reminded me about a program she had generated for me that was a stand-alone version of the PP verification program that runs on the beginning of most deadstart tapes. OK, what does that do?

It hangs at location 6. It did that because it failed a ZJN (jump on zero) instruction. Why is that? The accumulator wasn’t zero. Hmm, instruction 1 was LDN 0, which loads the accumulator with 0! Why doesn’t that work? After another day, or so, I prove to myself that it actually does work, and 0 gets loaded into the accumulator at the end of instruction 1. Another thing that isn’t the problem!

What’s next? The next instruction is UJN 2, (unconditional jump 2 locations forward) which being at location 2, should jump to 4, which it does. It is not supposed to change the contents of the accumulator, but it does!

There are 2 inputs to the “A” adder, the A input is selected to be A, and the B input is zeros. All 12 of the inputs to the A side are zero. Wait: aren’t there 18 bit in the accumulator, what about those other 6 bits? Ah: bit 14 is a 1!

It will not sit still! I chase bit 14 for a while, and it starts working, but a different bit is failing now! I chased different bits around the loop for a while, put module K01 on the extender to look, and the test started passing! This worked for a while. I had the PPs test memory, and that worked, but if I had CP0 test memory, it didn’t like it. When I got back from lunch, it had gone back to failing my LDN 0 test. I put some secret sauce on the pins of module K01, and we are back to trying to run other diagnostics.

I remembered I was having trouble with the imaginary tape drives, to I tried booting from real tape, and I get to the part where it tests memory, and that fails. OK, we have some progress.

That was then, this is now, and we are back to failing to LDN 0. I found that bit 0 for the “B” input of the A adder was not correct. It seems that a via rivet was not conducting between the collector of Q30 and Q32 to the base of Q19 on my friend the QA module in K01. I resoldered all the via rivets, and the edge pins, just for good measure.

Central Memory still doesn’t work, but I can run some diagnostics again!

To paraphrase Sherlock Holmes: When you eliminate all the things the problem isn’t, you are left with what the problem is!

Bruce Sherry

Bendix G15 Germanium Diodes

In restoring the Bendix G-15 vacuum tube computer, I have uncovered a phenomena which is requiring us to replace over 3000 germanium diodes. These diodes appear to have lost their hermetic seal and the atmospheric contamination has caused their leakage current to rise to very high levels as they reach a normal operating ambient temperature of approx. 40 degrees C. Because these diodes are used in the clamp circuits that generate the 20 volt logic swing of the computer, the combined low impedance of the approx. 3000 diodes ends up shorting out the -20 volt power supply after 5 to 10 minutes of power-on time.
We have replacement diodes on order, and this should resolve the power supply issue.

Interestingly though, the failed diodes exhibit another interesting phenomena which this engineer hasn’t seen before.  Hooking up a diode to an ohmmeter to measure its leakage current, and heating the diode to about 40 degrees C, causes the diode leakage, measured as resistance, to go from a few thousand ohms to a few tens of ohms.  If the ohmmeter remains connected and the diode is allowed to cool to normal ambient, the low resistance measurement persists.  If the ohmmeter is disconnected briefly and then reconnected, the diode leakage current returns to its nominal few thousand ohms.