After working on the Lambda’s monitors as described in my last writeup, my next plan of action was to see if I could get diagnostics loaded into the SDU via 9-track tape.
ROM Upgrade Time
But first, I wanted to upgrade the SDU’s Monitor ROM set. The SDU Monitor is a program that runs on the SDU’s 8088 processor. It provides the user’s interface to the SDU where it provides commands for loading and executing files, and booting the system. It also communicates with devices on the Multibus and the NuBus. As received, my Lambda has Version 8 of the monitor which is as far as I know the last version released to the public at large. However, the Lambdas that Daniel Seagraves owns came with an internal-only Monitor in their SDUs, designated Version 102. This version adds a few convenient features: it can deal more gracefully with loss of CMOS RAM (important since I don’t have a backup battery anymore) and adds a few commands for defining custom hard drive types.
A week or so prior, Daniel had sent me a copy of the Version 102 ROMs so all I had to do was write (“burn”) the copy onto real EPROMs and install them in the SDU. I had spare EPROMs (four Intel 27128A’s) at the ready but the thing about EPROMs is that they need to be erased before they can be programmed with new data. To do that, you need an EPROM eraser — a little box with a UV lamp in it and a timer — and after searching the house for mine, I came to the realization that I’d taken it to work a few months back for and I’d never brought it back home. And due to present circumstances, it was going to be stuck there for awhile.
So with much wailing and gnashing of teeth I ordered a replacement off the Internet and began waiting patiently for it to arrive in 5-7 days. Meanwhile I decided to start documenting this entire process for some kind of blog thing and so I went off, took pictures, and started writing long-winded prose about Lisp Machines and restorations.
Also during this time I decided to test the old wives’ tale about using sunlight to erase EPROMs. At that time Seattle was experiencing an extremely lovely bout of sunny weather, so I took four 27128’s outside, put them on the windowsill so as to gather as much sun as possible, and left them there for the next four days.
[Four Days Pass…]
They’re still not erased. So much for that idea. Only a few more days until my real EPROM eraser arrives anyway…
[A Few More Days Pass…]
At last my dirt-cheap EPROM ERASER arrived on my doorstep bearing dire warnings of UV exposure and also of the overheating of this fine precision instrument. Ignoring the 15 minute time-limit warning, I put four EPROMs into the drawer, cranked the timer up to 30 minutes and turned it on. And once again I found myself waiting.
[Thirty Minutes Pass…]
I pulled out my trusty DATA I/O 280 programmer and ran its “blank check” routine to ensure that the EPROMs were indeed as blank as they ought to be, and the programmer said “BLANK CHECK OK.”
It’s then a simple matter to hook the programmer up to my PC and program the new ROMs and soon enough all four were ready to get installed in the SDU. But before I did that I wanted to double-check that the Lambda was still operating — it’d been a couple of weeks since I had last powered it up and things can go wrong sometimes. Best not to introduce a new variable (i.e. new ROMs) into the equation before I can verify the current state.
And so I hooked things back up to the Lambda and turned it on. And… nothing. No SDU prompt on the terminal and all three LEDs on the front panel are stuck on solid. (As we learned in my second post in this series, this indicates that the SDU is failing its self tests.) I pressed the Reset button a couple of times. Nothing. Power cycled the system just for luck. NIL.
“Well, fiddle-dee-dee!” I said. (I may have used slightly more colorful language than this, but this is a family-friendly blog). “Gosh darn it all to heck.”
I retraced my steps — had I changed anything since the last time I’d powered it on? Yes — I’d installed an Ethernet board that Daniel had graciously sent me (my system apparently never had an Ethernet interface, which is an odd choice for a Lisp Machine). Maybe the Ethernet board was causing some problem here? Pulling the board made no difference in behavior. I checked the power supply voltages at the power supply and at the backplane and everything was dead on. I pulled the SDU out and inspected it, and double-checked socket connections and everything looked OK.
Well, at this point I’m frustrated and my tendency in situations like this is to obsess about whether I broke something and so I run in circles for a bit when what I really need to do is take a step back: OK — it’s broken. How is it broken? How do I go about answering that question? Think, man, think!
Well, I know that the three LEDs are on solid — this would indicate that the SDU’s self-test code either wasn’t running or wasn’t getting very far before finding a failure. So: let’s assume for now that the self-test code isn’t running — how do I confirm that this is the case?
The SDU uses an Intel 8088 16-bit microprocessor to do its business, and it’s a relatively simple matter to take a look at various pins on the chip to see if there’s activity, or lack thereof. The most vital things to any processor (and thus good first investigations while debugging a microprocessor-based system) are power, clock, and reset signals. Power obviously makes the CPU actually, you know, do things. A clock signal is what drives the CPU’s internal logic, one cycle at a time, and the reset signal is what tells the CPU to clear its state and restart execution from step 0. A lack of the first two or an abundance of the latter could cause the symptoms I was seeing.
Time to get out the oscilloscope; this will let me see the signals on the pins I’m probing. Looking at the Intel 8088 pinout (at right) the pins I want to look at are pin 40 (Vcc), Pin 21 (RESET) and pin 19 (CLK). Probing reveals immediately that Vcc and CLK are ok. Vcc is a nice solid 5 volts and CLK shows a 5Mhz clock signal. RESET however, is at 3.5V — a logic “1” meaning that the CPU is being held in a Reset state, preventing it from running!
So that’s one question answered: The SDU is catatonic because for some reason RESET is being held high. Typically, RESET gets raised at power-up (to initialize the CPU among other things) and might also be attached to a Reset button or other affordance. In the SDU, there is also power monitoring signal attached to the RESET line designated as DCOT (DC Out of Tolerance) — if the +5 voltage goes out of range the CPU is reset:
It seemed possible (though unlikely) that the Lambda’s Reset switch or the cabling associated with it had failed, causing the symptoms I was seeing, but as expected the cabling tested out OK.
I then checked the DCOT signal and even though the power supply voltages were measuring OK, I was reading 8V on the DCOT pin at the paddleboard. 8V is high for a normal TTL signal (which are normally between 0 and 5V) and this started me wondering. When I disconnected the DCOT wire from the paddleboard, the DCOT signal measured at the power supply was 0V while the signal at the paddleboard remained at 8V… suggesting some sort of failure between the power supply and the SDU for this signal. It also explains the the odd 8V reading– it’s likely derived from a 12V source with a pull-up resistor; the expectation being that the DCOT signal from the power supply would normally pull the signal down further into valid TTL range.
But what could have failed here? Clearly the power supply itself thinks things are OK (hence the 0V reading there). The difference in reading at one end versus the other can really only point to a problem in the wiring between the power supply and the SDU paddleboard.
There is a small three-conductor cable that runs from the SDU paddlecard down to a connector just above the power supply (pictured at the right). A second three-conductor cable is plugged into this and runs to the power supply itself. Checking these signals for continuity revealed that none of the three wires were continuous from the SDU back to the power supplies. The cable from the connector to the power supply tested fine — so what happened to the cable that runs from the connector to the SDU?
I pulled out the power supply tray to get a look at the cabling, and one glance below the card cage revealed the answer:
“Aw, nut bunnies,” I may have been heard to remark to myself. Those three wires had apparently been ripped from the connector (quite neatly, I might add) the last time I had pushed the power supply drawer back in. (Likely while I was taking pictures of the power supplies for my blog writeups…) Quite how it got caught on the tray I’m not sure.
This was easy enough to fix — the wires were reinserted into the pins, and the cable itself rerouted so it would hopefully never get snagged on the power supply tray again. I reconnected everything, held my breath and flipped The Switch one more time.
[Several long seconds pass…]
SDU Monitor version 8 CMOS RAM invalid >>
greeted me on the terminal. Yay. Whew.
New SDU Monitor, At Last
OK. So at last I’m back to where I’d started this whole exercise, after an evening of panic and frenzied investigation. What was it I was going to do when I’d started out? Oh yeah, I had these new SDU ROMs all ready to go, let’s put ’em in:
SDU Monitor version 102 >> >> help r usage: r [-b][-w][-l] addr[,n] w usage: w [-b][-w][-l] addr[,n] d x usage: x [-b][-w][-l] addr[,n] dev usage: dev reset usage: reset [-m] [-n] [-b] enable usage: enable [-x] [-m] [-n] init usage: init ttyset usage: ttyset dev setbaud usage: setbaud portnum baudrate disktype usage: disktype type heads sectors cyls gap1 gap2 interleave skew secsize badtrk disksetup usage: disksetup setdr usage: setdr name file [ptr] >>
Ah, much better. So now the SDU was functional and upgraded, and I was ready to move onto the next phase: running system diagnostics.
The SDU has the capability to run programs off of 9-track tape. This is how an operating system is loaded onto a new disk and it’s how diagnostics are loaded into the system to test the various components. The Lambda uses a Ciprico Tapemaster controller, which is normally hooked up to a Cipher F880 tape drive mounted in the top of the Lambda’s chassis.
My Lambda’s F880 was missing when I picked it up, but the Tapemaster should in theory be able to talk to any tape drive with a Pertec interface. I’m still trying to track down an actual F880 drive, but in the meantime I have one potentially compatible drive in my collection — a Qualstar 1052. This was a low-cost, no-frills drive when it was introduced in the late 1980s but it’s simple and well documented and best of all: it has no plastic or rubber parts, so no worries about parts of the transport turning into tar or becoming brittle and breaking off.
It’s also really slow. The drive has no internal buffer so it can’t read ahead, which means that depending on how it’s accessed it may have to “shoeshine” (reverse the tape, then read forward again) the tape frequently. But speed isn’t really what I’m after here — will it work with the Lambda or won’t it?
I have a tape containing diagnostics (previously written on a modern Unix system with a SCSI 9-track drive attached) ready to go. So I cabled up the Qualstar to the Lambda’s Pertec cabling (as pictured in the above photograph) and attempted to load a program from the tape using the “tar” program:
The tape shoeshined (shoeshone?) once (yay!) and stopped (boo!), and the SDU spat back:
tape IO error 0xD >>
Well, that’s better than nothing, but only barely. But what does IO error 0xD mean? The unfortunate reality is that there is little to no documentation available on the SDU or the associated diagnostics. But I do have the Ciprico Tapemaster manual, thanks to bitsavers.org:
Error 0xD indicates a data parity error: the data being transmitted over the Pertec cabling isn’t making it from the drive to the Tapemaster intact, so the controller is signalling a problem. The SDU stops the transfer and helpfully provides the relevant error code to us.
So where are the parity errors coming from? It could be a controller fault but given this system’s history I decided to take a closer look at the cabling first. A Pertec tape drive is connected to the controller via two 50-pin ribbon cables designated “P1” and “P2.” While I’d previously checked the cables for damage, I hadn’t actually checked the edge connectors at the ends of the cables, and well, there you go:
It’s a bit difficult to discern in the above picture but if you look closely at the gold contacts you can see that there’s greenish-white corrosion on many of them. Dollars to donuts that this is the problem. For cleaning out edge connectors like this, I’ll usually spray the insides with contact cleaner and then, to apply a bit of abrasion to the pins, I wipe a thin piece of cardboard soaked in isopropyl alcohol in and out of the slot. I used this technique here and pulled out a good quantity of crud and dirt, leaving the connector nice and clean. Or at least clean enough to function, I hoped. Rinse and repeat for the second Pertec cable and let’s try this again:
And the tape shoeshines once… and shoeshines again… and again… hm. Is it actually reading anything or is there some other problem and it’s just reading the same block over and over? Let’s let it run for a bit…
>> /tar/load no memory in main bus Initializing SDU SDU Monitor version 102 >>
No more parity errors, and the “load” program did eventually load. It then complained about a lack of memory. It looks like the tape drive, the cable, and the controller all work! (Thanks to the Qualstar’s slowness, it took about five minutes between the “/tar/load” and the “no memory in main bus” error, so this is going to be a time-consuming diagnostic process going forward.)
The “no memory in main bus” error is not unexpected since at that moment the only boards installed in the Lambda’s backplane were the SDU and the tape controller. I have a few memory boards at my disposal, and I opted to re-install the 4mb memory board that normally resides in slot 9. Let’s run that again:
no memory in main bus
SDU Monitor version 102
Well, hm. Maybe that memory board doesn’t work — let’s try the 16mb board normally in slot 12:
>> /tar/load using 220K in slot 12 load version 307 Disk unit 0 is not ready. /tar/loadbin exiting Initializing SDU SDU Monitor version 102 >>
Huzzah! The LMI has memory that works well enough to respond to the SDU, and it has a functional tape subsystem. It’s going to be awhile before I have a functioning disk, and as per the error message in the output, /tar/load expects one to be present. This is completely rational, since “load” is the program that is used to load Lisp load bands onto the disk from tape.
That’s enough for now — in the next installment, since the Lambda is now capable of loading diagnostics from tape, we’ll actually run some diagnostics! Thrills! Chills! Indecipherable hexadecimal sludge! See you next time!