At Home With Josh Part 6: Diagnostic Time!

In our last exciting episode, after a minor setback I got the Lambda’s SDU to load programs from 9-track tape. Now it’s time to see if I can actually test the hardware with the available diagnostics.

Tape Images

Tape images of the Lambda Release and System tapes are available online. Daniel Seagraves has been working on updating the system, and his latest and greatest is available here. A tape image is a file that contains a bit-for-bit copy of the data on the original tape. Using this file in conjunction with a real 9-track drive allows an exact copy of the original media to be made. In my case, I have an HP 7980S 9-track drive connected to a Linux PC for occasions such as these. At the museum we have an M4 Data 9-track drive set up to do the same thing. The old UNIX workhorse tool “dd” can be used to write these files back to tape, one at a time:

$ dd if=file1 of=/dev/nst0 bs=1024

(Your UNIX might name tape devices differently; consult your local system administrator for more information.)

Data on 9-track tapes is typically stored as a sequence of files, each file being separated by a file mark. The Lambda Release tape contains five such files, the first two being relevant for diagnostics and installation, and the remainder containing Lisp load bands and microcode that get copied onto disk when a Lisp system is installed.
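Writing a multi-file tape means repeating that dd command once per image, in tape order, against a non-rewinding device so that each file lands after the previous one. Here is a minimal sketch in Python that only builds the command lines (it does not run them). The file names are hypothetical placeholders, and the device name and 1024-byte block size are simply carried over from the command above; check what your own images and drive need before trusting either.

```python
import shlex

def dd_commands(image_files, device="/dev/nst0", block_size=1024):
    """Build one dd command line per tape file, in the order given."""
    commands = []
    for path in image_files:
        argv = ["dd", f"if={path}", f"of={device}", f"bs={block_size}"]
        commands.append(shlex.join(argv))
    return commands

# Hypothetical names standing in for the five files of the Release tape.
tape_files = [f"file{n}" for n in range(1, 6)]
for line in dd_commands(tape_files):
    print(line)
```

On Linux the tape driver normally writes a file mark when the device is closed after a write, which is what should separate one tape file from the next; that is worth verifying on your own setup.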

The first file on tape is actually an executable used by the SDU — it is a tiny 2K program that can extract files from UNIX tar archives on tape and execute them. Not coincidentally, this program is called “tar.” The second tape file is an actual tar archive that contains a variety of utility programs and diagnostics. Here’s a rundown of the interesting files we have at our disposal:

  • 3com – Diagnostic for the Multibus 3Com Ethernet controller
  • 2181 – Diagnostic for the Interphase 2181 SMD controller
  • cpu – Diagnostic for the 68010 UNIX processor
  • lam – Diagnostic for the Lambda’s Lisp processors
  • load – Utility for loading a new system: partitioning disks and copying files from tape.
  • ram – Diagnostic for testing NuBus memory
  • setup – Utility for configuring the system
  • vcmem – Diagnostic for testing the VCMEM (console interface) boards.

The unfortunate thing is: there is no documentation for most of these beyond searching for strings in the files that might reveal secrets. Daniel worked out the syntax for some of them while writing his LambdaDelta emulator, but a lot of details are still mysterious.
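Since the second tape file is an ordinary tar archive, the "search for strings" approach is easy to try on a modern machine. The sketch below is one way to do it in Python: pull a member out of the archive and list its runs of printable ASCII. The archive here is a tiny stand-in built in memory with a made-up payload, not the real diagnostics, and the four-character minimum is an arbitrary choice.

```python
import io
import re
import tarfile

def printable_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def strings_in_member(archive_bytes, member_name):
    """Read one member out of a tar archive and scan it for strings."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        member = tar.extractfile(member_name)
        return printable_strings(member.read())

# Stand-in archive: one member named "ram" with an invented binary payload.
payload = b"\x00\x01example usage text\x00\xff\xfeexample error text\x00ab\x00"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="ram")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

print(strings_in_member(buf.getvalue(), "ram"))
```

Option letters and error messages tend to show up this way, though working out what they actually do still takes experimentation.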

In case you missed it, I summarized the hardware in the system along with a huge pile of pictures of the installed boards in an earlier post — it might be helpful to reacquaint yourself to get some context for the following diagnostic runs. Plus pictures are pretty.

I arbitrarily decided to start by testing the NuBus memory boards, starting with the 16mb board in slot 9 (which I’d moved from slot 12 since the last writeup). The diagnostic is loaded and executed using the aforementioned tar program as below. The “-v” is the verbose flag, so we’ll get more detailed output. The “-S 9” indicates to the diagnostic that we want to test the board in slot 9.

SDU Monitor version 102
>> reset
>> /tar/ram -v -S 9
ram: error 6 test 1 bad reset state, addr=0xf9ffdfe0, =0x1, should=0x4
ram: error11 test 3 bad configuration rom
ram: error 1 test 6 bad check bits 0xffff, should be 0xc, data 0x0
ram: error 1 test 7 bad check bits 0xffff, should be 0xc, data 0xffffffff
ram: error 7 test 8 for dbe w/flags off, DBE isn't on
ram: error 7 test 9 for dbe w/flags off, DBE isn't on
ram: status fill addr 0xf9000000
ram: status fill addr 0xf9002000
... [elided for brevity] ...
ram: status fill addr 0xf903c000
ram: status fill addr 0xf903e000
ram: status fill check addr 0xf9000000
ram: status fill check addr 0xf9002000
... [elided for brevity] ...
ram: status fill check addr 0xf903c000
ram: status fill check addr 0xf903e000

Well, the first few lines don’t look exactly promising what with all the errors being reported. The test does continue on to fill and check regions of the memory but only up through address 0xf907e000 (the first 512KB of memory on the board, that is). Thereafter:

ram: status fill check addr 0xf907c000
ram: status fill check addr 0xf907e000
ram: status block of length 0x4000 at 0xf9000000
ram: status stepsize 4 forward
ram: error 4 test 16 addr 0xf9000004 is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf9000008 is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf900000c is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf9000010 is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf9000014 is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf9000018 is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf900001c is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf9000020 is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf9000024 is 0xffffffff sb 0x0 (data f/o)
ram: error 4 test 16 addr 0xf9000028 is 0xffffffff sb 0x0 (data f/o)

And so on and so forth, probably across the entire region from 0xf9000000-0xf907ffff. This would take a long time to run to completion (remember, this output is coming across a 9600bps serial line — each line takes about a second to print) so I wasn’t about to test this theory. The output appears to be indicating that memory reads are returning all 1’s (0xffffffff) where they’re supposed to be 0 (0x0).
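The numbers in that output can be sanity-checked with a little arithmetic. This is a back-of-the-envelope sketch in Python. It assumes each slot gets its own 16MB window at 0xFs000000 (inferred only from "-S 9" producing 0xf9... addresses), and that the test would print one error line per 32-bit word over the whole region, which is a guess: the "block of length 0x4000" line suggests it may work in smaller blocks.

```python
SLOT = 9
FILL_STEP = 0x2000        # spacing between successive "fill addr" lines
LAST_FILL = 0xF907E000    # last fill address reported in the log

# Assumed mapping: one 16MB address window per slot.
base = 0xF0000000 | (SLOT << 24)
region_bytes = LAST_FILL + FILL_STEP - base
words = region_bytes // 4  # "stepsize 4" walks one 32-bit word at a time

print(hex(base), region_bytes // 1024, "KB")
print(words, "words")
print(round(words / 3600, 1), "hours at one error line per second")
```

The region works out to exactly 512KB, and under those assumptions a full run of error lines would take well over a day to scroll past.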

So this isn’t looking very good, but there’s a twist: these diagnostics fail identically under Daniel’s emulator. After some further discussion with Daniel it turns out these diagnostics do not apply to the memory boards I have installed in the system (or that the emulator simulates). The memory boards that were available at the time of the Lambda’s introduction were tiny in capacity: half-megabyte boards were standard and it was only later that larger boards (1, 2, 4, 8, and 16mb) were developed. The only memory boards I have are the later 4 and 16mb boards. These use different control registers, and as a result the available diagnostics don’t work properly. If there ever was a diagnostic written for these newer, larger RAM boards, it has been lost to the ages.

This means that I won’t be able to do a thorough check of the memory boards, at least not yet. But maybe I can test the Lisp CPU? I slotted the RG, CM, MI and DP boards into the first four slots of the backplane and started up the lam diagnostic program:

SDU Monitor version 102
>> reset
>> /tar/lam -v
/tar/lam version 6
compiled by wer on Wed Mar 28 15:24:02 1984 from machine capricorn
setting up maps
initializing lambda
starting conreg = 344
PMR
passed ones test
passed zeros test
TRAM-ADR
passed ones test
passed zeros test
TRAM
passed ones test
passed zeros test
loading tram; double-double
disk timed out; unit=0x0 cmd=0x8F stat=0x0 err=0x0
disk unit 0 not ready
can't open c.tram-d-d
SPY:
passed ones test
passed zeros test
HPTR:
Previous uinst destination sequence
was non-zero after force-source-code-word
during lam-execute-r
Previous uinst destination sequence
was non-zero after force-source-code-word
during lam-execute-r
Previous uinst destination sequence
was non-zero after force-source-code-word
... [and so on and so forth] ...

Testing starts off looking pretty good — the control registers and TRAM (“Timing RAM”) tests pass, and then it tries to load a TRAM file from disk. Aww. I don’t have a disk connected yet, and even if I did it wouldn’t have any files on it. And to add insult to injury, as it turns out even the file it’s trying to load (“double-double”) is unavailable — like the later RAM diagnostics, it is lost to the ages. The TRAM controls the speed of the execution of the lisp processor and the “double-double” TRAM file causes the processor to run slowly enough that the SDU can interrogate it while running diagnostics. Without a running disk containing that file I won’t be able to proceed here.

So, as with the memory I can verify that the processor’s hardware is there and at least responding to the outside world, but I cannot do a complete test. Well, shucks, this is getting kind of disappointing.
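For a sense of what a "ones test" and "zeros test" might involve, here is a minimal Python sketch: write all ones to a register, read it back, then do the same with all zeros. What the lam diagnostic really does is undocumented, so this is only a guess at the general technique, and the FakeRegister class is a hypothetical stand-in for real hardware.

```python
def ones_zeros_test(write, read, width_bits):
    """Write all ones, then all zeros; report patterns that read back wrong."""
    mask = (1 << width_bits) - 1
    failures = []
    for name, pattern in (("ones", mask), ("zeros", 0)):
        write(pattern)
        got = read() & mask
        if got != pattern:
            failures.append((name, pattern, got))
    return failures

class FakeRegister:
    """Simulated register; bits set in stuck_low always read back as 0."""
    def __init__(self, stuck_low=0):
        self.value = 0
        self.stuck_low = stuck_low

    def write(self, v):
        self.value = v & ~self.stuck_low

    def read(self):
        return self.value

good = FakeRegister()
bad = FakeRegister(stuck_low=0x10)
print(ones_zeros_test(good.write, good.read, 8))  # healthy register
print(ones_zeros_test(bad.write, bad.read, 8))    # bit 4 stuck at 0
```

A test like this only shows that a register can hold both extremes, which matches the "hardware is there and responding" level of confidence described above.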

The vcmem diagnostic tests the VCMEM board — this board contains the display controller and memory that drives the high-resolution terminals that I restored in a previous writeup. It also contains the serial interfaces for the terminal’s keyboard and mouse. Perhaps it’s finally time to test out the High-Resolution Terminals for real. I made some space on the bench next to the Lambda and set the terminal and keyboard up there, and grabbed one of the two console cables and plugged it in. After powering up the Lambda, I was greeted with a display full of garbage!

Isn’t that the most beautiful garbage you’ve ever seen?

This may not look like much, but this was a good sign: The monitor was syncing to the video signal, and the display (while full of random pixels) is crisp and clear and stable. The garbage being displayed was likely due to the video memory being uninitialized: Nothing had yet cleared the memory or reset the VCMEM registers. There is an SDU command called “ttyset” that assigns the SDU’s console to various devices; currently I’d been starting the Lambda up in a mode that forces it to use the serial port on the back as the console, but by executing

>> ttyset keytty

the SDU will start using the High-Resolution terminal as the console instead. And, sure enough, executing this caused the display to clear and then:

It lives!

There we are, a valid display on the screen! The keyboard appeared to work properly and I was able to issue commands to the SDU using it. So even without running the vcmem diagnostic, it’s apparent that the VCMEM board is at least minimally functional. But I really wanted to see one of these diagnostics do its job, so I ran it anyway:

SDU Monitor version 102
/tar/vcmem -v -S 8
vcmem: status addr = 0xf8020000
vcmem: status fill addr 0xf8020000
... [elided again for brevity] ...
vcmem: status fill addr 0xf803e000
vcmem: status fill check addr 0xf8020000
vcmem: status fill check addr 0xf8022000
vcmem: status fill check addr 0xf8024000
vcmem: status fill check addr 0xf8026000
...
vcmem: status fill check addr 0xf8036000
vcmem: status fill check addr 0xf8038000
vcmem: status fill check addr 0xf803a000
vcmem: status fill check addr 0xf803c000
vcmem: status fill check addr 0xf803e000

As the test continued, patterns on the screen slowly changed, reflecting the memory being tested. Many different memory patterns are tested over the next 15 minutes.

vcmem: status movi block at 0xf803c000
vcmem: status movi stepsize 2 forward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 2 backward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 4 forward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 4 backward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 8 forward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 8 backward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
... [elided] ...
vcmem: status movi stepsize 4096 forward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 4096 backward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 8192 forward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000
vcmem: status movi stepsize 8192 backward
vcmem: status movi checking 0x0000 writing 0xffff
vcmem: status movi checking 0xffff writing 0x0000

And at last the test finished with no errors reported, leaving a test pattern on the display. How about that, a diagnostic that works with the hardware I have.
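The "movi" lines in that output look like a moving-inversions style memory test: check each location for one pattern while writing its complement, then repeat the other way around, walking forward and backward at various strides. That reading of "movi" is my assumption, not something the diagnostic documents. A simplified Python sketch of the idea, using a list of 16-bit words in place of video memory and a hypothetical simulated fault:

```python
def moving_inversions(mem, stepsize, forward=True):
    """Run a check-then-invert pass pair; return (index, expected, actual) errors."""
    errors = []
    indices = list(range(0, len(mem), stepsize))
    if not forward:
        indices.reverse()
    # Start from a known background of zeros at every location we visit.
    for i in indices:
        mem[i] = 0x0000
    for check, write in ((0x0000, 0xFFFF), (0xFFFF, 0x0000)):
        for i in indices:
            if mem[i] != check:
                errors.append((i, check, mem[i]))
            mem[i] = write
    return errors

class StuckBitMemory(list):
    """Simulated memory in which one cell has bits stuck at 0."""
    def __init__(self, size, bad_index, stuck_mask):
        super().__init__([0] * size)
        self.bad_index = bad_index
        self.stuck_mask = stuck_mask

    def __setitem__(self, i, value):
        if i == self.bad_index:
            value &= ~self.stuck_mask
        super().__setitem__(i, value)

healthy = [0] * 64
faulty = StuckBitMemory(64, bad_index=8, stuck_mask=0x0100)
for step in (1, 2, 4, 8):
    for forward in (True, False):
        print(step, forward, moving_inversions(healthy, step, forward))
print(moving_inversions(faulty, 4))
```

Varying the stride and direction changes which address lines toggle between accesses, which is presumably why the real test grinds through so many combinations.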

Not your optometrist’s eye chart…

Looking crisp, clear, and nice and straight. This monitor is working fine — what about the other one? As you might recall, I got two High-Resolution Terminals with this system and pre-emptively cleaned and replaced all the capacitors in both of them. The second of these would not display anything on the screen when powered up (unlike the first) though I was seeing evidence that it was otherwise working. Now that I’d verified that the VCMEM board was working and producing a valid video signal, I thought I’d see if I could get anything out of the second monitor.

Well, what do you know? Note the cataracts in the corners.

Lo and behold: it works! I soon discovered the reason for the difference in behavior between the two monitors: The potentiometer (aka “knob”) that controls the contrast on this display is non-functional; with it turned up on the first monitor you can see the retrace, with it turned down it disappears. Interestingly the broken contrast control doesn’t seem to have a detrimental effect on the display, as seen above.

So that’s a VCMEM board, two High-Resolution Terminals, and the keyboard tested successfully, with the CPU and Memory boards only partially covered. I have yet to test the Ethernet and Disk controllers. The 3com test runs:

SDU Monitor version 102
>> /tar/3com -v
3com: status Reading station address rom start addr=0xff030600
3com: status Reading station address ram start addr=0xff030400
3com: status Transmit buffer: 0xff030800 to 0xff030fff.
3com: status Receive A buffer: 0xff031000 to 0xff0317ff.
3com: status Receive B buffer: 0xff031800 to 0xff031fff.
3com: status Receive buffer A - 0x1000 to 0x17ff.
3com: status Receive buffer B - 0x1800 to 0x1fff.
>>

Hex editors to the rescue!

No errors reported and the test exits without complaining so it looks like things are OK here. Now onto the disk controller. I don’t have a disk hooked up at the moment, but after a bit of digging into the test’s binary, it looks like the “-C” option should run controller-only tests:

SDU Monitor version 102
>>/tar/2181 -C
Initializing controller
2181: error 3 test 0 Alarm went off - gave up waiting for IO completion
2181: error 3 test 0 Alarm went off - gave up waiting for IO completion
2181: error 10 test 0 no completion (either ok or error) from iopb status
iopb: cyl=0 head=0 sector=0 (TRACK 0)
87 11 00 00 00 00 00 00 00 00 00 00 10 00 c5 62 00 40 00 00 00 00 c5 3a
2181: error 3 test 0 Alarm went off - gave up waiting for IO completion
2181: error 3 test 0 Alarm went off - gave up waiting for IO completion
2181: error 10 test 0 no completion (either ok or error) from iopb status
iopb: cyl=0 head=0 sector=0 (TRACK 0)
87 11 00 00 00 00 00 00 00 00 00 00 10 00 c5 62 00 40 00 00 00 00 c5 3a
2181: error 3 test 0 Alarm went off - gave up waiting for IO completion
2181: error 3 test 0 Alarm went off - gave up waiting for IO completion
2181: error 10 test 0 no completion (either ok or error) from iopb status
iopb: cyl=0 head=0 sector=0 (TRACK 0)
87 11 00 00 00 00 00 00 00 00 00 00 10 00 c5 62 00 40 00 00 00 00 c5 3a
2181: error 3 test 0 Alarm went off - gave up waiting for IO completion

This portends a problem. The output seems to indicate that the test is asking the controller to do something and then report a status (either “OK” or “Error”) and the controller isn’t responding at all within the allotted time, so the diagnostic gives up and reports a problem.
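The pattern the diagnostic appears to follow (issue a command, wait for a completion status, give up after a deadline) can be sketched in a few lines of Python. This illustrates the general idea only: the real test apparently uses an alarm timer rather than a polling loop, and the status values and function names here are hypothetical.

```python
import time

def wait_for_completion(read_status, timeout_s, poll_interval_s=0.01):
    """Poll read_status() until it returns "ok" or "error", or give up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = read_status()
        if status in ("ok", "error"):
            return status
        time.sleep(poll_interval_s)
    return "timeout"

def silent_controller():
    """A controller that never posts a status, like the one in the log above."""
    return None

print(wait_for_completion(silent_controller, timeout_s=0.05))
```

The useful distinction is that a timeout is a third outcome, separate from "ok" and "error": the controller never answered at all.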

This could be caused by the lack of a disk, or perhaps the “-C” option isn’t really doing what it seems like it should. But my hacker sense was tingling, and my thought was that there was a real problem here.

Compounding this problem is a lack of any technical information on the Interphase SMD 2181 controller. Not even a user’s manual. The Lambda came with a huge stack of (very moldy) documentation, including binders covering the hardware: “Hardware 1” and “Hardware 3.” There’s supposed to be a “Hardware 2” binder but it’s missing… and guess which binder contains the 2181 manual? Sigh.

There are two LEDs on the controller itself and at power-up they both come on, one solid, one dim. In many cases LEDs such as these are used to indicate self-test status — but lacking documentation I have no way to interpret this pattern. I put out a call on the Interwebs to see if I could scare up anything, but to no avail.

Looks like my diagnostic pass at the system was a mixed bag: Outdated diagnostics, meager documentation, and what looks like a bad disk controller combined with the success of the consoles and at least a basic verification of most of the Lambda’s hardware.

In my next installment, I’ll hook up a disk and see if I can’t suss out the problem with the Interphase 2181. Until then, keep on chooglin’.

At Home With Josh Part 5: Tape Drives and EPROMS And Whiskers on Kittens

After working on the Lambda’s monitors as described in my last writeup, my next plan of action was to see if I could get diagnostics loaded into the SDU via 9-track tape.

ROM Upgrade Time

Monitor version 8

But first, I wanted to upgrade the SDU’s Monitor ROM set. The SDU Monitor is a program that runs on the SDU’s 8088 processor. It provides the user interface to the SDU, with commands for loading and executing files and for booting the system. It also communicates with devices on the Multibus and the NuBus. As received, my Lambda has Version 8 of the monitor which is, as far as I know, the last version released to the public at large. However, the Lambdas that Daniel Seagraves owns came with an internal-only Monitor in their SDUs, designated Version 102. This version adds a few convenient features: it can deal more gracefully with loss of CMOS RAM (important since I don’t have a backup battery anymore) and adds a few commands for defining custom hard drive types.

One of the 27128A EPROMs

A week or so prior, Daniel had sent me a copy of the Version 102 ROMs so all I had to do was write (“burn”) the copy onto real EPROMs and install them in the SDU. I had spare EPROMs (four Intel 27128A’s) at the ready but the thing about EPROMs is that they need to be erased before they can be programmed with new data. To do that, you need an EPROM eraser (a little box with a UV lamp in it and a timer), and after searching the house for mine, I came to the realization that I’d taken it to work a few months back and never brought it back home. And due to present circumstances, it was going to be stuck there for a while.

Bah.

So with much wailing and gnashing of teeth I ordered a replacement off the Internet and began waiting patiently for it to arrive in 5-7 days. Meanwhile I decided to start documenting this entire process for some kind of blog thing and so I went off, took pictures, and started writing long-winded prose about Lisp Machines and restorations.

Also during this time I decided to test the old wives’ tale about using sunlight to erase EPROMs. At that time Seattle was experiencing an extremely lovely bout of sunny weather, so I took four 27128’s outside, put them on the windowsill so as to gather as much sun as possible, and left them there for the next four days.

[Four Days Pass…]

They’re still not erased. So much for that idea. Only a few more days until my real EPROM eraser arrives anyway…

[A Few More Days Pass…]

At last my dirt-cheap EPROM ERASER arrived on my doorstep bearing dire warnings of UV exposure and also of the overheating of this fine precision instrument. Ignoring the 15 minute time-limit warning, I put four EPROMs into the drawer, cranked the timer up to 30 minutes and turned it on. And once again I found myself waiting.

[Thirty Minutes Pass…]

The Faithful DATA I/O 280 Gang Programmer

I pulled out my trusty DATA I/O 280 programmer and ran its “blank check” routine to ensure that the EPROMs were indeed as blank as they ought to be, and the programmer said “BLANK CHECK OK.”

It’s then a simple matter to hook the programmer up to my PC and program the new ROMs and soon enough all four were ready to get installed in the SDU. But before I did that I wanted to double-check that the Lambda was still operating — it’d been a couple of weeks since I had last powered it up and things can go wrong sometimes. Best not to introduce a new variable (i.e. new ROMs) into the equation before I can verify the current state.

Uh Oh

And so I hooked things back up to the Lambda and turned it on. And… nothing. No SDU prompt on the terminal and all three LEDs on the front panel are stuck on solid. (As we learned in my second post in this series, this indicates that the SDU is failing its self tests.) I pressed the Reset button a couple of times. Nothing. Power cycled the system just for luck. NIL.

“Well, fiddle-dee-dee!” I said. (I may have used slightly more colorful language than this, but this is a family-friendly blog). “Gosh darn it all to heck.”

I retraced my steps — had I changed anything since the last time I’d powered it on? Yes — I’d installed an Ethernet board that Daniel had graciously sent me (my system apparently never had an Ethernet interface, which is an odd choice for a Lisp Machine). Maybe the Ethernet board was causing some problem here? Pulling the board made no difference in behavior. I checked the power supply voltages at the power supply and at the backplane and everything was dead on. I pulled the SDU out and inspected it, and double-checked socket connections and everything looked OK.

Well, at this point I’m frustrated and my tendency in situations like this is to obsess about whether I broke something and so I run in circles for a bit when what I really need to do is take a step back: OK — it’s broken. How is it broken? How do I go about answering that question? Think, man, think!

Well, I know that the three LEDs are on solid — this would indicate that the SDU’s self-test code either wasn’t running or wasn’t getting very far before finding a failure. So: let’s assume for now that the self-test code isn’t running — how do I confirm that this is the case?

The SDU uses an Intel 8088 16-bit microprocessor to do its business, and it’s a relatively simple matter to take a look at various pins on the chip to see if there’s activity, or lack thereof. The most vital things to any processor (and thus good first investigations while debugging a microprocessor-based system) are power, clock, and reset signals. Power obviously makes the CPU actually, you know, do things. A clock signal is what drives the CPU’s internal logic, one cycle at a time, and the reset signal is what tells the CPU to clear its state and restart execution from step 0. A lack of the first two or an abundance of the latter could cause the symptoms I was seeing.

i8088 pinout, from the datasheet.

Time to get out the oscilloscope; this will let me see the signals on the pins I’m probing. Looking at the Intel 8088 pinout (at right) the pins I want to look at are pin 40 (Vcc), pin 21 (RESET) and pin 19 (CLK). Probing reveals immediately that Vcc and CLK are OK: Vcc is a nice solid 5 volts and CLK shows a 5MHz clock signal. RESET, however, is at 3.5V. That is a logic “1”, meaning that the CPU is being held in a Reset state, preventing it from running!
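Why does 3.5V count as a "1"? A small Python sketch of how a TTL-compatible input interprets a voltage, assuming the usual TTL thresholds (at or below 0.8V is low, at or above 2.0V is high). The 8088 datasheet is the authority for the actual RESET pin limits, and the 0.2V reading is a hypothetical healthy value, not one I measured.

```python
V_IL_MAX = 0.8  # assumed: inputs at or below this read as logic 0
V_IH_MIN = 2.0  # assumed: inputs at or above this read as logic 1

def ttl_level(volts):
    """Return 0, 1, or None if the voltage sits between the thresholds."""
    if volts <= V_IL_MAX:
        return 0
    if volts >= V_IH_MIN:
        return 1
    return None

for label, volts in (("RESET as measured", 3.5), ("RESET released (hypothetical)", 0.2)):
    print(label, volts, ttl_level(volts))
```

With RESET sitting well above the high threshold, the CPU sees a continuous reset request.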

So that’s one question answered: The SDU is catatonic because for some reason RESET is being held high. Typically, RESET gets raised at power-up (to initialize the CPU among other things) and might also be attached to a Reset button or other affordance. In the SDU, there is also a power monitoring signal attached to the RESET line, designated DCOT (DC Out of Tolerance); if the +5 voltage goes out of range the CPU is reset:

Power supply status signals, from the “SDU General Description” manual.

It seemed possible (though unlikely) that the Lambda’s Reset switch or the cabling associated with it had failed, causing the symptoms I was seeing, but as expected the cabling tested out OK.

SDU Paddlecard. The cable carrying the DCOT signal is the bundle 2nd from the right.

I then checked the DCOT signal and even though the power supply voltages were measuring OK, I was reading 8V on the DCOT pin at the paddleboard. 8V is high for a TTL signal (these normally sit between 0 and 5V) and this started me wondering. When I disconnected the DCOT wire from the paddleboard, the DCOT signal measured at the power supply was 0V while the signal at the paddleboard remained at 8V… suggesting some sort of failure between the power supply and the SDU for this signal. It also explains the odd 8V reading: it’s likely derived from a 12V source with a pull-up resistor, the expectation being that the DCOT signal from the power supply would normally pull the signal down further into valid TTL range.

But what could have failed here? Clearly the power supply itself thinks things are OK (hence the 0V reading there). The difference in reading at one end versus the other can really only point to a problem in the wiring between the power supply and the SDU paddleboard.

Connectors just above the power supply. Connector on the left carries actual power, connector on the right contains the power supply status signals.

There is a small three-conductor cable that runs from the SDU paddlecard down to a connector just above the power supply (pictured at the right). A second three-conductor cable is plugged into this and runs to the power supply itself. Checking these signals for continuity revealed that none of the three wires were continuous from the SDU back to the power supplies. The cable from the connector to the power supply tested fine — so what happened to the cable that runs from the connector to the SDU?

I pulled out the power supply tray to get a look at the cabling, and one glance below the card cage revealed the answer:

Oh.

“Aw, nut bunnies,” I may have been heard to remark to myself. Those three wires had apparently been ripped from the connector (quite neatly, I might add) the last time I had pushed the power supply drawer back in. (Likely while I was taking pictures of the power supplies for my blog writeups…) Quite how it got caught on the tray I’m not sure.

This was easy enough to fix — the wires were reinserted into the pins, and the cable itself rerouted so it would hopefully never get snagged on the power supply tray again. I reconnected everything, held my breath and flipped The Switch one more time.

[Several long seconds pass…]

SDU Monitor version 8
CMOS RAM invalid
>>

greeted me on the terminal. Yay. Whew.

New SDU Monitor, At Last

OK. So at last I’m back to where I’d started this whole exercise, after an evening of panic and frenzied investigation. What was it I was going to do when I’d started out? Oh yeah, I had these new SDU ROMs all ready to go, let’s put ’em in:

SDU Monitor version 102
>>
>> help
r usage: r [-b][-w][-l] addr[,n]
w usage: w [-b][-w][-l] addr[,n] d
x usage: x [-b][-w][-l] addr[,n]
dev usage: dev
reset usage: reset [-m] [-n] [-b]
enable usage: enable [-x] [-m] [-n]
init usage: init
ttyset usage: ttyset dev
setbaud usage: setbaud portnum baudrate
disktype usage: disktype type heads sectors cyls gap1 gap2 interleave skew secsize badtrk
disksetup usage: disksetup
setdr usage: setdr name file [ptr]

>>

Ah, much better. So now the SDU was functional and upgraded, and I was ready to move onto the next phase: running system diagnostics.

9-Track Mind

The SDU has the capability to run programs off of 9-track tape. This is how an operating system is loaded onto a new disk and it’s how diagnostics are loaded into the system to test the various components. The Lambda uses a Ciprico Tapemaster controller, which is normally hooked up to a Cipher F880 tape drive mounted in the top of the Lambda’s chassis.

Qualstar 1052 9-Track Tape Drive

My Lambda’s F880 was missing when I picked it up, but the Tapemaster should in theory be able to talk to any tape drive with a Pertec interface. I’m still trying to track down an actual F880 drive, but in the meantime I have one potentially compatible drive in my collection — a Qualstar 1052. This was a low-cost, no-frills drive when it was introduced in the late 1980s but it’s simple and well documented and best of all: it has no plastic or rubber parts, so no worries about parts of the transport turning into tar or becoming brittle and breaking off.

It’s also really slow. The drive has no internal buffer so it can’t read ahead, which means that depending on how it’s accessed it may have to “shoeshine” (reverse the tape, then read forward again) the tape frequently. But speed isn’t really what I’m after here — will it work with the Lambda or won’t it?

I have a tape containing diagnostics (previously written on a modern Unix system with a SCSI 9-track drive attached) ready to go. So I cabled up the Qualstar to the Lambda’s Pertec cabling (as pictured in the above photograph) and attempted to load a program from the tape using the “tar” program:

>> /tar/load

The tape shoeshined (shoeshone?) once (yay!) and stopped (boo!), and the SDU spat back:

tape IO error 0xD
>>

Well, that’s better than nothing, but only barely. But what does IO error 0xD mean? The unfortunate reality is that there is little to no documentation available on the SDU or the associated diagnostics. But I do have the Ciprico Tapemaster manual, thanks to bitsavers.org:

Relevant snippet from the Ciprico Tapemaster manual

Error 0xD indicates a data parity error: the data being transmitted over the Pertec cabling isn’t making it from the drive to the Tapemaster intact, so the controller is signalling a problem. The SDU stops the transfer and helpfully provides the relevant error code to us.
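To illustrate what a parity check buys you: 9-track tape carries eight data bits plus a ninth parity bit per frame, and as far as I know the convention is odd parity, so every frame should contain an odd number of 1 bits. A minimal Python sketch of that check follows. It shows the principle only; it is not the Tapemaster's actual logic.

```python
def odd_parity_bit(byte):
    """Parity bit that makes the total count of 1 bits (data + parity) odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1

def frame_ok(byte, parity_bit):
    """True if the received parity bit matches the received data byte."""
    return odd_parity_bit(byte) == parity_bit

data = 0x41                      # 0b01000001, two 1 bits
p = odd_parity_bit(data)         # parity bit the sender would attach
print(frame_ok(data, p))         # frame received intact
print(frame_ok(data ^ 0x04, p))  # same frame with one data bit flipped
```

A single flipped bit anywhere in the frame changes the count from odd to even, which is exactly the kind of damage a corroded contact on one data line would cause.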

So where are the parity errors coming from? It could be a controller fault but given this system’s history I decided to take a closer look at the cabling first. A Pertec tape drive is connected to the controller via two 50-pin ribbon cables designated “P1” and “P2.” While I’d previously checked the cables for damage, I hadn’t actually checked the edge connectors at the ends of the cables, and well, there you go:

Crusty Connectors
It’s cleaner now, trust me.

It’s a bit difficult to discern in the above picture but if you look closely at the gold contacts you can see that there’s greenish-white corrosion on many of them. Dollars to donuts that this is the problem. For cleaning out edge connectors like this, I’ll usually spray the insides with contact cleaner and then, to apply a bit of abrasion to the pins, I wipe a thin piece of cardboard soaked in isopropyl alcohol in and out of the slot. I used this technique here and pulled out a good quantity of crud and dirt, leaving the connector nice and clean. Or at least clean enough to function, I hoped. Rinse and repeat for the second Pertec cable and let’s try this again:

>> /tar/load

And the tape shoeshines once… and shoeshines again… and again… hm. Is it actually reading anything or is there some other problem and it’s just reading the same block over and over? Let’s let it run for a bit…

A graphic portrayal of tape shoeshining!

>> /tar/load
no memory in main bus
Initializing SDU
 
SDU Monitor version 102
>>

No more parity errors, and the “load” program did eventually load. It then complained about a lack of memory. It looks like the tape drive, the cable, and the controller all work! (Thanks to the Qualstar’s slowness, it took about five minutes between the “/tar/load” and the “no memory in main bus” error, so this is going to be a time-consuming diagnostic process going forward.)

The “no memory in main bus” error is not unexpected since at that moment the only boards installed in the Lambda’s backplane were the SDU and the tape controller. I have a few memory boards at my disposal, and I opted to re-install the 4mb memory board that normally resides in slot 9. Let’s run that again:

>> /tar/load
no memory in main bus
Initializing SDU

SDU Monitor version 102
>>

Well, hm. Maybe that memory board doesn’t work — let’s try the 16mb board normally in slot 12:

>> /tar/load
using 220K in slot 12
load version 307
Disk unit 0 is not ready.

/tar/loadbin exiting
Initializing SDU

SDU Monitor version 102
>>

Huzzah! The LMI has memory that works well enough to respond to the SDU, and it has a functional tape subsystem. It’s going to be a while before I have a functioning disk, and as per the error message in the output, /tar/load expects one to be present. This is completely rational, since “load” is the program that is used to load Lisp load bands onto the disk from tape.

That’s enough for now — in the next installment, since the Lambda is now capable of loading diagnostics from tape, we’ll actually run some diagnostics! Thrills! Chills! Indecipherable hexadecimal sludge! See you next time!