hardware, restoration, software

Fixing 40-year-old Software Bugs, Part One

The museum had a big event a few weeks ago, celebrating the 45th anniversary of the 1st “Intergalactic Spacewar Olympics.”  Just a couple of weeks before said event, the museum acquired a beautiful Digital Equipment Corporation Lab-8/e minicomputer and I thought it would be an interesting challenge to get the system restored and running Spacewar in time for the event.

As is fairly obvious to you DEC-heads out there, the Lab-8/e was a PDP-8/e minicomputer in a snazzy green outfit.  It came equipped with scads of analog hardware for capturing and replaying laboratory data, and a small Tektronix scope for displaying information.  What makes this machine perfect for the PDP-8 version of Spacewar is the inclusion of the VC8E Point Plotting controller and the KE8E Extended Arithmetic Element (or EAE).  The VC8E is used by Spacewar to draw the game’s graphics on a display; the EAE is used to make the various rotations and translations done by the game’s code fast enough to be fun.

The restoration was an incredibly painless process.  I started with the power supply, which worked wonderfully after I replaced its 40-plus-year-old capacitors, and from there it was a matter of testing and debugging the CPU and analog hardware.  There were a few minor faults, but in a few days everything was looking good, so I moved on to getting Spacewar running.

But which version to choose?  There are a number of Spacewar variants for the PDP-8, but I decided upon this version, helpfully archived on David Gesswein’s lovely PDP-8 site.  It has the advantage of being fairly advanced with lots of interesting options, and the source code is adaptable for a variety of different configurations — it’ll run on everything from a PDP-12 with a VR12 to a PDP-8/e with a VC8E.

I was able to assemble the source file into a binary tape image suited for our Lab-8/e's hardware using the Palbart assembler.  The Lab-8/e has a VC8E display and the DK8-EP programmable clock installed.  (The clock is used to keep the game running at a constant frame rate; without it, the game speed would vary depending on how much stuff was onscreen and how much work the CPU had to do.)  These are selected by defining VC8E=1 and DKEP=1 in the source file.

Loading and running the program yielded an empty display, though the CPU was running *something*.  This was disappointing, but did I really think it’d be that easy?  After some futzing about I noticed that if I hit a key on the Lab-8/e’s terminal, the Tektronix screen would light up briefly for a single frame of the game, and then go dark again.  Very puzzling.

My immediate suspicion was that the DK8-EP programmable clock wasn’t interrupting the CPU. The DK8-EP’s clock can be set to interrupt after a specified interval has elapsed, and Spacewar uses this functionality to keep the game running at a steady speed — every time the clock interrupts, the screen is redrawn and the game’s state is updated.  (Technically, due to the way interrupts are handled by the Spacewar code, an interrupt from any device will cause the screen to be redrawn — which is why input from the terminal was causing the screen flash.)

I dug out the DK8-EP diagnostics and loaded them onto the Lab-8/e.  The DK8-EP passed with flying colors, but Spacewar was still a no go.  I decided to take a closer look at the Spacewar code, specifically the code that sets up the DK8-EP.  That code looks like this (with PDP-12 conditional code elided):

/SUBROUTINE TO START UP CLOCK
/MAY BE HARDWARE DEPENDENT
/THIS IS FOR KW12A CLOCK - PDP12
/OR PROGRAMABLE PDP8E CLOCK DK8EP
   CLSK=6131      /SKIP IF CLOCK
   CLLR=6132      /LOAD CONTROL
   CLAB=6133      /AC TO BUFFER PRESET
   CLEN=6134      /LOAD ENABLE
   CLSA=6135      /BIT RESET FLAGS

STCLK, 0
   CLA CLL        /JUST IN CASE   
   TAD (-40       /ABOUT 30CPS
   CLAB           /LOAD PRSET
   CLA CLL
   TAD (5300      /INTR ON CLOCK - 1KC
   CLLR
   CLA CLL
   JMP I STCLK

The part relevant to our issue is the CLLR IOT instruction and the TAD (5300 just before it.  CLLR loads the DK8-EP's clock control register with the contents of the 8's Accumulator register (in this case, the value 5300 octal loaded by the previous instruction).  The comments suggest that this sets a 1 kHz clock rate, with an interrupt every time the clock overflows.
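The "ABOUT 30CPS" comment on TAD (-40 can be checked with a little arithmetic. Below is a minimal Python sketch, assuming the DK8-EP counter counts up once per tick from the -40 (octal) preset, overflows when it wraps past 7777, and is reloaded from the preset each time; that behaviour is my reading, not something the listing states, and the variable names are my own.

```python
# Sketch: check the "ABOUT 30CPS" comment on TAD (-40 in STCLK.
# Assumptions: the counter is preset to -40 (octal), counts up once per clock
# tick, overflows when it wraps past 7777, and is reloaded from the preset.
WORD_MASK = 0o7777                 # PDP-8 words are 12 bits

preset = -0o40                     # TAD (-40 : minus 40 octal = -32 decimal
preset_word = preset & WORD_MASK   # -40 as a 12-bit register value: 7740 octal
ticks_per_overflow = (WORD_MASK + 1) - preset_word  # ticks from 7740 to wrap
clock_hz = 1000                    # 1 kHz rate selected by the 5300 control word

frames_per_second = clock_hz / ticks_per_overflow
print(f"preset word      = {preset_word:04o} (octal)")
print(f"ticks / overflow = {ticks_per_overflow}")
print(f"interrupt rate   = {frames_per_second} per second")
```

Thirty-two ticks of a 1 kHz clock is 32 ms, or about 31 interrupts (frames) per second, which agrees with the comment.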

I dug out a copy of the programming manual for the DK8-EP from the 1972 edition of the "PDP-8 Small Computer Handbook" (which you can find here if you're so inclined).  Pages 7-28 and 7-29 reveal the following information:

DK8-EP Nitty Gritty

The instruction we're interested in is CLDE (octal 6132), which the Spacewar code defines as CLLR: "Set Clock Enable Register Per AC."  The value set in the AC by the Spacewar code (octal 5300) decodes as:

  • Bit 0 set: Enables clock overflow to cause an interrupt.
  • Bits 1&2 set to 01: Counter runs at selected rate.
  • Bits 3,4&5 set to 011: 1 kHz clock rate.

(Keep in mind that the PDP-8, like many minicomputers from the era, numbers its bits in the opposite order of today's convention, so the MSB is bit 0, and the LSB is bit 11.)  So the comments in the code appear to be correct: the code sets up the clock to interrupt, and it should be enabled and running at a 1 kHz rate.  Why wasn't it interrupting?  I wrote a simple test program to verify the behavior outside of Spacewar, just in case it was doing something unexpected that was affecting the clock.  It behaved identically.  At this point I was beyond confused.
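Because each octal digit covers exactly three bits (bits 0-2, 3-5, 6-8 and 9-11), the fields can be read straight off the octal value; for example, bits 3-5 of 5300 are its second digit, 3, which is binary 011. As a cross-check, here is a small Python sketch using helper names of my own (`field`, `show`, not anything from DEC or the Spacewar source) that extracts fields in the PDP-8's MSB-first numbering:

```python
# Extract fields from a 12-bit PDP-8 word using DEC's bit numbering:
# bit 0 is the most significant bit, bit 11 the least significant.
def field(word, first, last):
    """Return bits first..last (inclusive, PDP-8 numbering) as an integer."""
    width = last - first + 1
    shift = 11 - last  # number of bits below the field
    return (word >> shift) & ((1 << width) - 1)


def show(word):
    """Print the DK8-EP clock enable register fields discussed in the text."""
    print(
        f"{word:04o}:",
        f"bit0={field(word, 0, 0)}",
        f"bits1-2={field(word, 1, 2):02b}",
        f"bits3-5={field(word, 3, 5):03b}",
        f"bit8={field(word, 8, 8)}",
    )


show(0o5300)  # prints: 5300: bit0=1 bits1-2=01 bits3-5=011 bit8=0
```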

But wait: The diagnostic was passing — what was it doing to make interrupts happen?

DK8E-EP Diagnostic Listing

The above is a snippet of code from the DK8E family diagnostic listing, used to test whether a clock overflow causes an interrupt as expected.  The JMS I XIOTF instruction at location 2431 jumps to a subroutine that executes a CLOE IOT to set the Clock Enable Register with the AC contents computed by the preceding instructions.  (Wait, CLOE?  I thought the mnemonic was supposed to be CLDE?)  The three TAD instructions at locations 2426-2430 define the Clock Enable Register bits.  The total sum is 4610 octal, which means (again referring to the 1972 Small Computer Handbook):

  • Bit 0 set: Enables clock overflow to cause an interrupt.
  • Bits 1+2 unset: Counter runs at selected rate, and overflows every 4096 counts.
  • Bits 3, 4+5 set to 110: 1 MHz clock rate.
  • Bit 8 set: Events in Channels 1, 2, or 3 cause an interrupt request and overflow.

So this seems pretty similar to what the Spacewar code does (at a different clock rate) with one major difference:  Bit 8 is set.  Based on the description in the Small Computer Handbook, having bit 8 set doesn't make a lot of sense: this test isn't testing channels 1, 2, or 3, and this code doesn't configure these channels either.  Also, the CLOE vs CLDE mnemonic difference is odd.

All the same, the bit is set and the diagnostic does pass.  What happens if I set that Clock Enable Register bit in the Spacewar code?  Changing the TAD (5300 instruction to TAD (5310 is a simple enough matter (why, I don’t even need to reassemble it, I can just toggle the new bits in via the front panel!) and lo and behold… it works.
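In PDP-8 numbering, bit n of a 12-bit word has weight 2^(11-n), so bit 8 is octal 0010, and OR-ing it into 5300 gives 5310; only that one bit changes. A minimal Python sketch of the arithmetic (the helper name `pdp8_bit` is my own):

```python
# PDP-8 bit n (bit 0 = most significant of 12) has weight 2 ** (11 - n).
def pdp8_bit(n):
    return 1 << (11 - n)


spacewar_ctl = 0o5300    # loaded by TAD (5300 in STCLK
diagnostic_ctl = 0o4610  # sum of the diagnostic's three TADs
bit8 = pdp8_bit(8)       # octal 0010

patched = spacewar_ctl | bit8
print(f"bit 8 mask           = {bit8:04o}")
print(f"patched value        = {patched:04o}")
print(f"diagnostic has bit 8: {bool(diagnostic_ctl & bit8)}")
print(f"bits that changed    = {spacewar_ctl ^ patched:04o}")
```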

But why doesn’t the code make any sense?  I thought perhaps there might have been a different revision of the hardware or a different set of documentation so I took a look around and finally found the following at the end of the DK8-EP engineering drawings:

The Real Instructions

Oh hey look at that why don’t you.  Bit 8’s description is a bit more elaborate here: “Enabled events in channels 1, 2, or 3 or an enabled overflow (bit 0) cause an interrupt request when bit 0 is set to a one.” And per this manual, setting bit 0 doesn’t enable interrupts at all! To add insult to injury, on the very next page we have this:

More Real Info

That's definitely CLOE, not CLDE.  The engineering drawings date from January 1972 (first revision in 1971), while the 1972 edition of the PDP-8 Small Computer Handbook has a copyright of 1971, so they're from approximately the same time period.  I suspect that the programming information given in the Small Computer Handbook was simply poorly transcribed from the engineering documentation…  and then Spacewar was written using it as a reference.  Given that this version of Spacewar supports a multitude of different hardware (including four different kinds of programmable clocks), there is a good chance that it was never actually tested with a DK8-EP.  Or perhaps there actually was a hardware change removing the requirement for bit 8 being set, though I can find no evidence of one.

So with that bug fixed, all’s well and our hero can ride off into the sunset in the general direction of the 2017 Intergalactic Spacewar Olympics, playing Spacewar all the way.  Right?  Not so fast, we’re not out of the woods yet.  Stay tuned for PART TWO!

restoration

Minecraft World Boundary

The worlds created on the LivingComputers Minecraft server have boundaries. One may go from -1024 to 1023 in both directions.
If one gets outside the boundary, one may not move. And if one is far enough outside the boundary, one suffocates in a wall.
There are 2 methods to get outside the boundary: transport there (if you have that permission), and dismount from transportation. Getting off a boat or a minecart may put you outside the boundary. If the boat or minecart is still in position, you may remount it and reenter the world.
Villagers may not pass through the boundary. However, it is possible to trade across it.
Water passes through the boundary.
Arrows pass through the boundary.
Trees grow through the boundary.
I believe that when blocks are destroyed, their remains may end up across the boundary. I believe that if you get close enough, you can collect them.
I have yet to observe monsters across a boundary, and I suspect I can be shot, but I don’t know if a creeper can blow me up. (I’m not sure I want to find out).

Dynamite will destroy blocks across the boundary.  Water may be poured across the boundary (and fetched).  A creeper blowing up on my side of the boundary did not seem to destroy blocks across it.
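For the limits stated at the top of this post, a block is inside the world when both its x and z coordinates fall in -1024..1023 inclusive, which is 2048 blocks along each axis. A minimal Python sketch, assuming "both directions" means the x and z axes and that the limits apply to integer block coordinates (the function name `inside_world` is hypothetical, not part of any server plugin):

```python
# Sketch of the world limits described in this post.
# Assumptions: limits are inclusive and apply to integer block x and z.
WORLD_MIN = -1024
WORLD_MAX = 1023


def inside_world(x, z):
    """True if block column (x, z) lies within the world boundary."""
    return WORLD_MIN <= x <= WORLD_MAX and WORLD_MIN <= z <= WORLD_MAX


print(WORLD_MAX - WORLD_MIN + 1)   # blocks along each axis
print(inside_world(1023, -1024))   # a corner block, still inside
print(inside_world(1024, 0))       # one block past the edge
```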

restoration

Minecraft Python coordinates

Using setBlock() to place blocks requires some attention.
Blocks are placed at Integer locations. You may pass a floating point value to the function, but it is converted to integer.
It is the converting to integer which may aggravate you:
The expression int(x + dx) does not necessarily return the same value as int(x) + int(dx). If you are placing blocks around the origin (0,y,0), any -1 < dx < 1 will be truncated to 0. However, away from the origin, say x = 14, x - 0.5 will go to 13, whereas x + 0.5 will go to 14.
I assume your x and z will be integers already, so make dx and dz integers before you add them to x and z.
setBlock(x+int(dx), y, z+int(dz), 0)
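Here is a minimal Python sketch of the difference, using only the built-in int() conversion this post describes (the function names are hypothetical). It does not call setBlock(), and I'm not asserting whether a particular setBlock() implementation truncates or floors; the last line just shows that the two differ for negative values.

```python
import math


def sum_then_int(x, dx):
    """Add first, then convert: int() truncates toward zero."""
    return int(x + dx)


def int_then_sum(x, dx):
    """Convert the offset first, as in setBlock(x+int(dx), ...)."""
    return x + int(dx)


for x in (0, 14):
    for dx in (-0.5, 0.5):
        print(f"x={x:2d} dx={dx:+.1f}  int(x+dx)={sum_then_int(x, dx):2d}"
              f"  x+int(dx)={int_then_sum(x, dx):2d}")

# int() truncates toward zero, math.floor() rounds down; they differ for negatives.
print(int(-0.5), math.floor(-0.5))
```

At x = 14 with dx = -0.5 the two forms give 13 and 14, while at the origin both give 0, which is the behavior described above.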

hardware, restoration

The Hunt!

The CDC 6500 has been down since last Friday, so that will be a week in 3 hours. What have I been doing during that time? Let me tell you:

The first thing I noticed was that my PP memory test, called March, wasn't working. The first real thing it does, after getting loaded into PP0, is copy itself to the next PP in line. In order to do that, it increments 3 instructions to point to the next channel from the one that got loaded in from the deadstart system. After it has self-modified its program properly, it runs those instructions to do the actual copy. The very first OAN instruction it tried to execute hung, which is not supposed to happen.
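The self-modification works because a PP channel instruction carries the channel number in its low-order bits, so adding 1 to the whole word points it at the next channel. A minimal Python sketch, assuming a 12-bit PP word made of a 6-bit function code on top and a 6-bit d field (holding the channel for I/O instructions) below; the function code here is a placeholder, not a claim about the real OAN encoding, and the helper names are my own:

```python
# Sketch: incrementing a PP channel instruction retargets it to the next channel.
# Assumption: 12-bit PP word = 6-bit function code (f) + 6-bit d field, and
# channel I/O instructions keep the channel number in d.
F_IO = 0o72  # hypothetical function code, for illustration only


def make_instr(f, d):
    return ((f & 0o77) << 6) | (d & 0o77)


def function_of(word):
    return (word >> 6) & 0o77


def channel_of(word):
    return word & 0o77


# Three channel instructions as loaded from the deadstart panel (channel 0).
code = [make_instr(F_IO, 0) for _ in range(3)]

# The self-modifying step: add one to each instruction word.
code = [word + 1 for word in code]

for word in code:
    print(f"{word:04o}  f={function_of(word):02o}  channel={channel_of(word):o}")
```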

I spent 3 days looking at this problem before I started drawing timing diagrams of the channel address being selected by the various PPs. The PPs each have their own memory, but they all share the same execution hardware in chassis 1. This makes it a little hard to look at, as a PP is running 1 µs cycles, the hardware is running 100 ns cycles, and each PP gets a 100 ns slot to do his thing. As I was looking at PP0's slot time, and what channel he was trying to push some data to, it looked like it was getting done at the wrong time. When I plotted out 1 µs of all the channel address bits, I finally noticed that PP0 was addressing channel 0, PP1 was addressing channel 1… and PP11 was addressing channel 11, and back to PP0.

The strange thing about that was that that is the way the system starts at deadstart time. Every PP sucks on the channel with his number. The deadstart panel lives off of the end of channel 0, PP0 sucks up everything the deadstart panel put on channel 0, stores it into memory, and when the panel disconnects, because he has run out of program to send, the PP starts executing the program.

Wait a minute here: the program was supposed to have incremented the 3 channel instructions, so they would be pointing to channel 1, why is PP0 still looking at channel 0? Rats: the channel hardware is doing fine, but the increment isn’t working! 3 days to prove something wasn’t the problem!
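The shared-hardware arrangement described above can be modelled as a round-robin over ten PP slots. A rough Python sketch, assuming ten PPs numbered 0-11 in octal, a 100 ns slot each (so one 1 µs major cycle), and a simple table of which channel each PP is addressing; it prints the deadstart pattern seen in the timing diagrams next to a hypothetical "expected" pattern where only PP0 has moved to channel 1. The names are my own:

```python
# Sketch: one 1 us major cycle of the shared PP hardware.
# Assumptions: 10 PPs (numbered 0-11 octal), each owning a 100 ns slot in turn.
PP_COUNT = 10
SLOT_NS = 100


def major_cycle(channel_for_pp):
    """Return (time_ns, pp, channel) for each slot of one major cycle."""
    return [(pp * SLOT_NS, pp, channel_for_pp(pp)) for pp in range(PP_COUNT)]


def deadstart(pp):
    return pp  # every PP addresses the channel with its own number


def after_march(pp):
    return 1 if pp == 0 else pp  # hypothetical: PP0 retargeted to channel 1


for label, mapping in (("deadstart", deadstart), ("expected", after_march)):
    print(label)
    for t, pp, ch in major_cycle(mapping):
        print(f"  t={t:3d} ns  PP{pp:o} -> channel {ch:o}")
```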

OK, so the increment isn't working, what is it doing? I spent a while writing little bits of code to test various ways of incrementing a location of memory, and then Daiyu Hurst reminded me about a program she had generated for me that was a stand-alone version of the PP verification program that runs at the beginning of most deadstart tapes. OK, what does that do?

It hangs at location 6. It did that because it failed a ZJN (jump on zero) instruction. Why is that? The accumulator wasn’t zero. Hmm, instruction 1 was LDN 0, which loads the accumulator with 0! Why doesn’t that work? After another day, or so, I prove to myself that it actually does work, and 0 gets loaded into the accumulator at the end of instruction 1. Another thing that isn’t the problem!

What’s next? The next instruction is UJN 2, (unconditional jump 2 locations forward) which being at location 2, should jump to 4, which it does. It is not supposed to change the contents of the accumulator, but it does!

There are 2 inputs to the "A" adder: the A input is selected to be A, and the B input is zeros. All 12 of the inputs to the A side are zero. Wait: aren't there 18 bits in the accumulator, what about those other 6 bits? Ah: bit 14 is a 1!
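Checking only 12 of the adder's inputs is how this one hides: the low 12 bits look like zero while the full 18-bit accumulator is not. A minimal Python sketch of that situation, assuming bits are numbered from 0 at the least significant end and treating ZJN as a plain "all 18 bits zero" test (a simplification; the names are my own):

```python
# Sketch: why a stray upper bit breaks ZJN after LDN 0.
# Assumptions: 18-bit PP accumulator, bit 0 = least significant bit,
# ZJN modelled as "jump only if all 18 bits are zero".
A_MASK = 0o777777  # 18 bits
LOW12 = 0o7777     # the 12 bits that were checked on the adder inputs


def zjn_taken(a):
    return (a & A_MASK) == 0


a_good = 0                   # LDN 0 leaves A = 0
a_bad = a_good | (1 << 14)   # the same value with bit 14 stuck at 1

print(f"{a_bad:06o}", a_bad & LOW12, zjn_taken(a_good), zjn_taken(a_bad))
```

This prints `040000 0 True False`: the low 12 bits are all zero, yet the jump is not taken.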

It will not sit still! I chase bit 14 for a while, and it starts working, but a different bit is failing now! I chased different bits around the loop for a while, put module K01 on the extender to look, and the test started passing! This worked for a while. I had the PPs test memory, and that worked, but if I had CP0 test memory, it didn’t like it. When I got back from lunch, it had gone back to failing my LDN 0 test. I put some secret sauce on the pins of module K01, and we are back to trying to run other diagnostics.

I remembered I was having trouble with the imaginary tape drives, so I tried booting from real tape, and I get to the part where it tests memory, and that fails. OK, we have some progress.

That was then, this is now, and we are back to failing to LDN 0. I found that bit 0 for the "B" input of the A adder was not correct. It seems that a via rivet was not conducting between the collectors of Q30 and Q32 and the base of Q19 on my friend the QA module in K01. I resoldered all the via rivets, and the edge pins, just for good measure.

Central Memory still doesn’t work, but I can run some diagnostics again!

To paraphrase Sherlock Holmes: When you eliminate all the things the problem isn’t, you are left with what the problem is!

Bruce Sherry

hardware, restoration

Bendix G15 Germanium Diodes

In restoring the Bendix G-15 vacuum tube computer, I have uncovered a phenomenon that requires us to replace 576 germanium diodes. These diodes appear to have lost their hermetic seal, and the atmospheric contamination has caused their leakage current to rise to very high levels as they reach a normal operating ambient temperature of approx. 40 degrees C. Because these diodes are used in the clamp circuits that generate the 20-volt logic swing of the computer, the combined low impedance of the approx. 500 diodes ends up shorting out the -20 volt power supply after 5 to 10 minutes of power-on time.
We have replacement diodes on order, and this should resolve the power supply issue.
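To get a feel for why a few hundred leaky diodes can drag down a supply, here is a rough Python sketch that treats each diode as a resistor in parallel across the -20 V supply. That is a big simplification of the real clamp circuits, and the 3000 Ω and 30 Ω values are illustrative picks from the "few thousand" and "few tens of ohms" readings described in this post, not measurements:

```python
# Sketch: combined leakage of many diodes modelled as parallel resistors.
# Assumptions: ~500 identical diodes, each a simple resistor across 20 V.
def parallel(resistances):
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


SUPPLY_V = 20.0
N_DIODES = 500

for label, r_each in (("cool, ~3 kohm each", 3000.0), ("warm, ~30 ohm each", 30.0)):
    r_total = parallel([r_each] * N_DIODES)
    amps = SUPPLY_V / r_total
    print(f"{label}: {r_total:.2f} ohm total, {amps:.1f} A")
```

Under these assumptions the load goes from about 6 Ω (roughly 3 A) to a few hundredths of an ohm, which no ordinary supply can hold up.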

Interestingly though, the failed diodes exhibit another phenomenon which this engineer hasn't seen before.  Hooking up a diode to an ohmmeter to measure its leakage, and heating the diode to about 40 degrees C, causes the leakage, measured as resistance, to go from a few thousand ohms to a few tens of ohms.  If the ohmmeter remains connected and the diode is allowed to cool to normal ambient, the low resistance measurement persists.  If the ohmmeter is disconnected briefly and then reconnected, the reading returns to its nominal few thousand ohms.

restoration

DEC PDP10 Model 1095 Repair

A few months ago, our PDP10 Model 1095 (pictured) had just successfully booted the WAITS operating system and was running an early version of Ethernet.  One afternoon, the PDP11-40 front-end computer (unit with chassis extended on left) stopped working and I was tasked to find out what had happened and repair it.  What followed was almost three months of difficult troubleshooting and repair.

What had happened was that one of the peripheral devices (a TC-11 DECtape controller at the left end of the machine) attached to the PDP11's Unibus had had a power supply failure, causing the regulated 15 volt supply to rise to 28 volts.  These supplies have an over-voltage crowbar circuit which is designed to shut down the supply by blowing a fuse if the power supply ever goes into an over-voltage condition.  This crowbar circuit failed, and as a result a number of circuit boards in the PDP11 were fried.

Once I replaced and/or repaired the failed circuit boards, I upgraded the TC-11 power supply to a modern switcher which doesn’t have the failure mode described above.

With the hardware sorted out (this is a couple of weeks into troubleshooting), I set about trying to boot the WAITS operating system once again.  A further snag cropped up at this point.  WAITS wouldn't fully start and would complain about a "pointer mismatch".  This points to the DTE-20 10-11 interface, but no combination of replacement boards would succeed in bringing up WAITS (except at a number of random times).  The solution to this problem turned out to be an old bugaboo of the KL-10 processor.  A number of the control devices do not fully initialize at power-on, as their reset lines do not go to all of the parts in a particular device.  We have seen this phenomenon on the RH-20, but apparently the DTE-20 also has some hardware that doesn't get initialized.  I determined this by running the -10 side diagnostic for the DTE-20, and then booting WAITS successfully.  This was after weeks of eliminating all other possibilities, as neither I nor the others were aware that the DTE-20 had components that came up in an unknown state at power-on.

One side note: in upgrading the TC-11 power supply, we found that the power controller that feeds line voltage to it had failed some time ago and had been hacked to work without its contactor.  A new contactor was ordered and installed.

restoration

IBM 360/30 lost its memory?

Well, not quite.

It appears the memory can be read, but it also looks like it is not being restored/written.

See how the Yellow trace and the Blue trace ~almost~ are high together?  Well, IF they were both high together then the memory is supposed to write.

restoration

It Verks, it Verks!

It has been about 3 months, but we seem to have 128KW of memory on the CDC 6500 now.

From the picture you can see “CM = 303700”, which is just about 100KW free!
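The CM figure has to be octal, since 303700 in decimal would be more than the 128K words installed. A quick Python check, assuming the displayed number is a count of free words:

```python
# Sketch: the "CM = 303700" figure is an octal word count.
free_words = 0o303700       # as shown on the console
total_words = 128 * 1024    # 128K words installed

print(free_words)                   # free words, in decimal
print(f"{total_words:o}")           # 128K words, in octal
print(round(free_words / 1024, 1))  # free words in 1024-word K units
```

That is 100,288 words, or about 98K in 1024-word units, out of 400000 octal: "just about 100KW" either way.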

We have built 35 new Storage Modules, all of which work, and 32 of them are installed in the machine in place of Core Modules which were not happy. There are 10 more new Storage Modules in process, and the mostly assembled boards should be in next week. They will need their connector pins and pulse transformers installed. I will have to build more chassis sides and fronts for them, but that may take a while, as I still have 3 good modules, itching to go to work, sitting on my bench.

Something I find interesting in this photo is the memory access pattern. It is hard to see in the picture, but bank 30 is at the top of chassis 11, and bank 34 is at the top of chassis 12. The left two LEDs in the new storage modules are the Read and Write indicators. The ones on chassis 12 are bright, and the ones in chassis 11 are off. The top rows are separated by 4 locations! The machine isn't real busy, it just has two instances of a prime number program running, but still… Can't see that with Core Modules.

Anyway, let’s see, both CPUs working: check! All of memory working: check! Real card reader working: check! Real tape drives working: oops, not at the moment. I guess I’m not done yet.

Bruce

restoration

The New Tennis For Two Oscilloscope

Our first floor ‘Tennis for Two’ exhibit has a new scope.  Not really new though because chronologically it is lots older than the scope we were using.  We had been using a Tektronix 465 scope which dates from no earlier than 1972.  I remember this oscilloscope model well as I had one exactly like it on my bench at a previous job as ‘my scope’.  The Tektronix 465 which had been displaying Tennis for Two was made at least 14 years after Tennis For Two had been invented.  Not an ideal situation for a display certainly, but a lot better than no oscilloscope at all which was our alternative.  It took a while to acquire and restore a period appropriate scope.

I wanted to use the same model oscilloscope which had displayed Tennis for Two when it made its debut at Brookhaven National Labs back in 1958 but which scope was it exactly?  All I had to go on was this photograph.

As VISITORS DAY EXHIBITS

The Tennis for Two oscilloscope is on the left side of the instrument stage.  This next photo shows exactly where.  This is actually two photographs photo-shopped together and the scope image detail portion was not photographed in 1958.

Enlarging a photo does not produce the detail shown above.  The original photo is black and white and the enlargement is in color.  Obviously two photos were combined.  As an aside: in the original 'Blade Runner' movie, Agent Deckard used a device which allowed him to zoom in on a photograph without loss of picture quality, no matter what the magnification.

In real life that is simply not possible.  In real life the 'replicant' from the Blade Runner movie would have lived another day safe from Deckard, for this is what you get upon enlargement of our scope in the Tennis for Two photo: loss of detail.

The photo is too fuzzy to make out the white DuMont label just below the display screen in the photo center, or a model number.  You can't even tell where these labels are.  I had to look at photos of countless old scopes to make a match.  It turned out the diamond pattern of the central knobs below the screen is very distinctive, and only DuMont scopes of the '304' model type have that pattern.  Upon identifying the type of scope, I was able to find one on eBay and restore it.  After physically acquiring the oscilloscope, certain features in my fuzzy photograph, such as the white DuMont label, made sense.  Without a real scope to compare it to, these features would have remained a mystery.

Restoration involved replacing almost every capacitor in the scope that could have age-related issues before I turned it on; not all of them, but most.  Minor troubleshooting then got the scope working.  The vertical amplifier had a bad connection on a calibration switch which would not let the vertical amplifier signal pass.  As we don't need to use the calibration switch, I simply bypassed the connection.  If we were ever to calibrate this oscilloscope, we would not use the internal calibration signal and would provide an external calibration signal anyway.

Here is the final result with the enlarged photograph I pasted on my office wall while I researched scope pictures.  The shapes on some of the knobs are different but that is OK.  DuMont mixed round knobs with pointed knobs of the ‘crows feet’ variety frequently.  The scope we are using now and which I show here is an enhanced ‘304’ type called a DuMont 9559 and has options for doing RF measurements.  I also acquired a plain Jane DuMont ‘304’ which has all crows feet type knobs.  Physically the front panels are identical otherwise.

What matters most is not that the knobs absolutely match but that we now display Tennis for Two on a period-appropriate oscilloscope.  The scope I restored was the better of the two I had acquired, and I suspect Brookhaven used top-of-the-line models in their research.  We really can't know the exact model used, so the DuMont 9559 is an appropriate choice.

hardware, restoration

More Memory!

Here is what has been going on on the CDC 6500: More Memory!

I'm sure the computer purity police will come and take me away, but this is what I have been doing. We are now up to 22 new storage module replacements, 17 of which you can see here. There are 3 more in chassis 9 and 10, and PP0 and PP1 each have one in chassis 1. I have 3 more to finish assembling when the parts come in later today.

The twenty units I have used in Central Memory are all involved in getting the first location of banks 20 to 37 working, and that is all that works so far. If I try to test the second location, the first location fails. I don't think the other three boards will get me through the second location, so I will probably be going out for more modules later today, or maybe next week.

My poor little milling machine has been working its bearings to the bone making Storage Modules sides and fronts. Since it doesn’t have “rigid tapping” (read automatic tapping), I started doing that by hand, but eventually I figured out a way to have the milling machine supply the energy to turn the tap, while I manually told it which direction to turn it.

So far, all the modules built have had the surface mount assembly done outside the Museum, and we have installed the pins and transformers. I may see about trying to convince other folks to do the through hole assembly and the machining.

Things are improving, we have moved from 65536 locations of memory to 65552 locations that work. Unfortunately it doesn’t work well enough to let the machine run with it out there, I still have to completely disable the upper 64K in order to have the machine boot.
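The 16 extra locations line up with the note above that only the first location of banks 20 through 37 (octal) works: that range is 16 banks, one good word each, on top of the lower 64K. A quick Python check of the arithmetic:

```python
# Sketch: where 65552 comes from.
first_bank, last_bank = 0o20, 0o37  # banks 20-37 (octal)
banks = last_bank - first_bank + 1  # number of banks in that range
base = 64 * 1024                    # the lower 64K that already worked
print(banks, base + banks, f"{base:o}")  # one working word per bank
```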

Bruce Sherry 20170630