The CDC 6500 has been down since last Friday, so that will be a week in 3 hours. What have I been doing during that time? Let me tell you:
The first thing I noticed was that my PP memory test, called March, wasn’t working. The first real thing it does, after getting loaded into PP0, is copy itself to the next PP in line. In order to do that, it increments 3 instructions to point to the next channel from the one that got loaded in from the deadstart system. After it has self-modified its program properly, it runs those instructions to do the actual copy. The very first OAN instruction it tried to execute hung, this is not supposed to happen.
I spent 3 days looking at this problem before I started drawing timing diagrams of the channel address being selected by the various PPs. The PPs each have their own memory, but they all share the same execution hardware in chassis 1. This makes it a little hard to look at, as a PP is running 1uS cycles, the hardware is running 100nS cycles, and each PP gets a 100nS Slot to do his thing. As I was looking at PP0s slot time, and what channel he was trying to push some data to, it looked like it was getting done at the wrong time. When I plotted out 1uS of all the channel address bits, I finally noticed that PP0 was addressing channel 0, PP1 was addressing channel 1… and PP11 was addressing channel 11, and back to PP0.
The strange thing about that was that that is the way the system starts at deadstart time. Every PP sucks on the channel with his number. The deadstart panel lives off of the end of channel 0, PP0 sucks up everything the deadstart panel put on channel 0, stores it into memory, and when the panel disconnects, because he has run out of program to send, the PP starts executing the program.
Wait a minute here: the program was supposed to have incremented the 3 channel instructions, so they would be pointing to channel 1, why is PP0 still looking at channel 0? Rats: the channel hardware is doing fine, but the increment isn’t working! 3 days to prove something wasn’t the problem!
OK, so the increment isn’t working, what is it doing? I spent a while writing little bits of code to test various ways of incrementing a location of memory, and then Daiyu Hurst reminded me about a program she had generated for me that was a stand-alone version of the PP verification program that runs on the beginning of most deadstart tapes. OK, what does that do?
It hangs at location 6. It did that because it failed a ZJN (jump on zero) instruction. Why is that? The accumulator wasn’t zero. Hmm, instruction 1 was LDN 0, which loads the accumulator with 0! Why doesn’t that work? After another day, or so, I prove to myself that it actually does work, and 0 gets loaded into the accumulator at the end of instruction 1. Another thing that isn’t the problem!
What’s next? The next instruction is UJN 2, (unconditional jump 2 locations forward) which being at location 2, should jump to 4, which it does. It is not supposed to change the contents of the accumulator, but it does!
There are 2 inputs to the “A” adder, the A input is selected to be A, and the B input is zeros. All 12 of the inputs to the A side are zero. Wait: aren’t there 18 bit in the accumulator, what about those other 6 bits? Ah: bit 14 is a 1!
It will not sit still! I chase bit 14 for a while, and it starts working, but a different bit is failing now! I chased different bits around the loop for a while, put module K01 on the extender to look, and the test started passing! This worked for a while. I had the PPs test memory, and that worked, but if I had CP0 test memory, it didn’t like it. When I got back from lunch, it had gone back to failing my LDN 0 test. I put some secret sauce on the pins of module K01, and we are back to trying to run other diagnostics.
I remembered I was having trouble with the imaginary tape drives, to I tried booting from real tape, and I get to the part where it tests memory, and that fails. OK, we have some progress.
That was then, this is now, and we are back to failing to LDN 0. I found that bit 0 for the “B” input of the A adder was not correct. It seems that a via rivet was not conducting between the collector of Q30 and Q32 to the base of Q19 on my friend the QA module in K01. I resoldered all the via rivets, and the edge pins, just for good measure.
Central Memory still doesn’t work, but I can run some diagnostics again!
To paraphrase Sherlock Holmes: When you eliminate all the things the problem isn’t, you are left with what the problem is!