restoration

Testing old technology

I have 4500 modules in the CDC 6500, and it isn’t always easy to debug them in the machine, because convincing the machine to wiggle its lines so I can check each transistor on a particular module is difficult.

In order to make this problem a little easier, I have built a cordwood module tester:

It has taken a couple of revisions to get the pin drivers correct, but it does useful things for me when I need it to. The three cables going out the back go to a Mesa Electronics 7i80 FPGA board, which I use for driving signals to the module under test. I have written some hardware for the FPGA, in VHDL, that knows how to talk to the tester board.

I have also written an assembler in Perl, to assemble stimulus commands I write, to the appropriate VHDL that gets included into the FPGA. Here is a sample of what I feed into the assembler:

# P14 = T70 negative pulse
# P17 = clock gate
# set the pins:
p1=h p2=h p3=h p4=h p5=h p6=h p7=h p8=h p9=h p10=h p11=h p12=h p13=h p14=h p15=h p16=h p17=h
p18=h p19=h p20=h p21=h p22=h p23=h p24=h p25=h p26=h p27=h p28=h
# scope sync pulse on unused pin
p1=h
p1=L
# run clocks for a bit...
P17=h
P14=L
P14=h
P14=h
P14=h P17=L
P14=L

Each line represents what it should do for the next 20 nano-seconds. This results in a table that gets glued together with some header and trailer bits to create the VHDL input file. Here is the output listing:

 # P14 = T70 negative pulse
 # P17 = clock gate
 # set the pins:
 0 0ns 0xffe0000 p1=0 p2=0 p3=0 p4=0 p5=0 p6=0 p7=0 p8=0 p9=0 p10=0 p11=0 p12=0 p13=0 p14=0 p15=0 p16=0 p17=0
 1 20ns 0x0000000 p18=0 p19=0 p20=0 p21=0 p22=0 p23=0 p24=0 p25=0 p26=0 p27=0 p28=0
 # scope sync pulse on unused pin
 2 40ns 0x0000000 p1=0
 3 60ns 0x0000001 p1=1
 # run clocks for a bit...
 4 80ns 0x0000001 P17=0
 5 100ns 0x0002001 P14=1
 6 120ns 0x0000001 P14=0
 7 140ns 0x0000001 P14=0
 8 160ns 0x0010001 P14=0 P17=1
 9 180ns 0x0012001 P14=1
 10 200ns 0x0010001 P14=0
 11 220ns 0x0010001 P14=0

The listing shows the memory location, what time that instruction should execute, the actual bits in hexadecimal notation, and what the source file said.

This handy little tool has enabled me to fix 4 out of 5 of the UA modules I had replaced in the CDC, and to know what was wrong with the other one.

hardware

New peripherals for old Computers

Five years ago, when we were getting done with restoring our PDP10-KI, we were running out of working disk drives to run it from. We were down to one set of replacement heads, two working drives, and we didn’t have a source for new ones. We found some folks that said they could rebuild the packs, but it turned out that they couldn’t re-write the servo surface, so if we lost that we were in trouble.

Alright, what else can be done? The Digital RP06, the drive of choice for the KI, has lots of registers available from the MASBUS. The MASBUS is kind of a UNIBUS, with a synchronous data channel for moving the actual data. We had been having difficulty keeping track of everything on an existing project, so I looked into doing things a different way.

My idea was to use an FPGA, (Field Programmable Gate Array) to emulate the behavior of the control unit inside the RP06. This is like the ease of writing software, but for hardware. No wires to change, no cuts, when I mess up the logic. The PC would be responsible for handling the actual data for the disk, or possibly tape.

I spent a while poking around the Internet looking for an FPGA card that would plug into a PC. There were a lot of expensive, and some less expensive evaluation boards out there. I eventually happened across http://mesanet.com/. I had heard of these folks before from my experiences with LinuxCNC, which runs my milling machine at home. These folks have been doing this for a long time, and since they have to deal with industrial environments and big motors, their products are very robust.

For the MASBUS Disk Emulator, (MDE), I chose the Mesa 5i22 card, which has 96 I/O’s I can play with, and a Spartan3 Xilinx FPGA. The 5i22 doesn’t remember what the Xilinx configuration is, so the PC pours in the correct bits each time.

Bob Armstrong, down in the Silicon Valley, wrote all the software for the PC, and we eventually emulated RP06’s, RP07’s which hold twice the data, and TU77 tape drives. Here is a picture of our main collection of MDE’s.

There are 3 Industrial PCs, and 8 MASBUS Cable Driver/Receiver boxes. These are running both PDP10-KL’s, and the PDP10-KS’s. There is a real TU78 tape drive, to the left of the MDE rack. The PDP10-KI, and the PDP11/70 each have their own located elsewhere in the museum.

I also used the 5i22 for all of the emulations that we needed for the CDC6500:

 

Here we have one Industrial PC, and 6 6000 series channel attach Driver/Receivers, along with one 3000 series channel Driver/Receiver. We emulate the dead start panel, the DD60 Display, Tape drives, disk drives, printers, card readers, card punches, the serial terminal interface and the 6681 channel converter so we could talk to the real 405 card reader. Bob Armstrong also wrote the PC code for all these emulations.

Jeff Kaylin has also used 5i22 cards on the Sigma 9, with Bob doing the software, and Craig Arno and Glen Hermannsfeldt used one to emulate the card reader and punch on the IBM 360/20.

All is not sweetness and light however, after making the 5i22 for over 10 years, the parts are getting hard to obtain, so Mesa Electronics has stopped production of that board. We ordered all their remaining stock of the 5i22s. They no longer make the 5i22, but they make lots of other similar boards, so we ordered some 7i61’s and 7i80’s to play with.

The 7i61 uses USB to talk to a PC, and has 96 I/O’s to play with. The 7i80 uses Ethernet to talk to the PC, but only has 72 I/O’s. To conserve 5i22’s, I converted my CDC 6681 6000 to 3000 channel converter to the 7i61 because it needs all 4 cables for 96 I/Os. I used my code, along with open source code from Peter Wallace, of Mesa Electronics, to load my code into the serial flash chip on the 7i61, so the PC is no longer involved in the 6681 emulation. After turning on the power to the Mesa card, it knows how to be a 6681 automagicly! I no longer have to remember to type the proper incantation into the PC to get it loaded up, this is a GOOD thing!

After getting the CDC 6500 working, I had several broken modules that I wanted to fix, so I built a Module tester, using the 7i80 card:

You can see the 7i80-HD card under the cables down from the test card.

It took me a while to collect the appropriate bits of Peter Wallace’s code, so that I could have the Ethernet interface live in with my test code, all at the same time, but perseverance pays off, and it all works now. I have fixed 4 out of 5 broken UA modules, and I know why I am not going to fix the other UA, and the ED module that I replaced in CP1.

I have to confess: the module plugged into the tester, is not really from the 6500, but it is the same form factor, and technology.

I love these Mesa Electronics “Anything I/O” cards! As their name says, I can teach them to be most Anything!

Bruce Sherry 20180418.

hardware

That Pesky PS Module!

When we last left our hero, he had re-soldered all the Via rivets on one of the 510 “PS”, core memory sense amplifier modules in the CDC6500, and the machine was working.

That lasted about a day, and the memory went away again. What was wrong this time? You guessed it, bit 56 in bank 36 was bad again. Third time is the charm: I am going to replace this module! I head off looking for a spare PS module. Where did we put all those spare parts we got with the machine? Oh wait, we didn’t get any spares with the machine. Bummer!

This is where I get to practice my “MAD Skillz”, and make some spare PS modules. What does a PS module look like? What does CDC give me?

On the left side we have an actual schematic of one of the 4 amplifiers on the module, YAY! Having been around this block before, I take apart the offending module and check to see if it matches. Wiring wise, yes it matches, but the values have been changed to protect the innocent.

After a while playing with the newest version of Eagle, I ended up with this:

The easy part is done, now the fun starts! The circuitry for the module is split between two printed circuit boards, one that connects to the odd pins on the connector, and has odd numbered transistors, and one for the even bits. Eagle really doesn’t understand this, so I have to fool it. First I put test points on each side of all components that go between the boards. I have to then add in the wire jumpers that also go between the boards, and I end up with another schematic:

I take this schematic and duplicate it into odd and even sides, then I write an Eagle script to delete all the between boards components, and all the test points that belong on the other board. Here is what the schematic for the odd board looks like, pretty terrible:

Now I have two schematics: PS_O and PS_E, and I do the PC layout thing. I have the original boards to use as an example, which I follow very closely so that timing and signal integrity will be as close as I can to the original module. Here are pictures of the odd and even layouts:

But wait, I’m not done yet! Remember Eagle doesn’t understand the whole module. I now have to verify that the two boards, together with all the components that go between, match that original schematic I started with.

I go over every line on the board pairs, and the schematic highlighting them as I go, until EVERYTHING is highlighted.

Done yet? Grumble, grumble: No! I have forgotten to identify which end of the diodes and polarized capacitors have the band on them! If I want them assembled properly, I guess I should do that before they go out to FAB!

Back to the PC mines…

Bruce Sherry 20180201 10:57AM

hardware, restoration

Chasing the Pesky Ratio!

It seems like I did something really silly! I had to come up with some goals for 2018. I hate this time of year, I think everybody does. OK, what can I put down that is measurable and achievable? How about keeping the CDC6500 running more than 50% of the time? That might work. Oops, did I hit the send button?

“Hey, Daiyu: How do I tell what users have been on the machine?” Daiyu Hurst is my systems programmer, who lives Back East somewhere. If it is on the other side of Montana from Seattle, it is just “Back East” to me. She lives in one of those “I” states, Indiana, or Illinois, not Idaho, I know where that is. After a short pause, she found the appropriate incantations for me to utter, and we have a list of who was on the machine, and when they logged in. I had to use Perl to filter out those lines, but that was pretty easy.

What is all this other gobble-de-gook in this file:

 03.14.06.AAAI005T. UCCO, 4.096KCHS. 
 03.16.09.AAAI005T. UCCO, 4.096KCHS. 
 03.18.11.AAAI005T. UCCO, 4.096KCHS. 
 03.20.12.AAAI005T. UCCO, 4.096KCHSUCCO, 4.096KCHS. 
 05.00.30.SYSTEM S. ARSY, 1, 98/01/24.
 05.00.30.SYSTEM S. ADPM, 11, NSD.
 05.00.30.SYSTEM S. ADDR, 01, LCM, 40.
 05.00.44.SYSTEM S. SDCI, 46.603SECS.:
 05.00.44.SYSTEM S. SDCA, 0.032SECS.:
 07.32.30.SYSTEM S. ARSY, 1, 98/01/24.
 07.32.30.SYSTEM S. ADPM, 11, NSD.
 07.32.30.SYSTEM S. ADDR, 01, LCM, 40.
 07.33.07.AAAI005T. ABUN, BRUCE, LCM.:
 07.33.37.SYSTEM S. SDCI, 116.108SECS.:
 07.33.37.SYSTEM S. SDCA, 0.078SECS.:
 07.33.37.SYSTEM S. SDCM, 0.005KUNS.:
 07.33.37.SYSTEM S. SDMR, 0.004KUNS.:

The line with “ARSY” in it is when I booted the machine, at 5:00 this morning, from home. It crashed before I got in, and I booted it again at 7:32. Then we get to 7:33:07, and the “ABUN” line, where I login from telnet.

From the first few lines we can see that the machine appeared to still be running and putting things in its accounting log at 3:20, but it crashed before it could print a message about 3:22.

OK from this, I can mutter a few incantations at PERL, and come up with something like:

1054 Booted on 98/01/23 @ 07.39.30
 Previous uptime: 0 days 5 hours 59 minutes
 Down time: 0 days 17 hours 28 minutes
 1065 Booted on 98/01/23 @ 13.38.30
 Previous uptime: 0 days 1 hours 23 minutes
 Down time: 0 days 4 hours 35 minutes
 1068 Booted on 98/01/23 @ 14.12.30
 Previous uptime: 0 days 0 hours 0 minutes
 Down time: 0 days 0 hours 33 minutes
 1392 New Date:98/01/24
 1498 Booted on 98/01/24 @ 05.00.30
 Previous uptime: 0 days 13 hours 7 minutes
 Down time: 0 days 1 hours 40 minutes
 1503 Booted on 98/01/24 @ 07.32.30
 Previous uptime: 0 days 0 hours 0 minutes
 Down time: 0 days 2 hours 31 minutes

Last uptime: 0 days 0 hours 1 minutes

Total uptime: 2 days, 1 hours 37 minutes in: 0 months 7 days 0 hours 14 minutes
Booted 15 times, upratio = 0.29

Here is where the hunt for the Pesky Ratio comes in: See that last line? In the last week, the CDC has been running 29% of the time. That isn’t even close to 50%. I KNOW the 6000 series were not the most reliable machines of their time, but really: 29%?

What has been going on? A week ago, I was having trouble keeping the machine going for more than a couple of minutes. Finally, it occurred to me I might see how the memory was doing, and it wasn’t doing well. It took me a while to find why bit 56 in bank 36 was bad. I had to explore the complete wrong end of the word for a while, before I realized that end worked, and I should have been looking at the other end. I chased it down to Sense Amplifier (PS) module 12M40. When I put it on the extender, the signal would come and go, as I probed different places. I noticed that I had re-soldered a couple of via rivets before, so I re-soldered ALL the via rivets on the module.

What do I mean “via rivets”? In those days, either one of two things were true: either they didn’t have plated through holes in printed circuit boards, or they were too expensive. None of the CDC 6500 modules I have looked at have plated through holes. Most of the modules do have traces on both sides of the two PCBs that the module is made with. How did they get a signal from one side to the other? They put in a tiny brass rivet! Near as I can tell, all the soldering was done from the outside of the module, and most of the time the solder would flow to the top of the rivet somehow. Since I have found many of these rivets not conducting, I have to assume that the process wasn’t perfect.

After soldering all the rivets on this module, I put it back in the machine, and we were off and running. Monday, I booted the machine at 8:11, and it ran till 2:11. When I got in yesterday, the machine wouldn’t boot. Testing memory again found bit 56 in bank 36 bad again! I put module 12M40 on the extender, and the signal wasn’t there. I poked a spot with the scope, and it was there. I poked, prodded, squeezed, twisted and tweaked, and I couldn’t get it to fail.

This is three times for This Module! I like to keep the old modules if I can, but my Pesky Ratio is suffering here! I took the machine back down, and brought it back up with only 64K of memory, and pulled out the offending module:

There are 510 of these PS modules in the machine, three for each of the 170 storage modules, or about 10% of all the modules in the machine. Having a spare would be nice. My next task will be to make about 10 new PS modules.

In the time I have been writing this post, the display on the CDC has gone wonky again. This appears to happen when the Perpheral Processors (PP’s) forget how to skip on zero for a while. Once this happens, I can’t talk coherently to channel 10 or PP11. I have a few little tests that copy themselves to all the PP’s, and they will all work, except the last one: PP11.

I have yet to write a diagnostic that can catch the PP’s making the mistake that I can see on the logic analyzer once a day or so. Right now the solution seems to be to wait a while, and the problem will go away again. This is another reason while the Pesky Ratio is so difficult to hunt, but I fix what I can, when I can.

Onward: One bug at a time!

 

 

hardware, restoration

The Hunt!

The CDC 6500 has been down since last Friday, so that will be a week in 3 hours. What have I been doing during that time? Let me tell you:

The first thing I noticed was that my PP memory test, called March, wasn’t working. The first real thing it does, after getting loaded into PP0, is copy itself to the next PP in line. In order to do that, it increments 3 instructions to point to the next channel from the one that got loaded in from the deadstart system. After it has self-modified its program properly, it runs those instructions to do the actual copy. The very first OAN instruction it tried to execute hung, this is not supposed to happen.

I spent 3 days looking at this problem before I started drawing timing diagrams of the channel address being selected by the various PPs. The PPs each have their own memory, but they all share the same execution hardware in chassis 1. This makes it a little hard to look at, as a PP is running 1uS cycles, the hardware is running 100nS cycles, and each PP gets a 100nS Slot to do his thing. As I was looking at PP0s slot time, and what channel he was trying to push some data to, it looked like it was getting done at the wrong time. When I plotted out 1uS of all the channel address bits, I finally noticed that PP0 was addressing channel 0, PP1 was addressing channel 1… and PP11 was addressing channel 11, and back to PP0.

The strange thing about that was that that is the way the system starts at deadstart time. Every PP sucks on the channel with his number. The deadstart panel lives off of the end of channel 0, PP0 sucks up everything the deadstart panel put on channel 0, stores it into memory, and when the panel disconnects, because he has run out of program to send, the PP starts executing the program.

Wait a minute here: the program was supposed to have incremented the 3 channel instructions, so they would be pointing to channel 1, why is PP0 still looking at channel 0? Rats: the channel hardware is doing fine, but the increment isn’t working! 3 days to prove something wasn’t the problem!

OK, so the increment isn’t working, what is it doing? I spent a while writing little bits of code to test various ways of incrementing a location of memory, and then Daiyu Hurst reminded me about a program she had generated for me that was a stand-alone version of the PP verification program that runs on the beginning of most deadstart tapes. OK, what does that do?

It hangs at location 6. It did that because it failed a ZJN (jump on zero) instruction. Why is that? The accumulator wasn’t zero. Hmm, instruction 1 was LDN 0, which loads the accumulator with 0! Why doesn’t that work? After another day, or so, I prove to myself that it actually does work, and 0 gets loaded into the accumulator at the end of instruction 1. Another thing that isn’t the problem!

What’s next? The next instruction is UJN 2, (unconditional jump 2 locations forward) which being at location 2, should jump to 4, which it does. It is not supposed to change the contents of the accumulator, but it does!

There are 2 inputs to the “A” adder, the A input is selected to be A, and the B input is zeros. All 12 of the inputs to the A side are zero. Wait: aren’t there 18 bit in the accumulator, what about those other 6 bits? Ah: bit 14 is a 1!

It will not sit still! I chase bit 14 for a while, and it starts working, but a different bit is failing now! I chased different bits around the loop for a while, put module K01 on the extender to look, and the test started passing! This worked for a while. I had the PPs test memory, and that worked, but if I had CP0 test memory, it didn’t like it. When I got back from lunch, it had gone back to failing my LDN 0 test. I put some secret sauce on the pins of module K01, and we are back to trying to run other diagnostics.

I remembered I was having trouble with the imaginary tape drives, to I tried booting from real tape, and I get to the part where it tests memory, and that fails. OK, we have some progress.

That was then, this is now, and we are back to failing to LDN 0. I found that bit 0 for the “B” input of the A adder was not correct. It seems that a via rivet was not conducting between the collector of Q30 and Q32 to the base of Q19 on my friend the QA module in K01. I resoldered all the via rivets, and the edge pins, just for good measure.

Central Memory still doesn’t work, but I can run some diagnostics again!

To paraphrase Sherlock Holmes: When you eliminate all the things the problem isn’t, you are left with what the problem is!

Bruce Sherry

restoration

It Verks, it Verks!

It has been about 3 months, but we seem to have 128KW of memory on the CDC 6500 now.

From the picture you can see “CM = 303700”, which is just about 100KW free!

We have built 35 new Storage Modules, all of which work, and 32 of them are installed in the machine in place of Core Modules which were not happy. There are 10 more new Storage Modules in process, and the mostly assembled boards should be in next week. They will need their connector pins and pulse transformers installed. I will have to build more chassis sides and fronts for them, but that may take a while, as I still have 3 good modules, itching to go to work, sitting on my bench.

Something I find interesting in this photo, is the memory access patterns. It is hard to see in the picture, but bank 30 is at the top of chassis 11, and bank 34 is at the top of chassis 12. The left two LEDs in the new storage modules are the Read and Write indicators. The ones on chassis 12 are bright, and the ones in chassis 11 are off. The top rows are separated by 4 locations! The machine isn’t real busy, it just has two instances of a prime number program running, but still… Can’t see that with Core Modules.

Anyway, let’s see, both CPUs working: check! All of memory working: check! Real card reader working: check! Real tape drives working: oops, not at the moment. I guess I’m not done yet.

Bruce

hardware, restoration

More Memory!

Here is what has been going on on the CDC 6500: More Memory!

I’m sure the computer purity police will come and take me away, but this is what I have been doing. We are now up to 22 new storage module replacements, 17 of which you can see here. There are 3 more in chassis’s 9 and 10, and PP0 and PP1 each have one in chassis 1. I have 3 more to finish assembly of, when the parts come in later today.

Of the twenty units I have used in Central Memory, they are all involved in getting the First Location of banks 20 to 37 working, and that is all that works so far. If I try to test the second location, the first location fails. I don’t think the other three boards will get me through the second location, so I will probably be going out for more modules later today, or maybe next week.

My poor little milling machine has been working its bearings to the bone making Storage Modules sides and fronts. Since it doesn’t have “rigid tapping” (read automatic tapping), I started doing that by hand, but eventually I figured out a way to have the milling machine supply the energy to turn the tap, while I manually told it which direction to turn it.

So far, all the modules built have had the surface mount assembly done outside the Museum, and we have installed the pins and transformers. I may see about trying to convince other folks to do the through hole assembly and the machining.

Things are improving, we have moved from 65536 locations of memory to 65552 locations that work. Unfortunately it doesn’t work well enough to let the machine run with it out there, I still have to completely disable the upper 64K in order to have the machine boot.

Bruce Sherry 20170630

hardware, restoration

I knew this would happen

I have been worried that I would have to re-manufacture a module because I just couldn’t fix it. The time has come! I found this PZ module in chassis 10, while chasing memory problems. This one causes all 6 bits that go through it to fail my tests. I have traced it down to what appears to be an open base on Q26, which I have circled in the picture. OK, what’s the problem? Q26, as you can see is in the middle of the card. When I have had to replace middle transistors in the past, the front bracket has been held on with 6 little 2-56 screws. You don’t see any screws on this module other than the ones that hold it in place, do you?

This module coming from chassis 10, means it is newer, since Purdue upgraded from 64K to 128K several years after they got the machine. Instead of screws, this module is riveted together, and the rivets are soldered, as are most of the modules in Bay 3 which holds chassis’ 9-12.

Awkward.

Bruce Sherry

Update: I managed to get it far enough apart to change the offending transistor, and failed to break it completely, so now it works, without the 3 weeks of my time to re-manufacture it, not to mention the 3-4 weeks for the manufacturing process to actually produce the replacement.

BS

restoration

Little Card Reader

Here at the LCM+L Skunkworks, some of the folks have been working on a Little Punched Card reader, so Visitors can see what they punched in their punched cards. Jeff and Hunter have been working on the electronics, while I have been building mechanical parts for them. Here is a picture of where it is today:

You can see it is made from clear plastic, so you can see EVERYTHING. This being a prototype, it has a few warts: like those extra holes in the sides. I had never actually used the boring head with my milling machine, so I had to figure out how to use it. Boring heads are used to make round holes, usually larger than you can get with drills. Yesterday, I made some new side plates, sans extra holes.

I started by drilling the holes for the 6-32 screws that hold the thing together:

Once I had the holes in, I had to put in the bearing holes, this is where the boring head comes in. I have drills up to 0.500″, and a 0.750″ one, but the bearings are 16mm in diameter, or about 0.630″. I start to drilling the bearing holes to half inch:

Then I have to bore out the holes to 0.630″ with the boring head:

The gray part moves to adjust the size of the hole, here is a picture of the business end, where I can adjust the size of the hole it makes by 0.001″:

After I added the holes to hold the motor mount, and a bunch of filing edges, here are the two sides, sitting on Jeff’s desk:

I mentioned that I was doing this on my milling machine. If you don’t know about milling machines, they are basically like a very rigid drill press, with a movable table to hold the part you are working on. I can move the table around very precisely to get the holes Exactly where I want them. You can also work on the sides of the parts with special cutters, which is how I got the plates to the basic shape. Here is a link to what my milling machine started out life as: http://www.grizzly.com/products/Mill-Drill/G0463. It was smallest machine that I felt could be called a real milling machine.

restoration

CDC 6500 Memory, Again.

Going out for 20 more Storage Module replacement boards today. All 5 of the first run work with one timing change, and different values for 12 resistors.

Over the last week or so, I have fixed all the bad PS and PZ modules that I had previously identified. The machine has been running with all those replacement modules, as the first 5 PP’s for more than a week now.

It’ll take a week to fab the boards, and a week or two to do the surface mount assembly, then David Cameron and I will install the connector pins, and pulse transformers. Then I will be off to fix the other half of the memory to bring the machine up to his full 128K words.