So Jeff said he wanted a haunted castle put in the North West corner of the Hallowe’en world we were building in Minecraft. If you missed the Minecraft 11th birthday world, there are still bits of it visible at mc.livingcomputers.org as of June 4, 2020.
Ok, a castle… Hmm, how about a 4 level castle about 40 blocks in diameter? A basement, three floors, and a roof? I see builds with circles in Minecraft, how do I do that? I couldn’t seem to convince a Minecraft function to do much math. Perl can do math, how about I write a Perl script to do the calculations? I started with something like this:
Why would I go through all this trouble? When we run a public Minecraft server, we sometimes take a lot of grief. With a Minecraft function, an OP can just type: /function minecraft:castle.mcfunction and Poof, whatever has happened has gone away, without shutting down the server, or kicking players off!
It all started so innocently! The LCM+L Employee Engagement Committee wanted something for employees who had gone beyond the call of duty, to be able to wear that would get people to ask why only they they had this thing. We thought maybe it might blink to catch folks attention. Sure, I can do that, no problem!
When I asked my boss: Stephen Jones about it, he said “Sure, but it has to be a fully programmable 9 bit computer with a complete front panel!”. Oh, he says, it should be able to make sounds too. Before it was “falling off a log”, for me, but now the log is 8 feet in the air, one must fall off carefully, and with forethought!
I made something the size of a business card, and it had a few problems: 1) I had messed up the multiplexing, and whenever you pushed on one of the data bit buttons both the data LED, and the address LED above it came on, whether I wanted them to or not. 2) The bit ordering on the LEDs and switches was backwards, requiring bit fiddling in the software. 3) It was almost all surface mount. This shouldn’t be a problem, right? Somewhere in here, someone asked, could we sell it as a kit? Could we teach a soldering class with it? A lot of the parts on this one were 0805 SMT parts, that means they are 0.08″ x 0.05″. That is the outline of the part, which is pretty small! Working with them requires tweezers and, at least for me, magnification.
The board grew a little bit to 2.5″ x 4.25″, and the LED’s and buttons went to through hole, to be big enough for beginners to solder. That is what the photo above shows, but I hadn’t mounted the speaker in that upper right circle when I took the picture. I did fix the multiplexing, and bit ordering problems in the process.
Some about the Bit Token: The processor on the board is the ATMEGA328, that is the heart of the Arduino Uno. I was a little short on pins, and brain cycles to use the ATMEGA32U2 or a CP2102 usb to serial chip to make it actually Arduino compatible. One can get a programmer widget for under $10 on Amazon.
The first thing I had to do, starting with the original board, was test the hardware. I started by writing code to blink the lights. Remember blinking the lights? That used to be the target! Pressing the “run” button while holding one or more of the data buttons will cause it to blink its lights in 10 different patterns. Pressing the “run” button again will stop that pattern, so you can try another.
Any computer needs to communicate, with more than just the lights, so the Bit Token has a TTL serial port. There is yet another widget to hook USB to a TTL serial port, and holding a couple of data buttons and poking the “run” button will cause it to send “Hello World!” to your terminal.
Now is where the log leaps off the ground, new target: A fully programmable 9 bit processor, with a complete working front panel! Everything till now has been playing with the ATMEGA, now I have to teach the ATMEGA to pretend to be a 9 bit computer. All the software I have written has been in “C”, using AtmelStudio 7, available from the Microchip website. ( Microchip bought Atmel a few years back.)
What kind of a processor do I create? I could do a 9 bit PDP-8, but HP already did a 16 bit PDP-8 called the HP-1000 series. I poked around a bit, and eventually decided to roll my own, which I call “NineBitter”, kind of a mish-mash of the German word for No, and the taste. Anyway a bunch of coding and debugging ensued, but eventually I have a computer with:
512 9 bit words of memory
3 register instructions
4 I/O ports
I guess I need some programs to demonstrate that it works. I did some programming in binary, but that got pretty tedious fast, so I implemented an assembler in PERL. It has warts, but it mostly works. Its weakest point is error reporting. The first serious program I wrote with the assembler is the boot loader, and I found a way for it to be there in RAM whenever you power up or reset the board! To test it, I had it load another copy of itself, and used the front panel to verify that it was correct.
The next program I wrote was a game: Kill the Bit! That was one of the earliest games folks played on the Altair 8800. The idea is there is a bit rotating through the data lights, and your goal is to press the button below the bit to stop it. If you miss, you get another bit lit, chasing the first. Here is the program:
; Kill The Bit:; Author: Bruce Sherry; Date: 3/31/2020 2:41:19 PM;; Stomp the bit with the buttons as it rotates around!;*****************************************************
start: org 0120
xor R0,R0,R0 ; clear register R0
addi R0,R1 ; make a Bit
loop: out R0,leds ; display the results
addi R2,19 ; load a delay count
deloop: subi R2,1 ; wait a while
in R1,switches ; Read the switches into register 1
xor R0,R0,R1 ; flip the bits
rl R0 ; rotate the bit
jnz loop ; go back for more
halt ; you win!
A theme we have at LCM+L is “Hello World!” which comes from the “C Programming Language” manual by Kernighan and Ritchie. The first program in the book merely types “Hello World!” when you run it, so that program had to be written too:
; Hello World!:; Author: Bruce Sherry; Date: 4/21/2020 10:59:24 AM;; Say hi to the world on the other side of the serial port.;***********************************************************
NL: equ 012
CR: equ 015
LF: equ 012
start: org 0120
xor R0,R0,R0 ; clear out the reg
addi R3,message ; get the start of the message
loadn R1,R3,0 ; use reg 1 as temp
addi R3,1 ; step to the next location
or R2,R1,R1 ; set the flags, and save it in 2
out R1,serial ; send the character
sub R2,R2,R2 ; clear 2
out R2,serial ; send a carriage return
out R1,serial ; and the line feed
message: string "Hello World!\n"
Most people (at least the ones that have seen “2001, A Space Odessy”) know that the first song a computer learns is “Daisy, Daisy”, so I taught NineBitter to play that.
You are probably wondering where the title of this blog entry came from. Another of the targets for this project to hit came from the idea that we could teach some computer science with this little thing. Besides having the ATMEGA and NineBitter, we needed Another architecture to play with.
A friend of mine, Bryan, sent me a link to an article called “Surprisingly Turing-Complete”, which sent me down a very deep rabbit hole, where I found a few pages about something called an OISC, or a One Instruction Set Computer. A computer with ONLY ONE instruction. It seems to be possible to make a computer without any instructions, but we will leave that as an excersise for the reader.
On one of the OISC pages I found http://mazonka.com/subleq/ which is about SUBLEQ, which stands for SUBtract and Branch if Less than or EQual to Zero. There is a Challenge! A SUBLEQ instruction uses 3 memory locations: <address1> <address2> <jump target>. The contents of the location pointed to by <address1>, is subtracted from the location pointed to by <address2>. The result is stored in <address2>, then if the result is less than or equal to zero the next instruction is taken from the <jump target> otherwise it is just the next location.
Here is the first program the papers describe:
0: 3 4 6
3: 7 7 7
6: 3 4 0
The number on the left, with the colon, is the address in memory, then we get the contents of 3 locations. The first instruction takes the contents of location 3, which is a 7, subtracts it from the contents of location 4, which is also a 7, leaving 0, which gets stored in location 4. Since the result is zero, it jumps to location 6.
Location 6 does the same thing, but instead jumps back to location 0, but this result is -7. Location 0 does it again with the result being -14, then location 6 changes it to -21. As long as the result is negative, everything is fine.
Most folks seem to implement a 32 bit version of SUBLEQ, so they don’t run into what happens when numbers change sign. In order to get it to run properly on my SUBLEQ processor, called “NinetyOne”, I had to change it to:
In SUBLEQ assembly:
# Loop Minus 7 Fixed
# had to add the two z z instructions to handle positive numbers
begin: x y again
z z again
x:7 y:7 z:0
again: x y begin
z z begin
Which assembles to:
0: 6 7 9
3: 8 8 9
6: 7 7 0
9: 6 7 0
12: 8 8 0
I will let you peruse the various InterNet pages about SUBLEQ to understand what all that gobbledegook means. Another good page to look at is: https://esolangs.org/wiki/Subleq
Now the HARD part: I want to write KillTheBit, HelloWorld, and DaisyDaisy in SUBLEQ. For KillTheBit, I need an XOR operation, and getting boolean operations out of only Subtract is hurting my head! I need the I/O ports too, and I probably don’t have the emulation correct for that. I guess I just have to put my nose to the grindstone, my shoulder to the wheel, my eye on the ball, and try to figure this out in that position.
Let us know if you think owning your own Bit Token computer would be fun. It is looking like the kit might cost around $54.95.
If you have any suggestions or solutions, please let me know,
Back in the days when I commuted to Intel in Santa Clara, California, form Livermore, on the back of a dinosaur, I heard about a group of fellow geeks that met at The Standford Linear Accelerator Center, that called themselves “The Homebrew Computer Club”.
At the time, I was trying to build my own computer out of an Intel 8008 that had been donated to me by the Intel Product Engineering group that handled the 8008. If I recall, the i8008 needed high voltage clocks, and hadn’t learned much about straight transistor design then, so it was taking some head scratching.
I remember talking to the other technicians about the Altair when it came out, and noting that if you wanted a chip from Intel, it was $400, but you could get the Altair kit for $395.
Back to Homebrew: They met in an Auditorium on top of a hill at S.L.A.C. S.L.A.C. is kind of an interesting place, their main claim to fame is this 1 mile long Linear Accelerator that runs under the Interstate 280 Freeway! In the auditorium, we would all take seats, and Lee Felsenstein would moderate the discussion. Gordon French, the librarian, would stand and mention whatever new thing had appeared in the library. Other folks would talk about anything related to computers that they wanted.
Getting off of work at 4, hanging around the south bay for 2 hours +, then going to Homebrew, spending an hour driving home after the meeting made for a very long day, but I did it from time to time. I was young and foolish then. Later in my time at Intel, I got off at 5PM, which meant less wasted time, but I got to work at 5AM, so those were REALLY long days.
I don’t remember when I discovered the Santa Clara Byte Shop. I remember seeing an Apple I sitting on a table in the corner of the store. A computer on a single board! It had a 6502 in it so I kind of looked down my nose at it, being an Intel guy and all. I had my Imsai 8080 by then.
I remember going to a Homebrew meeting one time, and seeing the prototype Apple ][ there, with the prototype case. They hadn’t had time to get it painted yet. All in one box, and it would do color graphics to boot! Steve Wozniak did AMAZING things with almost no hardware! It boggled the mind. But it still had a 6502 in it.
If you look at me, Woz and Jobs, I guess you can tell who wins. Well, Jobs is dead, but he did go out in a blaze of glory.
Hi, my name is Bruce, and I am a Computer Nerd! I have been a computer nerd for over 50 years, so I will probably always be a computer nerd.
In 1969, as a Senior at Lincoln High School, in Seattle Washington I accepted the challenge of taking a class in Fortran IV. I was also taking what was then called “typing” so I got to use a brand new IBM 029 card punch machine to punch my own programs to submit to the Seattle Public Schools IBM 360/40 in downtown Seattle. I had a great time figuring out how to teach the IBM to do what I wanted, as opposed to what I told it.
I had been one of those typical “under-achievers” in high school, so the big schools weren’t chasing after me, but I got a lot of marketing stuff from smaller schools. I went and interviewed a Computer Programming school, in downtown Seattle, and was not impressed. I ended up signing up at a private school in Lynnwood to become an electronics technician at a price of around $2000, or the same as the new Volkswagen Bug my brother had just bought.
After finishing at CTC Education Systems, I got a job at Tally Corporation, in Kent, repairing paper tape readers, and line printers. After building a little exerciser widget for the paper tape readers I worked on, one of the engineers handed me a data book for the just out Intel 8008, and asked me if I thought it could test those paper tape readers.
WOW! A Whole computer in a tiny little chip! I spent all my spare time, for about two weeks, reading that book! I discovered it wasn’t really a “whole” computer, but it was a good starting point. I decided that the 8008 was not really fast enough to do what we wanted. I think I have since figured out a different way to do what we wanted with the 8008, but long after I left Tally.
It didn’t take me a very long time to get tired of fixing paper tape readers. My brother Bob had moved to Livermore, California to work at Lawrence Livermore National Laboratory. That was only about 45 minutes from “The Silicon Valley”, the Mecca of the Electronics Industry, where that company Intel was.
I took a couple of days off, and drove to Livermore to visit Bob, and poke around the “valley” a little bit. I applied at Intel, and HP. HP didn’t want to talk to me because they wouldn’t hire someone without at least a two year degree, not some little private school. Intel said “we don’t have a position for you at this time”. I thought that was that. They called back 3 months later, and I had a phone interview with the head of the test engineering department. Intel and I came to an agreement, and I moved in with Bob on Memorial Day weekend, 1973, and started at Intel the next Tuesday.
In my spare time, I started designing all the stuff to go around an 8008 to make my own computer. Intel gave me a chip to encourage me. Eventually the 8080 came out, and it was a much better chip, but I was already heavily into building my 8008, and didn’t accept one when they offered it. The Altair 8800 came out, and the price of the kit was less that the price of the 8080 if you had to buy one.
After a while, I got another job, and the 8008 project foundered. In January 1976, I bought an IMSAI 8080 kit, with 8K bytes of memory and some peripherals. It didn’t work very well when I got it assembled, but having spent 2 years at Intel fixing memory testers, I determined that the memory was bad, so IMSAI swapped it out, and then everything was fine. Bob and I had a great time learning how to use it.
I would occasionally go to a Homebrew Computer Club meeting in Palo Alto on the way home from work, getting home to Livermore pretty late. Bob and I went to the Lawrence Livermore National Labs Computer Club meetings most months. Members would sometimes come over on weekends, and we would play, or I would help them debug their computers.
On one detour to the Homebrew Meeting on the way home from work, I remember Lee Felsenstein reading a letter from Bill Gates accusing most of us of stealing his software. I guess I have to admit that was true.
At that time, the computer cost about $395, before memory and I/O. If you bought an Altair 8800, Altair BASIC cost you $75. If you didn’t own an Altair, Altair wanted $400 for Altair BASIC. There weren’t any Microsoft dealers that we knew of, just the Byte Shops, which were Altair dealers. We and the denizens of the LLNLCC spent many a happy hour figuring out how to get MITS 12K BASIC to run on our IMSAI’s. 12K BASIC would do more stuff, but it was huge and much slower than the smaller version, so I didn’t use it after I got it working.
At another Homebrew meeting, I heard about some new software just coming out, called CP/M, a floppy disk operating system for 8080 machines. That was pretty cool, but disk drives were pretty expensive. Shortly thereafter, the president of the Lawrence Livermore National Labs Computer Club got a request for some help debugging a floppy disk interface to S100 machines like the IMSAI. He said “You should talk to the Sherry Brothers”. That was how I met John Torode.
John Torode, and his wife Patty, lived in Livermore too, and had a company called Digital Systems, that built floppy disk systems out of their house. They actually built the hardware that CP/M first ran on. They wanted to expand into the S100/Altair/IMSAI market, but they didn’t have an S100 machine!
We used my IMSAI to debug Johns S100 interface, and the Sherry Brothers became one of the earliest CP/M users in the Bay Area, as somehow John left one of his machines at our house, in case more debug was needed. Oh man, was this great!
Up till then, we had been using a Tarbell Cassette Tape Interface to load and store programs. The Tarbell really only liked a particular JC Penny portable cassette player, and you had to keep track of where on the tape you had stored a particular program. Soon we had several cassettes of programs like Altair 4K BASIC, Wangs Tiny BASIC, a Star Trek game, and others.
If we wanted to write an assembly language program, we would have to load what we would now call an IDE, or Integrated Development Environment. I think we had something like that from Processor Technology, the SOL computer folks.
CP/M had an assembler, an editor, a debugger, and lots of storage, on the 8″ single density floppy drives we had. Each 8″ floppy could hold a quarter of a million bytes of data! That is like a whole novel of the day. Novels have gotten longer since then.
John Torode and I became friends, and when he was having trouble debugging his Double Density floppy disk controller, I would stop by on the way home, and lend a hand for a while.
I had recently convinced my boss to buy a “logic analyzer” to help us debug our designs. He was pretty reluctant, but he agreed that the first time he tried it, he found a bug he had been looking for for a month. He decided it had paid for itself in that one use.
After work, I would bring the logic analyzer along with me, and John and I would debug his controller, till I was falling asleep. It didn’t take too long with the analyzer before John found his bug, and then he had a double density controller to sell, so the same 8″ floppy could hold half a million bytes.
Using the single density system John had left at my house, I got a copy of the source to Li-Chen Wang’s Palo Alto Tiny BASIC, and started converting it to run on CP/M. Once I had done that, I added a bunch of commands to talk to I/O ports, Memory, and interface to assembly language routines you might add. Between the assembly language source, the listing file, and the object file, it filled a whole single density floppy.
As far as I know, the Sherry Brothers Tiny Basic source I contributed to the Homebrew Computer Club Library is the only computer readable version of Li-Chen Wang’s Palo Alto Tiny BASIC available on the net. It is also the version of BASIC we run on the IMSAI at LCM+L.
All in all, I have many fond memories of using my IMSAI to learn about computers and software. I still have the front panel of mine, but the rest of it didn’t make the trip back from Silicon Valley to the Seattle area. Both of my brothers ended up buying IMSAIs, and you can see my brother Gales in the Museum at REPC in Sodo.
That is the question that one of the blowers in the base of KATIA, our 1967 PDP10-KA was apparently asking itself a couple of weeks ago.
KATIA had been running happily for a couple of months, and when I came in one day, she wasn’t. I tried rebooting her a couple of times, without any luck. With my usual collection of tasty puzzle morsels on my plate, it took me a few days to get around to poor KATIA. It was Friday when I started running diagnostics, and they all passed till it got down to one of the last ones: DAKCB.
DAKCB would fail while trying to test the FDVL instruction: Double precision Floating Point Divide. This was at the end of the day, so since I was scheduled to work half a day Saturday, I could work on it then. That might incite interesting conversations with the Museum Visitors.
After going to my 6:30AM Saturday morning WW meeting, Studying for a test for a while, then taking the test for the highest class Amateur Radio Operator license, and passing, I managed to stagger into work a little after 1PM.
I was assisted by Leda, our Post-Grad Ethnography student, who is trying to figure out why Grown Adults would want to spend their time playing with these ancient computers.
While looking at the machine when it failed DAKCB, we had noticed a couple of things: it had stopped, as in the PC wasn’t changing, but the RUN light was still on! This is a symptom of losing a pulse! In Asynchronous machines like the KA and KI, there is no clock that times the instructions. When you poke the start button, a single pulse is launched into a bunch of logic and delay lines. The little pulses and logic fetch the instruction, examine it to figure out what instruction it is, and wiggle all the logic at the correct time to do whatever that instruction is supposed to do. When it is done doing one instruction, and the RUN light is still on, it will take the pulse coming out of the end of the logic, and put it back in the front, starting the next instruction.
In my Zeal and Enthusiasm, I conned Leda into running the oscilloscope probe for me as we tried to follow this tiny pulse all the way through the logic till it disappeared. Normal trouble shooting practice would be to see the pulse at the beginning, look at the end and not see it, then look somewhere in the middle. We call this a binary search, and it is the fastest way to do this.
Unfortunately I am not sure where the end is, and I certainly don’t know where the middle of this process would be, so we started at the beginning and followed each step till we got to a pulse amplifier at 1F19: we could see the pulse go in, but it didn’t come out! We can just change the module, right?
I did that, and then noticed that the module I had taken out was REALLY warm in my hand, not so hot it burned me, but it shouldn’t have been that hot. Why? Being as how this is an updraft machine, I put my hand above the card cage, and it was nice and toasty warm up there. Where is all the air like is coming out of Bay 2’s cage, keeping all those cards cool? Going back around to the front of the machine, I notice that the blower in the bottom of Bay 1 was not turning, and shut the machine off. Now we know why it quit working!
This is not going to be pretty! Leda and I went down to the basement, and stole a blower from one of the other KA’s we have down there, and installed it. Of course the freight elevator had gotten its tail tied in a knot on Friday and was on strike for better working conditions, so I carried the 30lb awkward thing up the stairs, and managed to get it installed around 5:30PM. We turned on the machine, and luckily this blower was not having an existential crisis, and so it was blowing precious cooling air all through the cards in Bay 1: Yay!
Since we figured that it was probably bad bearings, David took the blower apart to change the bearings, and here is a picture of it sitting on a 2 foot by 3 foot cart:
Looking at the stator of the motor a little closer:
Those wires should not be black! I’m afraid this motor is toast!
I have so far spent about a week trying to get basic instructions to work on KATIA, without a lot of success. What I know so far, is that if I try to run out of fast memory, the ACs that live inside the processor, it REALLY doesn’t work. At a guess, I suspect that the logic receives all zeros for the second instruction. If I disable fast memory, or run in main memory, things work better, but some very simple things don’t work. It will add a constant to an accumulator, but it won’t load a constant into an accumulator. This reduces its usefulness as a computer a little. It is still pretty good as a room heater.
I will try to blog more frequently so you can follow along as we discover what else is wrong with KATIA.
Back in October 2018, our PDP10-KI went down, and it didn’t want to come back up. I ran all the normal diagnostics, and they all worked, but the TOPS-10 would hang when I tried to boot it. That is the definition of Computer Maintenance Hell, Everything works, but the operating system won’t run!
Running the normal diagnostics sounds like an easy thing, but that isn’t always the case! The first bunch of diagnostics run from paper tape, and that is pretty easy. As we continue past DBKAG, the tapes don’t fit well in the reader, so we switch over to getting them off of DECTape, herein lies the rub: the TD10 DECTape controller on the KI is almost always broken when I need it.
After much gnashing of teeth, and tearing of hair, there was enough blood on the floor for the dust bunnies to leave tracks in that pointed to what was wrong with the TD10, and we were off once again. I ran the rest of the usual diagnostics, and they all passed! Still didn’t boot.
I had plenty of things to keep me occupied, so our poor PDP10-KI didn’t get a lot of my attention. During our group session bringing up the KATIA, we played with the KI some, and found that the KI didn’t like its memory! The KA liked it, but the KI didn’t! It would run the DDMMD memory diagnostic for about 10 or 15 minutes, then fail. The KA would happily run the KIs memory well past where the KI would fail. Looking at the errors, it appeared as if things were getting confused about which particular bit of memory it was talking to. It would always start failing at location 0374000 where either it hadn’t inverted the contents of those locations, or it had done it twice.
Now it did’t fail all the time. The part of the test that failed was going through memory incrementing the address by a more significant bit than the LSB, then wrap around to the LSB. When it started with the LSB, bit 35, everything was fine. It worked when it did bit 34. It had to get up to bit 25 before we had problems, we would fail between bit 25, and bit 21, 20 through 18 worked too.
I spent quite a while trying to write a diagnostic that did what DDMMD was doing, but in a quick and repeatable way. I believe I got pretty close, but nothing I wrote would tickle the problem… bother!
Months have now passed, and I broke down and plugged in the logic analyzer. Most of the time, I use an oscilloscope as my main debug tool. ‘Scopes don’t lie as much as logic analyzers do! If the logic is working, procducing 1’s and 0’s as it should, a logic analyzer is a good tool. When things are broken, sometimes you get a half, or a third instead of a one or zero, and this is where the ‘scope is better about telling the truth, and the logic analyzer will lie. Here the machine was pretty much working, at least the diagnostics thought so.
Here is one of the first logic analyzer traces I took, just showing the logic analyzer sample number, the memory operation, and the address:
I did a bunch of work with PERL to go through the 100MB of data that came out of the logic analyzer, and boil it down to what you see here.
Now it turns out that the way this part of DDMMD worked, is that it would fill memory from the bottom to the top stepping by 1, complement each location using the funny addressing pattern, then verify from bottom to top normally. I added the top 8 bits from the CPU’s MA (Memory Address) register to the logic analyzer:
This is where it is doing the final verification, and you can see something funny here: the upper bits from the MA register incremented like I expected them to, but when a whole bunch of them changed, the address going to the actual memory didn’t follow as quickly! Instead of going from 377777 to 400000, it went to 777000! Here we get into a bit of logic called the “Pager”.
A PDP10 can really only talk to 256K words of memory at one time. How can the KI use 4MWs of memory? That is the Pagers job! The Pagers job is to translate the logical address that the CPU provides into a physical address of a hopefully larger memory. While running diagnostics, the Pager should be turned off, resulting in a maximum of 256KW of memory directly addressed from the MA register to the address lines going to memory. Something was going wrong here!
I added another set of 8 probes from the Logic Analyzer, and started moving backwards from the physical address going to the memory, to where the MA register fed into the Pager. When I got to the output of the CAM’s, there was something I didn’t understand.
What is a CAM you ask? CAM stands for “Content Addressable Memory”. What you do is give it the logical address that you want, and it will tell you if it knows about that, with a single line for each location inside itself. All four of them.
I got lucky, the first group of 8 output bits looked like this:
Near as I can tell, there should be only a single 1 in the right column. It is octal, so we can watch which location in the CAM has the data as the addresses change, and when we get to 367631, we get two ones! I believe that output should have been a 002, not 006!
That output came from board 2PR09, so I swapped it and 2PR08, and I couldn’t run the diagnostic at all due to a “Page Fail Trap Error”! Ah, I think we are very close here! I checked the inventory, and we didn’t have a record for an M260 board, so I stole one from one of the machines that came in in September, and Voila, the memory test passed! It can even run TOPS-10 if we don’t try to initialize its serial ports. This could be correct since we stole a bunch of its serial lines to use on KATIA while the KI was asleep.
OK, Since the KI, the KA, and the CDC are all working, I seem to have made it out of Computer Maintenance Hell for now. Give them a little while, one of them will fail.
The PDP10-KI went down sometime in the fall, maybe October. This is the machine just to the right of the CDC6500 as you come in the second floor computer room. I noticed this fairly quickly and tried to reboot it, but it would hang when I tried, several times.
OK, it must be time to run diagnostics, which I proceeded to do. It passed all the diagnostics from DBKAA to DBKAH, which are the ones on paper tape, but the TD10 DECTape controller, just to the right of the console had quit again, as it has done almost every time I need to use it.
This TD10 and I just do not get along very well. It always kicks me around the block several times before it will let me know what is wrong, so I can fix it. Because of this history, it sometimes takes a while for me to generate the gumption to work on it. This time was no exception, since I was busy trying to get the KA working. If you don’t know about gumption and gumption traps, you should read the classic “Zen and the Art of Motorcycle Maintenance”, which doesn’t really talk about motorcycles, or Zen much.
Back to our story: We fixed the KA, moved it down to the second floor computer room, fixed it again, and had a small gathering to boot it the first time. The CDC was being its normal self, but was refusing to be down when I arrived at work, which is my signal to drop everything and work on it. I was running out of reasons to avoid working on the KI.
Contrary to normal behavior, it only took a few hours to figure out the problem with the TD10, so I could run the diagnostics that are very awkward to load from paper tape.
All the normal diagnostics ran except for DBKAL. I think it took a few days to remember that there is a diagnostic for which the binary we have is wrong, and I have to patch a couple of locations after loading, in order for it to work. Now DBKAL and DBKAM work. On to some more obscure ones. All the CPU ones seem to pass, does it boot now? No, it still hangs after the OS is loaded, and we type “GO” to fire up timesharing.
What else can we test? We ran DDRHA, which tests the RH10 disk controller, but it passed. We then ran DDRPI, which tests the disk drives. Now we don’t really use the disk drives it is expecting to test, we use our MDE (Massbus Disk Emulator). We have been using this MDE for about 5 years now, but there could still be a bug hiding in there somewhere.
DDRPI looked like everything was fine for about 20 minutes, while it was doing register tests, seek tests, all ones and zeros tests. When it got to testing the surface of the disk, things started to go wrong. It would get an error where it looked like the data was misplaced, like it was reading the wrong sector or something like that.
How could that be? This thing has been working fine for over 5 years, in fact when we ran DDRPI from the KA, using the KI’s Memory, RH10s, DAS33, and MDE, EVERYTHING was FINE, even the surface test!
How about the memory? It passes my little MARCH memory test, from the KI or the KA. Dragging out the DECTape again, we loaded up DDMMD, which is one of the memory diagnostics. We fired it up, and it ran fine for about 15 minutes, whereupon it started spewing out errors. We have run this a BUNCH, and the errors usually seem to start at location 374000, and the data seems to be inverted from what it should be. The test complains about address bit 24.
We run the same test from the KA against the KI’s memory, and it works fine! It is handy to have the KA right across the room to enable this kind of testing.
What is really going on here? Let’s look at the console:
OK, TN=2, that means it is doing “Address” test. AS=F24 turns out to mean that it was doing fast addressing on address bit 24. “What does that mean” you may ask? I did! After much grovelling over the DDMMD listing, and consulting with Rich Alderson, I found that they would fill memory with the address and its complement, and then go through reading a location, verifying it was correct, and writing the complement in it, then go back and read and verify that they all had the complement. Ah, but what about that F24 part? When they are reading and complementing the data they start skipping locations, by changing which address bit they increment first. The first time they do this, they use bit 35 as the lsb, but next time through they shift the lsb over to bit 34, then 33 etc. When they get to bit 24, we run into this problem.
How do I figure out what is really going on here? I decided to write a version of my MARCH that does this, MRCHFA. It took a while, but I finally got it to work on the KA, and tried it on the KI: Unfortunately the KI passed it too! What else are they doing differently? OK, more grovelling over the listing: They are stuffing all their inner loops down in the Fast AC’s to speed them up. On to MRCHF3, which pushes the inner loops down to the Fast AC’s. Does the KI fail that one? Nope!
I’m running out of ideas, where do we go from here? I decide to just watch it for a while, and see what happens AFTER it starts to fail. I see it fail bit 24 from 374000 to 377777, both bit 23 and 22 over the same range, then it starts failing location 700000, in the same way. Shortly thereafter, the program gives up in disgust, and stops printing the results, and starts ignoring the errors. Now I just watch the lights on the ARM10 memory.
I got used to the way the lights blink while working on MRCHFA, and MRCHF3, as the LSB moves up the address, the slow blinking address follows it.
But wait, that isn’t what I am seeing: I see the lights increment from the bottom, do the FA thing, then increment from the bottom again, and then shift the LSB over. What is going on here? Is there anything on the ARM10 that will tell me anything? Yes, there are the read and write lights. While writing, both lights come on, but a read only lights the read light. After the FA thing, it just reads! Back to grovelling over the listing some more.
Ok, what they do is: fill memory from the bottom to the top, check and complement using the shifting LSB, and then check from the bottom to the top. Another new test called THEIRS, which of course doesn’t catch the problem either. I am running out of hair to pull out here!
As I write this, both machines are running DDMMD against the others memory, happily, no errors. No happy ending… YET!
Back in September 2018, we got a new addition to the LCM+L computer collection, but we didn’t talk about it. This was something that we had been looking for for 10 years or more. We knew where a couple of these machines were, but were not able to convince the owner of that collection to part with one. We kept looking, but the ones this collector had were the only ones we could find. Well, that isn’t quite true, Stephen Jones and I went down to the San Francisco Bay area to visit the Computer History Museum, and photograph one they had in their warehouse, but they weren’t willing to part with it. We thought maybe we could build one.
Stephen went to Australia on a hunt, and came back with some bits of one of these, but not a whole machine.
Stephen finally got the owner of the big collection in Sweden on the phone, and got him, Peter Lothberg, to talk to us again about his collection in late spring last year. A bunch more negotiations happened. Paul Allen was brought in, and a contract was signed.
A couple of our Archivists, Cynde Moya and Ameila Roberts, along with Stephen Jones and Jeff Kaylin, from Engineering, went to Stockholm in August to go over the whole collection, and pack up everything Peter was willing to sell us, put it in containers, and get them on a ship.
In September, the containers came in, and around 8 one morning, the first container arrived at our loading dock. Stephen sent Paul Allen a picture of the open container, and he was here by 8:30. He was very excited, because he had been waiting for one of these for over 10 years. He asked Stephen “Can you have it working in 3 weeks?” Stephen said something like “How about by the end of the year?”
The first thing we had to do to KATIA to get it running was to upgrade the power supplies. For a lot of our older mainframes, we have decided to leave the old supplies in place, and just move the wiring over to new efficient and reliable switching power supplies. David Cameron and I spent around a week figuring out how to do this, this time around, and I think it looks pretty good!
The silver bits, are the new supplies and their mounting plates, mounted either in front of, or next to the original power supply filter capacitors.
We also had to upgrade the power supplies in the memory, where we have designed new modules to replace the old modules, but the parts were scarce, so we just replaced all the old aluminum electrolytic capacitors in the existing modules.
We re-configured the main blowers to run on lazy American electrons (120V), as opposed to those very energetic (240V) European electrons that they were set up for.
We powered up the machine, and I started into the debug process. I started by adjusting all those new power supplies so that all the correct voltages appeared at the correct places, and they all shared the load as best I could.
Now, what can it do? One of the first thing we will need is to be able to read diagnostics on the paper tape reader. With the KI, I had discovered this process requires the adder part of the CPU to work, and so I fired up the Way-Back machine to recover how I had tested the KI back then. It only took two instructions, and ADDI (add immediate) and a JRST (jump). But wait, they have to be stored somewhere! I had to fix some memory.
In PDP10’s there are two kinds of memory, Accumulators (the first 16 locations of memory), and main memory. The Accumulators in most PDP10’s live in special logic inside the CPU, called “Fast Accumulators”. I had to steal a board from one of the other 2 KA’s that came with KATIA, in order to get the Fast AC”s to work.
I then started poking at the MG10 128K word memory box we had hooked up to KATIA. I got 32K of it working, and that was enough to run the paper tape diagnostics. Had to replace the light bulb in the card reader.
Now to check the adder: I loaded the two instructions with the front panel, and no, the adder wasn’t working. It couldn’t propagate a carry past bit 20. After a bunch of poking around I found that there was some kind of a connection problem between cards 2A29 and 2A30, which are in chassis 2 (the middle one), on the top row, about 3/4 of he way from the left. Swapping them and swapping them back seems to fix it.
After about another 4 weeks, with plenty of tearing of hair and gnashing of teeth, lots of board swapping and transistor changing, I finally got KATIA to run the KA versions of diagnostics that we run from paper tape on the KI! From that point the tapes get too big for the paper tape reader to handle easily, so on the KI we run those from the DECtape drives.
It took another couple of weeks to get the DECTape drives and controllers working on both the KA and KI. We needed the KI to work so that we could write the proper diagnostics onto a DECtape for KATIA to read. The problem with that, is that the KI hasn’t been able to boot in a few months. It passes all the paper tape diagnostics, but there is something fishy with the disk interfaces. Bother!
I fixed the rest of the MG10 so we had 128K words of memory working.
While I am working on the KI on a Thursday afternoon, it gets decided we should get KATIA down from the 3rd floor, and on display in 2nd floor computer room before opening on Friday! This involves detaching the DECtape cabinet from one end, the memory from the other end, and disconnecting a whole bunch of cables that go between chassis 3 and the other two chassis’. We also have to make space by moving the IBM 360/30 out of where we want to put KATIA.
We start by disconnecting all the required stuff Thursday afternoon, and getting the 360/30 put back together enough to move, and we move everyting.
By Friday all KATIA’s boxes have power again, and I carefully put all the cables back where they came from the day before. Half of the Museum shows up for powering up KATIA in its (her?) new home. I turn it on, and poke the Read-In button to load up my simple memory diagnostic, and… nothing happens!
Has the adder stopped working again? Well, yes. I wiggle the cards, and voila, KATIA can add again. Yay! I poke the Read-In button again and… not nothing! The tape goes through the reader, but the lights keep incrementing, even after the end of the tape has fallen out of the reader. THAT is not Correct!
Finally on Friday morning two WEEKS later, I have grovelled over the gospel according to DEC, and abased myself before the computer gods to almost understand how Read-In is supposed to work. Hmm, is that a string to pull on?
Read-In works by loading a BLKI instruction into the instruction register of the machine and executing it. PDP10’s are NOT Reduced Instruction Set Computers! The BLKI instruction reads a memory location, a pointer, adds 1 to each half of the data there, puts it back in memory, then does the I/O instruction implied by the “I”, uses the right half of that memory location to point at where in memory to put the data it read from the paper tape reader. Then it looks at the left half of the pointer, and if it is zero, it skips the next instruction, if it wasn’t, it will execute the next instruction.
As I was watching it try to do this with the oscilloscope, it didn’t appear that the part where it was reading the pointer location was taking as long as it should have, considering that pointer was located in core memory at the time, and reading and writing it should have taken a whole micro-second!
The I/O instruction was decoded over in chassis 3, but the instruction timing and memory control was over in chassis 1. Eventually I noticed that the part of the instruction it was skipping was the memory access, which was supposed to be started by a signal called “IOT BLK”, which wasn’t making it to chassis 1. I could see it in chassis 3! The chassis 3 end of the cable had been loose to move the machine, let’s take a look!
I replaced the broken resistor, but it still didn’t work! OK let’s look farther up the cable, did anything else happen in the move?
That doesn’t look right! I suspect that somehow, in the move, this cable escaped momentarily enough to get caught under a caster and rubbed away, because it used to work, and the signal in question is one of the top two signals in the cable. The consequences of doing things in a hurry.
Do I pull this cable from one of the other machines we have or do I try to repair it?
As soon as this was done, KATIA went back to behaving for me. I ran all the diagnostics we had run, and a few more that we ran from DECtape on the KI, till I got to the one that also failed on the KI. We found the locations in this one that we had to patch for the KI, and it now runs too!
Did I get it done in 3 weeks? Unfortunately no, so Paul Allen didn’t get to see it run, and play with it. He must have known back when he asked about 3 weeks, because it was very close to three weeks from then that he left us.
In my last blog on the CDC, we were having a slow refrigerant leak in Bay 1, and we were waiting for parts. The new parts came, but they were wrong, then they came again, and they were wrong again. Eventually the right parts were delivered, so we took the CDC down yesterday morning to work on the cooling system.
Cole, from Hermanson, arrived at 7AM, and we proceeded to shuffle some of the chiller plumbing around, so I can always be able to restart the computer after a power fail, and it took about an hour, and seems to work fine!
A little later, Jeff Walcker, also from Hermanson arrived to install the new compressor parts that had arrived. We opened the 3phase 60Hz breaker for Bay 1, and Jeff started un-hooking the power. We had discovered that it was leaking where the compressor power wires go into the case. As Jeff was taking it apart, all the little bits were turning into powder. The compressor is 52 years old. It was decided that we needed a new compressor. Bummer!
Contrary to recent experience, not only was a replacement compressor available, it was in stock, IN SEATTLE! Jeff was back from getting it almost before we were done deciding that was what we should do! Compared to 102 days for the chiller, this is amazing.
He did a bunch of wrestling with the compressors to get the new one in, and we don’t have the motor cooling loop hooked up yet, because the pipe didn’t want to go on the new one quite the same as it came off the old one.
You can see the disconnected copper cooling loop wrapped around the compressor in the above picture. Jeff will bring a slightly longer hose when he comes for our quarterly maintenance next month.
The CDC is happy again, and hopefully not leaking!
The CDC 6500 has led a rough life over the last 6 months or so: way back on the afternoon of July 2, 2018, I got an email from the CDC’s Power Control PLC telling me that it had to turn off the computer because the cooling water was too hot! A technician came out and found that the chiller was low on refrigerant. He brought it back up to the proper level, and went away. Next morning it was down again.
After much gnashing of teeth and tearing of hair, it was determined that the compressor was bad in the chiller. “We’ll have a new one in 5 weeks!” The new one turned out to be bad too, and so another was ordered, that was easier to get. Only about 3 weeks, instead of the 8 that the official one took. That worked for a few weeks, and the CDC went down again because the water was too hot.
This time it was very puzzling because as long as the technician was here, it worked fine. He spent most of a morning watching it, decided it was OK, and left, but he didn’t make it to the freeway before it went down again. He came back, and watched for the rest of the afternoon, and found that the main condenser fan would overheat, and shut down, causing the backup fan to come on. The load wasn’t very high, so the backup fan had to cycle on and off, while the main fan motor cooled off. This would go on for a while, till both motors were off at the same time, and then the compressor would go over pressure because the condenser fans were off, and the chiller would stop cooling, resulting in the “Water too HOT” computer shutdown.
Another week went by waiting for replacement fan motors from the chiller manufacturer, with no luck. Eventually we gave up and got new fan motors locally, installed them and the chiller has been working since. While the CDC didn’t seem to mind being off for 102 days for the compressor problem, it didn’t like being off for 3 weeks while we fiddled with the fans.
Both when it was off for 102 days, and this time, we found that Bay 1 was low on refrigerant. The first time we just filled it up, but the second time we looked closer, and found that there is a small leak where the power wires go into the Bay 1 compressor. The compressor manufacturer, the same guys that made the chillers compressor, will gladly sell us a new compressor, but the parts for the 50 year old, R12 compressor, are no longer available. We are working on that, but I haven’t heard that we found the parts yet.
Back to more recent times: now that the chiller is chillin’, and the CDC’s cooling system is coolin’, why isn’t the computer computing?
Let’s run some diagnostics and see what happens: I try to run my CDC diagnostic tape, but the machine complains that Central Memory doesn’t seem to be available. No, I didn’t run the real tape drives, I ran the imaginary one that uses a disk file on a PC to pretend to be a tape drive. Anyway, that didn’t work, so I flip the zillion or so Dead Start switches in my emulated Dead Start panel, to fire up my Central processor based memory test, and get no display at all! This is distinctly unusual. Let’s try my PP based Central Memory test: That seems to work till it finishes writing all of memory, then the display goes blank. Is there a pattern here?
I put a scope probe on the memory request line inside the memory controller in Chassis 3, and find that someone is requesting memory every possible cycle. There are four possible requests: the Peripheral Processors as a group get a request line, each Central Processor gets a request line, and the missing Extended Core Memory gets a request. Let’s find out who it is: the PP’s aren’t doing it, neither of the CP’s are doing it, and the non-existent ECM isn’t doing it. Huh? Nobody is wants it, but ALL of it is getting requested!
I am going to step back a little bit, and try to explain why it sometimes takes me a while to fix this beast. This machine was designed before there were any standards about logic diagrams. Every manufacturer had to come up with their own scheme for schematics. Here is one where I found a problem, but we will get to that in a bit.
Now when there are two squares with one above the other, and arrows from each going to the other, those are flip-flops. When you have a square, or a circle with multiple arrows going into it, that is a gate. Which one is an “or” gate, and which one is an “and” gate? Sorry, you have to figure out that for your self, because the CDC documentation says either one can be either one. The triangle with a number in it would be a test point on the edge of the module. The two overlapping circles, kind of like an elipsis, indicate that is a coax cable receiver, as opposed to a regular twisted pair signal. A “P” followed by a number indicates a pin of the module.
This module receives the PP read and write signals from the PP’s in chassis 1, on pins P19 and P24. On the right side of the diagram, you can see where all the pins connect. If we look at pin 24, we can see it connects to W07 wire 904, and pin 19 is connected to W07-903. The W “Jack”s are coax cables, the other letter signals go somewhere inside this chassis.
Really, what we are looking at here, is that a circle, or a square is the collector pull-up resistor of one or more silicon NPN transistors. the arrow heads are the base of the transistors, and the line coming into the head has a base resistor in it. If there are three arrows coming into a square, like at the bottom, those three 2n2369 transistors all have their collectors tied together, with one pull-up resistor. I could be slow, because it took about 6 months before I felt I was at all fluent in reading these logic diagrams.
Now we have to talk about the Central Memory Architecture a bit. The CDC has 32 banks of 16K words of memory. Each of these banks is separate, and they can be interleaved any way the 4 requestors ask for them. At the moment, I am only running half of them, because there is something wrong with the other half. Each of these banks does a complete cycle in 1uS. The memory controller in chassis 3 can put out an address every 100nS, along with whether it is for a read or a write. This request goes to all banks in parallel. If the bank that it points to is available, he will send back an “accept” pulse, and that requestor is free to ask for something else. If the controller doesn’t get an “accept” he will ask again in about 400nS. There is a bunch of modules involved in this dance, and it is a big circle.
A little more background: This machine was designed before there was such a thing as plated through holes on printed circuit boards. The two boards in each module were double sided. What they did when they needed to get to the other side of a PCB, was they would put a tiny brass rivet in the via hole, and solder both sides.
What I eventually found was that the signal from P23 of the module in 3L34, wasn’t making it to pin 15! There was a via rivet that wasn’t making its connection to the other side of the board. I re-soldered all the vias on that module, and now we were only requesting memory when someone wanted it!
Now that we can request memory and have a reasonable chance of it responding correctly, it is on to testing memory. I loaded up my CP based test, and it ran… for a while. Then it quit, with a very strange error. The test uses a single bit, and its complement to check the existence of every location of memory. It will read a location, and compare it with what should be there, and put the difference in a second register. Normally I would expect a single bit error, or maybe 12 bits if a module failed that way. The result looked like 59 bad bits, or the error being exactly the same as what it read. Usually this is because the CPU that is running the test is miss-executing the compare instruction.
While I was thinking about that, I ran Exchange Jump Test to see what that said. A PP can cause a CP to swap all its registers, including the Program Counter with the contents of some memory that the PP points to. This is called an Exchange Jump. The whole process happens in about 2.6uS as it requests 16 banks of memory in a sequence. This works the memory pretty hard. Exchange Jump Test (EJT) would fail after a while, and as I looked at the results, I noticed that it was usually failing a certain bit in bank 7. I checked, and noticed it was an original memory module, so I looked at my bench and found I didn’t have any new ones assembled, so I had to put the sides on a couple of finished PCB assemblies, and test them. I then swapped out the old memory in bank 7 with a new semiconductor memory, and EJT passed!
I then checked to see if my CP based memory test worked, and it did too. We are back in Business after over 5 months. I am keeping my fingers crossed in the hope that the chiller stays alive for a while.