6502: Coding – Just because it works doesn’t mean it’s correct

I ran into an interesting problem the other day trying to upgrade my 6502 build 4 to work with the 65816. I plan to write more on this in another post, but after just one minor change I got build 4 with a 65816 to almost start up my 6502-based Forth operating system. For some reason, the startup failed midway through loading the second Forth code block. The initial splash screen and the error code provided showed that the system was at least partially working.

I find this somewhat amazing. The 65816 is not supposed to be a drop-in replacement for the 6502 (see the 65802 for that). But if you don’t care about memory beyond bank 0 or the features available through the other specialized 6502/65816 pins, it seems like a simple modification might be all that’s needed to get a 65816 to run in a 6502 system.

I ran the build 4 code through my 65816 emulator to help shed some light on the problem. I was expecting to get the same error code from the emulator, but instead, after displaying the initial startup splash screen, the emulator dropped into the monitor program without displaying an error code. Strange. I got the same results running the code on my 6502 emulator. Stranger! Remember that the build 4 code runs successfully on build 4 with a 6502. It’s understandable that the build 4 code could have problems running on build 4 with a 65816, but not that it would fail on the 6502 emulator after running successfully on build 4 hardware with a 6502.

I verified that the 6502 emulator worked correctly with my build 3 code and verified that the build 3 code only differed from the build 4 code by the changes in the memory map, basically moving the VIA and ACIA addresses. My main clue was that the program counter was set to a region in RAM when the emulator dropped into its monitor, not exactly surprising since the emulator breaks to the monitor when it encounters a BRK opcode. So the build 4 VIA and ACIA addresses were causing the emulator to run off into uninitialized memory.

Stepping through the code in the emulator, I discovered the problem was in how my emulator modeled the ACIA receiver interrupt. To handle the receiver interrupt, I simply called the emulator’s step function enough times to process the entire interrupt service routine. As with my hardware build, the receiver interrupts are triggered by writing an escape code sequence to the ACIA transmitter. The problem was that I started processing them before the emulator had returned from the triggering event, that is, while the instruction doing the write was still being executed. Of course, the actual 6502 will always complete the current instruction before processing an interrupt.
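The difference between the two orderings can be shown with a toy model. This is only a sketch, not my emulator's actual code: the `ToyCpu` class, its `step` arguments, and the `reentrant` switch are all made up for illustration, and the "interrupt" does nothing except record the return address it would push.

```python
class ToyCpu:
    """Hypothetical stand-in for an emulator core (illustration only)."""

    def __init__(self):
        self.pc = 0x8000
        self.irq_pending = False
        self.pushed = []  # return addresses that interrupts would push

    def service_irq(self):
        # A real 6502 also pushes the status register and jumps through
        # the IRQ vector; only the return address matters for this sketch.
        self.pushed.append(self.pc)

    def step(self, length, triggers_irq, reentrant):
        """Execute one fake instruction that is `length` bytes long."""
        if triggers_irq:
            if reentrant:
                # Buggy ordering: the interrupt is handled from inside
                # the instruction, before the PC has been advanced.
                self.service_irq()
            else:
                # Deferred ordering: only record that an IRQ is wanted.
                self.irq_pending = True
        self.pc += length  # the instruction completes here
        if self.irq_pending:
            self.irq_pending = False
            self.service_irq()  # PC now points at the next instruction


buggy = ToyCpu()
buggy.step(3, triggers_irq=True, reentrant=True)

fixed = ToyCpu()
fixed.step(3, triggers_irq=True, reentrant=False)

print(hex(buggy.pushed[0]), hex(fixed.pushed[0]))
```

Both CPUs end up with the same program counter, but only the deferred version pushes the address of the instruction that follows the triggering write.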

The first thing the 6502 does when processing an interrupt is push the program counter, followed by the status register, onto the stack. My problem was the program counter hadn’t been fully advanced in the emulator since I was still in the middle of the triggering event. In essence, I was reentering the emulator step function with an incorrect program counter and thus pushing an incorrect return address onto the stack. When the interrupt service routine finished, it would pull this incorrect address from the stack and essentially start executing garbage. At some point it would execute a BRK opcode and drop to the monitor.
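For reference, here is a rough sketch of that push and pull sequence. The helper names (`Stack6502`, `enter_irq`, `rti`) are invented for this example and are not taken from my emulator; the sketch also skips setting the interrupt-disable flag and loading the IRQ vector. It just shows that RTI hands back whatever address was pushed, right or wrong.

```python
class Stack6502:
    """Minimal model of the 6502 hardware stack in page $01."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.sp = 0xFF

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]


def enter_irq(stack, pc, status):
    """Push what a hardware IRQ pushes: PC high, PC low, then status."""
    stack.push(pc >> 8)
    stack.push(pc & 0xFF)
    # In the pushed copy, bit 5 is set and the B flag (bit 4) is clear.
    stack.push((status | 0x20) & 0xEF)


def rti(stack):
    """Pull in the reverse order: status, PC low, PC high."""
    status = stack.pull()
    lo = stack.pull()
    hi = stack.pull()
    return (hi << 8) | lo, status


stack = Stack6502()
enter_irq(stack, pc=0x8003, status=0x24)
return_pc, return_status = rti(stack)
print(hex(return_pc), hex(return_status))
```

If `enter_irq` is handed a program counter that hasn't been fully advanced, `rti` will faithfully return to that address, and the CPU will start decoding whatever bytes happen to be there.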

But why did everything work correctly with my build 3 code? Recall that the code was the same as the build 4 code except for the different VIA and ACIA addresses. I didn’t fully track this down, but essentially, the bytes at the incorrect return address were interpreted as opcodes, and with the build 3 addresses they happened to decode in such a way that the code kept running and got back on track after completing the interrupt service routine. With the build 4 addresses, that didn’t happen before the emulator hit a byte that was interpreted as a BRK instruction.

The bottom line? Don’t assume that some code is correct just because it works in a particular situation. Robust testing is needed. Common advice, but not something I followed in adding interrupt capability to my emulator. I’m working on triggering the ACIA interrupt from a separate thread, similar to what I use for the VIA. This has a bit of a downside, as I’ve noticed the emulator is pretty slow once an additional thread is added. Perhaps this won’t be much of a problem since the ACIA interrupt is used for file access, where speed isn’t as critical.
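One possible shape for the threaded approach is sketched below. It is an assumption about how this could be structured, not a description of my emulator: `IrqLine`, `fake_acia`, and `run` are hypothetical names. The device thread only raises a flag, and the main loop checks that flag between instructions, so an interrupt can never be serviced in the middle of one.

```python
import threading


class IrqLine:
    """Hypothetical IRQ line shared by a device thread and the CPU loop."""

    def __init__(self):
        self._event = threading.Event()

    def assert_irq(self):
        # Called from the device thread; it never touches CPU state.
        self._event.set()

    def take(self):
        # Called from the CPU loop, only between instructions.
        if self._event.is_set():
            self._event.clear()
            return True
        return False


def fake_acia(line):
    """Stand-in for an ACIA thread that has received a byte."""
    line.assert_irq()


def run(line, instruction_count):
    """Fake CPU loop; returns the instruction indexes after which an IRQ ran."""
    serviced_after = []
    for i in range(instruction_count):
        # ... execute instruction i to completion here ...
        if line.take():
            serviced_after.append(i)
    return serviced_after


line = IrqLine()
device = threading.Thread(target=fake_acia, args=(line,))
device.start()
device.join()  # joined here only to keep the demo deterministic
serviced = run(line, 5)
print(serviced)
```

In a real emulator the device thread would keep running alongside the CPU loop rather than being joined first; the join is there only so the example gives the same result every time.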

Postscript

I reran my build 4 code after fixing my emulator’s interrupt handling. It ran just fine in both the 6502 and 65816 emulators, as I expected it would. This shows the problem I’m having in getting build 4 to run with a 65816 is hardware related, likely a bus timing issue. If it’s related to needing to latch the data bus, because the bank address byte is multiplexed on it, then an easy modification of build 4 might not be possible. I want to keep build 4 working for a 6502, so I’ll have to do a separate 65816 build if I can’t find an easy fix. More to come.

Post Postscript

Revisiting my build 4 running with a 65816, I looked for what might be causing the problem. The error I was getting implied that things weren’t being properly written to RAM. I looked at the data bus and write enable signals and compared them with the same signals with the 6502 installed. I didn’t see any difference. Just for kicks, I removed the capacitor on the write enable line that I had to install for the 6502 build to run properly. Bingo! The build with the 65816 started up normally. Very strange. The capacitor was needed for the 6502 to run properly but kept the 65816 from running properly. Unfortunately, I can’t get a good oscilloscope trace of the problem because hooking up the oscilloscope causes a startup error. More on this in my next post.