65816: Running C-based Code Outside Bank 0 on the 65816

In my last post I showed that you could run cc65-compiled C code on the 65816. The main limitation was that the C-related segments had to be located in Bank 0 (and, of course, that the standard C library functions ran with 8-bit registers and used none of the 65816’s added instructions or addressing modes). Trying to place these segments outside of Bank 0 led to problems. See my previous post for details.

I ended my previous post wanting to try moving the C-related segments of my N-Queens example project out of Bank 0. The project uses several segments for C-related items including the DATA, BSS, RODATA, CODE, and ONCE segments. C-based code also uses the ZEROPAGE segment, but that obviously can’t be moved from Bank 0.

In my mind, Bank 0 is better left to things that must be or at least greatly benefit from being in it. This is a personal preference, not a hard rule. Others have argued that since I can currently place everything in Bank 0, I should just opt for best performance and avoid using other banks. But I’d rather plan for future possibilities where Bank 0 might be better utilized. I’d also like to gain the experience of utilizing the expanded memory addressable by the 65816 for more than just heap storage.

In approaching this, I wanted to make as few changes as possible to the standard C Library modules. As such I didn’t want to modify the C-related segment names, but only move their position in memory by making changes to the configuration file.

Only a handful of changes to the C library standard modules were required. Further changes were needed to my system startup code to properly initialize the C data segment. The data bank register also needs to be set to the C data bank prior to calling any C function.

Required Configuration Changes

Based on the configuration of my current 65816 build, I decided to place C-related code in Bank 1 to the extent possible and C-related data in Bank 3. I like the idea of putting C-related code in a dedicated bank, but I don’t have a separate ROM bank for that. If I wanted to get fancy, I suppose I could load a C-related binary to RAM and keep all C-related items together. This is beyond the scope of my work here though.

As I wrote in my previous post, assigning the C-related segments to memory areas in other banks requires many changes to the C library modules, often simply because symbols in the upper memory banks are 24 bits wide rather than 16 bits. It isn’t difficult to accommodate this with the .loword assembler function, but doing so involves many more changes to the standard C library, which I wanted to avoid. The DATA and BSS segments had this problem for the N-Queens project. Other modules and segments not used in the project could have the same problem.

The DATA and BSS Segments

The cc65 compiler and C libraries use the DATA and BSS segments for their standard purposes, to store initialized and uninitialized data respectively. The DATA segment initialization data is stored in ROM and copied to RAM at startup. We specify locations for these with the load and run segment attributes in the configuration file.

Instead of assigning the runtime DATA and BSS segments to Bank 3, we can simply leave them assigned to Bank 0 in the configuration file and change the data bank register to Bank 3 prior to calling a C function. With the data bank register set this way, the C code accesses them in Bank 3 at the same 16-bit addresses that they are assigned in Bank 0. Note that Bank 0 itself is not affected by their assignment there, since absolute data accesses go through the data bank register. You should therefore place them at whatever addresses you dedicate to them in Bank 3.

One complication is that the corresponding data area in Bank 3 needs to be initialized. I used the following routine, rather than calling the standard C function copydata to initialize this area.

Routine to copy the DATA segment from ROM to Bank 3 RAM

        ; copy DATA segment to Bank 3 (C library stack, data and heap)
        ldx #.loword(__DATA_LOAD__)    ; source address
        ldy #__DATA_RUN__       ; destination address
        lda #__DATA_SIZE__      ; number of bytes to copy
        dec                     ; MVN needs bytes-1
        mvn #1,#3               ; from Bank 1 (ROM) to Bank 3 (RAM)

The code above is hardwired to specific banks for this example: it copies from the DATA segment load area, which I placed in Bank 1 ROM, to the C-related data area in Bank 3. It would be better to define the C code and data banks in the configuration file and use the related symbols in the code above.
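A minimal sketch of that idea follows. To keep it self-contained, the banks are plain assembler constants rather than symbols from the configuration file; the names C_CODE_BANK and C_DATA_BANK and the label copy_c_data are hypothetical and not part of cc65. It assumes 16-bit A, X and Y on entry. It also adds two precautions to the hardwired version: it skips the copy when the DATA segment is empty (otherwise the decrement would wrap to $FFFF and MVN would move a full 64K), and it saves the data bank register, which MVN leaves set to the destination bank.

```asm
.p816
.a16
.i16

; created by ld65 when the DATA segment has define = yes
.import __DATA_LOAD__, __DATA_RUN__, __DATA_SIZE__

C_CODE_BANK = 1         ; hypothetical: bank holding the DATA load image
C_DATA_BANK = 3         ; hypothetical: bank holding C data at run time

.segment "HWCODE"       ; Bank 0 startup code segment

copy_c_data:            ; hypothetical label; expects 16-bit A, X and Y
        lda #__DATA_SIZE__
        beq @done       ; nothing to copy; avoid wrapping to $FFFF
        dec             ; MVN wants the byte count minus one in A
        ldx #.loword(__DATA_LOAD__)     ; source offset in C_CODE_BANK
        ldy #.loword(__DATA_RUN__)      ; destination offset in C_DATA_BANK
        phb             ; MVN leaves the data bank set to the destination
        mvn #C_CODE_BANK, #C_DATA_BANK
        plb             ; restore the caller's data bank
@done:  rts
```

With this layout, moving the C code or data to a different bank means changing one constant instead of hunting down every MVN operand.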

Note that the linker will “dump” each segment in the order it is placed in the configuration file. Depending on how you structure your system, the position of the DATA segment in the file may be critical for proper operation. Both the load and run segments will be dumped. I couldn’t determine a rule for its placement. If your code fails to run, but otherwise seems ok, try adjusting the position of your DATA segment. (I suppose a little troubleshooting would tell the story, but that’s deeper than I want to dive here).

The BSS segment is initialized to zero by the C startup code when the C library is initialized. No change is needed for this other than switching the data bank register to Bank 3 prior to calling the initialization function.

Calling init_c

        ; initialize C library
        reg8
        phb                     ; save the current data bank
        lda #3
        pha
        plb                     ; data bank = Bank 3 (C data)
        jsr init_c
        plb                     ; restore the saved data bank
        reg16

The init_c function is system specific. For the N-Queens example project it looks like this:

init_c

.segment "ONCE"

init_c:
        ; set up C stack
        lda #<__STACKSTART__
        ldx #>__STACKSTART__
        sta sp
        stx sp+1

        ; initialize BSS segment
        jsr zerobss

        ; call module constructors
        jsr initlib
        rts

Notice that init_c is placed in the ONCE segment, which I’ve placed in Bank 0.

The ONCE Segment

The cc65 compiler and C libraries use the ONCE segment “for initialization code run only once before execution reaches main()”. As in my previous post, I was not able to move the segment from Bank 0. If it is moved out of Bank 0, the cc65 linker, ld65, fails with

ld65 error after moving ONCE segment to Bank 1

Precondition violated: Index < C->Count, file '.\common\coll.c', line 188

Note that the standard configuration places the CONDES constructor table in the ONCE segment. I don’t think there is anything special about the segment name. It’s only referenced in a handful of C modules and can be changed to RODATA or some other segment as desired without problem. But the segment used for this code can’t be placed outside of Bank 0. My guess is that the linker is failing to properly create the constructor table when it has long addresses associated with it and thus the Index associated with the table is not less than its Count (they’re probably both zero).

I’m not going to try to debug or modify ld65 to overcome this. For now, the code in the ONCE segment (129 bytes) fits well within the small amount of ROM I’ve dedicated to Bank 0. If it were a lot more, I’d be inclined to determine the cause of the problem. Still, I’m reluctant to maintain a separate version of the cc65 toolset, so I’ll leave it here.

The RODATA and CODE Segments

The cc65 compiler and C libraries use the RODATA and CODE segments to store C-related code and read-only data. These segments can be assigned to Bank 1 with minimal changes needed to the standard C library. Doing so requires changes to how we call C functions and access associated read-only data which the cc65 compiler places in the RODATA segment.

With the data bank register set to Bank 3, any references to symbols in the RODATA segment in Bank 1, including ones auto-generated by cc65, will not be accessible with normal addressing available to the 6502 or 65C02. The easiest way to accommodate this change without modifying the standard C library is to set aside an area in Bank 3 for this data and initialize it during startup. I reserved space at the start of Bank 3 for RODATA and use the following code to initialize it.

Code to initialize RODATA in Bank 3

        ldx #.loword(__EXROM_START__)    ; source address
        ldy #.loword(__EXROM_START__)    ; destination address
        lda #__RODATA_SIZE__    ; number of bytes to copy
        dec                     ; MVN needs bytes-1
        mvn #1,#3               ; from Bank 1 (ROM) to Bank 3 (RAM)

As above, it’s better not to hardwire this to specific banks.

This is a bit wasteful compared to having C-related read-only data and code in a single bank. The C-related read-only data is about 326 bytes in the N-Queens project. This amount of memory can be considered wasted with this approach.

Another downside to this approach is that the standard C-related segment names can’t be used for other program elements, at least as long as we want to keep the C-related items separate. Doing this is probably overkill in most cases and might not be possible depending on the system. It probably makes sense to put non-C-related RODATA in another segment though if it is large. This will prevent it from being needlessly duplicated in the C data bank.

We need a long subroutine call to call a C function in Bank 1 from code running in another bank. However, we can’t simply use the 65816 JSL instruction since the standard library function will return with the normal RTS instruction. To get around this for the N-Queens project, I created a stub, main, that my startup code calls to transfer control to the C module.

Call to stub to transfer control to the C module in Bank 1

.segment "HWCODE"   ; Bank 0

reset:
        ...

        ; transfer control to C module
        php
        reg8        ; switch to 8-bit registers
        phb         ; save data bank
        lda #3      ; switch data bank register to Bank 3
        pha
        plb
        jsl main    ; call stub in Bank 1
        plb         ; restore data bank
        plp         ; switch back to 16-bit registers

        ...

.code
main:
        jsr _main   ; C main function 
        rtl

I could have used a long jump instruction if I wanted to permanently transfer control to the C module, but the above method allows calls to individual C functions at the cost of additional overhead. This is avoided in my Forth operating system as its code is assigned to Bank 1 as well.
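To illustrate calling an individual C function this way, here is a sketch of a second stub together with a small macro that wraps the calling sequence. The C function solve (exported by cc65 as _solve), the stub far_solve, the macro call_c, the label run_solve and the constant C_DATA_BANK are all hypothetical. The macro spells out sep #$30 on the assumption that this is what reg8 does. It only suits C functions that take no arguments and whose return value is ignored, since the wrapper clobbers A and restores the status register afterwards.

```asm
.p816

.import _solve          ; hypothetical C function: void solve(void)

C_DATA_BANK = 3         ; hypothetical constant for the C data bank

; Call a stub in the C code bank from 16-bit code in another bank.
.macro  call_c  stub
        php             ; save flags, including the register widths
        sep #$30        ; 8-bit A, X and Y (assumed equivalent of reg8)
        .a8
        .i8
        phb             ; save data bank
        lda #C_DATA_BANK
        pha
        plb             ; data bank = C data bank
        jsl stub        ; long call; the stub returns with rtl
        plb             ; restore data bank
        plp             ; restore flags and register widths
        .a16            ; assumes the caller was running with
        .i16            ; 16-bit registers before the call
.endmacro

.code                   ; Bank 1, alongside the compiled C code
far_solve:
        jsr _solve      ; same-bank call, so the C function's rts works
        rtl             ; long return to the caller's bank

.segment "HWCODE"       ; Bank 0
run_solve:              ; hypothetical caller, entered with 16-bit registers
        call_c far_solve
        rts
```

Each C function called from outside Bank 1 needs its own three-byte-plus-rtl stub, which is the overhead mentioned above.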

Contrary to what I thought in my previous post, the C stack can be placed outside of Bank 0. Notice that in the code above I switched the data bank register to Bank 3 prior to calling the C main function. This means that any references to the C stack, data and heap will be to that bank.

Required Changes to the Standard C Library Modules

Moving C-related code and data out of Bank 0 causes a few problems with the cc65 C library, which was created for the single-bank 6502 and 65C02. The problems generally fall into two categories: (1) references to long addresses, and (2) the library’s use of self-modifying code in the DATA segment.

For the N-Queens project, the first category is limited to a reference to the out routine in the vfprintf module. As discussed above, solving this is a simple matter of using the .loword assembler function to take just the lower 16 bits of out’s 24-bit address.
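The pattern looks roughly like the sketch below. This is not the actual vfprintf source; handler and table are hypothetical names used only to show the operator. A 16-bit reference to a label that the linker places in Bank 1 no longer fits, and wrapping it in .loword keeps just the offset within the bank.

```asm
.p816

.code                   ; assumed to be placed in Bank 1 by the configuration
handler:                ; hypothetical routine with a 24-bit address
        rts

.data
; table: .word handler           ; 24-bit address in a 16-bit field:
;                                ; expected to fail at link time (range error)
table:  .word .loword(handler)  ; low 16 bits only (offset within the bank)
```

A pointer stored this way is only meaningful to code that is itself running in the same bank as its target, which is the case for the library routines here.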

Both the condes and printf modules use self-modifying code to call or jump to code determined at runtime. The condes module patches the condes routine in the DATA segment so that it loops through all of the constructor and destructor routines for the linked modules. The printf module patches the jump target in the CallOutFunc routine in the DATA segment with the output function specified by the calling function. However, this method no longer works with the CODE and DATA segments in different banks: the call or jump transfers control to the correct 16-bit address, but in the wrong bank. As such, these routines need to be modified.

The condes routine becomes:

.importzp ptr1,ptr2,tmp1
.segment        "ONCE"
.proc   condes

        sta     ptr2
        stx     ptr2+1
loop:   dey
        lda #0
        pha
        plb
fetch1: lda     (ptr2),y
        sta     ptr1+1
        dey
fetch2: lda     (ptr2),y
        sta     ptr1
        sty     tmp1
        pea .loword(index)-1
        lda #3
        pha
        plb
jmpvec: jmp     (ptr1)
index:  ldy     tmp1
        bne     loop
        rts

.endproc

Note that for reference I’ve retained the symbols from the original condes routine. Most of them are no longer needed. The symbols fetch1, fetch2 and jmpvec are where the original code modified itself to loop through the applicable constructors and destructors. This routine could be cleaned up with the use of 16-bit registers but would require other modifications to the C library. Given that it’s only called at startup, a little bit of inefficiency isn’t very noticeable.

It’s a bit harder to summarize the printf routine changes. Basically, I moved the self-modifying code from the DATA segment to an indirect jump in the CODE segment.

CallOutFunc:
        jmp (CallOutFuncPtr)

This required a new zero-page symbol and modifying the original self-modifying code to initialize it.

.zeropage
CallOutFuncPtr:    .word 0

...

; Get the output function from the output descriptor and remember it

        iny
        lda     (OutData),y
;        sta     CallOutFunc+1
        sta     CallOutFuncPtr
        iny
        lda     (OutData),y
;        sta     CallOutFunc+2
        sta     CallOutFuncPtr+1

...

I tried using some of the predefined C-specific pointers on the zero-page instead of creating a new one, but at least some are used by other routines called by printf and thus aren’t available for use.

One other minor change to the C library that was needed was to switch the zerobss module to the ONCE segment from the CODE segment. I’m not sure why this routine was placed in the CODE segment given that it’s only called at startup. Here though it needs to be placed in Bank 0. Otherwise, the C library initialization routine, init_c, needs to be modified.

Wrapping Up

That’s it. With those changes I can run the N-Queens project with most of the C-related code and data outside of Bank 0. In the image below you can see the C-based code running in Bank 1 (K=1) and accessing data in Bank 3 (B=3).

Update: 1/17/2023

I’ve added a branch to my fork of the cc65 GitHub repository with the changes discussed above. I’ve included a sample configuration file where you can specify the desired C code and data banks. See FORK.md for more details. My next post will be a running update of the new library.