Warlock's Tower - a Knights of the Round mod/ROM hack (arcade, CPS) - custom edition

I created the Warlock's Tower edition of Knights of the Round (1991 arcade game, by Capcom) by first reverse engineering the original game (from binary), and then designing a series of modifications to craft a new experience. This was also a chance to learn Motorola 68000 assembly language.

Warlock's Tower includes the following modifications to the original game:

it is a single player game - only player one can start
stages have no time limit, and don't push you forward with annoying sounds
it is a one-life game; additional lives cannot be gained and continuing is disabled
player is allowed one magic attack (fire+jump) per stage
original game's intro sequences have been removed to begin playing faster

DOWNLOAD THE ROM SET HERE.

To run it in an emulator (such as MAME), start the game Knights of the Round (World, 911127).

If MAME won't run it, then use:

mame.exe knights

Work environment

ROM set: Knights of the Round (World, 911127)
ROM set filename: knights.zip
Debuggers: MAME 0.230 built-in debugger (primary, used for program analysis), WinKawaks 1.65 built-in debugger (secondary, used to test modifications quickly)
Assembler: LEA (for assembling binary patches)
Hex editor: HxD
OS: Windows 10
Patching, byte swapping: custom-built tools

Technical details

From start to release, Warlock's Tower took several hundred hours of work. The first half was spent reverse engineering Knights of the Round. It consisted of analysis of the binary, data structures, overarching patterns, and then documentation and annotation of the disassembly generated by MAME's built-in debugger.

From there on, my time shifted more to development of new features. Analysis of the original program continued, though not exclusively like during the first month.

Achieving the modifications relied on patches, which were assembled via the LEA assembler, and then applied to the original game binaries via additional tools I created.

Before I started any serious coding, I designed the build script such that by running a single .bat file, all patches were assembled and applied, and then the output ROM set .zip was run in either of the two emulators.

Warlock's Tower relies on approximately 70 patches - most of them very small. Each patch replaces a contiguous portion of the original ROMs. Here are a few types:

short circuit: These are a few bytes in size and either NOP out a call or RTS (return) early, to disable certain functionality of the original game. Example: disabling the annoying gauntleted hand which pushes you forward.
custom main: This is a single, large patch with various entry points, meant to be entered from the original program. It resides in an unused portion of the 1 Megabyte program ROM, and reads/writes its state in an unused portion of the RAM.
hooks: Their role is to be injected into the original program code and to JMP/JSR (jump or call) into the custom main patch, at its many entry points. Examples: "on stage start", "on stage end", "on animation advance".
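
As an illustration of the "short circuit" type, here is a minimal Python sketch of what such a patch does to the ROM bytes. The opcodes 4E71 (NOP) and 4E75 (RTS) are the standard 68000 encodings; the helper names, the offsets and the sample call are made up for the example and are not taken from the actual patches.

```python
import struct

NOP = 0x4E71  # 68000 NOP, one word
RTS = 0x4E75  # 68000 RTS, one word

def nop_out(rom: bytearray, offset: int, length: int) -> None:
    """Overwrite `length` bytes at `offset` with NOP words."""
    if offset % 2 or length % 2:
        raise ValueError("68000 instructions are word-aligned")
    rom[offset:offset + length] = struct.pack(">H", NOP) * (length // 2)

def return_early(rom: bytearray, offset: int) -> None:
    """Make the routine at `offset` return immediately."""
    if offset % 2:
        raise ValueError("68000 instructions are word-aligned")
    rom[offset:offset + 2] = struct.pack(">H", RTS)

# Hypothetical ROM fragment: "jsr $00001234.l" followed by two other words.
rom = bytearray.fromhex("4eb900001234" "70004e75")
nop_out(rom, 0, 6)     # the 3-word call becomes three NOPs
return_early(rom, 6)   # the following routine now returns at once
print(rom.hex())
```

A call through an absolute long address occupies three words, which is why three NOPs are needed to erase it without shifting the code that follows.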

Development log

Back when I translated the game to Romanian, I wondered how hard it would be to learn enough about how the game worked to make modifications.

Got two different debuggers working. Both of them can step through Knights of the Round. One is built into MAME 0.230 and the other is built into WinKawaks 1.65.

I have to become familiar with 68000 assembly language; I've never even seen it before.

My main debugger will be the MAME one. It is simply a more complete debugger than the one in WinKawaks.

The WinKawaks one has a handy "dump source" feature, which I used to dump a disassembly of the program ROM. I cross-checked in the MAME debugger to ensure everything lined up. My plan is to make all my annotations and record findings in this source code file.

Discovered that I can also dump more reliably from the MAME debugger, so I switched to that source file.
Studied the CPS1 memory map from MAME source code. Fortunately, KOTR uses a fairly standard CPS1 mapping, while some other CPS1 games - Street Fighter II: Champion Edition, for example - stray from it.

WinKawaks actually lets you POKE memory, so it's great for trying things out. Therefore I am using both:
- MAME debugger for program analysis
- WinKawaks debugger for modifications/observations

Found a first useful breakpoint: inside the "uncancellable action" routine, which cleans up bodies, breaks barrels, makes soldiers run. Traced it all the way up to one of its higher-level call sites, which handles/renders all extra objects (treasure, barrels, barricades).

Found call sites for sm_0000001_handle_extra_objects
if nop'd out, no extra game objects appear (treasure, barrels, barricades, Merlin, hit clash animations)

Found how extra object slots are skipped (byte 0 = 0)
if nop'd out, the game will attempt to animate all kinds of garbage, with hilarious results:
Merlin continually appears and disappears at the start of stage 1
hit clash animations persist for a while
sometimes hitting soldiers turns them into tigers!
to complete stage 1 first half, use a power attack because the last 2 soldiers won't charge in and are offscreen
near Scorn, the game most likely crashes spectacularly

Most of today's work was spent understanding the storage and workflows around the "extra objects" which come and go and must be animated and handled appropriately while on screen.

I find that I'm learning Motorola 68000 (m68k or 68k) assembly language quickly. The debuggers let me try out instructions I haven't seen before with great ease - I use both documentation and observation. I can't imagine how difficult this would be if I hadn't programmed in assembly as much as I have.

Important note on memory: top 8 bits of logical addresses are dropped to compute physical addresses. Example 11FFCB7C logical maps to FFCB7C physical.
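
The mapping is a plain 24-bit mask. A tiny Python sketch (the function name is just for illustration):

```python
ADDRESS_MASK = 0x00FFFFFF  # only the low 24 bits of an address are kept

def physical_address(logical: int) -> int:
    """Drop the top 8 bits of a 32-bit logical address."""
    return logical & ADDRESS_MASK

print(hex(physical_address(0x11FFCB7C)))  # 0xffcb7c
```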

With a better understanding of "handle extra objects", I documented a few more handlers. It seems that they reused handlers, such that one of them is used for many things, like "lancelot falling down" and "soldier starts running".

It seems like the starts of arrays are always offset from A5's constant value of FFFF8000 (FF8000 physical).

I've found a nice routine which clears many arrays to zero! This made me aware of these arrays, though I don't yet know what they hold. However, I've found all places where these arrays are operated on. A few breakpoints there should soon unlock their purposes.

The "soldier starts running" extra object is actually just the dust at his feet when he starts, not the entire soldier.

Found 2 sets of 3 routines, near each other. The first set subtracts one from a global variable, and the second set adds one. Upon watching the most invoked (by call site count) one, I found that this tracks remaining capacity in the extra objects array.

I have no doubt that Knights of the Round was written in Motorola 68000 assembly language by hand. I have at least three reasons for this:
1. There are routines that are reached by both jumps and calls. I have not (yet) seen anything like this generated by compilers.
2. So far I've found no routines which have calling convention prologue/epilogue (e.g. parameters setup/cleared from stack).
3. I've also found evidence of array iteration copy/pasted code. The third copy of an array-traversing loop was pasted there to iterate over a 2-element array. I'm sure even a modest compiler would have unrolled that loop. This is despite the fact that SOME loops are partially unrolled - so at least one programmer knew this pattern.

The overall code quality is poor. There is much code duplication. For example, there are at least two routines for animating characters and extra objects - but they are almost identical.

I've focused in on a routine I found to be animating soldiers. This led me to a more general (500+ call sites) animation frame routine. A good result from here would be to figure out some of the arrays holding enemies.

This is a good find, because it is the first significant routine I have found which is a leaf - it calls no other routines.

This routine (now bearing the label sm_0000049) has a tell-tale "subq.w #1, ($38,A0)" followed by a "bne $3982". A decrement followed by a zero/non-zero check screams "frame counter". A quick breakpoint showed me that the conditional branch is taken (N-1)-of-N times and not taken 1-of-N times. The value of N changes depending on the animation (attacking vs. standing still). This must mean that the word at offset 0x38 represents "remaining video frames until next animation frame".
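
The countdown can be modelled in a few lines of Python. This is only a sketch of the pattern: the reload of the counter after it reaches zero is an assumption (the log does not say where the next value comes from), and the function name is invented.

```python
def count_animation_advances(frames_per_step: int, video_frames: int) -> int:
    """Count how often the animation advances over `video_frames` frames."""
    counter = frames_per_step      # the word at offset 0x38 of the entity
    advances = 0
    for _ in range(video_frames):
        counter -= 1               # subq.w #1, ($38,A0)
        if counter != 0:           # the zero/non-zero check after the decrement
            continue               # still counting down: keep the current frame
        advances += 1              # counter hit zero: next animation frame
        counter = frames_per_step  # assumed reload for the next step
    return advances

print(count_animation_advances(4, 12))  # 3
```

With N = 4, the animation advances on one video frame out of every four, matching the idea that N differs per animation.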

I am noticing register A0 being used to pass in pointers to the start of a multi-field entity, and then used by offsetting into it. It would be useful to map out and figure out various fields. Offset 0x38 is a good start.

I used the A0 value for the player to find the start of the "animated entities" array, by the simple assumption that the player is likely element zero.
I don't think this is one big array for all animated entities. I think there might be one for enemies, and separate single slots for the player, Merlin, etc.

Since the MAME debugger is able to break on vblank, it was easy to identify the beginning and end of the vblank (video sync) interrupt handler routine. It contains calls to some interesting routines for things like reading user input, setting up the video frame, etc.

I found the routine which reads player input (every frame) from the hardware. Player 1 and 2 input is read from a single word at address 0x800000 (CPS1 player inputs area). However, player 3 input is read from the byte at 0x800177 (CPS-B custom area) - this is because the JAMMA standard specifies only two players, the third player being Capcom's extension to the standard. The routine saves these button states to RAM.

Found a bad bug in WinKawaks which makes it crash when poking values into RAM.

Switched to using the newest MAME ROM set with WinKawaks, so the same ROM set is used in both MAME and WinKawaks.

I switched things up a bit and started creating a toolchain which will allow me to easily assemble (from source) and apply patches to the Knights of the Round ROM(s).

It was difficult finding a 68000 assembler which could generate flat binary files. Finally settled on LEA.

After a bit of searching for a simple binary patching tool, I realized I would be better off just writing my own - which I did.

Work continues on a tool chain that takes care of assembling patch source code and injecting it into the program ROMs. I'm designing it to also make it easy to add future patches.

The tool chain needed two more utilities: one for byteswapping a binary file, and one for splitting a binary file.

The tool chain assembles and injects all patches into the Knights of the Round program ROM. Since the program ROM is split into two 512 KB files, I had to merge them prior to patching, so that patches across the 512 KB boundary can be applied properly. After patching, the program ROM is split into the two 512 KB files again.
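
A rough Python sketch of the merge/patch/split idea, plus a 16-bit byte-swap helper. This is not the actual tool chain: the function names, the (offset, payload) patch representation and the point at which byte swapping would be applied are all assumptions made for the example.

```python
from typing import Iterable, Tuple

HALF = 512 * 1024  # size of each program ROM file

def byteswap16(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word."""
    if len(data) % 2:
        raise ValueError("length must be even")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)

def apply_patches(first: bytes, second: bytes,
                  patches: Iterable[Tuple[int, bytes]]) -> Tuple[bytes, bytes]:
    """Merge both halves, overwrite each patched span, split again."""
    image = bytearray(first + second)
    for offset, payload in patches:
        if offset < 0 or offset + len(payload) > len(image):
            raise ValueError(f"patch at {offset:#x} does not fit in the ROM")
        image[offset:offset + len(payload)] = payload
    return bytes(image[:HALF]), bytes(image[HALF:])

# A 4-byte patch that straddles the 512 KB boundary.
low, high = apply_patches(bytes(HALF), bytes(HALF),
                          [(HALF - 2, bytes.fromhex("4e714e75"))])
print(low[-2:].hex(), high[:2].hex())
```

Working on the merged image means a patch never has to know which file its bytes end up in.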

The tool chain now starts from knights.zip and patch directories containing patch source code and produces a modified, patched knights.zip.

Added convenience assemble.bat scripts in each patch directory, to help with development.

While creating a simple text modification patch, I learned that at least the one byte immediately BEFORE the actual text holds attribute information used when printing:

index (from first char)
-1 bit 0-2: colour (0=blue; 1=red; 2=green; 3=orange, double width; 4=red, double width; 5=blue/red gradient; 6=red, double width)
-2 Y position of text, with 0x05 being left most column
-3 X position of text, with 0x02 being top most row

Looking at it again, it seems like there are 3 bytes before that are meaningful. Two of them hold X, Y coordinates for the text and the third holds attributes.

Watchpoints are extremely helpful. The following watchpoint seems to be leading me to the game's "print text" routines: wpset 74879, 1, r

Found the "print character" loop. Identified the game's string terminator to be '/'.

With a bit of experimentation, I figured out the entire sm_0000074_print_string routine, as well as string formatting and lookup. Printing ROM strings uses as argument an index into a string pointers array. This makes it unsuitable for custom strings, since I cannot add to that array easily.

The best solution I can think of is to create my own routine, which doesn't work with those indexes - instead, it will take in a string pointer. It will then invoke existing print logic, skipping Capcom's index-to-pointer translation. This routine will be more general, but I need to find a large chunk of unused ROM to serve as my target for patches.

Judging from the "Insert Coin" blinking text during attract mode, it looks like sm_0000074_print_string writes tiles on layer Scroll 1. This makes sense since scroll 1 is the highest priority (drawn last, "on top") - and we want nothing to cover our text. Verified this with WinKawaks's "disable layer" functionality.

This led me into trying out a few things to see how tiles work. Found some CPS1 documentation and verified things like horizontal flip bit and vertical flip bit which will flip the letters of the printed text.

They've saved a lot of room by using register A5 (with a constant value throughout the game) as a base to calculate addresses. Many instructions that would otherwise take up 3 words only take up 2 this way.
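
To make the saving concrete: an address written as a displacement from A5 stores only a signed 16-bit offset, while an absolute long address needs two extension words. Below is a small Python sketch of how such an operand resolves, assuming A5 holds FFFF8000 and the top 8 address bits are dropped, as noted earlier in this log. The function names are invented for illustration.

```python
A5 = 0xFFFF8000  # constant base register value observed in the game

def sign_extend16(value: int) -> int:
    """Interpret a 16-bit value as a signed displacement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def a5_relative(displacement: int) -> int:
    """Physical address referenced by a (d16,A5) operand."""
    return (A5 + sign_extend16(displacement)) & 0x00FFFFFF

print(hex(a5_relative(0x241A)))  # 0xffa41a
print(hex(a5_relative(0x8000)))  # 0xff0000, lowest reachable address
print(hex(a5_relative(0x7FFF)))  # 0xffffff, highest reachable address
```

With the base in the middle of the window, the signed displacement reaches 32 KB in each direction, so a single base register covers a full 64 KB range.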

I want to investigate more about scrolls' X/Y position. I started with a search for ", $8001", which should find things like "move.w ($241a,A5), $800100.l" - that is, CPS-A/CPS-B port writes. These are probably what shifts scrolls.

The attract mode animation of the insertion of Excalibur oscillates scroll 2's X between 0 and 1 to obtain a "sword is shaking" effect. Sword is on scroll 2, stone is on scroll 3, portion of stone which is "in front" of sword is a sprite.
Found where scroll 2 X shift is xor'd with 1 to oscillate intro sword left-right
Found where scroll 2 Y moves down by 3 pixels per frame so that sword descends

I figured out most of the scrolls 2 and 3 (used for backgrounds in-game) and how they shift. Documenting these newly-found variables and ports took quite a bit of time.

The 68000 is the first processor I've worked with whose exclusive OR mnemonic is not xor, but eor.

Scroll layer position is initialized to 0 in 6-7 different copy/pasted code blocks. This could've easily been cleaner.

I like how they approached string erasing, except for the copy/pasting: the "erase string" routine looks just like the "print string" routine, except it draws a transparent tile instead of whatever the current ASCII character is.

I found a few more copy/pasted "print string" routines. sm_0000109_delayed_print_string is particularly interesting because it inserts a delay after each character it prints. This might lead me to learn about delays. A first glance at this routine indicates it is based on trap 3 (interrupt vector 35). Another benefit of this is that I've been wanting to figure out trap 3 for a while.

Trap 3 (interrupt vector 35) is a busy wait routine. It takes one argument in D0: length of delay in frames.

Figured out some more print string routines.

Found the strings array which holds enemy display information (name, portrait, attributes). The string formats here are inconsistent with the other strings array. For example, they store attribute words instead of attribute bytes. Also, they use '/' as string terminator.

Found the routines which display the player info (at the top of the screen) for the three heroes. Each of the three (Lancelot, Arthur, Perceval) has its own copy/pasted higher-level routine which invokes smaller routines for each of the components of the info panel (life bar, level, score, etc.).

I went digging around the extra objects array I had found two weeks ago and found other "allocate" routines, which look similar but operate on other arrays. I guessed that those probably store enemies. Then I ran into a nifty trick. If using D0 to return success=0 or failure=1, then using dbra D0, addr can be used as an if/else.
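
Why the dbra trick works: dbra decrements the low word of the register and branches unless the result is -1. A small Python model of that behaviour (the function name is invented; only the low 16 bits of D0 take part, as on the real instruction):

```python
from typing import Tuple

def dbra(d0: int) -> Tuple[int, bool]:
    """Return (new D0, branch taken?) for a single dbra D0, addr."""
    low = (d0 - 1) & 0xFFFF              # decrement the low word only
    new_d0 = (d0 & 0xFFFF0000) | low
    return new_d0, low != 0xFFFF         # fall through when the word hits -1

for status in (0, 1):                    # 0 = success, 1 = failure
    _, branched = dbra(status)
    print(status, "branches" if branched else "falls through")
```

So a routine returning 0 falls through to the "success" code, while a return value of 1 jumps to the "failure" code, without a separate test instruction.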

I found what looks to be a high-level allocator of entities, which can invoke various allocation routines. This could lead me to how levels are laid out, allowing for possible modifications to the enemies that are encountered, as well as dynamic encounters.

If a higher value than 1 is used as the immediate value passed into trap 3, all animations in game are slowed down.

After much work, I've discovered most details of how pre-placed entities are spawned on a level. There exists a global "entity spawn state", which holds a pointer to the next entity in the registry to consider spawning. As soon as player's scroll X shift reaches this pointed-to entity, it is spawned, along with any other that might need spawning - since there could be multiple entities with the same X shift spawn point. Each level has a "registry" of entities to be spawned, at specified X shift values (of scroll 2).

At the start of each stage (or sub-stage), the global entity state is initialized with a pointer to the first pre-placed entity in that stage's registry. I verified my knowledge of spawning by moving Merlin forward, requiring the player to scroll X a bit before Merlin appears and vanishes.
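
The spawning scheme described above can be sketched in Python as follows. The class and field names are hypothetical and the registry contents are invented; the real registry entries carry more fields than a spawn position and a name.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RegistryEntry:
    spawn_x: int  # scroll 2 X shift at which the entity appears
    name: str     # stand-in for the real entity data

class SpawnState:
    """Global spawn state: points at the next registry entry to consider."""

    def __init__(self, registry: List[RegistryEntry]) -> None:
        self.registry = registry  # assumed sorted by spawn_x
        self.next_index = 0       # reset at the start of each (sub-)stage

    def update(self, scroll_x: int) -> List[str]:
        """Spawn every pending entity whose spawn point has been reached."""
        spawned = []
        while (self.next_index < len(self.registry)
               and self.registry[self.next_index].spawn_x <= scroll_x):
            spawned.append(self.registry[self.next_index].name)
            self.next_index += 1
        return spawned

state = SpawnState([
    RegistryEntry(0x000, "Merlin"),
    RegistryEntry(0x100, "soldier"),
    RegistryEntry(0x100, "barrel"),
    RegistryEntry(0x200, "soldier"),
])
print(state.update(0x000))  # ['Merlin']
print(state.update(0x080))  # []
print(state.update(0x100))  # ['soldier', 'barrel']
```

Because only the next entry is ever examined, the check costs almost nothing per frame, and several entities sharing one X shift are spawned together.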

Poking value 5 (old 4) will turn the first soldier on stage 1 into an incorrectly-paletted, ghostly white-faced, unkillable Braford. He scared me as he charged from off-screen to attack me. That byte represents the "enemy type" byte of the first soldier in stage 1. By trying different values for that byte, I mapped many of the game's enemies to integer values - even some bosses like Muramasa.

Enemy type value 0x0F is particularly interesting because it reveals that Capcom had most likely planned a regular enemy version of Balbars, called MC Hammer.

I created a patch which contains the pre-placed registry entity info for the first soldier and two barrels in stage 1. This has allowed me to figure out quickly most fields in the 12-byte entity entry.

I've spent a few days just trying things out in the entity registry, sometimes achieving funny effects. I still don't fully understand some of the fields, but I have enough knowledge to spawn a variety of enemies, in a variety of ways (run in, fall from sky), as well as full understanding of how to spawn treasure in barrels and barricades.

Having developed the patching toolchain beforehand was an excellent idea. It let me try out and find all sorts of interesting animations. It also helped me figure out a number of flags used in entities.

These registries are not just for pre-placed entities; they also include run-ins.

Each stage's entity registry seems to have a number of "special" entities at X shift 0. Being at 0 means that they get allocated before the player can even move, at the start of each stage. For example, at the end of stage 1, there are two soldiers who run in. They are represented by a single entity, which holds a "batch ID".

There are many such batches, one for each run-in encounter throughout the game - including the funny soldiers falling from the sky before Arlon.

Figured out one of the long-standing unknown arrays: the entity batches array, where each entry represents a batch of enemies who will run in at a given time - such as the two soldiers who run in at the end of stage 1. This was found after understanding entity registries and noticing that all entity batches used allocation function 2. This was further resolved in a table of allocation function pointers, the respective function operating on the previously-unknown array.

Before the entity registry of the first stage of each level is a table of pointers to each of the stages' entity registry. This probably means that entity registries are easily relocatable.

I no longer think entity palettes are referenced from entries in entity registries. This is because when replacing stage 1's pointer with stage 2's pointer, objects such as the broken flag at the bottom of the screen were rendered with the wrong palette.

Found and documented all entity registries, including the three attract mode ones and one unused registry, presumably used during development.

Palettes are actually loaded per-stage (level, not sub-stage). This makes, for example, all soldiers in later stages look different than the "weak" ones on stage 1-2.

The routine which sets a pointer to the entity registry that is appropriate for the stage/substage that is starting will probably lead me to further interesting findings around how stage/substages are set and managed. I remember seeing a video of an older Knights of the Round hack which swapped stages around. It didn't make the game more interesting, though.

Capcom's code is spaghetti; I think that their programmers cared little for encapsulation. For example, the "current stage" (global) variable is referenced 104 times throughout the program. Not to mention that certain stages/substages are treated specially.

After identifying the game-initializing setting of stage and substage to 0, I was able to have the game start on a different level.

Very strange... Knights of the Round score is stored in BCD. This explains why I've been unable to find the level up table by first converting what I was seeing on screen to hex - I simply should have looked up BCD.
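
A short Python sketch of why the naive search failed. In BCD every decimal digit occupies its own 4-bit nibble, so the stored bytes look like the decimal number when viewed in hex. The score value and the 4-byte width below are made-up examples, not values taken from the game.

```python
def to_bcd(value: int) -> int:
    """Pack a non-negative integer into BCD, one decimal digit per nibble."""
    if value < 0:
        raise ValueError("BCD here is unsigned")
    return int(str(value), 16)

def from_bcd(bcd: int) -> int:
    """Unpack a BCD value; raises ValueError if a nibble is above 9."""
    return int(f"{bcd:x}")

score = 12500                                 # hypothetical on-screen score
print(hex(score))                             # 0x30d4, what a naive hex search looks for
print(to_bcd(score).to_bytes(4, "big").hex()) # 00012500, what BCD storage looks like
```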

I found the level up table. Depending on the modifications I make to the game, I suspect characters will have to level up more quickly. However, I've seen a hack of this game which levels up the character after each score increase, which I found silly.

Rendering the player's current score was more difficult than I thought. It's actually split into an initial rendering routine, and then a different routine when the score changes.

More work around how life bars are drawn, since I might want to extend the player's life bar on level up.

Figured out how large text (like INSERT COIN flashing at the top of the screen) is written. It took a while, since I started from finding the tiles among the game's graphics. However I modify the game, I will remove them because they're annoying and take up too much space at the top.

Identified the gameplay phase's main loop! This is a good breakthrough.

After 3 tries, I still can't figure out how the gameplay phase main loop is entered. It's probably a series of jumps to various function pointers in various tables.

It is very fortunate that the last 3 words of the main loop are not referenced by anything explicitly. This means that they can be replaced by a jsr to my own injected code (a jsr XXXX.l takes up 3 words), so my code can be called during the highest-level loop of the game. My code's epilogue can then perform whatever those 3 original words performed.
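
For reference, the three words of a jsr to an absolute long address are the opcode word 4EB9 followed by the 32-bit target. A Python sketch of building such a hook; the function name is invented, and the target used here is just an example address (the chunk start chosen later in this log), not necessarily the one this hook calls.

```python
import struct

JSR_ABS_LONG = 0x4EB9  # 68000 opcode word for "jsr XXXX.l"

def encode_jsr_long(target: int) -> bytes:
    """Encode jsr target.l as big-endian bytes (opcode word + 32-bit address)."""
    if target % 2:
        raise ValueError("code addresses must be even")
    return struct.pack(">HI", JSR_ABS_LONG, target)

hook = encode_jsr_long(0x0F7000)
print(hook.hex(), len(hook) // 2, "words")
```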

My latest visits to the game's test menu have made me realize that its sound test submenu is a perfect avenue to learn about how to play sounds. Sound playing is based on a queue. Sounds are added to the queue, and the vblank interrupt outputs to a few hardware ports, as needed. I reached an "enqueue sound" routine which happened to be immediately after vblank interrupt's "check queue" routine - that I had found in the past. It was akin to a bridge being built from each shore, and reaching the middle harmoniously.

Once again, the patch development toolchain pays off its initial investment. Before long, I wrote three tiny patches which reduce boot-up time (up to when the game can be started) from about 16 seconds to about 1 second, by skipping CPS boot tests and all delays up until when the sword is inserted into the stone.

Trap 0 seems to be used to register callbacks. By pure chance, I NOP'd one of the many calls to trap 0 only to realize I cannot enter test menu any longer. I verified that one of the arguments passed into the trap 0 invocation was the address of the start of test menu logic.

Found how player 3 coin and start are read - they are bits 6 and 7 (which are unused for the player 1 and 2 button status) of player 3 button status. Player 1 and 2 start and coin are in a separate register, along with test mode.

I'm sure trap 0 holds the key to understanding transitions between high-level game "components", such as attract mode, attract special move demo mode, screen transitions during gameplay (e.g. finishing a stage or a substage). Each component which is registered begins by setting A5. A5 is a presumed-constant value register used to reference data in RAM via word offset. The only places in the game where A5 is set (to the presumed-constant value) are the entry points of such components.

This means that components make few assumptions about register values, which means they are very high-level. They might be the key to creating new arrangements of components, if indeed switching to a new component "resets" stack, etc. However, I'd bet that even leaking a bit of stack memory (transitioning between components in an unintended way) would be fine, as long as it doesn't happen too often.

After much more work around the mystical trap 0, I am now fairly confident that all of the traps used in the program deal with tasks. The program contains a loop that begins shortly after the CPS startup boot tests. The loop iterates infinitely over all task slots, invoking the tasks' callbacks. Example traps: "task add", "task yield", "task yield delayed", "task remove".
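
The task model can be approximated with Python generators. This is only an analogy under assumptions: the slot count, the meaning of the delay (number of passes to skip) and all names are invented, and the real traps switch between 68000 routines rather than generators.

```python
from typing import Generator, List, Optional

Task = Generator[int, None, None]  # yields frames to sleep; 0 = plain yield

class Scheduler:
    def __init__(self, slot_count: int = 4) -> None:
        self.slots: List[Optional[list]] = [None] * slot_count

    def add(self, task: Task) -> int:
        """'task add': place the task in the first free slot."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = [task, 0]  # [callback, remaining delay]
                return index
        raise RuntimeError("no free task slot")

    def run_pass(self) -> None:
        """One pass of the main loop over all task slots."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                continue
            if slot[1] > 0:               # still inside a delayed yield
                slot[1] -= 1
                continue
            try:
                slot[1] = next(slot[0])   # run until the task yields
            except StopIteration:
                self.slots[index] = None  # 'task remove'

log: List[str] = []

def fader() -> Task:
    for step in range(2):
        log.append(f"fade {step}")
        yield 2                           # 'task yield delayed'

def ticker() -> Task:
    for step in range(2):
        log.append(f"tick {step}")
        yield 0                           # 'task yield'

scheduler = Scheduler()
scheduler.add(fader())
scheduler.add(ticker())
for _ in range(4):
    scheduler.run_pass()
print(log)
```

Each task runs until it yields, which is what lets long sequences such as fades be written as straight-line code.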

I found an interesting, often occurring pattern: 1. add screen transition task (fade) 2. yield loop until no transitions are active.

The sound test menu is great. It allows finding various places very quickly, as long as you can recognize the sounds easily. In a very short time, I wrote a patch that completely removes the annoying "GO" hand that pushes the player forward.

Wrote a patch which silences the time remaining countdown (by second).

Something I've put off for some time is finding a large, contiguous chunk of unused program ROM. Its purpose will be to host my modifications, with hooks from various places in the original program. I got this solved, but I would like to design and organize my code changes into this chunk in such a way that the hooks in the original program don't have to change much over time.

An idea is some sort of a routine catalogue right at the beginning of the chunk, containing entry points. Since they're at the beginning, their addresses almost never change. Hooks in the original program invoke these entry points, consequently remaining mostly unchanging.

Laid down the organization of the main patch, which will contain all of the modifications, each exposed as a routine that is either jsr'd (called), or jmp'd (jumped).

Successfully tested the first hook from the original program into custom code: vblank interrupt handler epilogue, which gives the custom code a chance to act right before the end of the vblank interrupt handler.

Decided on a chunk of 0x8000 bytes, starting at 0xF7000. The first 0x600 bytes are reserved as entry points - immovable over development time.
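
A Python sketch of the address arithmetic for this layout. The chunk start, chunk size and reserved entry area come from the note above; the 6-byte slot size is an assumption (room for one jmp XXXX.l per entry point), as are the names.

```python
CHUNK_START = 0xF7000
CHUNK_SIZE = 0x8000
ENTRY_AREA_SIZE = 0x600
SLOT_SIZE = 6  # assumption: each entry point is a single 3-word jmp XXXX.l

MAX_SLOTS = ENTRY_AREA_SIZE // SLOT_SIZE
CODE_START = CHUNK_START + ENTRY_AREA_SIZE  # relocatable code begins here
CHUNK_END = CHUNK_START + CHUNK_SIZE        # first address past the chunk

def entry_point(index: int) -> int:
    """Fixed address that a hook in the original program would target."""
    if not 0 <= index < MAX_SLOTS:
        raise IndexError("entry point index outside the reserved area")
    return CHUNK_START + index * SLOT_SIZE

print(hex(entry_point(0)), hex(entry_point(1)), MAX_SLOTS, hex(CHUNK_END))
```

Under that assumption the reserved area holds 256 entry points, and the chunk ends at 0xFF000, inside the 1 Megabyte program ROM.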

By porting over and improving the flexibility of "print string" and by hooking the gameplay main loop, I have been able to write text on screen.

MAME's disassembler follows a slightly different syntax than LEA, so I had to adjust some of my recent source code.

I liked how the Capcom programmers offset everything from A5. This technique has an additional advantage for me: if whatever I pick as the base for my RAM usage happens to collide with the original game's RAM usage, I can relocate all of my usage by setting a new value into my base register, in one spot.

I've set up several hooks into the original program. Each sets a "last transition", depending on the hook's location. Examples: hero selected, stage started, stage ended. The result is state which tells me if actual gameplay is taking place. This state allows for modifications which only appear when the player is in control of gameplay.

Through analysis of the EXIT option of the test menu and knowledge of task management, I was able to write a routine which reboots the program.

Wrote a patch which keeps the game on the title screen, preventing it from moving on to any attract mode hero presentations, demos, etc.

Wrote a patch which skips over the attract mode screen which shows the gauntleted hands inserting excalibur into the stone, at game startup.

Wrote a patch which skips over the high scores list (after a game over), going directly to the title screen.

Found the routine which prints a large digit.

Wrote a patch which disables continuing after losing last life, irrespective of DIP switch setting.

Wrote a patch which forces the initial number of lives to 1, irrespective of DIP settings. Replaced patch soon after with a few lines of code which repeatedly set player 1's reserve lives to 0, nullifying the effect of further coins during gameplay, as well as the effect of in-game 1ups and 2ups.

Wrote a patch which disables all player 2 and 3 controls. Improved it to also disable player 1 start and coin when game phase is not attract mode. This has the welcome side effect that after the first credit, no subsequent credits can be added by player 1.

Wrote a patch which limits the credits obtained by inserting a single coin to one, irrespective of DIP switch setting.

I've invested the time into the above changes because I would like a game mode where the player gets a single life, without the possibility of continuing. I have a few ideas I'm entertaining, but common amongst them is the notion of single life play.

Wrote a patch which disables the flashing "insert coin" and "press start" at the top of the screen, in the slots in which there is no active player. The game looks a bit better now, since more of the background is visible.

Wrote a patch which disables the "press start" screen's information panels - which might contain "press start" or "insert coin" for player 2 and 3.

I'm becoming better at locating routines. I first look up the tiles they might display via the game's test menu, then look for the sequence of tiles in an array, and finally set a watchpoint on that memory (read).

Found and documented DIP switches.

The "press start"/"insert coin" screen now acts as if the game was single player. Also, it accepts only one coin, awarding one credit, only prior to game start.

Removed substage time limit.

Wrote a patch which skips over the stage intro screen.

Fixed a bug whereby the player marker still appeared during the transition skipping over the stage intro screen.

Fixed a bug whereby one coin could yield multiple credits in 2-player mode (DIP setting).

Removed a reference to inserting a coin to play a 2-player game. Found a great routine which prints large font (2x2 tiles per character) strings. Like the ASCII string printing routine, this one offsets pointers to strings from a hardcoded value in ROM. This means that I will have to re-write a more general one, which works with strings from anywhere in memory.

The code for handling coins, credits, and displaying various "insert coin" and "press start" messages is terrible. It's duplicated everywhere, with tiny modifications which surface when you start modifying DIP settings. To turn the game into single-player, one life, has taken me about a week.

I will definitely use the task yield traps in my modification. If I introduce an additional "task yield delayed" in the gameplay main loop, everything is sluggish. If I replace the existing "task yield delayed" in the gameplay main loop with "task yield", everything becomes fast!

An often-used pattern through the code is for the routines which take in a pointer (maybe to a player structure, maybe to an animated object structure) to be followed by a table of pointers. At a known offset, the structure passed in will feature an index into that table. This works as a switch statement which uses less code, at the expense of readability.

Wrote a patch to hide the top score displayed during gameplay in 2 player mode.

Spent a good 20 minutes trying to figure out why relocating an entity registry failed with an address error. The cause was misaligned data. The 68000 requires word and longword accesses to be at even addresses, and one of my strings (which came before the registry in memory) had an odd character count.
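The effect of an odd-length string on whatever follows it can be sketched like this. The base address and data are invented for illustration, and `pad_to_even` and `layout` are hypothetical helpers, not part of the actual patching tools.

```python
def pad_to_even(data: bytes) -> bytes:
    """Append one zero byte when the length is odd, so the next item starts on an even address."""
    return data + b"\x00" if len(data) % 2 else data


def layout(base: int, blobs):
    """Return the start address of each blob when they are packed back to back from base."""
    addresses = []
    address = base
    for blob in blobs:
        addresses.append(address)
        address += len(blob)
    return addresses


text = b"HELLO"                  # 5 bytes: odd length
registry = b"\x00\x01\x00\x02"   # data that will be read with word accesses

unpadded = layout(0x100000, [text, registry])
padded = layout(0x100000, [pad_to_even(text), registry])
```

Here `unpadded[1]` is odd, which is the situation that raises an address error on a word read, while `padded[1]` is even.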

By hooking the place in the original program where the current substage's entity registry is set, I am now able to control the enemies that appear on the current substage. This not only makes research into enemy types and how to spawn them easier, but is also central to the modification I intend to make to the game.

I answered a long standing question: how does the character exit/finish a substage? The answer is in a special entity which is defined in that substage's entity registry.

7 week update:

By this point, I believe I have figured out 95% of the necessary portions of the original program. That is, necessary to complete my modification. My work has transitioned over time from pure investigation and research to development of changes. At this point it's mostly new development.

Each stage has a special entity in its entity registry which handles the hero's animation at substage finish. For example, the one in the king's castle makes the hero go up the stairs.

I've spent a few hours trying to turn a boss into a regular enemy. The main issue I faced was that no matter what I tried, after the boss's death, I could not get the substage to end. As much as I wanted to have multiple bosses to fight, I abandoned that idea and resumed work on stage transitions.

I've decided on Knights' Festival as the introduction stage. The hero will walk up (controls disabled) to the stands, where the ladies there will inform him of the mage who has gone insane and is sending out troops from his tower. He performs strange experiments in his tower, one on each floor. He experiments with traveling to other times and worlds. The hero says "I will take care of him!", arrows fly in, hero tells his army "Stand back! I must face these challenges alone!".

From there on, the action takes place only on stage 3-3, whereby the hero keeps climbing up the stairs to new floors of the tower. I don't know yet what happens at the end.

Wrote functionality for displaying text in-game at specified time intervals and locations. This is necessary for dialogue during the introduction stage, as well as perhaps advice from Merlin, or the hero's own thoughts. In any case, this will be a difference between the silent protagonists of the original game and my modification.

I now have the pieces in place to start in an introduction stage and then progress through multiple tower floors. I will have to devise some sort of an ending... even if it has to restart the machine.

I spent some time investigating more entities that can spawn. I finally worked out falcons - the aggressive ones that remain on-screen and fight you. I also figured out the ones that perch until you get too close.

Added functionality which allows each custom substage to specify the palette of an original stage.

I wanted a more persistent fire, but unfortunately, the one on Muramasa's stage (with the fat men and magician lying down around it) seems hardcoded somewhere, and is not an entity in the substage's registry.

Printing and erasing text now supports line breaks.

Wrote a patch which removes the drawing of any graphics (e.g. two knights over a table) and stage-specific text from the game over screen. This is because those original game locations are not really part of my modification.

Documented most of the game's music. I intend to play different music during gameplay, because otherwise every floor would have the same music.

Found a good hook spot for where a substage's music is scheduled.

Cleaned up boot start-up so that no text at all even flashes on the screen before it gets to the title screen.

I found a great ending! The hero will go to the meadow where the sword had been, with an explosion behind him. He has been transported away from the tower through magic.

Wrote functionality which lets me specify music for each floor.

Took on management of silencing sounds between stages so that I can continue playing a song if it spans multiple consecutive floors. The reason here is that my floors will be short compared to entire substages in the original game, so I don't want the music to restart too often.
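The idea can be sketched as follows. The class, the command tuples, and the track numbers are hypothetical; the real patch works against the game's own sound routines.

```python
class MusicScheduler:
    """Only stop and restart music when the requested track actually changes."""

    def __init__(self):
        self.current = None   # track currently playing, if any
        self.commands = []    # commands that would be sent to the sound hardware

    def enter_floor(self, track: int) -> None:
        if track == self.current:
            return  # same song spans this floor too: leave it playing
        if self.current is not None:
            self.commands.append(("stop", self.current))
        self.commands.append(("play", track))
        self.current = track


scheduler = MusicScheduler()
for floor_track in (5, 5, 7):   # hypothetical track IDs for three floors
    scheduler.enter_floor(floor_track)
```

With these three floors, the song with ID 5 keeps playing across the first transition and is only stopped when floor three asks for a different track.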

Implemented code which holds the player for a while longer at the start of a substage by disabling player 1 controls for a set number of frames. The reason for this is so that the hero can "speak" in some substages, before he starts.

Wrote a system of holding the player - that is, disabling his controls while enforcing some hardcoded controls. This is useful when advancing story. It is also useful when the hero has to speak (or be spoken to) at the start of stages. Another place I can think of is at the start of the last battle - the mage could say something.
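One way to picture such a hold script is a list of entries, each lasting a number of frames and supplying the input to use in place of the joystick. This is a sketch under assumed names and bit values (`RIGHT`, `held_input`), not the data format used in the actual patch.

```python
RIGHT = 0x01   # hypothetical input bit for "walk right"
NONE = 0x00    # no buttons pressed


def held_input(script, frame: int, real_input: int) -> int:
    """Return the forced input while a hold entry is active, else the real input.

    script is a list of (duration_in_frames, forced_input) entries played in order.
    """
    elapsed = 0
    for duration, forced in script:
        elapsed += duration
        if frame < elapsed:
            return forced
    return real_input   # script finished: control returns to the player


# Walk right for 60 frames, then stand still for 30 frames (e.g. while text is shown).
script = [(60, RIGHT), (30, NONE)]
```

The comparison `frame < elapsed` covers every frame of an entry, including its last one, before the next entry (or the player) takes over.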

Fixed a bug in the music selection routine which failed to stop the "select hero" music when starting on a custom substage other than the first.

Forcing a starting hero level is broken. The next-level values past level 2 are incorrect. My current method of doing it is insufficient, and more research into the original game's initial values is needed.

The game forces a higher starting hero level depending on the stage. I believe this is so that if a second player joins in late in the game, he is not starting from hero level 1, but from a more appropriate hero level.

Removed my hack (or I guess, my hack's hack) of forcing player level to 1 at the start. Instead, I identified the proper routine which takes care of this and nop'd it.

Changed move.l to move.w to fix a bug in keyframed text and player hold logic which prevented text in later substages from being displayed.
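The log does not say which fields were involved, so the following is only an illustration of how this class of bug arises: a 32-bit write aimed at a 16-bit field also overwrites the word next to it. The two-field layout is hypothetical.

```python
import struct


def move_w(memory: bytearray, offset: int, value: int) -> None:
    """Write 16 bits, big-endian, like a 68000 move.w to memory."""
    struct.pack_into(">H", memory, offset, value & 0xFFFF)


def move_l(memory: bytearray, offset: int, value: int) -> None:
    """Write 32 bits, big-endian, like a 68000 move.l to memory."""
    struct.pack_into(">I", memory, offset, value & 0xFFFFFFFF)


def read_w(memory: bytearray, offset: int) -> int:
    return struct.unpack_from(">H", memory, offset)[0]


# Hypothetical layout: word at offset 0 is a timer, word at offset 2 is a text index.
good = bytearray(4)
move_w(good, 2, 3)    # text index = 3
move_w(good, 0, 60)   # timer = 60, neighbour untouched

bad = bytearray(4)
move_w(bad, 2, 3)     # text index = 3
move_l(bad, 0, 60)    # 32-bit write: value lands in the neighbour, timer becomes 0
```

Because the 68000 is big-endian, the long write puts the high word (zero) in the timer and the low word (60) on top of the text index.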

The boss now laughs on spawn and every time he hits the player.

Fixed an issue whereby text from the next substage could appear at the end of the previous, during the transition to next.

The boss now has his own unique name.

Improved method of matching current interacting enemy to override boss's name with the custom one.

Changed reboot code to account for some uninitialized data in the original program which prevented coins from registering after the reboot from the victory substage.

The victory substage is now fully implemented.

Disabled the mega move (jump+fire) during the boss fight.

The boss is now much quicker than the normal enemy on which he is based. He provides a solid challenge.

The title of the modification and floor number are now printed on screen during gameplay. I used a conversion to BCD to make printing easier.
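BCD makes printing easier because each decimal digit ends up in its own nibble, so a digit can be turned into a tile with a shift and a mask. A small sketch for two-digit values (the function names are made up for this example):

```python
def to_packed_bcd(value: int) -> int:
    """Convert 0..99 to packed BCD: tens digit in the high nibble, ones in the low."""
    if not 0 <= value <= 99:
        raise ValueError("only two-digit values are handled in this sketch")
    tens, ones = divmod(value, 10)
    return (tens << 4) | ones


def bcd_digits(bcd: int):
    """Split a packed BCD byte back into its two decimal digits."""
    return (bcd >> 4) & 0x0F, bcd & 0x0F
```

For example, floor 16 becomes `0x16`, whose nibbles are the digits 1 and 6.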

Much fun can be had when hooking a place in the program by replacing bsrs. From the custom, called code, the call would be too far for bsr, so we naturally change it to a jsr. However, when one forgets to make it an explicit long jsr, bad things happen.
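One plausible failure mode, assuming the assembler emits the absolute short form when the long form is not requested explicitly: only the low 16 bits of the target are encoded, and the CPU sign-extends them. The sketch below computes where such a call would really land on a 68000 with its 24-bit address bus. The example addresses are arbitrary.

```python
def abs_short_target(address: int) -> int:
    """Address actually reached when only the low 16 bits are encoded (absolute short)."""
    low = address & 0xFFFF
    if low & 0x8000:
        low |= 0xFFFF0000      # the CPU sign-extends the 16-bit value to 32 bits
    return low & 0xFFFFFF      # the 68000 drives only 24 address lines


def fits_abs_short(address: int) -> bool:
    """True when the short encoding reaches the intended address."""
    return abs_short_target(address) == (address & 0xFFFFFF)
```

A target such as `0x0C1234` does not fit: the short form would jump to `0x001234`, somewhere in the original program, which is exactly the kind of bad thing that is hard to trace.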

Title screen now displays the modification's final name: Warlock's Tower. Properly displaying this so that it doesn't linger across transitions took about two hours. It is using a version of "print large font text" which I adapted from the original program.

Fixed up all saves/restores of registers to also save D0.

The boss fight is difficult and I think this is how it will remain.

The title screen now plays the music played during the "how the story ends" scroll at the end, in the original game.

Added a few patches to silence all music but the title music until game starts.

Title screen and hero selection screen now play music as I've intended.

Disabled test and service mode by hardcoding what is read from the DIP switches. The reason for this is to bring the game closer to a single player game.

Implemented a few fun changes to gameplay: fast attacks, slow attacks, reverse controls.

Hardcoded the DIPs to the hardest difficulty (enemy attack frequency and enemy attack power). I will balance each floor around this.

I believe I have implemented all of the code and logic changes I have been planning. From here on, most of the work will be towards floor design.

Fully scripted the introduction stage.

Fully scripted the first floor, where Merlin explains.

Completed up to floor 12 today. It's very enjoyable, and I came up with a few new ideas, including a very challenging floor with six tall men and spiked balls.

Fixed a bug whereby BCD values past 15 were not computed properly.

Completed up to floor 18 today. Some are fun and light-hearted, some are very difficult.

Found many more interesting entities to use, such as broken barrels and food/treasure that can be placed directly on the ground (and not in a container).

Completed up to floor 21 today. All special floors are done (controls swapped, fast attack, slow attack).

Replaced my previous approach of limiting magic attack with a more reliable one. Player now gets one magic attack per level.

Whether the magic attack is available is now clearly displayed at the top of the screen, in large font text.

Magic attack is also initiated from a totally different spot, which handles the case when fire+jump are not pressed right at the same time, but fire first, then a delay, then jump. I think this is for the magic attack to be more lenient with regard to player input.

I've discovered that each hero has his own, copy-pasted pair of magic attack routines. Lancelot is done, but I still have to do the other two.

Implemented patches to account for each site. Magic attacks (fire+jump) are now limited to one per floor.

Fixed a bug in the player holding routine which momentarily allowed player control during the last video frame of the holding entry.

I made sure the boss is beatable with all 3 heroes. Also toned down his aura by 16%.

I was unable to easily bring the player to stand in front of the boss. The problem is that Perceval walks the fastest, then Arthur, then Lancelot. Without finding a good place to stop them so that all 3 could see the boss, but not be attacked by him, I decided to keep complexity low and stop them before they see him.

Busters made the fast floor way too difficult (I couldn't beat it in 5-6 tries), so I replaced them with different enemies.

Played through and documented the average score gain on each floor.

Fixed an issue on the magicians floor, whereby the broken barrels didn't show up.

Patched the level up table with my own values, calculated based on the average score the player will earn on each floor.

The game is ready for release.