Acmlm's Board - I2 Archive - Rom Hacking - Working on a disassembler for 65816
MathOnNapkins

Posted on 07-04-05 08:14 AM
Now don't get your hopes up: disassemblers were never meant to fully decompose a ROM at the push of a button. Tracers can do that to an extent, but they don't encapsulate routines the way C would. What I'm working on is meant to be a clear and superior alternative to the console apps written for the same purpose: it's intended to take a routine in the file and encapsulate it.

EDIT: for those of you who like pictures: [screenshot not preserved]
It already works quite well, with the following features:

-Linear Tracing (right now it just goes from top to bottom)

-Autogenerates English labels with a variety of preset names, "ALPHA", ..., "MAGUS", etc...

-has a number of failsafes including
1. will alert you if it detects a BRK or COP instruction. These instructions are quite rare, and almost always signal an error on your part in choosing the boundaries for disassembly or the initial settings.

2. does an analysis of branch destinations and will determine if they are misaligned (that's very bad). That means one of your branches goes to something that has been determined not to be an opcode.

3. saves you time by determining whether any branch destinations point to the end of the region you are trying to disassemble (examples of this will be in the help file, with pictures, whenever I get around to making it).

-selectable initial 16-bit accumulator and/or X/Y registers.

-support for 512-byte copier headers

With all the above options it is already better than any existing disassembler I have seen, including the one in Geiger's Debugger. And there are probably other diagnostic messages I can't remember atm.
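For readers curious how the linear pass and the BRK/COP failsafe fit together, here is a minimal C sketch (C because the author says the tool is essentially written in C). It is not the author's code: the names `op_length` and `linear_sweep`, the string-per-row length table, and the simplifications (native mode only, register widths changed only by REP/SEP, BRK/COP counted as 2 bytes) are all assumptions of this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction length of every 65816 opcode, one string per high nybble.
   'M' = immediate operand sized by the M flag, 'X' = sized by the X flag.
   BRK ($00) and COP ($02) are counted as 2 bytes (opcode + signature). */
static const char *const LEN_ROWS[16] = {
    "222222221M113334", "2222222213113334", /* 0x, 1x */
    "324222221M113334", "2222222213113334", /* 2x, 3x */
    "122232221M113334", "2222322213114334", /* 4x, 5x */
    "123222221M113334", "2222222213113334", /* 6x, 7x */
    "223222221M113334", "2222222213113334", /* 8x, 9x */
    "X2X222221M113334", "2222222213113334", /* Ax, Bx */
    "X22222221M113334", "2222222213113334", /* Cx, Dx */
    "X22222221M113334", "2222322213113334", /* Ex, Fx */
};

/* Length in bytes of the instruction that starts with opcode op. */
int op_length(uint8_t op, bool m16, bool x16)
{
    char c = LEN_ROWS[op >> 4][op & 0x0F];
    if (c == 'M') return m16 ? 3 : 2;
    if (c == 'X') return x16 ? 3 : 2;
    return c - '0';
}

/* Top-to-bottom sweep of code[0..len). Marks where each instruction starts,
   follows REP/SEP to keep the M/X widths current, and returns how many
   opcodes flagged in rare[] (e.g. BRK/COP) were encountered. */
int linear_sweep(const uint8_t *code, size_t len, bool m16, bool x16,
                 const bool rare[256], bool *is_start)
{
    int rare_hits = 0;
    for (size_t i = 0; i < len; i++) is_start[i] = false;

    for (size_t pc = 0; pc < len; ) {
        uint8_t op = code[pc];
        is_start[pc] = true;
        if (rare[op]) rare_hits++;
        if ((op == 0xC2 || op == 0xE2) && pc + 1 < len) {
            uint8_t bits = code[pc + 1];
            bool sixteen = (op == 0xC2); /* REP clears bits -> 16-bit; SEP sets -> 8-bit */
            if (bits & 0x20) m16 = sixteen;
            if (bits & 0x10) x16 = sixteen;
        }
        pc += (size_t)op_length(op, m16, x16);
    }
    return rare_hits;
}
```

With `rare[0x00]` and `rare[0x02]` set, a non-zero return means BRK or COP turned up, which per the post usually points at bad boundaries or a wrong initial M/X setting. The `is_start` array is also what a misaligned-branch check needs.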

Here's a list of features I'm currently working on adding and improving, straight out of my .cpp file:

List of things to add:

edit: strikethroughs are things that have been completed.


1. Save user preferences, possibly add the ability to hide the application, and other GUI niceties.
-probably want to use a file to store settings.

2. -Warning on unconditional Program Counter changes without following branch
-Option to include "Alternate Entry Point" labels
-Generally I have decided not to accept the notion of routines with internal data. Some of my manually typed disassembly may allow this, but I no longer do it.


3. Conversion from Linear to Dynamic Tracing - the output buffer and many other things will have to be revamped


4. Calculation of in-ROM addresses for JSR, JSL, and branch operations with no internal labels.
-Note: LoROM, HiROM, and FastROM should be specified as needed


5. Streamline spacing of operations to save time in various things.

6. Jump table generation - with conversion to Rom addresses
-Absolute Linear
-Long Linear
-Absolute Indexed
-Long Indexed

7. Possibly generate (i.e. rip from the ROM) data tables with specified types of spacing

8. Use the RTF API to generate colorized text

9. Reconfigure the maskbits array and create new ones to store branch destination and source information.
-incorporate data concerning register width at each particular opcode.


10. *optional* add support for PHP and PLP and stack balancing diagnostics. Not sure if I need that, and it would probably get pretty complicated with hairier routines.
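Items 4 and 9 both hinge on knowing where each branch lands. A hypothetical C sketch, assuming the region is a byte buffer whose offset 0 is the first byte disassembled and that `is_start[]` (true where an instruction begins) was filled by an earlier sweep: it computes 8-bit (`Bxx`, `BRA`) and 16-bit (`BRL`) relative targets, then reports the two situations the failsafes describe, a target inside the region that is not an instruction start, and a target exactly at the end of the region. Bank wrap-around is ignored, and the function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* If code[pc] is a relative branch, stores its target (as a buffer offset)
   and returns true. Offsets are relative to the following instruction. */
bool branch_target(const uint8_t *code, size_t len, size_t pc, long *target)
{
    uint8_t op = code[pc];
    if ((op & 0x1F) == 0x10 || op == 0x80) {   /* BPL..BEQ ($x0, odd x) and BRA */
        if (pc + 2 > len) return false;
        long rel = code[pc + 1];
        if (rel >= 0x80) rel -= 0x100;          /* sign-extend 8-bit offset */
        *target = (long)pc + 2 + rel;
        return true;
    }
    if (op == 0x82) {                           /* BRL */
        if (pc + 3 > len) return false;
        long rel = code[pc + 1] | (code[pc + 2] << 8);
        if (rel >= 0x8000) rel -= 0x10000;      /* sign-extend 16-bit offset */
        *target = (long)pc + 3 + rel;
        return true;
    }
    return false;
}

/* Returns the number of misaligned branches; *to_end receives the number
   of branches whose target is exactly the end of the region. */
int check_branches(const uint8_t *code, size_t len, const bool *is_start,
                   int *to_end)
{
    int misaligned = 0;
    *to_end = 0;
    for (size_t pc = 0; pc < len; pc++) {
        long t;
        if (!is_start[pc] || !branch_target(code, len, pc, &t)) continue;
        if (t == (long)len) {
            (*to_end)++;
        } else if (t >= 0 && t < (long)len && !is_start[t]) {
            printf("misaligned branch at +%zu -> +%ld\n", pc, t);
            misaligned++;
        }
    }
    return misaligned;
}
```

Targets outside the buffer are silently skipped here; a real tool would turn them into external labels such as the "BRANCH_$XXXXX" form mentioned later in the thread.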


Ok, so I'm looking for suggestions and input, as well as some Beta Testers.


(edited by MathOnNapkins on 07-03-05 11:37 PM)
(edited by MathOnNapkins on 07-03-05 11:58 PM)
(edited by MathOnNapkins on 07-04-05 07:46 AM)
(edited by MathOnNapkins on 07-06-05 06:24 AM)
d4s

Posted on 07-04-05 09:04 PM
looks really sweet!

Don't have too much input for you currently, but here are some tidbits I noticed:

-add STP to the never-used opcodes.

-My personal preference for label naming is not to use random names, but Label + the actual offset in the SNES memory map.
That makes it easier to look stuff up in the ROM if something in the disassembly went wrong.

-HDMA table logging.
Might be impossible without keeping track of the DMA channel registers, now that I think about it.

I especially like the jump table generation part.

PHP/PLP tracking inside the current routine would be nice, yeah.

I'm really looking forward to this.
Keep it up!
MathOnNapkins

Posted on 07-05-05 12:16 AM

> -add STP to the never-used opcodes.


Good suggestion. I'm considering including STP and WDM as other failsafe triggers. I might just collect a grouping of rare opcodes and call them that.


> -my personal preference of labelnaming is not to use any random names, but to use Label+actual offset in the snes' memory map.
> makes it easier to look up stuff in the rom if something with the disassembly went wrong.


Well, right now it can make up to 128 labels, though you'll probably never need more than that. Branches to labels that are not internal to the routine I just write as "BRANCH_$XXXXX", and I use ROM addresses for those. But you're saying you would prefer SNES memory-mapped addresses?


> -hdma table logging.
> might be impossible without keeping track of the dma channel regs now that i think about it.


I'm willing to give anything a shot if it will be useful. If you want to, e-mail me *edit: sorry, my e-mail address is now only listed for logged-in users* with some rough details on how you think I might go about this, or post them here for all I care. I know a reasonable amount about DMA and HDMA, but I'm none too sure what you're referring to. Maybe an example of the type of output you're looking for would help.


> i especially like the jump table generation part.


Me too


> php/plp inside the current routine would be nice, yeah.


Well, the only reason I was considering it was to keep a more accurate eye on the A and X/Y register widths. But I've never seen it used that way. Of course, I wouldn't rule out that someone might have used it for changing register widths.

Okay, now back to work. Thanks for your input.


(edited by MathOnNapkins on 07-04-05 04:38 PM)
Jathys

Posted on 07-05-05 05:13 AM
>Good suggestion. I'm considering including STP and WDM as other failsafe
>triggers. I might just collect a grouping of rare opcodes and call them that.

Maybe you could include the opcodes in a separate file. If the user wants certain opcodes included/excluded, they could adjust the list as needed?
neviksti

Posted on 07-05-05 12:07 PM
Some things I'd love to see in a disassembler:

1] For massive disassembly / reverse-engineering of the code, it would be nice if the disassembler could use the information the user has learned from the code to make sections more readable. For example, as I read through the code I could make a table of variable names, routine names, and even "macros" (some code was obviously generated with macros and it's annoying to see similar chunks again and again ... when the only info you'd really need is just the macro name).

Then when disassembling other sections of the ROM (or even re-disassembling the same sections to make them more readable), the code would be much more readable.

To some extent the variable and routine names thing can often be done manually by find/replace in a text editor. But this gets quite time consuming for each new disassembly as the list gets longer. I am particularly interested in the "macro" feature, as it would make code much easier to read, and currently there is no easy way to do this. I have to do it entirely by hand.
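The find/replace step neviksti describes can be automated over the disassembler's text output. A hypothetical C sketch: `Symbol` pairs an address exactly as it is printed (e.g. `$0020`) with a user-chosen name, and `apply_symbols` substitutes names while refusing to match an address that is merely the prefix of a longer hex number. Macro collapsing is not attempted, and all names here are illustrative.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *addr;  /* non-empty, exactly as printed, e.g. "$0020" */
    const char *name;  /* user-chosen name, e.g. "PLAYER_X" */
} Symbol;

static int is_hex_digit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
}

/* Copies line to out, replacing each known address with its name. A match
   only counts if the next character is not another hex digit, so "$0020"
   does not fire inside "$00201". Output is truncated to fit; cap must be > 0. */
void apply_symbols(const char *line, const Symbol *syms, size_t n,
                   char *out, size_t cap)
{
    size_t used = 0;
    while (*line && used + 1 < cap) {
        const Symbol *hit = NULL;
        for (size_t i = 0; i < n && hit == NULL; i++) {
            size_t k = strlen(syms[i].addr);
            if (strncmp(line, syms[i].addr, k) == 0 && !is_hex_digit(line[k]))
                hit = &syms[i];
        }
        if (hit != NULL) {
            size_t k = strlen(hit->name);
            if (used + k >= cap) break;
            memcpy(out + used, hit->name, k);
            used += k;
            line += strlen(hit->addr);
        } else {
            out[used++] = *line++;
        }
    }
    out[used] = '\0';
}
```

Because the table is applied at output time, adding a newly learned name and re-running the disassembly updates every listing without redoing the edits by hand.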


2] If you are using tracing abilities ... I'd love something that notices DMA loads and marks that section of the rom as data (and maybe even mention where it is loaded to, so we have an idea of what the data is).

-----------
Comments on previous posts:

> Branches to labels that are not internal to the routine I just write as follows "BRANCH_$XXXXX". Now I use Rom-addresses for those. But you're saying you would prefer snes memory mapped addresses?

Yes, I would prefer SNES memory addresses as well. These are the addresses in the actual opcode data (except for relative jumps), and for debugging purposes (putting in a breakpoint) this is the address needed. In short, the system address is usually the relevant address.


> php/plp inside the current routine would be nice, yeah.

> Well the only reason I was considering it was to keep a more accurate eye on the A and X/Y register width. But I've never seen it used that way. Of course, I wouldn't rule out that someone might have used it for changing register widths.


I have seen this used. Not often within a routine, but quite often between routines.

In most coding styles that change the register widths frequently, I see something like this: routines PHP, set the widths they need for that routine, and then PLP at the end of the routine ... for "linear" disassemblers this often messes up the widths for the code after that routine (since they don't do any tracing or handle PHP/PLP).

The worst rom I've ever looked through as far as trying to keep track of the reg widths was the SF7 Bios. The coding style used there had no "standard" width and the routines didn't even always return with the same widths as when they were called. I'm not sure how the programmer kept that all straight in his mind. It was a horror to disassemble.

-------------

There are a couple projects I am working on now that require large disassembly of 65816 code. I can test your disassembler by using it, however I don't really have time to come up with extensive tests specifically to look for bugs.

The code I'm working on understanding right now was actually written in C, so there are all the lovely stack frames, pushed parameters, standard registers to return values in, etc. It takes quite a while to go through since the code is not nice compact asm ... but sprawling "translated to asm" C code. If you could write a crude "decompiler" instead of just a disassembler, that would be incredible. But that's pretty much a different project entirely.

Anyway, good luck on your project. I look forward to seeing how it turns out, and if you want me to help test it, just let me know.


(edited by neviksti on 07-05-05 03:28 AM)
MathOnNapkins

Posted on 07-05-05 02:10 PM
> For massive disassembly / reverse-engineering of the code, it would be nice if the disassembler could use the information the user has learned from the code to make sections more readible. For example, as I read through the code I could make a table of variable names, routine names, and even "macros" (some code was obviously generated with macros and it's annoying to see similar chunks again and again ... when the only info you'd really need is just the macro name).

Heh, I feel your pain like no other. That's certainly something I have thought about, but I'm not sure it will be incorporated into this program. Once I've got the next major set of updates done I'll probably make this open source. But yeah, I hate seeing:

PHDB (push data bank)
PHPB (push program bank)
PLDB (pull data bank) <-- forgive my unorthodox mnemonics. The normal settings use the standard ones. The toggle button in the picture triggers my own personal settings.

80000 times in a row. It gets a little redundant and wastes two extra lines. So yeah, it's an idea, but I'm not sure at this time whether it would go in this project or another one. The only problem is that once you learned more variable names you'd have to redo your disassembly, right? One way around that is to keep a usage file, so to speak, so that it can dump everything into one big file on a whim. Perhaps including a hex edit control to do some things visually would also help...

> I have to do it entirely by hand.
Up until a few weeks ago I was doing all disassembly by hand.

> 2] If you are using tracing abilities ... I'd love something that notices DMA loads and marks that section of the rom as data (and maybe even mention where it is loaded to, so we have an idea of what the data is).

As I said to d4s I'd be happy to implement something that advanced users would appreciate. But, I would need to have some idea how to implement it. Please e-mail me. Currently it does a minimal trace, but it has been upgraded to a nonlinear trace. Meaning, it passes through the region you want disassembled several times, enough to gather all the data it needs. It currently only emulates the M and X flags in the P register. It doesn't load values into an accumulator or store things in memory. Please e-mail me or PM me to give details b/c I want to know how feasible these things are.


> Yes, I would prefer snes memory addresses as well. These are the addresses in the actual opcode data (except for relative jumps), and for debugging purposes (putting in a breakpoint) this is the address needed. In short, the system address is usually the relavent address.


I see your point of view. And I will include it as a toggled setting somewhere. I prefer to have the rom address b/c it helps me hunt down code that I'm missing. I basically want something I can punch into a hex editor and hit goto.
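A toggle between ROM offsets and SNES addresses mostly comes down to two small conversions. A simplified C sketch for LoROM and HiROM with an optional 512-byte copier header; the function and enum names are hypothetical, WRAM/SRAM banks and larger maps such as ExHiROM are deliberately ignored, and `fastrom` only chooses whether LoROM addresses are printed in banks $80+ or $00+.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MAP_LOROM, MAP_HIROM } MapMode;

/* SNES 24-bit address -> file offset, or -1 if it is not a ROM address in
   this simplified model. header is 512 for files with a copier header, else 0. */
long snes_to_file(uint32_t snes, MapMode map, long header)
{
    uint32_t bank = (snes >> 16) & 0xFF;
    uint32_t addr = snes & 0xFFFF;

    if (map == MAP_LOROM) {
        if (addr < 0x8000) return -1;              /* lower half is not ROM */
        return header + (long)(((bank & 0x7F) << 15) | (addr & 0x7FFF));
    }
    /* HiROM: banks $00-$3F/$80-$BF only expose ROM at $8000-$FFFF */
    if ((bank & 0x40) == 0 && addr < 0x8000) return -1;
    return header + (long)(snes & 0x3FFFFF);
}

/* File offset -> SNES address. For LoROM, fastrom picks the $80+ mirror. */
uint32_t file_to_snes(long offset, MapMode map, long header, bool fastrom)
{
    uint32_t off = (uint32_t)(offset - header);

    if (map == MAP_LOROM) {
        uint32_t bank = (off >> 15) | (fastrom ? 0x80u : 0x00u);
        return (bank << 16) | 0x8000u | (off & 0x7FFF);
    }
    return 0xC00000u | (off & 0x3FFFFF);
}
```

For example, in a headered LoROM file, offset $8200 corresponds to $81:8000 with `fastrom` set, which is the kind of number you would punch into a hex editor versus a debugger breakpoint.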


> I have seen this used. Not often within a routine, but quite often between routines.

> The worst rom I've ever looked through as far as trying to keep track of the reg widths was the SF7 Bios. The coding style used there had no "standard" width and the routines didn't even always return with the same widths as when they were called. I'm not sure how the programmer kept that all straight in his mind. It was a horror to disassemble.


SF7 BIOS? You're referring to the Game Doctor? Well, I guess that mini stack will probably make it in. But I think I'm only going to track PHP and PLP on that stack. Thing is, I'll probably have to clear that stack on every pass, or else I could get garbled and inaccurate results... perhaps a register width override at a particular location would be more appropriate?
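A sketch of the mini stack idea in C, tracking only PHP and PLP as proposed and cleared at the start of each pass. Assumptions for illustration: the struct and function names are hypothetical, only the M ($20) and X ($10) bits of P are modeled, and a PLP with no matching PHP on the current pass is counted rather than guessed, which is where a per-address register width override would come in.

```c
#include <stdbool.h>
#include <stdint.h>

#define WSTACK_MAX 16

typedef struct {
    bool m16, x16;              /* current accumulator / index widths */
    uint8_t saved[WSTACK_MAX];  /* M/X bits captured by each PHP */
    int depth;                  /* PHPs minus PLPs seen on this pass */
    int unknown_plp;            /* PLPs with no matching PHP on this pass */
} WidthState;

/* Call at the start of every pass so pushes never leak between passes. */
void width_reset(WidthState *s, bool m16, bool x16)
{
    s->m16 = m16;
    s->x16 = x16;
    s->depth = 0;
    s->unknown_plp = 0;
}

/* Apply one instruction; imm is the operand byte (only used by REP/SEP). */
void width_step(WidthState *s, uint8_t op, uint8_t imm)
{
    switch (op) {
    case 0xC2: /* REP: cleared flag bits mean 16-bit */
    case 0xE2: /* SEP: set flag bits mean 8-bit */
        if (imm & 0x20) s->m16 = (op == 0xC2);
        if (imm & 0x10) s->x16 = (op == 0xC2);
        break;
    case 0x08: /* PHP: remember the current widths as P-register bits */
        if (s->depth < WSTACK_MAX)
            s->saved[s->depth] = (uint8_t)((s->m16 ? 0 : 0x20) | (s->x16 ? 0 : 0x10));
        s->depth++;
        break;
    case 0x28: /* PLP: restore the widths saved by the matching PHP */
        if (s->depth == 0) {
            s->unknown_plp++; /* caller's widths unknown: an override would apply here */
            break;
        }
        s->depth--;
        if (s->depth < WSTACK_MAX) {
            s->m16 = !(s->saved[s->depth] & 0x20);
            s->x16 = !(s->saved[s->depth] & 0x10);
        }
        break;
    default:
        break;
    }
}
```

This covers the common PHP / REP / ... / PLP pattern neviksti describes; routines that return with different widths than they were called with, as in the SF7 BIOS, would still need manual overrides.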


> There are a couple projects I am working on now that require large disassembly of 65816 code. I can test your disassembler by using it, however I don't really have time to come up with extensive tests specifically to look for bugs.


I do a lot of testing myself. And when I think I've found a stable version I try it out on my own disassembly and I generally trust it.

> If you could write a crude "decompiler" instead of just a disassembler, that would be incredible. But that's pretty much a different project entirely.

If you mean something that converts the machine code to something C-ish, well... like I said it's going to be open source. I don't believe I know enough about how C compiles to write a decompiler. I imagine using such a program would require a lot of tweaking even by the user... in the short run anyway.


> Anyway, good luck on your project. I look forward to seeing how it turns out, and if you want me to help test it, just let me know.


Thanks. I imagine you are referring to the X-band project for the most part, and I wish you luck there as well.


(edited by MathOnNapkins on 07-05-05 05:11 AM)
HyperLamer

Posted on 07-06-05 02:22 AM
This is really neat. Regarding the 'rare' opcodes, I like the idea of listing them in a file. Wouldn't a game with an expansion chip actually use the COP instruction?

One nice feature that I implemented in a really slow, unfinished VB Gameboy disassembler was formatted output. You could define exactly how you wanted the output, and it basically just did a search-and-replace to output things in that format. For example, the string "b:a on" would produce "01:4000 NOP" followed by a line break (b = bank, a = address, o = opcode, n = line break), and you could customize the format however you wanted.
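HyperLamer's format strings can be expanded with a simple character-by-character substitution. A C sketch, assuming single-letter tokens taken from his example (`b` bank, `a` address, `o` opcode, `n` line break) with every other character copied literally; the function name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Expands fmt into out: 'b' -> bank, 'a' -> address, 'o' -> mnemonic,
   'n' -> line break; any other character is copied as-is.
   Output is truncated (never overflowed) if cap is too small; cap must be > 0. */
void format_line(char *out, size_t cap, const char *fmt,
                 uint8_t bank, uint16_t addr, const char *mnemonic)
{
    size_t used = 0;
    out[0] = '\0';
    for (; *fmt; fmt++) {
        char piece[32];
        switch (*fmt) {
        case 'b': snprintf(piece, sizeof piece, "%02X", (unsigned)bank); break;
        case 'a': snprintf(piece, sizeof piece, "%04X", (unsigned)addr); break;
        case 'o': snprintf(piece, sizeof piece, "%s", mnemonic); break;
        case 'n': snprintf(piece, sizeof piece, "\n"); break;
        default:  piece[0] = *fmt; piece[1] = '\0'; break;
        }
        size_t k = strlen(piece);
        if (used + k >= cap) break;
        memcpy(out + used, piece, k + 1);   /* copies the terminator too */
        used += k;
    }
}
```

Calling it with `"b:a on"`, bank $01, address $4000 and `"NOP"` yields `01:4000 NOP` plus a newline. A single-letter scheme means those letters cannot appear literally in the format; an escape character would be the obvious extension.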
neviksti

Posted on 07-06-05 11:55 AM
Originally posted by MathOnNapkins
> 2] If you are using tracing abilities ... I'd love something that notices DMA loads and marks that section of the rom as data (and maybe even mention where it is loaded to, so we have an idea of what the data is).
>
> As I said to d4s I'd be happy to implement something that advanced users would appreciate. But, I would need to have some idea how to implement it.
> ...
> It doesn't load values into an accumulator or store things in memory. Please e-mail me or PM me to give details b/c I want to know how feasible these things are.
Well, you'd have to keep track of values stored to memory.
For the regular DMA, you need to keep track of data written to the $43xx registers. Then when $420B is written, just grab the source/destination addresses as well as length info of the DMA from your saved $43xx registers.

This information can be used to mark portions of the ROM as data, as well as allowing the user to know where the data is sent (what it is used for).
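neviksti's approach for regular DMA can be sketched as a shadow copy of the $43xx registers that is consulted when $420B is written. A C sketch, assuming the standard register layout ($43x1 = B-bus address low byte, $43x2-$43x4 = A-bus source address, $43x5-$43x6 = byte count where 0 means 65536) and that the tracer reports each stored byte separately, so a 16-bit store becomes two calls; the type and function names are hypothetical, and HDMA ($420C) is not handled.

```c
#include <stdint.h>

/* Last values written to $4300-$437F: 8 channels x 16 registers. */
typedef struct { uint8_t regs[8][16]; } DmaShadow;

typedef struct {
    int channel;
    uint32_t source;  /* A-bus address from $43x4 (bank), $43x3, $43x2 */
    uint16_t dest;    /* B-bus register: $2100 + $43x1 */
    uint32_t length;  /* $43x6:$43x5, where 0 means 65536 bytes */
} DmaTransfer;

/* Feed every byte store the tracer sees. Returns how many transfers were
   placed in out[], which is non-zero only when the store hits $420B and
   enables at least one channel. */
int dma_observe_write(DmaShadow *s, uint16_t addr, uint8_t value, DmaTransfer out[8])
{
    if (addr >= 0x4300 && addr <= 0x437F) {
        s->regs[(addr >> 4) & 7][addr & 0x0F] = value;
        return 0;
    }
    if (addr != 0x420B) return 0;

    int n = 0;
    for (int ch = 0; ch < 8; ch++) {
        if (!(value & (1u << ch))) continue;
        const uint8_t *r = s->regs[ch];
        uint32_t len = (uint32_t)r[5] | ((uint32_t)r[6] << 8);
        out[n].channel = ch;
        out[n].source = ((uint32_t)r[4] << 16) | ((uint32_t)r[3] << 8) | r[2];
        out[n].dest = (uint16_t)(0x2100 + r[1]);
        out[n].length = len ? len : 0x10000;
        n++;
    }
    return n;
}
```

Each reported `source` and `length` pair is exactly the span of ROM that could be marked as data, and `dest` (e.g. $2118 for VRAM) hints at what the data is.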


> I do a lot of testing myself. And when I think I've found a stable version I try it out on my own disassembly and I generally trust it.

Okay. I was just offering to test it since you asked for beta testers and input. Sounds like you got it handled already.

---------------
EDIT:

HyperHacker:
> Wouldn't a game with an expansion chip actually use the COP instruction?

That's not really how expansion chips are communicated with.
The real point is, that some programs may use BRK, COP, or STP. But they should be used rarely. If programs use them regularly, then a feature to select which opcodes to consider "rare" would indeed be nice. (If I remember correctly, LordTech's SNES C compiler used the COP instruction a lot due to his implementation choices.)


(edited by neviksti on 07-06-05 03:06 AM)
MathOnNapkins

Posted on 07-06-05 01:01 PM
Well... keeping track of values stored in memory would certainly require full or nearly full emulation. I'm not sure I would add that at this time. I am mainly writing this program for my own use, but am releasing it to the public b/c others would probably find it useful. But let me first ask: is this feature not implemented in the usage files of Geiger's debugger? I.e. if a DMA transfer is initiated, is that area of the ROM not marked as such? I think it's a task more appropriate for a full-fledged emu or tracer, since you could actually be doing DMA transfers on several different areas. For example, I've seen stuff almost exactly like this before:

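LDA #$80        ; entry point 1
BRA dma_transfer
LDA #$7F        ; entry point 2
BRA dma_transfer
LDA #$00        ; entry point 3
BRA dma_transfer
LDA #$01        ; entry point 4 (falls through)
dma_transfer:
; ... rest of the routine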

Now since there are obviously many different entry points into this routine, you will get varying results... and quite frankly I'm not really sure how to deal with that with the code I have. And I don't even have an emu set up yet. I know I can write one, b/c I have one written in Java on my webpage. So time is an issue there, but even with an emu set up, an algorithm for detecting these alternate results would be troublesome, wouldn't it? And that would potentially necessitate many more initial settings, such as the values of all registers and the state of the stack prior to the routine. It seems way too much for such a simple project as this. But feel free to enlighten me if I'm mistaken.

***

As for the "rare" opcodes thing. I think I will do the following... have settings for each opcode individually. And possibly allow the user to pick from some user defined sets that are swappable and can be saved for later use. Often if you are in a particular region of a rom some opcodes are rare while others are more frequent... I'm picturing a 16 by 16 grid of checkboxes or something... suggestions on this would be nice. Certainly if you are supposed to be in 16-bit mode, the opcodes ranging from 00-0F are more suspect than others. This includes COP and BRK. I don't think customizability will be a problem though...
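One way to back both Jathys's separate-file idea and the 16 by 16 checkbox grid is a 256-entry table loaded from a list of hex opcodes. A hypothetical C sketch: `parse_rare_set` reads text such as `00 02 42 DB` (BRK, COP, WDM, STP) from a string, so the same code works on a line read from a settings file, and `print_rare_grid` shows the set laid out by high and low nybble. Swappable user sets would simply be several such tables.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Parses hex opcodes separated by whitespace or commas into rare[].
   Returns the number of distinct opcodes marked, or -1 on bad input. */
int parse_rare_set(const char *text, bool rare[256])
{
    int count = 0;
    for (int i = 0; i < 256; i++) rare[i] = false;
    while (*text) {
        if (*text == ',' || isspace((unsigned char)*text)) {
            text++;
            continue;
        }
        char *end;
        long v = strtol(text, &end, 16);
        if (end == text || v < 0 || v > 0xFF) return -1;
        if (!rare[v]) {
            rare[v] = true;
            count++;
        }
        text = end;
    }
    return count;
}

/* Prints the set as a 16x16 grid: rows = high nybble, columns = low nybble. */
void print_rare_grid(const bool rare[256])
{
    printf("   0123456789ABCDEF\n");
    for (int hi = 0; hi < 16; hi++) {
        printf("%X_ ", hi);
        for (int lo = 0; lo < 16; lo++)
            putchar(rare[hi * 16 + lo] ? 'X' : '.');
        putchar('\n');
    }
}
```

The same grid layout maps directly onto the proposed checkbox GUI, one checkbox per cell.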

***

As an aside, how are custom chips such as the Super FX communicated with? I had assumed that they might have used the COP, but I figured it was possible they might communicate with other special memory regions, like in the $2000 area of bank 0.
d4s

Posted on 07-06-05 01:32 PM
Originally posted by MathOnNapkins

> As an aside, how are custom chips such as the Super FX communicated with? I had assumed that they might have used the COP, but I figured it was possible they might communicate with other special memory regions, like in the $2000 area of bank 0.


The Super FX registers are mapped to the $3000-$32FF area of banks $00-$3F and $80-$BF.

Once the Super FX is initialized and enabled, communication is usually accomplished by accessing the Super FX's work RAM at bank $70 onwards. There's an 8 KB mirror of that work RAM in banks $00-$3F/$80-$BF, from $6000 onwards in each bank.

The SA-1 works similarly, although its registers and RAM aren't mapped to exactly the same areas.
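Based only on the layout d4s gives (registers at $3000-$32FF and an 8 KB RAM mirror at $6000 in banks $00-$3F/$80-$BF, RAM proper from bank $70), a disassembler could tag operands that touch the Super FX. A C sketch; the $70-$71 upper bound for the RAM banks and the enum and function names are assumptions for illustration, and SA-1 mappings are not covered.

```c
#include <stdint.h>

typedef enum {
    AREA_OTHER,
    AREA_GSU_REGS,        /* $3000-$32FF in banks $00-$3F / $80-$BF */
    AREA_GSU_RAM_MIRROR,  /* $6000-$7FFF (8 KB) in banks $00-$3F / $80-$BF */
    AREA_GSU_RAM          /* banks $70-$71 (assumed upper bound) */
} GsuArea;

/* Classifies a 24-bit SNES address against the Super FX layout. */
GsuArea classify_gsu(uint32_t snes)
{
    uint8_t bank = (uint8_t)(snes >> 16);
    uint16_t addr = (uint16_t)(snes & 0xFFFF);
    int system_bank = bank <= 0x3F || (bank >= 0x80 && bank <= 0xBF);

    if (system_bank && addr >= 0x3000 && addr <= 0x32FF) return AREA_GSU_REGS;
    if (system_bank && addr >= 0x6000 && addr <= 0x7FFF) return AREA_GSU_RAM_MIRROR;
    if (bank == 0x70 || bank == 0x71) return AREA_GSU_RAM;
    return AREA_OTHER;
}
```

A disassembler could call this on every absolute or long operand and append a comment such as "; Super FX register" to the line.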



(edited by d4s on 07-06-05 04:35 AM)
MathOnNapkins

Posted on 07-06-05 03:21 PM
ah... so that's why games like Star Fox and Doom have those seemingly huge .srm files tacked on. How do those games handle battery backup? As in, what bank do they write to instead? This is completely off topic, I realize, but it's my thread, goddammit.

As for the DMA/HDMA table generation* and data marking (to neviksti and d4s), I would suggest bugging Geiger about adding those features to his usage files. In the future I may be able to import such usage files for my own usage - *sigh* almost sounds like a pun. One thing I'd like to see in his debugger is breakpoints on writes of specific values, rather than the general case of noticing writes to a memory location (e.g. search for writes to $1E00 of value #$5555...). So... I'm ruling out those features for the short run. The emulation is just too incomplete, and an algorithm to detect such things with what I have seems hard, if not impossible, to come up with for the general case.

*d4s: I'm still not sure what kind of table you'd want generated. If you can provide a simple mockup, either a graphic or text based, I could determine its feasibility. Like if you're looking to create a running list of addresses where DMA or HDMA transfers take place, I don't suppose that would be hard. But I still need more info.

-----

But I will allow for optional rom address / snes address labels b/c we clearly have different methods, and it would be a cinch to work in.

------

As for more suggestions, they are always welcome, but at this point I'm going to start going back to coding a bit more intensively and try to finish things up, then get a release going.

HH: customizable output will probably be included to an extent. To what degree, we will have to debate later on. I mean, I think most people would like the automatic placement of internal labels (with or without addresses), but maybe some people don't like that, for whatever reason? Who knows why they wouldn't, but... who knows. On the other hand, this isn't like a tracer that tells you a lot of information you might not even care about, like the P register, X, Y, A, etc.


(edited by MathOnNapkins on 07-06-05 06:22 AM)
d4s

Posted on 07-06-05 05:12 PM
Originally posted by MathOnNapkins
> ah... so that's why games like Starfox and Doom have those seemingly huge .srm files tacked on. How do those games handle battery backup? As in, what bank do they write to instead? This is completely off topic, I realize, but it's my thread goddamit .


IIRC it's bank $78 onwards.
I've never worked on anything Super FX battery-backup related.
Super FX games come with either 256 or 512 kbit of RAM onboard.
IIRC, 1 Mbit max.
The work RAM and battery-backup RAM are shared, meaning they're on the same chip.

Originally posted by MathOnNapkins

> *d4s: I'm still not sure what kind of table you'd want generated. If you can provide a simple mockup, either a graphic or text based, I could determine its feasability. Like if you're looking to create a running list of addresses where dma or hdma transfers take place, I don't suppose that would be hard. But I still need more info.



I was thinking about the possibility of "extracting" HDMA tables.
I frequently rip HDMA tables for all kinds of stuff from games because I'm too lazy to create my own, or because I lack tools to create some for certain effects.
When indirect HDMA addressing is used, it's sometimes very annoying to look up every value if your table has 200 entries or so.
I don't like indirect HDMA, so I usually convert these to "normal" HDMA tables.
However, I don't think such a feature is feasible for a static disassembler, and it's not even very useful when implemented in an emulator.

That was just a quick idea that came to my mind, never mind.
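The indirect-to-"normal" conversion d4s does by hand can be sketched in C, assuming the standard HDMA table format: a line-count byte (bit 7 = repeat mode, $00 = end of table), followed in indirect mode by a 16-bit pointer into the bank held in $43x7, with transfer unit sizes of 1, 2, 2, 4, 4, 4, 2, 4 bytes for DMAP modes 0-7. The function name is hypothetical, and pointers that run past the end of the bank are treated as errors rather than wrapped.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per HDMA transfer for DMAP transfer modes 0-7 ($43x0 bits 0-2). */
static const int HDMA_UNIT[8] = {1, 2, 2, 4, 4, 4, 2, 4};

/* Converts an indirect HDMA table into a direct one.
   table:     line-count byte + 16-bit pointer entries, ending with $00
   data_bank: the 64 KB bank selected by $43x7 that the pointers refer to
   Returns the number of bytes written to out, or -1 on overflow/bad pointer. */
long hdma_indirect_to_direct(const uint8_t *table, size_t table_len,
                             const uint8_t *data_bank, int mode,
                             uint8_t *out, size_t cap)
{
    size_t unit = (size_t)HDMA_UNIT[mode & 7];
    size_t i = 0, o = 0;

    while (i < table_len && table[i] != 0x00) {
        if (i + 3 > table_len) return -1;
        uint8_t count = table[i];
        size_t ptr = (size_t)(table[i + 1] | (table[i + 2] << 8));
        i += 3;

        /* Repeat mode (bit 7) fetches fresh data on every line; $80 = 128 lines.
           Non-repeat mode fetches one unit and holds it for the whole block. */
        size_t lines = 1;
        if (count & 0x80) lines = (count & 0x7F) ? (size_t)(count & 0x7F) : 128;
        size_t bytes = lines * unit;

        if (ptr + bytes > 0x10000 || o + 1 + bytes > cap) return -1;
        out[o++] = count;               /* the line-count byte is unchanged */
        for (size_t k = 0; k < bytes; k++)
            out[o++] = data_bank[ptr + k];
    }
    if (o + 1 > cap) return -1;
    out[o++] = 0x00;                    /* end-of-table marker */
    return (long)o;
}
```

The caller still has to locate the table and know the channel's mode and indirect bank, which is the part d4s notes a static disassembler cannot easily determine.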
MathOnNapkins

Posted on 07-14-05 03:28 PM
Okay guyz... bumpage before the release. I expect to have it shipping by tomorrow. I've met some delays over the past few days since I forgot to add a few things. Currently the only thing that still needs to be added is the ability to select which opcodes are rare and which are not. But that's being taken care of and I'm almost there. Nearly everything on my original list and most of the more doable suggestions have been added. The help file is nearly complete.

The release is set to be open beta b/c I don't want to bog anyone down with the responsibility of being a beta tester. As with any program there may be bugs, but I try to squish them as much as possible. Alert me if you can verify one.

I'm also going to release the sources (for Visual C++ 6.0), but it's really written in C; there's hardly any C++.

Things that didn't make it in:

PHP, PLP, and general stack tracking. I concluded it was probably more work than it was worth. There are no current plans to add it. If you desperately need this, contact me.

data table generation. However, JUMP TABLE GENERATION is here and it's a nice-uh. Choose from 4 slick jump table formats.

HDMA/DMA stuff. Probably better suited for an advanced debugger and tracer. *shrug*

C source code recovery - no clue how to do this

insertion of labels for known variables - might be done in the future, probably by means of a listing of #define declarations in a file you'll have to put in the same directory as the ROM, with the same name but a different extension. Interesting idea, but the motivation isn't there at this time.

other stuff, like Euclid requesting reassembly capability. not in the foreseeable future will this happen

On Zophar's MDomain, someone said I should detect HiROM and headers. Didn't do it, sorry. You have no business disassembling a ROM if you don't already know these crucial details.


AcmlmBoard vl.ol (11-01-05)
© 2000-2005 Acmlm, Emuz, et al


