Acmlm's Board - I2 Archive

Acmlm's Board - I2 Archive - Rom Hacking - Question about 65816 structure

DarkPhoenix

Octorok
Level: 9

Posts: 9/31
EXP: 2669
For next: 493

Since: 03-08-05

Since last post: 23 days
Last activity: 16 hours

Posted on 04-27-05 09:15 PM

Any of you with more experience in 65816 than I might be able to answer this...

Note: I'm running on the pretty safe assumption that game binaries are originally written in a high level language, and then compiled into 65816.

When you compile a function call in a high level language into x86, there are always certain registers pushed onto the stack, among other things, during a function call. (though of course they could be different depending on the compiler). Because these are generated by the compiler, if you can identify that series of statements, I'd imagine you could more easily identify the beginning of a function.

Well, my question is, has anyone seen anything like this (particularly things like a group of registers that are always pushed) in SNES binaries that would make the beginning of a new function more easily identifiable, or would I have to trace all of the branch and jump statements manually if I wanted to try to get an idea of the general layout of the code?

Or, perhaps an even more important question...does anyone know of a tracer/disassembler for SNES that disassembles correctly and would let me output to a file? All the ones I've tried don't seem to properly follow the size of the registers, or have some other weird problem (like Tracer, which seems to interpret opcode 00 as BRK, rather than NOP, which throws off the rest of the disassembly)

blackhole89

LOLSEALS
Moderator of ROM hacking
EmuNET IRC network admin
Head GM of TwilightRO
Level: 47

Posts: 680/971
EXP: 739208
For next: 26995

Since: 03-15-04
From: Dresden/Germany

Since last post: 14 hours
Last activity: 12 hours

Posted on 04-27-05 09:23 PM

Opcode 00 is BRK, not NOP. Executing it is comparable to cli; hlt in x86 assembler because, in practice, it immediately crashes the SNES.

A disassembler called D816 works quite well for me, although I use SPASM for the most part.

MathOnNapkins

Math n' Hacks
Level: 67

Posts: 1780/2189
EXP: 2495887
For next: 96985

Since: 03-18-04
From: Base Tourian

Since last post: 1 hour
Last activity: 32 min.

Posted on 04-27-05 09:36 PM

This depends on what kind of function call you mean. If you mean an interrupt (IRQ) or a non-maskable interrupt (NMI), then yes. During those, all registers typically end up on the stack: the CPU pushes the return address and status register itself, and the handler usually pushes the rest. But those happen relatively rarely compared to the rest of code execution. A JSR or JSL, for example, only pushes its return address to the stack (Program Counter, plus Program Bank if Long). I mean, it's possible that some companies implemented their functions this way, but given how many subroutines there are, and the limited space of the ROM files, that would get expensive in space reallllllllly quickly.

So the answer is no, b/c I've never seen that in any code I've looked at. Geiger's Debugger is an option for tracing, but if that's all you want to do I would recommend his older build of snes9x, since that is mainly all it does and is easier to use for that purpose. I have not at this point tried out the bug fixes for the tracing in the Debugger, but will soon. If you need Peer (Geiger's) Tracer I can get it to you.

DarkPhoenix

Octorok
Level: 9

Posts: 10/31
EXP: 2669
For next: 493

Since: 03-08-05

Since last post: 23 days
Last activity: 16 hours

Posted on 04-27-05 10:02 PM

Thanks for the correction...looked up the opcodes, and yeah, it is BRK...I think maybe those 00's were being read as instructions because I hadn't set it to follow the accumulator size properly...my bad.

As far as the JSR/JSL, I'd imagine that the address is pushed before the jump instruction...well, of course, before the instruction pointer is updated...which isn't as helpful, since then I might as well just go off the jump statement itself.

As for disassemblers, I think I'll give Tracer another swing, since it seems I was just misinterpreting the problem. Should be able to fix it with a switch, I hope.

Thanks again

Parasyte

Bullet Bill
Level: 35

Posts: 474/514
EXP: 267348
For next: 12588

Since: 05-25-04

Since last post: 104 days
Last activity: 32 days

Posted on 04-28-05 06:39 AM

The JSR/JSL instruction is what does the pushing. The flow of these instructions works something like this:
1) Push return_address - 1
2) Jump to subroutine

The "return_address - 1" is just how the CPU works. When the RTS/RTL is executed to return, the flow goes something like this:
1) Pop return_address
2) Jump to return_address + 1
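
The two flows above can be modelled in a few lines. This is only a toy sketch: the `Cpu` class, its list-based stack and the example addresses are made up for illustration, and it assumes a 3-byte JSR with a 16-bit target.

```python
# Toy model of JSR/RTS return-address handling (not a real emulator core).
class Cpu:
    def __init__(self, pc=0x8000):
        self.pc = pc       # address of the instruction about to execute
        self.stack = []    # simplification: a list of bytes instead of RAM + S

    def push16(self, value):
        self.stack.append((value >> 8) & 0xFF)  # high byte is pushed first
        self.stack.append(value & 0xFF)

    def pull16(self):
        lo = self.stack.pop()
        hi = self.stack.pop()
        return (hi << 8) | lo

    def jsr(self, target):
        # JSR is opcode + 16-bit address, so the next instruction is at pc + 3.
        return_address = self.pc + 3
        self.push16((return_address - 1) & 0xFFFF)  # 1) push return_address - 1
        self.pc = target                            # 2) jump to subroutine

    def rts(self):
        # 1) pop the stored value, 2) jump to it + 1
        self.pc = (self.pull16() + 1) & 0xFFFF


cpu = Cpu(pc=0x8000)
cpu.jsr(0x9000)
pc_after_call = cpu.pc
pushed = (cpu.stack[0] << 8) | cpu.stack[1]
cpu.rts()
pc_after_return = cpu.pc
```

With a JSR at $8000, the value left on the stack is $8002 (the last byte of the JSR itself), and RTS resumes at $8003.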

To pass arguments to subroutines, the registers are simply loaded with the arg values. When the regs are not enough to hold all args, the zero page (or direct page, for you 65816 buffs) is used. Args are almost never pushed to the stack on 65816, mainly because it's stupid, and Intel should die for coming up with such stupidness.

Now uhh, I haven't any idea what you are attempting to accomplish, but it may help to remind you that the beginning of a subroutine is referenced directly by the calls (JSR/JSL/JMP/JML) to it. Quite often, you can also find the start of a subroutine simply by locating the first RTS/RTL/JMP/JML above the routine you're working with. So long as no preceding branches skip over the 'exit instruction' you find, it's quite safe to assume that the instruction following that 'exit instruction' is the beginning of your subroutine. Sometimes you may find some data poked between subroutines, but of course, it's almost always a trivial process (for a human) to decide what's code and what's data.
Now then, it's also interesting to note that most SNES games are, indeed, written in some mid-level language at the least. Probably somewhere between assembly and C.
For example, Virtual Boy development (which was started a good 5 years after the SNES' release) was done almost exclusively in assembly, but the assembler had an advanced macro structure to help ease the writing of code. Some particularly worthy examples of the macro structure included IF-THEN-ELSE statements rather than compare-branch-compare-branch, and several macros to deal with subroutine args and the like. This, in effect, is the "mid-level" language I was referring to.

DarkPhoenix

Octorok
Level: 9

Posts: 13/31
EXP: 2669
For next: 493

Since: 03-08-05

Since last post: 23 days
Last activity: 16 hours

Posted on 04-28-05 08:08 AM

I like the Intel stack frames

...but that's probably more because I haven't worked with anything else...[EBP]+8 and all that... If it isn't clear enough yet, I'm a touch new to disassembling stuff. What I was trying to accomplish is just to get a general overview of the code. It would be a bit time consuming, to say the least, to go through the entire disassembly...with Eclipse, (www.Eclipse.org ...not that I have too much experience with it) you can generate a diagram of a C program in terms of which functions call each other...like a UML sequence diagram. I'm wondering if there would be a quick way to get something like that, of course with offsets instead of function names, so that when I go in and look at actual functions, I have a diagram of where exactly they stand in the program, and can eventually map out how the whole program works. Might just be the wrong way of looking at the problem, but I figured it makes dissecting C source code easier, so it might make the binary not so much a mess. In short, I'm trying to objectify it into a sequence of smaller bits of code.
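
A rough sketch of what such a diagram generator could look like, assuming you already have a list of (caller offset, callee offset) pairs from a disassembly or trace. The function names, the `sub_XXXX` labels and the sample offsets are all invented for illustration; the output is Graphviz DOT text.

```python
from collections import defaultdict


def build_call_graph(edges):
    """edges: iterable of (caller_offset, callee_offset) pairs."""
    callees = defaultdict(set)   # who does this routine call?
    callers = defaultdict(set)   # who calls this routine?
    for src, dst in edges:
        callees[src].add(dst)
        callers[dst].add(src)
    return callees, callers


def to_dot(callees):
    """Render the graph as Graphviz DOT, labelling routines by offset."""
    lines = ["digraph calls {"]
    for src in sorted(callees):
        for dst in sorted(callees[src]):
            lines.append(f'  "sub_{src:04X}" -> "sub_{dst:04X}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical call edges, e.g. collected from JSR operands.
edges = [(0x8000, 0x8100), (0x8000, 0x8200), (0x8100, 0x8200)]
callees, callers = build_call_graph(edges)
dot_text = to_dot(callees)
```

The `callers` map is the "which routines lead here" direction, which is the part that helps when something is not in the routine you expected.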

Parasyte

Bullet Bill
Level: 35

Posts: 476/514
EXP: 267348
For next: 12588

Since: 05-25-04

Since last post: 104 days
Last activity: 32 days

Posted on 04-28-05 02:45 PM

So, you're talking about something like WinGraph32? I've never actually used tools like this. And they may have some unique purposes, I'm sure. I just haven't found any practical use, yet. In all honesty, these types of things don't tell you a whole lot about the program. It just shows you which routines call which other routines. Sounds more like a gimmick than a useful tool!
Earlier today, I was interested in writing a multi-pass disassembler for 65816. So I may spend a few minutes working on that tomorrow. The ultimate goal, of course, would be to intelligently separate code from data to get the end result as close to the original source code as [automatically] possible. It would take Tracer's SEP/REP tracing idea one step further, and put it somewhere between that and bbitmaster's CDL feature in FCEUXD. Something similar to Snowbro's REV. (I believe that's the name. It's a 6502/NES disassembler which uses the same idea.)

(edited by Parasyte on 04-27-05 09:45 PM)

DarkPhoenix

Octorok
Level: 9

Posts: 15/31
EXP: 2669
For next: 493

Since: 03-08-05

Since last post: 23 days
Last activity: 16 hours

Posted on 04-28-05 06:17 PM

Those mapouts have their usages, particularly when browsing through a lot of unfamiliar sourcecode. With this, I figured if I could get a mapout, it'd just make it easier to go through and mark off what each function does, as I dissect the code, so I know where to find things later...Just a little more visual than throwing a ton of offsets into a txt file or a spreadsheet. Plus a little more helpful if I'm looking for something, and don't find it in the function I expect it to be in - I could trace my way back up the tree. Might save me some time...Mostly I'm operating on the idea that it might be easier to get through the code if it more closely resembled something of a higher level, so I figured breaking it all up into functions would be the first step to that end. Also, more along the same lines as the program you're working on, if you know the beginning and end of each subroutine, then you have a pretty good idea of where the code is and where the data is, at least for the large chunks of each. Again, my overall idea here was a more abstract interpretation of the file (I've been having mixed success with the usage maps in Geiger's debugger). I get the idea you think it's kind of a waste of time, though, so perhaps my time would be better spent just tearing into the code

Parasyte

Bullet Bill
Level: 35

Posts: 478/514
EXP: 267348
For next: 12588

Since: 05-25-04

Since last post: 104 days
Last activity: 32 days

Posted on 04-29-05 07:15 AM

In my quite honest opinion, that level of reverse engineering is for doing things like source code recovery. You won't find too many situations in ROM hacking that require that amount of work. For most things, a single breakpoint will suffice.
If you're interested in rewriting large portions of the game, such tools may have some interesting uses.

MathOnNapkins

Math n' Hacks
Level: 67

Posts: 1786/2189
EXP: 2495887
For next: 96985

Since: 03-18-04
From: Base Tourian

Since last post: 1 hour
Last activity: 32 min.

Posted on 04-29-05 07:23 AM

I've had pipedreams about many such projects. But after looking for code hour after hour, day after day, the likelihood of that being a possibility is near nil. The best I can come up with would be an active code manager, that would attempt to manage data blocks, and code at the same time, so you could insert your code, and "recompile" it. However, that would rely on your real time emulation knowing almost everything about the game. Otherwise, you might have some unknown code calling the old location of your data blocks, etc...

Doesn't seem like an easy task.

jonwil

Goomba
Level: 9

Posts: 20/25
EXP: 2989
For next: 173

Since: 04-09-04

Since last post: 85 days
Last activity: 81 days

Posted on 05-08-05 05:39 PM

Why not implement a proper tracing disassembler for SNES (much like what IDA PRO does)?

Basically, you start at the entry points (i.e. NMI/VBL/normal entry point).
Then you disassemble instructions, following branches. When you hit an instruction that changes the flags (i.e. SEP or REP), you take that into account and decode accordingly.
Since you are tracing the code, it should be possible to differentiate between code and data easily (since I have yet to see a SNES game that uses the same area as data and code or one that uses the same area of code with different flag settings)
There are a few corner cases that might make things harder e.g. where it does "jump if zero" then "jump if not zero" then follows the "jump if not zero" with data but some logic in the disassembler should be able to identify those.
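
A minimal sketch of that tracing idea, covering only a dozen opcodes. The table layout, the `trace` function and the sample program are made up for illustration. It also assumes the flags at a JSR carry into the callee and are unchanged when the callee returns, which real code does not guarantee.

```python
M_FLAG, X_FLAG = 0x20, 0x10  # status bits: 8-bit accumulator / 8-bit index

# Tiny opcode subset: opcode -> (mnemonic, operand kind)
OPCODES = {
    0xC2: ("REP", "imm8"), 0xE2: ("SEP", "imm8"),
    0xA9: ("LDA", "imm_m"), 0xA2: ("LDX", "imm_x"),
    0x8D: ("STA", "abs"), 0x20: ("JSR", "abs"), 0x4C: ("JMP", "abs"),
    0xD0: ("BNE", "rel8"), 0xF0: ("BEQ", "rel8"), 0x80: ("BRA", "rel8"),
    0x60: ("RTS", "none"), 0xEA: ("NOP", "none"),
}


def operand_size(kind, flags):
    if kind == "none":
        return 0
    if kind in ("imm8", "rel8"):
        return 1
    if kind == "abs":
        return 2
    if kind == "imm_m":  # immediate width follows the accumulator size
        return 1 if flags & M_FLAG else 2
    if kind == "imm_x":  # immediate width follows the index size
        return 1 if flags & X_FLAG else 2
    raise ValueError(kind)


def trace(code, base, entry, flags=M_FLAG | X_FLAG):
    """Return {address: (mnemonic, length)} for every instruction reached."""
    found = {}
    work = [(entry, flags)]
    while work:
        pc, p = work.pop()
        while base <= pc < base + len(code) and pc not in found:
            opcode = code[pc - base]
            if opcode not in OPCODES:
                break  # unknown byte: give up on this path
            name, kind = OPCODES[opcode]
            size = operand_size(kind, p)
            start = pc - base + 1
            operand = int.from_bytes(code[start:start + size], "little")
            found[pc] = (name, 1 + size)
            next_pc = pc + 1 + size
            if name == "REP":
                p &= ~operand        # REP clears the selected status bits
            elif name == "SEP":
                p |= operand         # SEP sets them
            if kind == "rel8":       # branch: queue the taken path too
                offset = operand - 0x100 if operand & 0x80 else operand
                work.append((next_pc + offset, p))
            elif name in ("JSR", "JMP"):
                work.append((operand, p))
            if name in ("RTS", "JMP", "BRA"):
                break                # no fall-through after these
            pc = next_pc
    return found


# REP #$20 / LDA #$1234 / JSR $800A / RTS / (data byte) / SEP #$20 / LDA #$FF / RTS
program = bytes.fromhex("C220 A93412 200A80 60 00 E220 A9FF 60")
found = trace(program, base=0x8000, entry=0x8000)
```

In the sample, the same LDA opcode decodes as 3 bytes after the REP and as 2 bytes after the SEP, and the stray byte at $8009 is never reached, so it can be treated as data.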

blackhole89

LOLSEALS
Moderator of ROM hacking
EmuNET IRC network admin
Head GM of TwilightRO
Level: 47

Posts: 694/971
EXP: 739208
For next: 26995

Since: 03-15-04
From: Dresden/Germany

Since last post: 14 hours
Last activity: 12 hours

Posted on 05-09-05 12:34 AM

Actually, I started working on a similar tool just yesterday. Now isn't that a weird coincidence?

The only problem I have encountered so far is that tracing indirectly addressed jumps is nearly impossible without properly emulating the ROM (and even then, you will hardly find all possibilities). As a (not very nice) solution for that, I added the possibility to explicitly mark pieces of the ROM as code (will be disassembled properly) or data (will be disassembled to .db "instructions") in description files that formerly only were intended to contain information about what the subroutines do (thus replacing autogenerated jump labels with meaningful names).
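
Roughly how such manual marking could drive the output. This is not the actual tool or its description-file format (neither is shown in the thread): the `(start, end, kind)` tuples and both function names are hypothetical.

```python
def emit_data(rom, start, end, per_line=8):
    """Turn rom[start:end] into .db lines, per_line bytes each."""
    lines = []
    for offset in range(start, end, per_line):
        chunk = rom[offset:min(offset + per_line, end)]
        lines.append(".db " + ",".join(f"${b:02X}" for b in chunk))
    return lines


def render(rom, regions):
    """regions: list of (start, end, kind) with kind 'code' or 'data'."""
    out = []
    for start, end, kind in regions:
        if kind == "data":
            out.extend(emit_data(rom, start, end))
        else:
            # A real tool would call its instruction decoder here.
            out.append(f"; code ${start:04X}-${end - 1:04X} (left to the disassembler)")
    return out


rom = bytes(range(12))                       # stand-in for ROM contents
listing = render(rom, [(0, 2, "code"), (2, 12, "data")])
```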

HyperLamer
<||bass> and this was the soloution i thought of that was guarinteed to piss off the greatest amount of people

Sesshomaru
Tamaranian

Level: 118

Posts: 4405/8210
EXP: 18171887
For next: 211027

Since: 03-15-04
From: Canada, w00t!
LOL FAD

Since last post: 2 hours
Last activity: 2 hours

Posted on 05-09-05 01:31 AM

That's not the only problem. Consider:
PHX
RTS
How does your tracer know where that's going to end up? The only real way I can see to do it accurately is to actually emulate the game.

Sukasa

Boomboom
Error 349857348734534: The system experienced an error.
Level: 57

Posts: 532/1981
EXP: 1446921
For next: 39007

Since: 02-06-05
From: *Shrug*

Since last post: 6 days
Last activity: 1 day

Posted on 05-09-05 01:54 AM

Possibly by emulating the code in a sort of "userless" environment. In ZSNES, it always seemed to me that the slowest part of emulation was drawing the screen and receiving input from the user.

d4s

Panser
Level: 29

Posts: 203/325
EXP: 142151
For next: 5734

Since: 03-23-04

Since last post: 13 days
Last activity: 1 day

Posted on 05-09-05 02:21 AM

there are several things that will be very hard to pull off with a static disassembler.
for example, breath of fire 2's "kernel" (from what ive seen, several capcom games use it) manages several stacks at once to perform some sort of multitasking which ends up manipulating the stack and then rtl'ing to jump to different execution threads.

imho, a cpu-emulator following all possible branches would be the best idea, but thatd be some serious work.
personally, im happy with the debugging features emulators provide.

Sukasa

Boomboom
Error 349857348734534: The system experienced an error.
Level: 57

Posts: 534/1981
EXP: 1446921
For next: 39007

Since: 02-06-05
From: *Shrug*

Since last post: 6 days
Last activity: 1 day

Posted on 05-09-05 02:30 AM

Well, if you simply set it to make repeated passes, each time choosing a different button configuration, that might work, and be a little easier to use/program.

DarkPhoenix

Octorok
Level: 9

Posts: 21/31
EXP: 2669
For next: 493

Since: 03-08-05

Since last post: 23 days
Last activity: 16 hours

Posted on 05-09-05 08:53 AM

Perhaps it'd be better to start simpler, like with a more real-time disassembler that allows you to disassemble sections of code separately with different parameters (specifically the initial status of the flags register), and then modify those parameters for each section...rather than disassembling the whole thing at once, and then going back and re-disassembling from some offset every time you run into an error in the disassembly. (correct me if there's already anything remotely like this). That way, if a decent technique comes around for doing a proper trace regarding the flags registers, it could be implemented as a function that automatically finds and sets the proper parameters for each section of code...essentially, let the user just guess/figure out where subroutines are and how they are supposed to be traced for themselves, and test it a little more quickly, for now. This, of course, rather heavily relies on the assumption that when a game is actually run, the status register is always either properly set prior to executing the subroutine, or that the way the subroutine is executed is independent of the status of that register (particularly if its first statements set this register to the proper settings)

As far as actually figuring out the proper way to trace the binary, it might be easier to do it as two separate parts...such as an add on to an emulator that logs jumps (and returns, for that matter) and the condition of the status register prior to the jump during an actual execution (particularly as part of a usage map), and worry about parsing the logfile to do an actual disassembly later. Similarly, this could be used to create a dynamic model, more for source code recovery, as parasyte mentioned, or maybe for error checking on the status of the flags. Not quite as easy to write as Darkflight's idea, but not too much work, and more efficient. Takes a lot more work on the part of the user, though.
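
A sketch of the "parse the logfile later" half. The log format here is invented for illustration (one line per jump, `MNEMONIC source -> target P=status`); no existing emulator is claimed to write it. The parser collects, for each jump target, the accumulator/index size bits seen on entry.

```python
import re
from collections import defaultdict

# Hypothetical log line: "JSR 008005 -> 00800A P=30"
LINE = re.compile(
    r"^(JSR|JSL|JMP|JML) ([0-9A-F]{6}) -> ([0-9A-F]{6}) P=([0-9A-F]{2})$"
)
SIZE_BITS = 0x30  # M (0x20) and X (0x10) flags of the status register


def entry_flags(log_lines):
    """Map each jump target to the set of M/X flag states seen on entry."""
    seen = defaultdict(set)
    for line in log_lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a jump record
        target = int(match.group(3), 16)
        status = int(match.group(4), 16)
        seen[target].add(status & SIZE_BITS)
    return seen


log = [
    "JSR 008005 -> 00800A P=30",
    "JSR 008100 -> 00800A P=10",
    "JSL 008200 -> 018000 P=B0",
    "not a jump record",
]
flags_by_target = entry_flags(log)
ambiguous = sorted(t for t, states in flags_by_target.items() if len(states) > 1)
```

Targets that show up with more than one flag state are the ones where the "status register is always properly set" assumption would need checking by hand.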

Parasyte

Bullet Bill
Level: 35

Posts: 503/514
EXP: 267348
For next: 12588

Since: 05-25-04

Since last post: 104 days
Last activity: 32 days

Posted on 05-09-05 09:51 AM

The "problems" brought up are nothing new. Study decompiler theory for a huge list of problems and suggested solutions.

jonwil

Goomba
Level: 9

Posts: 21/25
EXP: 2989
For next: 173

Since: 04-09-04

Since last post: 85 days
Last activity: 81 days

Posted on 05-09-05 12:52 PM

If I had the skills, I would build an IDA plugin for 65816 and a SNES ROM loader.
Only problem is that I don't know if IDA can handle CPUs where an instruction is decoded differently based on the setting of a CPU register.

In any case, something like what IDA does (including being able to manually specify "this is code" and "this is data" for a given block of code, maybe with some SNES specific data decoders to decode graphics and other things in various formats) is the right way.
For example, if the disassembler sees push then RTS or RTL, it knows where to keep disassembling.
If it sees a push then RTS/RTL where it can't identify the address being returned to, it stops tracing that code path.
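
A small sketch of that rule for the RTS case. It assumes the disassembler already knows the bytes of the instruction sitting directly before the RTS; the function name is made up. PEA pushes a 16-bit constant, so the target can be computed, while PHA/PHX/PHY push a register whose value is only known at run time.

```python
PEA, PHA, PHX, PHY = 0xF4, 0x48, 0xDA, 0x5A  # 65816 push opcodes


def resolve_rts_target(prev_instruction):
    """Given the bytes of the instruction right before an RTS, return the
    address execution resumes at, or None if it cannot be known statically."""
    opcode = prev_instruction[0]
    if opcode == PEA and len(prev_instruction) == 3:
        pushed = int.from_bytes(prev_instruction[1:3], "little")
        return (pushed + 1) & 0xFFFF  # RTS jumps to the pulled value + 1
    # PHA/PHX/PHY (or anything else): stop tracing this path.
    return None


known = resolve_rts_target(bytes([PEA, 0xFF, 0x8F]))   # PEA $8FFF / RTS
unknown = resolve_rts_target(bytes([PHX]))             # PHX / RTS
```

This only looks one instruction back, so it misses cases where the value is pushed earlier or built up in a register first.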

You would then go in and manually identify what else is code and data.
Maybe combining this with a dumper for an emulator that would dump which addresses are read as code would help.

In any case, I doubt that building a SNES disassembler that doesn't require a fair whack of manual identification is possible.

Parasyte

Bullet Bill
Level: 35

Posts: 505/514
EXP: 267348
For next: 12588

Since: 05-25-04

Since last post: 104 days
Last activity: 32 days

Posted on 05-09-05 02:07 PM

Fully automated decompilation is hardly doable. But what can be done is still quite impressive. I'd humbly suggest you do some more studying before posting again. Any good decompiler reference will bring up many of the same ideas.

While IDA is a great disassembler, it just isn't designed for decompiling SNES ROMs. It contains a CPU module to disassemble 6502, but not 65816. It is more than capable of deciding how to decode instructions based on the Index and Accumulator sizes. But that would depend on how the module is written. Take a look at the ARM module, for instance: IDA should be able to choose between ARM and Thumb code automatically, but it does not. This is mostly due to how the module was written -- it does not follow bx instructions closely. Most IDA modules do not emulate memory (including the stack), making it rather difficult for IDA to do such things.

One reason I do not wish to write an IDA CPU module for 65816: IDA is a disassembler only. I wish to have the ability to recompile extracted sources. Writing a plugin to assemble the output from IDA is certainly no difficult task, but IDA doesn't separate banks or anything. Like I said, it just isn't designed for decompiling SNES ROMs.
Oh right, and IDA is far too expensive for the casual SNES hacker. Not to mention that endorsing piracy just so your module can be used is none too clever.

(edited by Parasyte on 05-08-05 09:09 PM)
