Acmlm's Board - I2 Archive - Rom Hacking - Any known text compressions? | |
User Post
Kei-kun

Level: 9

Posts: 1/27
EXP: 2633
For next: 529

Since: 11-15-04

Since last post: 86 days
Last activity: 26 days
Posted on 11-26-04 05:09 AM Link | Quote
Well, I've been trying to edit Final Fantasy Tactics Advance for the GBA, but I can't make a table file. The text just isn't stored in the same way as other games. I've looked and seen that the alphabet is in order (and without spaces), so relative searching should work (and so as to keep you all from saying that I'm just not doing it right, I have searched, with multiple programs, all lowercase letter-only strings), but found nothing.

So that leads me to the conclusion that the text is compressed. If anyone happens to know of how it works, please tell. If anyone can share any known text compressions, that would also be great. Even giving me the offset of one instance of text would help me greatly.


Thanks,
Kei-kun


(edited by Kei-kun on 11-25-04 08:10 PM)
Smallhacker

Green Birdo

SMW Hacking Moderator
Level: 68

Posts: 964/2273
EXP: 2647223
For next: 81577

Since: 03-15-04
From: Söderhamn, Sweden

Since last post: 10 hours
Last activity: 9 hours
Posted on 11-26-04 06:11 PM Link | Quote
Sometimes, one letter uses two bytes.
Keitaro

Iron Knuckle
ウラシマ ケイタロウ
Level: 54

Posts: 854/1342
EXP: 1201569
For next: 32301

Since: 03-15-04
From: Hinata, Japan

Since last post: 2 days
Last activity: 2 days
Posted on 11-26-04 08:09 PM Link | Quote
By "in order," you mean you saw the alphabet this way in a tile editor, I assume? Even if the tiles are in order, the hex values for them may be different from standard ASCII. You'll need to find out which hex value represents each letter tile. After that, you should be all set once you've made a table file.
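As a rough illustration of the table file step, the sketch below writes the usual "HEX=character" lines once the byte values for 'A' and 'a' are known. The helper name `make_table` and the values 0x00 and 0x20 are assumptions for the example only, not values from Final Fantasy Tactics Advance.

```python
import string


def make_table(lower_a: int, upper_a: int) -> str:
    """Build table-file text ("HEX=char" per line) for A-Z followed by a-z."""
    lines = []
    for i, ch in enumerate(string.ascii_uppercase):
        lines.append(f"{upper_a + i:02X}={ch}")
    for i, ch in enumerate(string.ascii_lowercase):
        lines.append(f"{lower_a + i:02X}={ch}")
    return "\n".join(lines)


# Hypothetical byte values for 'a' and 'A'; a real game would need its own.
table = make_table(lower_a=0x20, upper_a=0x00)
print(table.splitlines()[0])
```

This only works when each alphabet is contiguous; punctuation and control codes still have to be found by hand.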
Xkeeper
The required libraries have not been defined.
Level: NAN

Posts: -3048/-863
EXP: NAN
For next: 0

Since: 03-15-04

Since last post: 2 hours
Last activity: -753366 sec.
Posted on 11-26-04 10:34 PM Link | Quote
Originally posted by Keitaro
By "in order," you mean you saw the alphabet this way in a tile editor, I assume? Even if the tiles are in order, the hex values for them may be different from standard ASCII. You'll need to find out which hex value represents each letter tile. After that, you should be all set once you've made a table file.
That's what relative searching does.

But, yes, it's likely compressed or uses two bytes.
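If the game does use two bytes per letter, the difference trick can in principle be applied to 16-bit units instead of bytes. The sketch below assumes fixed-width little-endian 16-bit character codes stored in alphabetical order, which is only a guess about how such a game might store text; `relative_search16` is a made-up name.

```python
import struct


def relative_search16(data: bytes, text: str) -> list[int]:
    """Relative search over little-endian 16-bit units instead of single bytes."""
    if len(text) < 2:
        raise ValueError("need at least two characters to compare differences")
    wanted = [ord(b) - ord(a) for a, b in zip(text, text[1:])]
    n = len(text)
    hits = []
    # Try every byte offset, since the text may not be 2-byte aligned.
    for start in range(len(data) - 2 * n + 1):
        units = struct.unpack_from(f"<{n}H", data, start)
        if [b - a for a, b in zip(units, units[1:])] == wanted:
            hits.append(start)
    return hits


# Hypothetical encoding: 'a' is the 16-bit value 0x0100, 'b' is 0x0101, and so on.
sample = b"\x00" + b"".join(struct.pack("<H", 0x0100 + ord(c) - ord("a")) for c in "mage")
print(relative_search16(sample, "mage"))
```

If this finds nothing either, compression becomes the more likely explanation.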
Kei-kun

Level: 9

Posts: 2/27
EXP: 2633
For next: 529

Since: 11-15-04

Since last post: 86 days
Last activity: 26 days
Posted on 11-27-04 03:47 AM Link | Quote
Well, it's impossible to relative search for text that uses two bytes a letter, even if radicals were to be used (which I have tried, with a radical in between or before the letters), so I guess this thing is just compressed. I've also run into this problem in games like Golden Sun.

I guess I'll just keep looking/asking around about text compressions and for offsets for some text. If all fails, I guess I'll just have to move on to a different game.

By the way, why isn't the post count showing in the side bar? Not that I care, but it looks weird seeing an empty field there.


(edited by Kei-kun on 11-26-04 06:48 PM)
HyperLamer
<||bass> and this was the soloution i thought of that was guarinteed to piss off the greatest amount of people

Sesshomaru
Tamaranian

Level: 118

Posts: 2169/8210
EXP: 18171887
For next: 211027

Since: 03-15-04
From: Canada, w00t!
LOL FAD

Since last post: 2 hours
Last activity: 2 hours
Posted on 11-27-04 06:09 AM Link | Quote
GDI's not installed, so the images don't work. The plain text ones in other themes do, tho.

Anyway, about your text problem: If the text is compressed, it probably uses one of the compressions that the BIOS has built-in. I dunno how good the various GBA debuggers are, but try to find calls to those when it's reading text.
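One crude way to look for those calls without a debugger is to scan the ROM for Thumb `swi` opcodes that name the BIOS decompression functions (SWI 0x11 to 0x15). This is only a sketch: it assumes Thumb code, it will also match ordinary data that happens to contain the same bytes, and `find_thumb_swi` is a made-up helper name.

```python
# GBA BIOS decompression functions and their SWI numbers.
BIOS_DECOMP = {
    0x11: "LZ77UnCompWram",
    0x12: "LZ77UnCompVram",
    0x13: "HuffUnComp",
    0x14: "RLUnCompWram",
    0x15: "RLUnCompVram",
}


def find_thumb_swi(rom: bytes) -> list[tuple[int, str]]:
    """List (offset, name) for halfwords that look like Thumb `swi 0x11`..`swi 0x15`."""
    hits = []
    # Thumb opcodes are halfword aligned; `swi imm8` is 0xDF00 | imm8, stored little-endian.
    for off in range(0, len(rom) - 1, 2):
        imm, opcode = rom[off], rom[off + 1]
        if opcode == 0xDF and imm in BIOS_DECOMP:
            hits.append((off, BIOS_DECOMP[imm]))
    return hits


# Hypothetical code fragment: mov r0, #0 / swi 0x11 / bx lr
fragment = bytes([0x00, 0x20, 0x11, 0xDF, 0x70, 0x47])
print(find_thumb_swi(fragment))
```

Games usually wrap each SWI in a small library function, so expect only a handful of genuine hits; the callers of those wrappers are the interesting places to set breakpoints.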
Keitaro

Iron Knuckle
ウラシマ ケイタロウ
Level: 54

Posts: 856/1342
EXP: 1201569
For next: 32301

Since: 03-15-04
From: Hinata, Japan

Since last post: 2 days
Last activity: 2 days
Posted on 11-27-04 06:39 AM Link | Quote
Since you mentioned Golden Sun, I can let you in on something. The credits, and the beta programs the programmers never took out ("hello world" and "this is a sprite"), use ASCII.
Kei-kun

Level: 9

Posts: 3/27
EXP: 2633
For next: 529

Since: 11-15-04

Since last post: 86 days
Last activity: 26 days
Posted on 11-29-04 06:32 AM Link | Quote
Hm... might it possibly be that there are bytes used that represent a whole combination of letters? Kind of like the word "discrimination" might end up like "dis-cri-mi-na-tion," kind of like how Chrono Trigger has "allowance" as "all-ow-an-ce?"
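The idea described here is usually called dictionary (or substring) compression. A minimal sketch of how such text decodes, assuming one byte per dictionary entry; the byte values 0x80 to 0x83 and the `decode` helper are invented for illustration and are not Chrono Trigger's actual table.

```python
def decode(data: bytes, table: dict[int, str]) -> str:
    """Expand each byte through the table; unknown bytes are shown as <XX>."""
    return "".join(table.get(b, f"<{b:02X}>") for b in data)


# Hypothetical dictionary entries for the "all-ow-an-ce" example.
table = {0x80: "all", 0x81: "ow", 0x82: "an", 0x83: "ce"}
print(decode(bytes([0x80, 0x81, 0x82, 0x83]), table))
```

A relative search fails on text like this because the four stored bytes have no fixed relationship to the nine letters they expand to.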

If it was something like that, how would I find it without blindly changing bytes and hoping it's text that I'll notice when testing the game so that I can make a table file?


(edited by Kei-kun on 11-28-04 09:33 PM)
Juggling Joker

Boomerang Brother
SMW Hacking Moderator
Yeah, JAMH is still being worked on.
Level: 48

Posts: 371/1033
EXP: 811447
For next: 12096

Since: 03-15-04
From: Wyoming

Since last post: 2 days
Last activity: 3 hours
Posted on 11-29-04 08:23 AM Link | Quote
If it's using dictionary compression, as you seem to think, try searching for a very uncommon string of characters that appears in the text somewhere. If you find something, you can probably work backwards from there, as text is almost always found in the same general area of the ROM.
labmaster

Blue Octorok
Level: 12

Posts: 2/43
EXP: 6135
For next: 1786

Since: 07-17-04
From: New Zealand!

Since last post: 10 days
Last activity: 2 min.
Posted on 11-29-04 12:18 PM Link | Quote
Golden Sun TLA (and, I'd assume, the first one as well) uses a very interesting system that I never fully reversed (I got into it because I was trying to rip out enemy names). If I can find my notes on the game, I'll post them here.
Kei-kun

Level: 9

Posts: 4/27
EXP: 2633
For next: 529

Since: 11-15-04

Since last post: 86 days
Last activity: 26 days
Posted on 11-30-04 04:10 AM Link | Quote
JJ, I guess that's the only way, huh? Well then, I guess if it is, I'll have to replay the game and be on a lookout for a really weird word. Wait, that's impossible. Damn, it'll take something without vowels for that to work. Maybe even then, they could have an entry for repeating letters like "mm" or "hh."

Oh, that would be great, Labmaster. The only reason why I mentioned Golden Sun at all though is because I was also thinking of editing that game if I couldn't edit FFT (but that was before I ran into the same problem). Truthfully, though... after playing Golden Sun for a while, I kind of think that'd be a more fun game to edit.

Edit: I lied. It'd be more fun to get all the data I can, lol. When was the last time I've ever completed a hack? I usually just write documents. However, text is the most basic thing in a game; and I can't do much without knowing how it works.


(edited by Kei-kun on 11-29-04 07:12 PM)
labmaster

Blue Octorok
Level: 12

Posts: 3/43
EXP: 6135
For next: 1786

Since: 07-17-04
From: New Zealand!

Since last post: 10 days
Last activity: 2 min.
Posted on 11-30-04 05:46 AM Link | Quote
Haven't found it yet - it may not even be on this comp - still looking...

Anyway, this is basically what I did - the start of the enemy datablock (RAM) contains the name of the enemy in ASCII. I was able to breakpoint those addresses and trace back - I managed to get as far as what appeared to be 'pointers'. The problem was, they weren't pointers in the traditional sense (they told the game, somehow, where to look) - the text didn't seem to be in any particular order either, with the various languages jumbled together. The actual storage of the text itself could quite possibly be a form of compression - it's nothing I've seen before. I guess it was the giant loops that put me off going any further.

Just as an aside: for anyone looking for some practice on dictionary compression, the latest Who Wants to be a Millionaire game would be a great target. I was thinking of writing a question editor for that, but some other things came up.
Kei-kun

Level: 9

Posts: 5/27
EXP: 2633
For next: 529

Since: 11-15-04

Since last post: 86 days
Last activity: 26 days
Posted on 12-01-04 01:47 AM Link | Quote
Couldn't find the enemy name in memory. I need something that will allow me to search the memory and not just view it (one of those times where I wish I had the developer version of No$Gba). Just curious, though: what were you doing to figure out that what you traced back to was something that told the game where to go?
labmaster

Blue Octorok
Level: 12

Posts: 5/43
EXP: 6135
For next: 1786

Since: 07-17-04
From: New Zealand!

Since last post: 10 days
Last activity: 2 min.
Posted on 12-03-04 05:15 AM Link | Quote
I use a modified version of VBA, VisualBoy Advance for Hackers; the Windows version has a very simple ASCII search hacked into the memory viewer. As for tracing, it was basically just using breakpoints and dumping large amounts of ASM traces.

I just took a look at the first game - the first enemy starts at 02030878, for The Lost Age, it's 020308c8.
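For anyone without a memory viewer that can search, the same ASCII search can be done on a raw dump. This sketch assumes a dump of EWRAM (which starts at 02000000 on the GBA) saved from an emulator; `find_ascii` and the way the sample dump is built are illustrative only.

```python
EWRAM_BASE = 0x02000000  # start of the GBA's 256 KB external work RAM


def find_ascii(dump: bytes, text: str, base: int = EWRAM_BASE) -> list[int]:
    """Return the GBA addresses at which `text` appears as plain ASCII in the dump."""
    needle = text.encode("ascii")
    hits = []
    pos = dump.find(needle)
    while pos != -1:
        hits.append(base + pos)
        pos = dump.find(needle, pos + 1)
    return hits


# Hypothetical dump with an enemy name placed at 02030878.
dump = bytearray(0x40000)
dump[0x30878:0x30878 + 14] = b"Thunder Lizard"
print([f"{addr:08x}" for addr in find_ascii(bytes(dump), "Thunder")])
```

Any address it reports can then be used for a breakpoint on write.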


edit: I couldn't find my old work, so I've started again from scratch. Below is what I have so far.



First enemy name is stored as ASCII at 02030878


Breakpoint on Write:

Breakpoint (on write) address 02030878 old:00 new:54
R00=03007d94 R04=03007d94 R08=00000009 R12=0002f7c0
R01=02030878 R05=00000001 R09=00000080 R13=03007d90
R02=00000000 R06=02030878 R10=080c71c3 R14=08019787
R03=00000054 R07=08082908 R11=00000058 R15=0807951c
CPSR=0000003f (......T Mode: 1f)
0807951a 3202 add r2, #0x2
debugger>

The text is being copied from IWRAM 03007d94 using this routine:

08079514 5a13 ldsb r3, [r2, r0]
08079516 3501 add r5, #0x1
08079518 700b strb r3, [r1, #0x0] <--the store
0807951a 3202 add r2, #0x2 <--increment source by 2 (source text looks to be utf-16)
0807951c 3101 add r1, #0x1 <--increment destination by 1
0807951e 2d0d cmp r5, #0xd <--if count > 13 (13=max length of name)
08079520 dc02 bgt $08079528 <--end of block
08079522 5b13 ldsb r3, [r2, r4]
08079524 2b00 cmp r3, #0x0
08079526 d1f5 bne $08079514
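
Read as Python, the loop above does roughly the following. This is a sketch of my reading of the listing, not a verified translation: it assumes r0 and r4 both hold the source address (as in the register dump) and that the counter in r5 starts at 0, which is inferred from R05=1 at the first store.

```python
def copy_name(src: bytes, limit: int = 0x0D) -> bytes:
    """Copy the low byte of each 2-byte unit until a zero unit or the count limit."""
    out = bytearray()
    offset = 0  # r2: byte offset into the source
    count = 0   # r5: number of characters copied so far
    while True:
        out.append(src[offset])  # ldsb r3, [r2, r0] / strb r3, [r1]
        count += 1               # add r5, #0x1
        offset += 2              # add r2, #0x2
        if count > limit:        # cmp r5, #0xd / bgt
            break
        if src[offset] == 0:     # ldsb r3, [r2, r4] / cmp r3, #0x0 / bne
            break
    return bytes(out)


source = "Thunder Lizard".encode("utf-16-le") + b"\x00\x00"
print(copy_name(source))
```

Under those assumptions the `cmp r5, #0xd` / `bgt` pair lets 14 characters through before stopping, which would match the 14 characters of 'Thunder Lizard'.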

Using db to skip a crap load of stack breaks:

Breakpoint (on write) address 03007d94 old:15fa new:0054
R00=00000054 R04=080374c0 R08=0300207c R12=00000160
R01=00000054 R05=03007d68 R09=0000ffff R13=03007d68
R02=08039b5c R06=03007d94 R10=00000000 R14=08019777
R03=0038c187 R07=0000000e R11=00000058 R15=08019770
CPSR=2000003f (..C...T Mode: 1f)
0801976e 3602 add r6, #0x2
debugger>

We'll concentrate on the generation of the second letter. An ARM routine at 0300207c is called via bl 08007304 with the destination vector in r8. This bl is a standard bx r8 function call, and should not be used in traces.

R00=03007d68 R04=080374c0 R08=0300207c R12=00000160
R01=00000054 R05=03007d68 R09=0000ffff R13=03007d68
R02=08039b5c R06=03007d96 R10=00000000 R14=08019777
R03=0038c187 R07=0000000e R11=00000058 R15=08019774
CPSR=2000003f (......T Mode: 1f)
08019772 f7ed bl $08007304

Following thumb registers are trashed immediately without saving:
r1,r2,r3,r4

r5 and r6 are pushed at the start and popped off at the end, data is not used.

Reasonable to assume that r0 is sole parameter?

r0 points to 3 words on the stack.

some sample word combos (looping through letter of name):

0, 08039b5c,01c60c3e (1110001100000110000111110)
54,08039b5c,0038c187 (1110001100000110000111)
68,08039b5c,001c60c3 (111000110000011000011)
75,08039b5c,000038c1 (11100011000001)
6e,08039b5c,0000038c (1110001100)
64,08039b5c,00000071 (1110001)
65,08039b5c,0000000e (1110)
72,08039b5c,00000001 (1)

I see a pattern
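
To spell the pattern out, here is a small check on the sampled values. It treats the third word as a bit buffer and asks how far it was shifted right between samples; the interpretation (first word = last letter produced, third word = remaining bits) is my assumption from the samples, and `bits_consumed` is a made-up helper.

```python
# Third word of the structure, one sample per letter (values from the list above).
words = [0x01C60C3E, 0x0038C187, 0x001C60C3, 0x000038C1,
         0x0000038C, 0x00000071, 0x0000000E, 0x00000001]
# First word of the structure on the seven samples after the initial 0.
letters = [0x54, 0x68, 0x75, 0x6E, 0x64, 0x65, 0x72]


def bits_consumed(prev: int, cur: int) -> int:
    """Return n such that prev >> n == cur, or -1 if cur is not a right shift of prev."""
    n = prev.bit_length() - cur.bit_length()
    return n if n >= 0 and prev >> n == cur else -1


shifts = [bits_consumed(a, b) for a, b in zip(words, words[1:])]
name = bytes(letters).decode("ascii")
print(shifts, name)
```

Each sample is the previous one shifted right, and the shift differs from letter to letter. That would be consistent with a variable-length code, though the samples alone don't prove it.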

editing the data structure before going through the loop results in
'TTTTTTThunder ' instead of
'Thunder Lizard'



Below is the entire function:

0300207c e92d0060 stmfd sp!, {r5,r6}
03002080 e890000e ldmia r0, {r1-r3}
03002084 e59fc12c ldr r12, [$030021b8] (=$0803842c)
03002088 e1a04421 mov r4, r1, lsr #0x08
0300208c e08cc184 add r12, r12, r4, lsl #0x03
03002090 e89c0030 ldmia r12, {r4,r5}
03002094 e201c0ff and r12, r1, #0xff
03002098 e08cc00c add r12, r12, r12
0300209c e19550bc ldrh r5, [r5, r12]
030020a0 e0844005 add r4, r4, r5
030020a4 e1a05004 mov r5, r4
030020a8 e3a0c001 mov r12, #0x1
030020ac e2146003 ands r6, r4, #0x3
030020b0 0a000005 beq $030020cc

030020b4 e07c6186 rsbs r6, r12, r6, lsl #0x03
030020b8 e3c44003 bic r4, r4, #0x3
030020bc e494c004 ldr r12, [r4], #0x4
030020c0 e1a0c06c mov r12, r12, rrx
030020c4 e1a0c63c mov r12, r12, lsr r6
030020c8 e3a06000 mov r6, #0x0

030020cc e1b0c0ac movs r12, r12, lsr #0x01
030020d0 0494c004 ldreq r12, [r4], #0x4
030020d4 01b0c06c moveqs r12, r12, rrx
030020d8 2a000029 bcs $03002184

030020dc e1b030a3 movs r3, r3, lsr #0x01
030020e0 3afffff9 bcc $030020cc

030020e4 04923004 ldreq r3, [r2], #0x4
030020e8 01b03063 moveqs r3, r3, rrx
030020ec 3afffff6 bcc $030020cc

030020f0 e3a01000 mov r1, #0x0
030020f4 e1b0c0ac movs r12, r12, lsr #0x01
030020f8 2a000017 bcs $0300215c

030020fc e1b0c0ac movs r12, r12, lsr #0x01
03002100 2a000006 bcs $03002120

03002104 e1b0c0ac movs r12, r12, lsr #0x01
03002108 2a000003 bcs $0300211c

0300210c e1b0c0ac movs r12, r12, lsr #0x01
03002110 2a000009 bcs $0300213c

03002114 e2811004 add r1, r1, #0x4
03002118 eafffff5 b $030020f4

0300211c e2811001 add r1, r1, #0x1
03002120 12866001 addne r6, r6, #0x1
03002124 1afffff2 bne $030020f4
03002128 e494c004 ldr r12, [r4], #0x4
0300212c e1b0c06c movs r12, r12, rrx
03002130 32811002 addcc r1, r1, #0x2
03002134 22866001 addcs r6, r6, #0x1
03002138 eaffffed b $030020f4

0300213c e2811002 add r1, r1, #0x2
03002140 12866001 addne r6, r6, #0x1
03002144 1affffea bne $030020f4
03002148 e494c004 ldr r12, [r4], #0x4
0300214c e1b0c06c movs r12, r12, rrx
03002150 32811002 addcc r1, r1, #0x2
03002154 22866001 addcs r6, r6, #0x1
03002158 eaffffe5 b $030020f4

0300215c 0a000003 beq $03002170
03002160 e2866001 add r6, r6, #0x1
03002164 e2511001 subs r1, r1, #0x1
03002168 aaffffe1 bge $030020f4
0300216c eaffffd6 b $030020cc

03002170 e494c004 ldr r12, [r4], #0x4
03002174 e1b0c06c movs r12, r12, rrx
03002178 2afffff8 bcs $03002160

0300217c e2811001 add r1, r1, #0x1
03002180 eaffffdb b $030020f4

03002184 e1b010a6 movs r1, r6, lsr #0x01
03002188 e0866001 add r6, r6, r1
0300218c e0456006 sub r6, r5, r6
03002190 e5565001 ldrb r5, [r6, -#0x1]
03002194 e5566002 ldrb r6, [r6, -#0x2]
03002198 2205100f andcs r1, r5, #0xf
0300219c 21861401 orrcs r1, r6, r1, lsl #0x08
030021a0 31a01205 movcc r1, r5, lsl #0x04
030021a4 31811226 orrcc r1, r1, r6, lsr #0x04
030021a8 e880000e stmia r0, {r1-r3}
030021ac e1b00001 movs r0, r1
030021b0 e8bd0060 ldmfd sp!, {r5,r6}
030021b4 e12fff1e bx lr <---end of routine I hope
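
A tentative reading of the tail of the routine (03002184 to 030021a4): it appears to fetch a 12-bit value, indexed by r6, from a packed array that runs backwards from the address in r5, two values per three bytes. The sketch below only covers that tail and rests on that interpretation; `read_packed12` and the sample bytes are invented for illustration.

```python
def read_packed12(mem: bytes, base: int, index: int) -> int:
    """Read the index-th 12-bit value from an array packed backwards from `base`."""
    # movs r1, r6, lsr #1 / add r6, r6, r1 / sub r6, r5, r6
    pos = base - (index + (index >> 1))
    hi = mem[pos - 1]  # ldrb r5, [r6, -#0x1]
    lo = mem[pos - 2]  # ldrb r6, [r6, -#0x2]
    if index & 1:
        # Carry set (odd index): andcs / orrcs
        return ((hi & 0x0F) << 8) | lo
    # Carry clear (even index): movcc / orrcc
    return (hi << 4) | (lo >> 4)


# Hypothetical memory: the three bytes just below `base` hold 0xABC and 0xDEF.
mem = bytearray(16)
mem[15], mem[14], mem[13] = 0xAB, 0xCD, 0xEF
print(hex(read_packed12(mem, 16, 0)), hex(read_packed12(mem, 16, 1)))
```

The value returned in r0 is also what gets written back as the first word of the structure, which fits the samples where that word holds the letter just decoded.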



(edited by labmaster on 12-04-04 12:13 AM)