Acmlm's Board - I2 Archive


Acmlm's Board - I2 Archive - Rom Hacking - UNRIF file format proposition


Vystrix Nexoth

Level: 30

Posts: 229/348
EXP: 158678
For next: 7191

Since: 03-15-04
From: somewhere between anima and animus

Since last post: 3 days
Last activity: 2 days

Posted on 01-22-05 06:54 PM

UNRIF (Universal NES ROMhack Interchange Format) is a proposed file format for an NES-specific ROM file patching system, to replace IPS for that purpose. It does not address non-NES applications.

The key features of UNRIF in comparison to IPS are:

Describes patch data in a semantically-rich way, thereby allowing a patch to be applied to either an iNES or a UNIF ROM file, or indeed any other NES ROM file format if supported by the patcher.
Can record extensive meta-data, such as the author's name, website address, commentary, the title of the hack, and other such things. This meta-data uses UTF-8 encoding to accommodate those who don't speak English, and can also record a language code, so if the author wishes to give meta-data in multiple languages, he/she can, in which case a patcher could select which one to view based on the user's system settings.
Accommodates ROM expansion more elegantly, by allowing for instructions which insert, delete, or rearrange the contents of PRG-ROM or CHR-ROM, rather than just overwriting data.
Includes optional checksums to help ensure that the correct file is being patched.
Allows for variants, whereby essentially multiple patches can be present in the file, plus allowing for some patch data to be common to all variants. This allows for a patch that can be applied to multiple versions of a given game; for example, a hack of Final Fantasy II that can be applied to either the original Famicom ROM or to an English translation thereof, with patch data common to both not having to be defined twice.

The complete file format specification can be found in XHTML format at the following URI:
http://zero-soul.panicus.org/misc/unrif-prop-1.php

The purpose of this thread is for commentary and questions about the format.

dan

Snap Dragon
Level: 43

Posts: 356/782
EXP: 534516
For next: 30530

Since: 03-15-04

Since last post: 20 hours
Last activity: 14 hours

Posted on 01-22-05 09:07 PM

One suggestion I have is to increase the 24 bit values to 32 bit values. Not for the added size, just that most programming languages do not have a default 24 bit variable, so some kind of struct hack is needed.

There's a few things I am iffy about, but they are more to do with gaps in my own knowledge, so I'll just google them probably.

Geiger

Buster Beetle
Level: 34

Posts: 248/460
EXP: 241080
For next: 12571

Since: 03-15-04
From: Indianapolis, IN, USA

Since last post: 6 hours
Last activity: 6 hours

Posted on 01-22-05 11:43 PM

I would suggest making the checksums mandatory at creation time and optional at application time. For example, the patch creator makes the patch and records checksums, and the patch applier can pop up a box that says "this ROM does not match this patch, apply anyway?"
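A minimal sketch of that suggestion in C. It assumes the checksum is a standard CRC-32, which the thread does not confirm; crc32_ieee, check_rom and the enum are hypothetical names, not part of the UNRIF spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Assumption: the
   thread does not say which checksum algorithm UNRIF actually uses. */
static uint32_t crc32_ieee(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical outcome codes for the patch-applying program. */
enum check_result { CHECK_OK, CHECK_MISMATCH_ASK_USER };

/* The patch always carries a checksum (mandatory at creation time).
   A mismatch does not abort; it only tells the caller to ask the user. */
static enum check_result check_rom(const unsigned char *rom, size_t len,
                                   uint32_t expected_crc)
{
    return crc32_ieee(rom, len) == expected_crc
               ? CHECK_OK
               : CHECK_MISMATCH_ASK_USER;
}
```

On CHECK_MISMATCH_ASK_USER the patcher would show the "apply anyway?" prompt instead of refusing outright.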

---T.Geiger

Vystrix Nexoth

Level: 30

Posts: 230/348
EXP: 158678
For next: 7191

Since: 03-15-04
From: somewhere between anima and animus

Since last post: 3 days
Last activity: 2 days

Posted on 01-22-05 11:49 PM

Well, people have been writing IPS programs (IPS uses 24-bit addressing) just fine. Using 32-bit addressing just wastes space, and you can always keep it in a 32-bit variable anyway:

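unsigned long int address;
address  = (unsigned long)fgetc(patchfile);        /* low byte first */
address |= (unsigned long)fgetc(patchfile) << 8;   /* separate statements: C does not fix the */
address |= (unsigned long)fgetc(patchfile) << 16;  /* order of several calls in one expression */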

(this has the further benefit of being byte-order-agnostic)

The reason chunks have 32-bit "length" fields is because they don't occur as often, plus it was in keeping with the same chunk format as UNIF. PDAT/CDAT blocks, however, occur much more frequently.

dan

Snap Dragon
Level: 43

Posts: 357/782
EXP: 534516
For next: 30530

Since: 03-15-04

Since last post: 20 hours
Last activity: 14 hours

Posted on 01-23-05 01:15 AM

True enough. It doesn't affect me personally, as Delphi can do something similar, but I was thinking of those who wanted to implement a patcher in Visual Basic. It would have to be in a hacky way. It wouldn't really waste that much space to increase the addresses to 32-bit, as most patches are distributed within ZIP files.

I agree with Geiger about the checksums, however if I wrote a UNRIF patch creator, I would always make it add checksums to the patch, without the user having any say whatsoever.

Smallhacker

Green Birdo

SMW Hacking Moderator
Level: 68

Posts: 1247/2273
EXP: 2647223
For next: 81577

Since: 03-15-04
From: Söderhamn, Sweden

Since last post: 10 hours
Last activity: 9 hours

Posted on 01-23-05 01:58 AM

I've read a part of the specification, but not all (due to lack of time) and I can't see any problems. I have a question, though, and I don't know if it's answered in the specs or not.

What would happen if somebody creates an UNRIF patch of a SNES game or something else? Would the format make it work at all?

TheMonster
Newcomer
Level: 4

Posts: 4/4
EXP: 139
For next: 140

Since: 01-03-05

Since last post: 283 days
Last activity: 17 hours

Posted on 01-23-05 02:24 AM

I know Playstation hacking is fairly non-existent, but that could change...so I would propose one of two things:

A. 32-bit addressing, so a much larger range can be addressed. This is a simple solution, and it would still be required if B were (and it should be) implemented.

B. OK, there should be a couple of different fields describing what kind of ROM would be used. One would be for the system in question, and the (optional) sub-field would be the type of ROM for that console/handheld/whatever. I think one important thing that could be denoted in these fields is that a file is an ISO. Then the patcher could modify the files in the ISO separately, rather than one bulky ISO that could have been ripped numerous ways (if my memory serves me correctly). It shouldn't be TOO difficult to put that in a patcher...I am a complete fool and had my own ISO-reading/updating utility I made when I was messing with Symphony of the Night years ago.

I have some more notes somewhere because I was thinking about throwing my own format out there for a hack I *WAS* working on. Heh. If I can find the document, maybe I will share (I think it took care of a TON of possibilities and was easily expandable).

Oh yeah:

C. This is just a pleasantry, but wouldn't it be nice if patches were automatically ZLibbed, so the contents were compressed and not required to be in an archive? Maybe even the description could be contained within the patch. I'm just thinking of something along the lines of associating a patching program with these and...it capturing your clicks in a browser. Blah. Wait, that's kind of a crappy reason. I still think it would be a nice addition, though.

I may hop back into this topic later.

-Monsty

Gavin

Fuzzy
Rhinoceruses don't play games. They fucking charge your ass.
Level: 43

Posts: 397/799
EXP: 551711
For next: 13335

Since: 03-15-04
From: IL, USA

Since last post: 13 hours
Last activity: 13 hours

Posted on 01-23-05 02:35 AM

Originally posted by TheMonster
I know Playstation hacking is fairly non-existent, but that could change...

Not quite so, my friend: PlayStation hacking has actually been around for a while. Check out the romhacking.com forums; a lot of |Pixel|'s work in particular has been pretty awesome in the ways of PSX. PSX hacking itself goes back a few years too, in the form of the ToP PSX translation by Cless, as far as my knowledge is concerned.

dan

Snap Dragon
Level: 43

Posts: 358/782
EXP: 534516
For next: 30530

Since: 03-15-04

Since last post: 20 hours
Last activity: 14 hours

Posted on 01-23-05 02:53 AM

I think UNRIF is specifically designed for the NES, so any patch creators would probably only recognise iNES or UNIF images and create patches of those. It's not really designed for any other system.

Vystrix Nexoth

Level: 30

Posts: 231/348
EXP: 158678
For next: 7191

Since: 03-15-04
From: somewhere between anima and animus

Since last post: 3 days
Last activity: 2 days

Posted on 01-23-05 04:48 AM

Well, after some discussion on IRC, the repertoire of "methods" of PDAT/CDAT has been condensed to three methods: Replace, Copy-from-Source and Copy-from-Result, which can do everything the other methods (except "Delete", which I don't think was really needed anyway) could do and is somewhat more elegant about doing it. The specification has been updated accordingly.
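A rough sketch in C of how a patcher might apply the three methods. The struct layout and all names here are hypothetical; the real block layout is whatever the specification defines, and this only illustrates the distinction between copying from the untouched source and copying from the partially built result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one PDAT/CDAT block; the real on-disk
   layout is defined by the UNRIF spec, not by this sketch. */
enum method { M_REPLACE, M_COPY_FROM_SOURCE, M_COPY_FROM_RESULT };

struct block {
    enum method method;
    size_t dest;                /* offset in the result to write at */
    size_t length;              /* number of bytes to produce */
    size_t from;                /* copy methods: offset to read from */
    const unsigned char *data;  /* Replace: literal bytes from the patch */
};

/* Applies one block. Returns 0 on success, -1 if a range is out of bounds.
   source and result are assumed to be two separate buffers. */
static int apply_block(const struct block *b,
                       const unsigned char *source, size_t source_len,
                       unsigned char *result, size_t result_len)
{
    assert(b != NULL && result != NULL);
    if (b->dest > result_len || b->length > result_len - b->dest)
        return -1;

    switch (b->method) {
    case M_REPLACE:
        if (b->data == NULL)
            return -1;
        memcpy(result + b->dest, b->data, b->length);
        return 0;
    case M_COPY_FROM_SOURCE:
        if (source == NULL || b->from > source_len ||
            b->length > source_len - b->from)
            return -1;
        memcpy(result + b->dest, source + b->from, b->length);
        return 0;
    case M_COPY_FROM_RESULT:
        if (b->from > result_len || b->length > result_len - b->from)
            return -1;
        /* memmove, because both ranges lie in the same buffer */
        memmove(result + b->dest, result + b->from, b->length);
        return 0;
    }
    return -1;
}
```

Under these assumptions an insertion or rearrangement could be expressed as Copy-from-Source blocks that place the original bytes at new offsets, with Replace blocks filling in the new data between them.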

I still haven't made a decision regarding other systems. As Dan said, UNRIF was designed from the start for the nuances of the NES, and it is therefore exclusive to the NES. However, I'm not at all opposed to the idea of similar formats being devised for other systems (and being tailored for the nuances of each system).

Basically, it could go one of two ways:

Devise independent but related formats for each system, e.g. USRIF (Universal SuperNES ROMhack Interchange Format) for SuperNES games, which would presumably have the same foundation as UNRIF (block format/methods, meta-data, variations, etc) but replace PDAT/CDAT/PCRC/CCRC/MAPR/MIRR with chunks appropriate for the nuances of the Super NES. Other such formats could make other changes necessary to accommodate the system, e.g. by expanding block addressing from 24-bit to 32-bit if necessary.
Extend UNRIF to define a console name inside the format itself, and have META and VARY be common to all formats, and with other chunks specific to a particular system. This is essentially the same as the first, except patches for different systems would presumably have the same filename extension and the distinction would be made inside the patch itself.

Personally, I'm leaning towards the former (having separate formats). This would work similar to how e.g. "PSF" was expanded from a single format to a family of formats (USF, GSF, etc) with the same basic structure but tailored to the nuances of each system.

Note that, either way, I won't be drawing up specifications for patching formats for any other system, as I'm not as familiar with them as I would need to be.

Geiger

Buster Beetle
Level: 34

Posts: 249/460
EXP: 241080
For next: 12571

Since: 03-15-04
From: Indianapolis, IN, USA

Since last post: 6 hours
Last activity: 6 hours

Posted on 01-23-05 05:23 AM

One other suggestion is that you may wish to consider using Unicode instead of UTF8. Unicode is supported on most operating systems in one form or another. And Microsoft, at least, is being a bit heavy-handed about its use nowadays.

Basically, I am suggesting this as a "future proof" feature. If not a required part of the standard format, perhaps one of the remaining 64 bits could designate that the patch uses Unicode instead of UTF8.

---T.Geiger

Vystrix Nexoth

Level: 30

Posts: 232/348
EXP: 158678
For next: 7191

Since: 03-15-04
From: somewhere between anima and animus

Since last post: 3 days
Last activity: 2 days

Posted on 01-23-05 05:41 AM

Originally posted by Geiger
One other suggestion is that you may wish to consider using Unicode instead of UTF8. Unicode is supported on most operating systems in one form or another. And Microsoft, at least, is being a bit heavy-handed about its use nowadays.

Basically, I am suggesting this as a "future proof" feature. If not a required part of the standard format, perhaps one of the remaining 64 bits could designate that the patch uses Unicode instead of UTF8.

---T.Geiger

Unicode and UTF-8 are conceptually different things. Unicode is a character repertoire; Unicode text is a sequence of codepoints (which address a character in the character repertoire); UTF-8 is one of several ways to represent Unicode text as a sequence of actual bytes.

In short, UTF-8 is a way to represent Unicode.

I think you're referring to UTF-16 (which is another way to represent Unicode text). I considered that, but I have my reasons for opting to go with UTF-8:

Unlike UTF-16, UTF-8 will not have any 00 bytes in the course of normal text (except to represent the Unicode character U+0000, a character which is expressly forbidden from meta-data in the UNRIF spec), so a program can easily scan for the end of a tag by looking for a 00 byte. UTF-16, on the other hand, will have almost every other byte be a 00 byte, for Latin-alphabet text.
UTF-8 is more compact than UTF-16 when representing Latin-alphabet text (which is used by English, French, German, Spanish, Swedish, etc), which I presume will be the most commonly-used writing system for meta-data.

If a program absolutely must operate on UTF-16 instead of UTF-8, it could simply translate UTF-8 to UTF-16; after all, they are simply two different ways to represent the same thing (a sequence of Unicode codepoints).
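To illustrate the first point, a small C sketch of scanning for the end of a tag. The function name and the idea of bounding the scan by the chunk length are assumptions for illustration; the thread only establishes that a 00 byte ends a UTF-8 tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length in bytes of a 00-terminated UTF-8 tag that starts
   at buf, or (size_t)-1 if no terminator occurs within the chunk. */
static size_t tag_length(const unsigned char *buf, size_t chunk_len)
{
    const unsigned char *end;

    assert(buf != NULL);
    end = memchr(buf, 0x00, chunk_len);
    return end ? (size_t)(end - buf) : (size_t)-1;
}
```

Given the UTF-8 bytes 5A 6F C3 AB 00 ("Zoë" plus terminator) this returns 4. Given the UTF-16 little-endian bytes 5A 00 6F 00 it stops after one byte, which is exactly the problem described above.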

HyperLamer
<||bass> and this was the soloution i thought of that was guarinteed to piss off the greatest amount of people

Sesshomaru
Tamaranian

Level: 118

Posts: 3008/8210
EXP: 18171887
For next: 211027

Since: 03-15-04
From: Canada, w00t!
LOL FAD

Since last post: 2 hours
Last activity: 2 hours

Posted on 01-23-05 05:47 AM

I like the metadata idea and being able to select things, but really, this has two major problems. One, it's becoming ridiculously complicated. Two, system-specific patch formats suck hard.
Really, some sort of script-based patch system would be a lot better. Doesn't need to be complicated, just capable of prompting for input, displaying output, and reacting to input in various ways (such as modifying different parts of the file, executing or skipping parts of the script, etc).

(edited by HyperHacker on 01-22-05 08:50 PM)

Geiger

Buster Beetle
Level: 34

Posts: 250/460
EXP: 241080
For next: 12571

Since: 03-15-04
From: Indianapolis, IN, USA

Since last post: 6 hours
Last activity: 6 hours

Posted on 01-23-05 05:54 AM

In short, UTF-8 is a way to represent Unicode.

D'oh! Guess I should look abbreviations up every once in a while. I remembered UTF7/8 in conjunction with the ISO character set from the old pine mail days, but never made the connection to Unicode (probably because it never displayed correctly). My mistake.

---T.Geiger

Vystrix Nexoth

Level: 30

Posts: 233/348
EXP: 158678
For next: 7191

Since: 03-15-04
From: somewhere between anima and animus

Since last post: 3 days
Last activity: 2 days

Posted on 01-23-05 06:12 AM

Originally posted by HyperHacker
I like the metadata idea and being able to select things, but really, this has two major problems. One, it's becoming ridiculously complicated. Two, system-specific patch formats suck hard.

UNRIF is tailored for the NES, and it handles NES ROMs far better than any general-purpose patching format ever could. The reason is that a patcher needs to understand iNES and UNIF, or at least be able to find the PRG-ROM and/or CHR-ROM (among other things). This is something you can't accomplish with a general-purpose format.

What UNRIF does is it separates conceptually different things. For example, data that edits PRG is marked as such. Then a patcher would look at it and say, "that there modifies PRG. I'd better find where the PRG is in the ROM", find it (if it can), and then apply the patch correctly. All IPS (or any other general-purpose binary-patching format) can say is some bytes at such-and-such location were edited.

And, if you'll notice, the issue of supporting other systems has already been raised, and right now I'm leaning towards other formats being devised for other systems. To what extent those will be necessary, I don't know, but either they're necessary (in which case it'll be a good thing to have them) or they're not (in which case something like IPS or IPS32 will suffice).

and I don't know about you, but parsing and executing a scripting language, particularly to the extent necessary to support iNES and UNIF (let alone any other formats), sounds much more complicated-- both for the person writing the patch program and the person creating the patch-- than UNRIF (which places the burden on the programmer, and even then less burden than with a scripting language). Furthermore, it places the burden of supporting various ROM formats on the patch itself, rather than simply saying "this is PRG-ROM data" and letting the patch-applying program figure out what to do with it.

Or, to put it more simply, which sounds like a more complicated thing for a patch to say?

To apply the patch to an iNES ROM, you assume the PRG is 16KB x whatever is set in the 5th byte of the iNES header, and then you apply the PRG data by adding 0x10 to each address, and CHR data by adding 0x10 + PRG size to each address. For UNIF ROMs, you scan the file looking for chunks (skipping over unrecognized ones by reading the length field (which is 32-bit unsigned, little-endian) and skipping over that many bytes), and if you find a "PRG0" chunk, apply the PRG patch to the stuff therein, and if you find a "CHR0" chunk, apply the CHR patch to the stuff therein.
Over here is PRG data. Over here is CHR data. Deal with it.

...and that's not even taking into account the matter of detecting whether the ROM is iNES or UNIF, the copy-from-source/-result methods, applying meta-data to UNIF NAME/READ, handling mapper changes, or ROM expansion.
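For comparison, here is roughly what the patch-applying program has to do on its side, sketched in C. The names are hypothetical. The iNES magic bytes, the 8 KB CHR unit in header byte 5, and the 32-byte UNIF header come from the published iNES and UNIF layouts rather than from this thread, and the optional iNES trainer is ignored, as in the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: where a ROM region lives inside the file. */
struct region { size_t offset; size_t length; };

/* iNES: 16-byte header; byte 4 (the 5th byte) counts 16 KB PRG banks,
   byte 5 counts 8 KB CHR banks. CHR follows PRG directly. */
static int ines_locate(const unsigned char *file, size_t len,
                       struct region *prg, struct region *chr)
{
    assert(file != NULL && prg != NULL && chr != NULL);
    if (len < 16 || memcmp(file, "NES\x1A", 4) != 0)
        return -1;
    prg->offset = 0x10;
    prg->length = (size_t)file[4] * 16384u;
    chr->offset = prg->offset + prg->length;
    chr->length = (size_t)file[5] * 8192u;
    return chr->offset + chr->length <= len ? 0 : -1;
}

/* 32-bit unsigned little-endian, read one byte at a time so the host
   byte order does not matter. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* UNIF: skip the 32-byte header, then walk chunks made of a 4-byte ID,
   a 4-byte length and the data, skipping IDs we are not looking for. */
static int unif_find_chunk(const unsigned char *file, size_t len,
                           const char *id, struct region *out)
{
    size_t pos = 32;

    assert(file != NULL && id != NULL && out != NULL);
    if (len < 32 || memcmp(file, "UNIF", 4) != 0)
        return -1;
    while (len - pos >= 8) {
        uint32_t chunk_len = read_u32_le(file + pos + 4);
        if (chunk_len > len - pos - 8)
            return -1;              /* truncated file */
        if (memcmp(file + pos, id, 4) == 0) {
            out->offset = pos + 8;
            out->length = chunk_len;
            return 0;
        }
        pos += 8 + (size_t)chunk_len;
    }
    return -1;                      /* chunk not present */
}
```

The patcher would call ines_locate or unif_find_chunk(file, len, "PRG0", &r) depending on the magic bytes, then apply PRG blocks relative to the region it found. None of that logic has to appear in the patch itself.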

(edited by Vystrix Nexoth on 01-22-05 09:28 PM)

Heian-794

Red Super Koopa
Level: 44

Posts: 616/896
EXP: 611014
For next: 271

Since: 06-01-04
From: Kyoto, Japan

Since last post: 21 days
Last activity: 10 days

Posted on 01-23-05 07:36 AM

I have always been a big fan of Unicode and the 65,536 characters it offers, and think that having 00 in every other byte for most Latin characters is a small price to pay for having an easy-to-use standard. Unicode is also handy for accented Latin characters that ASCII inexplicably doesn't have -- such as o and u with macrons -- yet are really handy for transcribing Japanese titles in our alphabet. How does UTF-8 handle these?

Vystrix Nexoth

Level: 30

Posts: 234/348
EXP: 158678
For next: 7191

Since: 03-15-04
From: somewhere between anima and animus

Since last post: 3 days
Last activity: 2 days

Posted on 01-23-05 08:02 AM

ASCII specifies only the 0x00-0x7F codepoints that are common to Unicode and the ISO-8859 series. It does not have any accented national characters. The ISO-8859 series accommodates these by filling in the 0x80-0xFF range in various ways. Unicode, of course, accommodates them all.

Unicode does not have merely 65,536 codepoints available, it has 1,112,064 codepoints available (0x110000, minus 0x800 for the U+D800..U+DFFF range that is reserved for the UTF-16 escape mechanism).

UTF-8 can represent any codepoint. So can UTF-16. Therefore there is no difference in what UTF-8 and UTF-16 can represent (i.e. Unicode text), only how they actually do so.

the bytes-per-codepoint ratings for UTF-8 and UTF-16 are like so:

U+0000..U+007F (ASCII: A-Z, a-z, 0-9, and everyday punctuation): UTF-8 = 1 byte, UTF-16 = 2 bytes.
U+0080..U+07FF (Latin accented characters, Greek, Cyrillic, Hebrew, Arabic, and others): UTF-8 = 2 bytes, UTF-16 = 2 bytes.
U+0800..U+FFFF (many, but primarily Japanese/Chinese/Korean): UTF-8 = 3 bytes, UTF-16 = 2 bytes.
U+10000..U+10FFFF: UTF-8 = 4 bytes, UTF-16 = 4 bytes.

While UTF-8 is less efficient for e.g. Japanese text, Japanese requires fewer actual characters to represent a particular concept.

Latin-alphabet text, on the other hand (which accounts for English, French, German, Spanish, Swedish, among many others), generally takes one byte per character, or two for accented characters, so UTF-8, at the least, will break even with UTF-16.

And, again, UTF-8 is every bit as functional as UTF-16. It is simply another way to express the exact same thing; that thing being Unicode text.
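The byte counts listed above can be written as a short C sketch; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero for values no UTF can encode: the surrogate range
   U+D800..U+DFFF and anything above U+10FFFF. */
static int is_invalid(uint32_t cp)
{
    return cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu);
}

/* Bytes needed to encode one codepoint in UTF-8 (0 if invalid). */
static int utf8_bytes(uint32_t cp)
{
    if (is_invalid(cp)) return 0;
    if (cp <= 0x7Fu)    return 1;
    if (cp <= 0x7FFu)   return 2;
    if (cp <= 0xFFFFu)  return 3;
    return 4;
}

/* Bytes needed to encode one codepoint in UTF-16 (0 if invalid). */
static int utf16_bytes(uint32_t cp)
{
    if (is_invalid(cp)) return 0;
    return cp <= 0xFFFFu ? 2 : 4;
}

/* Sums the per-codepoint cost over a whole string of codepoints. */
static size_t total_bytes(const uint32_t *cps, size_t n,
                          int (*per_cp)(uint32_t))
{
    size_t i, total = 0;

    assert(cps != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += (size_t)per_cp(cps[i]);
    return total;
}
```

For a romanized title such as "Tōkyō" (U+0054, U+014D, U+006B, U+0079, U+014D) this gives 7 bytes in UTF-8 and 10 in UTF-16: the two macron letters cost two bytes each in either encoding, and the plain letters cost one byte in UTF-8.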

(edited by Vystrix Nexoth on 01-22-05 11:08 PM)

dan

Snap Dragon
Level: 43

Posts: 361/782
EXP: 534516
For next: 30530

Since: 03-15-04

Since last post: 20 hours
Last activity: 14 hours

Posted on 01-23-05 04:15 PM

I actually had a similar idea to Hyperhacker's in regards to the scripting language, for my attempted IPS replacement. However, any scripts that would be written would be compiled into a binary file, which would make a lot more sense than everyone who wants to write a patcher having to create a parser of a scripting language. (Not an easy thing to do, let's face it, implementing support for a file format is far easier than writing a tokenizing parser thingiemajobber)

However, for this patcher, a scripting language is not necessary. All there needs to be is some kind of standard logic for the patch creator, and it should be easy enough to implement. (ie. how to tell if a ROM is expanded, what to do if it is, etc)

Vystrix Nexoth

Level: 30

Posts: 235/348
EXP: 158678
For next: 7191

Since: 03-15-04
From: somewhere between anima and animus

Since last post: 3 days
Last activity: 2 days

Posted on 01-24-05 01:38 AM

embedding the patching logic in the patch itself would be very much like embedding low-level, font-rendering logic (how to turn a font and some given text into actual pixels, including handling kerning, hinting, anti-aliasing, and of course the actual bezier shapes) in a .txt file.

a .txt file contains text. it is left up to the rendering system to handle the process of turning that into pixels.

an UNRIF file contains patch data separated according to purpose. it is left up to the patch-applying program to make use of that data.

in other words, like so many other file formats, UNRIF stores data. it does not store how to use that data.

dan

Snap Dragon
Level: 43

Posts: 364/782
EXP: 534516
For next: 30530

Since: 03-15-04

Since last post: 20 hours
Last activity: 14 hours

Posted on 01-24-05 02:23 AM

No, I didn't mean to embed the logic inside the patch file. That would be ridiculous. I was proposing more a set of guidelines on how the patch creator should work out that a ROM was expanded and what to do in the patch.
