Acmlm's Board - I2 Archive - Programming - free() crashes program
HyperLamer
Posts: 3554/8210
Er, I forgot to include the part that does that; this segment just finds the best match. I actually got it working the way it should, but it looks like there are still other tricks being used here. They must have bought this program off someone; it's far too efficient for them.
Parasyte
Posts: 333/514
Where does that code "skip however many bytes matched in the input"? It appears to be doing exactly what it should...
Shame the format only supports 18 bytes with a window of 4096 bytes. That just sounds like bad design.
HyperLamer
Posts: 3553/8210
The LZ data uses 4 bits for length and 12 for position, so the window is 4096 bytes. It adds 3 to the length because that's the minimum match worth encoding (the LZ token itself is 2 bytes), so a match has to be between 3 and 18 bytes. It doesn't allow for any other type of compression.
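For illustration, a 2-byte token like that could be packed and unpacked along these lines (this assumes the length sits in the high 4 bits; the game's actual bit layout may differ):

/* Hypothetical layout: 4 length bits (stored minus 3) in the high nibble,
   12 position bits in the rest. Stored 0..15 maps to real lengths 3..18. */
unsigned short PackToken(int Length, int Position)
{
    return (unsigned short)(((Length - 3) << 12) | (Position & 0x0FFF));
}

void UnpackToken(unsigned short Token, int *Length, int *Position)
{
    *Length = ((Token >> 12) & 0x0F) + 3; /* 3..18 bytes */
    *Position = Token & 0x0FFF;           /* 0..4095, the 4096-byte window */
}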

This is the code so far that finds the best match:
BestMatch = 0;
for(i = 4096; i > 0; i--) //Search the last 4096 bytes for a match
{
    if((InPos - i) >= 0) //Don't try to look before the beginning of the file
    {
        j = 0;
        Match = true;
        while((Match == true) && (j < 18)) //Matches are capped at 18 bytes
        {
            if(InputFileData[(InPos - i) + j] != InData[j])
                Match = false; //Mismatch; exit the while loop
            else
                j++;
        }
        if(j > BestMatch) //New best match: j bytes, i bytes back
        {
            BestMatch = j;
            BestMatchLoc = i; //Relative position
        }
        if(BestMatch >= 18) break; //Can't get any better, may as well stop looking
    }
}

Whenever a match is found, it skips however many bytes matched in the input. If there isn't one, it just includes one byte raw and tries again with the next 18 bytes. What I imagine I need to do is loop this entire segment so that instead of skipping the bytes that were matched, it checks them as well for bigger matches. The only part of this I don't really understand is how this can work when the number of bytes being skipped potentially varies every time the loop executes. (I'm gonna keep trying stuff anyway, though.)
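For what it's worth, a varying skip is normally handled by letting a single position variable advance by a different amount each pass, roughly like this (FindBestMatch, EmitToken and EmitRaw are hypothetical helpers wrapping the code above):

while(InPos < InputSize)
{
    FindBestMatch(); //Sets BestMatch and BestMatchLoc, as in the code above
    if(BestMatch >= 3)
    {
        EmitToken(BestMatch, BestMatchLoc); //2-byte LZ token
        InPos += BestMatch;                 //Skip the matched bytes
    }
    else
    {
        EmitRaw(InputFileData[InPos]); //No useful match; copy one byte raw
        InPos++;
    }
}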
Parasyte
Posts: 332/514
Uhh, use a few nested while loops to search for the "best match". You generally have two pointers in an LZ-type compression routine: source and destination. You need to search through the 'window' location which is allowed as the source for the copy. This is usually so-many-bytes (or all) of the previously output data. The outer loop searches for a matching byte in the output (destination). The inner loop then checks how many bytes match, and records the result. Both loops are allowed to finish the search (without breaking or returning), recording results along the way. When completed, you are left with a "best result" to use. So if location 0x0110 had 5 matching bytes, location 0x01AC had 8 matching bytes, and location 0x0201 had 7 matching bytes, the best result would be 0x01AC.
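A minimal sketch of that two-loop search, with illustrative names (window_start, max_len, data and so on are assumptions, not part of any particular format):

/* Outer loop: try every candidate position in the window.
   Inner loop: count how many bytes match. Neither loop breaks early;
   the best result is simply recorded along the way. */
int best_len = 0, best_pos = 0;
for(int pos = window_start; pos < dest; pos++)
{
    int len = 0;
    while(len < max_len && (dest + len) < data_len
          && data[pos + len] == data[dest + len])
        len++;
    if(len > best_len) //e.g. 8 bytes at 0x01AC beats 5 at 0x0110 and 7 at 0x0201
    {
        best_len = len;
        best_pos = pos;
    }
}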

This will completely eliminate the problem you are experiencing. You seem to be checking for a hard-coded number of matches, rather than searching for an "infinite" (not really, but close) number of matches. And while loops help accomplish that.

The real problem with writing a compression routine is adjusting the weight of the compression codes vs. the uncompressed data. If you have a string of 2 bytes which are all 0x00, and the compression code for RLE takes up 3 bytes (RLE ID, length, data), then you're not really going to get a very good compression ratio with that. So it's imperative that you decide which cases will work best and use them. Another example: if you have the ability to use RLE in 2 bytes, or LZ in 3 bytes, you must balance the decision of which to use. "Can I squeeze this string down into two RLEs, or one LZ?" Two RLEs will result in a worse ratio.
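As a quick illustration of that break-even arithmetic (the 3-byte RLE code size is taken from the example above; real formats vary):

#define RLE_CODE_SIZE 3 //RLE ID, length, data

/* A run is only worth RLE-encoding if it's longer than the code itself;
   a 2-byte run of 0x00 would grow from 2 bytes to 3 if encoded. */
int WorthRLE(int RunLength)
{
    return RunLength > RLE_CODE_SIZE;
}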
Using LZ to copy its own output is a good way to output many matches. If your source is at 0x0100 and your destination is at 0x0101, then you're only going to copy one byte to output. But if you copy more than one byte, you end up with an RLE-type compression which can generally have a much larger output than RLE allots. RLE may only allow 128 bytes of output per compression code; the LZ may allow 256, or 512, etc. So you can see how that idea could really come in handy to reduce the compressed output size.
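That trick works because the decoder copies byte by byte, so bytes it has just written become valid source data. A sketch of such a copy (buffer and variable names are illustrative):

/* Copy Length bytes from Offset bytes back. When Offset < Length the
   source overlaps the destination, repeating recent output RLE-style. */
void LZCopy(unsigned char *Buf, int Dest, int Offset, int Length)
{
    int k;
    for(k = 0; k < Length; k++)
        Buf[Dest + k] = Buf[(Dest - Offset) + k]; //Byte at a time, never memcpy()
}

/* Example: with Buf[0x100] = 0xAA, LZCopy(Buf, 0x101, 1, 255) fills
   0x101..0x1FF with 0xAA -- 255 output bytes from one token. */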

I've run into these problems quite a lot. And it just takes some really daring programming practices to make it work. For an example, you can look at the compressor and decompressor I wrote for Spartan X 2 (Kung Fu 2) translation (sources included, of course): http://abstractcrouton.emuxhaven.net/utils/kf2pack/kf2pack.html
HyperLamer
Posts: 3539/8210
Cool, thanks. Now though, I've run into something more difficult. (Frustratingly enough, I know exactly what's wrong, just not how to go about fixing it.) I'm writing a program that compresses data into a format used in some N64 games. The compression works, but isn't as efficient as the original: decompressing data from a game and re-compressing it yields a bigger (though still good) compressed file. Looking at the original file, I noticed that Nintendo's compressor appears to skip small matches in favour of large ones. (The program looks through previous data in the input file for a match of no less than 3 and no more than 18 bytes to the 18 bytes at the current position; an LZ77-style compression.) For example, my program finds that 3 bytes at 0x20E are identical to the 3 at 0x199, so it writes that into the output and skips along to 0x211. However, Nintendo's compressor appears to have skipped this match, because by doing so, it can see that 5 bytes at 0x20F are identical to 5 bytes at 0x1DD. Doing this improves the compression ratio; the question is, how would I go about doing it? (I'd try, but I don't have the time. Today's been so busy, meetings and dentist appointments and so on.)
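That behaviour sounds like one-step lazy matching: before committing to the match at the current position, check whether a match starting one byte later would be longer. A hedged sketch (FindBestMatchAt is a hypothetical helper returning the best match length at a given position; EmitToken and EmitRaw as before):

int HereLoc, NextLoc;
int Here = FindBestMatchAt(InPos, &HereLoc);
int Next = FindBestMatchAt(InPos + 1, &NextLoc);
if(Next > Here) //e.g. 5 bytes at 0x20F beat 3 bytes at 0x20E
{
    EmitRaw(InputFileData[InPos]); //Give up the short match...
    InPos++;                       //...so the longer one is taken next pass
}
else if(Here >= 3)
{
    EmitToken(Here, HereLoc);
    InPos += Here;
}
else
{
    EmitRaw(InputFileData[InPos]);
    InPos++;
}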
Parasyte
Posts: 329/514
Originally posted by neotransotaku
... and why shouldn't you write before the end of the file with 'a'? I mean, the mode is there to allow you to append. Otherwise, you have to figure out beforehand how big the file is before you can fseek to the end for writing.


Simple answer: Because the mode is for appending data, not overwriting pre-existing data. ;P

A note about the 'a+' mode (reference: http://www.cplusplus.com/ref/cstdio/fopen.html) -- "Open a file for reading and appending. All writing operations are done at the end of the file protecting the previous content to be overwritten. You can reposition (fseek, rewind) the pointer to anywhere in the file for reading, but writing operations will move back to the end of file. The file is created if it doesn't exist."

Therefore, you must use 'r+' for reading and writing. (And yes, you can APPEND data to the file in this mode as well.) The only problem with it is that the file must exist, so you should check the return value to see if the function failed; if it did, create the file ("wb") and try again. As a matter of fact, you could use freopen() to skip closing and reopening it after creation. For example:

//Handle files that do not exist
OutputFile = fopen(OutputFileName, "r+b");
if (!OutputFile) {
    OutputFile = fopen(OutputFileName, "wb"); //Create the file
    if (!OutputFile) return 1; //Could not create file
    OutputFile = freopen(OutputFileName, "r+b", OutputFile); //Reopen for read/write
    if (!OutputFile) return 1; //Could not open file
}
Dish
Posts: 290/596
Originally posted by ||bass
Why aren't you just using the fstream classes? It would be a hell of a lot easier and it's much harder to f-up by accident.


meh... the new method of I/O streams was one thing C++ tried to improve on that I REALLY just didn't like.

Give me FILE* over ifstream/ofstream any day of the week.
||bass
Posts: 208/817
Why aren't you just using the fstream classes? It would be a hell of a lot easier and it's much harder to f-up by accident.
neotransotaku
Posts: 2520/4016
I'm not sure what you are doing, HH... 'a' and 'a+' aren't supposed to clobber an existing file. And why shouldn't you write before the end of the file with 'a'? I mean, the mode is there to allow you to append. Otherwise, you have to figure out beforehand how big the file is before you can fseek to the end for writing.
Dish
Posts: 289/596
Try "r+b" then. opens and allows for writing, but doesn't erase anything. I thought you could add to the end of the file too....
HyperLamer
Posts: 3538/8210
I always thought you couldn't (or at least shouldn't) write before the end of the file with 'a'. And fseek() doesn't help much when just opening the file deletes it.
Dish
Posts: 288/596
You can't write past the end with "r+b"? Never mind, then.

I knew you could seek back to the start of the file in "a" mode... I just didn't think you were supposed to.
neotransotaku
Posts: 2517/4016
With fseek(), you can write anywhere.

Using 'a' allows you to keep writing to an existing file. Otherwise, you would have to open the file, read it all in, write it back out, and only then start writing new data. So having the 'a' option reduces the amount of work you have to do.
Dish
Posts: 287/596
"r+b"

r = read permission (open without clear)
+ = add write permission
b = binary mode

"ab" may also work, although messing with data before the end of file might be taboo. Basically "a" is only for when you wish to append data to the end of the file and nothing else. or at least that's what I always thought.
neotransotaku
Posts: 2515/4016
There is a wb+?

Anyways, based on the documentation for the fopen I'm familiar with, just change the w to an a.
HyperLamer
Posts: 3537/8210
Yeah, I got it. Though another, presumably simple problem arises. How do I open a file in binary mode without deleting it? Using wb+ deletes the old file; I want to modify the file instead, or create one if there isn't one there already.
Dish
Posts: 286/596
Originally posted by HyperHacker
Hmm, that seems to have been the problem. (Had a for loop writing to it without checking if it was at the end.) Though I don't see why it would only happen in debug mode, the loop executes regardless.


If it's crashing in debug mode, it's likely causing unseen problems in Release even if it doesn't necessarily crash. It's definitely something you should look into remedying. malloc/free likely have different bodies depending on which type of build you do (they might be a little more strict/cautious in a debug build, for example).


Or for that matter, why it would run fine until I try to free it.


malloc() actually allocates more memory than you specify, and stores other information about the allocated buffer near the supplied pointer. If this data gets corrupted by an out-of-bounds write, it may not be noticeable until free() looks at the data and says "wait a minute, something ain't right here". free() may potentially try to free memory which hasn't been allocated by your program due to the false/corrupt info, which could cause a crash.

This really is sounding like an out of bounds problem... keep looking for something to that effect. Try commenting out sections of your code to try and isolate which section is causing the corruption.
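A minimal demonstration of that failure mode (this is undefined behavior, so the exact symptom depends on the C runtime; debug heaps often only notice it inside free()):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);
    if(!buf) return 1;

    /* 17 bytes into a 16-byte block: the extra byte can land on the
       allocator's bookkeeping next to the block. Nothing visibly fails
       here, but free() may crash when it reads that corrupted metadata. */
    memset(buf, 0x69, 17);

    free(buf); //The crash (or silent corruption) typically surfaces here
    return 0;
}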

edit:

Or never mind... is the problem solved? I re-read your post and it sounded like you fixed it.
HyperLamer
Posts: 3536/8210
Hmm, that seems to have been the problem. (Had a for loop writing to it without checking if it was at the end.) Though I don't see why it would only happen in debug mode, since the loop executes regardless. (The only thing debug mode does besides a bunch of printf()s is filling the memory block with a known value, and I triple-checked that there wasn't a problem there.) Or for that matter, why it would run fine until I try to free it.
Dish
Posts: 285/596
First problem that comes to mind is that you're overstepping the bounds of the buffer somewhere and corrupting memory outside what you allocated. Double check all your OutputFileData writes to make sure you stay in bounds. Sometimes it can be tricky to find these things.
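One low-tech way to catch that kind of write at its source is to funnel every store through a checked helper (a sketch; OutputFileData and OutputSize are the names from the code below):

#include <assert.h>

/* Hypothetical checked store: abort loudly at the bad write instead of
   corrupting the heap and crashing much later in free(). */
void PutByte(unsigned char *Buf, long Size, long Pos, unsigned char Value)
{
    assert(Pos >= 0 && Pos < Size); //Catch the overrun where it happens
    Buf[Pos] = Value;
}

/* e.g. PutByte(OutputFileData, OutputSize, OutPos, InputFileData[RawPtr]); */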

I might be able to help you find it with the full source (provided it's not huge).
HyperLamer
Posts: 3535/8210
This is an odd one... free() seems to be crashing my program with certain-size memory blocks.

First, I open a file and allocate some memory:
OutputFile = fopen(OutputFileName, "wb+");
[...]
OutputFileData = malloc(OutputSize);


Next, write some stuff into the memory, and dump it into the file:
OutputFileData[OutPos] = InputFileData[RawPtr];
[...]
fwrite(OutputFileData,1,OutputSize,OutputFile); //Write the output
fclose(OutputFile);


Finally, free the memory:
free(OutputFileData);

The program runs, and the file output works fine, but it crashes at the free() statement. It's a console program, so none of that Windows API stuff. The strangest part, though, is that it doesn't crash every time. Various inputs produce output files of 44 bytes, 51200 bytes, and 100878 bytes. It only crashes with 51200, and does so every time. (Meanwhile the file itself is 100% intact.)

[edit] Interesting... The problem seems to be related to printf(). When the program is in debug mode it prints various extra information, as well as initializing the memory to a known value (which isn't required, but helps to detect when too much is being output). With debug mode off, it doesn't crash, but I tried leaving it on and commenting out each debug mode statement one by one, and it still crashed. So somehow the problem must be related to multiple printf() statements?

BTW, the code used to initialize the memory:

if(DebugMode)
{
    for(i = 0; i < OutputSize; i++)
    {
        OutputFileData[i] = 0x69; //Fill with a known value
    }
}

*kicks code tag* Stop eating my freakin sig!