
bined - an automated binary editor

Handling binary files comes almost automatically with working close to the hardware. While full-blown computers can be controlled very well with ASCII files only, the limited resources of embedded systems often entail the use of binary data. Similarly, data-intensive applications like audio, image and video processing still make use of binary storage formats.

One day I found myself needing to do a trivial transformation to every single data word of a binary file. While Pascal Rigaux's hexedit is an extremely useful tool (and I have even extended it a little), large-scale manual edits are a pain in any editor. I thought to myself, "If this were ASCII, I'd be done in under ten seconds using sed or vi. At the outside I'd have to write a short Perl script." I briefly considered producing a hexdump of the binary, editing that and somehow transforming the changed dump back to binary. But I had never heard of a dump-to-binary converter and did not really want to write a C program for a task that I would need done only once. (Update: xxd -r can convert a hexdump back to binary, so a pipeline of xxd, sed and xxd -r might be an alternative.)

That's when I decided to write bined. In concept it is comparable to a sed for binary files. It takes a fragment of Perl code on the command line, which is used to transform a hexdump of the binary file. The simplest such code fragment, sufficient for many applications, is a regular expression substitution. Being a Perl script, bined is not terribly fast, but it is fast enough for medium-sized files like those in my original application. If you can give it a bit of time, it is good enough for doing simple things to a few MB or more complicated things to tens of kB.

Get bined: Download

Make it executable, put it somewhere in your path, extract the manual page with pod2man or pod2html, and enjoy!


bined's manual page



NAME

bined - a batch editor for binary files


SYNOPSIS

bined digits perlfragment sourcefile destinationfile


DESCRIPTION

bined allows you to modify binary files in an automated way. It reads a binary file, converts it to a hexdump, modifies it with user-provided Perl code and transforms the result back to binary. This can be useful for processing uncompressed images, media format files or any other type of binary files which you may have to deal with.
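The round trip described above can be sketched in a few lines of Perl. This is only an illustration under assumptions, not bined's actual source: the helper names to_hexdump, from_hexdump and apply_fragment are invented here, and it is assumed that whitespace in the hexdump is simply discarded when converting back.

```perl
use strict;
use warnings;

# Binary string -> hexdump with $digits hex digits per word (0 = no spaces).
sub to_hexdump {
    my ($data, $digits) = @_;
    my $hex = unpack("H*", $data);
    return $hex if $digits == 0;
    return join(" ", unpack("(A$digits)*", $hex));
}

# Hexdump -> binary string; whitespace is assumed to be insignificant.
sub from_hexdump {
    my ($hex) = @_;
    $hex =~ s/\s+//g;
    return pack("H*", $hex);
}

# Run a Perl fragment on the hexdump in $_ and return the new binary data.
sub apply_fragment {
    my ($data, $digits, $fragment) = @_;
    local $_ = to_hexdump($data, $digits);
    eval $fragment;
    die $@ if $@;
    return from_hexdump($_);
}

# Replace every ff byte by 00.
my $out = apply_fragment("\x01\xff\x02\xff", 2, 's/ff/00/g');
print unpack("H*", $out), "\n";    # 01000200
```

Because the fragment is evaluated inside apply_fragment, it can also see the lexical $data in this sketch, which mirrors the behaviour documented under ARGUMENTS.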


ARGUMENTS

digits
Number of hexadecimal digits (not bytes) in each word of the hexdump. The order of the digits is always the same; no swapping of bytes will be done to account for little-endian storage conventions. Within each byte, the most significant digit comes first, and the bytes are printed in the order in which they appear in the file.

If digits is 0, no spaces are inserted between words, and the hexdump will just be a string of hexadecimal digits. This is considerably faster. The downside is that you have to keep track of any alignment you need yourself. Also, depending on what you do to the data, the speed of the conversion may be insignificant by comparison.
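To make the effect of digits concrete, here is a small sketch that dumps the same four bytes with different word sizes. The helper hexdump_words is hypothetical and only imitates the format described above; note that the little-endian value 0x1234 shows up as 3412 because the bytes are printed in file order.

```perl
use strict;
use warnings;

# Hypothetical helper: hexdump with $digits hex digits per word, bytes in file order.
sub hexdump_words {
    my ($data, $digits) = @_;
    my $hex = unpack("H*", $data);
    return $digits ? join(" ", unpack("(A$digits)*", $hex)) : $hex;
}

# Two little-endian 16-bit values, 0x1234 and 0xabcd, as stored in a file.
my $data = pack("v2", 0x1234, 0xabcd);

print hexdump_words($data, 2), "\n";   # 34 12 cd ab
print hexdump_words($data, 4), "\n";   # 3412 cdab
print hexdump_words($data, 0), "\n";   # 3412cdab
```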

perlfragment
Perl code which transforms the hexdump. The original data is stored in the special Perl variable $_. Successive words of the hexdump are separated by single spaces.

The result of the transformation must also be placed in the special variable $_. If the alignment is to be kept, the number of digits per word has to remain the same. Adding or removing digits results in inserted or deleted data.
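The following sketch illustrates how the digit count affects the output size. It works on a hexdump string directly and assumes, as a simplification, that whitespace is dropped when the dump is converted back to binary.

```perl
use strict;
use warnings;

# Hexdump of the bytes 11 22 33 44 with two digits per word.
my $dump = "11 22 33 44";

# Same number of digits per word: the data keeps its length.
(my $same = $dump) =~ s/22/ff/;

# Deleting a whole word removes one byte from the output.
(my $shorter = $dump) =~ s/22 //;

# Adding two digits to a word inserts one byte.
(my $longer = $dump) =~ s/22/2200/;

for my $d ($same, $shorter, $longer) {
    (my $hex = $d) =~ s/\s+//g;    # assumption: whitespace is dropped on conversion
    printf "%-14s -> %d bytes\n", $d, length(pack("H*", $hex));
}
```

The three results are 4, 3 and 5 bytes long, so everything after the edited word moves by one byte in the second and third case.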

All variables except @ARGV may be modified by the Perl fragment. The variable $data still contains the binary contents of the source file, should you need that.

sourcefile, destinationfile
The file to modify, and the destination path to write to. The destination file will be overwritten without prompting.


EXAMPLES

Basic usage

Byte-swap a file consisting entirely of 16-bit data:

    bined 0 's/(\w\w)(\w\w)/$2$1/g' source dest

The alignment is correct because successive regular expression matches are non-overlapping.
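The same substitution can be tried on a few bytes in plain Perl, without bined. This is just a sketch for checking the regular expression: each match consumes exactly four digits, so the next match starts at the next 16-bit word.

```perl
use strict;
use warnings;

# Three 16-bit big-endian words: 0x0102, 0x0304, 0x0506.
my $data = pack("n3", 0x0102, 0x0304, 0x0506);

my $hex = unpack("H*", $data);     # "010203040506", as with digits = 0
$hex =~ s/(\w\w)(\w\w)/$2$1/g;     # swap the two bytes of every word
my $swapped = pack("H*", $hex);

# Read back as little-endian: the values are unchanged.
print join(" ", map { sprintf "%04x", $_ } unpack("v3", $swapped)), "\n";   # 0102 0304 0506
```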

Convert a file containing 16-bit short integers to 32-bit integers (both little-endian):

    bined 4 's/(\w+)/${1}0000/g' source dest
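As a quick check of this substitution, the sketch below applies it to three sample values and reads the result back as 32-bit little-endian integers. The sample values are made up. Since the appended bytes are always zero, the values are zero-extended, which is only correct for unsigned or non-negative numbers.

```perl
use strict;
use warnings;

my @values = (0x1234, 0x00ff, 0xbeef);
my $data   = pack("v*", @values);                             # 16-bit little-endian

my $dump = join(" ", unpack("(A4)*", unpack("H*", $data)));   # "3412 ff00 efbe"
$dump =~ s/(\w+)/${1}0000/g;                                  # append two zero bytes per word

(my $hex = $dump) =~ s/\s+//g;
my @wide = unpack("V*", pack("H*", $hex));                    # 32-bit little-endian
print join(" ", map { sprintf "%08x", $_ } @wide), "\n";      # 00001234 000000ff 0000beef
```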

Convert a Portable Pixmap image which contains only grayscale data to a Portable Graymap image:

    bined 2 's/^50 36/50 35/; s/(\w\w) \1 \1/$1/g' src.ppm dest.pgm

The first substitution changes the "P6" (meaning colour) in the header to "P5" (grayscale). The second substitutes the common brightness value for the three equal colour values. This rests on the assumption that the ASCII header does not contain three equal characters in succession, so it does not work for image sizes like 1000 or 666.
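Here is a sketch of the two substitutions applied to a tiny hand-made image, a 2x1 PPM with two grey pixels. The image is invented for the test; its header contains "255", which has only two equal characters in a row and is therefore left alone.

```perl
use strict;
use warnings;

# A 2x1 binary PPM whose two pixels are grey (r = g = b).
my $ppm = "P6\n2 1\n255\n" . pack("C*", 0x40, 0x40, 0x40, 0xc8, 0xc8, 0xc8);

my $dump = join(" ", unpack("(A2)*", unpack("H*", $ppm)));
$dump =~ s/^50 36/50 35/;          # "P6" -> "P5"
$dump =~ s/(\w\w) \1 \1/$1/g;      # three equal bytes -> one byte

(my $hex = $dump) =~ s/\s+//g;
my $pgm = pack("H*", $hex);
print length($ppm), " -> ", length($pgm), " bytes\n";   # 17 -> 13 bytes
```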

Advanced usage

Read a file containing fixed-size records 42 bytes long and print the little-endian short value at offset 16 from each record:

    bined 84 's/\b\w{32}(\w\w)(\w\w)\w*/$2$1/g; s/\s/\n/g; print $_; exit;' \
             datafile /dev/null

The exit is not strictly necessary but saves a little time by preventing the unnecessary conversion back to binary. (The backslash at the end of the first line only serves to tell the shell that the line will be continued; it is not required if you type all this on the same line.) Similarly, conversions between different fixed-size record formats are possible.
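The record regular expression can be tested on synthetic data as follows. The record contents and the helper make_record are hypothetical; only the layout (42 bytes, little-endian short at offset 16) is taken from the example above.

```perl
use strict;
use warnings;

# Build a hypothetical 42-byte record with a little-endian short at offset 16.
sub make_record {
    my ($value) = @_;
    return ("\x00" x 16) . pack("v", $value) . ("\xee" x 24);
}
my $data = make_record(0x1234) . make_record(0xbeef);

# One word per record: 42 bytes = 84 hex digits.
my $dump = join(" ", unpack("(A84)*", unpack("H*", $data)));

# Skip 16 bytes (32 digits), swap the next two bytes, drop the rest of the word.
$dump =~ s/\b\w{32}(\w\w)(\w\w)\w*/$2$1/g;
$dump =~ s/\s/\n/g;
print $dump, "\n";
```

This prints 1234 and beef on separate lines, one value per record.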

Convert a hex dump to binary:

    bined 2 '$_=""; while( defined(my $ln=<STDIN>) ) { $_ .= $ln; }' \
                /dev/null hexdata.bin < hexdata.dump

The number of digits was chosen arbitrarily. It affects only the conversion of the original file, which is a dummy here anyway. If your hexdump contains hexadecimal offsets at the beginning of each line, you have to remove them first, for instance with cut -f 2- if the fields are tab-separated, or with cut -d ' ' -f 2- if they are separated by single spaces.
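If you would rather strip the offsets in Perl, something along the following lines should do. It is a sketch under assumptions: the function dump_to_binary is hypothetical, and the dump is assumed to consist of whitespace-separated fields with the offset, if present, as the first field of each line.

```perl
use strict;
use warnings;

# Convert hexdump text to binary. If $has_offsets is true, the first
# whitespace-separated field of every line is treated as an offset and dropped.
sub dump_to_binary {
    my ($text, $has_offsets) = @_;
    my $hex = "";
    for my $line (split /\n/, $text) {
        my @fields = split " ", $line;
        shift @fields if $has_offsets;
        $hex .= join("", @fields);
    }
    return pack("H*", $hex);
}

my $text = "0000 48 65 6c\n0003 6c 6f 0a\n";
print dump_to_binary($text, 1);    # prints "Hello" and a newline
```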

Compute a histogram of the bytes of a (small) file:

    bined 0 'my @histo; while( $data =~ s/^(.)//s )
                { ++$histo[unpack("C", $1)]; }
            for (0..255) { print "$_\t\t", $histo[$_] || 0, "\n"; }' \
            file /dev/null

The histogram is generated while removing successive bytes from the start of the binary data in $data; the hexdump in $_ is not used. The unpack function converts each byte to a number. To process multi-byte values, match the corresponding number of bytes and choose a different unpack code (see the perlfunc manual page). For bytes, the ord function could also have been used. In the printing loop, $_ is the loop variable, not the hexdump. The backslashes after the first lines were omitted because the shell knows the quotes have not been closed yet.
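For comparison, here is a sketch of a different way to compute the same kind of histogram: it unpacks all values at once instead of removing them one by one. The function histogram and its $code parameter are invented for this illustration; $code is an unpack template, so switching from bytes to 16-bit words only means passing another code.

```perl
use strict;
use warnings;

# Count how often each value occurs; $code is an unpack template such as
# "C" (bytes) or "v" (little-endian 16-bit words).
sub histogram {
    my ($data, $code) = @_;
    my %histo;
    $histo{$_}++ for unpack("$code*", $data);
    return \%histo;
}

my $bytes = histogram("abracadabra", "C");
printf "%s\t%d\n", chr($_), $bytes->{$_} for sort { $a <=> $b } keys %$bytes;
```

Unlike the array in the example above, the hash holds only values that actually occur, which keeps the output short for multi-byte codes.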

Traverse a file containing variable-length records which contain their 16-bit length at offset 2, and print the short integer at offset 4:

    bined 0 'while( length($data) ) { $data= substr($data,2);
            my $len= unpack("S", $data); $data= substr($data,$len-2);
            $len= ($len-6)*2; s/^.{8}(.{4}).{$len}//; print $1, "\n"; }' \
            file /dev/null
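The traversal is easier to follow when written out as a stand-alone script. The sketch below is not what the one-liner does internally: it walks through the binary data with an offset instead of consuming $data and the hexdump in parallel. The record layout (2-byte tag, 2-byte total length, 2-byte value, payload) and the little-endian byte order are assumptions made for the test data.

```perl
use strict;
use warnings;

# Hypothetical record: 2-byte tag, 2-byte total length, 2-byte value, payload.
# All fields are assumed little-endian here.
sub make_record {
    my ($value, $payload) = @_;
    return pack("v3", 0xaaaa, 6 + length($payload), $value) . $payload;
}

# Return the value at offset 4 of every record.
sub record_values {
    my ($data) = @_;
    my @values;
    my $pos = 0;
    while ($pos + 6 <= length($data)) {
        my ($len, $value) = unpack("x2 v v", substr($data, $pos, 6));
        last if $len < 6;          # guard against a corrupt length field
        push @values, $value;
        $pos += $len;
    }
    return @values;
}

my $data = make_record(1, "") . make_record(2, "xyz") . make_record(3, "0123456789");
print join(" ", record_values($data)), "\n";   # 1 2 3
```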


SEE ALSO

hexedit(1), bvi(1), xxd(1), sed(1)


BUGS

bined is too slow for serious binary processing, and for large files.

Bug reports are welcome to the e-mail `perl' at the domain volkerschatz.com.


LICENSE

bined is (c) 2008 Volker Schatz. It is free software and may be copied and/or modified under the same terms as Perl.