Handling binary files is something which almost automatically comes with working close to the hardware. While full-blown computers can be controlled very well with ASCII files only, embedded systems' limited resources often entail the use of binary data. Similarly, data-intensive applications like audio, image and video processing still make use of binary storage formats.
One day I found myself needing to do a trivial transformation to every single
data word of a binary file. While Pascal Rigaux's hexedit
is an extremely useful tool (and I have even extended it a bit), large-scale
manual edits are a pain in any editor. I thought to myself, "If this were
ASCII, I'd be done in under ten seconds using sed or vi. At
the outside I'd have to write a short Perl script." I briefly considered
producing a hexdump of the binary, editing that and somehow transforming the
changed dump back to binary. But I had never heard of a dump-to-binary
converter and did not really want to write a C program for a task that I would
need done only once. (Update: xxd can convert back from hexdump to binary, so
xxd - sed - xxd might be an alternative.)
That's when I decided to write bined. It is comparable in concept to a sed for binary files. It takes a fragment of Perl code on the command line which is used to transform a hexdump of the binary file. The simplest such code fragment, sufficient for many applications, is a regular expression. Being a Perl script, bined is not terribly fast, but sufficient for medium-sized files as needed in my original application. If you can give it a bit of time, it is good enough for doing simple things to a few MBs or doing more complicated things to tens of kBs.
Get bined: Download
Make it executable, put it somewhere in your path, extract the manual page with pod2man or pod2html, and enjoy!
bined - a batch editor for binary files
bined digits perlfragment sourcefile destinationfile
bined allows you to modify binary files in an automated way. It reads a binary file, converts it to a hexdump, modifies it with user-provided Perl code and transforms the result back to binary. This can be useful for processing uncompressed images, media format files or any other type of binary files which you may have to deal with.
digits gives the number of hexadecimal digits per word of the hexdump. If digits is 0, no spaces are inserted between words, and the hexdump will just be a string of hexadecimal digits. This is considerably faster. The downside is that you have to keep track of any alignment you need yourself. Also, depending on what you do to the data, the speed of the conversion may be insignificant by comparison.
The Perl fragment finds the hexdump in the special variable
$_. Successive words of the hexdump are separated by
single spaces.
The result of the transformation has to be placed also in the special variable
$_. If the alignment is to be kept, the number of digits per word has to
remain the same. Adding or removing digits results in inserted or deleted
data.
All variables except @ARGV may be modified by the Perl fragment. The variable
$data still contains the binary contents of the source file, should you need
that.
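The data flow described above can be sketched in a few lines of stand-alone Perl. This is not bined's actual source, only an illustration under the assumption that the hexdump consists of lower-case hexadecimal digits with single spaces between words. The sub names bin_to_hexdump, hexdump_to_bin and transform are made up for this example.

```perl
use strict;
use warnings;

# Turn binary data into a hexdump string. $digits is the number of hex
# digits per word; 0 means one unbroken string of digits.
sub bin_to_hexdump {
    my ($data, $digits) = @_;
    my $hex = unpack("H*", $data);
    return $hex if $digits == 0;
    # Split into words of $digits digits (the last one may be shorter).
    return join(" ", unpack("(A$digits)*", $hex));
}

# Turn a hexdump back into binary, ignoring all whitespace.
sub hexdump_to_bin {
    my ($dump) = @_;
    (my $hex = $dump) =~ s/\s+//g;
    return pack("H*", $hex);
}

# Run a piece of code that modifies $_ on the hexdump of $data and
# return the result converted back to binary.
sub transform {
    my ($data, $digits, $code) = @_;
    local $_ = bin_to_hexdump($data, $digits);
    $code->();
    return hexdump_to_bin($_);
}
```

With these helpers, transform($data, 0, sub { s/(\w\w)(\w\w)/$2$1/g }) would perform the same byte swap as the first example below.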
Byte-swap a file consisting entirely of 16-bit data:
bined 0 's/(\w\w)(\w\w)/$2$1/g' source dest
The alignment is correct because successive regular expression matches are non-overlapping.
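For comparison, the same byte swap can be written directly on the binary data with pack and unpack, which should be much faster for large files. This is a minimal sketch outside of bined; the sub name byteswap16 is made up for this illustration. Like the regular expression above, it leaves an odd trailing byte untouched.

```perl
use strict;
use warnings;

# Swap the two bytes of every 16-bit word in a binary string.
sub byteswap16 {
    my ($data) = @_;
    # Read the words as little-endian and write them back as big-endian.
    my $swapped = pack("n*", unpack("v*", $data));
    # unpack ignores an odd trailing byte, so append it unchanged.
    $swapped .= substr($data, -1) if length($data) % 2;
    return $swapped;
}
```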
Convert a file containing 16-bit short integers to 32-bit integers (both little-endian; the values are zero-extended, so this is only correct for non-negative numbers):
bined 4 's/(\w+)/${1}0000/g' source dest
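The substitution above appends two zero bytes to every word, which amounts to zero extension. A stand-alone sketch with pack and unpack makes the difference between zero extension and sign extension explicit. The sub names are made up for this example, and the < modifier in the second variant assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Widen little-endian unsigned 16-bit integers to 32 bits (zero extension).
sub widen16to32_unsigned {
    my ($data) = @_;
    return pack("V*", unpack("v*", $data));
}

# Widen little-endian signed 16-bit integers to 32 bits (sign extension).
sub widen16to32_signed {
    my ($data) = @_;
    return pack("l<*", unpack("s<*", $data));
}
```

The two variants only differ for words whose highest bit is set, that is for negative signed values.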
Convert a Portable Pixmap image which contains only grayscale data to a Portable Graymap image:
bined 2 's/^50 36/50 35/; s/(\w\w) \1 \1/$1/g' src.ppm dest.pgm
The first substitution changes the "P6" (meaning colour) in the header to "P5" (grayscale). The second substitutes the common brightness value for the three equal colour values. This rests on the assumption that the ASCII header does not contain three equal characters in succession, so this does not work on image sizes like 1000 or 666.
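If the header restriction is a problem, the header can be parsed separately from the pixel data. The following is a sketch in stand-alone Perl, not part of bined. It assumes a binary PPM with one byte per sample and no comment lines in the header, and it simply keeps the red value of each pixel, so like the one-liner it is only meaningful for images that are already gray. The sub name ppm_to_pgm is made up for this example.

```perl
use strict;
use warnings;

# Convert a binary PPM (P6) whose pixels are all gray to a binary PGM (P5).
sub ppm_to_pgm {
    my ($ppm) = @_;
    # Header: magic number, width, height, maximum value, then exactly
    # one whitespace character before the pixel data starts.
    $ppm =~ /\AP6\s+(\d+)\s+(\d+)\s+(\d+)\s/
        or die "not a binary PPM without header comments\n";
    my ($width, $height, $maxval) = ($1, $2, $3);
    my $header_len = $+[0];    # end of the header match
    die "only one byte per sample is supported\n" if $maxval > 255;
    my $pixels = substr($ppm, $header_len, 3 * $width * $height);
    die "truncated pixel data\n" if length($pixels) < 3 * $width * $height;
    # Keep the first of every three bytes.
    (my $gray = $pixels) =~ s/(.)../$1/sg;
    return "P5\n$width $height\n$maxval\n" . $gray;
}
```

Note that this rewrites the header with single separators, so whitespace variations of the original header are not preserved.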
Read a file containing fixed-size records 42 bytes long and print the little-endian short value at offset 16 from each record:
bined 84 's/\b\w{32}(\w\w)(\w\w)\w*/$2$1/g; s/\s/\n/g; print $_; exit;' \
datafile /dev/null
The exit is not strictly necessary but saves a little time by preventing the
unnecessary conversion back to binary. (The backslash at the end of the first
line only serves to tell the shell that the line will be continued; it is not
required if you type all this on the same line.) Similarly, conversions
between different fixed-size record formats are possible.
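The same extraction can be sketched in stand-alone Perl by stepping through the binary data record by record. This is only an illustration; the sub name and its parameters are made up, and it returns the values as numbers rather than printing their hexadecimal digits.

```perl
use strict;
use warnings;

# Return the little-endian 16-bit value found at byte offset $offset of
# every complete record of $reclen bytes in $data.
sub field_from_records {
    my ($data, $reclen, $offset) = @_;
    my @values;
    for (my $pos = 0; $pos + $reclen <= length($data); $pos += $reclen) {
        push @values, unpack("v", substr($data, $pos + $offset, 2));
    }
    return @values;
}
```

For the example above, one would call field_from_records($data, 42, 16) and print each value with printf "%04x\n" to get the same output format.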
Convert a hex dump to binary:
bined 2 '$_=""; while( defined(my $ln=<STDIN>) ) { $_ .= $ln; }' \
/dev/null hexdata.bin < hexdata.dump
The number of digits was chosen arbitrarily. It affects only the conversion of
the original file, which is a dummy here anyway. If your hexdump contains
hexadecimal offsets at the beginning of each line, you have to remove them
first, for instance with the command cut -f 2- (add -d ' ' if the offsets are separated from the data by spaces rather than tabs).
Compute a histogram of the bytes of a (small) file:
bined 0 'my @histo; while( $data =~ s/^(.)//s )
{ ++$histo[unpack("C", $1)]; }
for (0..255) { print "$_\t\t", $histo[$_] || 0, "\n"; }' \
file /dev/null
The histogram is generated while removing successive bytes from the start of
the binary data in $data. The unpack function converts each byte to a number. To process
multi-byte values, choose a different unpack code (see the perlfunc
manual page). For bytes, the ord function could also have been used. In
the printing loop, $_ is the loop variable, not the hexdump. The backslashes
after the first two lines were omitted because the shell knows the quotes have not
been closed yet.
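Removing one byte at a time from the front of a string gets slow as the file grows. As a sketch of an alternative in stand-alone Perl, all bytes can be unpacked in one go. The sub name byte_histogram is made up for this example.

```perl
use strict;
use warnings;

# Count how often each byte value occurs in a binary string.
# Returns a list of 256 counts, indexed by byte value.
sub byte_histogram {
    my ($data) = @_;
    my @histo = (0) x 256;
    $histo[$_]++ for unpack("C*", $data);
    return @histo;
}
```

Inside a bined fragment, the same idea would be to loop over unpack("C*", $data) instead of shortening $data in a while loop.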
Traverse a file containing variable-length records which contain their 16-bit length at offset 2, and print the short integer at offset 4:
bined 0 'while( length($data) ) { $data= substr($data,2);
my $len= unpack("S", $data); $data= substr($data,$len-2);
$len= ($len-6)*2; s/^.{8}(.{4}).{$len}//; print $1, "\n"; }' \
file /dev/null
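The traversal may be easier to follow when written out with an explicit position instead of shortening $data and the hexdump in parallel. The sketch below is stand-alone Perl, not part of bined. It assumes that the length field counts the whole record in bytes and that both fields are little-endian, whereas the "S" code in the fragment above uses the byte order of the machine. The sub name walk_records is made up for this example.

```perl
use strict;
use warnings;

# Walk through variable-length records. Each record stores its total
# length in bytes as a 16-bit value at offset 2 and the value of
# interest as a 16-bit value at offset 4 (both little-endian here).
sub walk_records {
    my ($data) = @_;
    my @values;
    my $pos = 0;
    while ($pos + 6 <= length($data)) {
        # Skip two bytes, then read the length and the value.
        my ($len, $value) = unpack("x2 v v", substr($data, $pos, 6));
        last if $len < 6;    # malformed length, avoid an endless loop
        push @values, $value;
        $pos += $len;
    }
    return @values;
}
```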
hexedit(1), bvi(1), xxd(1), sed(1)
bined is too slow for serious binary processing, and for large files.
Bug reports are welcome to the e-mail `perl' at the domain volkerschatz.com.
bined is (c) 2008 Volker Schatz. It is free software and may be copied and/or modified under the same terms as Perl.