#!/usr/bin/perl

use strict;

=pod

=head1 NAME

bined - a batch editor for binary files

=head1 SYNOPSIS

B<bined> I<digits> I<code> I<source> I<dest>

=head1 DESCRIPTION

B<bined> allows you to modify binary files in an automated way.  It reads a
binary file, converts it to a hexdump, modifies it with user-provided Perl code
and transforms the result back to binary.  This can be useful for processing
uncompressed images, media format files or any other type of binary file you
may have to deal with.

=head1 ARGUMENTS

=over

=item I<digits>

Number of hexadecimal digits (not bytes) in each word of the hexdump.  The
order of the digits is always the same; no bytes are swapped to account for
little-endian storage conventions.  Within each byte, the most significant
digit comes first, and the bytes are printed in the order in which they appear
in the file.

If I<digits> is 0, no spaces are inserted between words, and the hexdump is
just one string of hexadecimal digits.  This is considerably faster.  The
downside is that you have to keep track of any alignment you need yourself.
Also, depending on what you do to the data, the speed of the conversion may be
insignificant by comparison.

=item I<code>

Perl code which transforms the hexdump.  The original data is stored in the
special Perl variable C<$_>.  Successive words of the hexdump are separated by
single spaces.  The result of the transformation must also be placed in
C<$_>; all whitespace is removed from it before it is converted back to
binary.  If the alignment is to be kept, the number of digits per word has to
remain the same.  Adding or removing digits results in inserted or deleted
data.  All variables except C<@ARGV> may be modified by the Perl fragment.  The
variable C<$data> still contains the binary contents of the source file, should
you need that.

=item I<source>, I<dest>

The file to modify, and the destination path to write to.  The destination file
will be overwritten without prompting.

=back

=head1 EXAMPLES

=head2 Basic usage

Byte-swap a file consisting entirely of 16-bit data:

  bined 0 's/(\w\w)(\w\w)/$2$1/g' source dest

The alignment is correct because successive regular expression matches are
non-overlapping.

Convert a file containing 16-bit short integers to 32-bit integers (both
little-endian):

  bined 4 's/(\w+)/${1}0000/g' source dest

The added digits are zeros, so this is correct only for unsigned values.

Convert a Portable Pixmap image which contains only grayscale data to a
Portable Graymap image:

  bined 2 's/^50 36/50 35/; s/(\w\w) \1 \1/$1/g' src.ppm dest.pgm

The first substitution changes the "P6" (meaning colour) in the header to "P5"
(grayscale).  The second replaces each run of three equal colour values with
the single brightness value.  This rests on the assumption that the ASCII
header does not contain three equal characters in succession, so it does not
work on image sizes like 1000 or 666.

=head2 Advanced usage

Read a file containing fixed-size records 42 bytes long and print the
little-endian short value at offset 16 from each record:

  bined 84 's/\b\w{32}(\w\w)(\w\w)\w*/$2$1/g; s/\s/\n/g; print $_; exit;' \
    datafile /dev/null

The C<exit> is not strictly necessary but saves a little time by preventing the
unnecessary conversion back to binary.  (The backslash at the end of the first
line only serves to tell the shell that the line will be continued; it is not
required if you type all this on one line.)  Similarly, conversions between
different fixed-size record formats are possible.

Convert a hex dump to binary:

  bined 2 '$_=""; while( defined(my $ln=<STDIN>) ) { $_ .= $ln; }' \
    /dev/null hexdata.bin < hexdata.dump

The number of digits was chosen arbitrarily.  It affects only the conversion of
the original file, which is a dummy here anyway.  If your hexdump contains
hexadecimal offsets at the beginning of each line, you have to remove them
first.

Compute a histogram of the bytes of a (small) file:

  bined 0 'my @histo;
    while( $data =~ s/^(.)//s ) { ++$histo[unpack("C", $1)]; }
    for (0..255) { print "$_\t\t", $histo[$_] || 0, "\n"; }' \
    file /dev/null

The histogram is generated while removing successive bytes from the start of
the binary data in C<$data>.  The C<unpack> function converts each byte to a
number.  To process multi-byte values, choose a different C<unpack> code (see
the description of C<pack> in L<perlfunc>).  For bytes, the C<ord> function
could also have been used.  In the printing loop, C<$_> is the loop variable,
not the hexdump.  The backslashes after the first lines were omitted because
the shell knows the quotes have not been closed yet.

Traverse a file containing variable-length records which contain their 16-bit
length at offset 2, and print the short integer at offset 4:

  bined 0 'while( length($data) ) {
      $data= substr($data,2);
      my $len= unpack("S", $data);
      $data= substr($data,$len-2);
      $len= ($len-6)*2;
      s/^.{8}(.{4}).{$len}//;
      print $1, "\n";
    }' \
    file /dev/null

=head1 SEE ALSO

B(1), B(1), B(1), B(1)

=head1 BUGS

B<bined> is too slow for serious binary processing and for large files.
Please do not report bugs in bined any more, as I am writing an automated
binary editor in C that will be faster and more powerful.

=head1 LICENSE

B<bined> is (c) 2008 Volker Schatz.  It is free software and may be copied
and/or modified under the same terms as Perl.

=cut

unless( @ARGV == 4 ) {
  print STDERR "Usage: bined <# digits> <code> <source> <dest>\n";
  exit 1;
}

# Assumed reconstruction: slurp the whole source file in binary mode.
my $data;
open IN, "<", $ARGV[2] or die "Could not open `$ARGV[2]' for reading: $!";
binmode IN;
{ local $/; $data= <IN>; }
close IN;

# Convert to a hexdump, split into words of $ARGV[0] digits if requested.
if( $ARGV[0] ) {
  $_= join " ", unpack("(H$ARGV[0])*", $data);
}
else {
  ($_)= unpack("H*", $data);
}
# print $_, "\n"; exit;

# Run the user-supplied transformation on $_.
eval $ARGV[1];
die "Could not run transform: $@" if $@;

# Remove all whitespace before converting back to binary.
s/\s+//g;
# print $_, "\n"; exit;

open OUT, ">", $ARGV[3] or die "Could not open `$ARGV[3]' for writing: $!";
binmode OUT;
print OUT pack("H*", $_);
close OUT;