Bunnings site rip

Some time ago I foolishly volunteered to perform a site rip of https://www.bunnings.com.au/ for the local SES group I am a member of. This was to allow our accountant member to more accurately assign a value to our assets. I understand this is an important thing for an accountant.

I have done a number of site rips in the past, and the Bunnings site is probably the most painful so far. The product pages are very complex for what they are.

Each Bunnings product page is roughly 300k, and I extracted 1.1k of content from each one. So 99.63% of each page is basically useless, an efficiency rate of roughly 0.4%. The vast majority of the space is taken up by the nested menu at the top. The ads near the bottom take a bit, and then there is a fairly extensive site map across the bottom. At least the CSS is in an external file. Well, four of them.
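The arithmetic is easy to check. Here is a minimal Python sketch. It assumes both sizes are in the same unit, so it doesn't matter whether "k" means 1000 or 1024 bytes:

```python
def extraction_efficiency(page_size: float, extracted_size: float) -> float:
    """Fraction of a downloaded page that survives as useful content."""
    return extracted_size / page_size


# Sizes in kB, as quoted above: ~300k page, ~1.1k extracted.
efficiency = extraction_efficiency(300, 1.1)
print(f"useful: {efficiency:.2%}, wasted: {1 - efficiency:.2%}")
# prints "useful: 0.37%, wasted: 99.63%"
```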

There is also a mobile website, which is a bit slimmer. I think the version that gets served is decided by browser fingerprinting and cookies. I didn’t discover it until too late, though.

There are also two different HTML structures used for product pages. They look similar but use different tags with different classes.

And a fun trick, these two links go to the same page:
https://www.bunnings.com.au/romak-m6-high-tensile-course-hex-nut-10-pack_p1100797
https://www.bunnings.com.au/nobody-nibbles-nuts-like-noddy_p1100797

That trick gets less awesome when you realise that the site actually does this itself, linking to the same product under different URLs 626 times.
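One way to avoid fetching the same product twice is to key pages on the trailing product number rather than on the full URL. The Python sketch below assumes, based only on the two links above, that every product URL ends in `_p<digits>` and that this number alone identifies the page. The function names are hypothetical and are not taken from the repository.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Assumption: product URLs end in "_p<digits>" (as in the two links above)
# and the number alone identifies the product; the slug before it is ignored.
PRODUCT_ID_RE = re.compile(r"_p(\d+)$")


def product_id(url: str) -> str | None:
    """Return the numeric product id embedded in a product URL, or None."""
    path = urlparse(url).path.rstrip("/")
    match = PRODUCT_ID_RE.search(path)
    return match.group(1) if match else None


def dedupe_product_urls(urls) -> dict[str, str]:
    """Map each product id to the first URL seen for it, dropping aliases."""
    first_seen: dict[str, str] = {}
    for url in urls:
        pid = product_id(url)
        if pid is not None and pid not in first_seen:
            first_seen[pid] = url
    return first_seen


links = [
    "https://www.bunnings.com.au/romak-m6-high-tensile-course-hex-nut-10-pack_p1100797",
    "https://www.bunnings.com.au/nobody-nibbles-nuts-like-noddy_p1100797",
]
print(dedupe_product_urls(links))  # both links collapse to the id "1100797"
```

Keeping the first URL seen is arbitrary. Any alias serves the same page.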

In case anyone else is feeling foolish enough to try this themselves, and brave enough to look at my code, the end result of my trials and tribulations is on GitHub. All the mistakes have, of course, been purged from the history, so it looks like I just brilliantly did it in one go.

https://github.com/lod/bunnings-siterip

Install Jammer Extractor

I recently spent a few days reverse engineering a binary installer generated by Install Jammer, specifically the LPCXpresso installer supplied by NXP. The goal was to install the program without running the binary installer as root. I managed to create a Perl script which unpacks the install files into a local directory.

UPX

One of the first things I noticed when examining the installer was a UPX header:

00000070: 0010 0000 ea2d 27a5 5550 5821 e811 0d0c  .....-'.UPX!....
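In the dump above, the `UPX!` magic begins at offset 0x78 (0x70 plus 8 bytes). A minimal Python sketch for locating every occurrence of that magic in a file might look like this. It only finds candidate headers and does not validate them.

```python
UPX_MAGIC = b"UPX!"


def find_upx_magic(data: bytes):
    """Return every offset at which the UPX! magic appears in data."""
    offsets = []
    pos = data.find(UPX_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(UPX_MAGIC, pos + 1)
    return offsets
```

To use it on an installer, read the file in binary mode and pass the bytes in, for example `find_upx_magic(open(path, "rb").read())`.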

I hadn’t played with UPX before, but it is a system to compress executable files. There are two parts: a program which compresses the executable, and a decompression program which gets prepended to the compressed file.

When the executable is run, the decompression program uncompresses the payload and restarts execution at the start of the new executable.

UPX is an open source project with some nice tools. Specifically, they provide a program which can read the UPX headers, report information about the packed binary and decompress it. They strongly advocate not messing with packed files so that these tools keep working.

Unfortunately, the leading Google results, Stack Overflow entries and forum posts all focus on preventing people from uncompressing the binary. Given the way UPX works, it is easy to slightly modify the compression and decompression process in a way that breaks compatibility with the standard tools. UPX also makes a special effort to allow GDB to work, which is easy to sabotage. Together, these properties make UPX very popular with virus writers as a way to mask their code.

Naturally, Install Jammer did all of this. I extracted the UPX header by hand, but it refers to a compression scheme which doesn’t exist in the standard UPX program. The sections and section headers that UPX uses are missing or masked, a commonly recommended technique to prevent decompression. Running the installer under GDB didn’t provide any useful information either.

It should be possible to disassemble the decompression stub and then either work out or simply run the decompression routine. However, that was beyond me, and I found an easier approach.

Install Jammer Extractor

The Install Jammer program, which generates the final install binaries, comes with precompiled binary blobs that are prepended to the final installer.

This precompiled program reads the rest of the file and extracts the install files from it. Looking at the strings in the install blob, there are what look like file names.

I simplified the problem by creating an Install Jammer installer of my own containing a small collection of scripts.

Inside the generated binary is a section with the following lines (there are actually two identical sections… no idea why):

0015af60: 0000 0000 0000 0000 0000 0000 0046 494c  .............FIL
0015af70: 455a 4c30 3637 3239 3039 412d 3946 3236  EZL0672909A-9F26
0015af80: 2d33 4539 312d 4242 4546 2d30 3241 3230  -3E91-BBEF-02A20
0015af90: 3633 3238 3639 3200 0000 0000 0000 0000  6328692.........
0015afa0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015afb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015afc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015afd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015afe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015aff0: 0000 0034 3431 0000 0000 0000 0000 0032  ...441.........2
0015b000: 3632 0000 0000 0000 0000 0031 3437 3337  62.........14737
0015b010: 3432 3339 3400 0000 0031 3137 3830 3031  42394....1178001

It looks like a file ID followed by several numbers encoded as strings. I found the ID in Linux-x86-files.tcl, one of the intermediate files produced while generating the installer, and this allowed most of the fields to be identified. The blob address and compressed size give the position and size of a blob within the install binary. I confirmed this by checking that multiple adjacent entries follow on from one another.

File ::0672909A-9F26-3E91-BBEF-02A206328692 -name compiles.t -parent 81FF3CF4-D2FD-4649-FA7F-C2640F59BE65 -directory <%InstallDir%>/t -size 441 -mtime 1473742394 -permissions 00644 -filemethod 0
  • FILE: start marker
  • ZL: flag
  • 0672909A-9F26-3E91-BBEF-02A206328692: id string
  • 441: extracted size
  • 262: compressed size
  • 1473742394: mtime
  • 1178001: blob address (#11F991)
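The entry layout above can be turned into a small parser. The Python sketch below is illustrative only and is not the Perl script from the repository. It assumes, from this one dump, that the `FILE` marker and two-character flag are followed by the ID and four decimal numbers, each separated by runs of NUL padding. The padding widths in the dump are not uniform, so the sketch splits on NULs rather than reading fixed offsets. The contiguity check assumes blobs are packed back to back with no gaps, which is one way to do the "adjacent entries" check described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileEntry:
    method: str           # "ZL" or "LZ", the two flags seen so far
    file_id: str          # matches the File ::<id> lines in the tcl scripts
    size: int             # extracted size
    compressed_size: int
    mtime: int
    address: int          # offset of the compressed blob in the installer


def parse_entry(record: bytes) -> FileEntry:
    """Parse one table entry that starts at a b"FILE" marker.

    Assumption: after the 4-byte marker and 2-byte flag come ASCII fields
    separated by NUL padding of varying width.
    """
    if not record.startswith(b"FILE"):
        raise ValueError("entry does not start with the FILE marker")
    method = record[4:6].decode("ascii")
    fields = [f.decode("ascii") for f in record[6:].split(b"\0") if f]
    if len(fields) < 5:
        raise ValueError("entry is truncated")
    file_id, size, csize, mtime, address = fields[:5]
    return FileEntry(method, file_id, int(size), int(csize), int(mtime), int(address))


def blobs_are_contiguous(entries) -> bool:
    """Check that each blob ends exactly where the next one starts.

    Assumes blobs are packed back to back with no padding in between.
    """
    ordered = sorted(entries, key=lambda e: e.address)
    return all(
        a.address + a.compressed_size == b.address
        for a, b in zip(ordered, ordered[1:])
    )
```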

ZLib files

I extracted the compressed blob and grabbed the matching uncompressed file. I tried several different compression techniques on the uncompressed file and compared the results with the extracted blob. Zlib, which I tried because of the ZL flag, was a very close match. Below is an example using a very small file:

> zlib-flate -compress < original_file | xxd
00000000: 789c 2b4a 4db3 5228 4a4d 2bd6 2f4a cdcd  x.+JM.R(JM+./J..
00000010: 2f49 2dd6 cf2f ca4c cfcc d3cf 4d2c 2e49  /I-../.L....M,.I
00000020: 2de2 0200 c596 0bf2                      -.......

Extracted blob, lined up to match:

00000000: 0000 2b4a 4db3 5228 4a4d 2bd6 2f4a cdcd  ..+JM.R(JM+./J..
00000010: 2f49 2dd6 cf2f ca4c cfcc d3cf 4d2c 2e49  /I-../.L....M,.I
00000020: 2de2 0200                                -...

The ZLib header and footer are both missing. The two-byte header sets the compression method and options such as the window size and whether a preset dictionary is used. Adding the standard header bytes (78 9c) allowed the extracted blob to be decompressed using zlib-flate -uncompress. The four-byte footer is an Adler-32 checksum, which seems to be optional.

This technique allowed all the install files to be extracted. However, their names and the structure of the directory tree were lost.
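The same repair can be sketched in Python's standard `zlib` module. This is an illustration rather than the original Perl. The sketch offers two equivalent routes. The first treats the blob as raw DEFLATE. The second mirrors the manual fix by splicing the `78 9c` header back on. It uses a `decompressobj` because `zlib.decompress()` rejects a stream whose Adler-32 trailer is missing.

```python
import zlib

# Header emitted by zlib-flate / zlib.compress at the default level.
ZLIB_HEADER = b"\x78\x9c"


def inflate_raw(blob: bytes) -> bytes:
    """Treat the blob as raw DEFLATE: negative wbits means no header or trailer."""
    return zlib.decompress(blob, -zlib.MAX_WBITS)


def inflate_with_spliced_header(blob: bytes) -> bytes:
    """Mirror the manual fix: prepend the standard header, then inflate.

    zlib.decompress() would report a truncated stream here because the
    Adler-32 trailer is missing, so a decompressobj is used instead.
    """
    return zlib.decompressobj().decompress(ZLIB_HEADER + blob)


# Simulate an Install Jammer style blob by stripping header and trailer.
original = b"hello from a tiny install file\n" * 4
blob = zlib.compress(original)[2:-4]
assert inflate_raw(blob) == inflate_with_spliced_header(blob) == original
```

The raw-DEFLATE route is the simpler of the two, because it needs no knowledge of which header bytes to splice in.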

LZMA files

Along with the ZLib compressed install files are a bunch of tcl files with an LZ flag. These have full names and appear to be the files needed to run the installer itself, including the tcl runtime files and the necessary libraries.

The tcl files are not from my system. Some of them are different versions, and others do not exist on my system at all. I chose iso8859-3.enc to examine, assuming that it was likely to be identical to my version.

I assumed the compression used was LZMA (Lempel–Ziv–Markov chain algorithm), partially because I had noticed a binary library called craplzma in the Install Jammer application files. Unfortunately, LZMA is, as the name suggests, an algorithm, and it is used by multiple different archivers such as 7-Zip, LZip, XZ and more. Most of the archive containers specify how to store multiple files, but for a single file it turns out you can just tack the appropriate header on and the matching program will extract it.

The header that matched most closely was LZMA alone (LZMA1), which is conveniently supported by the Perl Compress::Raw::Lzma module:

cat /usr/share/tcltk/tcl8.5/encoding/iso8859-3.enc | lzma -z | xxd
00000000: 5d00 0080 00ff ffff ffff ffff ff00 1188  ]...............
00000010: 0528 b979 d70b 91f8 28ae b6ac 59fc 1cbb  .(.y....(...Y...

Extracted blob, first line:

5d00 0080 0000 1188 0528 b979 d70b 91f8

The LZMA alone file format defines a 13-byte header:

  • 1 byte properties
  • 4 bytes dictionary size
  • 8 bytes uncompressed size
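The header can be decoded with `struct`. In this sketch, the single properties byte packs three LZMA parameters as `(pb * 5 + lp) * 9 + lc`, and both multi-byte fields are little-endian. Applied to the `lzma -z` output above, the properties byte is 0x5d, the dictionary is 8 MiB, and the size is marked as unknown. The function name is hypothetical.

```python
from __future__ import annotations

import struct

UNKNOWN_SIZE = 0xFFFF_FFFF_FFFF_FFFF


def parse_lzma_alone_header(data: bytes) -> dict:
    """Decode the 13-byte header at the start of an LZMA alone (.lzma) file."""
    if len(data) < 13:
        raise ValueError("need at least 13 bytes")
    props, dict_size, size = struct.unpack("<BIQ", data[:13])
    # The properties byte packs three parameters: props = (pb * 5 + lp) * 9 + lc
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return {
        "lc": lc,
        "lp": lp,
        "pb": pb,
        "dict_size": dict_size,
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }


# Header bytes from the `lzma -z` output above.
header = bytes.fromhex("5d 00 00 80 00 ff ff ff ff ff ff ff ff")
print(parse_lzma_alone_header(header))
# lc=3, lp=0, pb=2, dict_size=8388608 (8 MiB), uncompressed_size=None
```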

Our extracted blob is missing the uncompressed size field. Fortunately, a size of FFFF FFFF FFFF FFFF (all eight bytes set to FF) tells the decompression routine that the size is unknown. Splicing this field in allowed all the LZ flagged files to be extracted.
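A Python sketch of the same splice uses the standard `lzma` module with `FORMAT_ALONE`. It assumes that each LZ blob consists of the 5 bytes of properties and dictionary size followed directly by the compressed data, as in the comparison above. It also assumes that the stream ends with an end-of-payload marker, because with an "unknown" size the decoder needs that marker to know when to stop.

```python
import lzma

# 8-byte little-endian "uncompressed size" meaning "unknown".
UNKNOWN_SIZE_FIELD = b"\xff" * 8


def decompress_lz_blob(blob: bytes) -> bytes:
    """Decompress an LZ-flagged blob: properties (1) + dict size (4) + data.

    The missing size field is spliced back in as "unknown", so the decoder
    relies on the stream's end-of-payload marker to find the end.
    """
    spliced = blob[:5] + UNKNOWN_SIZE_FIELD + blob[5:]
    return lzma.decompress(spliced, format=lzma.FORMAT_ALONE)


# Simulate an Install Jammer style blob by cutting out bytes 5..12.
original = b"stand-in for an .enc file\n" * 50
full = lzma.compress(original, format=lzma.FORMAT_ALONE)
assert decompress_lz_blob(full[:5] + full[13:]) == original
```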

TCL scripts

Install Jammer is largely a TCL project. I believe it has a C++ base which uses TCL to drive the GUI, allow scriptable extensions and do most of the work.

The intermediate files created by the installer build process include a number of generated TCL scripts. These scripts rename the extracted files from their stored ID names to their final names. They also create the directories and any required symlinks, and set the mtime of each file. The script appears to be meant to set file permissions, but this doesn’t actually work: everything is set to 777, and there is no facility to set ownership.

Among the files extracted from the installer, this script is main2.tcl for my generated installer and main.tcl for the LPCXpresso installer. To be safe, I ended up just processing every tcl script in the root directory.

The tcl script contains lines like the following, which are fairly simple to parse. By combining these lines with the entry table extracted from the installer binary, each file can be extracted, decompressed and placed in the appropriate location.

File ::4D49D586-0ADF-966C-3FC4-8DB31B47B741 -name dumpio2curl -parent 81FF3CF4-D2FD-4649-FA7F-C2640F59BE65 -directory <%InstallDir%> -type dir -permissions 040755 -filemethod 0
File ::381BB57B-2E9F-3012-F9BB-C1752B423A6E -name .travis.yml -parent 81FF3CF4-D2FD-4649-FA7F-C2640F59BE65 -directory <%InstallDir%> -size 164 -mtime 1473742394 -permissions 00644 -filemethod 0
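Here is a minimal Python sketch of parsing these lines. It uses `shlex.split` and assumes the values contain no spaces or Tcl braces. That holds for the two lines above but may not hold for every installer. The helper names are hypothetical. The mode helpers show that 040755 carries the directory type bits as well as the permission bits.

```python
from __future__ import annotations

import shlex
import stat


def parse_file_line(line: str) -> dict:
    """Parse a `File ::<id> -key value ...` line into a dict of strings.

    Assumption: values contain no spaces or Tcl braces.
    """
    tokens = shlex.split(line)
    if len(tokens) < 2 or tokens[0] != "File" or not tokens[1].startswith("::"):
        raise ValueError(f"not a File line: {line!r}")
    options = tokens[2:]
    if len(options) % 2:
        raise ValueError(f"unpaired option in: {line!r}")
    entry = {"id": tokens[1][2:]}
    for key, value in zip(options[::2], options[1::2]):
        entry[key.lstrip("-")] = value
    return entry


def is_directory(entry: dict) -> bool:
    """Directories carry -type dir; their mode (e.g. 040755) also has S_IFDIR set."""
    mode = int(entry.get("permissions", "0"), 8)
    return entry.get("type") == "dir" or stat.S_ISDIR(mode)


def permission_bits(entry: dict) -> int:
    """Strip the file-type bits, leaving e.g. 0o755 or 0o644."""
    return stat.S_IMODE(int(entry.get("permissions", "0"), 8))
```

The `id` value links each line to an entry in the binary's table. `permission_bits` could be passed to `os.chmod` if you want the recorded permissions rather than 777.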

The last step was to parse the tcl script for the info variable block. This gives the variables, such as InstallDir, which are embedded in the File entries. Several of these variables would typically be set by the install wizard. We support this by allowing the user to pass values on the command line, either to customise the install or to provide variables which are missing.
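A Python sketch of the substitution step follows. It assumes variables appear as `<%Name%>` containing only word characters, and that command-line overrides are given as `NAME=value` pairs. Both the argument format and the default path are hypothetical. Overrides win over the values parsed from the script, and an unset variable raises an error rather than silently producing a bad path.

```python
from __future__ import annotations

import re

# Assumption: variables appear as <%Name%> with word characters only.
VARIABLE_RE = re.compile(r"<%(\w+)%>")


def parse_overrides(args) -> dict:
    """Turn NAME=value command-line arguments into a dict."""
    overrides = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=value, got {arg!r}")
        overrides[name] = value
    return overrides


def substitute(text: str, variables: dict) -> str:
    """Replace every <%Name%> in text, failing loudly on unknown names."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"{name} is not set; pass {name}=... on the command line")
        return variables[name]
    return VARIABLE_RE.sub(lookup, text)


# Defaults from the script's variable block, overridden by the user.
defaults = {"InstallDir": "/usr/local/lpcxpresso"}  # hypothetical default
variables = {**defaults, **parse_overrides(["InstallDir=/home/me/lpcxpresso"])}
print(substitute("<%InstallDir%>/t", variables))  # prints /home/me/lpcxpresso/t
```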


Weekly Wrap 18-25 April

Marbled butter biscuits

Work

  • Not much public work to show, mostly investigating potential manufacturing partners.
  • My Farnell order is dribbling in, and my PCBs and AliExpress orders have been shipped. All the pieces should be ready when I get back next week.

Play

  • Went to Canberra on Friday to catch up with folk and party, staying for the week.
  • Discovered Coconuts Duo, amazingly fun to play and spectate. It’s probably even fun sober.
  • Made marbled butter biscuits (pictured). Annoyingly easy to burn, but fortunately I made so many that even after chucking 15% I still needed two containers to store them all.

Other

Last week’s wrap
