I think the document did a good job of introducing the idea of index hacking and provided a nice overview of all the associated concepts.
I am a little confused, however, about how the bin 'value' is currently determined. "Instead of calculating the bin values from the offsets, use the first chunk_beg and chunk_end to calculate the relative size of the bin." Is there a reason that only the first chunk is used?
To me, it would make the most sense to sum the differences between each chunk's data start and data end across all chunks in the bin. Also, it seems that the start/end integers would first need to be split into their two constituent parts (48 | 16 bits) to determine the actual size of each chunk's compressed data, rather than using the raw integer values.
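As a rough sketch of what I mean (not a claim about how the document currently computes the value): the snippet below assumes the upper 48 bits of each virtual offset hold the byte position in the compressed file and the lower 16 bits hold the position within the uncompressed block. The names `split_virtual_offset`, `compressed_span`, and `bin_size` are made up for illustration, and the chunk values are invented.

```python
def split_virtual_offset(voffset):
    """Split a 64-bit virtual offset into its (48 | 16) bit parts."""
    coffset = voffset >> 16     # upper 48 bits: byte offset in the compressed file
    uoffset = voffset & 0xFFFF  # lower 16 bits: offset within the uncompressed block
    return coffset, uoffset


def compressed_span(chunk_beg, chunk_end):
    """Approximate compressed size of one chunk, in bytes.

    Only the compressed-file parts are compared, so the result ignores
    where the chunk starts and ends inside its blocks.
    """
    beg_c, _ = split_virtual_offset(chunk_beg)
    end_c, _ = split_virtual_offset(chunk_end)
    return end_c - beg_c


def bin_size(chunks):
    """Sum the approximate compressed sizes of every chunk in a bin."""
    return sum(compressed_span(beg, end) for beg, end in chunks)


# Invented example: two chunks given as (chunk_beg, chunk_end) virtual offsets.
chunks = [
    ((1000 << 16) | 50, (5000 << 16) | 10),
    ((5000 << 16) | 10, (5200 << 16) | 0),
]
print(bin_size(chunks))
```

Subtracting the raw integers instead would give the byte difference multiplied by 65536 plus the difference of the within-block parts, which is why I think the split has to happen first.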
I may just be confused, however. Please let me know what you think.
I created an additional figure showing how the chunks of various bins overlap. I was also able to determine the offset values from the chunks. It appears that each bin's chunks overlap each other slightly. The offset value effectively takes the overlap into account so that the data are not double-counted in the overlapping section. It also makes clear that the first chunk is the most important chunk, and that the rest of the chunks are smaller and overlap with several other bins.
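To illustrate the double-counting issue only, here is a minimal sketch that measures the total length covered by a set of overlapping intervals. This is a generic interval-union approach, not the offset-based method from the document, and it assumes the chunk boundaries have already been reduced to plain byte positions. The function name `merged_length` and the example intervals are hypothetical.

```python
def merged_length(intervals):
    """Total length covered by (beg, end) intervals, counting overlaps once."""
    total = 0
    cur_beg = cur_end = None
    for beg, end in sorted(intervals):
        if cur_end is None or beg > cur_end:
            # Gap before this interval: close out the previous merged run.
            if cur_end is not None:
                total += cur_end - cur_beg
            cur_beg, cur_end = beg, end
        else:
            # Overlaps (or touches) the current run: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_beg
    return total


# Invented example: the first two intervals overlap by 10 bytes.
intervals = [(0, 100), (90, 150), (200, 250)]
naive = sum(end - beg for beg, end in intervals)
print(naive, merged_length(intervals))
```

The naive sum counts the shared 10 bytes twice, while the merged length counts them once.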
I added an additional description of how the actual byte values can be calculated from the virtual file offset.