How to do it...

In the following recipe, we tamper with a binary file and then compare it to the original. ssdeep reports that the two files are highly similar but not identical:

  1. First, we download the latest version of Python, python-3.7.2-amd64.exe. We then create a copy named python-3.7.2-amd64-fake.exe and append a null byte to the end of the copy:
cp python-3.7.2-amd64.exe python-3.7.2-amd64-fake.exe
truncate -s +1 python-3.7.2-amd64-fake.exe
  1. Using hexdump, we can verify that the operation succeeded by inspecting the end of each file, starting with the original:
hexdump -C python-3.7.2-amd64.exe | tail -5

This results in the following output:


018ee0f0  e3 af d6 e9 05 3f b7 15  a1 c7 2a 5f b6 ae 71 1f  |.....?....*_..q.|
018ee100  6f 46 62 1c 4f 74 f5 f5  a1 e6 91 b7 fe 90 06 3e  |oFb.Ot.........>|
018ee110  de 57 a6 e1 83 4c 13 0d  b1 4a 3d e5 04 82 5e 35  |.W...L...J=...^5|
018ee120  ff b2 e8 60 2d e0 db 24  c1 3d 8b 47 b3 00 00 00  |...`-..$.=.G....|
018ee130

The same check can be run on the second file using the following command:

hexdump -C python-3.7.2-amd64-fake.exe | tail -5

This results in the following output:

018ee100  6f 46 62 1c 4f 74 f5 f5  a1 e6 91 b7 fe 90 06 3e  |oFb.Ot.........>|
018ee110  de 57 a6 e1 83 4c 13 0d  b1 4a 3d e5 04 82 5e 35  |.W...L...J=...^5|
018ee120  ff b2 e8 60 2d e0 db 24  c1 3d 8b 47 b3 00 00 00  |...`-..$.=.G....|
018ee130  00                                                |.|
018ee131
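
For readers without truncate or hexdump available, the copy-and-append step and the tail check can be approximated in Python. This is a minimal sketch, not part of the original recipe: the helper names make_padded_copy and tail_bytes are hypothetical, and a small generated file stands in for the real installer.

```python
import os
import shutil
import tempfile


def make_padded_copy(src, dst):
    """Copy src to dst, then append a single null byte to dst."""
    shutil.copyfile(src, dst)
    with open(dst, "ab") as f:
        f.write(b"\x00")


def tail_bytes(path, n=16):
    """Return the last n bytes of a file (fewer if the file is shorter)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(size - n, 0))
        return f.read()


# Assumption: a 1 KiB stand-in file is used instead of the real installer.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.bin")
fake = os.path.join(workdir, "fake.bin")
with open(original, "wb") as f:
    f.write(bytes(range(256)) * 4)

make_padded_copy(original, fake)

# The copy should be exactly one byte longer and end in 0x00.
print(os.path.getsize(fake) - os.path.getsize(original))  # 1
print(tail_bytes(fake, 4).hex())  # fdfeff00
```

Pointing original at the downloaded installer instead of the stand-in file should reproduce the one-byte difference visible in the hexdump output.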

  1. Now, we hash the two files using ssdeep and compare the results:
import ssdeep

# Compute a fuzzy hash for each file
hash1 = ssdeep.hash_from_file("python-3.7.2-amd64.exe")
hash2 = ssdeep.hash_from_file("python-3.7.2-amd64-fake.exe")
# Print the similarity score of the two hashes
print(ssdeep.compare(hash1, hash2))

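The output of the preceding code is 99. ssdeep.compare returns a match score from 0 to 100, so a score of 99 indicates that the two files are almost, but not exactly, the same.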
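
To see why a similarity score is useful here, it may help to contrast it with a cryptographic hash, which only tells us whether two inputs are identical. The following sketch is an illustration rather than part of the recipe: it uses hashlib.sha256 from the standard library, and the in-memory byte strings are assumed stand-ins for the two installer files.

```python
import hashlib

# Assumption: generated bytes stand in for the contents of the two files.
original = bytes(range(256)) * 4096
tampered = original + b"\x00"  # same data with one null byte appended

digest1 = hashlib.sha256(original).hexdigest()
digest2 = hashlib.sha256(tampered).hexdigest()

# A cryptographic hash only answers "identical or not"; it gives no
# indication of how close the two inputs are.
print(digest1 == digest2)  # False

# Count how many hex digits happen to agree position by position.
matching = sum(a == b for a, b in zip(digest1, digest2))
print(f"{matching} of {len(digest1)} hex digits agree by position")
```

The digests differ even though the inputs differ by a single byte, whereas the ssdeep score stays close to 100 for the same kind of change.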