- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 282字
- 2025-04-04 13:36:46
How to do it...
In the following recipe, we tamper with a binary file. We then compare it to the original to see that ssdeep determines that the two files are highly similar but not identical:
- First, we download the latest version of Python, python-3.7.2-amd64.exe. I am going to create a copy, rename it python-3.7.2-amd64-fake.exe, and add a null byte at the end:
truncate -s +1 python-3.7.2-amd64-fake.exe
- Using hexdump, I can verify that the operation was successful by looking at the file before and after:
hexdump -C python-3.7.2-amd64.exe |tail -5
This results in the following output:
018ee0f0 e3 af d6 e9 05 3f b7 15 a1 c7 2a 5f b6 ae 71 1f |.....?....*_..q.|
018ee100 6f 46 62 1c 4f 74 f5 f5 a1 e6 91 b7 fe 90 06 3e |oFb.Ot.........>|
018ee110 de 57 a6 e1 83 4c 13 0d b1 4a 3d e5 04 82 5e 35 |.W...L...J=...^5|
018ee120 ff b2 e8 60 2d e0 db 24 c1 3d 8b 47 b3 00 00 00 |...`-..$.=.G....|
The same can be verified with a second file using the following command:
hexdump -C python-3.7.2-amd64-fake.exe |tail -5
This results in the following output:
018ee100 6f 46 62 1c 4f 74 f5 f5 a1 e6 91 b7 fe 90 06 3e |oFb.Ot.........>|
018ee110 de 57 a6 e1 83 4c 13 0d b1 4a 3d e5 04 82 5e 35 |.W...L...J=...^5|
018ee120 ff b2 e8 60 2d e0 db 24 c1 3d 8b 47 b3 00 00 00 |...`-..$.=.G....|
018ee130 00 |.|
018ee131
- Now, I will hash the two files using ssdeep and compare the result:
import ssdeep
hash1 = ssdeep.hash_from_file("python-3.7.2-amd64.exe")
hash2 = ssdeep.hash_from_file("python-3.7.2-amd64-fake.exe")
ssdeep.compare(hash1, hash2)
The output to the preceding code is 99.