We knew that we had a lot of scanned documents in our repositories, but a big issue was that we could not tell which content was really an image. Because the scan performs OCR at runtime, we now know exactly what data we have. When documents are zipped, the scan unzips them and finds the data inside.