(hopefully it’s okay that I respond in English)
I have also been thinking about this. My main concern is: how would this affect machine learning models that need to interpret the images?
I think many compression algorithms are tuned for how humans perceive images, which might negatively affect ML models.
I guess the best approach is to keep the original images, just with the license plates and faces blurred or cut out, and then serve smaller files in different sizes for the actual user front end. Bandwidth would be lowered, but the original big files would still be available for those who need them for ML models.
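To make the "different sizes for the front end" part concrete, here is a rough sketch that only computes the dimensions of the downscaled copies. The function name and the target widths (320, 1024, 2048) are my own assumptions, not something any existing pipeline uses; the actual resizing and encoding would be done by whatever image library you prefer.

```python
def derivative_sizes(width, height, target_widths=(320, 1024, 2048)):
    """Return (w, h) pairs for downscaled copies, preserving aspect ratio.

    Targets that are not narrower than the original are skipped,
    so nothing is ever upscaled.
    """
    sizes = []
    for target in sorted(target_widths):
        if target >= width:
            continue  # the original already covers this size
        scaled_height = max(1, round(height * target / width))
        sizes.append((target, scaled_height))
    return sizes


# Example: a 4000x3000 original is kept as-is for ML use,
# and three smaller copies are planned for the front end.
print(derivative_sizes(4000, 3000))
```

The original is never touched here; the smaller copies are purely derived products that can be regenerated at any time.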
This is risky: even state-of-the-art models make mistakes while blurring features. If you keep only the resulting pictures, you lose the possibility of retraining on and reprocessing an untainted dataset. For our uses, we keep the originals together with the auxiliary masks (COCO JSON).
Different use cases usually mean different products, and different compression should be used depending on the use case (lossless for reuse, one or more lossy variants for daily use).
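As an illustration of the "original plus masks" idea, here is a minimal sketch that derives a redacted copy on demand from an untouched original and COCO-style annotations. Everything concrete in it is an assumption for the example: the tiny inline JSON, the `license_plate` category, the `redact` helper, and the image being a plain list of grayscale rows. It also fills the boxes with a constant instead of blurring, and uses only the `bbox` field (`[x, y, width, height]`), not segmentation polygons.

```python
import copy
import json

# Hypothetical, minimal COCO-style annotations (normally loaded from a file).
COCO_JSON = """
{
  "images": [{"id": 1, "width": 6, "height": 4, "file_name": "example.jpg"}],
  "categories": [{"id": 1, "name": "license_plate"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [1, 1, 3, 2]}
  ]
}
"""


def redact(pixels, annotations, image_id, fill=0):
    """Return a copy of `pixels` with every annotated bbox of `image_id` filled.

    `pixels` is a list of rows of grayscale values. COCO bboxes are
    [x, y, width, height], measured from the top-left corner.
    """
    out = copy.deepcopy(pixels)  # the original stays untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for ann in annotations:
        if ann["image_id"] != image_id:
            continue
        x, y, w, h = (int(round(v)) for v in ann["bbox"])
        # Clamp the box to the image so bad annotations cannot index out of range.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                out[row][col] = fill
    return out


coco = json.loads(COCO_JSON)
original = [[255] * 6 for _ in range(4)]  # stand-in for the real image data
published = redact(original, coco["annotations"], image_id=1)
```

If the detector improves later, only the annotations need to be regenerated and `redact` rerun; the original is still there to reprocess.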
Christian, can you post your gdal commands? Our jpg/webp comparison tests didn't show an overwhelming advantage for the latter, at least not enough to give up the decades of optimization behind JPEG with libs like turbo-jpeg.
Look, I get it. Keeping the originals for the purposes of retraining. Makes a lot of sense from that perspective.
The problem is privacy. Storing all that privacy-sensitive information about a public space is a weird legal gray area. From my understanding, in Belgium at least, it isn’t legal (cameras facing public roads are not allowed; video doorbells are a weird little loophole).
The second problem is that it creates a large liability in case the server they’re stored on gets hacked or misconfigured, resulting in data leaks.
Mapillary had the advantage of being started before GDPR and such, so they used all the footage they had back then to build a … pretty good … face/license plate detection system. Now of course we have GDPR, and as far as is publicly known, Meta is not keeping originals of images anymore, only the blurred versions, probably for reasons similar to the ones I mentioned above. But their hard work was already done.
There’s a weird legal situation here that I honestly don’t really know how to solve.