Optimized photo storage

Following an ongoing discussion about the storage needs of the Géovisio tool (Geovisio et stockage - OpenStreetMap France - Forum OSM France), I am sharing a few thoughts here.

The images will need to be accessed:

  • at different resolutions
  • partially (tiling), to reduce bandwidth when viewing
  • while taking up as little storage space as possible

I ran a quick test to see whether the COG format and webp compression (much more efficient than JPEG) could be an option, and here is the result:

The COG/webp lead deserves very serious investigation!

On 12-megapixel photos (GoPro), a gdal_translate to COG/webp at the default quality of 75 and WITH overviews gives:

  • 6.5 MB → 1.3 MB (-80%)
  • 5.7 MB → 1.1 MB
  • 3.4 MB → 343 KB (-90%)
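
The exact command used for this test is not shown in the thread, so the following is only a guess at what such a conversion might look like. It builds a gdal_translate command line using the COG driver's COMPRESS, QUALITY and BLOCKSIZE creation options; the file names are hypothetical, and the overviews are assumed to come from the COG driver's default behaviour.

```python
import shlex


def cog_webp_command(src, dst, quality=75, blocksize=512):
    """Build a gdal_translate command line for a WebP-compressed COG.

    Assumption: the original test used something close to this; the
    thread does not give the actual command.
    """
    return [
        "gdal_translate",
        "-of", "COG",                     # Cloud Optimized GeoTIFF driver
        "-co", "COMPRESS=WEBP",           # WebP compression inside the TIFF
        "-co", f"QUALITY={quality}",      # 75 is the driver default
        "-co", f"BLOCKSIZE={blocksize}",  # 512 is the driver default
        src,
        dst,
    ]


# Hypothetical file names, for illustration only.
cmd = cog_webp_command("photo.jpg", "photo.tif")
print(shlex.join(cmd))

# To actually run it (requires a GDAL build with WebP support):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to compare settings (for example several QUALITY values) before converting a whole collection.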

You end up with a single file on disk, internally tiled (default blocksize of 512 pixels), multi-resolution, with network transfer limited to the tiles/resolution actually needed (thanks, COG).
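
To get a feel for what the internal tiling means in practice, here is a small back-of-the-envelope sketch. It assumes a 4000x3000 pixel image (a typical 12-megapixel frame, not a figure from the test) and assumes that each overview halves the previous level until the whole image fits in one 512-pixel block; the real overview layout depends on the GDAL settings used.

```python
import math


def pyramid(width, height, blocksize=512):
    """Return a list of (width, height, tile_count), one entry per level.

    Assumption: each overview halves the previous level, and the
    pyramid stops once a level fits inside a single block.
    """
    levels = []
    factor = 1
    while True:
        w = math.ceil(width / factor)
        h = math.ceil(height / factor)
        tiles = math.ceil(w / blocksize) * math.ceil(h / blocksize)
        levels.append((w, h, tiles))
        if w <= blocksize and h <= blocksize:
            break
        factor *= 2
    return levels


# Assumed dimensions for a 12-megapixel photo.
for w, h, tiles in pyramid(4000, 3000):
    print(f"{w}x{h}: {tiles} tiles")
```

Under these assumptions the full-resolution level has 48 tiles while the smallest overview is a single tile, which is why a thumbnail or a zoomed-in view only needs a small part of the file.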

A JavaScript library can handle this format natively in the browser.

On the server side, a stock nginx or apache serves the COG with no extra work (the client uses HTTP byte-range requests to load only what it needs).
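
For readers unfamiliar with byte-range requests, a minimal sketch of what the client side sends is below. The URL is a placeholder and the offsets are arbitrary; a real COG reader first fetches the file header to find out where each tile lives, which is not shown here.

```python
import urllib.request


def range_request(url, start, length):
    """Build a GET request for `length` bytes starting at offset `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Placeholder URL; the first 16 KiB is an arbitrary choice for illustration.
req = range_request("https://example.org/photos/photo.tif", 0, 16384)
print(req.get_header("Range"))

# To send it:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)   # 206 (Partial Content) if the server honours ranges
#     chunk = resp.read()
```

The server needs no COG-specific logic: it only has to answer standard Range headers on a static file.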

Can anyone do better?


(hopefully it’s okay that I respond in English)
I have also been thinking about this. My main concern is: how would this affect Machine Learning models that need to interpret the images?
I think many compression algorithms are designed around how humans look at images, and that might negatively affect ML models.

I guess the best option is to keep the original images, with only the license plates and faces blurred or cut out, and then serve smaller, resized versions to the actual user front end. Bandwidth would be lowered, but the original big files would still be available for the ML models that need them.


AFAIK, ML models usually work on low-resolution versions of the original images. Recompression should not affect ML too much… but I’m no expert at all in image-based ML!


This is risky: even state-of-the-art models make mistakes when blurring features. If you keep only the resulting pictures, you lose the possibility of retraining on and reprocessing an untainted dataset. For our uses, we keep the originals together with the auxiliary masks (COCO JSON).
Different use cases usually mean different products, and different compression should be used for each (lossless for reuse, lossy for daily use).
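
As a rough illustration of keeping an original alongside its auxiliary masks, a COCO-style JSON sidecar could be built as below. This is not our actual layout: the file name, category names, and box coordinates are all made up, and only the general COCO structure (images, categories, annotations with [x, y, width, height] boxes) is standard.

```python
import json


def coco_mask_record(image_id, file_name, width, height, boxes):
    """Build a minimal COCO-style dict.

    `boxes` is a list of (category_id, x, y, w, h) tuples, in pixels.
    Category names here are hypothetical.
    """
    return {
        "images": [
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        ],
        "categories": [
            {"id": 1, "name": "face"},
            {"id": 2, "name": "license_plate"},
        ],
        "annotations": [
            {
                "id": i,
                "image_id": image_id,
                "category_id": cat,
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            }
            for i, (cat, x, y, w, h) in enumerate(boxes, start=1)
        ],
    }


# Made-up example: one license plate region in a 4000x3000 original.
record = coco_mask_record(1, "original_0001.jpg", 4000, 3000, [(2, 1520, 1810, 140, 45)])
print(json.dumps(record, indent=2))
```

With the regions stored separately, the blurred product can be regenerated from the untouched original whenever the detection model improves.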

christian, can you post your gdal commands? Our jpg/webp comparison tests did not show an overwhelming advantage for webp, at least not enough to give up the decades of optimization behind JPEG, with libs like turbo-jpeg.

Look, I get it: keeping the originals for retraining makes a lot of sense from that perspective.

The problem is privacy. Storing all that privacy-sensitive information about a public space is a weird legal gray area. From my understanding it isn’t legal in Belgium, at least: cameras facing public roads are not allowed (video doorbells are a weird little loophole).

The second problem is that it creates a large liability if the server the images are stored on gets hacked or is configured incorrectly, resulting in data leaks.

Mapillary had the advantage of being started before GDPR and such. So they used all the footage they had back then to build a … pretty good … face/license plate detection system. Now of course we have GDPR and, as far as is publicly known, Meta is no longer keeping the originals of images, only the blurred versions, probably for reasons similar to those I mentioned above. But their hard work was already done.

There’s a weird legal situation here that I honestly don’t really know how to solve.

We need to clarify the legal situation regarding pictures taken in the public space.

I don’t know if GDPR really changed things at that level. A lot of people over-apply GDPR, while the targets it was originally aimed at are on the opposite side.

French laws are summarized here: Droit à l'image et respect de la vie privée

Dans le cas d’une image prise dans un lieu public, votre autorisation est nécessaire si vous êtes isolé et reconnaissable.

→ In the case of an image taken in a public place, your permission is required if you are isolated and recognizable.

From my point of view, this greatly limits takedown requests when you’re not the subject of the picture (coincidental appearance).