Chart: This chart shows raw compression as well as adjusted compression; adjusted includes the size of the model. For a lot of this, it really only works well on server-scale data, because the model used for compressing is so large. But it also lends some credence to other papers showing you can use compression to build generative models and k-means to get decent results.
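To make the raw vs. adjusted distinction concrete, here's a minimal sketch of the two rates. The numbers are illustrative assumptions (a 70B-parameter model at fp16, a hypothetical 30% raw rate), not figures from the paper:

```python
def raw_rate(compressed_bytes, original_bytes):
    # Raw compression rate: compressed size / original size (lower is better).
    return compressed_bytes / original_bytes

def adjusted_rate(compressed_bytes, model_bytes, original_bytes):
    # Adjusted rate also counts the model itself, since the decoder needs it.
    return (compressed_bytes + model_bytes) / original_bytes

model_bytes = 140e9       # assumption: ~70B params * 2 bytes (fp16)
original_bytes = 1e9      # 1 GB of data to compress
compressed_bytes = 0.3e9  # assumption: 30% raw compression rate

print(raw_rate(compressed_bytes, original_bytes))                     # 0.3
print(adjusted_rate(compressed_bytes, model_bytes, original_bytes))   # 140.3
```

With only 1 GB of data the adjusted rate is a disastrous 14,030%; you'd need to compress hundreds of gigabytes before amortizing the model brings the adjusted rate below 1, which is why this only makes sense at server scale.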
How do those figures compare to state of the art compression?
Is it just me, or is it only competitive on the Wikipedia dataset, which probably contains examples from the model's training data?
Lossless JPEG XL and WebP are much better at compression:
https://siipo.la/app/uploads/lossless-comparison-median-file-size-1xritv3md2goacqf6n9jplnxd-800x596.webp
Source: https://siipo.la/blog/whats-the-best-lossless-image-format-comparing-png-webp-avif-and-jpeg-xl
That means the 58.5% rate for PNG could come down to around 30% with a state-of-the-art lossless codec, which is better than the 48% achieved by Chinchilla 70B.
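A quick back-of-the-envelope check of that comparison. The 30% figure is an assumption read off the linked chart, not a measured number:

```python
# All rates are compressed size / original size; lower is better.
png_rate = 0.585         # PNG rate cited above
jxl_rate = 0.30          # assumed state-of-the-art lossless rate (from the chart)
chinchilla_rate = 0.48   # rate reported for Chinchilla 70B

# Relative saving from switching PNG -> a modern lossless codec:
saving = 1 - jxl_rate / png_rate
print(f"{saving:.0%} smaller than PNG")   # ~49% smaller
print(jxl_rate < chinchilla_rate)         # True: the classical codec wins here
```

And note that the 48% for Chinchilla is the raw rate, before the model size is even counted against it.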