Chart: This chart shows raw compression as well as adjusted compression; adjusted includes the size of the model. For a lot of this, it really only works well on server-scale data, because the model used for compressing is so large. But it also lends some credence to other papers showing you can use compression to build generative models and k-means to get decent results.
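To make the raw vs. adjusted distinction concrete, here's a minimal sketch of the two rates. The numbers are illustrative assumptions (a 70B-parameter model at fp16, a hypothetical 30% raw rate), not figures from the paper:

```python
def raw_rate(compressed_bytes, original_bytes):
    # Raw compression rate: compressed size / original size (lower is better).
    return compressed_bytes / original_bytes

def adjusted_rate(compressed_bytes, model_bytes, original_bytes):
    # Adjusted rate also counts the model itself, since the decoder needs it.
    return (compressed_bytes + model_bytes) / original_bytes

model_bytes = 140e9       # assumption: ~70B params * 2 bytes (fp16)
original_bytes = 1e9      # 1 GB of data to compress
compressed_bytes = 0.3e9  # assumption: 30% raw compression rate

print(raw_rate(compressed_bytes, original_bytes))                     # 0.3
print(adjusted_rate(compressed_bytes, model_bytes, original_bytes))   # 140.3
```

With only 1 GB of data the adjusted rate is a disastrous 14,030%; you'd need to compress hundreds of gigabytes before amortizing the model brings the adjusted rate below 1, which is why this only makes sense at server scale.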
How do those figures compare to state of the art compression?
Is it just me, or is it only competitive on the Wikipedia dataset, which probably contains examples from the model's training data?
Lossless JPEG XL and WebP are much better at compression:
https://siipo.la/app/uploads/lossless-comparison-median-file-size-1xritv3md2goacqf6n9jplnxd-800x596.webp
Source: https://siipo.la/blog/whats-the-best-lossless-image-format-comparing-png-webp-avif-and-jpeg-xl
That means the 58.5% rate for PNG could come down to around 30% with a state-of-the-art lossless codec, which is better than the 48% achieved by Chinchilla 70B.
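A quick back-of-the-envelope check of that comparison. The 30% figure is an assumption read off the linked chart, not a measured number:

```python
# All rates are compressed size / original size; lower is better.
png_rate = 0.585         # PNG rate cited above
jxl_rate = 0.30          # assumed state-of-the-art lossless rate (from the chart)
chinchilla_rate = 0.48   # rate reported for Chinchilla 70B

# Relative saving from switching PNG -> a modern lossless codec:
saving = 1 - jxl_rate / png_rate
print(f"{saving:.0%} smaller than PNG")   # ~49% smaller
print(jxl_rate < chinchilla_rate)         # True: the classical codec wins here
```

And note that the 48% for Chinchilla is the raw rate, before the model size is even counted against it.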