LLMs have been used for… a lot, but this looks like one use where they are not only reliable but also appear to outperform existing methods of image compression. Being able to cram more data into less space tends to lead to interesting developments, so I will be keeping an eye on this.

What do you guys think? Does it deserve less hype than I’m giving it? What kind of security holes do you think this could open?

  • @EdgeOfToday@lemm.ee
    14
    2 years ago

    With a neural network, you wouldn’t be able to mathematically prove that the signal is perfectly recovered 100% of the time for all possible inputs. With PNG and FLAC, you can. If you’re just listening to music and need a good compression ratio, then sure, it won’t be a big deal if a couple of bits are wrong. But that’s also why we have lossy compression. If the goal is to make signal degradation imperceptible to a human, then you could get a much better compression ratio using neural networks. If it’s truly critical that the signal isn’t corrupted, it would probably be better to just use the original method.

    • astraeus
      13
      2 years ago

      Seems like another “hey, what if we used LLMs for this” scenario. It might be more effective, but exactly how many more resources are being used to make it do the same work as current compression algorithms? Effective doesn’t mean efficient, and I think for lossless applications efficiency is truly more important.

      • Butterbee (She/Her)
        English
        8
        2 years ago

        A LOT. You can barely run 13B-parameter models on a 24GB graphics card, and outputs are like a page or so of text. Translate that over to audio and it would have to be broken down into discrete chunks that the model could use as “prompts” to output a section of audio that fits into the model’s available output. It might compress better, but it would be exceedingly painful and slow to extract, even on AI-focused cards. And it would use OODLES of watts to get just a little bit better than FLAC.

        • @abhibeckert@beehaw.org
          1
          2 years ago

          13b parameters works out to about 9GB. You need a bit more than that since it needs more than just the model in memory, but at 24GB I’d expect at least half of it to go unused. And memory doesn’t use much power at all by the way. LPDDR4 uses something like 0.3 watts while actively reading/writing to it.
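
          The size estimate above can be checked with quick arithmetic. This is only a rough sketch: the bits-per-weight values are assumptions for illustration (the comment does not say which format was used), and it counts the weights alone, not activations or runtime overhead.

```python
# Back-of-the-envelope weight memory for a model. Ignores activations,
# caches, and runtime overhead, all of which add to the real footprint.

def weight_gigabytes(num_params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS_13B = 13e9

# The bit widths below are illustrative assumptions, not values from the thread.
for bits in (16, 8, 5.5, 4):
    size = weight_gigabytes(PARAMS_13B, bits)
    print(f"{bits:>4} bits/weight -> {size:5.1f} GB")
```

          Under these assumptions, a figure of about 9GB corresponds to roughly 5.5 bits per weight, which implies quantized weights; at 16 bits per weight the same model would need about 26GB for weights alone.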

          The actual computations use more, obviously, but GFX cards are not designed for this task, and while they’re fast, most of them are also horribly inefficient.

          I run 13B-parameter models on my ultraportable laptop (which has a small battery, no active cooling (fanless) and no discrete GPU). It has 16GB of RAM (not GPU memory, just RAM), and I’m running a full operating system, web browsers, etc. at the same time. Models like llama2, stable diffusion, etc. get perfectly usable performance without using much battery at all (at a guess, single-digit watts while performing the calculations).

          There is efficient hardware now and there will be even more efficient hardware in the future. My laptop definitely isn’t designed to run these models and on top of that the models aren’t designed to run on a laptop either. There’s plenty more optimisation work to be done in the years to come.

          • Butterbee (She/Her)
            English
            1
            2 years ago

            Ok, it’s been a while since I tried running a language model, so I might have been thinking of the 30B models that were showing up at the time. The point remains though that this thing they were running would be well beyond hardware generally available and completely impractical for realtime use. Like… why would you do all that when FLAC and PNG are good enough? It is far cheaper and uses less power to accommodate the slightly less compressed files.

  • @skip0110@lemm.ee
    English
    15
    2 years ago

    I think this model has billions of weights, which means the model itself is quite large. Since the receiver needs to already have this model, I’d suggest that rather than compressing the data, we have instead pre-encoded it and embedded it in the model weights, and thus the “compression” is basically just passing a primary key that points to the data stored in the model.

    It’s like, if you already have a copy of a book, I can “compress” any text in that book into 2 numbers: a page offset, and a word offset on that page. But that’s cheating because, at some point, we had to transfer the book too!
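
    A toy version of the book analogy, as a sketch only: the `BOOK` contents and helper names are made up for illustration, and this variant sends one (page, offset) pair per word rather than one pair per passage.

```python
# Toy version of the "book" analogy: both sides hold the same book,
# so each word is sent as (page index, word index on that page).

# Hypothetical shared book; in the analogy it plays the role of the model.
BOOK = [
    "it was the best of times it was the worst of times".split(),
    "it was the age of wisdom it was the age of foolishness".split(),
]

def build_index(book):
    """Map each word to the first (page, offset) where it appears."""
    index = {}
    for page_no, page in enumerate(book):
        for offset, word in enumerate(page):
            index.setdefault(word, (page_no, offset))
    return index

def compress(text, book):
    index = build_index(book)
    # Raises KeyError for any word the shared book does not contain.
    return [index[word] for word in text.split()]

def decompress(pairs, book):
    return " ".join(book[page][offset] for page, offset in pairs)

message = "the age of times"
pairs = compress(message, BOOK)
assert decompress(pairs, BOOK) == message
print(pairs)
```

    The pairs are tiny, but only because `BOOK` was already on both sides, and a word that is not in the book cannot be encoded at all. That is the objection in a nutshell.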

    • @puttputt@beehaw.org
      9
      2 years ago

      Yeah, it’s like saying I can “compress” a png of the Mona Lisa to just the string “Mona Lisa” because I have a database of art.

    • Coffee Junky ❤️
      2
      2 years ago

      I feel it’s somewhere in the middle. Your book example only works if you already have the book. If this is a model that is a few gigabytes of data but works for every movie or audio file, it can still be usable. In that case it’s not that you have to send the book first, but you do need to have the same dictionary.

  • @kevincox@lemmy.ml
    2
    2 years ago

    I think this is a legitimate use case. It shouldn’t have any security vulnerabilities beyond regular compression-related vulnerabilities.

    The core of compression is prediction. Most compression algorithms work sort of like this:

    1. Guess what the data is going to be.
    2. Encode the difference from the guess.

    If your guess is good it doesn’t take much data to encode the difference. So the data stream is smaller.
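
    The two steps above can be sketched with a deliberately simple predictor: guess that each byte equals the previous one, then store only the difference. The test signal and function names are made up for illustration, and `zlib` stands in for the final entropy-coding stage; real codecs use far better predictors.

```python
import zlib

def encode_residuals(samples: bytes) -> bytes:
    """Step 1: guess each sample equals the previous one.
    Step 2: keep only the difference from that guess (mod 256)."""
    prev = 0
    out = bytearray()
    for s in samples:
        out.append((s - prev) % 256)
        prev = s
    return bytes(out)

def decode_residuals(residuals: bytes) -> bytes:
    """Rebuild the samples by replaying the same guesses."""
    prev = 0
    out = bytearray()
    for r in residuals:
        prev = (prev + r) % 256
        out.append(prev)
    return bytes(out)

# Hypothetical signal: a slow ramp, easy to predict from the previous byte.
signal = bytes((i // 4) % 256 for i in range(4096))

residuals = encode_residuals(signal)
assert decode_residuals(residuals) == signal  # the round trip is lossless

raw_size = len(zlib.compress(signal))
residual_size = len(zlib.compress(residuals))
print(raw_size, residual_size)
```

    Because the guess is usually right, the residual stream is almost all zeros and compresses to fewer bytes than the raw signal. A better predictor shrinks the residuals further, which is where a neural network would slot in.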

    AI image generation can be used to guess the data quite effectively, and it can use context that is hard to encode in classic algorithms (such as what a car looks like). This is basically the next step of shared dictionary compression (like what makes Brotli quite effective), where instead of shipping the dictionary as a fixed table of common strings you compress the dictionary into the model weights. Since the model can do a pretty good job at creating “Image of a girl with brown hair looking right”, you “just” need to encode the difference.

    IIUC neither PNG nor FLAC uses pre-shared data, so sending a massive set of neural weights can be an advantage (and presumably you only need to send these weights occasionally).
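
    For a small-scale feel of what pre-shared data buys you, here is a sketch using the preset-dictionary option in Python’s `zlib`. It is a stand-in for the idea, not how Brotli or a neural compressor works internally, and `SHARED_DICT` and the message are made-up examples.

```python
import zlib

# Hypothetical pre-shared data that both sender and receiver already hold.
SHARED_DICT = b'{"user": "", "status": "active", "role": "member", "email": ""}'

def compress_with_dict(data: bytes) -> bytes:
    comp = zlib.compressobj(level=9, zdict=SHARED_DICT)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes) -> bytes:
    # The receiver must supply the exact same dictionary.
    decomp = zlib.decompressobj(zdict=SHARED_DICT)
    return decomp.decompress(blob) + decomp.flush()

message = (
    b'{"user": "alice", "status": "active", '
    b'"role": "member", "email": "alice@example.org"}'
)

plain = zlib.compress(message, 9)
shared = compress_with_dict(message)

assert decompress_with_dict(shared) == message
print(len(message), len(plain), len(shared))
```

    The dictionary-assisted output is smaller because most of the message is already on the receiving side. The cost is that the dictionary had to be delivered once, which is the same trade-off as shipping model weights, just at a much smaller scale.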

    • brie
      1
      2 years ago

      An example of a compression algorithm that does support tuning parameters beforehand is zstd.

      Even if something isn’t in a pre-shared dataset, I wonder if a sufficiently advanced LLM might be able to do well at compressing predictable but non-repeating data, such as “abc, bcd, cde, […]”.