https://en.wikipedia.org/wiki/Private_Use_Areas

I came across a Python library that mapped the ASCII range into one of these non-printable character ranges and then wrote the result into a database. If someone did that manually with a hex table, how would it be detected and mitigated?
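For the detection half of the question, a minimal sketch in Python: private-use code points all carry the Unicode general category `Co`, so they can be found or stripped without a hand-written range table. The function names and the sample string are made up for illustration; this is one possible check, not how any particular library or database does it.

```python
import unicodedata


def find_private_use(text):
    """Return (index, code point) pairs for every private-use character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]


def strip_private_use(text):
    """Drop private-use characters and keep everything else."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


# Hypothetical sample: two visible letters followed by two private-use characters.
sample = "ok" + chr(0xE041) + chr(0xE042)
hits = find_private_use(sample)
cleaned = strip_private_use(sample)
print(hits)
print(cleaned)
```

Whether to reject, strip, or only log such input is a policy choice; some fonts (icon fonts, for example) use these code points legitimately.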

  • Trigg@lemmy.world
    2 months ago

    I can’t work out what you’re asking.

    You use “mitigated” as if this were some kind of exploit, but it’s still just Unicode text.

    What is the problem with the Private Use Areas of Unicode?

    • 𞋴𝛂𝛋𝛆@lemmy.worldOP
      2 months ago

      It is non-printing. It cannot be seen, scanned, or highlighted. It looks like nothing, except that the file is larger than it should be, with more hex in the binary than the visible text accounts for.
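The size difference described here can be checked directly. A small sketch, assuming the text is stored as UTF-8 and that each ASCII character was moved to `U+E000` plus its ASCII value (the offset is an assumption, since the thread does not say which range the library used):

```python
plain = "secret"

# Assumed mapping: each ASCII character moved into the BMP Private Use Area.
hidden = "".join(chr(0xE000 + ord(ch)) for ch in plain)

plain_size = len(plain.encode("utf-8"))    # ASCII takes 1 byte per character
hidden_size = len(hidden.encode("utf-8"))  # U+E000..U+F8FF take 3 bytes each

print(plain_size, hidden_size)
```

So a field whose byte length is much larger than its visible length is one cheap signal that something is in there.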

      • Trigg@lemmy.world
        2 months ago

        I’m still not seeing why that is a problem. The information remains even if it has no glyphs.

      • MartianSands@sh.itjust.works
        2 months ago

        It ought to look like a bunch of □, the “missing glyph” box that renderers generally draw when the font has nothing to represent the character.

        It usually resembles U+25A1 □ WHITE SQUARE, although the stored text still contains the original code points.

        • MartianSands@sh.itjust.works
          2 months ago

          Also, the answer to your actual question is no. There’s definitely no way to block people from using any particular characters at the kernel level.

          What you seem to be asking for is a way to absolutely forbid all software from writing certain characters to files, and/or from reading those characters. That would require the kernel to inspect all data in detail before letting other software have it, which would slow everything way down. It would also prevent anyone from reading or writing binary data which happens to contain those sequences of bytes by coincidence. Binary data includes things like the programs which make the system work, so blocking those characters would be terminal.
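To illustrate the coincidence problem: a filter working at the byte level cannot tell a UTF-8 encoded private-use character from the same bytes sitting inside unrelated binary data. The blob below is invented for the example and is not a real file format.

```python
# UTF-8 encoding of U+E000, the first BMP private-use code point.
pua_bytes = "\ue000".encode("utf-8")

# Made-up binary data (not text) that happens to contain the same three bytes.
blob = bytes([0x7F, 0x45, 0xEE, 0x80, 0x80, 0x00, 0x01])

# A naive byte-level check would flag this blob as containing a private-use character.
flagged = pua_bytes in blob
print(pua_bytes.hex(), flagged)
```

A check like this only makes sense where the data is known to be text, for example in an application or mail gateway, not in the kernel.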

          • 𞋴𝛂𝛋𝛆@lemmy.worldOP
            2 months ago

            Not necessarily. Turn this around. Let’s say I am working somewhere like a chip foundry with tons of IP. I have no access to encryption tools, but I can easily shift characters into a hex range in bash and send emails.

            These characters can use the control glyph, and so do not print or show up in any physical way except in hex.

            This technique must be obfuscated at every serious organization from governments to industry.
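A rough sketch of the shift being described, written in Python rather than bash and assuming an offset of `0xE000` (both the offset and the function names are hypothetical). It also shows that the mapping is a fixed substitution with no key: anyone who spots the private-use characters can undo it by subtracting the offset.

```python
PUA_BASE = 0xE000  # hypothetical offset into the BMP Private Use Area


def shift_to_pua(text: str) -> str:
    """Move every character up by PUA_BASE (intended for ASCII input)."""
    return "".join(chr(PUA_BASE + ord(ch)) for ch in text)


def unshift_from_pua(text: str) -> str:
    """Reverse the shift for characters in U+E000..U+F8FF; leave others alone."""
    return "".join(
        chr(ord(ch) - PUA_BASE) if 0xE000 <= ord(ch) <= 0xF8FF else ch
        for ch in text
    )


hidden = shift_to_pua("mask rev 7")
recovered = unshift_from_pua(hidden)
print(recovered)
```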

            • Trigg@lemmy.world
              2 months ago

              Encryption can be done by hand. This isn’t the problem you appear to imagine it is.