Quanta magazine always seems to have a very approachable way of writing about complex topics that I really appreciate. I wish more organizations did what they do.
Not to take away anything from Maynard, but Tao is in a different league. He is the last of the generalists among mathematicians, having made significant contributions to multiple areas: number theory, harmonic analysis, and differential equations, among others. In this age of hyper-specialisation, we may never see another Tao in the foreseeable future.
I fell under a collective (and also formally published) delusion that one might use prime factorization and run-length encoding to achieve data compression, but repeated experiments suggest this fails when actually constructed. I've come to suspect that the primes used this way are just a base, like any other.
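To make that concrete, here's a toy version of the experiment (my own sketch, not the scheme from the published paper): factor a number, store the (prime, exponent) pairs, and compare the bit count to just writing the number down.

```python
def factorize(n):
    """Trial-division prime factorization: returns (prime, exponent) pairs."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            factors.append((d, e))
        d += 1
    if n > 1:
        factors.append((n, 1))
    return factors

def encoded_bits(n):
    """Bits needed to store every (prime, exponent) pair naively."""
    return sum(p.bit_length() + e.bit_length() for p, e in factorize(n))

# Most integers get *longer* under this encoding, not shorter; only very
# smooth numbers with large exponents (like powers of 2) ever win.
wins = sum(1 for n in range(2, 10_000) if encoded_bits(n) < n.bit_length())
print(f"{wins} of 9998 values got smaller")
```

Any prime n is the worst case: its "factorization" is just (n, 1), which is strictly bigger than n itself.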
Unfortunately, I don't know how to argue one way or another very well.
This is a case where the general problem is easier to reason about than a specific problem.
Forget about prime numbers for a moment: it is impossible to achieve lossless compression that always succeeds in compressing N bits into fewer than N bits. There are 2^N possible N-bit inputs but only 2^N - 1 strings strictly shorter than N bits, and you can't map X objects into fewer than X objects without losing information.
So the issue is not that prime factorisation fails to reliably compress things. It's that every possible method you can try would fail to do so, no matter how advanced or fancy it is. Sometimes you'll manage to make the data smaller, sometimes you won't.
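The counting is small enough to check directly. For any fixed N, there are 2^N possible N-bit inputs but only 2^N - 1 strings that are strictly shorter, so some two inputs must collide:

```python
# Pigeonhole check: count 2**N inputs vs. every possible shorter output.
N = 8
inputs = 2 ** N                          # all N-bit strings
shorter = sum(2 ** k for k in range(N))  # strings of length 0..N-1
print(inputs, shorter)  # 256 255 -> one input always left over
```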
Relatedly, there's an interesting proof that there are infinitely many primes which uses the idea of data compression:
https://en.m.wikipedia.org/wiki/Euclid%27s_theorem#Proof_usi...
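The flavor of that proof can be demonstrated numerically: if only finitely many primes existed, products of their powers would be far too sparse to account for every integer - there aren't enough "compressed" descriptions to go around. A quick sketch of that sparsity (my own illustration, not the proof itself):

```python
def smooth_count(N, primes):
    """Count integers in [1, N] built only from the given primes."""
    count = 0
    for n in range(1, N + 1):
        m = n
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:  # n was fully accounted for by the allowed primes
            count += 1
    return count

# With only the "primes" 2, 3, 5 available, hardly any integers are reachable.
N = 10_000
print(smooth_count(N, [2, 3, 5]), "of", N)
```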
I was and am aware of this result, but I find myself under the spell of Constructor Theory's claims that information theory has fundamental issues, and simply don't know enough to verify the apparent fact that this is impossible, theoretically.
For example, as I read your argument, I observe that you've divorced representation from the situation -- that is, for a given number, the number of digits and the number of bits aren't fungible, and either one only approximates the amount of information contained.
That proof regarding infinitely many primes is clever though, thank you for that tidbit.
Here are two different ways of thinking about it which IMO are quite convincing:
- If you had a reliable way to losslessly compress any N bits of information into N-1 bits of information, you could use this iteratively to compress N bits into N-1 bits, then into N-2 bits, then N-3 bits, and so on until you've compressed anything into 1 bit of information. This is obviously ridiculous, so the initial assumption that we have such compression can't be true.
- Alternatively, try for yourself to make a function that maps 4 objects into 2 objects - say, mapping the input space "red, green, blue, or yellow" into the output space "white or black". You won't be able to make this function reversible no matter what you do, and it's easy to demonstrate by brute force, since you can try all possible functions. With only 2 possible outputs, you can distinguish at most 2 inputs, not 4. You can extend the same intuitive idea to prove that N bits of information can't always be compressed into N-1 bits.
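The second bullet really can be brute-forced - there are only 2^4 = 16 functions from four inputs to two outputs, so we can enumerate them all and confirm none is reversible:

```python
from itertools import product

inputs = ["red", "green", "blue", "yellow"]
outputs = ["white", "black"]

# Enumerate all 2**4 = 16 functions from the 4 inputs to the 2 outputs,
# counting how many are injective (i.e., reversible / lossless).
reversible = 0
for images in product(outputs, repeat=len(inputs)):
    if len(set(images)) == len(inputs):  # all four inputs map to distinct outputs
        reversible += 1
print(reversible)  # 0 - no such function is reversible
```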
Lossless compression that shrinks messages on average should be impossible unless there's a specific probability distribution that makes some of the messages more probable than others.
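Right - in Shannon's terms, entropy is the lower bound on average code length, and it only drops below log2(number of messages) when the distribution is non-uniform. A small illustration (my own, standard library only):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: the lower bound on average code length."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Four equally likely messages: 2 bits each, no compression possible on average.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
# A skewed distribution over the same four messages can be coded in fewer bits.
print(entropy([0.7, 0.1, 0.1, 0.1]))     # about 1.357
```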
I concede that I could be misunderstanding.
"Efficient Data Compression Using Prime Numbers"
Authors: Kalyan Guchait, Debasish Chakraborty, Snehasish Kar
Conference: Emerging Research in Computing, Information, Communication and Applications
[0] https://www.quantamagazine.org/james-maynard-solves-the-hard...
But the HN one is from 2022 and mine is back from 2020.
https://www.quantamagazine.org/number-theorist-james-maynard...
It's possible that Quanta Magazine changed the title and is redirecting this old URL to the new one.
Here's some code too, but it's mostly scrapped and non-functioning: https://github.com/meltyness/eggschmidt