Mapping the internet with Hilbert curves

(benjojo.co.uk)

218 points | by randomdrake 2190 days ago

14 comments

  • elvinyung 2190 days ago
    Nice! Perhaps a Gibson quote is appropriate here:

    > Program a map to display frequency of data exchange, every thousand megabytes a single pixel on a very large screen. Manhattan and Atlanta burn solid white. Then they start to pulse, the rate of traffic threatening to overload your simulation. Your map is about to go nova...

  • anigbrowl 2189 days ago
    I don't think straight hilbert curves:IP ranges are the best mapping, and would prefer something built around ping times or hop check routings in 3 dimensions. Or integrate it with geolocation data. But! It's still super useful.

    > On top of all of this, I also did a bonus scan of a few APNIC IP blocks every 30 mins for 24 hours. The data from that allows you to see the internet “breathe” as clients come online in the morning and offline at night

    Really, I'm surprised there isn't a distributed/crowdsourced system to do this all the time and allow people to study the 'weather' in the datasphere.

    • vanderZwan 2189 days ago
    • mino 2189 days ago
      > I don't think straight hilbert curves:IP ranges are the best mapping,

      It is a good idea, as IP ranges are a simple (discrete) linear range.

      However, maybe this is not the best explanation:

      > The problem with displaying IP addresses, is that they are a single dimensional, they only move up and down, however humans are not good at looking at a large amount of single dimensional points.

      But rather: Hilbert curves are great because they ensure that every two consecutive points are contiguous in space (i.e., no gaps).
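
The contiguity property described above can be checked with a short sketch. This is the standard index-to-coordinate conversion for a Hilbert curve, not the code the article used; `ip_to_pixel` is a hypothetical helper that assumes one pixel per /24 on a 4096x4096 map.

```python
import ipaddress


def d2xy(order, d):
    """Map index d (0 <= d < 4**order) to (x, y) on a 2**order-wide Hilbert curve."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_pixel(ip, order=12):
    """Hypothetical helper: place the /24 containing `ip` on the map.

    Assumes one pixel per /24, so 2**24 pixels = a 4096x4096 image (order 12).
    """
    block = int(ipaddress.IPv4Address(ip)) >> 8  # index of the /24
    return d2xy(order, block)


if __name__ == "__main__":
    points = [d2xy(4, d) for d in range(4 ** 4)]
    # Every consecutive pair of indices lands on neighbouring pixels.
    steps = {abs(x1 - x0) + abs(y1 - y0)
             for (x0, y0), (x1, y1) in zip(points, points[1:])}
    print(steps)
```

By contrast, a plain row-major layout jumps a full image width at the end of every row, so numerically adjacent blocks can end up far apart on screen.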

    • achillean 2189 days ago
      I've done it a few times w/ geolocation data. Makes it easier to see changes in developing countries but other things are much harder to see. A mix of visualizations is probably the best approach depending on the target audience/ use case:

      2014: https://imgur.com/aQUHzgu

      2016: https://imgur.com/p43QH6v

  • walrus01 2190 days ago
    Also for fun, ipv6 exhaustion counter.

    https://samsclass.info/ipv6/exhaustion-2016.htm

    • hexane360 2189 days ago
      Note: A power law is definitely not the same thing as an exponential fit. Using the two interchangeably is disingenuous.
    • sigjuice 2190 days ago
      I need to make an exhaustion counter for the /64 at my house :p
  • no_identd 2190 days ago
    I wish this used a good color mapping, like Viridis or cubehelix, or at least used HSLuv or HPLuv to map the parameters to colors. I bet we could see a lot more patterns in this then.

    Edit: I made a github issue for this:

    https://github.com/measurement-factory/ipv4-heatmap/issues/2

  • zokier 2190 days ago
    I did something similar a few years back, mapping ipv4 address space owners.

    http://zokier.net/stuff/map_of_the_internet.png

  • swirepe 2190 days ago
    You can scan the whole internet in about an hour. I had luck using AWS and zmap.

    https://github.com/zmap/zmap

    • SiempreViernes 2190 days ago
      I'm surprised there haven't been more of those high cadence observations he presents at the end when the scans are that fast now.

      If nothing else, his little gif shows that just scanning at different times of day could be used to estimate how many personal devices there are on a certain subnet.

    • andai 2190 days ago
      >ZMap can scan the IPv4 address space in under 5 minutes.
      • hk__2 2190 days ago
        That’s misleading, because you’re quoting half of a sentence. Full quote:

        > On a typical desktop computer with a gigabit Ethernet connection, ZMap is capable of scanning the entire public IPv4 address space in under 45 minutes. With a 10gigE connection and PF_RING, ZMap can scan the IPv4 address space in under 5 minutes.
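
A rough back-of-the-envelope check of those figures, as a sketch only: it assumes one probe per address and that each probe occupies a minimum-size Ethernet frame on the wire (64 bytes plus 8 bytes of preamble and a 12-byte inter-frame gap). The function name is made up for illustration.

```python
# Assumption: each probe costs one minimum-size Ethernet frame on the wire.
WIRE_BITS_PER_PROBE = (64 + 8 + 12) * 8  # frame + preamble + inter-frame gap


def scan_minutes(addresses, link_bps):
    """Minutes needed to send one probe per address at full line rate."""
    probes_per_second = link_bps / WIRE_BITS_PER_PROBE
    return addresses / probes_per_second / 60


if __name__ == "__main__":
    full_space = 2 ** 32
    print(round(scan_minutes(full_space, 1e9), 1))   # gigabit Ethernet
    print(round(scan_minutes(full_space, 10e9), 1))  # 10 gigabit Ethernet
```

Under these assumptions the whole 32-bit space takes about 48 minutes at 1 Gbit/s and just under 5 minutes at 10 Gbit/s, so the quoted numbers are roughly line rate once reserved ranges are skipped.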

  • xvilka 2190 days ago
    According to [1], IPv6 adoption has slowed down significantly, so I think we'll be stuck with NAT for at least another decade.

    [1] https://www.google.com/intl/ru/ipv6/statistics.html

  • LeoPanthera 2189 days ago
    The 9MB PNG is unoptimized. By passing it through optipng and advdef I managed to losslessly squish it down to 7MB.

    Also, I would be remiss if I did not point out that this:

    cat ping.txt | pcregrep -o1 ': (\d+\.\d+\.\d+\.\d+)'

    is a Useless Use Of Cat.[1]

    It should be rewritten:

    pcregrep -o1 ': (\d+\.\d+\.\d+\.\d+)' <ping.txt

    [1] http://porkmail.org/era/unix/award.html

    • syrrim 2189 days ago
      >I managed to squish it down to 7 MB

      wow, what a stellar compression ratio

      >Useless Use Of Cat

      Oh My God No One Cares

      • LeoPanthera 2189 days ago
        > wow, what a stellar compression ratio

        It's pretty good when compared to uncompressed RGB of the same size, which would be 48M.
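
The 48M figure is consistent with a 4096x4096 image at 3 bytes per pixel. The dimensions are an assumption here (one pixel per /24 gives 2^24 pixels); the thread does not state them.

```python
def raw_rgb_mib(side, bytes_per_pixel=3):
    """Size in MiB of an uncompressed square RGB image."""
    return side * side * bytes_per_pixel / 2 ** 20


if __name__ == "__main__":
    side = 4096  # assumed: 2**24 pixels, one per /24
    raw = raw_rgb_mib(side)
    print(raw)
    # Fraction of the raw size, treating the quoted 9MB and 7MB as MiB.
    for png_mib in (9, 7):
        print(png_mib, round(png_mib / raw, 3))
```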

      • contoraria 2189 days ago
        you cared enough to respond, syrrim
  • toolslive 2190 days ago
    You can do the same with the LBAs of a block device. It's interesting to see where different file systems place the (meta) data.
  • bawana 2185 days ago
    How does the number of internet connections relate to the number of nodes? Building fat pipes is not the answer, just as more highways are not the answer to more destinations. The increase in traffic will consume more resources exponentially (factorially?) faster than the increase of address space.
  • iod 2190 days ago
    A Hilbert map of active IPv6 web hosts also exists, based on Akamai data. I found this d3 block by Vasco Asturiano:

    https://bl.ocks.org/vasturiano/0c0f60cf193fa3a04b5d414aed6f5...

    The author also has some other cool d3 visualizations of IPv6 Routes, AS, as well as IPv4 allocations.

  • infinity0 2190 days ago
    Surprised that he missed out 192.168.0.0/16
    • jlgaddis 2189 days ago
      He also left out a bunch of other networks that one wouldn't really need to scan: 100.64.0.0/10 [0] and 198.18.0.0/15 [1] (and a bunch of /24s [2,3,4] too but, relatively speaking, those are pretty insignificant), although one might find some "interesting" (CGN) stuff in 100.64.0.0/10 if their ISP was making use of it.

      There are also large portions of the 13 /8s (218 million IPs!) assigned to the US Department of Defense [5] that you wouldn't need to scan since there are no routes to them at all: the 11.0.0.0/8, 22.0.0.0/8, 26.0.0.0/8, 28.0.0.0/8, 29.0.0.0/8, 30.0.0.0/8, and 33.0.0.0/8 networks are, for all intents and purposes, "missing" from the public Internet.

      Additionally, there are only four /24s in 21.0.0.0/8 that are reachable from the public Internet. Out of the 16,777,216 IP addresses that make up 7.0.0.0/8, only 255 are reachable (7.7.7.0/24) [6].

      There's pretty much no point in scanning -- "mapping" -- this address space (unless you are looking specifically for US government/military stuff).

      ETA: In the interest of time, you probably wanna skip over 44.0.0.0/8 [7] also.

      [0]: https://tools.ietf.org/html/rfc6598

      [1]: https://tools.ietf.org/html/rfc2544

      [2]: https://tools.ietf.org/html/rfc5737

      [3]: https://tools.ietf.org/html/rfc3068

      [4]: https://tools.ietf.org/html/rfc7534

      [5]: https://en.wikipedia.org/wiki/List_of_assigned_/8_IPv4_addre...

      [6]: Interestingly, the ASN (27651) that is advertising 7.7.7.0/24 into BGP appears to be registered to a company in Chile -- and they're also advertising 4.4.4.0/24. I would not be surprised to find out that neither of these advertisements is legitimate.

      [7]: https://en.wikipedia.org/wiki/AMPRNet
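
The address counts in the comment above can be reproduced with Python's standard `ipaddress` module. This is only an arithmetic sketch; the list of unrouted networks is copied from the comment, not verified against current routing tables.

```python
import ipaddress

# Networks the comment lists as having no routes at all (not re-verified).
UNROUTED = ["11.0.0.0/8", "22.0.0.0/8", "26.0.0.0/8", "28.0.0.0/8",
            "29.0.0.0/8", "30.0.0.0/8", "33.0.0.0/8"]

DOD_SLASH8_COUNT = 13  # number of /8s quoted in the comment

per_slash8 = ipaddress.ip_network("7.0.0.0/8").num_addresses
dod_total = DOD_SLASH8_COUNT * per_slash8
unrouted_total = sum(ipaddress.ip_network(n).num_addresses for n in UNROUTED)
slash24 = ipaddress.ip_network("7.7.7.0/24").num_addresses

if __name__ == "__main__":
    print(per_slash8)      # addresses in one /8
    print(dod_total)       # the "218 million" figure
    print(unrouted_total)  # addresses in the seven listed networks
    print(slash24)         # addresses in a single /24
```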

    • signa11 2190 days ago
      why ? these are non routable...
      • rocqua 2190 days ago
        It is missing from the table of reserved ip addresses.

        He mentions:

        0.0.0.0/8        Local System
        10.0.0.0/8       Local LAN
        127.0.0.0/8      Loopback
        169.254.0.0/16   “Link Local”
        172.16.0.0/12    Local LAN
        224.0.0.0/4      Multicast
        240.0.0.0/4      “Future use”
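
A quick way to see which networks that table does and does not cover, sketched with the standard `ipaddress` module. `LISTED` is copied from the table above and `is_covered` is a made-up helper name; `subnet_of` needs Python 3.7 or later.

```python
import ipaddress

# The reserved ranges from the article's table, as quoted above.
LISTED = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "224.0.0.0/4", "240.0.0.0/4",
)]


def is_covered(cidr, listed=LISTED):
    """True if `cidr` lies entirely inside one of the listed ranges."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(reserved) for reserved in listed)


if __name__ == "__main__":
    for cidr in ("192.168.0.0/16", "100.64.0.0/10", "198.18.0.0/15",
                 "172.20.0.0/16"):
        print(cidr, is_covered(cidr))
```

With this list, 192.168.0.0/16, 100.64.0.0/10 and 198.18.0.0/15 all come back as not covered, which matches what the thread points out.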

      • jlgaddis 2189 days ago
        That was infinity0's point -- there's no reason to scan them.
  • inetknght 2189 days ago
    It strikes me that we've "run out" of IPv4 address space but there are entire large blocks of space allocated to entities that don't appear to be using them.
  • eps 2190 days ago