An Inside Look at the Backblaze Storage Pod Museum

(backblaze.com)

165 points | by sp8 1896 days ago

8 comments

  • berbec 1896 days ago
    I'm always amazed by how open they are. It's great to see people like this succeed.

    My one BB question is how many data centers do they have? I know they have great sharding tech to keep data online if a pod or two goes down, but how many physical locations do they run?

    • brianwski 1896 days ago
      I work at Backblaze.

      > how many physical locations do they run?

      Two separate datacenters in the Sacramento (California) region, and one in Phoenix (Arizona). We are trying to open a European (Netherlands) datacenter this month or next month.

      However, unless you take explicit action to copy your data to two datacenters, any one file (or piece of file) is in exactly one datacenter. We believe the data to be extremely DURABLE (survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of fail over. Another alternative is to use a CDN (Content Delivery Network) in conjunction with Backblaze. You can find out more info here: https://www.backblaze.com/b2/solutions/content-delivery.html

      For backups, Backblaze advocates for a 3-2-1 backup strategy. https://www.backblaze.com/blog/the-3-2-1-backup-strategy/ This is where you keep 3 copies of your data, 2 on site, and 1 in the cloud.

      The exact system for how we achieve high durability is described in this blog post: https://www.backblaze.com/blog/vault-cloud-storage-architect... where any one file is striped across 20 separate computers in 20 separate locations in the one datacenter, where we can entirely lose any 3 computers and the data is completely fine and available.

      We are COMPLETELY transparent on how we calculate the durability; we do the math (including the assumptions) in this blog post: https://www.backblaze.com/blog/cloud-storage-durability/
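
As a rough illustration of the "lose any 3 of 20" claim above, here is a minimal sketch of the binomial arithmetic, not Backblaze's actual durability model. It assumes shard failures are independent and that each shard has the same probability `p_shard` of failing before a repair completes; the value 0.001 is a made-up placeholder, and the function name is hypothetical.

```python
from math import comb


def loss_probability(total_shards: int, tolerated: int, p_shard: float) -> float:
    """Probability that more than `tolerated` of `total_shards` fail.

    Assumes independent failures with identical probability `p_shard`
    (an assumption for illustration only). The tail is summed directly
    to avoid the cancellation error of computing 1 - P(survive).
    """
    return sum(
        comb(total_shards, k) * p_shard**k * (1 - p_shard) ** (total_shards - k)
        for k in range(tolerated + 1, total_shards + 1)
    )


if __name__ == "__main__":
    # 20 shards, any 3 may be lost; p_shard = 0.001 is a hypothetical value.
    print(f"{loss_probability(20, 3, 0.001):.3e}")
```

Under these assumptions the result is dominated by the "exactly 4 shards fail" term, which is why tolerating one more lost shard changes the figure by orders of magnitude.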

      • Johnny555 1895 days ago
        > We believe the data to be extremely DURABLE (survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of fail over

        Do your durability metrics take datacenter failure or human error into account?

        Datacenter failures are rare, but they do happen and can cause data loss if all of your data is in that data center.

        Likewise, human error can cause cascading failures across a datacenter (or beyond) if there are no firewalls between zones that prevent a single person/command/software update from affecting all copies of the data.

        • manigandham 1895 days ago
          That depends on what you mean by failure. Are you talking about a data center failing because every single machine inside blew up? Otherwise the common failures like DC power outage or network drops are about availability rather than durability. The data is still safe on multiple drives.
          • Johnny555 1895 days ago
            I'm talking about the kind of failure that hit a Microsoft Azure data center:

            https://www.datacenterknowledge.com/microsoft/azure-outage-p...

            "but in this instance, temperatures increased so quickly in parts of the data center that some hardware was damaged before it could shut down" ... "A significant number of storage servers were damaged, as well as a small number of network devices and power units."

            • hinkley 1895 days ago
              If a lightning strike is close enough the surge protector won’t save your equipment. When I read the overview of this I expected a roomful of fried servers, not an orderly shutdown triggered by a cooling failure.
            • manigandham 1895 days ago
              Yea if that damaged all the storage servers containing your data then there would be data loss.
              • Johnny555 1895 days ago
                That's why I asked if that 99.999999999% durability number includes datacenter loss. It's an unlikely failure mode but is it 0.000000001% unlikely? I don't know.

                Given the fact that Azure lost a datacenter with this failure mode, I don't think it's in the "likelihood of an asteroid destroying Earth within a million years" ballpark.

                Their durability page doesn't really clear it up; they say "Because at these probability levels, it’s far more likely that ... Earthquakes / floods / pests / or other events known as 'Acts of God' destroy multiple data centers". But from the post above: "any one file (or piece of file) is in exactly one datacenter."

                So it doesn't take multiple datacenter failures to lose data, just one unless you explicitly copy your data to multiple datacenters.
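
To make the question above concrete, here is a small sketch of how a single-datacenter loss probability would combine with the in-datacenter durability figure. It treats 99.999999999% as a per-year loss probability of 1e-11 and uses a purely hypothetical 1e-5 per-year chance of losing the whole datacenter; neither number comes from Backblaze's model, and the two events are assumed independent.

```python
import math


def combined_loss(p_internal: float, p_datacenter: float) -> float:
    """Probability of losing data from either cause, assuming independence."""
    return p_internal + p_datacenter - p_internal * p_datacenter


def nines(p_loss: float) -> float:
    """Express a loss probability as a count of durability 'nines'."""
    return -math.log10(p_loss)


if __name__ == "__main__":
    p_internal = 1e-11    # 11 nines, read as a yearly loss probability (assumption)
    p_datacenter = 1e-5   # hypothetical yearly chance of losing one datacenter
    total = combined_loss(p_internal, p_datacenter)
    print(f"combined loss probability: {total:.3e}")
    print(f"effective nines: {nines(total):.2f}")
```

With these placeholder inputs the larger term dominates, so the effective figure is set almost entirely by the datacenter-loss assumption rather than by the drive-level math.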

      • sp332 1895 days ago
        To be a little more specific, the "2" in 3-2-1 is for two different media types. Hard drive + tape, for example. [Edit: ok it doesn't have to be "types". But two different media - don't put all your backups on one disk!]
        • atYevP 1895 days ago
          Yev from Backblaze here -> Or Hard Drive (Internal) + Hard Drive (External) - that's what we typically see!
      • luhn 1895 days ago
        Last I checked, the backup service exclusively used the Sacramento DC. Has this changed? Being in the Sacramento area myself, I'd be much more comfortable if my offsite backup was more than a couple miles away.
      • chillaxtian 1895 days ago
        No disaster recovery? :/
        • toomuchtodo 1895 days ago
          Same as every other storage provider's default/basic storage offering. If you want georedundancy, you will need to build it.

          EDIT: Apparently GCS has this feature built in. Did not know, very cool!

          • manigandham 1895 days ago
            All clouds have options to do multi-regional storage. GCS has multiregional class. Azure has GRS class. AWS has cross-region replication that can be added to a bucket.
          • chrisseaton 1895 days ago
            I thought basic products like S3 provided cross-region replication, which gives georedundancy?

            But anyway why should I have to build it on top of the provider's offering - why wouldn't they provide georedundancy for me? It seems like a truly basic thing to expect for a backup solution?

            But I'm not an expert in this area.

            • luhn 1895 days ago
              S3 replicates across availability zones, meaning copies in multiple DCs but in the same general area.

              You can set up a bucket to replicate to a bucket in another region, at double the storage costs plus bandwidth charges.

            • siculars 1895 days ago
              Google Cloud Storage (GCS) has this functionality out of the box.

              https://cloud.google.com/storage/docs/locations

              /I work for Google/

    • gist 1895 days ago
      > I'm always amazed by how open they are. It's great to see people like this succeed.

      You don't have to be amazed. Watch and learn. They do it because it's good marketing and results in business. The catchy headline got me to click and read. It got the name 'backblaze' into my head one more time. It makes them appear relevant and enhances the brand.

      Since Yev is probably reading the comments I will offer another topic that would be interesting.

      Do a post on erasing data from SSDs and then being able to recover that data. There was a well known paper by some academics years ago about this. The result was YMMV depending on the drive, controller, etc. That would make an interesting blog post.

      • atYevP 1895 days ago
        Yev here as prophesied! Interesting subject! I'll have to toss it to the group. My assumption is that it would be a bit tough for us to write since we aren't necessarily experts in SSDs (yet) - but might be something to consider for the future!
    • jpalomaki 1895 days ago
      Their openness is the reason why I’m a customer.

      My main concern with this type of service is whether they are just reselling S3 and hoping people will never use their quota (something I don’t see as a sustainable business model).

  • skunkworker 1895 days ago
    It's crazy to see how fast storage prices have fallen. Just a couple days ago I saw that you could get a WD White (Shucked easystore, pretty much a relabeled WD Red) 10TB for just $169. 1.69 cents/GB. And you can get 8TBs for $129 (1.61c/GB)
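
The per-gigabyte figures above can be checked with a couple of lines. This sketch assumes decimal units (1 TB = 1000 GB), which is how drive vendors label capacity; the function name is just for illustration.

```python
def cents_per_gb(price_usd: float, capacity_tb: float) -> float:
    """Price in US cents per gigabyte, using decimal TB (1 TB = 1000 GB)."""
    return price_usd * 100 / (capacity_tb * 1000)


if __name__ == "__main__":
    # Prices and capacities taken from the comment above.
    for price, tb in [(169, 10), (129, 8)]:
        print(f"{tb} TB at ${price}: {cents_per_gb(price, tb):.2f} cents/GB")
```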
  • wiredfool 1895 days ago
    I hope it has a pile of the dead 3TB Seagates. Perhaps in an interactive exhibit where there are some implements of destruction.
    • atYevP 1895 days ago
      Yev from Backblaze -> We do have a drive crusher in the office, but mostly use that for small externals. It's quite satisfying!
      • russh 1895 days ago
        Do you have to treat the output of the drive crusher as hazardous waste?
      • freedomben 1895 days ago
        If you happen to grab a video of that, I'd love to see it :-)
  • daveguy 1895 days ago
    I always enjoy reading about Backblaze updates: new Pod designs, drive statistics, operations, etc. One thing I am particularly excited about is the future Pod designs with 2.5 inch drives. There may be enough miniaturization with magnetic drives to make this feasible, but I expect that the real transition will come with a Pod full of SSDs. Any idea when that might happen? What is the expectation for that timeline? Do you have additional products or price/performance improvements planned in that transition? The SSD endurance experiment from 4 years ago indicates that, reliability wise, they are more than ready. I guess the only limitation now is price and maybe processing?

    https://techreport.com/review/27909/the-ssd-endurance-experi...

  • chaostheory 1894 days ago
    The only thing missing from the museum is the people behind Storage Pod. Other than that it's always really cool to see the evolution and history of the product.
  • DFXLuna 1895 days ago
    It's always fun to read stuff from the backblaze guys. The stuff they do is just neat.
  • noir_lord 1895 days ago
    I love this kind of stuff.

    I like the domain I program in but some of the problems in areas like this are straight up nerd sniping[1].

    [1] https://xkcd.com/356/