10,000 years from now, society has been destroyed. And people come across a glass plate with laser etchings. How bummed will they be when they go through all of the effort of decoding it, only to learn it's like some python library for managing drivers on a 2014 Dell laptop running linux?
Pretty sure the universe is running on some advanced version of Erlang. The Erlang VM would survive until the heat death of the universe; when it finally crashes, the heartbeat program will kick in and we'll have a new big bang.
> The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100KB in size. Each repository will be packaged as a single TAR file.
100KB is a very low limit; many repos will be useless without their binaries.
Despite having the ability to directly access memory via neural interface, thus rendering the concept of output obsolete, I still don't want to give up the Python2 print syntax.
Don't worry, we will build a PC simulator on neuralOS, with a backport of linux, docker and Windows 95. That way you can run python2 in linux in docker in PCSim in neuralOS, and I can play minesweeper in windows95 in PCSim in neuralOS.
I'd consider Linux to be the culmination of some of humanity's greatest achievements. Certainly I'd prefer they found that instead of the average website on the internet.
I really hope people that are burying physical copies of the entire Wikipedia are archiving the entire edit history and not just a snapshot that hides all the ongoing edit wars. It's a much larger dataset (terabytes) so they probably aren't.
Linux would be a tiny drop in the bucket relative to the rest of GitHub. I wonder if they'll mark certain parts of the archive as more important or interesting.
It's surprising how apparently-useless information can give people a lot of insight. In "Guns, Germs, and Steel" and "Collapse" for instance, there were people who dug through the latrines of ancient societies to determine that the early Easter Islanders' diet consisted of 25% porpoises; and that the Greenland Norse colony didn't eat any fish. Looking at the receipts of medicines bought in the court of Henry VIII allowed people to conclude fairly conclusively that he did not have syphilis.
The thing is, it's really impossible to predict what information will and will not be a critical key to unlocking understanding of future generations. Just keeping it all, history, comments, and all, will be a huge boon to future historiographers trying to figure out what developing in the early 21st century was like.
Maybe using tabs instead of spaces will be punishable by death in 10,000 years and a number of people will be held accountable for accidentally having git repos on their computing devices that contain primarily tabs (Makefiles get an exception of course!).
Look on the bright side: in 10,000 years Python dependency management will be in a better shape and they'll have a fighting chance of running your software.
Github has blocked access to my account, which has tens of popular projects. One day they randomly sent me an email telling me to click a link to enable 2FA, and I felt coerced into enabling it. A while later I lost the phone where the 2FA app was installed, and since I'd been pushed into enabling 2FA in a rush, I had no backup codes; now I'm completely locked out. I've contacted support no fewer than 15 times, and they say I need to create a new account since I didn't link my cell phone and don't have backup codes. I had that account for over a decade, and now I can't control the projects I was working on or access any of my private repos. I've been communicating with them from the email address on my account, but that isn't sufficient for them to restore access.
I'm sorry you lost your account (really, that sucks) but I don't understand framing this like Github is doing something wrong. It isn't like they banned you for spamming emojis.
You enabled 2fa, you lost your 2fa, and you did not have any recovery codes. Now you are asking for them to bypass the 2fa, and they are refusing.
Again, that sucks, but when I compare this to what the cell phone companies are doing with sim swapping, it increases my respect for Github.
Humans are humans though, and mistakes are inevitable. In this case, you get an email from the same address used to set up the account, and a corresponding complete stop in activity on the repo. Further, I bet OP could even demonstrate ownership of some of the code (whichever private repos they happen to have checked out). All in all, it adds up, and as a paying customer it's reasonable to expect Github to have some measures in place for scenarios like this.
The selling point of 2FA is that if a hacker gets your password (from a DB dump of compromised site A, which has bad security and no password hashing) and tries to use it on site B, site B still won't let them in, because they don't have the second factor for site B.
In this case, an email account is the second factor. Your Github password is the first factor.
No, multifactor authentication uses different forms of evidence (i.e., something you have, something you are, something you know), not just multiple accounts. Given password reuse, cookies, etc., e-mail is a terrible second factor.
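For concreteness, the "something you have" factor most sites use is TOTP: the device holds a shared secret and computes an HMAC over the current 30-second interval, so no second account is involved at all. A minimal stdlib-only sketch of RFC 6238 (function name and defaults are my own; verified against the RFC's published SHA-1 test vector):

```python
import base64
import hashlib
import hmac
import struct

def totp(secret_b32: str, unix_time: int, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 over the count of 30-second intervals
    since the Unix epoch, then dynamic truncation to N decimal digits."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = struct.pack(">Q", unix_time // step)      # 8-byte big-endian counter
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                          # low nibble picks the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 Appendix B test vector: ASCII secret "12345678901234567890",
# T = 59 seconds, 8 digits -> "94287082"
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", 59, digits=8))  # → 94287082
```

Note that verification never leaves the server's hands: both sides derive the same code independently, which is exactly why a stolen e-mail account doesn't help an attacker here.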
GitHub's 2FA is the worst implementation I have encountered. They tell you that you can save recovery codes to Facebook, but guess what? It doesn't work! So you get lulled into thinking your recovery codes are safe in your FB account until you need them, when they simply throw an error message: "oops, sorry, something went wrong, contact customer service." So next you try to find customer service, and guess what? They have the worst setup for that as well. They have enterprise, pro, and all kinds of variations. You are expected to literally learn their internal customer-support hierarchy, and you scratch your head wondering who the hell runs this thing. When you finally get through all that, you find out pro is not pro and enterprise is not really enterprise, and, BTW, the whole team is off at some training this entire week, so don't expect any response. You see, they don't think you actually have a job or any kind of urgency. So you wait days for a response, and you likely won't get any. There is no phone number to talk to a human, and emails are bounced around their internal hierarchy. Losing your phone can be a pretty bad thing if you have set up GitHub 2FA.
Given how we have all placed GitHub front and center in our lives, there is no excuse for any of this. Even if I had no recovery codes, I would expect GitHub to have some process to get access back. A GitHub account should be looked upon with the same reverence as a bank account. The idea that you must forget about the money in your bank account because you lost your phone and recovery codes is mind-numbing. They could charge you $500, run a professional background check on you, get your credit card/bank info and verify it's really you, wait 30 days, remove private repos, and then grant you access again, at least to your public repos. That's what I would expect from an efficient, customer-obsessed organization.
I think most 2FA-supporting websites will let you regain access to your account if you lose your 2FA device but still have your password and email. Similarly to how most 1FA websites will let you regain access to your account if you lose your password but still have access to your email.
If Github is doing something different than most websites, that needs to be very very clearly communicated at setup time, otherwise people will assume Github behaves like other websites.
> Warning: For security reasons, GitHub Support may not be able to restore access to accounts with two-factor authentication enabled if you lose your two-factor authentication credentials or lose access to your account recovery methods.
I just put 2FA on my Github account. During that enrollment flow, there were no red warnings, and nothing said that if you lose your 2FA device and your recovery codes you will be unable to ever access your account. In fact it said
>Treat your recovery codes with the same level of attention as you would your password!
Most people forget passwords all the time and treat that as no big deal because they can just reset the password. If people use that "same level of attention" for the recovery codes, then they will be lost just as easily, except this time people won't be able to recover.
Edit:
After enabling I got an email with a much more cautionary tone:
> Recovery codes are the only way to access your account again.
> GitHub Support will not be able to restore access to your account.
It's interesting they call this out in the email but not during the sign up process.
There has to be a mechanism to get your account back if you provide some form of identification. I know you get into other issues that way, but 2FA without a mechanism to restore the account in case of phone/code loss (or inability to access them, which is somewhat likely if you don't keep multiple copies of your codes) is pretty stupid.
The one I've had to use (and find reasonable) is to have "trusted devices". Here's how it works, at least with 1Password:
- You log in to a device with your 2FA.
- You set this device as trusted.
- In case you lose your phone or 2FA device, log onto the trusted device and disable 2FA.
- Then set up 2FA again with new phone or device.
It's not perfect, but it's workable. For instance, while I won't enable trusted device on my laptop, having my desktop stolen is a much rarer occurrence, so I enable the "trust this device" option there. It's just a matter of thinking about the threat model and where you can place spots for recovery while losing as little security as possible.
But that's fine, because the trusted device is the second factor (something you have).
You can have multiple second factors with Github. I currently have two Yubikeys and one authenticator app enabled. If I lose one I can still log in with another.
I'd love to hear of any solutions to this. The solution should, however, be insusceptible to remote attacks that could put a user's account at risk.
2FA with SMS has the problem that phone companies offer support by humans and don't have a tight process for moving numbers to another phone, as proven again and again, resulting in compromised accounts.
While the current solutions for MFA aren't perfect, it's hard to come up with another solution that would be as safe or safer and prevent most or all of the mechanisms used to compromise accounts, like phishing, social engineering, and other remote attacks. Giving you the ability to save the codes somewhere physical has its downsides, but an important upside is that it keeps _you_ in charge of your own security in most cases.
Same thing happened to me and my wife. The permanence of 2FA failure is something that didn't exist until just a few years ago. For decades before that, you could always get back in as long as you had an email. With that safety net in mind, it's easy to go on "auto pilot" when using the internet and devices, not worrying about anything at all. And when 2FA came, it changed that with very little warning, or at least the warning wasn't distinct enough from previous-generation auth techniques, so we (understandably) didn't pay enough heed. Fortunately it was only an iTunes account, so all we lost was access to a dozen VeggieTales movies, but it got me wanting to disable 2FA on my own account; Apple doesn't allow that anymore, though. Sigh.
If you can talk your way past two factor auth, it is useless.
I can see the argument for being forceful about prompting you to write down backup codes or whatever, but fixing that after the fact is something they absolutely should not be doing.
I do think that sites should offer better options for recovery. NearlyFreeSpeech do a really good job of this, offering seven methods of recovery and letting you decide how many you need to fulfil to be given access and which you want to configure. However, things like checking photo ID and more "offline" options are expensive to support, so I get why that is rare.
> If you can talk your way past two factor auth, it is useless.
No, it just means you use a different second factor. Here are some examples:
- Gov ID
- Pushing to a private repo
- Checking if IP matches historical records (hell, since they're fingerprinting everyone, why not use that?)
There are problems with 2FA, mainly that it lives on your phone. Phones are pretty valuable devices, and it isn't unlikely that they get stolen. Even if Google backed up Authenticator to Drive, how would you get Authenticator back on your new phone?
It isn't hard to come up with a hundred scenarios where things fall apart. This is why there are different levels of security. But my GitHub isn't so sensitive that if I lose access I don't want anyone to ever get access. Yet at the same time it is sensitive enough (and GitHub is attacked enough) that I don't think just a user name and password is sufficient. Where's the middle ground? Even with YubiKeys I have to have multiples (as a backup in case I lose one). There's such a thing as "acceptable security levels."
We're trying to make things easier and safer for humans. But that doesn't mean they aren't still going to be human.
Not if you didn't previously agree to using these kinds of documents as 2FA. I set up 2FA fully aware that if I lose my recovery codes, I'm done. I don't want anyone ever to be able to restore my accounts without the codes. There are so many attack vectors involving government documents: just one hacked service that I had to upload them to for verification, and that kind of information can become 'public'. Humans and social engineering are among the biggest attack vectors nowadays, precisely because of thinking like "humans make mistakes, let's reset his account because he knew the last digits of the credit card used and uploaded an ID".
Gandi handles this nicely with a checkbox in the settings that explicitly tells support to never ever restore the account if 2FA is lost
> This is why there are different levels of security. But my GitHub isn't so sensitive that if I lose access I don't want anyone to ever get access. Yet at the same time it is sensitive enough (and GitHub is attacked enough) that I don't think just a user name and password is sufficient.
Yes, there is stuff people would rather have destroyed than get into the wrong hands. But I personally can't think of any. But I do think we all agree that a password alone is not secure enough. What I'm saying is that there shouldn't be two choices. There should be middle grounds. SMS is not a great recovery tool for 2FA because it is trivial to fake. Specifically with my GitHub I'm okay with my security not being invulnerable to nation state actors. But I don't want it to be hacked because of a dictionary attack with some modifications. If someone has a copy of my passport then there's much bigger problems that I have than my GitHub being hacked.
The key part here is that there should be varying levels of security. The two levels we have (no 2FA vs 2FA) is too small.
* As to the IP: I was suggesting it be used in combination with other signals, not as standalone verification. Just as your phone number should never be standalone verification. We're talking multifactor.
The thing is, GitHub shouldn't care what you want here.
GitHub is a social site, and the users accessing repositories are just as important as those that own them. You might not care if your account is compromised, but if other people trust your repositories and they get attacked through your compromised account, that is a problem.
> If you can talk your way past two factor auth, it is useless.
> However, things like checking photo ID and more "offline" options are expensive to support, so I get why that is rare.
For you to characterize my efforts with GitHub support as "talking my way past it" is absurd. I was never once asked to provide a photo ID or anything similar, something I would have gladly complied with.
They could re-activate the keys they automatically purged from my account; I still have the associated private keys. My GitHub profile also has detailed information in its bio section, including my past employer and links to my various social media accounts/website. They could simply validate my ownership of one of those accounts.
The social media websites could have been hacked by some attacker, the private keys could have been obtained as well. Or put there by an attacker that is now locked out by 2FA.
The entire point of 2FA is that it's a second factor and no way around it without a second factor that is verifiable without doubt.
Nope. Many 2FA setups have a way to workaround the second factor if it’s missing. Recovery codes, emails to secondary addresses, faxing an ID, etc etc.
What good would a photo ID be, really? It's not like they have an existing copy to compare it to, and a photocopy-quality scan of an average ID would be pretty simple to fake.
You can talk your way past it at AWS. You need a medallion signature guarantee (a more intense form of notarization) from a bank. It's not easy, but you can do it.
After that experience I switched to Authy, which does an encrypted cloud backup of your TOTP secrets.
Nearly Free Speech is awesome <3 Anyone reading this, if you want a web host with hacker values — not in the blackhat sense, in the old school sense — check 'em out!
They are pretty unique and are good, although they are slow to update the service and it's starting to hit some real issues for me.
The most relevant issue for me being the lack of U2F (now WebAuthn, I guess) support. It is also really annoying you can't have restricted SSH keys to allow for automation that is locked down to single sites.
They have had those issues on their feature voting for years.
I found this out the hard way as well: at some point since I last logged in, they quietly enacted a policy of deleting keys that haven't been used for roughly a year.[1] This meant I couldn't use the not-so-public method of verifying I still had access to various associated keys.
Now that you mention this, I vaguely recall receiving an email from them a long time ago saying they were going to purge my keys. Wow, so much of this is beginning to make sense. Why would they remove keys that people may unknowingly be relying on as the only way to regain access to their account?
Why should they support holding onto SSH keys forever in case you forgot to write down your backup 2FA codes, especially when they've never advertised that they'll accept SSH key-signed artifacts as proof of identity?
Why would they purge SSH keys when they don't purge anything else? Why not just purge the whole account after a year of inactivity, if they care so much about space?
It's clearly not about space. Old SSH keys are a security hazard. Even moreso keys you aren't using anymore and therefore may not be particularly careful with.
Heck, even in this very scenario, if I haven't used an SSH key with GitHub in many years, and then GitHub receives an artifact signed with that key saying "I lost my 2FA token and backup codes, please reset account auth so I can log back in", I very much do not want GitHub to trust that artifact. If I haven't used the key in years, that probably means I don't have it anymore and either never got around to removing it from GitHub or forgot it was there.
An email suggesting you enable 2FA isn't coercion. You voluntarily enabled 2FA and then chose not to protect the backup codes, as they repeatedly warn you to do, because losing them along with your 2FA device results in exactly this situation.
Now instead of accepting the result is the consequence of your own poor choices you are trying to shift the blame to GitHub.
This is why I’d rather have the phone number/email 2fa than a device 2fa even with the risk of sim swap.
If a human can't give me my account back through tech support, I'm not very keen on trusting my account to a gadget that can break or get lost.
The risk of losing a phone and the backup codes is probably several orders of magnitude larger than the risk of being the target of a sim swap attack for the vast majority of users.
As someone who lost access to their TOTP 2FA device for ~3 months, I can definitely relate to that. But SMS is still insecure, and there are better ways of doing this.
For one, no one is forcing you to only have one TOTP device. You can scan that QR code as many times as you want. Have them on multiple devices.
Depending on your threat vectors, putting them into a password manager that supports it (like Bitwarden) might also be smart. Less secure than fully offline, but definitely better than SMS.
As for the backup codes - one big encrypted text file synced to the cloud of your choice should do the trick, but if you prefer the "scary men with guns" kind of security, safety deposit boxes were literally made to store this kind of stuff (bonus points for on-paper encryption).
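Worth spelling out why scanning the QR code on multiple devices works: the code is just a provisioning URI (the de facto "key URI" format popularized by Google Authenticator) carrying a base32 secret, so any device that captures it will produce identical codes forever. A small stdlib-only sketch of pulling the pieces apart (the URI and secret below are made-up examples):

```python
from urllib.parse import parse_qs, unquote, urlsplit

def parse_otpauth(uri: str) -> dict:
    """Split an otpauth:// provisioning URI (the payload of a 2FA
    enrollment QR code) into its type, label, and query parameters."""
    parts = urlsplit(uri)
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return {"type": parts.netloc,                       # "totp" or "hotp"
            "label": unquote(parts.path.lstrip("/")),   # e.g. "GitHub:alice"
            **params}                                   # secret, issuer, etc.

# Hypothetical enrollment URI; the secret is an illustrative value only.
info = parse_otpauth("otpauth://totp/GitHub:alice?secret=JBSWY3DPEHPK3PXP&issuer=GitHub")
print(info["secret"])  # → JBSWY3DPEHPK3PXP
```

This is also why a screenshot of the QR code is exactly as sensitive as the secret itself: whoever holds either one can generate valid codes.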
I discovered recently that the QR codes are simpler than I thought; you can even print them out or store them as screenshots, depending on your threat model.
I do something silly like that. I take the QR codes, convert them into Unicode glyphs, and then put them in a GPG-encrypted file. I started doing this after my first phone upgrade lost all my Google Authenticator entries. Now I can just decrypt in the terminal and directly scan all the codes back into Google Authenticator should I ever lose them.
Do you mean you use something like grencode to literally draw the QR code using Unicode box characters or do you just decode and save their contents?
As an extra suggestion: if you use an Android phone for OTP, [andOTP](https://github.com/andOTP/andOTP) supports exporting directly into a PGP-encrypted JSON file which can then be either imported back into the app or converted back to QR codes with a script.
Since it allows you to trigger the export using a Broadcast Intent, I have it set up to do that as a part of my weekly backup Tasker script (of course, you could also just use any other sync solution and manually export when you add a new code).
Yeah, literal QR codes made out of unicode box characters. That way it's just scanning a bunch of codes instead of trying to recreate them just to scan them.
I would prefer to have u2f devices but be able to trust some tokens from friends and family without having to have them present at every registration, kind of like having a spare key with someone for every lock. I guess I'm not really worried about my relatives socially engineering my GitHub password out of me.
But you already can do that. You can register multiple U2F keys and give one to a family member or put it in a safe. You can do the same with recovery keys.
> without having to have them present at every registration
For example, I have given a token to a family member in another country; for it to be properly useful, I'd need that token back each time I register on another site.
I don't understand; if a neighbor moves or a key gets lost, you give another neighbor a spare key copied from your own.
What difference does it make unless everyone you trust is gone or has lost everything? At that point you have larger problems than logging into online accounts.
It's unfortunate, but I'd consider it a feature that you're not able to sign in without access to the physical devices linked to your account's 2FA, i.e. it shouldn't be possible for someone with access to your email account to "phish" their way past 2FA.
Nevertheless, the anxiety of losing the physical device with all my 2FA logins is what prevented me from enabling 2FA on most of my accounts, until I was referred to Authy (authy.com), which syncs your 2FA across multiple devices, including your PC. Beyond being very convenient, the effortless syncing and redundancy gave me the confidence to enable 2FA on all my accounts, since the redundancy ensures I'll still be able to access them if one of my devices is broken or lost.
This happened to me a couple of years ago. Lost my phone, no recovery codes, so no access to github. Contacted support, they told me that they wouldn't be able to restore access even when I was communicating with the email associated with my account.
Luckily I only had public repos, so I created a new account and forked them all. Support had told me that if there isn't any login activity in the blocked account for a period of six months, they would delete the account and release the username. And yes, I had to follow up after six months were up to make that happen.
> If github did their 2FA correctly/very securely, it may literally be impossible for them to give you access again.
This is probably the most popular current dogma in security circles. This is a policy issue, not an intractable mathematical problem. In theory maybe it could be as rigorous as math, but in practice security is always relative.
The entire point of 2FA is to avoid someone taking over your email and then being able to access _anything_ tied to that email.
The same can be said for strong encryption, where losing the keys means you have no chance of recovery. I'm not an advocate of defaulting to things like full-disk encryption for that reason (and I know people who lost a lot due to it). I guess the underlying problem is that humans are fallible and strong security is not, and that risks "in the other direction" aren't often mentioned: would you rather risk being hacked, or risk losing access to your data forever?
So that's what happens if you enable 2FA. I disabled it after I left a Github org where 2FA was required, and I haven't enabled it since. However, Github now regularly sends me e-mails with auth codes, basically forcing mail-based 2FA on me. It's annoying as hell and I can't disable it, nor does support want to do anything about it. Very sad, as I'm using a password manager and my password should be safe.
> I'm using a password manager and my password should be safe.
You may have forgotten that it doesn't matter how you store your password; the problem is that it is a single factor. Once compromised, one can gain access to anything within that account. You may be compromised by phishing, keylogging, or other means. 2FA helps make these types of attacks more difficult, although not impossible.
If you were "coerced" to enable 2FA, that was a decision made by one of the organizations/teams (company accounts) you were a member of, not GitHub. You had every option to leave it disabled, but, per that team's policies, you would have lost access to their repos.
> On February 2, 2020, GitHub will capture a snapshot of every active public repository, to be preserved in the GitHub Arctic Code Vault. This data will be stored on 3,500-foot film reels, provided and encoded by Piql, a Norwegian company that specializes in very-long-term data storage. The film technology relies on silver halides on polyester. This medium has a lifespan of 500 years as measured by the ISO; simulated aging tests indicate Piql’s film will last twice as long.
But also a great engineering exercise. I wouldn't be surprised if this exercise leads to lots of valuable improvements for GitHub users in the here and now. Trying to solve such a grand challenge forces you to develop a vocabulary and understanding of your current systems that can lead to more immediate improvements. I think it unlikely any of these archives will actually be accessed, but simply building them could lead to great side effects.
It's also great marketing, as I now believe that Microsoft/GitHub takes the job of not losing user data extremely seriously, more so than if they had spent an equivalent sum of money buying an ad that says "We take not losing data seriously".
> The GitHub Arctic Code Vault is a data repository preserved in the Arctic World Archive (AWA), a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. The archive is located in a decommissioned coal mine in the Svalbard archipelago, closer to the North Pole than the Arctic Circle. GitHub will capture a snapshot of every active public repository on 02/02/2020 and preserve that data in the Arctic Code Vault.
Nice to see the Long Now Foundation involved with this. A friend of mine is a member. The work that they do is a pretty big deal given that not much of it is being done in today's culture of short term thinking.
An interesting thought: The license of your open-source software will become irrelevant if this is cracked open in 1,000 years, because everything will be public domain (assuming copyright laws aren't changed drastically). But it's interesting to wonder what future humans will think about the open-source license movement of the 1980s to the present.
Free (libre) software sort of faded into existence, simply because the notion of copyrighting software faded into existence, and free software relies on people considering that software can be copyrighted in the first place. The "A-2 System" https://en.wikipedia.org/wiki/A-0_System is roughly the first open-source software, although I'm not sure an explicit license text even existed.
The second question is what will enter public domain first, business-owned software or individually-written software. If the latter, which open-source developer has died the earliest?
> If the latter, which open-source developer has died the earliest?
Can we just take a second to appreciate how downright inhumane copyright expiry is? It basically encourages cheering for people to die.
I agree that it's dumb. The only clear alternative, N years after the work was created, has its own problems, though, such as figuring out when each work was created to determine whether it's in the public domain or not.
How awfully convenient that the news of GitHub literally putting an archive on ice breaks now, swallowing search traffic about GitHub workers resigning because they want the company to break its contract with ICE...
> The 02/02/2020 snapshot archived in the GitHub Arctic Code Vault will sweep up every active public GitHub repository, in addition to significant dormant repos as determined by stars, dependencies, and an advisory panel.
If you have public GitHub repos, people crawl them all of the time. I've seen at least three top-of-HN articles that say something like: "I scripted through 10 billion LOC on GitHub, here's a bunch of passwords", or something to that effect. Just set your repo to private if you don't want it in the vault, and if it's open source anyway, who cares?
> If we had the means, should the web as a whole be append-only?
You probably meant this as a rhetorical question, but I'd argue that yes, (for public available data at least) it probably should be. It'd enable solutions to a lot of problems we have with the current web, not least archival and broken links.
Honestly, I would argue "stored in a glacier forever" is drastically more private than anything you've ever set to "public". The number of people who can actually get to and read that data is incredibly minimal.
There’s no reason to believe they won’t have an online copy of the 2020 snapshot too. Isn’t that kind of the point? For future generations to be able to use it?
The online copy is... GitHub itself. Which is current and up-to-date, and you can continue to remove your data from. The 2020 snapshot is useful historically... like a time capsule. There's no reason to invest the resources in keeping it online. And if it was online, it would have major regulatory problems, such as GDPR.
So there's a lot of reason to believe the 2020 snapshot won't also be online.
It will likely apply; still, I imagine they can simply pretend it does not.
My intuition is that archiving data for long term historical use is different from datamining a [meta]data to maximize invasion of privacy. Also there is a difference in accessibility, stored inside a glacier very few people are going to actually read it.
I believe that if mass complaints emerged from all over the EU it would be a different story. But this does not look like the kind of activity the GDPR was created for.
This makes me wonder: How does GDPR apply to books? Essentially what's happening here, is that GitHub is printing off a "paper copy" and putting it in a box somewhere.
You can't exactly GDPR request deletion of your information from a printed book, so I'm curious how GDPR applies to such physical archival mediums.
There is an important choice in deleting a public repo, even if it has been archived elsewhere. At the very least you are no longer claiming that it fits your criteria for a public portfolio.
Hi, Julia here, the PM for the Archive Program at GitHub. Yes, you will be able to opt out of the program. One option is to make your repo private, as only active public repos will be archived. If you don’t wish to make your repo private please contact support at support.github.com.
You do know that if anyone has forked your project, they have a complete copy of your work. That said, if they don't opt out... well then I guess your work IS going to be archived.
They probably mean that today will be referred to as 'yesterday' in the future. The sentence structure is ambiguous, though; it would work both ways. I think your interpretation is actually better, I'd have written 'tomorrow' as well.
Both are correct. The article's phrasing is that the code of today is "yesterday's historical curiosity" from the point of view of someone who looks at it tomorrow.
The article's phrasing doesn't seem that correct: yesterday's historical curiosity is not necessarily today's historical curiosity (today being 3000 AD).
How do I submit a patch against archiveprogram.github.com? The copy contains a typo, and unlike GitLab—which has a footer at the bottom of most public-facing pages with a link back to the markdown source—there's no indicator of how to get this corrected.
Software rots if left untouched for months. This is a fruitless effort, but it does get GitHub some good PR. They should instead preserve the ideas that make all software possible.
I find the assumption that software rots after months a sad thing. Most of that is poorly versioned libraries that change quickly. Something like the Go 1 compatibility promise means code written just against core Go should be fine if someone has the latest (maybe last) version of Go 1.
The Arctic World Archive description says: "The film technology relies on silver halides on polyester. This medium has a lifespan of 500 years as measured by the ISO; simulated aging tests indicate Piql’s film will last twice as long."
I bet people in 2200 can't wait to figure out, from an obscure error, which version of node they need to run a specific gulp version, the error being caused by an underlying dependency's reliance on a deprecated built-in (true story).
After figuring out that issue, they will need to debug why a specific package's version is broken, only to find out the maintainers 200 years ago hard-coded the century.
I'm skeptical of the idea of static archives, especially when it comes to code. I think software is pretty much a living creature, with all of its environments and contexts and baggage. The only feasible way for it to survive is to keep changing it. When it's no longer updated, we should pay our respects and bury it.
Plus, people are going to reinvent the wheel no matter what.
> I'm skeptical of the idea of static archives, especially when it comes to code.
I don't think anyone is going to be hoping to use this code for their own purposes, but rather to study programmer culture and development. For instance, when did the idea of "immutable by default" start, and how did it spread from esoteric academic languages into mainstream languages like Rust? How did the decades-long specter of Perl 6 affect the course of Perl 5, and how did the rename to Raku lead to its resurgence (or lack thereof)?
I'm no expert in history, but I have read a reasonable amount; and what's often fascinating is when there are two groups who think that they are in complete disagreement, but in fact share a common assumption which you, from the perspective hundreds of years in the future, do not share.
There are things that are so obvious to us now that we don't bother writing them down or explaining them, which will be a complete mystery to people 1000 years hence; and it's basically impossible to imagine what those might be.
Common Lisp code begs to differ. It’s not perfect but code that’s half a century old (Lisp that’s not even Common Lisp!) can run nearly untouched. It’s a huge reason I write it.
I get where you’re coming from, but the interpreter that you’re using to run that code has been rewritten to suit the underlying architecture of your machine dozens of times in the intervening time.
I think that’s what GP means, that if you don’t include the “context” (interpreter, cpu architecture, power supply), you’re only burying the most malleable tip of the iceberg.
It reminds me of the Zork source code that was published here some months ago. It was written in a language for which the compiler appears to have been lost. People have tried to develop compilers given the source code provided, but we can’t get it to work just right, because some crucial bit of context appears to be missing.
There will probably be people who maintain LISP and C for decades more, but imagine trying to write a Java or Haskell compiler with only the source code for Kafka or Pandoc as your guide?
Not impossible of course, but hey, maybe you should check the source code for a JVM or ghc version (and maybe add a Linux distro and gcc for good measure) in with your source in time for the cut off date. :)
Point is, once you enumerate all of the dependencies, you realize the only solution is what GP said. Constant maintenance at all levels is the only thing that keeps the ship running.
> For greater data density and integrity, most of the data will be stored QR-encoded
At face value, this seems like a really odd choice. I don't understand why you would choose QR encoding unless this was being printed. I feel like I'm missing something here.
It's very important to protect the civilizational process from fallout, as severe or black-swan conditions might sweep away what we have accomplished so far. History is full of examples of advanced civilizations that vanished, leaving us to dig ourselves out of the mud slowly, starting all over again from scratch without a chance to learn from them.
The civilizational process is fragile and menaced from all sides, all the time. It's good to always be vigilant and prepared for anything.
It seems prudent. We may be entering a period of global upheaval and panic where large parts of human civilization will not make it past the next 100 years. It makes sense to attempt to preserve what we've accomplished in a way that will last for 1000s of years.
I'm the PM at GitHub for the Archive Program. We aren't making the assumption that you will have a computer. The archive will include a Tech Tree that explains in plain language the fundamentals of computer programming and how to use the material in the Arctic Code Vault. We will also include technical guides to QR decoding, file formats, character encodings, and other critical metadata so that the raw data can be converted back into source code for use by others in the future.
However, what's to say the archive will even be accessible in 10,000 years' time? The entire archive may be unreadable or unusable due to fundamental changes in technology, language, society, etc.
Probably a copy of how to learn English, written in many languages, so that if any language similar to one of the provided ones survives, the data can be recovered.
I hope the Piql polyester film material is good at binding flint-knapped stones to spear shafts. Could be an agile pivot for descendants in a thousand years.
I would consider it pretty obvious that the permanence of usability/readability for archiveprogram.github.com is orthogonal to the actual goal of this project. Do you think future generations will learn about the archive from archiveprogram.github.com?
I was able to do that (so, thank you) but don't understand what great grandparent means by unusable. I prefer it to the version with CSS (which I gave up on after about 2 seconds before getting intrigued by great grandparent). The list of partners is not rendered; is that what qualifies it as unusable, great grandparent?
(I am participating here solely out of curiosity about the web; am not trying to make any point or argue any position.)
CockroachDB
Better check in your node_modules directory the night before archive day so it’s included.
I wouldn't call that a problem.
Someone will write a GNU Emacs mode for that
The thing is, it's really impossible to predict what information will and will not be a critical key to unlocking understanding of future generations. Just keeping it all, history, comments, and all, will be a huge boon to future historiographers trying to figure out what developing in the early 21st century was like.
You enabled 2fa, you lost your 2fa, and you did not have any recovery codes. Now you are asking for them to bypass the 2fa, and they are refusing.
Again, that sucks, but when I compare this to what the cell phone companies are doing with sim swapping, it increases my respect for Github.
Isn't one of the biggest selling points of 2fa that if one account gets hacked you don't lose all accounts?
In this case, an email account is the second factor. Your Github password is the first factor.
Given how we have all placed GitHub front and center in our lives, there is no excuse for any of this. Even if I had no recovery codes, I would expect GitHub to have some process for getting access back. A GitHub account should be looked upon with the same reverence as a bank account. The idea that you need to forget about the money in your bank account because you lost your phone and recovery code is mind-numbing. They could charge you $500, run a professional background check, get your credit card/bank info and verify it's really you, wait 30 days, remove private repos, and then grant you access again, at least to your public repos. That's what I would expect an efficient, customer-obsessed organization to do.
If Github is doing something different than most websites, that needs to be very very clearly communicated at setup time, otherwise people will assume Github behaves like other websites.
> Warning: For security reasons, GitHub Support may not be able to restore access to accounts with two-factor authentication enabled if you lose your two-factor authentication credentials or lose access to your account recovery methods.
https://help.github.com/en/github/authenticating-to-github/r...
>Treat your recovery codes with the same level of attention as you would your password!
Most people forget passwords all the time and treat that as no big deal because they can just reset the password. If people use that "same level of attention" for the recovery codes, then they will be lost just as easily, except this time people won't be able to recover.
Edit:
After enabling I got an email with a much more cautionary tone:
> Recovery codes are the only way to access your account again.
> GitHub Support will not be able to restore access to your account.
It's interesting they call this out in the email but not during the sign up process.
- You log in with your 2FA on a device.
- You set this device as trusted.
- In case you lose your phone or 2FA device, log onto the trusted device and disable 2FA.
- Then set up 2FA again with the new phone or device.
It's not perfect, but it's workable. For instance, while I won't enable trusted device on my laptop, having my desktop stolen is a way rarer occasion, so I enable the "trust this device" options. It's just a matter of thinking on the threat model and where you can place spots for recovery while loosing as little security as possible.
You can have multiple second factors with Github. I currently have two Yubikeys and one authenticator app enabled. If I lose one I can still log in with another.
2FA with SMS has the problem that companies offer support by humans and don't have a tight process for moving a number to another phone, as proven again and again, resulting in accounts being compromised.
While current MFA solutions aren't perfect, it's hard to come up with another solution that would be as safe or safer and prevent most or all of the mechanisms used to compromise accounts, like phishing, social engineering, and other remote attacks. Letting you save the codes somewhere physical has its downsides, but an important upside is that it allows _you_ to stay in charge of your own security in most cases.
I can see the argument for being forceful about prompting you to write down backup codes or whatever, but fixing that after the fact is something they should absolutely not be doing.
I do think that sites should offer better options for recovery. NearlyFreeSpeech do a really good job of this, offering seven methods of recovery and letting you decide how many you need to fulfil to be given access and which you want to configure. However, things like checking photo ID and more "offline" options are expensive to support, so I get why that is rare.
No, it just means you use a different 2FA. Here's some examples.
- Gov ID
- Pushing to a private repo
- Checking if IP matches historical records (hell, since they're fingerprinting everyone, why not use that?)
There's problems with 2FA. Mainly that it is on your phone. Phones are pretty valuable devices and it isn't unlikely that they get stolen. Even if Google backed up Authenticator to Drive, how do you get Authenticator back on your new phone?
It isn't hard to come up with a hundred scenarios where things fall apart. This is why there are different levels of security. But my GitHub isn't so sensitive that if I lose access I don't want anyone to ever get access. Yet at the same time it is sensitive enough (and GitHub is attacked enough) that I don't think just a user name and password is sufficient. Where's the middle ground? Even with YubiKeys I have to have multiples (as a backup in case I lose one). There's such a thing as "acceptable security levels."
We're trying to make things easier and safer for humans. But that doesn't mean they aren't still going to be human.
Gandi handles this nicely with a checkbox in the settings that explicitly tells support to never ever restore the account if 2FA is lost
The main problem is with targeted attacks. That's why SMS is not secure.
> This is why there are different levels of security. But my GitHub isn't so sensitive that if I lose access I don't want anyone to ever get access. Yet at the same time it is sensitive enough (and GitHub is attacked enough) that I don't think just a user name and password is sufficient.
Yes, there is stuff people would rather have destroyed than get into the wrong hands. But I personally can't think of any. But I do think we all agree that a password alone is not secure enough. What I'm saying is that there shouldn't be two choices. There should be middle grounds. SMS is not a great recovery tool for 2FA because it is trivial to fake. Specifically with my GitHub I'm okay with my security not being invulnerable to nation state actors. But I don't want it to be hacked because of a dictionary attack with some modifications. If someone has a copy of my passport then there's much bigger problems that I have than my GitHub being hacked.
The key part here is that there should be varying levels of security. The two levels we have (no 2FA vs. 2FA) are too few.
* as to the IP: I was suggesting that it was being used in combination with other stuff. Not a standalone verification. Just like your phone number should never be a standalone verification. We're talking multifactor.
GitHub is a social site, and the users accessing repositories are just as important as those that own them. You might not care if your account is compromised, but if other people trust your repositories and they get attacked through your compromised account, that is a problem.
Sounds reasonable. What's important is that you can securely recover the account at all, not that it's cheap.
> However, things like checking photo ID and more "offline" options are expensive to support, so I get why that is rare.
For you to characterize my efforts with Github support as "talking my way past it" is absurd. I was never once asked to provide a photo ID, etc. something I would have gladly complied with.
The entire point of 2FA is that it's a second factor, with no way around it except another second factor that can be verified beyond doubt.
After that experience I switched to Authy, which does an encrypted cloud backup of your TOTP secrets.
The most relevant issue for me being the lack of U2F (now WebAuthn, I guess) support. It is also really annoying you can't have restricted SSH keys to allow for automation that is locked down to single sites.
They have had those issues on their feature voting for years.
If you still have the keys, try:
ssh -i ~/.ssh/your_linked_privkey -T git@github.com verify
Edit: I previously wrote to pass your public key to the ssh client rather than your private key. Of course, that was incorrect.
[1] https://help.github.com/en/github/authenticating-to-github/d...
Heck, even in this very scenario, if I haven't used an SSH key with GitHub in many years, and then GitHub receives an artifact signed with that key saying "I lost my 2FA token and backup codes, please reset account auth so I can log back in", I very much do not want GitHub to trust that artifact. If I haven't used the key in years, that probably means I don't have it anymore and either never got around to removing it from GitHub or forgot it was there.
Of course, someone might still have removed those keys. IDK.
Now, instead of accepting that the result is the consequence of your own poor choices, you are trying to shift the blame to GitHub.
If a human can’t give me my account back through tech support, I’m not very keen on trusting my account to a gadget that can break or get lost.
The risk of losing a phone and the backup codes is probably several orders of magnitude larger than the risk of being the target of a sim swap attack for the vast majority of users.
For one, no one is forcing you to only have one TOTP device. You can scan that QR code as many times as you want. Have them on multiple devices.
Depending on your threat vectors, putting them into a password manager that supports it (like Bitwarden) might also be smart. Less secure than fully offline, but definitely better than SMS.
As for the backup codes - one big encrypted text file synced to the cloud of your choice should do the trick, but if you prefer the "scary men with guns" kind of security, safety deposit boxes were literally made to store this kind of stuff (bonus points for on-paper encryption).
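To the point about scanning the QR code on multiple devices: a TOTP code is a pure function of the shared secret and the clock, so any backup of the base32 secret (paper, password manager, second phone) restores access. A minimal sketch of RFC 6238 in standard-library Python, verified against the RFC's own test vector:

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, when=None, digits=6, step=30):
    """RFC 6238 TOTP: derive a one-time code from the shared secret."""
    key = base64.b32decode(secret_b32.upper())
    counter = int((time.time() if when is None else when) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation per RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test vector: ASCII secret "12345678901234567890" at T=59s.
rfc_secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(rfc_secret, when=59, digits=8))  # → 94287082
```

Any device holding the same secret produces the same codes, which is exactly why the secret deserves the same protection as the backup codes themselves.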
Cite: https://www.eff.org/deeplinks/2017/09/guide-common-types-two...
As an extra suggestion: if you use an Android phone for OTP, [andOTP](https://github.com/andOTP/andOTP) supports exporting directly into a PGP-encrypted JSON file which can then be either imported back into the app or converted back to QR codes with a script.
Since it allows you to trigger the export using a Broadcast Intent, I have it set up to do that as a part of my weekly backup Tasker script (of course, you could also just use any other sync solution and manually export when you add a new code).
> without having to have them present at every registration
For example, I have given a token to a family member in another country; for it to be properly useful, I would need that token back each time I register on another site.
What difference does it make unless everyone you trust is gone or has lost everything? At that point you have larger problems than logging into online accounts.
Nevertheless, the anxiety of losing the physical device with all my 2FA logins is what prevented me from enabling 2FA on most of my accounts until I was referred to Authy (authy.com), which syncs your 2FA across multiple devices, including your PC. Besides being very convenient, the effortless syncing and redundancy gave me the confidence to enable 2FA on all my accounts, since it ensures I'll still be able to access them if one of my devices is broken or lost.
Luckily I only had public repos, so I created a new account and forked them all. Support had told me that if there isn't any login activity in the blocked account for a period of six months, they would delete the account and release the username. And yes, I had to follow up after six months were up to make that happen.
The entire point of 2FA is to avoid someone taking over your email and then being able to access _anything_ tied to that email.
This is probably the most popular current dogma in security circles. This is a policy issue not an intractable mathematical problem. In theory maybe it could be as rigorous as math but in practice security is always relative.
The same can be said for strong encryption, where losing the keys means you have no chance of recovery. I'm not an advocate of defaulting to things like full-disk-encryption for that reason (and know people who lost a lot due to that.) I guess the underlying problem is "humans are fallible, strong security is not", and that risks "in the other direction" aren't often mentioned; would you rather have the risk of being hacked, or of losing access to your data forever?
You may have forgotten that it doesn't matter how you store your password; the problem is that it is a single factor. Once compromised, one can gain access to anything within that account. You may be compromised by phishing, keylogging, or other means. 2FA helps make these types of attacks more difficult, although not impossible.
You clicked it, failed to know the ramifications of 2FA (which GitHub does spell out), and didn't secure your backup codes.
Take some responsibility instead of blame shifting.
That just sounds like a fun project.
But also a great engineering exercise. I wouldn't be surprised if this exercise leads to lots of valuable improvements for GitHub users in the here and now. Trying to solve such a grand challenge forces you to develop a vocabulary and understanding of your current systems that can lead to more immediate improvements. I think it unlikely any of these archives will actually be accessed, but simply building them could lead to great side effects.
It's also great marketing, as I now believe that Microsoft/GitHub takes the job of not losing user data extremely seriously, more so than if they had spent an equivalent sum of money buying an ad that says "We take not losing data seriously".
www.longnow.org
The second question is which will enter the public domain first: business-owned software or individually written software. If the latter, which open-source developer died earliest?
Can we just take a second to appreciate how downright inhumane copyright expiry is? Because terms run from the author's death, it basically encourages cheering for people to die.
https://github.com/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee/eeeeeeee...
The archive team should be able to get a very good compression ratio on that repo.
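The joke checks out: DEFLATE handles a repository of almost nothing but "e" characters about as well as any general-purpose compressor can. A quick sanity check in plain Python (the 1 MB of "e" bytes is a stand-in for the actual repo contents):

```python
import zlib

# Stand-in for a repo that is almost entirely one repeated character.
data = b"e" * 1_000_000

compressed = zlib.compress(data, level=9)
print(f"{len(data)} -> {len(compressed)} bytes "
      f"(~{len(data) // len(compressed)}:1)")
```

DEFLATE's theoretical ceiling is roughly 1032:1, so a megabyte of identical bytes collapses to around a kilobyte.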
There’s a big difference between “public” and “stored in a glacier forever”.
So you can feel better?
It's more akin to "hiding" - to me, deleted means unrecoverable, by anyone, at any time.
Even your OS doesn't "delete" files, until the actual sector on the drive is overwritten (depending on media used, of course).
Yes, probably
I'm going to go against the HN zeitgeist and say no.
If I have the right to publish something to the web, I should also have the right to edit and delete it if I so choose.
"What happens on the Internet stays on the Internet. Forever."
So, "share only code no-one cares about and don't work on it much" seems to also be an option.
> As today’s vital code becomes yesterday’s historical curiosity
shouldn't be "tomorrow's historical curiosity"?
Also archeologists study rot.
Why not use a much more accessible medium like M-DISC, which claims a lifespan of at least 1,000 years? https://en.wikipedia.org/wiki/M-DISC
Just wondering how an encoded digital format is more accessible than something printed on a transparent film that you can read by looking through it.
At face value, this seems like a really odd choice. I don't understand why you would choose QR encoding unless this was being printed. I feel like I'm missing something here.
It is. QR codes printed onto microfilm, essentially.
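For a sense of scale, a back-of-the-envelope estimate: 2953 bytes is the standard binary capacity of a version 40 QR symbol at the lowest error-correction level. The frame format and error-correction level Piql actually uses are unknown to me, so treat this as an upper bound per frame:

```python
import math

QR40_L_BYTES = 2953           # version 40 QR (177x177 modules), EC level L, binary mode
repo_size = 10 * 1024 * 1024  # a hypothetical 10 MiB repository TAR file

# Frames needed if each QR symbol were filled to capacity.
frames = math.ceil(repo_size / QR40_L_BYTES)
print(frames)  # → 3551
```

Several thousand frames per sizable repo, which is why the density of the film medium matters so much.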
[0] https://code.google.com/archive/
"Si vis pacem, para bellum" ("If you want peace, prepare for war")
What is the problem with archiving like this, may I ask?
It's like those strange places in Africa that are full of dead elephants.
Observe: resultant web page is unusable.
So the acid test is, have the people implementing this really thought this through?
Based on the web pages they are using, the answer appears to be, no.
Complexity is the enemy of reliability. A principle first stated in 1958 in The Economist.
https://books.google.com/books?id=aDsiAQAAMAAJ&dq="complexit...
We know plain text will be around.
I was surprised to find that the website promoting this is tricked out, which isn't consistent with the project's very long-term vision.
Chris Pedrick's Web Developer Toolbar, an extension that was once the greatest thing, does this and many other useful and insightful things.
My fave, I think, is Information -> View Document Outline, which is pivotal for gaming Google SEO, by telling you how your Heading tags semantically convey the page's story.
https://chrispederick.com/work/web-developer/
uMatrix can disable CSS for any source, including 1st-order references.
There are extensions which will disable CSS as well.
Alternatively, you can view under a console mode browser (lynx, links, elinks, w3m, etc.), and see what appears.