> The Chief Data Officer (CDO) will "(1) be responsible for lifecycle data management"
I am very interested in what this type of lifecycle might look like, considering that most data, I feel, should be kept forever. I wonder how a lifecycle might collide with the challenges that bit rot[0] presents.
[0] https://www.theguardian.com/technology/2015/feb/13/what-is-b...
This is already getting really exciting. Some government entities (including lower levels like local, county, and state governments) have moved to digitizing their old paper and microfilm records. But if they're expected to maintain many types of records essentially forever, it places a constant burden to continue to update and migrate data in perpetuity, whereas paper or microfilm can sit in a box in a closet for decades.
For the most part, common file formats like PDFs, JPGs, and TIFs are likely to be understood for a very, very long time, but you don't just have file storage, you have systems to manage, index, and find those files, and those systems are likely to need constant maintenance.
I've seen Blu-ray disc manufacturers claim lifespans of over 100 years with capacities of 100 GB per disc. An entire warehouse of paper and microfilm documents would be able to fit in a shoebox.
Note that most record digitization projects are likely looking at live storage (online disk arrays), as it allows constant access to said records. Also note that the discs may last 100 years but having disc readers which can read them may not, and one would need to load all the discs to do a media conversion.
I would think that outsourcing this to Amazon and Azure for redundant copies would be a sufficient start. AWS S3 and Azure Blob Storage would go a long way. Structured directories, document metadata, and hot indexes for that metadata in a database would be a very good start.
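To make that concrete, here's a minimal sketch of what such a layout might look like; the department names, key scheme, and metadata fields are invented for illustration, not any agency's real convention:

```python
# Sketch of a metadata-first layout for archival records.
# Department names, key scheme, and fields are hypothetical.

def record_key(department: str, year: int, doc_type: str, doc_id: str) -> str:
    """Build a structured object key, e.g. for an S3 or Blob Storage bucket."""
    return f"records/{department}/{year}/{doc_type}/{doc_id}.pdf"

def record_metadata(title: str, source_format: str, sha256: str) -> dict:
    """Metadata stored alongside the object and mirrored into a hot index."""
    return {
        "title": title,
        "source_format": source_format,  # e.g. "paper", "microfilm"
        "sha256": sha256,                # integrity check against bit rot
    }

key = record_key("county-clerk", 1974, "deed", "000042")
meta = record_metadata("Deed of sale, parcel 17-B", "microfilm", "ab12...")
print(key)  # records/county-clerk/1974/deed/000042.pdf

# With boto3 this would be roughly:
#   s3.put_object(Bucket="archive", Key=key, Body=data, Metadata=meta)
# with the same metadata also written to a database for fast search.
```

The point is that the key scheme and the metadata index carry the "manage, index, and find" burden mentioned above, not the blobs themselves.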
A lot of this type of work has already been done or is in progress.
I mean, sure, until Amazon and Azure decide to delete it all [0] to save money, or those companies go bankrupt and shut off their servers. None of these big cloud providers is even 50 years old.
[0] - https://news.softpedia.com/news/All-Videos-on-Google-Video-W...
Sure, but if you're legally responsible for holding onto records for a certain period of time ("forever" may be 80-100 years, realistically, in a lot of cases), you're expected to employ a reasonable practice to ensure that happens. In the case of digital files, it'd be crazy not to have an off-site backup for records you're legally obligated to retain. And if you do what is reasonable, and still the records are lost (such as to a fire), you're likely legally in the clear. Whereas if you destroy them prematurely with intent, or are even just negligent as to their maintenance or care, you may be in hot water.
But different and arguably on-average cheaper costs.
Paper can rot if it's not kept climate-controlled, but not nearly as fast as bits stored to non-volatile media if the power goes out and there isn't a hot backup.
Government records aren't "text" though. They're documents, which need to be preserved with any margin notes, markings, signatures, etc. intact. So if you're storing government records, you're storing images.
Physical storage space definitely does have a cost, but so does software licensing, hardware maintenance contracts, etc. And a lot of government organizations have physical space in abundance, but IT resources, less so.
I am reminded of the "API" decision made by Jeff Bezos at Amazon, as famously described by Steve Yegge:
So one day Jeff Bezos issued a mandate. He's doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion -- back around 2002 I think, plus or minus a year -- he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.
His Big Mandate went something along these lines:
1) All teams will henceforth expose their data and functionality through service interfaces.
2) Teams must communicate with each other through these interfaces.
3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
4) It doesn't matter what technology they use. HTTP, Corba, Pubsub, custom protocols -- doesn't matter. Bezos doesn't care.
5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
6) Anyone who doesn't do this will be fired.
https://plus.google.com/+RipRowan/posts/eVeouesvaVX
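A toy sketch of what points 1 through 3 amount to in code (names invented for the example): consumers go through the interface, never the backing store.

```python
# The boundary rule in miniature: the backing store is private to the
# owning team; everyone else calls the service interface.

class OrderService:
    def __init__(self):
        # Private store: no other team reads this directly,
        # per point 3 (no direct reads of another team's data store).
        self._orders = {}

    def place_order(self, order_id: str, item: str) -> None:
        self._orders[order_id] = {"item": item, "status": "placed"}

    def get_status(self, order_id: str) -> str:
        """The only sanctioned way for outsiders to learn an order's state."""
        return self._orders[order_id]["status"]

svc = OrderService()
svc.place_order("o-1", "book")
print(svc.get_status("o-1"))  # prints "placed"
```

In real Amazon this boundary sits behind a network call (point 4 says the protocol doesn't matter); the in-process class here just illustrates the ownership rule.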
I'd also like to point out that with all design decisions there are trade-offs. Bezos made these trade-offs specifically with Amazon's scale in mind and likely would not make the same trade-offs with a team of 5 devs, or even a team of a thousand devs that had good habits around adhering to data boundaries.
Clearly the dev culture at Amazon had a habit of abusing those boundaries, or it wouldn't have gotten to the point where such a black-and-white mandate made a net positive change to the status quo.
> Bezos made these trade offs specifically with Amazon's scale
It's more nuanced for Amazon - they intentionally ignored language skills when looking to hire good developers (difficult in '90s Seattle), so the end result was a hodgepodge of services written in many different languages, talking any number of protocols, which included Service A directly accessing the database of Service M. This mandate was his answer to teams complaining about working with each other and the bullshit they had to go through when integrating features across service boundaries to get things done.
This is how Bezos built AWS, so he meant services that are needed to build websites, such as databases, servers, containers, queues, caches, storage...
Bezos built AWS like Hadrian built a wall; both ordered that something be done, and it was left to tens of thousands of others to implement it. If you want to credit them with a lone vision, that’s fine if a bit unlikely given the reality of advisors and boards, but let’s not buy into the level of myth making required to say they built things.
I think that Bezos literally wrote every line of code and assembled every atom on the servers hosting the API. That's how I interpreted Yegge's post and why I think Bezos built AWS.
The Bezos-as-Demiurge theory, or “Gnostic Amazon” theory? I like it.
Bezos didn't build AWS, people who worked for him did. They also didn't do it as part of the normal day to day at Amazon, no matter how much Amazon pushes the "it's the tech we use already" narrative.
> Bezos didn't build AWS, people who worked for him did.
The people that worked on AWS didn't build it, the other people that worked on AWS built it. That's the meaning of your statement.
Bezos works for Amazon, he decides whether AWS gets built or not, how large resources are deployed in the company, and what the priorities are over time. He worked on AWS. If you aren't an engineer, that doesn't mean you didn't help build a product or business. There are a lot of non-engineers that were required to build AWS, including managers, support staff, sales, marketing, HR, accounting and so on.
Saying he didn't build it and that everyone else did, is the same style of mistake as pretending that Bezos built it by himself would be (and nobody actually thinks he built it by himself).
In that example they'd expose it through the source control system and through the build system, though that's really stretching the meaning.
What it really means is that if you're building something that you run on your team's hosts, it needs to be exposed through services. For most of the services at Amazon that means using the common service interfaces that everyone uses, and not just giving out DB credentials (which I've seen happen; the untangling took years, and I think it's still in progress 5+ years later).
That's a weird example. A team that makes a library probably doesn't share too much data with other teams. I think this directive is more like, don't share a spreadsheet of your quarterly stats.
It's supposed to be a weird example. We could think of lots of things that don't belong as services. Bezos's post doesn't offer guidance on what should be a service - only that you'll be fired if your team doesn't have one.
What matrix library for computer graphics was Amazon working on back in the early 2000s? Genuinely curious since I don't hear much about things like that from Amazon.
Curious, do you have examples of this working well? It sounds like SOAP, which in my experience ends badly. Yes because, well, SOAP, but also because sharing your schema means you're stuck with it along with any internal assumptions about design that result from it.
I'm assuming you're more likely talking about something like gRPC/Protobuf, which I have similar gripes about.
I'd always want to make sure no PII is accidentally leaked. Example: in 1997, researchers at MIT showed that using only gender, date of birth and ZIP code, it is possible to identify the majority of US residents! They proved their point by identifying the Massachusetts governor's medical records in a publicly-available dataset that was presumed anonymous. In 2010, Netflix published an “anonymized” dataset of movie ratings by users. After it was released to the public, researchers were able to identify many Netflix users, even though the dataset only contained user ID, movie, rating, and rating date.
Where I work, a tax policy think tank, we purchase from the IRS an anonymized data set of about 100k sample tax returns that we use for modeling the effects of changes to tax policy. We've got an agreement with the IRS that we can never share that dataset, and I'm pretty sure this is why. While I'd love to be able to make our tax model results more transparent, the risk of de-anonymization is too high. I think another tax policy group is trying to create a synthetic dataset that is close to the sample in terms of outputs but is entirely made up, so that it could be used for verification of results by third parties--I hope they succeed.
It's funny, given that I work for an election services company, how much concern (hard software requirements) there is about keeping said public information secure.
Even funnier is when specifications for engines for weapon systems (tanks and aircraft, not the guns/missiles, etc.) are tightly guarded secrets domestically but considered public information overseas.
There are two issues in tension here. One is a right to privacy, but the other is a right to audit who is actually voting. If the voter rolls were secret, there's nothing to stop a massive fraud by those in power to stay in power.
It's largely for primary voting in states that have partisan primaries. If you're registered as a Democrat then you receive a ballot with only the Democratic candidates.
Of course one could argue that all primaries should be non-partisan. Or that governments shouldn't spend money on running partisan primaries on behalf of political parties which are private organizations.
I understand having to select a party for primaries & the pros & cons of it. That is a different & even more complicated issue.
I just don't understand why which party I voted for needs to be "public" record. I don't believe that companies & the general public need to know. I've used campaign software that shows how incredibly easy it is to select a list of people & market towards those who voted one way or another in the primary.
Because this may affect which primary you can vote in. For example, only registered Pastafarians may vote in the Pastafarian primary, to avoid spoiler votes from members of other parties.
I understand having to select a party for primaries & the pros & cons of it.
I just don't understand why that needs to be "public" record. I don't believe that companies & the general public need to know that in 2018 I chose to be part of the Pastafarian party. Maybe I want to avoid all the noodle marketing that comes with that affiliation.
In the US we have a secret ballot. Considerable effort goes into not collecting information about what you voted for, only that you voted.
Countries with sham elections tend to be open ballot, as the voters need to show they support the current ruler. This is why the selected candidate wins with 98% of the vote or some such.
You can't even get 70% of people to agree on which direction the sun rises.
No, you can have the lists of voters without having any information about who they voted for (or in this case, party affiliation which gives who they probably voted for).
2 genders times ~25,000 dates of birth (70-ish years) times 40,000 ZIP codes equals 2 billion. The population won't be uniformly distributed over date of birth or ZIP code, but that gives a factor of six or so of leeway.
> "It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person."
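That 87% figure comes from exactly this kind of counting: group records by the {ZIP, gender, date of birth} triple and see how many groups have size one. A toy sketch with fabricated records:

```python
# Toy check of how quickly {ZIP, gender, date of birth} pins people down.
# The six records below are fabricated for illustration.
from collections import Counter

people = [
    ("02139", "F", "1950-07-31"),
    ("02139", "M", "1950-07-31"),
    ("02139", "M", "1962-01-15"),
    ("60601", "F", "1980-03-02"),
    ("60601", "F", "1980-03-02"),  # shares its triple with the row above
    ("94105", "M", "1975-11-23"),
]

counts = Counter(people)
unique = [p for p in people if counts[p] == 1]
print(f"{len(unique)} of {len(people)} records are unique on this triple")
# prints "4 of 6 records are unique on this triple"
```

Anyone whose triple appears exactly once is re-identifiable by joining against any other dataset (voter rolls, say) that carries the same three fields.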
I suspect it's more expensive to make it not machine-readable, considering that it likely starts out in a computer, and if it is not stored that way, it becomes essentially inaccessible even to the people who stored it.
No, but I believe it can't help but be an indicator of the trend toward open government. Right now there are around six states with open data laws.
I have been working for two years with a group trying to add Michigan to the list. Tomorrow I am publishing an open letter to our governor asking her to support our efforts. I will post a link on HN.
Just curious, do government services really not have any reserve funding? It seems like avoiding shutdowns could be solved pretty easily by having reserve funds (at least for 1-3 months or so).
But perhaps that's the point. Shutdowns are supposed to be inconvenient.
Indeed. "Many agencies, particularly the military, would intentionally run out of money, obligating Congress to provide additional funds to avoid breaching contracts." https://en.wikipedia.org/wiki/Antideficiency_Act
> It seems like avoiding shutdowns could be solved pretty easily by having reserve funds (at least for 1-3 months or so)
Or by having the government continue working according to last year's budget, like many other countries do. I know of no government shutdowns in Western countries except in the US. Even for "third world" countries such things would be disastrous, as armies don't like not being fed and paid.
I am deeply afraid of the impact of this law. The amount of meta-work required to consolidate and annotate data we collect, in order to prepare it for public consumption, seems likely to hurt government efficiency.
In addition to the administrative burden, it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information.
Perhaps my skepticism is misplaced, but my initial reaction is that this sounds better in the abstract than it will turn out to be in practice.
Part of my wife's job is to research Medicaid billing codes for every state (yes, this is a state thing, but I'm just making an example). Once in a while she can get their codes in a form as "advanced" as an Excel spreadsheet. But more likely she'll get a PDF doc that she's got to run through an OCR program to convert to a spreadsheet, and then check for errors. Or for some states, nothing is published at all, and she's got to piece it together from partner hospital billing records.
There's no doubt that getting this data into a sane format will take the states some extra resources.
But when you consider how much more efficient this will make my wife's company, and every other provider of Medicaid services, it's bound to be a huge win on net. And improving efficiency of delivering healthcare should be important.
The government is big, but the private sector is still much larger. So there's great leverage to make our overall systems more efficient because an investment in efficiency on the government side will be multiplied many times over as seen by the many private entities that the government is overseeing.
Hospitals routinely spend huge sums of money on new equipment that significantly improves their competitive edge on diagnoses and outcomes. They are also willing to spend money on drop-in solutions that lessen the need for paperwork that eats up admin, nurse, or doctor time.
However, you're right that they are notoriously stingy on buying new things if the economics aren't immediately apparent, and will never buy into something that demands radical workflow/org changes.
By hard to sell to, I mean, it's hard to get in front of the right person. The people who control the purse strings are not necessarily the most knowledgeable about the problems at-hand either.
Inefficiency and high expense is the primary burden of open governments, representative governments, and democratic governments. If you want a cheap, efficient government, you want an absolute monarchy. This is why corporations tend to be structured into rather strict hierarchies that bear no small resemblance to feudal kingdoms. That's also why they're terrible at meeting worker demands.
Your reply leans pretty heavily on the assumption that all government agencies consist of bureaucrats. I think you should re-examine that assumption. A small minority of government workers are involved in issuing regulations at places like the EPA. Most are military service members, Homeland Security workers, DOJ law enforcement, etc.
Yeah, I think the proximal issue is that many govt. bureaucrats struggle to keep up with documentation requirements as-is. In some (many?) cases, it takes an exceedingly long time, else does not get done at all.
But--it's possible that better access to open data and a stronger culture in govt. around using data-driven decision and policymaking could improve on this in the long run. Not to mention the fact that much of the pre-existing paperwork requirements could be automated (if implemented carefully).
>The amount of meta-work required to consolidate and annotate data we collect, in order to prepare it for public consumption, seems likely to hurt government efficiency.
I (briefly) thought the same thing when I first read about it, but I think the efficiency gained from having digital, standardized formats will eventually outweigh the inefficiencies of the initial conversion to that format.
I'm also happy that otherwise "dead" data (e.g. papers sitting in boxes in a basement somewhere) could now be used more effectively in aggregate to further increase operating efficiencies. Imagine trying to put together a comparison between a specific subset of finance reports across departments when Department A uses one digital format, Department B uses another digital format, and Departments C through Z all have them in boxes. What would have otherwise been a bureaucratic headache _before you even get to data munging_ now becomes an ordeal that's easier on all fronts, and that data can then be used to fight back against otherwise unknown inefficiencies.
>The amount of meta-work required to consolidate and annotate data we collect, in order to prepare it for public consumption, seems likely to hurt government efficiency.
As a data analyst working for a state government, not consolidating or creating metadata really hurts my efficiency. I've gotten too comfortable with munging tables in PDFs.
>it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information
This is something we're trying to figure out. The problem is, I doubt many agencies are actually maintaining privacy with their publications. The Census is adopting differential privacy strategies [0], but my own agency relies on practices from the days of printed reports. I know for a fact some of them don't work, but government is slow to adapt.
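For the curious, the core of the Laplace mechanism behind differential privacy is small enough to sketch. This is a textbook illustration, not what the Census actually ships; in real use `u` would be drawn uniformly at random rather than passed in.

```python
# Minimal sketch of the Laplace mechanism: publish a count plus noise
# drawn from Laplace(0, sensitivity / epsilon). Smaller epsilon means
# more privacy and more noise.
import math

def laplace_noise(scale: float, u: float) -> float:
    """Inverse-CDF sample of Laplace(0, scale) from uniform u in (0, 1)."""
    u -= 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float, u: float) -> float:
    sensitivity = 1.0  # adding/removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, u)

# u = 0.5 gives zero noise, so the mechanism is centered on the truth:
print(private_count(1000, epsilon=0.1, u=0.5))  # prints 1000.0
```

The old printed-report practices (cell suppression, rounding) offer no such quantifiable guarantee, which is exactly the gap the Census move is meant to close.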
I am a big supporter of government and do not consider efficiency a primary objective (a good one, but secondary).
To make a cartoony analogy: flight security would be more efficient if everyone flew naked with no hand luggage, but that would defeat the purpose of people traveling from place to place for their own reasons.
Likewise: the government has collected or generated that info; let's put it into a reasonably clean and accessible format so others (who, in the US, have funded its collection/generation anyway) can build upon it.
The inefficiencies of correctly recording and distributing data will be, I think, greatly outweighed by the increased efficiencies of having standardized machine-readable data that's easy to access and use. I work at a think tank that uses various government data sets across agencies and jurisdictions, and the cleaning that goes into analysis is a nightmare. Some agencies have their own quirky conventions--I've seen "-1" used as a flag for "no data" before, which as you can imagine returned some strange results on analysis. A regulation that says "publish data and do it precisely this way" will be a welcome one for me.
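A tiny illustration (with made-up salary figures) of why a sentinel like "-1" poisons analysis if it isn't cleaned first:

```python
# A "-1 means no data" convention silently skews any aggregate that
# forgets to strip the sentinel. All values here are fabricated.
values = [52000, 61000, -1, 48000, -1]  # -1 is the agency's "no data" flag

naive_mean = sum(values) / len(values)          # sentinel left in
cleaned = [v for v in values if v != -1]        # sentinel stripped
clean_mean = sum(cleaned) / len(cleaned)

print(round(naive_mean), round(clean_mean))  # prints "32200 53667"
```

A standardized publishing format that mandates an explicit null (rather than each agency's private magic number) makes this entire class of error impossible.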
I agree when bolted onto data, but if data collections are properly designed to be findable, accessible, interoperable, and reusable from the start, I think long term data management drops due to more efficient processes.
I think closed data, or data in PDFs, masks a lot of technical debt that causes manual labor and expensive proprietary licenses (looking at you, SAS, for archived data sets).
With the right tools most of it could be very automatic, and eventually it will be a non-concern. It's getting up to speed that is painful, and getting everyone on board.
This sounds like an amazing thing!
Anyone serious does their work reproducibly. This shouldn't add much more than a bit of storage cost, on devices and paper.
I searched/skimmed through the full text of the bill, and it doesn't seem that it applies to data used/generated by grant-based research (or other projects). The language of the bill is heavily focused on executive agencies.
That would be ideal, although a large portion of grant funding goes to medical research (e.g. via NIH), where it would be difficult to anonymize the data. They could require that it be sanitized (differential privacy, etc.), but I don't know how that could be verified effectively/efficiently. The grants process is quite time-consuming for the grantees (and the grantors) without this requirement.
Data generated by a researcher on a government grant generally includes a provision in the grant that says the govt owns the data, doesn't it?
And if the govt owns it, this seems to indicate it should be open...
I'm not really knowledgeable enough on policy to know how this might interact with the new law. But based on my reading, it seems that institutions conducting federally-funded research are required to retain any data generated by the research, but not necessarily that the government "owns" it, per se. There was a change in policy in 1999 that required the ability for the public to use FOIA to access grant-funded research data, with some limitations.
"To balance the need for public access while protecting the research process, OMB's revision limits the kinds of data that will be made accessible (it excludes personal and business-related confidential data) and limits applicability to federally funded data relating to published research findings produced under a federal award and used in developing an agency action that has the force and effect of law."
The media would make Trump out to be a 5-year-old toddler, but the guy isn't stupid. I'm sure there are millions of people that actually think Trump is a complete idiot. The truth is often less colorful than people would hope. Trump is like anyone else -- he has made good choices and bad choices.
Honestly, I'm a bit surprised that people on HN downvote literally anything that is neutral on Trump. "Orange man bad" is an epidemic that affects even the smart people.
It's a pretty sad state of affairs. Makes it nearly impossible to have any sort of honest conversation about politics, and definitely has a chilling effect. I debated even making the comment because I knew it would draw down votes.
I think the entire point of viciously attacking anyone who shows any sign of being part of the "opponents" ideology is to chill that persons speech.
They just don't realize that type of tactic not only fails to silence people but also (quietly) pushes tons of people who normally don't have a horse in the game to the other side. There are plenty of politically ambivalent people who don't want a world ruled by an ideology opposed to open debate/free thought and which actively seeks to destroy context/intention in language in the name of "progress".
There is a comedy skit about what it's like trying to mention the grounded practicalities behind current events in the presence of those who find it outrageous to dig deeper than the headlines:
He might not be stupid, but he is willfully ignorant, and proudly so.
While I would agree the media and public go a bit overboard in infantilizing him, his behavior and persona make it very difficult to give him the benefit of the doubt.
I think ignorance plays a part, but my money is on he just doesn't care.
Infantilizing him might make it more relatable for the average person, but it's just click-bait BS. We should call it like it is: he is a sociopath, and anything less is underestimating him. He has a brand and knows how to project it, and will do anything to get "ahead" and feed his ego. Collateral damage be damned.
I think Trump might defy the mainstream definition of intelligence. In a twisted and narrow sense he might have something approaching cunning, and a primitive feedback loop modulating his speech like the YouTube algorithm: optimizing for one goal, sacrificing everything else.
It's rather off-putting to have a corrupt, amoral president who uses transparency as a weapon against his political opponents, lies, and doesn't show respect for civil servants.
I was the deputy director of the Sunlight Foundation and have covered the space for a decade. I tracked Trump's record on open government and documented it daily: https://sunlightfoundation.com/tracking-trumps-attacks-on-tr...
It is not, in other words, just my opinion, but a statement of fact. Moreover, the bill in question is the OPEN Government Data Act. This president's record on transparency, accountability, ethics and civic engagement is atrocious. Therefore, these issues are directly related.
"Forever" is a very long time indeed. There's a reason the UK still keeps some records on scrolls of velum: https://www.bbc.co.uk/news/magazine-35569281
But nothing digital, that's for sure.
Perhaps instead of it being "all teams" it should be "all CPU processes"?
It's not a "you must find some data or service to expose" rule. If you have data and someone wants to use it, they get it via the service.
https://martinfowler.com/bliki/TellDontAsk.html
I'd say you're being a pedant but you're not even technically correct.
The comment to which you’ve replied said exactly that.
What it really means is that if you're building something that runs on your team's hosts, it needs to be exposed through services. For most of the services at Amazon that means using the common service interfaces that everyone uses, and not just giving out DB credentials (which I've seen happen; untangling it took 5+ years, and I think it's still in progress).
- you build a matrix math system as a service... or
- you build cloud rendering (as they have done several times)
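The "everything through a service interface, not DB credentials" idea above can be sketched in a few lines. This is a hypothetical illustration (the `OrderService` handler, the `ORDERS` data, and the endpoint path are all made up, not any real Amazon API): a team exposes read access over HTTP instead of handing out database credentials.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the owning team's private database; consumers never touch it.
ORDERS = {"42": {"id": "42", "status": "shipped"}}

class OrderService(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /orders/42 -> look up the last path segment
        order = ORDERS.get(self.path.rstrip("/").split("/")[-1])
        body = json.dumps(order or {"error": "not found"}).encode()
        self.send_response(200 if order else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep demo output quiet

# To run:  HTTPServer(("127.0.0.1", 8080), OrderService).serve_forever()
```

The point of the boundary is that the `ORDERS` dict (the "database") can later change shape, move, or be re-implemented without breaking any consumer, because consumers only ever see the service contract.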
I'm assuming you're more likely talking about something like gRPC/protobuf, which I have similar gripes about.
I advertise a rule set; you give me a function (a pattern, really) and I return all the data related to that pattern.
There are other organizations that accept a function: effectively, the result is to return the record in the True case, and not in the False case.
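The pattern described above amounts to a predicate-based query interface: the caller hands over a function, and the service returns every record for which it is True. A minimal sketch, with invented record data and a hypothetical `query` function:

```python
# Illustrative only: `records` and `query` are made-up names, not a real API.
records = [
    {"id": 1, "type": "invoice", "amount": 120},
    {"id": 2, "type": "receipt", "amount": 40},
    {"id": 3, "type": "invoice", "amount": 75},
]

def query(predicate):
    """Return all records for which the caller-supplied predicate is True."""
    return [r for r in records if predicate(r)]

matches = query(lambda r: r["type"] == "invoice" and r["amount"] > 100)
print(matches)  # [{'id': 1, 'type': 'invoice', 'amount': 120}]
```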
That sounds pretty awesome (and expensive)!
I'd argue that keeping those public is also a net good for society.
Even funnier is when specifications for engines for weapon systems (tanks and aircraft, not the guns/missiles) are tightly guarded secrets domestically but considered public information overseas.
Of course one could argue that all primaries should be non-partisan. Or that governments shouldn't spend money on running partisan primaries on behalf of political parties which are private organizations.
I just don't understand why which party I voted for needs to be "public" record. I don't believe that companies & the general public need to know. I've used campaign software that shows how incredibly easy it is to select a list of people & market towards those who voted one way or another in the primary.
Laws about this vary from state to state.
I just don't understand why that needs to be "public" record. I don't believe that companies & the general public need to know that in 2018 I chose to be part of the Pastafarian party. Maybe I want to avoid all the noodle marketing that comes with that affiliation.
Countries with sham elections tend to be open ballot, as the voters need to show they support the current ruler. This is why the selected candidate wins with 98% of the vote or some such.
You can't even get 70% of people to agree on which direction the sun rises.
A secret ballot is necessary for free elections.
Of all the comments I've made that should have been downvoted, the one I made above was it. The HN gods are having mercy on me, it seems.
I'm just pointing this out because I'd like to see a movement towards actually more open government.
Source? How were they able to do that?
The 87% paper is https://dataprivacylab.org/projects/identifiability/paper1.p.... It says:
”It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person.”
More info likely available via http://latanyasweeney.org/work/identifiability.html (discussed in https://news.ycombinator.com/item?id=2942967)
If you include year, birth dates carry an awful lot of information.
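The quoted result can be illustrated with a toy computation: group people by the quasi-identifier triple {ZIP, gender, date of birth} and count how many fall in a group of one. The data below is invented for illustration; the real paper ran this over census data.

```python
from collections import Counter

# Made-up individuals, each as (ZIP, gender, date of birth)
people = [
    ("98101", "F", "1980-04-12"),
    ("98101", "F", "1980-04-12"),  # shares all three attributes -> not unique
    ("98101", "M", "1975-09-30"),
    ("98052", "F", "1990-01-05"),
]

counts = Counter(people)
unique = sum(1 for c in counts.values() if c == 1)
print(f"{unique}/{len(people)} individuals are uniquely identified")  # 2/4
```

Scaled up to real populations, the same grouping is what yields the 87% figure: most triples occur exactly once.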
Kind of depends on what they say is "sensitive". Probably not much of a facility for appealing that designation either.
I sorta look at it like the "First Step" prison reform act, it's only the start.
Baby steps.
I have been working for two years with a group trying to add Michigan to the list. Tomorrow I am publishing an open letter to our governor asking her to support our efforts. I will post a link on HN.
But perhaps that's the point. Shutdowns are supposed to be inconvenient.
Or by having the government continue working according to last year's budget planning, like many other countries do. I know of no government shutdown in Western countries other than the US. Even for "third world" countries such things would be disastrous, as armies don't like not being fed and paid.
In addition to the administrative burden, it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information.
Perhaps my skepticism is misplaced, but my initial reaction is that this sounds better in the abstract than it will turn out to be in practice.
There's no doubt that getting this data into a sane format will take the states some extra resources.
But when you consider how much more efficient this will make my wife's company, and every other provider of Medicaid services, it's bound to be a huge win on net. And improving efficiency of delivering healthcare should be important.
The government is big, but the private sector is still much larger. So there's great leverage to make our overall systems more efficient because an investment in efficiency on the government side will be multiplied many times over as seen by the many private entities that the government is overseeing.
However, you're right that they are notoriously stingy on buying new things if the economics aren't immediately apparent, and will never buy into something that demands radical workflow/org changes.
It's a meta-comment, but this term sounds slightly misplaced when you're talking about bureaucrats having to do more paperwork.
But--it's possible that better access to open data and a stronger culture in govt. around using data-driven decision and policymaking could improve on this in the long run. Not to mention the fact that much of the pre-existing paperwork requirements could be automated (if implemented carefully).
I (briefly) thought the same thing when I first read about it, but I think the efficiency gained from having digital, standardized formats will eventually outweigh the inefficiencies of the initial conversion to that format.
I'm also happy that otherwise "dead" data (e.g. papers sitting in boxes in a basement somewhere) could now be used more effectively in aggregate to further increase operating efficiencies. Imagine trying to put together a comparison between a specific subset of finance reports across departments when Department A uses one digital format, Department B uses another digital format, and Departments C through Z all have them in boxes. What would otherwise have been a bureaucratic headache _before you even get to data munging_ now becomes far easier on all fronts, and that data can then be used to fight back against otherwise unknown inefficiencies.
As a data analyst working for a state government, not consolidating or creating metadata really hurts my efficiency. I've gotten too comfortable with munging tables in PDFs.
>it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information
This is something we're trying to figure out. The problem is, I doubt many agencies are actually maintaining privacy with their publications. The Census is adopting differential privacy strategies [0], but my own agency relies on practices from the days of printed reports. I know for a fact some of them don't work, but government is slow to adapt.
[0]: https://privacytools.seas.harvard.edu/why-census-bureau-adop...
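For reference, the core of the differential privacy strategy mentioned above (the Laplace mechanism) is small enough to sketch. This is a minimal illustration under textbook assumptions, not the Census Bureau's actual implementation; `laplace_noise` and `dp_count` are made-up names. A counting query has sensitivity 1, so adding Laplace noise with scale 1/epsilon gives epsilon-DP.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon):
    """Noisy count satisfying epsilon-differential privacy (sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)  # for a reproducible demo
print(dp_count(1000, epsilon=0.1))  # noisy value near 1000
```

Smaller epsilon means stronger privacy but noisier published statistics, which is exactly the trade-off agencies have to tune per publication.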
To make a cartoony analogy: flight security would be more efficient if everyone flew naked with no hand luggage, but that would defeat the purpose of people traveling from place to place for their own reasons.
Likewise: the government has collected or generated that info; let's put it into a reasonably clean and accessible format so others (who, in the US, have funded its collection/generation anyway) can build upon it.
I think closed data, or data trapped in PDFs, masks a lot of technical debt that causes manual labor and expensive proprietary licenses (looking at you, SAS, for archived data sets).
Poe's law strikes again.
That would be ideal, although a large portion of grant funding goes to medical research (e.g. via NIH), where it would be difficult to anonymize the data. They could require that it be sanitized (differential privacy, etc.), but I don't know how that could be verified effectively/efficiently. The grants process is quite time-consuming for the grantees (and the grantors) without this requirement.
*not a lawyer
"To balance the need for public access while protecting the research process, OMB’s revision limits the kinds of data that will be made accessible (it excludes personal and business-related confidential data) and limits applicability to federally funded data relating to published research findings produced under a federal award and used in developing an agency action that has the force and effect of law."
https://fas.org/sgp/crs/secrecy/R42983.pdf
1) Open is better
2) Bipartisan legislation, indicates some progress is possible
Edit: It was an honest question I wanted to know the answer to, but I've got my answer.
They just don't realize that type of tactic not only fails to silence people but also (quietly) pushes tons of people who normally don't have a horse in the game to the other side. There are plenty of politically ambivalent people who don't want a world ruled by an ideology opposed to open debate/free thought and which actively seeks to destroy context/intention in language in the name of "progress".
"Stop Making Me Defend Donald Trump!" https://www.youtube.com/watch?v=1eq0X4qDlR0
While I would agree the media and public go a bit overboard in infantilizing him, his behavior and persona make it very difficult to give him the benefit of the doubt.
Infantilizing him might make it more relatable for the average person, but it's just clickbait BS. We should call it like it is: he is a sociopath, and anything less is underestimating him. He has a brand and knows how to project it, and will do anything to get "ahead" and feed his ego. Collateral damage be damned.
This was the subject of a previous thread, here: https://news.ycombinator.com/item?id=18746132
"the American president who has done the most to damage democracy in modern history"
Your article starts off with an unsupported opinion irrelevant to the issue, it's not very auspicious for the rest of it.