> The Chief Data Officer (CDO) will "(1) be responsible for lifecycle data management"
I am very interested in what this type of lifecycle might look like, considering that most data, I feel, should be kept forever. I wonder how a lifecycle might collide with the challenges that bit rot[0] presents.
[0] https://www.theguardian.com/technology/2015/feb/13/what-is-b...
This is already getting really exciting. Some government entities (including lower levels like local, county, and state governments) have moved to digitizing their old paper and microfilm records. But if they're expected to maintain many types of records essentially forever, it places a constant burden to continue to update and migrate data in perpetuity, whereas paper or microfilm can sit in a box in a closet for decades.
For the most part, common file formats like PDFs, JPGs, and TIFs are likely to be understood for a very, very long time, but you don't just have file storage, you have systems to manage, index, and find those files, and those systems are likely to need constant maintenance.
I've seen Blu-ray disc manufacturers claim lifespans of over 100 years with capacities of 100 GB per disc. An entire warehouse of paper and microfilm documents would be able to fit in a shoebox.
Note that most record digitization projects are likely looking at live storage (online disk arrays), as it allows constant access to said records. Also note that the discs may last 100 years but having disc readers which can read them may not, and one would need to load all the discs to do a media conversion.
I would think that outsourcing this to Amazon and Azure for redundant copies would be a sufficient start. AWS S3 and Azure Blob Storage would go a long way. Structured directories, document metadata, and hot indexes for that metadata in a database would be a very good start.
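To make that concrete, here's a minimal sketch of what such a layout might look like; the department names, key scheme, and metadata fields are invented for illustration, not any agency's real convention:

```python
# Sketch of a metadata-first layout for archival records.
# Department names, key scheme, and fields are hypothetical.

def record_key(department: str, year: int, doc_type: str, doc_id: str) -> str:
    """Build a structured object key, e.g. for an S3 or Blob Storage bucket."""
    return f"records/{department}/{year}/{doc_type}/{doc_id}.pdf"

def record_metadata(title: str, source_format: str, sha256: str) -> dict:
    """Metadata stored alongside the object and mirrored into a hot index."""
    return {
        "title": title,
        "source_format": source_format,  # e.g. "paper", "microfilm"
        "sha256": sha256,                # integrity check against bit rot
    }

key = record_key("county-clerk", 1974, "deed", "000042")
meta = record_metadata("Deed of sale, parcel 17-B", "microfilm", "ab12...")
print(key)  # records/county-clerk/1974/deed/000042.pdf

# With boto3 this would be roughly:
#   s3.put_object(Bucket="archive", Key=key, Body=data, Metadata=meta)
# with the same metadata also written to a database for fast search.
```

The point is that the key scheme and the metadata index carry the "manage, index, and find" burden mentioned above, not the blobs themselves.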
A lot of this type of work has already been done or is in progress.
I mean, sure, until Amazon and Azure decide to delete it all [0] to save money, or those companies go bankrupt and shut off their servers. None of these big cloud providers is even 50 years old.
[0] - https://news.softpedia.com/news/All-Videos-on-Google-Video-W...
Sure, but if you're legally responsible for holding onto records for a certain period of time ("forever" may be 80-100 years, realistically, in a lot of cases), you're expected to employ a reasonable practice to ensure that happens. In the case of digital files, it'd be crazy not to have an off-site backup for records you're legally obligated to retain. And if you do what is reasonable, and still the records are lost (such as to a fire), you're likely legally in the clear. Whereas if you destroy them prematurely with intent, or are even just negligent as to their maintenance or care, you may be in hot water.
But different and arguably on-average cheaper costs.
Paper can rot if it's not kept climate-controlled, but not nearly as fast as bits stored to non-volatile media if the power goes out and there isn't a hot backup.
Government records aren't "text" though. They're documents, which need to be preserved with any margin notes, markings, signatures, etc. intact. So if you're storing government records, you're storing images.
Physical storage space definitely does have a cost, but so does software licensing, hardware maintenance contracts, etc. And a lot of government organizations have physical space in abundance, but IT resources, less so.
I am reminded of the "API" decision made by Jeff Bezos at Amazon, as famously described by Steve Yegge:
So one day Jeff Bezos issued a mandate. He's doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion -- back around 2002 I think, plus or minus a year -- he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.
His Big Mandate went something along these lines:
1) All teams will henceforth expose their data and functionality through service interfaces.
2) Teams must communicate with each other through these interfaces.
3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
4) It doesn't matter what technology they use. HTTP, Corba, Pubsub, custom protocols -- doesn't matter. Bezos doesn't care.
5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
6) Anyone who doesn't do this will be fired.
https://plus.google.com/+RipRowan/posts/eVeouesvaVX
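A toy sketch of what points 1 through 3 amount to in code (names invented for the example): consumers go through the interface, never the backing store.

```python
# The boundary rule in miniature: the backing store is private to the
# owning team; everyone else calls the service interface.

class OrderService:
    def __init__(self):
        # Private store: no other team reads this directly,
        # per point 3 (no direct reads of another team's data store).
        self._orders = {}

    def place_order(self, order_id: str, item: str) -> None:
        self._orders[order_id] = {"item": item, "status": "placed"}

    def get_status(self, order_id: str) -> str:
        """The only sanctioned way for outsiders to learn an order's state."""
        return self._orders[order_id]["status"]

svc = OrderService()
svc.place_order("o-1", "book")
print(svc.get_status("o-1"))  # prints "placed"
```

In real Amazon this boundary sits behind a network call (point 4 says the protocol doesn't matter); the in-process class here just illustrates the ownership rule.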
I'd also like to point out that with all design decisions there are trade-offs. Bezos made these trade-offs specifically with Amazon's scale in mind and likely would not make the same trade-offs with a team of 5 devs, or even a team of a thousand devs that had good habits around adhering to data boundaries.
Clearly the dev culture at Amazon had a habit of abusing those boundaries, or it wouldn't have gotten to the point where such a black-and-white mandate made a net positive change to the status quo.
> Bezos made these trade offs specifically with Amazon's scale
It's more nuanced for Amazon - they intentionally ignored language skills when looking to hire good developers (difficult in '90s Seattle), so the end result was a hodgepodge of services written in many different languages, talking any number of protocols, which included Service A directly accessing the database of Service M. This mandate was his answer to teams complaining about working with each other and the bullshit they had to go through when integrating features across service boundaries to get things done.
This is how Bezos built AWS, so he meant services that are needed to build websites, such as databases, servers, containers, queues, caches, storage...
Bezos built AWS like Hadrian built a wall; both ordered that something be done, and it was left to tens of thousands of others to implement it. If you want to credit them with a lone vision, that’s fine if a bit unlikely given the reality of advisors and boards, but let’s not buy into the level of myth making required to say they built things.
I think that Bezos literally wrote every line of code and assembled every atom on the servers hosting the API. That's how I interpreted Yegge's post and why I think Bezos built AWS.
The Bezos-as-Demiurge theory, or “Gnostic Amazon” theory? I like it.
Bezos didn't build AWS, people who worked for him did. They also didn't do it as part of the normal day to day at Amazon, no matter how much Amazon pushes the "it's the tech we use already" narrative.
> Bezos didn't build AWS, people who worked for him did.
The people that worked on AWS didn't build it, the other people that worked on AWS built it. That's the meaning of your statement.
Bezos works for Amazon, he decides whether AWS gets built or not, how large resources are deployed in the company, and what the priorities are over time. He worked on AWS. If you aren't an engineer, that doesn't mean you didn't help build a product or business. There are a lot of non-engineers that were required to build AWS, including managers, support staff, sales, marketing, HR, accounting and so on.
Saying he didn't build it and that everyone else did, is the same style of mistake as pretending that Bezos built it by himself would be (and nobody actually thinks he built it by himself).
In that example they'd expose it through the source control system and through the build system, though that's really stretching the meaning.
What it really means is that if you're building something that you run on your team's hosts, it needs to be exposed through services. For most of the services at Amazon that means using the common service interfaces that everyone uses, and not just giving out DB credentials (which I've seen happen; the untangling took years, and I think it's still in progress 5+ years later).
That's a weird example. A team that makes a library probably doesn't share too much data with other teams. I think this directive is more like, don't share a spreadsheet of your quarterly stats.
It's supposed to be a weird example. We could think of lots of things that don't belong as services. Bezos's post doesn't offer guidance on what should be a service - only that you'll be fired if your team doesn't have one.
What matrix library for computer graphics was Amazon working on back in the early 2000s? Genuinely curious since I don't hear much about things like that from Amazon.
Curious, do you have examples of this working well? It sounds like SOAP, which in my experience ends badly. Yes because, well, SOAP, but also because sharing your schema means you're stuck with it along with any internal assumptions about design that result from it.
I'm assuming you're more likely talking about something like gRPC/Protobuf, which I have similar gripes about.
I'd always want to make sure no PII is accidentally leaked. Example: in 1997, researchers at MIT showed that using only gender, date of birth and ZIP code, it is possible to identify the majority of US residents! They proved their point by identifying the Massachusetts governor's medical records in a publicly-available dataset that was presumed anonymous. In 2010, Netflix published an “anonymized” dataset of movie ratings by users. After it was released to the public, researchers were able to identify many Netflix users, even though the dataset only contained user ID, movie, rating, and rating date.
Where I work, a tax policy think tank, we purchase from the IRS an anonymized data set of about 100k sample tax returns that we use for modeling the effects of changes to tax policy. We've got an agreement with the IRS that we can never share that dataset, and I'm pretty sure this is why. While I'd love to be able to make our tax model results more transparent, the risk of de-anonymization is too high. I think another tax policy group is trying to create a synthetic dataset that is close to the sample in terms of outputs but is entirely made up, so that it could be used for verification of results by third parties--I hope they succeed.
It's funny, given that I work for an election services company, how much concern (hard software requirements) there is about keeping said public information secure.
Even funnier is when specifications for engines for weapon systems (tanks and aircraft, not the guns/missiles, etc.) are tightly guarded secrets domestically but considered public information overseas.
There are two issues in tension here. One is a right to privacy, but the other is a right to audit who is actually voting. If the voter rolls were secret, there's nothing to stop a massive fraud by those in power to stay in power.
It's largely for primary voting in states that have partisan primaries. If you're registered as a Democrat then you receive a ballot with only the Democratic candidates.
Of course one could argue that all primaries should be non-partisan. Or that governments shouldn't spend money on running partisan primaries on behalf of political parties which are private organizations.
I understand having to select a party for primaries & the pros & cons of it. That is a different & even more complicated issue.
I just don't understand why which party I voted for needs to be "public" record. I don't believe that companies & the general public need to know. I've used campaign software that shows how incredibly easy it is to select a list of people & market towards those who voted one way or another in the primary.
Because this may affect which primary you can vote in. For example, only registered Pastafarians may vote in the Pastafarian primary, to avoid spoiler votes from members of other parties.
I understand having to select a party for primaries & the pros & cons of it.
I just don't understand why that needs to be "public" record. I don't believe that companies & the general public need to know that in 2018 I chose to be part of the Pastafarian party. Maybe I want to avoid all the noodle marketing that comes with that affiliation.
In the US we have a secret ballot. Considerable effort goes into not collecting information about what you voted for, only that you voted.
Countries with sham elections tend to be open ballot, as the voters need to show they support the current ruler. This is why the selected candidate wins with 98% of the vote or some such.
You can't even get 70% of people to agree on which direction the sun rises.
No, you can have the lists of voters without having any information about who they voted for (or in this case, party affiliation which gives who they probably voted for).
2 genders times ~25,000 dates of birth (70-ish years) times 40,000 ZIP codes equals 2 billion. The population won't be uniformly distributed over date of birth or ZIP code, but that gives a factor of six or so of leeway.
> "It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person."
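That 87% figure comes from exactly this kind of counting: group records by the {ZIP, gender, date of birth} triple and see how many groups have size one. A toy sketch with fabricated records:

```python
# Toy check of how quickly {ZIP, gender, date of birth} pins people down.
# The six records below are fabricated for illustration.
from collections import Counter

people = [
    ("02139", "F", "1950-07-31"),
    ("02139", "M", "1950-07-31"),
    ("02139", "M", "1962-01-15"),
    ("60601", "F", "1980-03-02"),
    ("60601", "F", "1980-03-02"),  # shares its triple with the row above
    ("94105", "M", "1975-11-23"),
]

counts = Counter(people)
unique = [p for p in people if counts[p] == 1]
print(f"{len(unique)} of {len(people)} records are unique on this triple")
# prints "4 of 6 records are unique on this triple"
```

Anyone whose triple appears exactly once is re-identifiable by joining against any other dataset (voter rolls, say) that carries the same three fields.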
I suspect it's more expensive to make it not machine-readable, considering that it likely starts out in a computer, and if it is not stored that way, it becomes essentially inaccessible even to the people who stored it.
No, but I believe it can't help but be an indicator of the trend toward open government. Right now there are around six states with open data laws.
I have been working for two years with a group trying to add Michigan to the list. Tomorrow I am publishing an open letter to our governor asking her to support our efforts. I will post a link on HN.
Just curious, do government services really not have any reserve funding? It seems like avoiding shutdowns could be solved pretty easily by having reserve funds (at least for 1-3 months or so).
But perhaps that's the point. Shutdowns are supposed to be inconvenient.
Indeed. "Many agencies, particularly the military, would intentionally run out of money, obligating Congress to provide additional funds to avoid breaching contracts." https://en.wikipedia.org/wiki/Antideficiency_Act
> It seems like avoiding shutdowns could be solved pretty easily by having reserve funds (at least for 1-3 months or so)
Or by having the government continue working according to last year's budget, like many other countries do. I know of no government shutdowns in Western countries except in the US. Even for "third world" countries such things would be disastrous, as armies don't like not being fed and paid.
I am deeply afraid of the impact of this law. The amount of meta-work required to consolidate and annotate data we collect, in order to prepare it for public consumption, seems likely to hurt government efficiency.
In addition to the administrative burden, it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information.
Perhaps my skepticism is misplaced, but my initial reaction is that this sounds better in the abstract than it will turn out to be in practice.
Part of my wife's job is to research Medicaid billing codes for every state (yes, this is a state thing, but I'm just making an example). Once in a while she can get their codes in a form as "advanced" as an Excel spreadsheet. But more likely she'll get a PDF doc that she's got to run through an OCR program to convert to a spreadsheet, and then check for errors. Or for some states, nothing is published at all, and she's got to piece it together from partner hospital billing records.
There's no doubt that getting this data into a sane format will take the states some extra resources.
But when you consider how much more efficient this will make my wife's company, and every other provider of Medicaid services, it's bound to be a huge win on net. And improving efficiency of delivering healthcare should be important.
The government is big, but the private sector is still much larger. So there's great leverage to make our overall systems more efficient because an investment in efficiency on the government side will be multiplied many times over as seen by the many private entities that the government is overseeing.
Hospitals routinely spend huge sums of money on new equipment that significantly improves their competitive edge on diagnoses and outcomes. They are also willing to spend money on drop-in solutions that lessen the need for paperwork that eats up admin, nurse, or doctor time.
However, you're right that they are notoriously stingy on buying new things if the economics aren't immediately apparent, and will never buy into something that demands radical workflow/org changes.
By hard to sell to, I mean, it's hard to get in front of the right person. The people who control the purse strings are not necessarily the most knowledgeable about the problems at-hand either.
Inefficiency and high expense is the primary burden of open governments, representative governments, and democratic governments. If you want a cheap, efficient government, you want an absolute monarchy. This is why corporations tend to be structured into rather strict hierarchies that bear no small resemblance to feudal kingdoms. That's also why they're terrible at meeting worker demands.
Your reply leans pretty heavily on the assumption that all government agencies consist of bureaucrats. I think you should re-examine that assumption. A small minority of government workers are involved in issuing regulations at places like the EPA. Most are military service members, Homeland Security workers, DOJ law enforcement, etc.
Yeah, I think the proximal issue is that many govt. bureaucrats struggle to keep up with documentation requirements as-is. In some (many?) cases, it takes an exceedingly long time, else does not get done at all.
But--it's possible that better access to open data and a stronger culture in govt. around using data-driven decision and policymaking could improve on this in the long run. Not to mention the fact that much of the pre-existing paperwork requirements could be automated (if implemented carefully).
>The amount of meta-work required to consolidate and annotate data we collect, in order to prepare it for public consumption, seems likely to hurt government efficiency.
I (briefly) thought the same thing when I first read about it, but I think the efficiency gained from having digital, standardized formats will eventually outweigh the inefficiencies of the initial conversion to that format.
I'm also happy that otherwise "dead" data (e.g. papers sitting in boxes in a basement somewhere) could now be used more effectively in aggregate to further increase operating efficiencies. Imagine trying to put together a comparison between a specific subset of finance reports across departments when Department A uses one digital format, Department B uses another digital format, and Departments C through Z all have them in boxes. What would have otherwise been a bureaucratic headache _before you even get to data munging_ now becomes an ordeal that's easier on all fronts, and that data can then be used to fight back against otherwise unknown inefficiencies.
>The amount of meta-work required to consolidate and annotate data we collect, in order to prepare it for public consumption, seems likely to hurt government efficiency.
As a data analyst working for a state government, not consolidating or creating metadata really hurts my efficiency. I've gotten too comfortable with munging tables in PDFs.
>it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information
This is something we're trying to figure out. The problem is, I doubt many agencies are actually maintaining privacy with their publications. The Census is adopting differential privacy strategies [0], but my own agency relies on practices from the days of printed reports. I know for a fact some of them don't work, but government is slow to adapt.
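For the curious, the core of the Laplace mechanism behind differential privacy is small enough to sketch. This is a textbook illustration, not what the Census actually ships; in real use `u` would be drawn uniformly at random rather than passed in.

```python
# Minimal sketch of the Laplace mechanism: publish a count plus noise
# drawn from Laplace(0, sensitivity / epsilon). Smaller epsilon means
# more privacy and more noise.
import math

def laplace_noise(scale: float, u: float) -> float:
    """Inverse-CDF sample of Laplace(0, scale) from uniform u in (0, 1)."""
    u -= 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float, u: float) -> float:
    sensitivity = 1.0  # adding/removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, u)

# u = 0.5 gives zero noise, so the mechanism is centered on the truth:
print(private_count(1000, epsilon=0.1, u=0.5))  # prints 1000.0
```

The old printed-report practices (cell suppression, rounding) offer no such quantifiable guarantee, which is exactly the gap the Census move is meant to close.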
I am a big supporter of government and do not consider efficiency a primary objective (a good one, but secondary).
To make a cartoony analogy: flight security would be more efficient if everyone flew naked with no hand luggage, but that would defeat the purpose of people traveling from place to place for their own reasons.
Likewise: the government has collected or generated that info; let's put it into a reasonably clean and accessible format so others (who, in the US, have funded its collection/generation anyway) can build upon it.
The inefficiencies of correctly recording and distributing data will be, I think, greatly outweighed by the increased efficiencies of having standardized machine-readable data that's easy to access and use. I work at a think tank that uses various government data sets across agencies and jurisdictions, and the cleaning that goes into analysis is a nightmare. Some agencies have their own quirky conventions--I've seen "-1" used as a flag for "no data" before, which as you can imagine returned some strange results on analysis. A regulation that says "publish data and do it precisely this way" will be a welcome one for me.
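A tiny illustration (with made-up salary figures) of why a sentinel like "-1" poisons analysis if it isn't cleaned first:

```python
# A "-1 means no data" convention silently skews any aggregate that
# forgets to strip the sentinel. All values here are fabricated.
values = [52000, 61000, -1, 48000, -1]  # -1 is the agency's "no data" flag

naive_mean = sum(values) / len(values)          # sentinel left in
cleaned = [v for v in values if v != -1]        # sentinel stripped
clean_mean = sum(cleaned) / len(cleaned)

print(round(naive_mean), round(clean_mean))  # prints "32200 53667"
```

A standardized publishing format that mandates an explicit null (rather than each agency's private magic number) makes this entire class of error impossible.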
I agree when bolted onto data, but if data collections are properly designed to be findable, accessible, interoperable, and reusable from the start, I think long term data management drops due to more efficient processes.
I think closed data, or data in PDFs, masks a lot of technical debt that causes manual labor and expensive proprietary licenses (looking at you, SAS, for archived data sets).
With the right tools most of it could be very automatic, and eventually it will be a non-concern. It's getting up to speed that is painful, and getting everyone on board.
This sounds like an amazing thing!
Anyone serious does their work reproducibly. This shouldn't add much more than a bit of storage cost, on devices and paper.
I searched/skimmed through the full text of the bill, and it doesn't seem that it applies to data used/generated by grant-based research (or other projects). The language of the bill is heavily focused on executive agencies.
That would be ideal, although a large portion of grant funding goes to medical research (e.g. via NIH), where it would be difficult to anonymize the data. They could require that it be sanitized (differential privacy, etc.), but I don't know how that could be verified effectively/efficiently. The grants process is quite time-consuming for the grantees (and the grantors) without this requirement.
Data generated by a researcher on a government grant generally includes a provision in the grant that says the govt owns the data, doesn't it?
And if the govt owns it, this seems to indicate it should be open...
I'm not really knowledgeable enough on policy to know how this might interact with the new law. But based on my reading, it seems that institutions conducting federally-funded research are required to retain any data generated by the research, but not necessarily that the government "owns" it, per se. There was a change in policy in 1999 that required the ability for the public to use FOIA to access grant-funded research data, with some limitations.
"To balance the need for public access while protecting the research process, OMB's revision limits the kinds of data that will be made accessible (it excludes personal and business-related confidential data) and limits applicability to federally funded data relating to published research findings produced under a federal award and used in developing an agency action that has the force and effect of law."
The media would make Trump out to be a 5-year-old toddler, but the guy isn't stupid. I'm sure there are millions of people that actually think Trump is a complete idiot. The truth is often less colorful than people would hope. Trump is like anyone else -- he has made good choices and bad choices.
Honestly, I'm a bit surprised that people on HN downvote literally anything that is neutral on Trump. "Orange man bad" is an epidemic that affects even the smart people.
It's a pretty sad state of affairs. Makes it nearly impossible to have any sort of honest conversation about politics, and definitely has a chilling effect. I debated even making the comment because I knew it would draw down votes.
I think the entire point of viciously attacking anyone who shows any sign of being part of the "opponents" ideology is to chill that persons speech.
They just don't realize that type of tactic not only fails to silence people but also (quietly) pushes tons of people who normally don't have a horse in the game to the other side. There are plenty of politically ambivalent people who don't want a world ruled by an ideology opposed to open debate/free thought and which actively seeks to destroy context/intention in language in the name of "progress".
There is a comedy skit about what it's like trying to mention the grounded practicalities behind current events in the presence of those who find it outrageous to dig deeper than the headlines:
He might not be stupid, but he is willfully ignorant, and proudly so.
While I would agree the media and public go a bit overboard in infantilizing him, his behavior and persona make it very difficult to give him the benefit of the doubt.
I think ignorance plays a part, but my money is on he just doesn't care.
Infantilizing him might make it more relatable for the average person, but it's just click-bait BS. We should call it like it is: he is a sociopath, and anything less is underestimating him. He has a brand and knows how to project it, and will do anything to get "ahead" and feed his ego. Collateral damage be damned.
I think Trump might defy the mainstream definition of intelligence. In a twisted and narrow sense he might have something approaching cunning, and a primitive feedback loop modulating his speech like the YouTube algorithm: optimizing for one goal, sacrificing everything else.
It's rather off-putting to have a corrupt, amoral president who uses transparency as a weapon against his political opponents, lies, and doesn't show respect for civil servants.
I was the deputy director of the Sunlight Foundation and have covered the space for a decade. I tracked Trump's record on open government and documented it daily: https://sunlightfoundation.com/tracking-trumps-attacks-on-tr...
It is not, in other words, just my opinion, but a statement of fact. Moreover, the bill in question is the OPEN Government Data Act. This president's record on transparency, accountability, ethics and civic engagement is atrocious. Therefore, these issues are directly related.
"Forever" is a very long time indeed. There's a reason the UK still keeps some records on scrolls of velum: https://www.bbc.co.uk/news/magazine-35569281
But nothing digital, that's for sure.
Perhaps instead of it being "all teams" it should be "all CPU processes"?
It's not a "you must find some data or service to expose" rule. If you have data and someone wants to use it, they get it via the service.
https://martinfowler.com/bliki/TellDontAsk.html
I'd say you're being a pedant but you're not even technically correct.
The comment to which you’ve replied said exactly that.
What it really means is that if you're building something that runs on your team's hosts, it needs to be exposed through services. For most of the services at Amazon that means using the common service interfaces that everyone uses, and not just giving out DB credentials (which I've seen happen; untangling it took 5+ years, and I think it's still in progress).
- you build a matrix math system as a service... or
- you build cloud rendering (as they have done several times)
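The "everything through a service interface, not DB credentials" idea above can be sketched in a few lines. This is a hypothetical illustration (the `OrderService` handler, the `ORDERS` data, and the endpoint path are all made up, not any real Amazon API): a team exposes read access over HTTP instead of handing out database credentials.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the owning team's private database; consumers never touch it.
ORDERS = {"42": {"id": "42", "status": "shipped"}}

class OrderService(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /orders/42 -> look up the last path segment
        order = ORDERS.get(self.path.rstrip("/").split("/")[-1])
        body = json.dumps(order or {"error": "not found"}).encode()
        self.send_response(200 if order else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep demo output quiet

# To run:  HTTPServer(("127.0.0.1", 8080), OrderService).serve_forever()
```

The point of the boundary is that the `ORDERS` dict (the "database") can later change shape, move, or be re-implemented without breaking any consumer, because consumers only ever see the service contract.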
I'm assuming you're more likely talking about something like gRPC/protobuf, which I have similar gripes about.
I advertise a rule set; you give me a function (a pattern, really) and I return all the data related to that pattern.
There are other organizations that accept a function: effectively, the result is to return the record in the True case, and not in the False case.
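The pattern described above amounts to a predicate-based query interface: the caller hands over a function, and the service returns every record for which it is True. A minimal sketch, with invented record data and a hypothetical `query` function:

```python
# Illustrative only: `records` and `query` are made-up names, not a real API.
records = [
    {"id": 1, "type": "invoice", "amount": 120},
    {"id": 2, "type": "receipt", "amount": 40},
    {"id": 3, "type": "invoice", "amount": 75},
]

def query(predicate):
    """Return all records for which the caller-supplied predicate is True."""
    return [r for r in records if predicate(r)]

matches = query(lambda r: r["type"] == "invoice" and r["amount"] > 100)
print(matches)  # [{'id': 1, 'type': 'invoice', 'amount': 120}]
```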
That sounds pretty awesome (and expensive)!
I'd argue that keeping those public is also a net good for society.
Even funnier is when specifications for engines for weapon systems (tanks and aircraft, not the guns/missiles) are tightly guarded secrets domestically but considered public information overseas.
Of course one could argue that all primaries should be non-partisan. Or that governments shouldn't spend money on running partisan primaries on behalf of political parties which are private organizations.
I just don't understand why which party I voted for needs to be "public" record. I don't believe that companies & the general public need to know. I've used campaign software that shows how incredibly easy it is to select a list of people & market towards those who voted one way or another in the primary.
Laws about this vary from state to state.
I just don't understand why that needs to be "public" record. I don't believe that companies & the general public need to know that in 2018 I chose to be part of the Pastafarian party. Maybe I want to avoid all the noodle marketing that comes with that affiliation.
Countries with sham elections tend to be open ballot, as the voters need to show they support the current ruler. This is why the selected candidate wins with 98% of the vote or some such.
You can't even get 70% of people to agree on which direction the sun rises.
A secret ballot is necessary for free elections.
Of all the comments I've made that should have been downvoted, the one I made above was it. The HN gods are having mercy on me, it seems.
I'm just pointing this out because I'd like to see a movement towards actually more open government.
Source? How were they able to do that?
The 87% paper is https://dataprivacylab.org/projects/identifiability/paper1.p.... It says:
”It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person.”
More info likely available via http://latanyasweeney.org/work/identifiability.html (discussed in https://news.ycombinator.com/item?id=2942967)
If you include year, birth dates carry an awful lot of information.
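The quoted result can be illustrated with a toy computation: group people by the quasi-identifier triple {ZIP, gender, date of birth} and count how many fall in a group of one. The data below is invented for illustration; the real paper ran this over census data.

```python
from collections import Counter

# Made-up individuals, each as (ZIP, gender, date of birth)
people = [
    ("98101", "F", "1980-04-12"),
    ("98101", "F", "1980-04-12"),  # shares all three attributes -> not unique
    ("98101", "M", "1975-09-30"),
    ("98052", "F", "1990-01-05"),
]

counts = Counter(people)
unique = sum(1 for c in counts.values() if c == 1)
print(f"{unique}/{len(people)} individuals are uniquely identified")  # 2/4
```

Scaled up to real populations, the same grouping is what yields the 87% figure: most triples occur exactly once.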
Kind of depends on what they say is "sensitive". Probably not much of a facility for appealing that designation either.
I sorta look at it like the "First Step" prison reform act, it's only the start.
Baby steps.
I have been working for two years with a group trying to add Michigan to the list. Tomorrow I am publishing an open letter to our governor asking her to support our efforts. I will post a link on HN.
But perhaps that's the point. Shutdowns are supposed to be inconvenient.
Or by having the government continue working according to last year's budget planning, like many other countries do. I know of no government shutdown in Western countries other than the US. Even for "third world" countries such things would be disastrous, as armies don't like not being fed and paid.
In addition to the administrative burden, it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information.
Perhaps my skepticism is misplaced, but my initial reaction is that this sounds better in the abstract than it will turn out to be in practice.
There's no doubt that getting this data into a sane format will take the states some extra resources.
But when you consider how much more efficient this will make my wife's company, and every other provider of Medicaid services, it's bound to be a huge win on net. And improving efficiency of delivering healthcare should be important.
The government is big, but the private sector is still much larger. So there's great leverage to make our overall systems more efficient because an investment in efficiency on the government side will be multiplied many times over as seen by the many private entities that the government is overseeing.
However, you're right that they are notoriously stingy on buying new things if the economics aren't immediately apparent, and will never buy into something that demands radical workflow/org changes.
It's a meta-comment, but this term sounds slightly misplaced when you're talking about bureaucrats having to do more paperwork.
But--it's possible that better access to open data and a stronger culture in govt. around using data-driven decision and policymaking could improve on this in the long run. Not to mention the fact that much of the pre-existing paperwork requirements could be automated (if implemented carefully).
I (briefly) thought the same thing when I first read about it, but I think the efficiency gained from having digital, standardized formats will eventually outweigh the inefficiencies of the initial conversion to that format.
I'm also happy that otherwise "dead" data (e.g. papers sitting in boxes in a basement somewhere) could now be used more effectively in aggregate to further increase operating efficiencies. Imagine trying to put together a comparison between a specific subset of finance reports across departments when Department A uses one digital format, Department B uses another digital format, and Departments C through Z all have them in boxes. What would otherwise have been a bureaucratic headache _before you even get to data munging_ now becomes far easier on all fronts, and that data can then be used to fight back against otherwise unknown inefficiencies.
As a data analyst working for a state government, not consolidating or creating metadata really hurts my efficiency. I've gotten too comfortable with munging tables in PDFs.
>it appears to ignore the fact that non-sensitive information, in sufficient quantity and correlation, becomes sensitive information
This is something we're trying to figure out. The problem is, I doubt many agencies are actually maintaining privacy with their publications. The Census is adopting differential privacy strategies [0], but my own agency relies on practices from the days of printed reports. I know for a fact some of them don't work, but government is slow to adapt.
[0]: https://privacytools.seas.harvard.edu/why-census-bureau-adop...
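For reference, the core of the differential privacy strategy mentioned above (the Laplace mechanism) is small enough to sketch. This is a minimal illustration under textbook assumptions, not the Census Bureau's actual implementation; `laplace_noise` and `dp_count` are made-up names. A counting query has sensitivity 1, so adding Laplace noise with scale 1/epsilon gives epsilon-DP.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon):
    """Noisy count satisfying epsilon-differential privacy (sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)  # for a reproducible demo
print(dp_count(1000, epsilon=0.1))  # noisy value near 1000
```

Smaller epsilon means stronger privacy but noisier published statistics, which is exactly the trade-off agencies have to tune per publication.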
To make a cartoony analogy: flight security would be more efficient if everyone flew naked with no hand luggage, but that would defeat the purpose of people traveling from place to place for their own reasons.
Likewise: the government has collected or generated that info; let's put it into a reasonably clean and accessible format so others (who, in the US, have funded its collection/generation anyway) can build upon it.
I think closed data, or data trapped in PDFs, masks a lot of technical debt that causes manual labor and expensive proprietary licenses (looking at you, SAS, for archived data sets).
Poe's law strikes again.
That would be ideal, although a large portion of grant funding goes to medical research (e.g. via NIH), where it would be difficult to anonymize the data. They could require that it be sanitized (differential privacy, etc.), but I don't know how that could be verified effectively/efficiently. The grants process is quite time-consuming for the grantees (and the grantors) without this requirement.
*not a lawyer
"To balance the need for public access while protecting the research process, OMB’s revision limits the kinds of data that will be made accessible (it excludes personal and business-related confidential data) and limits applicability to federally funded data relating to published research findings produced under a federal award and used in developing an agency action that has the force and effect of law."
https://fas.org/sgp/crs/secrecy/R42983.pdf
1) Open is better
2) Bipartisan legislation, indicates some progress is possible
Edit: It was an honest question I wanted to know the answer to, but I've got my answer.
They just don't realize that type of tactic not only fails to silence people but also (quietly) pushes tons of people who normally don't have a horse in the game to the other side. There are plenty of politically ambivalent people who don't want a world ruled by an ideology opposed to open debate/free thought and which actively seeks to destroy context/intention in language in the name of "progress".
"Stop Making Me Defend Donald Trump!" https://www.youtube.com/watch?v=1eq0X4qDlR0
While I would agree the media and public go a bit overboard in infantilizing him, his behavior and persona make it very difficult to give him the benefit of the doubt.
Infantilizing him might make it more relatable for the average person, but it's just clickbait BS. We should call it like it is: he is a sociopath, and anything less is underestimating him. He has a brand and knows how to project it, and will do anything to get "ahead" and feed his ego. Collateral damage be damned.
This was the subject of a previous thread, here: https://news.ycombinator.com/item?id=18746132
"the American president who has done the most to damage democracy in modern history"
Your article starts off with an unsupported opinion irrelevant to the issue, it's not very auspicious for the rest of it.