I found it both amusing and annoying that the first line of the article mentions VLOOKUP.
In my experience, there's never a good reason to use VLOOKUP. You can always achieve the same functionality using INDEX (in conjunction with MATCH). Using VLOOKUP means that your formulae break as soon as someone inserts a new column in the middle of your table. And clicking the cell with the formula doesn't immediately show you which two columns are used by the function.
That is unless you parametrise that column index as well - you can do a MATCH to retrieve your desired column number and safeguard it against column insertions/deletions. (It weakens the case for VLOOKUP, but being quite a light Excel user, this is usually how I do things.)
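To make the difference concrete, here is a rough Python sketch (not Excel itself) of the two lookup styles. The table, the column names and all three helper functions are made up for illustration: `vlookup` mimics a hard-coded column index, while `index_match` finds the column by its header, the way a MATCH on the header row would.

```python
# Toy model of a sheet: a header row plus data rows (all values are invented).
header = ["sku", "name", "price"]
rows = [
    ["A1", "widget", 2.50],
    ["B2", "gadget", 4.00],
]

def vlookup(key, rows, col_index):
    """Mimic VLOOKUP(key, table, col_index, FALSE): 1-based, hard-coded column."""
    for row in rows:
        if row[0] == key:
            return row[col_index - 1]
    raise KeyError(key)

def index_match(key, header, rows, key_col, value_col):
    """Mimic INDEX/MATCH where both columns are located by header name."""
    k = header.index(key_col)      # MATCH on the header row for the key column
    v = header.index(value_col)    # MATCH on the header row for the value column
    for row in rows:
        if row[k] == key:
            return row[v]
    raise KeyError(key)

def insert_column(header, rows, position, name, fill):
    """Return a copy of the table with a new column inserted at `position`."""
    new_header = header[:position] + [name] + header[position:]
    new_rows = [row[:position] + [fill] + row[position:] for row in rows]
    return new_header, new_rows

before_fixed = vlookup("B2", rows, 3)                            # price
before_named = index_match("B2", header, rows, "sku", "price")   # price

# Someone inserts a "colour" column in the middle of the table.
header2, rows2 = insert_column(header, rows, 2, "colour", "red")
after_fixed = vlookup("B2", rows2, 3)                            # now reads the colour column
after_named = index_match("B2", header2, rows2, "sku", "price")  # still reads price
```

The hard-coded index silently returns the wrong column after the insertion, while the header-based lookup keeps pointing at the price.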
I know Hacker News prefers statistical programming languages like R and Python to spreadsheets, and there have been many startups trying to disrupt that paradigm, but time and time again spreadsheets have proven to be the most reliable tools for data analysis and for collaborating with nontechnical users.
Even as a data scientist working with other data scientists, my most common deliverable is a Google Sheet.
I don't dislike people using the statistical tools available to them, but in my own field (social sciences) there's a huge replication crisis going on right now. And a lot of that is due to people who were never good at math taking easy-to-use statistical tools like Excel and SPSS and blindly running stats without programming or math training.
Is it too much to ask that people treat a field with a bit of respect? Like, just because NYT reporters can use some of these "data skills", can they hold off a bit until we figure out if they're even any good at statistical analysis after their crash course? We currently have an entire academic field that has to throw away a lot of their findings because tools like sheets and SPSS gave them false confidence. I don't have any higher hopes for the NYT newsroom.
I think that’s unfairly dismissive of the data scientists, statisticians, analysts and engineers who work at the New York Times and other major publications (as well as smaller, but crucially important places like ProPublica).
The purpose of this material isn’t to suddenly turn normal reporters into data scientists; it’s to give them a better grasp of how to evaluate the different types of information that become important when reporting.
I don’t know how good or bad this material is — a cursory glance shows that it’s very low-level, the type of stuff I learned in my 100 level accounting and stats classes as an undergrad. But I won’t dismiss this material being made available and potentially augmented for all — tho I wish it was stored in GitHub or GitLab.
If you look through the material, there is nothing that actually says that someone who goes through this training will be a skilled data journalist. But it might just prevent poorly-interpreted articles like this from being written.
And for the record, I’ve worked with data journalists who were more skilled in math and computer science than the engineers I work with at giant tech companies.
In my consulting career, I've seen major organizations track data in Excel, updating each month by creating a new set of columns. An absolute nightmare to handle. Of course every month the structure changed slightly, to make it more fun.
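For what it's worth, the usual way out of that layout is to reshape it into one row per entity per month. A minimal Python sketch, assuming a made-up export where each month adds a column named like `2019-01 sales`; the column naming, the `wide_to_long` helper and the data are all hypothetical.

```python
# Hypothetical wide export: one row per store, one new column per month.
wide = [
    {"store": "north", "2019-01 sales": 10, "2019-02 sales": 12},
    {"store": "south", "2019-01 sales": 7, "2019-02 sales": 9},
]

def wide_to_long(records, id_field, suffix=" sales"):
    """Turn one-column-per-month records into one row per (id, month)."""
    long_rows = []
    for record in records:
        for column, value in record.items():
            if column == id_field or not column.endswith(suffix):
                continue  # skip the id column and anything that isn't a month column
            month = column[: -len(suffix)]
            long_rows.append({id_field: record[id_field], "month": month, "sales": value})
    return long_rows

long_rows = wide_to_long(wide, "store")
```

Once the data is in this shape, a new month is just more rows, so the structure no longer has to change every time.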
I've also seen values stored as an #rgb only - no actual value in a cell.
To add to that, I have seen billions of dollars in EMD traded off an Excel model that took 30 minutes to run: come in to work, hit F9 to recalc and pull in updated market data, go get coffee, come back, send trades to the trading desk.
SPSS and its native format, .sav, are the industry standard for survey data. If anyone is in that field and hasn’t used Q (Q Research Software), I highly recommend it. I used to do a lot of crosstabs and banner reports in SPSS and WinCross, and Q saved me probably 15 hours of work in that process per survey.
Excel is an excellent tool, the most democratic data tool out there (and for the foreseeable short term future).
Whether you are a finance person working on balancing the sheets (or whatever else they do, that's beyond my knowledge) or an operations person building complex macros, it is versatile, easy to use and yet powerful.
The byproduct of this is that it also gets subverted by human ingenuity, and you end up with some pretty interesting, awe-inspiring contraptions.
Such as building a calendar system but using arrow graphs to fill it out.
And that's fine, except when you try to scale things up to automate the process and save people money.
Now you have to do some pretty wasteful engineering to accommodate this pesky creativity we have.
That is what's really interesting, and a testament to it being really good software (in features and reach).
It breaks the boundaries of the limited vocabulary of computers and therefore can only be fully leveraged by humans.
It's both a great reminder that most people in the world are not on the same level as the crowd around here and that, as the group that creates tools used by such an audience, we have to be mindful of that.
On a positive note, I think there is some interesting work being done to rethink how to approach those tasks. I remember using one product in particular that had the right mix of being visually engaging while enforcing boundaries. But this won't solve the issue of users having to hack their way into getting what they want how they want it.
For the Excel pros here: how do you ever template anything in Excel? Like if you want the same formula for two different tables, do you just copy-paste? How do you keep them in sync? There's just so many things I want to do in Excel and it seems so limited in what it can do that I can't fathom how non-programmers end up using it so successfully.
I have seen formulas entered as plain text in a cell, and then a macro used to update other cells with that formula. You can then change the formula in one place and run the macro to update it everywhere.
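The same idea in a rough Python sketch: keep the formula as one text template and stamp it into each table, so a change in one place propagates on the next run. The `{row}` placeholder, the `fill_formulas` helper and the cell layout are invented for illustration; this is not how any particular macro was written.

```python
# One formula template kept in a single place ({row} is a made-up placeholder).
TEMPLATE = "=B{row}*C{row}"

def fill_formulas(template, first_row, last_row):
    """Expand the template once per row, like a macro writing into a column."""
    return [template.format(row=r) for r in range(first_row, last_row + 1)]

# Two hypothetical tables living in different row ranges share the template.
table_a = fill_formulas(TEMPLATE, 2, 4)
table_b = fill_formulas(TEMPLATE, 10, 11)
```

Editing `TEMPLATE` and re-running keeps both tables in sync, which is the part that plain copy-paste does not give you.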
The best way to create reusable formulas in your workbooks is to add your table to the data model (which creates an in-memory tabular cube) and write measures using the DAX formula language. This has the added benefit that a single formula can be written to aggregate data at different levels, for example, a sum can be calculated over days, months, or years. This will only allow you to share formulas in a single workbook. The data model in Excel is powerful and under utilized.
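As a loose analogy in Python (not DAX, and not the Excel data model itself), a measure is an aggregation defined once and evaluated at whatever grain the report asks for. The data, the `GRAINS` mapping and the `total_by` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Invented fact table: (date, amount) pairs.
sales = [
    (date(2019, 1, 5), 100),
    (date(2019, 1, 20), 50),
    (date(2019, 2, 1), 25),
    (date(2020, 1, 3), 10),
]

# The grain decides how a date is bucketed; the measure itself stays a plain sum.
GRAINS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: d.strftime("%Y-%m"),
    "year": lambda d: str(d.year),
}

def total_by(rows, grain):
    """Sum amounts per bucket at the requested grain ('day', 'month' or 'year')."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[GRAINS[grain](day)] += amount
    return dict(totals)

by_month = total_by(sales, "month")
by_year = total_by(sales, "year")
```

The sum is written once; only the bucketing changes between the daily, monthly and yearly views.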
I've seen folks in excel say "I'm not a programmer, so I did this in excel." and ... I was pretty damn impressed. Some of them had some of the logical thinking basics down to be pretty good at programming, if they gave it a try I suspect some might be damn good.
I’m not really very facile with excel, but one place it’s really handy for me is situations that combine manual data entry with computation.
A super simple example is keeping track of monthly bills when you have flat mates, and need to split them every month. It’s not so critical that I feel the need to version control it etc, but it’s still nice to have a visually inspectable record. Even though I spend most of my days writing code, anything I can dream up that involves python or whatever just seems unnecessarily opaque and baroque. A spreadsheet is ideally suited to the task.
I don't know who dislikes spreadsheets. I am a programmer by trade and I regularly use spreadsheets for dozens of tasks, from managing the household budget to making semi-automated expense reports to keeping scores at the local gaming club. Spreadsheets are an excellent tool if used properly, and the more you learn about them, the more uses they reveal.
While there are a number of ScholarlyArticle journals that can publish notebooks,
I'm not aware of any newspapers that are prepared to publish notebooks as NewsArticles. It's pretty easy to `jupyter nbconvert --to html` or `--to markdown`, or just 'Save as'.
Does this course recommend linking to every source dataset and/or including full citations (with DOI) in the article? Does this course recommend getting a free DOI for the published revision of an e.g. GitHub project repository (containing data, and notebooks and/or the article text) with Zenodo?
Attempts to get journalists more up-to-speed with this sort of stuff are to be applauded.
But the real problem is journalists (and their audiences) who, for a lack of professional ethics, don't give a crap about which parts of their stories can/cannot be backed up quantitatively. Besides selling newspapers, not giving a crap also has the great benefit that now they don't have to learn math, or programming, or logical thinking, or any of that.
Having done both journalism and cs in undergrad and grad, respectively, I'd say the former is more nuanced.
Both are relatively easy to dabble in, both relatively hard to reach expertise.
Things like the inverted pyramid, sourcing and neutral voice will get you fairly far in terms of basic information relaying, but great journalists are specifically skilled at interviewing, data diving and other things tertiary to pen on paper.
In retrospect, I was never suited for journalism. I can write well, but that's not really a great (traditional) journalism skill. The finished product for hard news is fairly bland and paint by numbers. Anything else treads into entertainment-journalism and the kinds of things that have a bunch of people screaming "fake news." I call that Race-To-The-Bottom Journalism and it's very in vogue these days.
I have less than great communication skills but have had great experience pairing with people who have good skills. E.g., to give corporate seminars we both go onstage, and the partner structures the talk, interrupts me or clarifies if I blur over something, and watches for the audience’s reaction.
As a parallel, maybe journalists should pair with subject matter people (even subject matter generalists) rather than have them as “sources”. There are of course people who are great at both things (your Nate Silvers) but the whole process might be cheaper and more efficient if, gee, the New York Times had a couple of on-staff PhD economists (not star columnists) who sit in the newsroom trying to give shape to the facts about to be reported, side by side with journalists.
Just imho: I’ve found journalism awkward to teach because at its heart, it’s all very much just skills you use as an everyday human, with an extra emphasis on being unafraid to ask questions. My high school journalism teacher told me that if I wanted to learn journalism, the best way was to just do it. And not to go to j-school but to just work for the college paper. And I think that still applies to any working adults today. Besides doing it, being an avid news consumer is great preparation.
As a former journalist, yes and no. When I first started, my company treated me to a 1 week internal course that was very good. As I recall, there was a day on story structure - news leads v feature, pyramid, delayed drop etc. A day on the basics of law for journalism - libel and the like. A day on interviewing and keeping transcripts. A day on ethics: on the record, off the record, non-attributable. There was also a day on what subs do - headline writing and the like. All in all a good, very useful grounding.
No. This was in the mid to early 1980s in the UK, for a large B2B computer magazine publisher (VNU). Journalism school really wasn't 'a thing' to nearly the same degree then. Instead you were trained on the job. My degree was biology, but I had worked a lot on the student newspaper.
I don't think this kind of induction training was typical - it was just put on by that particular company at that particular period. But it was very good. Shout out to the trainer who I still remember 30 years later. You were excellent Tim https://www.linkedin.com/in/tim-ring-33b7233
Yeah, I interned and worked at 2 large regional papers and I never got trained in writing or story structure, just on how to use the CMS and LexisNexis, and occasionally the in-house lawyer would come in to do off-the-record q&a’s about legal issues.
The skills of a good journalist — interviewing, elicitation, focus on critical details, and the ability to build a compelling story — are skills that /anyone/ should find valuable. But a developer or architect responsible for product, process, or service design should find it particularly useful when interviewing users and stakeholders and determining functional and experiential design.
Communication skills would be a prerequisite for journalist skills, and I see the front page of Hacker News filled with discussions on communication skills every other day.
Small things like putting the executive summary on top, starting every speech/article/email with a hook that tells the user why they care, etc. If a bunch of coders could master those skills, and combine it with data analysis and web design skills, then they could become a force that can compete with the mainstream media.
It's not symmetrical like that. Journalists report on data based phenomena all the time, so understanding statistics is a fairly useful skill.
Programmers don't typically have to do journalism. They do have to write, but writing =/= journalism. If we're talking about writing classes for programmers, personally I don't think there's anything special about the programmer use case, as opposed to "writing in professional contexts" in general.
If you currently work at the NYT as a data-science researcher, doing the job that journalists without data-science experience can’t do, and you see your job imperilled by this, you’ll want to do exactly what these journalists are doing, but in reverse: expanding your role to something you didn’t previously do in order to stay competitive in the job market.