Google exec challenges Berners-Lee (2006)

(web.archive.org)

91 points | by ColinWright 913 days ago

17 comments

  • bravogamma 911 days ago
    In reading this article, I found myself first completely agreeing with Berners-Lee, and then completely agreeing with Norvig. I would love to see Berners-Lee's vision come to life; however, I am not sure how to overcome the human-vice elements Norvig is talking about.
    • echelon 911 days ago
      > The second problem is competition. Some commercial providers say, 'I'm the leader. Why should I standardize?'

      This is the reason. The only reason. Google was the leader in this story.

      Google built infrastructure and intelligence to handle Norvig's other concerns: provenance and validity. It's not perfect, but it's good enough, and it's a major competitive advantage.

      Why would they want everyone on the internet to have Google powers at low cost?

      Wikipedia solves for these concerns. Open source does too. There's no reason we couldn't have had maintainers and curators and a distributed web of pubkey signing to vouch for good data. Most of the data would have been social by nature anyway, and sharing news and articles p2p would have been an early take on the fediverse, but broader than just the Twitter-style focus.

      Google launched the WHATWG partially to supplant the W3C and their semantic web push. They knee-capped XHTML and its strong semantics in favor of a loosely typed, messy, and forgiving HTML5. Because Google is one of only a few players that can deduce the semantics on their own. (They then pushed ahead unilaterally so that Chrome was dominant. They fashion the web into an image that suits them.)

      The Semantic Web would have threatened Google by lowering the barrier to entry to parties that wanted to connect and query data. Of course Google hates it.

      • zozbot234 910 days ago
        HTML5 does have a fully-supported XML representation, there's no regression from XHTML. And Google themselves are working with schema.org to provide standards that endow web pages with strong semantics, along a semantic-web model - this is basically what's powering "rich" SERP results in Google and other search engines. That doesn't look like they "hate" the semantic web all that much.
        • codetrotter 910 days ago
          > HTML5 does have a fully-supported XML representation, there's no regression from XHTML.

          Perhaps I am not quite understanding what you mean, but HTML5 allows things that are not legal in XML.

          For example:

              <!doctype html>
              <html lang=en>
              <meta charset=utf-8>
              <title>Home – ACME, Inc.</title>
              <div id=outer-wrap>
              <header id=header-main>
                <h1><a href="/">ACME, Inc.</a></h1>
                <h2>Happy times</h2>
              </header>
              <nav id=navig-main>
              <ul>
                <li><a href="/">Home</a>
                <li><a href="/products/">Products</a>
                <li><a href="/blog/">Blog</a>
                <li><a href="/kb/">Knowledge Base</a>
                <li><a href="/support/">Support</a>
                <li><a href=/about.htm>About Us</a>
              </ul>
              </nav>
              <div id=content-main>
              <div class=quux>
                <h3>Etaoin shrdlu</h3>
                <p>Today is a nice day :)
                <p>Here are some <a href="http://www.example.com/">things we find important</a>:
                <ul>
                  <li><a href="https://www.google.com/search?q=foo">foo</a>
                  <li>bar
                  <li>baz
                </ul>
              </div>
              <article class=snippet>
                <header>
                  <figure class=article-illustration-image-hero>
                    <img src=/static/images/bees.jpg alt="Buzzing bees.">
                    <figcaption>Buzzing bees.</figcaption>
                  </figure>
                  <h3>The Baz and the Bees</h3>
                </header>
                <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
                   incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
                   quis nostrud exercitation ullamco laboris nisi ut aliquip ex
                   ea commodo consequat.
                <p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
                   eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,
                   sunt in culpa qui officia deserunt mollit anim id est laborum.
                <p class=snippet-more><a href=/blog/2021/the-baz-and-the-bees.htm>Read more</a>
              </article>
              </div>
              <footer id=footer-main>
              <p>Copyright © 2021 ACME, Inc.
              </footer>
              </div>
          
          The above example is a complete, valid HTML5 document, and representative of how I like to write HTML using HTML5.

          I leave attributes unquoted where allowed. Broadly, as long as the attribute value does not contain a space, an equals sign, or quote marks, and has no trailing slash, the value can be left unquoted, and so I do. In XHTML this is not allowed; XML in general requires all attribute values to be quoted.

          I omit closing tags where allowed. For example, in the markup above I have left both the p tags and the li tags unclosed. In XML, this is not allowed.

          There is also no XML schema and no DTD inside the file itself; in HTML5, neither is specified as part of the markup.

          In short, HTML5 as a whole is not valid XML. A subset of HTML5 may be valid XML. But a document can be valid HTML5 without being valid XML.
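
          For contrast, here is roughly the same nav fragment as it would have to be written in the XML serialization, with every attribute value quoted and every element explicitly closed:

              <!-- the nav from above, XML-serialized -->
              <nav id="navig-main">
              <ul>
                <li><a href="/">Home</a></li>
                <li><a href="/products/">Products</a></li>
                <li><a href="/blog/">Blog</a></li>
                <li><a href="/kb/">Knowledge Base</a></li>
                <li><a href="/support/">Support</a></li>
                <li><a href="/about.htm">About Us</a></li>
              </ul>
              </nav>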

          Personally I like HTML5 a lot better than XHTML precisely because the rules for HTML5 are so much more permissive: typing HTML5 by hand lets me type less to achieve the same result, and more, than I could back before HTML5 existed.

          • nicoburns 910 days ago
            > I omit closing tags where allowed. For example, in the markup above I have left both the p tags and the li tags unclosed.

            The genius of the HTML5 spec is that it allows this loose parsing while specifying an unambiguous mapping to the stricter syntax, so semantically this makes no difference at all (unlike the prior situation where different browsers parsed this kind of HTML differently). Of course you need an HTML5 parser rather than an XML parser, but these are common and don't represent a big hurdle in parsing.
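
            For instance, an HTML5 parser is required to turn

                <p>Today is a nice day :)
                <p>Here are some things

            into exactly the same DOM as the fully closed form

                <p>Today is a nice day :)</p>
                <p>Here are some things</p>

            because the start of a new p element implicitly closes the one before it.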

            There are other bits like namespaces and DTDs that differ.

          • zozbot234 910 days ago
            See https://html.spec.whatwg.org/multipage/xhtml.html#the-xhtml-... . Note that this XML representation can be derived automatically by parsing the "permissive" HTML5 syntax, and it reuses the same vocabulary as far as practicable. Moreover, it is fully compatible with XML tools, and even with other XML namespaces within the same document, which are not allowed in the HTML5 syntax.
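
            For instance, a rough sketch of a document in the XML serialization that mixes in a foreign namespace (the dc: metadata is purely illustrative), which the HTML5 syntax cannot express:

                <!-- Dublin Core namespace mixed in as an illustration -->
                <html xmlns="http://www.w3.org/1999/xhtml"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
                  <head><title>Example</title></head>
                  <body>
                    <p>Author: <dc:creator>Jane Doe</dc:creator></p>
                  </body>
                </html>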
    • jdavis703 911 days ago
      This is where editorial judgement comes in. It could either be highly centralized, with Google or another search engine providing the curation. Or it could be farmed out using PKI, similar to how Extended Validation was done back in the day, except the verification would cover more of the “human vice” and serve as a feedback signal for the search engine.
  • mark_l_watson 911 days ago
    This was in 2006. In 2002 I had lunch with Peter and when I mentioned the Semantic Web, he was not enthusiastic.

    I would argue that today Google’s Knowledge Graph, DBPedia, and WikiData are all successful projects.

    • raggi 911 days ago
      Maybe, but to tell a story: these projects ended up putting flights to a non-existent place on sale.

      I'm from Bermuda, a small group of islands in the Atlantic Ocean.

      Probably approaching a decade ago now, we were alarmed to see airlines starting to sell tickets to "Greater Bermuda". This isn't a thing that exists, and initially it was assumed that someone at the airline had made some kind of mistake.

      Over the six months after we first saw this, the concepts of "Greater Bermuda" and "Great Bermuda" started cropping up all over the place, eventually making it into Google's knowledge graph as well.

      At this point, and encouraged by some fellow Bermudians, I dug in. It turns out that this all started as a fictitious Wikipedia article written on the de subdomain. At some point someone translated these orphaned articles to English, bringing them over to the main wikipedia.org. Sometime later, an editor came along and merged several Bermuda articles into the Greater Bermuda article, and then various knowledge bases that absorb data from Wikipedia started aligning to this new "reality". Some time after that, various airline databases also started being reseeded with metadata from these data sets.

      I happened to work for Google (still do) and was able to start filing bugs. At first, most were ignored/discarded, but after some persistence, both on Wikipedia and inside Google, I was able to start removing some of these invalid sources.

      As for the data: well, if you fly into Bermuda, you'll land in St. George's, which is absolutely not the larger of the islands, though if it's Cup Match (https://www.gotobermuda.com/article/cup-match-time-bermuda), some say it is the greater.

      These projects are successful, yes, and they're mostly reasonable data sets, but if you are a motivated adversary, or an innocently fallible one, you can use these projects, directly or indirectly, to make significant changes. While this story didn't have much more than a cosmetic effect, some of the folks prodding me at the time predicted worse outcomes had the situation continued. Bringing the issue closer to home for most here: combine this example with social and political strife and its active participants, and you may question whether these projects are even a good idea. Perhaps we could instead once again employ people to write encyclopedias, and fact-check? We sure need the jobs, and there's plenty of money around.

      • sellyme 910 days ago
        > Perhaps we could instead once again employ people to write encyclopedias, and fact-check?

        We have a couple of centuries' worth of evidence that this system is subject to exactly the same kinds of flaws, along with the downside of them being much harder to correct.

      • dtech 911 days ago
        As always, there's a relevant XKCD

        [1] https://xkcd.com/978/

    • bawolff 911 days ago
      True, but I feel like central repos like Wikidata are quite a bit different from the original vision of many small websites all interoperating into a semantic web.

      Then again, the same thing happened with the normal web: we have Wikipedia instead of thousands of people writing articles on their personal homepages.

      • mulmen 911 days ago
        But Wikipedia isn't a primary source and presumably it could (and does) link out to those personal sites.
      • zozbot234 910 days ago
        > True, but I feel like central repos like Wikidata are quite a bit different from the original vision of many small websites all interoperating into a semantic web.

        One of the main roles for Wikidata is to act as a directory for these "many small websites" that enables them to interoperate seamlessly. You can look up a real-world entity on any website that Wikidata supports and use a third-party service ("Entity Explosion", see https://www.wikidata.org/wiki/Wikidata:Entity_Explosion ) to get links to the same entity as it appears on Wikidata and other sites.

      • wibagusto 911 days ago
        What is IPFS, though, if not a glimpse of the future? Distributed makes more sense to me, and P2P concepts are being heavily researched with all the blockchain hoo-ha!
        • bawolff 911 days ago
          A proof-of-concept that's generated a lot of buzz but not a lot in terms of concrete or unique usage?
    • aerovistae 911 days ago
      How do people remember specific years for such ordinary events as lunch, so long ago? My mind simply doesn't tag things that way.
      • tsm 910 days ago
        Usually there are a few life events to anchor it to (school graduation, marriage, child birth, moving, changing jobs, etc.)
      • musicale 911 days ago
        now: google calendar, search for 'lunch with peter norvig'

        2002: maybe iCal

    • SquareWheel 911 days ago
      Not to mention schema.org. Its vocabularies are used heavily for Google search integration (reviews, business information, show times).
  • andyjohnson0 910 days ago
    I'd like to agree with tbl, but I have a nagging feeling that the semantic web is a bit like Ted Nelson's ideas for Xanadu: somehow too neat and tidy, whereas the web has been successful despite being neither.

    At internet scale the semantic graph is going to have to emerge from automated, deductive effort. There aren't enough people, and there isn't enough human attention or appropriate reward systems, to bring about a widespread semantic web by building metadata librarian-style. I wish it were otherwise.

  • jhbadger 910 days ago
    I think calling Peter Norvig "a Google executive" (yes, I know that's the title of the article being linked to) is a great disservice to him. Yes, he worked at Google for a time, but he wasn't some beancounter -- he's an extremely accomplished computer science researcher both in academia and industry.
  • oh_sigh 911 days ago
    Well, it looks like Norvig won that fight. Now I guess the question is: was Norvig prescient, or did his reluctance to get involved in the semantic web, given his position of power, set the mood for Google and the other big players?
    • Arnavion 911 days ago
      "Won" in that the average website's not using RDF, yes.

      "Won" in that Google's doing a better job than RDF could, no. https://i.imgur.com/UkcR945.png

      ( https://news.ycombinator.com/item?id=27622613 )

      • cowmoo728 911 days ago
        There are some really funny ones.

        https://i.imgur.com/K7GdKKY.png

        https://i.redd.it/tucvzslvh3151.png

        ^ the grass one is because lawns became a popular fashion statement in the 17th century, but still.

        https://i.redd.it/yjclt3q9nh111.jpg

        • amscanne 911 days ago
          The second is actually a decent example of where the semantic web would probably fail because of the imprecision, but Google has succeeded to some degree. “When was grass invented” has no good answer because it’s a stupid question. You could interpret this as “when was grass discovered” but I suspect this is not the question being asked. Google has interpreted this as “when was the grass lawn invented”, which is probably what the user was actually asking. In both cases, the results are actually great (my first link was to the Wikipedia Lawn article).
          • Cybiote 911 days ago
            I don't think it's a stupid question, just oddly phrased (more on that later). I think there are two stable interpretations: "when did grass evolve" or "when were grass lawns invented".

            Later:

            Why isn't it an obviously stupid question? Because I think the accompanying "who (or what) invented grass" is validly answered as "by evolution". I feel the act of invention requires no intentionality: it is simply the output of learning processes whose generated artifacts have material and dynamic properties embodying deep knowledge of physical laws, and which help achieve some goal relative to an environment. Evolution learns in the sense that the mathematics of natural selection mirrors that of Bayesian filters.

      • google234123 911 days ago
        "nvidia 2080 number of cuda core" works
    • lemmsjid 911 days ago
      I wouldn't think Norvig won the fight. In fact, Google has been part of the charge to make it typical for federated content sites to provide structured data. For example, Google "recipe for pumpkin pie". You will see a bunch of recipes from a variety of sites. Go to one of them, and you'll see markup for schema.org's Recipe class (https://schema.org/Recipe). A ton of Google's structured search results are powered by this. Norvig's concerns were valid, and over the years they have largely been overcome by a combination of tooling (CMS systems automatically emitting structured data alongside HTML) and incentivization (emitting structured data allowing one's site to be included in Google's structured search results).
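
      In practice this is often a small JSON-LD block in the page's HTML; a minimal sketch of what a recipe page might emit (all values invented for illustration):

          <!-- hypothetical recipe data, schema.org Recipe vocabulary -->
          <script type="application/ld+json">
          {
            "@context": "https://schema.org",
            "@type": "Recipe",
            "name": "Classic Pumpkin Pie",
            "author": {"@type": "Person", "name": "Jane Doe"},
            "prepTime": "PT30M",
            "cookTime": "PT1H",
            "recipeIngredient": ["1 can pumpkin puree", "2 eggs", "1 pie crust"],
            "aggregateRating": {"@type": "AggregateRating",
                                "ratingValue": "4.8", "ratingCount": "312"}
          }
          </script>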
    • wmf 911 days ago
      The Semantic Web didn't need any help dying; very few people were ever interested in it.
    • warkdarrior 911 days ago
      What's really going to bake your noodle is, did Norvig launch massive multi-year campaigns of spamming and misinformation on the Internet to prove his point and win over Berners-Lee?
  • arthurcolle 911 days ago
    "He said the next stage of the Web is about making data accessible for artificial intelligence to locate and analyze."

    Meanwhile, in reality: "F* it, we'll do it live!" with a bit of "hmm, what if we add more layers"

  • 1shooner 910 days ago
    I've never seen these two approaches as necessarily in opposition. Yes, search engines can and probably should use proprietary algorithms to derive semantic meaning, relationships, and credibility, but we can also have a more expressive semantic medium to create that content, and give authors more agency when being evaluated by those algorithms. It does not have to be a binary choice between Google black-box and endless viagra spam.

    Today Google does consume all sorts of structured data. With low-code/no-code web publishing tools, incompetence is not really a blocker.

  • arpit 911 days ago
    Interestingly Google today does support (a subset of) Microdata embedded in HTML within Gmail, Google Assistant and Search. Much simpler than the complex constructs of the Semantic Web as originally conceived though.
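
    For a flavour of it, Microdata just adds a few attributes to HTML you would write anyway; a rough sketch using schema.org's Event type (details invented):

        <!-- hypothetical event, schema.org Event vocabulary -->
        <div itemscope itemtype="https://schema.org/Event">
          <span itemprop="name">Team dinner</span> on
          <time itemprop="startDate" datetime="2021-07-01T19:00">July 1st at 7pm</time>
          at <span itemprop="location">The Example Bistro</span>.
        </div>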
  • rektide 911 days ago
    > "What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,. . . We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step."

    Thankfully we have a bit more practice writing HTML today. RDFa & Microdata are both fine markup formats for HTML that are not hard to understand. RDFa first got semi-started in 2004 &, honestly, was not that challenging. More code-centric systems also have JSON-LD as an option, which is fairly simple to author.
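
    For a taste, RDFa likewise is just a couple of attributes on top of ordinary HTML; a rough sketch using schema.org vocabulary (names invented):

        <!-- hypothetical person/org, RDFa Lite attributes -->
        <div vocab="https://schema.org/" typeof="Person">
          <span property="name">Alyssa P. Hacker</span> works for
          <span property="worksFor" typeof="Organization"><span property="name">ACME, Inc.</span></span>.
        </div>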

    What concerns me most, though, is that Norvig seems to be arguing a strong case of worse-is-better, claiming that since some people will find the semantic web difficult to publish, it's not worth doing at all. That we could be creating value, that we could be making information more rich, is not of interest: the focus is on the downsides, on creating an image of unsuccess, unpossibility. It's a disappointing framing & difficult to move a conversation forward past this low bogland Norvig sets up.

    > "The second problem is competition. Some commercial providers say, 'I'm the leader. Why should I standardize?'"

    Today Google is a primary instigator/driver of schema.org, along with Microsoft, Yahoo, and Yandex. I'd say Google is the best evidence that this hard problem wasn't really so hard.

    Also, there are other perfectly good ontologies out there that coexist with it.

    > "The third problem is one of deception. We deal every day with people who try to rank higher in the results"

    Again, a problem specific to Google. The semantic web doesn't have to be the authoritative source of truth & 100% accurate to start to add value to the world, to make the information on the page meaningful & semantic. But again Norvig is applying his specific demands to the Semantic Web, & setting an incredibly high bar.

    ---

    The page has been a glossy, synthetically-produced whole object-artifact for a long time now. I for one continue to hope we can do what the semantic web promised: start to break down the things on the page, to define the page as a construction of a bunch of different, individual, meaningful objects.

    I don't think truly empowering information & augmenting people (in this semantic-web style) is compatible with the mission statement of a company whose goal has been to "organize and make available the world's information," since such a company would be less needed if the information were better able to present & make itself available. Norvig was taking down a huge threat, and one that indeed is not particularly helpful to his cause.

    • verifex 911 days ago
      That's the subtext missing from the article. It seems rather obvious that a big company built on owning the mechanism of information retrieval, and the priority ordering of that information, would see a decentralized, or at least more open, means of organization that takes the reins away from the search engines as an existential threat.
      • throwawaylinux 911 days ago
        Absolutely. Not that any organization should really default to getting the benefit of the doubt, but an executive from Google? Anything they do is about increasing company profits, and if there are any benefits to anybody else it is purely coincidental.

        This is the first thing I thought of too. Improving the organization of data would lower the barrier to entry for competitors and so be a threat to Google. Malicious and incompetent actors have to be dealt with anyway, so he inserts a sneaky little non sequitur there by implying that not having this metadata would somehow avoid that problem.

    • bawolff 911 days ago
      > It's a disappointing framing & difficult to move a conversation forward past this low bogland Norvig sets up.

      Wishing things were different than they are doesn't make it so. If there's an argument that's hard to move past, it probably means it's a good argument.

      It's not even a unique argument; it's something every open-membership distributed system has to answer. The spam issue itself has been the death of many such systems (Usenet comes to mind).

      • rektide 911 days ago
        I'm not sure what you are trying to contribute to the discussion with this abstract, high-minded generalization. I need to return to what I said the sentence before, for some context about what we're actually trying to debate:

        > What concerns me most, though, is that Norvig seems to be arguing a strong case of worse-is-better, claiming that since some people will find the semantic web difficult to publish, it's not worth doing at all.

        I continue to believe this & don't see how it's challenged. It doesn't matter at all if not everyone does the semantic web.

        I continue to think Norvig is making us wade out of a swamp of irrelevant disbelief that has no bearing on what use there is. It's insisting on an absolutist framing when we can have ongoing, continual gain & wins from some folk trying to do better.

        It's hard to move past because it's hard to debate irrelevant sticky points. Norvig's point was entirely a non-issue, and finding that meta-analysis to explain that is harder than having a point that actually means something that we can then discuss straightforwardly. This objection Norvig raised is irrelevant & nonsensical. It was actively harmful to suggest everyone has to be on board and everyone has to do a thing 100% right for a thing to be ok.

        • bawolff 911 days ago
          Ah ok, I thought you were generalizing that to all his points (incompetence and malice), not just the adoption one. I do think his adoption criticism is by far his weakest argument, but I am still convinced by it.

          That said, I think there's a lot of missing elaboration for the argument. My reading of it is:

          * users want web crawling tools that work for the majority of the web

          * most authors will find the semantic web too complex to implement (say > 95% don't implement)

          * as a result, if we want to serve our users, we would need to make two tools, one for semantic sites and one for non-semantic sites

          * the non-semantic tool is going to work for both semantic and non-semantic sites, so why bother building two?

          To reiterate my previous comment - this is a bog-standard product problem. It's so common it even has a name: chicken-or-the-egg. Tons of products require some sort of critical mass or network effects to be useful. Successful products find a way to get there, or make the product somewhat useful before that point, so that incentives are such that there is a path to get from 0 to a stable, self-sustaining point.

          > It's hard to move past because it's hard to debate irrelevant sticky points. Norvig's point was entirely a non-issue, and finding that meta-analysis to explain that is harder than having a point that actually means something that we can then discuss straightforwardly.

          If you really believe lack of adoption is a non-issue, then I'm not sure why you think you have to move past this. Success is the best comeback to criticism, and if the lack of adoption is a non-issue, why are we debating whether or not it's an issue instead of just pointing to the successes?

          And to be fair, there have been successes in specific, limited domains. Domain-specific successes just aren't the grand idealistic vision of a unified semantic web that people like to present. I suspect that it's this grand ideal that Norvig is arguing against, not niche-specific publishing of structured data.

          • rektide 911 days ago
            > "users want web crawling tools that work for the majority of the web"

            again the overriding belief that the lowest common denominator is the only function worth serving. again refusing the challenge of doing better outright, & opting into only believing in worse. you've started off by excluding all use cases except the worst one.

            and you just made that up with no evidence. we'd need to have a better web already built & rolled out to know this, to a/b test. maybe if we had gone down the semantic web paths harder we'd have really good parameterizable search that totally changed everything, became a core workflow for literally everyone. maybe social media never turned into a toxic cesspool because web-of-trust systems emerged, and social democracy flourished world-wide via the web. maybe we built a better, more decentralized web that got people information some better way. you've started with the view that what we have now is optimal & that serving the lowest user is the only goal, but we don't know what other kinds of success might have looked like; we don't even know how this specific limited end you've imposed might be served.

            > "most authors will find semantic web too complex to implent (Say > 95% don't implement)"

            again just assuming worse is the only real option. assuming incompetence, like Peter.

            https://developers.google.com/search/docs/advanced/structure...

            here's google's docs for how to implement semantic web today. i'm pretty sure most of the lower quartile of developers i've worked with (still not a terrible lot) could get something ok going on an existing project in half a day, based on this.

            peter just said the world was "incompetent" & basically left it at that. this argument feels like a slightly less harsh re-assertion of that; again, i disagree.

            > "as a result, if we want to serve our users, we would need to make two tools,one for semantic sites and one for not"

            i understand nothing about how we leapt here. even assuming i was sympathetic to either of your first points, i see no advantage or benefit from this. it makes me quake. it's anti-participatory: it believes there is some web today which must be conserved as it is, never grown further. it proposes any changes in use must fork off & start their own separate sphere of community & use. this sounds godforsaken awful & ruinous. not only is worse better, but it must be preserved so against the better. this is a stunning conclusion which i am still working to recover from, and wholly contradictory to both Postel's Law and to the malleability of the medium which has, imo, made it the most competent & powerful shared medium we have on the planet.

            > "chicken-or-the-egg"

            i think this is a chicken & egg problem too, sort of, in that both must arise & it's not clear how exactly it gets started. but like the real chicken & egg, in truth there are only many many small microshifts, adjustments, small evolutions, allowed diversification & many different flourishings that can then arise. i see no reason the extensible powerful web & the extensible powerful user agent should do as you say & split. we obviously clearly should enrich, should make the web that is semantic, so the chicken can become more chickenlike, and better tools, so the egg can become more egglike. we sacrifice nothing by working for better. we can make our information better, easily, at little cost to us & none to the lower users, & we can over time find & create a wide range of tools that don't just search, but which generally improve the user experience, in a range of ways. when pages take the step of marking up their contents in machine-readable ways, we enable more possibilities, more diversification, more possible flourishing.

            > if you really believe lack of adoption is a non-issue,

            again i think there are some very abstract words that again miss the topic of conversation. a reminder, i was talking to this:

            >> "We deal with millions of Web masters who can't configure a server, can't write HTML"

            i do believe in a far wider range of adoptability than picking up the lens of the narrowest view. picking up the crudest typecast of bad operators one can reach for, & using that to browbeat those who want to try better, more advanced things, is what i described as a low bog. peter called the world "incompetent", as his opening salvo, and i think that's just incredibly small & not worth giving an ounce of time or energy.

            whatever truth there is to peter's views, the world ought to try to be better. a better hypermedia is more likely to, in the long run, breed better hypermediaists. a view that we're bad is likely to breed less.

            > instead of just pointing to the successes?

            ok so this time you're willing to possibly entertain my notion- not assuming everyone is incompetent & going to be unable to manage it- great- and now we're going to evaluate the semantic web by what successes we have today to our name.

            i do think there are some great user tools/extensions for seeing the semantic web around them as they go about.

            but mostly i think there's a whole range of what the web could be that's been bypassed, ignored, by can't-do worse-is-betterism trash talking. the semantic web we have now is some islands of json-ld, not so much marking up the web, making it semantic, but providing m2m hooks for search engines.

            there are still some successes. some cool tools to see embedded data, to make better breadcrumbs of where we go. activitypub is showing the value of extensible, interlinked structured data, is a hotbed of neat social systems & apps & decentralizations. the js libraries have only recently got good, got modern, but they have; that's a subtle but promising sign to me, a form of success.

            but mostly disbelief & scorn have carried the day against the semantic web, kept it from accruing even the slow-burning background interest i think it deserves. mostly we haven't been incrementally growing, making small advances. more energy is spent telling pop culture that the semantic web is dead than there is time or words spent exploring what might be or what would be possible.

            personally, measuring success under these conditions of not just neglect but hostility seems like yet another convenient way to refuse the call, to deny the real exploration & engagement with what could be or what we could improve on. we can keep at setting up hurdles for the semantic web to jump over for your & Peter's pleasure, but none of these topics, these trials, i think, have much value for a hungry or open mind, one seeking to go further.

            • bawolff 910 days ago
              > again the overriding belief that the lowest common denominator is the only function worth serving. again refusing the challenge of doing better outright, & opting into only believing in worse. you've started off by excluding all use cases except the worst one.

              I feel that you are misstating my position.

              > this is a stunning conclusion which i am still working to recover from

              Ffs, melodramatic much?

              > here's google's docs for how to implement semantic web today. i'm pretty sure most of the lower quartile of developers i've worked with (still not a terrible lot) could get something ok going on an existing project in half a day, based on this.

              To be clear, I think most could do even the original RDF if they so desired. My argument is not that they are literally incapable, but that their interests are not aligned with doing so (replace "incompetent" with "lazy" if it makes you feel better).

              > we sacrifice nothing by working for better. we can make our information better, easily, at little cost to us & none to the lower users, & we can over time find & create a wide range of tools that don't just search, but which generally improve the user experience, in a range of ways. when pages take the step of marking up their contents in machine-readable ways, we enable more possibilities, more diversification, more possible flourishing.

              We sacrifice opportunity cost. Time spent working on things with unsolved problems, and no plan for conceivably solving them, is time that could be better spent on solutions that are actually practical (or on solving those unsolved problems). But by all means, anyone who thinks they can make it work should go make cool things.

              I'm not saying that people shouldn't work on the semantic web - just that if they do, they should work on solving Norvig's complaints, or try to make them inapplicable to the specific niche they are working on.

              > again i think there are some very abstract words

              I don't understand how you are using "abstract" in this sentence, or how my previous comment was abstract.

              > picking up the crudest typecast of bad operators one can reach for, & using that to browbeat those who want to try better, more advanced things, is what i described as a low bog

              Then you should make the argument that the few are sufficient. There are plenty of technologies where you don't need everyone on board. The semantic web as originally envisioned seems like one where you do. If you disagree that you do, this is a good place to attack Peter's argument.

              > whatever truth there is to peter's views, the world ought to try to be better. a better hypermedia is more likely to, in the long run, breed better hypermediaists.

              I agree, but we should work on things that are practical given the constraints we have, or figure out ways to remove those constraints. Running head-first into a known wall, pretending it isn't there, doesn't solve anything.

              To give an example, take Gnutella vs BitTorrent. Gnutella had a problem that many participants were selfish peers (they downloaded but didn't upload). If we had just said that selfish peers were "bad operators" and we should ignore them in favour of the people who are good, we would be stuck with Gnutella. Instead, BitTorrent came along, recognized the issue, and created the tit-for-tat algorithm, which changed the social dynamics, overcoming the problem.

              > ok so this time you're willing to possibly entertain my notion- not assuming everyone is incompetent & going to be unable to manage it-

              That's not my notion. My notion is that people do what is in their personal best interest, and a significant (but not universal) portion have interests that differ from what would be good for a hypothetical semantic web.

              In my view, in order for the semantic web to succeed you need to either shift the social dynamics so that more operators' interests align with what is good for the semantic web, or make the semantic web more robust to misaligned interests. Preferably both.

              > activitypub is showing the value of extensible, interlinked structured data, is a hotbed of neat social systems & apps & decentralizations. the js libraries have only recently got good, got modern, but they have; that's a subtle but promising sign to me, a form of success.

              I think this is an interesting example, because it modifies the vision of the semantic web just enough to address Norvig's points.

              It's an isolated system - you don't need the world to use it, just the people you want to follow.

              Incompetence is again fixed by having only the people who want to participate participate. Additionally, most participants use something like Mastodon, so they don't have to implement it themselves.

              Similarly, there are tools to deal with spam, as well as procedures like requiring invites or admin approval to join.

              If anything, I think this is the exception that proves the validity of Norvig's argument. The semantic web doesn't work, due to Norvig's reasons. Modify the idea slightly to address those reasons, and suddenly it works great.

              > but mostly disbelief & scorn have carried the day against the semantic web, kept it from accruing even the slow-burning background interest i think it deserves

              When has disbelief & scorn ever actually kept a good technology down? Scorn tends to go away the moment the tech does something valuable for someone (for example, take Dropbox).

              -------

              To conclude, I think you're defining the semantic web differently than Norvig is.

              Could something related to, but not quite the same as, the original vision of the semantic web take off? Sure, definitely. Your example of activitypub is a good one in my mind.

              Could the original vision of the semantic web take off? I don't think so, unless it fixes its incentive problem.

              Of course, if someone creates it and it works, I would be happy to be proven wrong.

  • musicale 911 days ago
    Now former Google exec.
  • MichaelMoser123 911 days ago
    I remember reading about projects that were trying to extract ontology data from Wikipedia. Does anyone know if these efforts were successful?
    • easton 911 days ago
      Wikidata was the most successful one; its data is used for Google's knowledge cards and the Siri cards that pop up if you use the Look Up feature in iOS.

      https://www.wikidata.org/wiki/Wikidata:Main_Page

      • Vinnl 910 days ago
        DBPedia was the one that extracted data from Wikipedia; Wikidata is a community curating the data from the ground up (with the goal of importing it into Wikipedia - i.e. the data flows in the opposite direction).
  • mrkramer 910 days ago
    But why isn't Google automating the process? They could suggest to webmasters what metadata to add, or Google could create the metadata themselves.
  • gorgoiler 911 days ago
    Web … incompetence … PHP … profit …

    I don’t know much about semantic search. But.

    PHP is the success story of the Internet and it is the complete opposite of perfection. Go figure.

    • bawolff 911 days ago
      As the Unix people say, "worse is better".
      • TeMPOraL 910 days ago
        Another way of looking at "worse is better" is observing that, while "worse" delivers some results sooner, it also sucks the oxygen out of the room, preventing "better" from ever happening. Iterate that a few times, building one "worse" thing on another, and it's not hard to see why our whole tech stacks are so bad.
        • bawolff 910 days ago
          I'll still take code that works in practice over code that works in theory, any day. Turning an idea into practice is not easy and often involves lots of edge cases nobody ever talks about because they aren't the "interesting" part of the solution. I suspect people overestimate how workable some of these theoretically better systems are in reality.
  • hendry 911 days ago
    WHO REMEMBERS XHTML?