Ask HN: Can we create a new internet where search engines are irrelevant?

If we were to design a brand new internet for today's world, can we develop it such a way that:

1- Finding information is trivial

2- You don't need services indexing billions of pages to find any relevant document

In our current internet, we need a big brother like Google or Bing to effectively find any relevant information in exchange for sharing with them our search history, browsing habits etc. Can we design a hypothetical alternate internet where search engines are not required?

374 points | by subhrm 1758 days ago

118 comments

adrianmonk 1758 days ago
I think it would be helpful to remember to distinguish two separate search engine concepts here: indexing and ranking.
Indexing isn't the source of problems. You can index in an objective manner. A new architecture for the web doesn't need to eliminate indexing.
Ranking is where it gets controversial. When you rank, you pick winners and losers. Hopefully based on some useful metric, but the devil is in the details on that.
The thing is, I don't think you can eliminate ranking. Whatever kind of site(s) you're seeking, you are starting with some information that identifies the set of sites that might be what you're looking for. That set might contain 10,000 sites, so you need a way to push the "best" ones to the top of the list.
Even if you go with a different model than keywords, you still need ranking. Suppose you create a browsable hierarchy of categories instead. Within each category, there are still going to be multiple sites.
So it seems to me the key issue isn't ranking and indexing, it's who controls the ranking and how it's defined. Any improved system is going to need an answer for how to do it.
[-]
- jtolmar 1758 days ago
  Some thoughts on the problem, not intended as a complete proposal or argument:
  * Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.
  * How fast a ranking algorithm is depends on how the indexing is done. Is there some common set of features we could agree on that we'd want to build the shared index on? Any ranking that wants something not in the public index would need either a private index or a slow sequential crawl. Sometimes you could do a rough search using the public index and then re-rank by crawling the top N, so maybe the public index just needs to be good enough that some ranker can get the best result within the top 1000.
  * Maybe the indexing servers execute the ranking algorithm? (An equation or SQL-like thing, not something written in a Turing Complete language). Then they might be able to examine the query to figure out where else in the network to look, or where to give up because the score will be too low.
  * Maybe the way things are organized and indexed is influenced by the ranking algorithms used. If indexing servers are constantly receiving queries that split a certain way, they can cache / index / shard on that. This might make deciding what goes into a shared index easier.
  [-]
  - ergothus 1758 days ago
    > Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.
    But what are you storing in your index? The content that is considered in your ranking will vary wildly by your ranking methods. (example - early indexes cared only for the presence of words. Then we started to care about the count of words, then the relationships between words and the context. Then about figuring out if the site was scammy, or slow.
    The only way to store an index of all content (to cover all the options) is to...store the internet.
    I'm not trying to be negative - I feel very poorly served by the rankings that are out there, as I feel on 99% of issues I'm on the longtail rather than what they target. But I can't see how a "shared index" would be practical for all the kinds of ranking algorithms both present and future.
    [-]
    - jgtrosh 1756 days ago
      > The only way to store an index of all content (to cover all the options) is to...store the internet.
      An index cannot hope to cover all options, these ideas are antithetical.
  - jppope 1757 days ago
    this is a pretty killer idea
- mavsman 1758 days ago
  How about open sourcing the ranking and then allowing people to customize it. I should be able to rank my own search results how I want to without much technical knowledge.
  I want to rank my results by what is most popular to my friends (Facebook or otherwise) so I just look for a search engine extension that allows me to do that. This could get complex but can also be simple if novices just use the most popular ranking algorithms.
  [-]
  - josephjrobison 1758 days ago
    I think Facebook really missed the boat on building their own "network influenced" search engine. They made some progress in allowing you to search based on friends' posting and recommendations to some degree but it seems to have flatlined in the last few years and is very constricting.
    One thing I haven't seen much on these recent threads on search is the ability to create your own Google Custom Search Engine based on domains you trust - https://cse.google.com/cse/all
    Also, not many people have mention the use of search operators, which allows you to control the results returned. Such as "Paul Graham inurl:interview -site:ycombinator.com -site:techcrunch.com"
  - aleppe7766 1758 days ago
    That would bring to an even bigger filter bubble issue, more precisely to a techno élite which is capable, willing and knowledgeable enough to feel the need go through the hassle, and all the rest navigating in such an indexed mess that would pave the way to all sort of new gatekeepers, belonging to the aforementioned tech élite. It’s not a simple issue to tackle, perhaps a public scrutiny on the ranking algorithms would be a good first step.
    [-]
    - samirm 1758 days ago
      I disagree. The people who don't know anything and are unwilling to learn wouldn't be any worse off than they are today and everyone else would benefit from an open source "marketplace" of possible ranking algorithms that the so called "techno elite" have developed.
      [-]
      - aleppe7766 1757 days ago
        I think the proposed improvement to the web in its intentions should mostly benefit the "ignorants", not those that can already navigate through the biases of today's technological gatekeepers. Please note, ignorants are not at fault for being so. Especially when governments cut funds for public education, and media leverages (and multiplies) ignorance to produce needs and sales, fears and votes. Any solution must work first to make the weak stronger, more conscious. A better and less biased web can help people grow their unbiased knowledge, and therefore exercise their right of vote with a deeper understanding of the complexity. Voting ignorants are an opportunity for the ill intentioned politicians, as much as are a problem for me, you and the whole country.
  - greglindahl 1758 days ago
    blekko and bing both implemented ranking by popularity with your Facebook friends, and the data was too sparse to be useful.
  - smitop 1756 days ago
    If the details of a ranking algorithm are open source, it would be easy to manipulate them.
  - dex011 1755 days ago
    Open sourcing the ranking... YES!!!
- MadWombat 1758 days ago
  I wonder if indexing and ranking could be decentralized. Lets say we design some data formats and protocols to exchange indexing and ranking information. Then maybe instead of getting a single Google, we could have a hierarchical system of indexers and rankers and some sort of consensus and trust algorithm to aggregate the information between them. Maybe offload indexing to the content providers altogether, i.e. if you want your website found, you need to maintain your own index. Maybe do a market on aggregator trust, if you don't like a particular result, the corresponding aggregator loses a bit of trust and its rankings become a bit less prominent.
- allworknoplay 1758 days ago
  Spitballing here, but what if instead of a monolithic page rank algorithm, you could combine individually maintained, open set rankings?
  ===Edit=== I mean to say you as the user would gain control over the ranking sources, the company operating this search service would perform the aggregation and effectively operate marketplace of ranking providers. ===end edit===
  For example, one could be an index of "canonical" sites for a given search term, such that it would return an extremely high ranking for the result "news.ycombinator.com" if someone searches the term "hacker news". Layer on a "fraud" ranking built off lists of sites and pages known for fraud, a basic old-school page rank (simply order by link credit), and some other filters. You could compose the global ranking dynamically based off weighted averages of the different ranked sets, and drill down to see what individual ones recommended.
  Seems hard to crunch in real time, but not sure. It'd certainly be nicer to have different orgs competing to maintain focused lists, rather than a gargantuan behemoth that doesn't have to respond to anyone.
  Maybe you could even channel ad or subscription revenue from the aggregator to the ranking agencies based off which results the user appeared to think were the best.
  [-]
  - TaylorAlexander 1758 days ago
    Well I suppose Google has some way of customizing search for different people. The big issue for me is that google tracks me to do this. Maybe there could be a way to deliver customized search where we securely held the details of our customization. Or we were pooled with similar users. I suppose if a ranking algorithm had all the possible parameters as variables, we could deliver our profile request on demand at the time of search. That would be nice. You could search as a Linux geek or as a music nut or see the results different political groups get.
  - dublin 1756 days ago
    Building something like this becomes much easier with Xanadu-style bidirectional links. Of course, building those is hard, but eliminating the gatekeeper-censors may finally be the incentive required to get bidi links built. It's also worth noting that such a system will have to have some metrics for trust by multiple communities (e.g. Joe may think say, mercola.com is a good and reliable source of health info, while Jane thinks he's stuck in the past - People should be able to choose whether they value Joe's or Jane's opinion more, affecting the weights they'll see). In addition (and this is hard, too), those metrics should not be substantially game-able by those seeking to either promote or demote sites for their own ends. This requires a very distributed trust network.
  - bobajeff 1758 days ago
    I like the idea of local personalized search ranking that evolves based off of a on device neural network. I'm not sure how that would be work though.
  - sogen 1756 days ago
    Sounds like ad-Blocker repos, nice!
- asdff 1758 days ago
  Not to mention all the people who will carefully study whatever new system, looking for their angle to game the ranking.
- daveloyall 1756 days ago
  > When you rank, you pick winners and losers.
  ...To which people responded with various schemes for fair ranking systems.
  ...To which people observed that someone will always try to game the ranking systems.
  Yep! So long as somebody stands to benefit (profit) from artificially high rankings, they'll aim for that, and try to break the system. Those with more resources will be better able to game the system, and gain more resources... ad nauseam. We'd end up right where we are.
  The only way to break that [feedback loop](https://duckduckgo.com/?q=thinking+in+systems+meadows) is to disassociate profit from rank.
  Say it with me: we need a global, non-commercial network of networks--an internet, if you will. (Insert Al Gore reference here.)
  (Note: I don't have time to read all the comments on this page before my `noprocrast` times out, so please pardon me if somebody already said this.)
- zeruch 1758 days ago
  This is a bang on distillation of the problem (or at least one way to view the problem, per "who controls the ranking and how it's defined").
- aleppe7766 1758 days ago
  That’s a very useful distinction, that brings me to a question: are we sure that automating ranking in 2019, on the basis of publicly scrutinized algorithms, would bring us back to a pre-Google accuracy? Also, ranking on the basis of the sole query instead of the individual, would lead to much more neutral results.
- tracker1 1758 days ago
  Absolutely spot on... I've been using DDG as my default search engine for a couple months. But, google has a huge profile on me. I find myself falling back to google a few times a day when searching for technical terms/issues.
- Retra 1758 days ago
  Couldn't you just randomize result ordering?
  [-]
  - penagwin 1758 days ago
    You know how google search results can get really useless just a few pages in? And it says it found something crazy like 880,000 results? Imagine randomizing that.
    ---
    Unrelated I searched for "Penguin exhibits in Michigan". Of which we have several. It reports 880,000 results but I can only go to page 12 (after telling it to show omitted results). Interesting...
    https://www.google.com/search?q=penguin+exhibits+in+michigan
    [-]
    - Theodores 1758 days ago
      If you think of it as like an old fashioned library or an old fashioned Blockbuster video store.
      Sure you could read any book ever printed in the English language in the local library. They might have to get it in from the national collection or the big library in the city. But you ain't going to see every book in the local library. There is more than you could wish for and you will never read every book in the local library. But all the classics are there, the talked about new books are there (or out on loan, back soon). All the reference books that school kids are there, there is enough to get you started in any hobby.
      Google search results are like that. Those 880,000 'titles' are a bit like the Library of Congress boasting how big it is, it is just a number. All they have really got for you is a small selection that is good enough for 99% of people 99% of the time. Only new stuff by people with Page rank (books with publishers) get indexed now and put into the 'main collection'.
      Much like how public libraries do have book sales, Google do let a lot of the 880,000 results drop off.
      It's a ruse!
    - cruano 1757 days ago
      I heard that they also filter results by some undisclosed parameters, like they don't show you anything that hasn't been modified in the last ~10 years, no matter how hard you try
      [-]
      - dublin 1756 days ago
        Yeah, this is a real problem for research into older things that have no need to change. Google seems to think that information has a half-life. That's really only true in the social space. Truth is eternal.
  - ZeroBugBounce 1758 days ago
    Sure, but then whoever gets to populate the index chooses the winners and losers, because you could just stuff it with different versions of the content or links you wanted to win and the random ranking would should those more often, because they appear in the pool of possible results more often.
  - cortesoft 1758 days ago
    That would make it waaaay less useful to searchers and wayyy easier to game by stuffing results with thousands of your own results
  - onion2k 1757 days ago
    I suspect just randomizing the first 20 or so results would fix most problems. The real issue is people putting effort in to hitting the first page, so if you took the benefit out of doing that people would look for other ways to spend their energy.
  - z3t4 1758 days ago
    If you find nothing useful, just refresh for a new set. It would also help discovery.
    [-]
    - brokensegue 1758 days ago
      Sounds like a great ux
iblaine 1758 days ago
Yes, it was called Yahoo and it did a good job of cataloging the internet when hundreds of sites were added per week: https://web.archive.org/web/19961227005023/http://www2.yahoo...
I'm old enough to remember sorting sites by new to see what new URLs were being created, and getting to that bottom of that list within a few minutes. Google and search was a natural response to solving that problem as the number of sites added to the internet grew exponentially...meaning we need search.
[-]
- kickscondor 1758 days ago
  Directories are still useful - Archive of Our Own (https://archiveofourown.org/) is a large example for fan fiction, Wikipedia has a full directory (https://en.wikipedia.org/wiki/Category:Main_topic_classifica...), Reddit wikis perform this function, Awesome directories (https://github.com/sindresorhus/awesome) or personal directories like mine at href.cool.
  The Web is too big for a single large directory - but a network of small directories seems promising. (Supported by link-sharing sites like Pinboard and HN.)
  [-]
  - ninju 1758 days ago
    How about this
    https://en.wikipedia.org/wiki/List_of_lists_of_lists
    [-]
    - kickscondor 1758 days ago
      Yes! But, of course, for directories outside of Wikipedia. This is very interesting for its classification structure. It's so typical of Wikipedia that a 'master list of lists' (by my count, there are 589 list links on this page) contains lists such as "Lists of Melrose Place episodes" and "Lists of Middle-earth articles" alongside lists such as "lists of wars" or "lists of banks".
  - brokensegue 1758 days ago
    Ao3 isn't really a directory since they do the actual hosting
    [-]
    - kickscondor 1758 days ago
      Yes, thank you - I only mean in terms of organization.
- adrianmonk 1758 days ago
  I used Yahoo back in those days, and it literally proved the point that hand-cataloging the internet wasn't tractable, at least not the way Yahoo tried to do it. There was just too much volume.
  It was wonderful to have things so carefully organized, but it took months for them to add sites. Their backlog was enormous.
  Their failure to keep up is basically what pushed people to an automated approach, i.e. the search engine.
  [-]
  - bitwize 1758 days ago
    I found myself briefly wondering if it were possible to have a decentralized open source repository of curated sites that anyone could fork, add to, or modify. Then I remembered dmoz, which wasn't really decentralized -- and realized that "awesome lists" on GitHub may be a critical step in the direction I had envisioned.
    [-]
    - insulanus 1757 days ago
      I think this could work for small, specific areas of interest. For example, there are only so many people writing about, and interested in reading about, programming language design. Those small communities could stand ready with their community-curated index when an "outsider" wants to research something they know well.
- stakhanov 1758 days ago
  You don't have to go all the way back into Yahoo-era when it comes to manually curated directories: DMOZ was actively maintained until quite recently, but ultimately given up for what seems like good reasons.
  [-]
  - iblaine 1757 days ago
    This is true, and DMOZ was used heavily by Google's earlier search algorithms to rank sites within Google. Early moderators of DMOZ had god like powers to influence search results.
  - gerbilly 1758 days ago
    Earlier than that there was a list of ftp sites giving a summary of what was available on each.
- alangibson 1758 days ago
  I wonder if you could build a Yahoo/Google hybrid where you start with many trusted catalogs run by special interest groups then index only those sites for search. Doesn't fully solve the centralization problem, but interesting none the less.
ovi256 1758 days ago
Everyone has missed the most important aspect of search engines, from the point of view of their core function of information retrieval: they're the internet equivalent of a library index.
Either you find a way to make information findable in a library without an index (how?!?) or you find a novel way to make a neutral search engine - one that provides as much value as Google but whose costs are paid in a different way, so that it does not have Google's incentives.
[-]
- davemp 1758 days ago
  The problem is that current search engines are indexing what is essentially a stack of random books thrown together by anonymous library goers. Before being able to guide readers to books, librarians have to the following non-trivial tasks over the entire collection:
  - identify the book's theme
  - measure the quality of the information
  - determine authenticity / malicious content
  - remember the position of the book in the colossal stacks
  Then the librarian can start to refer people to books. This problem was actually present in libraries before the revolutionary Dewy Decimal System [1]. Libraries found that the disorganization caused too much reliance on librarians and made it hard to train replacements if anything happened.
  The Internet just solved the problem by building a better librarian rather than building a better library. Personally I welcome any attempts to build a more organized internet. I don't think the communal book pile approach is scaling very well.
  [1]: https://en.wikipedia.org/wiki/Dewey_Decimal_Classification
  [-]
  - jasode 1758 days ago
    >I welcome any attempts to build a more organized internet. I don't think the communal book pile approach is scaling very well.
    Let me know if I misunderstand your comment but to me, this has already been tried.
    Yahoo's founders originally tried to "organize" the internet like a good librarian. Yahoo in 1994 was originally called, "Jerry and David's Guide to the World Wide Web"[0] with hierarchical directories to curated links.
    However, Jerry & David noticed that Google's search results were more useful to web surfers and Yahoo was losing traffic. Therefore, in 2000 they licensed Google's search engine. Google's approach was more scaleable than Yahoo's.
    I often see several suggestions that the alternative to Google is curated directories but I can't tell if people are unaware of the early internet's history and don't know that such an idea was already tried and how it ultimately failed.
    [0] http://static3.businessinsider.com/image/57977a3188e4a714088...
    [-]
    - organsnyder 1758 days ago
      I remember trying to get one of my company's sites listed on Yahoo! back in the late 1990s. Despite us being an established company (founded in 1985) with a good domain name (cardgames.com) and a bunch of good, free content (rules for various card games, links to various places to play those games online, etc.), it took months.
      [-]
      - dsparkman 1758 days ago
        That was not a bad thing. It was curated. Most of the crap never made it in the directory precisely because humans made decisions about what got in. If you wanted in the directory faster, you could pay a fee to get to the front of the queue. The result is that Yahoo could hire people to process the queue and make money without ads.
        [-]
        organsnyder 1758 days ago
        Isn't paying money to jump to the front of the queue just another form of advertising?
      - Stronico 1758 days ago
        That was my experience as well. For old companies and new. Yahoo was just really slow.
    - davemp 1758 days ago
      > I often see several suggestions that the alternative to Google is curated directories but I can't tell if people are unaware of the early internet's history and don't know that such an idea was already tried and how it ultimately failed.
      ¿Por qué no los dos?
      1) The idea is that a more organized structure is easier for a librarian to index. Today, libraries still have librarians. The book pile just wouldn't take decades to build familiarity.
      2) Times change. New technology exists, people use the internet differently, and there's more at stake. Just because an approach didn't work before doesn't mean that it won't work now.
      There are real problems with an organizational approach, but I don't see why the idea isn't worth a revisit.
      [-]
      - toast0 1758 days ago
        There are plenty of these, wikipedia has a list [1].
        I think these efforts get bogged down in the huge amount of content out there, the impermanence of that content and also the difficulty in placing sites into ontologies.
        And at the end of the day, there's not a large enough value proposition to balance the immense effort.
        I think, if you were to do it today, you would want to work on / with the internet archive, so at least things that were categorized wouldn't change or disappear (as much)
        [1] https://en.m.wikipedia.org/wiki/List_of_web_directories
        [-]
        davemp 1758 days ago
        Obviously a naïve web directory isn't going to cut it.
        What would make the approach viable is if there were a nice way to automate and crowd source most/all of the effort. Maybe that means changing the idea of what makes a website. Maybe there could just be little grass roots reddit-esque communities that are indexed/verified (google already favors reddit/hn links). Who knows, but it's an interesting problem to kick around.
        [-]
        jasode 1758 days ago
        >What would make the approach viable is if there were a nice way to automate and crowd source most/all of the effort.
        But to me, crowdsourcing is also what Jerry & David did. The users submitted links to Yahoo. AltaVista also had a form for users to submit new links.
        Also, Wikipedia's list of links are also crowdsourced in the sense that many outside websurfers (not just staff editors) make suggested edits to the wiki pages. Looking at a "revision history" of a particular wiki page makes the crowdsourced edits more visible: https://en.wikipedia.org/w/index.php?title=List_of_web_direc...
        [-]
        davemp 1758 days ago
        Sometimes it just takes a small changes to make an idea work. Neural networks weren't viable until GPUs/backpropagation. Dismissive comments like this aren't very useful.
        [-]
        jasode 1758 days ago
        >Dismissive comments
        I wasn't being dismissive. I was trying to refine your crowdsourcing idea by explicitly surfacing what's been tried in the past.
        The thread's op asks: "Can we create a new internet where search engines are irrelevant?"
        If the current best answer for op is: "I propose crowdsourced curated directories is the alternative to Google/Bing -- but the implementation details is left as an exercise for the reader" ... that's fine that our conversation terminates there and we don't have to go around in circles. The point is I didn't know this thread's discussion ultimately terminates there until I ask more probing questions so people can try to expand on what their alternative proposal actually entails. I also don't know what baseline knowledge the person proposing the idea has. I.e. does person suggesting an idea have knowledge of internet's evolution and has that been taken into account?
        msla 1758 days ago
        > Maybe there could just be little grass roots reddit-esque communities that are indexed/verified
        Verified by who, exactly?
        I know, I know... "dismissive comment", but it's an important thing to think about: Who decides what goes in the library? It's an evergreen topic, even in real, physical libraries, as those tedious lists of "Banned And Challenged Books" attest. It seems every time a copy of Huckleberry Finn gets pulled from an elementary school library in Altoona everyone gets all upset, so can you imagine what would happen if the radfems got their hands on a big Web Directory and cleansed it of all positive mentions of trans people?
        [-]
        davemp 1758 days ago
        I imagine the communities would kind of serve as a public index in aggregate that have a barrier to entry / reputation. If one turns to crap just ignore it with whatever search tool you're using.
        It wouldn't be about policing, just organizing.
    - ehnto 1758 days ago
      Consider the sheer size of the internet now. Even if you could categorize and file that many websites accurately, how do you display that to the user in a way that's usable? It will probably look a lot like a search engine, no matter which way you frame it.
      The underlying goal: "Get a user the information they want when they don't know where it lives" isn't really going to be helped by a non-searchable directory of millions of sites.
  - PeterisP 1758 days ago
    The current search engines are also indexing books maliciously inserted in the library in a way to maximize their exposure e.g. a million "different" pamphlets advertising Bob's Bible Auto Repair Service inserted in the Bible category.
    A "better library" can't be permissionless and unfiltered; Dewey Decimal System relies on the metadata being truthful, and the internet is anything but.
    You can't rely on information provided by content creators; Manual curation is an option but doesn't scale (see the other answer re: early Yahoo and Google).
    [-]
    - davemp 1758 days ago
      Perhaps there exists a happy medium between: manual curation -- unfiltered
      PageRank is kind of a pseudo manual curation. The manual effort is just farmed out to the greater internet and analyzed.
  - zaphar 1758 days ago
    The really hard part of this to scale is the quality metric. Google was the first to really scale quality measurement by outsourcing it to the web content creators themselves.
    Any attempt to create a decentralized index will need to tackle the quality metric problem.
  - agumonkey 1758 days ago
    Also, there's an massive economic market on top on what is on the closest shelves. Libraries are less sensitive to these forces.
- basch 1758 days ago
  They are also a spam filter. It's not just an index of whats relevant, but removal of what maliciously appears to be relevant at first glance.
  [-]
  - izendejas 1758 days ago
    This. Everyone's missing the point of a search engine.
    We're talking about billions of pages and if not ranked (authority is a good hueristic), filtered (de-ranked), etc then good luck finding valuable information because everyone is gaming the systems to improve their ranking.
    I think this is part of the reason you get a lot of fake news on social media. It's a constant stream of information (a new dimension of time has been added to the ranking, basically) that needs to be ranked and with humans in the loop, there's no way to do this very easily without filtering for noise and outright malicious content.
    [-]
    - basch 1758 days ago
      i disagree that there isnt a way, just that nobodies tried a good one yet.
      take reddit for example. it should be very easy to establish a few voters who make "good" decisions, and then extrapolate their good decisions based on people with similar voting patterns. it would combine a million monkeys with typewriters with expert meritocracy. you want different sorting, sort by different experts until you get the results you want. it seems every platform is too busy fighting noise to focus on amplifying signal, or are focused on teaching machines to do the entire task, instead of using machines to multiply the efficiency of people with taste who can make a good judgement call with regard to whether something is novel or or pseudo-intellectual. Not to pick on them, but I would suspect an expert to be better at deranking aeon/brainpickings type clickbait than an eruditelike ai, if only because humans can still more easily determine if someone is making an actual worthwhile point, vs repeating a platitude, conventional wisdom, or something hollow.
      [-]
      - abathur 1758 days ago
        It should, but if anyone knows who these kingmakers are, it's still probably just a matter of time before they accrue enough power for it to be worth someone's time to at least try to track them down and manipulate their decisions (bribe, blackmail, sponsor, send free trials, target with marketing/propaganda campaigns, etc.)
        [-]
        basch 1758 days ago
        Who says it even has the same kingmakers every day? Slashdot solved that part of metamoderation two decades ago.
        A person might be an expert in cars but not horses. A car expert might be superseded . The seed data creators could be a fluid thing.
        [-]
        cthaeh 1758 days ago
        This is a technocracy. Noone wants this but Hacker News.
        [-]
        basch 1757 days ago
        Let's say you have a subreddit like /r/cooking. You think exposing a control in the user agent (browser, app, ui) that let's you sort recipe results by lay democracy, professional chefs, or restaurant critics taste is a technocracy?
        Are consumer reports and wirecutter less valuable than Walmarts best sellers? Is techmeme.com worse than Hackernews by virtue of being a small cabal of voters? Should I dismiss longform.org and aldaily as elitist because they aren't determining priority solely from the larger populations preferences. Is Facebooks news algorithm better because it uses my friends to suggest content?
        Is it a technocracy that metacritic and rotten tomatoes show both user and critic score? I'm proposing an additional algorithm that compares critic score with user score to find like voters and extrapolate how a critic would score a movie they have never seen. I think that would be useful without diminishing the other true scores. I would find it useful to be able to choose my own set of favorite letterboxd or redef voters and see results it predicts they would recommend, despite them never having actually voted on a movie or article. Instead of seeding a movie recommendation algorithm with my thoughts, I could input others already well documented opinions to speed up the process.
        This idea would work better if people voted without seeing each others votes until after they vote. It might be hard to extrapolate Roger Ebert's preferences if voters formed their opinions of movies based on his reviews. You'd end up with a false positive that mimics his past but poorly predicts his future.
    - luxuryballs 1758 days ago
      The reverse is a problem too, Google filtering things out based on their political leanings in an attempt to shape public opinion.
      [-]
      - Nasrudith 1758 days ago
        I haven't seen any examples which were anything other than runaway persecution complexes of those who found their world view was less popular than they believed - which were greeted with exasperation by testifying engineers who had to explain how absurdly unscaleable it would be to do it manually.
        [-]
        aslaan 1757 days ago
        https://gohmert.house.gov/news/documentsingle.aspx?DocumentI...
- IanSanders 1758 days ago
  I think heavy reliance on human language (and its ambiguity) is one of the main problems.
  Maybe personal whitelist/blacklist for domains and authors could improve things. Sort of "Web of trust" but done properly.
  Not completely without search engines, but for example, if every website was responsible for maintaining it's own index, we could effectively run our own search engines after initialising "base" trusted website lists. Let's say I'm new to this "new internet", I ask around what are some good websites for information I'm interested in. My friend tells me wikipedia is good for general information, webmd for health queries, stackoverflow for programming questions, and so on. I add wikipedia.org/searchindex, webdm.com/searchindex and stackoverflow.com/searchindex to my personal search engine instance, and every time I search something, these three are queried. This could be improved with local cache, synonyms, etc. As you carry on using it, you expand your "library". Of course it would increase workload of individual resources, but has potential to give feel of that web 1.0 once again.
  [-]
  - dsparkman 1758 days ago
    This was devised by Amazon in 2005. They called it OpenSearch (http://www.opensearch.org/) Basically it was a standard way to expose your own search engine on your site. It made it is to programmatically search a bunch of individual sites.
  - TheOtherHobbes 1758 days ago
    This would be ludicrously easy to game. Crowdsourcing would also be ludicrously easy to game.
    The problem isn't solvable without a good AI content scraper.
    The scraper/indexer either has to be centralised - an international resource run independently of countries, corporations, and paid interest groups - or it has be an impossible-to-game distributed resource.
    The former is hugely challenging politically, because the org would effectively have editorial control over online content, and there would be huge fights over neutrality and censorship.
    (This is more or less where are now with Google. Ironically, given the cognitive distortions built into corporate capitalism, users today are more likely to trust a giant corporation with an agenda than a not-for-profit trying to run independently and operate as objectively as possible.)
    Distributed content analysis and indexing - let's call it a kind of auto-DNS-for-content - is even harder, because you have to create an un-hackable un-gameable network protocol to handle it.
    If it isn't un-gameable it become a battle of cycles, with interests with access to more cycles being able to out-index those with fewer - which will be another way to editorialise and control the results.
    Short answer - yes, it's possible, but probably not with current technology, and certainly not with current politics.
    [-]
    - pharke 1758 days ago
      Just want to point out that you're on a site that successfully uses crowd sourcing combined with moderation to curate a list of websites, news, and articles that people find interesting and valuable. Why not a new internet built around communities like this where the users actively participate in finding, ranking, and moderating the content they consume? It's not a stretch to add a decent search index and categories to a news aggregator, most do it already. If these tools could be built into the structure of the web we'd be half way there.
      [-]
      - abathur 1758 days ago
        Edit: I had myself convinced that comments have a different ID space from submissions, but that obviously isn't true. I've partly rewritten to correct for an over-guess on how many new submissions there are each day.
        I agree with your general suggestion, but just want to highlight that scale issues still make me think whatever finds traction on HN is a bit of a crapshoot.
        It looks like there were over 10k posts (including comments) in the last day, and the list of submissions that spent time on the front page day yesterday has 84 posts. I don't how normal the last 2 days were, but by eyeball I'd guess around a quarter of the posts are comments on the day's front-page posts. This means there are probably a few thousand submissions that didn't get much if any traction.
        Any time I look at the "New" page, I still end up finding several items that sound interesting enough to open. I see more than 10 that I'm tempted to click on right now. The current new page stretches back about 40 minutes, and only 10 of the 30 have more than 1 point (and only 1 has more than 10). Only 2 of the links I was tempted to click on have more than 1 point.
        I suspect that there's vastly more interesting stuff posted to HN than its current dynamics are capable of identifying and signal-boosting. That's not bad, per se. It'd be an even worse time-sink if it were better at this task. But it does mean there are pitfalls using it as a model at an even larger scale and in other contexts.
    - IanSanders 1758 days ago
      User's search engine doesn't have to trust suggestions verbatim, it can always run its own heuristic on top of returned results. And the user could reduce the weight of especially uncooperative domains or blacklist them altogether.
- ehnto 1758 days ago
  So long as there is a mechanism for categorizing information and ranking the results, people will try to game the mechanism to get the top spot regardless of your own incentives.
  Despite their incentives to make money, Google have actually been trying for years to stop people from gaming the system. It's impressive how far they've been able to come, but their efforts are thrwarted at every turn thanks to the big budgets employed to get traffic to commercial websites.
- Nasrudith 1758 days ago
  The only assured way to have a "neutral" search engine is to run your own spiders and indexers which you understand completely.
  Neutral in that sense is only "not serving the agenda or judgement of another" at the obvious cost of labor and not just as a one off thing as the searched content often attempts to optimize for views. It isn't like a library of passive books to sort through but a Harry Potter wizard portrait gallery full of jealous media vying for attention.
  And pendantically it isn't true neutral - but serves your agenda to the best of your ability. A "true neutral" would serve all to the best of their ability.
  Besides neutrality in a search engine on a literal level is oxymoronic and self defeating - its whole function is to prioritize content in the first place.
- narag 1758 days ago
  A few years ago there was that blogs thing, with rss... all things that favoured federation, independent content generation, etc. Now it's all about platforms. I understand that "regular people" are more comfortable with Facebook but, other than that, why are blogs and forums less popular now?
  [-]
  - JaumeGreen 1758 days ago
    The problem with forums is that you end visiting 5~10 different forums, each with their own login, and some of them might be restricted at work (not that you should visit them often).
    So it's easier to have 2~4 aggregators in where all the information you desire resides, even if in each of them there are different forums.
    A unified entry point helps adoption.
    [-]
    - ajot 1758 days ago
      So, instead of platforms, other option would be a client software for different forums. Like Tapatalk. Is there anything like that but libre and/or desktop?
      [-]
      - sosborn 1758 days ago
        Reddit really did a good job of moving the masses away from site-specific forums.
      - asdff 1758 days ago
        rss
  - r3bl 1758 days ago
    I'd argue that forums and blogs require more effort.
    Read a cool blog post? Nobody around you will ever give a shit, because in order to do so, they'd have to read it too. Shared a photo from a vacation? It might start a conversation or two with people around you, while you receive dozens or hundreds of affirmations (in the form of likes).
    I don't like to use social networks, but that's what I fall back on when I have a few minutes to spare. I rarely look at my list of articles I've saved for later — who has time for that?
    [-]
    - asdff 1758 days ago
      >I don't like to use social networks, but that's what I fall back on when I have a few minutes to spare. I rarely look at my list of articles I've saved for later — who has time for that?
      Plenty of people. Ever push an article to a reader view service and see how long it takes to read? Most articles posted here on HN or the nyt front page can be read in 3-5 mins. Occasionally you'd get a 20 min slog.
      I used to use social media way more, and by far my biggest wastes of time on the platform were those spare minutes you get a dozen times a day. On the elevator, waiting for the bus, waiting on food, anytime I could sit still the phone went out and my head went down because that's what everyone around me was also doing while waiting on their coffee.
      Eventually I realized I was just idly scrolling and not retaining anything at all from those 30s-2m sessions on instagram. Just chomping visual popcorn. Now, anytime I have a spare 10 mins, I'll read an article or two from my reading list. Anytime I have less than a spare 10 mins, I'll twiddle my thumbs and keep the phone in the pocket.
      I used to be much more scatterbrained and had trouble winding down for the evening and getting good rest. Now, I feel like a monk.
  - arpa 1758 days ago
    the problem is multiple actually: a) most internet-connected devices these days favor content consumption vs content creation (blogs vs instagram),
    b) mainstream culture > closely-knit communities (facebook > forums)
    c) big-player takeovers (facebook for groups, google for search) over previously somewhat niche areas and, actually, internet infrastructure
    d) if you're not a big player, you don't exist... and back to c)
    [-]
    - asark 1758 days ago
      > a) most internet-connected devices these days favor content consumption vs content creation (blogs vs instagram),
      You chose Instagram as your example, to make the point that phones favor consumption over creation?
      [-]
      - arpa 1758 days ago
        Yes! Instagram has the appearance of a OC/creation platform, but, typically of such platforms (such as twitter/fb) the "content" is low-effort "convenient" opportunistic trivia, and the product consumed is likes, followers, etc.
- z3t4 1758 days ago
  A search engine is more like putting the books in a paper schredder and writing the book title on every piece, then ordering the pices by whatever words you can find on it, putting all pieces that has the word "hacker" on it in the same box. Where as the problem becomes how you sort the pieces. Want to find a book about "hacking"? This box has all the shreds that has the word "hacker" on it, you can find the book title on the back of the piece. Second problem becomes how relevant the word is to the book.
- greglindahl 1758 days ago
  The library index only indexes the information that fits on a card catalog card. That's extremely unlike a web search engine.
  If you'd like to see an experimental discovery interface for a library that goes deeper into book contents, check out https://books.archivelab.org/dateviz/ -- sorry, not very mobile friendly.
  Not surprisingly, this book thingie is a big centralized service, like a web search engine.
- arpa 1758 days ago
  maybe crowdsourcing would be a solution - something similar to "@home" project, only for web indexes/cache - maybe even leverage the browsers via plugin for web scraping. It already kind of works for getpocket.
- tracker1 1758 days ago
  I don't think it would be an issue if Google wasn't creating "special" rules for specific winners and losers (overall). Hell, I really wish they'd make it easy to individually exclude certain domains from results.
  The canonical example to me of something to exclude would be the expertsexchange site. After stack overflow, ee was more than useless, and even before it was just annoying. There are lots of sites with paywalls, and other obfuscations to content and imho these sites are the ones that should be dropped/low-ranked.
  But the fact that there's no autocomplete for "Hillary Clinton is|has" (though "Donald Trump is" is also filtered). Yes, it's been heavily gamed. It's also had active meddling. And their control over YouTube seems to be even worse, with disclosed documents/video that indicate they're willing to go so far as outright election manipulation. With all indications that Facebook, Pinterest and others are going the same route.
- ScottFree 1758 days ago
  > or you find a novel way to make a neutral search engine
  Just because nobody's said it in this thread yet: blockchain? I never bought into the whole bitcoin buzz, but using a blockchain as an internet index could be interesting.
  [-]
  - KirinDave 1758 days ago
    How would Merkle DAGs be relevant?
  - arpa 1758 days ago
    even better, have something like git for the web - effectively working as an archive.
    [-]
    - ScottFree 1758 days ago
      The problem with git is countering nefarious forces. The blockchain is better in that regard because the consensus algorithm can be used to verify that the listings are legitimate.
      [-]
      - arpa 1758 days ago
        content change signed by creators private key, otherwise merge is rejected?
        or, wiki approach...
        [-]
        fehrnstr 1758 days ago
        Just signing with a private key isn't a guarantor of anything other than that if you trust that the person with the key is who they say they are, then the actual content is from them. But that would require a massively large web of trust in itself: that all the private keys would be trusted. And if you only let in private keys that you explicitly trusted, then it's very likely you could end up with an echo chamber
        [-]
        arpa 1758 days ago
        good point, but we already have the PKI in place, and use it for SSL.
neoteo 1758 days ago
I think Apple's current approach, where all the smarts (Machine Learning, Differential Privacy, Secure Enclave, etc.) reside on your device, not in the cloud, is the most promising. As imagined in so much sci-fi (eg. the Hosaka in Neuromancer) you build a relationship with your device which gets to know you, your habits and, most importantly in regard to search, what you mean when you search for something and what results are most likely to be relevant to you. An on-device search agent could potentially be the best solution because this very personal and, crucially, private device will know much more about you than you are (or should be) willing to forfeit to the cloud providers whose business is, ultimately, to make money off your data.
[-]
- jasode 1758 days ago
  >, where all the smarts [...] reside on your device, not in the cloud, is the most promising. [...] An on-device search agent could potentially be the best solution [...]
  Maybe I misunderstand your proposal but to me, this is not technically possible. We can think of a modern search engine as a process that reduces a raw dataset of exabytes[0] into a comprehensible result of ~5000 bytes (i.e. ~5k being the 1st page of search result rendered as HTML.)
  Yes, one can take a version of the movies & tv data on IMDB.com and put it on the phone (e.g. like copying the old Microsoft Cinemania CDs to the smartphone storage and having a locally installed app search it) but that's not possible for a generalized dataset representing the gigantic internet.
  If you don't intend for the exabytes of the search index to be stored on your smartphone, what exactly is the "on-device search agent" doing? How is it iterating through the vast dataset over a slow cellular connection?
  [0] https://www.google.com/search?q="trillion"+web+pages+exabyte...
  [-]
  - ken 1758 days ago
    The smarts living on-device is not necessarily the same as the smarts executing on-device.
    We already have the means to execute arbitrary code (JS) or specific database queries (SQL) on remote hosts. It's not inconceivable, to me, that my device "knowing me" could consist of building up a local database of the types of things that I want to see, and when I ask it to do a new search, it can assemble a small program which it sends to a distributed system (which hosts the actual index), runs a sophisticated and customized query program there, securely and anonymously (I hope), and then sends back the results.
    Google's index isn't architected to be used that way, but I would love it if someone did build such a system.
    [-]
    - ativzzz 1758 days ago
      To some extent, doesn't Google already do this? Meaning that based on your location/Google account/other factors such as cookies or search history, it will tailor your results. For instance, searching the same query on different computers will result in different results.
      Though to your point, google probably ends up storing this information in the cloud
      [-]
      - bduerst 1758 days ago
        Also instant search results, which were common search terms that were cached at lower levels of the internet.
    - dymk 1758 days ago
      I think you're suggesting homomorphic encryption to execute the user's ranking model. Unfortunately, homomorphic encryption is pretty slow, and the types of operations you can do are limited. But it's viable if the data you're operating on is relatively small - e.g. just searching through (encrypted) personal messages or something.
      [-]
      - ken 1758 days ago
        I think you've got the right general idea, but I don't know that it has to be homomorphic encryption. After all, an index of the public web is not really secret, and the user doesn't have a private key for it.
        In the simplest case, you could make a search engine in the form of a big, public, regularly-updated database, and let users send in arbitrary queries (run in a sandbox/quota environment).
        That's essentially what we've got now, except the query parser is a proprietary black box that changes all the time. I don't see any inherent reason they couldn't expose a lower-level interface, and let browsers build queries. Why can't web browsers be responsible for converting a user's text (or voice) into a search engine query structure?
- packet_nerd 1758 days ago
  Or even an online search engine that was configurable where you could customize the search engine and assign custom weights to different aspects.
  I'd love to be able to configure rules like:
  +2 weight for clean HTML sites with minimal Javascript
  +5 weight for .edu sites
  -10 weight for documents longer than 2 pages
  -5 weight for wordy documents
  I'd also like to increase the weight for hits on a list of known high quality sites. Either a list I maintain myself, or one from an independent 3rd party.
  Once upon a time I tried to use Google's custom search engine builder with only hand curated high quality sites as my main search engine. It was to much trouble to be practical, but I think that could change with an actual tool.
- ntnlabs 1758 days ago
  I think this is not what was the original question. A device that knows You still needs indexing service to find data for You. IMHO.
- bogomipz 1758 days ago
  I remember hearing something about Differential Privacy from a WWDC keynote a few years back however I haven't heard much lately. Can you say how and where Apple is currently using Differential Privacy/
  [-]
  - esmi 1758 days ago
    https://www.apple.com/privacy/docs/Differential_Privacy_Over...
    Apple uses local differential privacy to help protect the privacy of user activity in a given time period, while still gaining insight that improves the intelligence and usability of such features as: • QuickType suggestions • Emoji suggestions • Lookup Hints • Safari Energy Draining Domains • Safari Autoplay Intent Detection (macOS High Sierra) • Safari Crashing Domains (iOS 11) • Health Type Usage (iOS 10.2)
    Found via Google...
alfanick 1758 days ago
I see a lot of good comments here, I got inspired to write this:
What if this new Internet instead of using URI based on ownership (domains that belong to someone), would rely on topic?
In examples:
netv2://speakers/reviews/BW netv2://news/anti-trump netv2://news/pro-trump netv2://computer/engineering/react/i-like-it netv2://computer/engineering/electron/i-dont-like-it
A publisher of webpage (same html/http) would push their content to these new domains (?) and people could easily access list of resources (pub/sub like). Advertisements are driving Internet nowadays, so to keep everyone happy, what if netv2 is neutral, but web browser are not (which is the case now anyway)? You can imagine that some browsers would prioritise some entries in given topic, some would be neutral, but harder to retrieve data that you want.
Second thought: Guess what, I'm reinventing NNTP :)
[-]
- decasteve 1758 days ago
  Inventing/extending a new NNTP is nice idea too.
  The Internet has become synonymous with the web/http protocol. The web alternatives to NNTP won instead of newer versions of Usenet. New versions of IRC, UUCP, S/FTP, SMTP, etc., instead of webifying everything would be nice. But those services are still there and fill an important niche for those not interested in seeing everything eternal septembered.
  [-]
  - bogomipz 1758 days ago
    I believe there is/was an extension to NNTP for full text search or at least a draft proposal no?
- alfanick 1758 days ago
  Another inspiration: DNS for searching.
  What if we implement DNS-like protocol for searching. Think of recursive DNS. Do you have "articles about pistachio coloured usb-c chargers"? Home router says nope, ISP says nope, Cloudflare says nope, let's scan A to Z. Eventually someone gives an answer. This of course can (must?) be cached, just like DNS. And just like DNS, it can be influenced by your not-so-neutral browser or ISP.
  [-]
  - quickthrower2 1758 days ago
    The proliferation of Black hat SEOs would render this useless.
- PeterisP 1758 days ago
  How would topic validity get enforced?
  For example, if a publisher has a particular pro-Trump article, they would likely want (for obvious financial reasons) to push it to both etv2://news/anti-trump and netv2://news/pro-trump . What would prevent them from doing that?
  Also, a publisher of "GET RICH QUICK NOW!!!" article would want to push it to both netv2://news/anti-trump and netv2://computer/engineering/electron/i-dont-like-it topics.
  You can't simply have topics, you can have communities like news/pro-trump that are willing to spend the labor required for moderation i.e. something like reddit. But not all content has such communities willing and able to do so well.
- swalsh 1758 days ago
  I like this idea of people dreaming about a new internet :D
  The idea of moving to a pub-sub like system is a good one. It makes a lot of sense for what the internet has become. It's more than simple document retreival today.
- leadingthenet 1758 days ago
  To me it seems that you’ve just recreated Reddit.
- WhompingWindows 1758 days ago
  You want to silo information and create built-in information echo chambers? That seems so bad for polarization.
  [-]
  - volkk 1758 days ago
    im starting to think echo chambers are just something that will forever be prevalent and its up to the users to try to view alternate viewpoints
- bouk 1758 days ago
  If netv2 is neutral, I would just stuff all of the topics with my own content millions of time, so everyone can only see my content
- dymk 1758 days ago
  Who maintains, audits, and does validation for content submitted to these global lists of topics?
codeulike 1758 days ago
That was what the early internet was like (I was there). People built indexes by hand, lists of pages on certain topics. There was the Gopher protocol that was supposed to help with finding things. But this was all top-down stuff, the first indexing/crawling search engines were bottom-up and it worked so much better. And for a while we had an ecosystem of different search engines until Google came along, was genuinely miles better than everything else, and wiped everything else out. Really, search isn't the problem, its the way that search has become tied to advertising and tracking thats the problem. But then DuckDuckGo is there if you want to avoid all that.
[-]
- m-i-l 1758 days ago
  In the very early days, you didn't need a search engine because there weren't that many web sites and you knew most of the main ones anyway (or later on had them in your own hotlists in Mosaic). Nowadays you need a search because there is so much content.
  The problem is that the amount of content and the size of the potential user base are so large that is is impossible to offer search as a free service, i.e. it has to be funded in some way. Perhaps instead of having a free advertising-driven search, there would be space for a subscription-based model? Subscription based (and advert free) models seem to be working in other areas, e.g. TV/films and music.
  Another problem though is that more and more content seems to be becoming unsearchable, e.g. behind walled gardens or inside apps.
  [-]
  - vpEfljFL 1758 days ago
    Exactly my thought. But it definitely wouldn't get mass adoption which is good because mass-market content websites are questionable in terms of user experience (they also need to cover content creating costs by popups/ads/pushes). One thing, though, ad based search engines lift ad based websites because they can sell ad on a second end.
    Maybe we'll see advent of specialised paid search engines SaaSs with authentic and independent content authors like professional blogs.
- supernovae 1758 days ago
  Search is the problem. If you don’t rank in google you don’t exist on the internet. There is an entire economy built on manipulating search that is pay to play in addition to google continually focusing on paid search of natural SERPs. Controlling search right now is controlling the internet.
  [-]
  - bduerst 1758 days ago
    >If you don’t rank in google you don’t exist on the internet.
    Maybe in 2009. Today there are businesses today that exist solely on Instagram, Facebook, Amazon, etc.
  - codeulike 1758 days ago
    Whatever you replace Search with would be gamed in the same way.
    [-]
    - supernovae 1758 days ago
      true, but when it was lycos, hotbot, altavista, google, webcrawler, aol, gopher, archy, usenet and so many other sources it was much easier to exist in many ways (harder to dominate) - people used to ‘surf the web’, join “webrings” and share stuff.. now they consume and post memes. so i blame behavior as much as monopoly
      [-]
      - codeulike 1758 days ago
        A lot of other things have changed since then, so the difference in tone you are noticing might not have much to do with search engines. In 1996 there were only about 16 million people on the internet, and usage obviously skewed towards the more technical nerdy crowd. Now there are 4,383 million people on the internet. Which is about 57% of everyone.
        [-]
        Sohcahtoa82 1757 days ago
        I see this a lot on HN. People forget that a lot of things in the early days of the Internet only worked because there were so few people on the Internet.
        If you were rich and had a T1 in your home in the days everyone was on dialup, sure you could host a website yourself. But these days, even if you're one of the lucky residents on a gigabit symmetrical connection, there's a limit to how much you can serve. Self-hosting isn't an option unless your website is a niche.
        supernovae 1757 days ago
        More people and fewer companies dominating how everything is found... i don't think that change is for the better.
  - Fjolsvith 1758 days ago
    If your target audience isn't on Google, then you don't have to rank there.
    Almost all of my customers find me through classified advertising websites. Organic and paid search visitors to my site tend to be window shoppers.
davidy123 1758 days ago
I think in one sense the answer is it always depends who or what you are asking for your answers.
The early Web wrestled with this, early on it was going to be directories and meta keywords. But that quickly broke down (information isn't hierarchical, meta keywords can be gamed). Google rose up because they use a sort of reputation system based index. In between that, there was a company called RealNames, that tried to replace domains and search with their authoritative naming of things, but that is obviously too centralized.
But back to Google, they now promote using schema.org descriptions of pages, over page text, as do other major search engines. This has tremendous implications for precise content definition (a page that is "not about fish" won't show up in a search result for fish). Google layers it with their reputation system, but these schemas are an important, open feature available to anyone to more accurately map the web. Schema.org is based on Linked Data, its principle being each piece of data can be precisely "followed." Each schema definition is crafted by participation from industry and interest groups to generally reflect its domain. This open world model is much more suitable to the Web, compared to the closed world of a particular database (but, some companies, like Amazon and Facebook, don't adhere to it since apparently they would rather their worlds have control; witness Facebook's open graph degeneration to something that is purely self-serving).
_nalply 1758 days ago
The deeper problem is advertising. It is sort of a prisoner's dilemma: all commercial entities have a shouting contest to attract customer attention. It's expensive for everybody.
If we could kill advertisement permanently, we can have an internet as described in the question. This will almost be like an emergent feature of the internet.
[-]
- worldsayshi 1758 days ago
  We could supercharge word of mouth. I've been thinking about an alternative upvote model where content is ranked not primarily based on aggregate voting but by:
  - ranking content that users you have upvoted higher
  - ranking content that users with similar upvote behaviour higher
  While there is a risk of upvote bubbles, it should potentially make it easier for niche content to spread to interested people and make it possible for products and services to spread using peer trust rather than cold shouting.
  [-]
  - thekyle 1758 days ago
    > ranking content that users with similar upvote behaviour higher
    This is what Reddit originally tried to do before they pivoted.
    https://www.reddit.com/r/self/comments/11fiab/are_memes_maki...
    [-]
    - worldsayshi 1758 days ago
      Oh, interesting!
      Makes me think that their original plan could still work if they just put a bit more effort into crafting that algorithm.
      For example, the main criticism brought up is that things that you dislike that your peers like keep getting recommended. Why not add a de-ranking aspect into it and try adding downvote-peers in addition to upvote peers.
      I imagine you could create this interesting query language that could answer questions like: what things do you like if you like X and Y but not Z? (I kind of remember that something akin to this have been hacked together using subreddit overlap.)
  - endymi0n 1758 days ago
    As long as there are big companies making money off their products, you can be sure they'll find a way to advertise them to you.
    [-]
  - eterps 1758 days ago
    I've had similar ideas recently. Especially niche content (or shared research) would probably be notoriously hard (WRT false positives) for machine learning to decide whether it is relevant to you, people with similar interests know that much better.
    I was also wondering what would be good options to store votes/upvotes in a decentralized way.
    [-]
    - worldsayshi 1758 days ago
      > people with similar interests know that much better
      Yeah, I wonder if there is a cheap way to test this. Actually! There could be! Like using favorite's here on hacker news. That could be mined and visualized in various ways. (Although a quick sample shows me that it's a rarely used feature)
      > I was also wondering what would be good options to store votes/upvotes in a decentralized way.
      Yeah there are a lot of interesting optimization challenges if you really want to utilize upvote graphs for ranking.
  - scrollaway 1758 days ago
    Not to echo a R&M quote on purpose but that just sounds like targeted advertising with extra steps.
  - fifnir 1758 days ago
    > ranking content that users with similar upvote behaviour higher
    That's how you make echochambers
    [-]
    - worldsayshi 1758 days ago
      All social media have echo chamber characteristics. You have to counteract it with transparency and opt-in/out.
  - loxs 1758 days ago
    So, basically Facebook?
  - Fjolsvith 1758 days ago
    This sounds so much like Facebook.
    [-]
    - worldsayshi 1758 days ago
      Any "social" ranking algorithm is going to sound at least superficially similar to what's already out there.
- vfinn 1758 days ago
  Maybe if IPFS (~web 3.0) succeeds in the future, you could solve the advertising problem by inventing a meta network, where all the sites involved would agree to follow certain standardized criteria of site purity. You'd tag the nodes (or sites), and then have an option to search only sites from the pure network. Just a thought. edit: Maybe this would lead to a growing interest in the site purity, and as the network's popularity would grow, you could monetize the difference to its advance.
  [-]
  - ativzzz 1758 days ago
    Be careful what you wish for, as you might get AMP or some propriety Facebook format as a standard instead.
    [-]
    - vfinn 1757 days ago
      Well, I was thinking we could have endless number of (meta) networks / network configurations / standards. I mean each node could have as many tags as needed, e.g. #safe_for_children_v1.1 #pure_web_v2.0. Then you could configure your search engine / browser according to these tags. You could also stack tags to simplify things, e.g. pure_stack would include both #safe_for_children and #pure_web, etc. Maybe I'm missing something, but it seems doable.
- olegious 1758 days ago
  If we kill advertisement, you can say goodbye to the vast majority of content on the internet. The better approach is to make advertising a better experience and to create incentives for advertisers to spend ad dollars on quality content.
  [-]
  - rglullis 1758 days ago
    There will always be bottom-feeders as long as there is a market where people are not forced to choose with their wallets. Killing the "vast majority of content on the internet" seems like a good thing to me, honestly.
    [-]
    - Fjolsvith 1758 days ago
      > Killing the "vast majority of content on the internet" seems like a good thing to me, honestly.
      I sure hope my content of preference beats out yours for not getting killed.
      [-]
      - rglullis 1758 days ago
        I am reasonably sure that even if our preferences are complete opposite and we eliminate 99% of content in general, you would still have enough quality content for what your interests are. But just to be extra sure, please vote with your wallet and actively support the things you like and don't let advertisers do the choosing for you.
  - _nalply 1758 days ago
    Advertisement just should not be the central means of income of content producers. I really hope this point of view gets killed together with advertisement.
    [-]
    - pif 1758 days ago
      > Advertisement just should not be the central means of income of content producers.
      Can you propose any viable alternative?
      [-]
      - anchpop 1758 days ago
        Ads are placed via an automatic auction upon pageview. GM and Ford both want to show me an ad when I google "what car to buy", and have automatic systems that decide how much they'd be willing to pay to show me that ad based on my likelihood of purchase (income, sex, location, etc). Why not have a system that follows me around and outbids them using funds from my bank account, to show me an ad which is just a transparent image? That way I don't have to see ads but content creators still get what they need?
        [-]
        thekyle 1758 days ago
        What you are describing is exactly what Google Contributor is trying to do. We'll have to see how it turns out.
        https://contributor.google.com/v/beta
        [-]
        anchpop 1758 days ago
        It says it only works with "participating sites". I wonder why
        [-]
        gerash 1758 days ago
        The first version worked exactly as you proposed. The UX however was meh. You'd place a monthly limit on your ad (outbidding) spend (eg. $2) and it ended up outbidding only some of the ads: those served by Google which were also outbid by your amount.
        So from a user's perspective it didn't fully work. Also the ad space wasn't fully removed (perhaps due to technical reasons) but was replaced with a blank image. It also didn't catch on much.
        So they tried to pivot and now the program works with certain cooperating websites to fully get rid of all ads but I'm sure bigger websites would rather be in total control of monetizing themselves and can spend on the necessary IT infra. similar to most online newspapers these days.
        I think an advertiser (eg. a legal firm) might be willing to pay eg. $10 per ad impression but no user is willing to outbid it so I think the first model (outbid in the auction) is more sustainable and profitable for both parties but needs to have all ad exchanges on board.
        So in short, it's been tried but wasn't an instant (or even a slow) success and idk whether Google will continue investing in it or not.
        pif 1758 days ago
        Are you actually proposing for people to gasp! pay gasp! for content?
        [-]
        anchpop 1758 days ago
        Google makes around 30 billion/quarter on ads. Assuming most of that comes from 200 million users (they have more than that but I assume a lot are not worth very much to advertisers), and their ad revenue comes from a 50% cut of the total ad payments, that comes out to around $300/quarter or $75 a month. I'd pay it, but I think most wouldn't.
      - asdff 1758 days ago
        Certain % of your internet bill goes to helping pay to host the sites you are visiting every billing period. If a site is large enough hosting would be sustained by the visiting userbase rather than the site owner. If a site is too small for that, chances are hosting has been cheap anyway.
      - bluGill 1758 days ago
        Subscription. It is only viable for content that well off people use a lot of though, even then only when you are much better than the free competition.
      - arpa 1758 days ago
        whatever wikimedia organisation does :)
  - asark 1758 days ago
    1) Not to most of the best content, 2) other business models may have an actual chance when not competing with "free", 3) actually-free, community-driven sites and services (and standards and protocols—those used to be nice) will have a larger audience and larger creator interest when not competing with "free" (and well-bankrolled).
  - fifnir 1758 days ago
    The vast majority of content is absolute shit though, so speaking strictly for me, I'm willing to try
  - amelius 1758 days ago
    The question was about search engines, not about content.
    But I think the combination of advertising+search engines is particularly bad, so paying for search would be a great first step.
  - arpa 1758 days ago
    maybe it's worth saying goodbye to "8 reasons why current internet sucks that drive spammy copywriters mad". The whole more-clicks-more-revenue based approach did not do good things to the online content.
- marknadal 1758 days ago
  I wrote up a proposal on this, changing the economics to adapt to and account for post-scarce resources like information:
  https://hackernoon.com/wealth-a-new-era-of-economics-ce8acd7...
- wolco 1758 days ago
  To kill advertising would mean the web would live behind many walled gardens where each site requires membership.
  For the remaining free sites you will see advertising in different forms (self promotion blog, the upsell, t-shirt stores on everysite, spam-bait).
  Advertising saved the internet.
  Now tracking.. for advertising or other purposes is the real problem.
- BjoernKW 1758 days ago
  Other than a completely new approach for producing value such as the 'Freeism' one described in the article suggested in this comment https://news.ycombinator.com/item?id=20282851 (which I hadn't time to read yet and hence I'm neither in favour of or against) this simply boils down to the questions of who will pay for relevant content and what the business model will be.
  By and large, people don't seem to be willing to pay for content on the web. Hence, advertising became the dominant business model for content on the web.
  Find another way for someone to pay for relevant content and you can do away with advertising. It's as simple as that.
  [-]
  - TeMPOraL 1758 days ago
    > By and large, people don't seem to be willing to pay for content on the web. Hence, advertising became the dominant business model for content on the web.
    I don't think the causality is right here. People might not be willing to pay for content on the web because advertising enables competitors to offer content for free. If you removed that option, if people had no choice but to pay, it might just turn out that people would pay.
    [-]
    - BjoernKW 1758 days ago
      How would you achieve that? By outrightly outlawing advertising?
      There absolutely are paid options on the web. It's just that they don't seem to appeal to a sufficient number of buyers so advertising could become irrelevant.
      [-]
      - TeMPOraL 1758 days ago
        > How would you achieve that? By outrightly outlawing advertising?
        Yes.
        > There absolutely are paid options on the web. It's just that they don't seem to appeal to a sufficient number of buyers so advertising could become irrelevant.
        They aren't appealing in the presence of ad-subsidized free alternatives. Remove the latter, and they just might become appealing again.
        [-]
        notahacker 1758 days ago
        Few things sound less likely to improve the internet than some entity having the power to content-police the web and remove anything it accuses of the thoughtcrime of advertising...
        [-]
        politician 1758 days ago
        You can block third-party advertising structurally, so that a content-cop isn't required. First-party advertising cannot be blocked, of course, since that's just content.
        For example, using browsers that impose a Content Security Policy that prevents anything from being loaded from domains other than the origin.
        [-]
        notahacker 1758 days ago
        Sure, but if the only ad restriction was mandatory blocking of third party content, you'd just see ad agencies work out ways they can get the content they want to serve hosted locally (and lots of more interesting third party embedded content cease to exist due to it not having the same commercial rationale for workarounds...). If you start forcing companies not to promote third party products with anything that even looks like an ad, you'll just see a greater proportion of the free-to-access internet turn into paid-for reviews and influencer marketing. Not sure that'd be an improvement, and I'm pretty sure the next logical step of getting the content cops ruling which content looks too commercially-oriented for us proles to look at is even worse.
        You can block third party advertising structurally using uBlock without ruining the internet for everyone else.
        TeMPOraL 1757 days ago
        Advertising isn't a thoughtcrime, it's a cognitive/psychological assault.
        I think a combination of consumer protection laws, truth in advertising laws and data protection laws, all turned up to 11 (even GDPR), could achieve most of the desired outcome on the Internet without much problematic "content-policing". But I'm not sure. You won't eliminate advertising from the Internet entirely, but making it illegal would make undesirable advertising more expensive, by creating vast amount of risk for advertisers and simultaneously destroying the adtech industry, thus rendering most of the abusive practices that much less efficient.
        (Also, to be clear, I want all advertising gone. Not just on-line, the meatspace one too.)
    - Fjolsvith 1758 days ago
      Huh. That sounds like a free market model.
      Isn't this what different newspapers like NYT and WSJ are moving towards? Why can't both models coexist?
      [-]
      - TeMPOraL 1758 days ago
        Because one totally destroys the other.
        Slave labour, selling poison or dumping waste into rivers are all superior business models too, but that doesn't mean they should exist in a civilized society.
        [-]
        Nasrudith 1758 days ago
        The train also destoyed the horse drawn wagon train for bulk land transport.
        Just because it totally destroys another business model doesn't mean it is wrong. Felony interference with a business model protectionism isn't good for societies. Historically this stagnant "stability" gets them lapped and forced into the modern world if lucky or conquered if not no matter how vigorously they insist that it is the only and right way.
        [-]
        TeMPOraL 1757 days ago
        Of course. I'm not saying displacing business models is bad per se. I'm saying that just because one business model can displace a different one, doesn't immediately mean it's good. Plenty of business models are morally bankrupt, and I believe "free but subsidized by advertising" is such, by virtue of advertising itself[0] being morally bankrupt.
        --
        [0] - as seen today; not the imaginary "informing customers about what's on the market" form, but the real "everyone stuck in a shouting contest of trying to better manipulate customers" form.
  - Fjolsvith 1758 days ago
    > Find another way for someone to pay for relevant content and you can do away with advertising. It's as simple as that.
    Not so simple. What is relevant for me may be irrelevant for you.
    [-]
    - BjoernKW 1758 days ago
      You pay for content that's relevant to you. I pay for what's relevant to me.
      [-]
      - Fjolsvith 1758 days ago
        Oh, okay. I was assuming we had someone like the government pay for content.
- jppope 1757 days ago
  Promotion is a need, and a very important need for ideas to spread. We all know that the concept of "if you build it they will come" doesn't work". Google's adaptation for this was to make advertising relevant... which is actually a considerable improvement over historical media models...
  There's a saying in sales: "people hate to be sold, but they love to buy"... which is akin to what you are saying here. Advertising isn't the problem... the problem is that the reasons why people are promoting aren't novel enough... (rent seeking... which creates noise)
- bduerst 1758 days ago
  The only way to kill advertising is to have perfectly efficient markets.
  Until then, you're going to have demand for ferrying information between sellers and buyers, and vice versa, because of information asymmetry. You may disagree with some of the mediums currently used, finding them annoying, but advertising is always evolving to solve this problem, as is evident in the last three decades.
quelsolaar 1758 days ago
Yes, we need search engines, but they don't need to be monolithic. Imagine that indexing the text of your average web page takes up 10k. Then you get 100.000 pages per Gig. It means that you if you spend ~270USD on a consumer 10 tera drive you can index a billion webpages. Google no longer says how many pages they index, but its estimated to be with in one order of magnitude of that.
This means that in terms of hardware, you can build your own google, then you get to decide how it rates things and you don't have to worry about ads and SEO becomes much harder because there is no longer one target to SEO. Google obviously don't want you to do this (and in fairness google indexes a lot of stuff that isn't keywords form web pages), but it would be very possible to build an open source configurable search engine that anyone could install, run, and get good results out of.
(Example: The piratebay database, that arguably indexes the vast majority of avilable music / tv / film / software was / is small enough to be downloaded and cloned by users)
[-]
- rhmw2b 1758 days ago
  Google's paper on Percolator from 2010 says there are more than 1T web pages. 9 years later there is surely way more than that.
  https://ai.google/research/pubs/pub36726
  The real issue would be crawling and indexing all those pages. How long would it take for an average user's computer with a 10Mb internet connection to crawl the entire web? It's not as easy a problem as you make it seem.
  [-]
  - quelsolaar 1758 days ago
    I'm not saying its easy, its not, but people tend to think that because Google is so huge, you have to be that huge to do what Google does. My argument is that in terms of hardware google need expensive hardware because they have so many users, not because what they do requires that hardware to deliver the service for one or a few users.
    I have a gigabit link to my apartment (go Swedish infrastructure!). At that theoretic speed I get 450 gigs an hour, so I could download ten tera in a day. We can easily slow that down by an order of magnitude and its still a very viable thing to do. If someone wrote the software to do this, one could imagine some kind of federated solution for downloading the data, so that every user doesn't have to hit every web server.
  - z3t4 1758 days ago
    Could be done with a p2p "swarm". Peers get asigned pages to index then share the result.
- tudelo 1758 days ago
  How would you begin indexing everything?
  [-]
  - Jaruzel 1758 days ago
    This is good question. Crawling and storing the pages is the easy part... searching them with a sub 1 second response time is much harder. Which current DB platforms can handle the size of data that Google indexes?
theon144 1758 days ago
Almost definitely not.
Search engines are there to find and extract information in an unstructured trove of webpages - no other way to process this than with something akin to a search engine.
So either you've got unstructured web (the hint is in the name) and GoogleBingYandex or a somehow structured web.
The latter has been found to be not scalable or flexible enough to accomodate for unanticipated needs - and not for a lack of trying! This has been the default mode of web until Google came about. Turns out it's damn near impossible to construct a structure for information that won't become instantly obsolete.
[-]
- 0815test 1758 days ago
  > A structured web ... has been found to be not scalable or flexible enough to accomodate for unanticipated needs - and not for a lack of trying!
  Linked Open Data (the latest evolution of Semantic Web technologies) is actually working quite well at present - Wikidata now gets more edits per unit of time than Wikipedia does, and its data are commonly used by "personal assistant" AIs such as Amazon's Alexa. Of course, these can only cover parts of the web where commercial incentives, and the bad actors that sometimes pursue them, are not relevant.
swalsh 1758 days ago
I've had this idea floating in my head for a while, that one thing that might make the world better is some kind of distributed database, and a gravitation back to open protocols (though instead of RFC's... maybe we could maintain an open source library for the important bits) I was thinking the architecture of DNS is a good starting point. From there we can create public indexes of data. This includes searchable data, but also private data you want to share (which could be encrypted, and controlled by you (think PGP). I'd modify browsers so that I don't have to trust a 3rd party service)
Centralization happens because the company owns the data, which becomes aggregated under one roof. If you distribute the data it will remove the walled gardens, multiple competitors should be able to pop up. Whole ecosystems could be built to give us 100 googles.... or 100 facebooks, where YOU control your data, and they may never even see your data. And because we're moving back to a world of open protocols, they all work with each other.
These companies aren't going to be worth billions of dollars any more.... but the world would be better.
[-]
- davidy123 1758 days ago
  You've just more or less described Solid. https://solid.mit.edu/
  I think a lot of people dismiss Solid based on its deep origins in Semantic Web, or because it's a slow project, based on Web standards, intended to solve long term problems.
  But being part of the Web is a huge process, and with DIDs it maps just fine into decentralized worlds.
- m-i-l 1758 days ago
  Unless there's a significant change in human behaviour, convenience is always going to trump everything else including privacy - we have seen over and over again that people will happily hand over their personal data in return for a free service that is simple to use. So any solution where you control your own data is going to have to be as convenient as alternatives, otherwise there'll be an opening for a new centralised "we'll do all the hard work of owning your data for you" mega corp tech titan.
- arpa 1758 days ago
  i really like that idea.
alangibson 1758 days ago
The 2 core flaws of the Internet (more precisely the World Wide Web) are lack of native search and native payments. Cryptocurrencies have started to address the second issue, but no one that I know of is seriously working on the first.
Fast information retrieval requires an index. A better formulation of the question might be: how do we maintain a shared, distributed index that won't be destroyed by bad actors.
I wonder if the two might have parts of the solution in common. Maybe using proof of work to impose a cost on adding something to the index. Or maybe a proof of work problem that is actually maintaining the index or executing searches on it.
[-]
- asdff 1758 days ago
  Why does there need to be one central source of truth on the internet? It seems like it would be impossible to implement. Even if google worked like it did 15 years ago and you got decently relevant results to your search terms, that's still not even scraping the surface of the whole internet that is relevant to your search terms.
  It's an impossible problem to solve because we don't have good consistent metadata to draw on. Libraries work because they have good metadata to catalog their collections. Good metadata needs to be generated by hand, doing it automatically is bound to lead to errors and special cases that will pollute your search results.
  I say we abandon the idea of the ideal search engine, accept the fact that we will never be able to find every needle in every haystack, and defer to a decentralized assortment of thousands of topic-specific indexes of relevant information. Some of them will be shit, but that's fine, the internet has always been a refuge for conspiracy theorists and other zaney interests. The good stuff will shine through the mud, as it's always done.
lefstathiou 1758 days ago
My approach to answering this would entail:
1) Determining what percentage of search engine use is driven by the need for a short cut to information you know exists but dont feel like accessing the hard way
2) Information you are actually seeking.
My initial reaction is that making search engines irrelevant is a stretch. Here is why:
Regarding #1, the vast majority of my search activity involves information I know how and where to find but seek the path of least resistance to access. I can type in "the smith, flat iron nyc" and know I will get the hours, cross street and phone number for the Smith restaurant. Why would I not do this instead of visiting the yelp website, searching for the Smith, set my location in NYC, filtering results etc. Maybe I am not being open minded enough but I don't see how this can be replaced short of reading my mind and injecting that information into it. There needs to be a system to type a request and retrieve the result you're looking for. Another example, when I am looking for someone on LinkedIn, I always google the person instead of utilizing LinkedIn's god awful search. Never fails me.
2. In the minority of cases I am looking for something, I have found that Google's results have gotten worse and worse over the years. It will still be my primary port of call and I think this is the workflow that has potential disruption. Other than an Index, I dont know what better alternatives you could offer.
peteyPete 1758 days ago
You'd still want to be able to retrieve "useful" information which can't be tampered with easily which I think is the biggest issue.
You can't curate manually.. That just doesn't scale. You also can't let just anyone add to the index as they wish or any/every business will just flood the index with their products... There wouldn't be any difference between whitehat/blackhat marketing.
You also need to be able to discover new content when you seek it, based on relevancy and quality of content.
At the end of the day, people won't be storing the index of the net locally, and you also can't realistically query the entire net on demand. That would be an absolutely insane amount of wasted resources.
All comes back to some middleman taking on the responsibility (google,duckduckgo,etc).
Maybe the solution is an organization funded by all governments, completely transparent, where people who wish to can vote on decisions/direction. So non profit? Not driven by marketing?
But since when has government led with innovation and done so at a good pace? Money drives everything... And without a "useful" amount of marketing/ads etc, the whole web wouldn't be as it is.
So yes, you can.. But you won't have access to the same amount of data, as easily, will likely have a harder time finding relevant information (especially if its quite new) without having to parse through a lot of crap.
kyberias 1758 days ago
If we were to design a brand new DATABASE ENGINE for today's world, can we develop it such a way that:
1. Finding information is trivial
2. You don't need services indexing billions of rows to find any relevant document
[-]
- RhysU 1758 days ago
  How far can one get with content-addressable storage? It's not obvious to me how to emulate search results ranking (well, anyhow), but it could give you a list of documents satisfying some criteria according to the authors who stored them.
- dymk 1758 days ago
  Google throws billions of dollars at this problem, nothing about it is "trivial".
fghtr 1758 days ago
>In our current internet, we need a big brother like Google or Bing to effectively find any relevant information in exchange for sharing with them our search history, browsing habits etc.
The evil big brothers may not be necessary. We just need to expand alternative search engines like YaCy.
azangru 1758 days ago
I can't imagine how this is possible. Imagine I have a string of words (a quote from a book or an article, a fragment of an error message, etc), and I want to find the full text where it appears (or pages discussing it). How would you do that without a search engine?
[-]
- rmsaksida 1758 days ago
  I think OP's idea is that search services would be built into the Internet, and not provided by a third party. That is, when a website is published or updated, it is somehow instantly indexed and made available for search as a feature of the platform on which it was published.
  [-]
  - jedberg 1758 days ago
    But you still need a third party to rank the results. I don't just want any page about my error message, I want the best page.
    [-]
    - jonathanstrange 1758 days ago
      The page rank could be a transparent algorithm, which is regularly updated by a consortium like W3C.
      The question is whether this would work in an adversarial setting where every party tries to inflate their page rankings by any trick they can find.
      [-]
      - dageshi 1758 days ago
        Not a chance it would survive. Google has enough problems fighting SEO right now and they don't publish their algorithm and have incredibly deep pockets.
      - swalsh 1758 days ago
        Personally, I don't want "The page rank algorithm" I want 100 page rank algorithms made by 100 people. Transparency is important, but I think competition is more important.
    - zdxt 1758 days ago
      The platform could provide useful metadata, leaving the ranking up to the client..
  - Sohcahtoa82 1757 days ago
    > built into the Internet
    Uh...what? How do you define this?
  - gshdg 1758 days ago
    Indexed where?
- Kaiyou 1758 days ago
  You ask a question (possibly on Stack-Something) and don't get ridiculed for not using Google, since you live in a world where search engines don't exist.
  [-]
  - anc84 1758 days ago
    And how would you find out if that question has been answered before? That would only work if there was single unified centralised question site. And then we are pretty much back at Google's single search field.
    [-]
    - Kaiyou 1758 days ago
      You don't need to. It's not a problem to ask and answer the same question repeatedly. School never had a problem with that.
      [-]
      - candu 1758 days ago
        School also never had navigability of past questions / answers as its explicit objective.
        [-]
        Kaiyou 1757 days ago
        Neither would the replacement for search engines.
    - kpbird 1758 days ago
      question and answer site is available - http://quora.com/ but it has the same problem, index and centralized information controlled by one company.
  - bpye 1758 days ago
    And then 5 other users ask the same question because they have no search engine.
    I think this gets boring quick...
    [-]
    - boblebricoleur 1758 days ago
      We could have website-centered search engines. You ask the question on whatsthatquote.com and find out if someone has already asked it. If yes, you have your answer, if not someone answers and no one is annoyed. Stack overflow does that. You don't get ridiculed for asking a quesiton on so that has already been asked on another website you don't know of.
      I guess that would be the age of smaller communities centerd around a few websites only? Maybe, I don't know if we can consider google as enabling a real global community as of today. I pretty much browse around the same websites. Anything I want to find without a precise source of information in mind, I use google and stumble upon ads and ads and sometimes ads, but rarely an answer.
      I sometimes still search stuff manually browsing through websites indexes. Some things are difficult to find with keywords. Equations of which the name you forgot. Movies with a plot so generic billions of result would be associated with it on a search engine. That piece of music of which you could write the notes on a sheet but don't remember the title.
    - Kaiyou 1758 days ago
      Teachers teaching the same class every year can't use this as an excuse either.
      [-]
      - timlatim 1758 days ago
        The most informative answers I've encountered on StackOverflow are either a product of research (benchmarking, analyzing multiple sources) or very specific knowledge, sometimes written by the author of the framework/library in question. I'm not sure your analogy applies since these answers demand substantially more effort than the (usually) predictable and repetitive questions teachers face in class.
        [-]
        Kaiyou 1757 days ago
        At some point someone has to write it, like school textbooks. After that your just looking at distributing that knowledge. The replacement for search engines solves the latter problem.
- dredds 1758 days ago
  Have an ip addressing and redirecting system designed around a Tower of Babel lookup.
  https://libraryofbabel.info/search.cgi
lxn 1758 days ago
Most of the search engines now days have the advantage of being closed source (you don't know how their algorithm actually work). This makes the fight against unethical SEO practices easier.
With a distributed open search alternative the algorithm is more susceptible to exploits by malicious actors.
Having it manually curated is too much of a task for any organization. If you let user vote on the results... well, that can be exploited as well.
The information available on the internet is to big to make directories effective (like it was 20 years ago).
I still have hope this will get solved one day, but directories and open source distributed search engines are not the solution in my opinion unless there is a way to make them resistant to exploitation.
[-]
- pavas 1758 days ago
  I've been thinking that the only way to get around the bad-actor (or paid agent) problem when dealing with online networks is to have some sort of distributed trust mechanism.
  I feel like manually curated information is the way to go, you just have to find some way to filter out all the useless info and marketing/propaganda. You can't crowd source it because it opens up avenues for gaming the system.
  The only solution I can think of is some sort of transitive trust metric that's used to filter what's presented to you. If something gets by that shouldn't have (bad info/poor quality), you update the weights in the trust network that led to that action so they are less likely to give you that in the future. I never got around to working through the math on this, however.
  [-]
  - pbhjpbhj 1758 days ago
    Domain authority is a distributed 'trust' system?
    But you want 'manually curated' but not 'crowd sourced', which suggests you want an individual to or small group to find, record, and curate all pages (? or domains, or <articles>, or ...) across more than 60 Billion pages of content??
    There's something like 1000 FOSS CMSs - I would be surprised if there's a million domains with relevant info to sift through just for that small field.
    There's no way you're curating _all_ that without crowd sourcing.
    Of course you don't have to look at everything to curate, but how are you going to filter things ... use a search engine?
  - dorusr 1758 days ago
    That's very workable.Any agent should have a private key with which it signs it's pushes. Age of an agent and score of feedback for that agent determine its ranking.Though that still leaves gaming possible with the feedback. But heavy feeback like "this is malicious content" could be moderated. (So that people cant just report stuff they don't like).
    [-]
    - pavas 1758 days ago
      The reason I mentioned that the trust metric should be transitive and distributed is so that it prevents gaming as much as possible. You wouldn't want to have a trusted central authority (for everyone) because that could always be corrupted or gamed if it's profitable enough. Rather every individual would have a set of trusted peers with different "trust" weights for each based on the individual's perception of their trustworthiness, that could be changed over time.
      This trust (weighting) should be able to propagate as a (semi-)transitive property throughout the network to take advantage of your trusted peers' trusted peers. This trust weight propagation would need to converge, and when you are served content that has been labeled incorrectly ("high-value" or "trustworthy" or whatever metric, when you don't see it that way), then your trust weights (and perhaps your peers') would need to re-update in some sort of backpropagation.
      The hard part is keeping track of the trust-network in a way that is O(n^c) and having the transitive calculations also be O(n^c) at most. I'm quite sure there are ways of doing this (at least with reasonably good results) but I haven't been able to think through them.
      [-]
    - pbhjpbhj 1758 days ago
      >But heavy feeback like "this is malicious content" could be moderated. //
      You're just shifting around your trust problem. You need to handle 4chan level manipulation (million of users coordinating to manipulate polls), or Scientology depth (getting thousands of people in to USA government jobs in order to get recognised as a religion). If it's "we'll catch it in moderation" then whoever wants to manipulate it just gets a moderator ...
      "Super-moderation": will a dictatorship work here? I don't see how.
      "Meta-moderation": you're back to bad actors manipulating things with pure numbers.
      [-]
      - pavas 1758 days ago
        You can't get around the problem of manipulation if your trustworthiness metric for content will be the same for all people, as it is on reddit, hacker news, or Amazon for example. Having moderators just concentrates the issue into a smaller number of people and you haven't solved the central problem--manipulation is profitable.
        But think of how we solve this problem in our personal interactions with other people, and this should be a clue for how to solve it with computational help. We have a pretty good idea of which people are trustworthy (or capable, or dependable, or any other characteristic) in our daily lives, and based on our interactions with them we update these internal measures of trustworthiness. If we need to get information from someone we don't know, we form a judgement of their trustworthiness based off of input from people we trust--e.g. giving a reference. This is really just Bayesian inference at its core.
        We should be able to come up with a computational model for how this personal measure of trustworthiness works. It would act as a filter over content that we obtain. Throw a search engine on top of this, sure, but in the end you'd still need to get trustworthiness weights onto information if you want it to be manipulation-resistant. This labeling is what I mean by manual curation. You can't leave that up to the search engine or the aggregator because those can be gamed, like the examples you gave for aggregators and SEO for search engines have shown.
        [-]
        pbhjpbhj 1758 days ago
        >We have a pretty good idea of which people are trustworthy (or capable, or dependable, or any other characteristic) in our daily lives //
        We really don't. People get surprised all the time that someone had an affair, or cheated, or ripped someone off, or whatever. "But I trusted you" ...
        It's actually relatively easy to fool people in to trusting you, as many red team members will probably confirm.
        Look at someone like Boris Johnson, people are trusting him to lead the country knowing that he's well known to betray people's trust and that he even had a court case lodged against him based on his very blatant lying to the entire country. You can even watch the video of him being interviewed where the interviewers says (paraphrasing) "but we all know that's a half truth" and BoJo just pushes it and pushes it and refuses to accept that it's anything other than absolute truth.
        >If we need to get information from someone we don't know, we form a judgement of their trustworthiness based off of input from people we trust--e.g. giving a reference. //
        This is domain authority again - trust some domains manually, let it flow from there. If that domain trusts another domain then they link to it, trust flows to the other domain, and so on. Maintaining such trust for a long time adds to a particular domains trust factor, linking to domains not trusted by others detracts from it.
        [-]
        pavas 1758 days ago
        So how do _you_ make any sort of judgments based off of what people say? What information do you use to judge whether their statements are accurate? Or do you always start with the assumption that everything everyone says is suspect? What sort of information do you use to come to any sort of conclusion, and how do you determine the trustworthiness of that information?
        >This is domain authority again - trust some domains manually, let it flow from there. If that domain trusts another domain then they link to it, trust flows to the other domain, and so on. Maintaining such trust for a long time adds to a particular domains trust factor, linking to domains not trusted by others detracts from it.
        This can be gamed if you're able to update the trustworthiness of a domain for other people, and that's why a trust metric needs to be mostly personal, and should update dynamically based on your changing trust valuations.
        [-]
        pbhjpbhj 1758 days ago
        Pyrrhonism, you start on the assumption that no-one [else] even exists and go from there ... ;o)
        Seriously, I'm not so sure -- I try to trust first and then update that status as more information becomes available; but that's more of a religious position.
        I don't think it's necessarily instructive to look at my personal modes here. I guess my main point is that if you're going to say "well humans have cracked trust, we'll just model it on that" then I think you're shooting wide of the mark.
        PeterisP 1758 days ago
        Any trust needs some kind of root. The big problem is that you need to prevent a billion real users from being "outvoted" in that Bayesian inference by a billion fake agents (augmented by thousands of paid 'influencers') saying that spam is ham and vice versa, and ensuring that they all have good reputation.
- 0815test 1758 days ago
  > Having it manually curated is too much of a task for any organization.
  ODP/DMOZ worked quite well while it was around. I don't think it would work equally well nowadays as a centralized project, because bad actors are so much more common today than they were in the 1990s and early 2000s; and because the Internet is so astoundingly politicized these days that people will invariably try to shame you and "call you out" for even linking to stuff that they disagree with or object to in a political sense (and there was a lot of that stuff on ODP, obviously!). But federation could be used to get around both issues.
VvR-Ox 1758 days ago
This would be the internet that was used in Star Trek I think. The computers they use can just be asked about something and the whole system is searched for that - so the service to find things is inherent to the system itself. In our world things like that are done by entities who try to maximize their profits (like the Ferengi) without thinking too much about effectiveness or ethics.
This phenomena can be seen throughout many systems we built - e.g. use of internet, communication, access to electricity or water. We have to pay the profit-maximizing entities for all of this though it could be covered by global cooperatives who manage this stuff in a good way.
blue_devil 1758 days ago
I think "search engines" is misleading. These are relevance engines. And relevance sells - the higher the relevance, the better.
https://www.nytimes.com/2019/06/19/opinion/facebook-google-p...
Ultramanoid 1758 days ago
This is what we had in early internet days, directories of links. Early Yahoo was the perfect example of this. You jumped from one site to another, you asked other people, you discovered things by chance. You went straight to a source, instead of reading a post of a summary of a site that after 20 redirections loaded with advertising and tracking gets you to the intended and actually useful destination.
Most web sites then also had a healthy, sometimes surprising link section, that has all but disappeared these days.
[-]
- HNLurker2 1758 days ago
  >directories of links.
  This is what I did back in 2015 as a project to increase my SEO rank to my business. Basically spam directory (and create my own) just to increase my pagerank.
d-sc 1758 days ago
Indexing information is a political problem as much as a technical one. Ultimately there will always be people who will put more effort into getting their information known than others. These people would game whatever technical solution exists.
[-]
- albertgoeswoof 1758 days ago
  Thats true. What if you could artificially limit the amount of effort someone can put in to getting content out there? Or even make it known to the consumer how/why the content is ranked highly?
  [-]
  - d-sc 1758 days ago
    Most closed platforms do a subset of what you mention: I can only put so many posts on my Facebook before they stop making it to all my friends. If I pay more for higher ranking it’s labeled an advertisement.
    However, creating rules transitions the contention point to who makes the rules. If you think that my algorithm will rank my sources better than your sources, you may be less interested in my algorithm regardless of its technical merits.
  - tgvaughan 1758 days ago
    Or allow users to specify precisely how to weight the results? (Popular with linkers, best matching results first, etc.)
vbsteven 1758 days ago
I was recently thinking about an open search protocol with some federation elements in two parts. A frontend and an indexer. The idea is that anyone can run his own search frontend or use a community hosted one (like Matrix). And then each frontend has X amount of indexers configured.
Each indexer is responsible for a small part of the web and by adding indexers you can increase your personal search area. And there is some web of trust going on.
Entities like stackoverflow and Wikipedia and reddit could host their own domain specific indexers. Others could be crowdsourced with browser extensions or custom crawlers and maybe some people want to have their own indexer that they curate and want to share with the world.
It will never cover the utility and breadth of Google Search but with enough adoption this could be a nice first search engine. With DDG inspired bang commands in the frontend you could easily retry a search on Google.
With another set of colon commands you can limit a search to one specific indexer.
The big part I am unsure about in this setup is how a frontend would choose which indexers to use for a specific query. Obviously sending each query to each indexer will not scale very well.
dalbasal 1758 days ago
Just as a suggestion, this question might be rephrased as can we have an Internet that doesn't require search companies, or even just massive search monopolies.
I'm not sure what the answer is re:search. But, an easier example to chew on might've social media. It doesn't take a Facebook to make one. There are lots of different social networking sites (including this one) that are orders of magnitude smaller in terms of resources/people involved, even adjusting for size of the userbase.
It doesn't take a Facebook (company) to make Facebook (site). Facebook just turned out to be the prize they got for it. These things are just decided as races. FB got enough users early enough. But, if they went away tomorrow.. users will not lack for social network experiences. Where they get those experiences is basically determined by network effects, not the product itself.
For search, it doesn't take a Google either. DDG make a search engine, and they're way smaller. With search though, it does seem that being a Google helps. They have been "winning" convincingly even without network effects and moat that make FB win.
zzbzq 1758 days ago
https://medium.com/@Gramshackle/the-web-of-native-apps-ii-go...
Cliff's notes:
- Apps should run not in a browser, but in sandboxed App containers loaded from the network, somewhat between Mobile Apps and Flash/Silverlight. Mobile apps that you don't 'install' from a store, but navigate to freely like the web. Apps have full access to the OS-level APIs (for which there is a new cross-platform standard), but are containerized in chroot jail.
- An app privilege ("this wants to access your files") should be a prominent feature of the system, and ad networks would be required to built on top of this system to make trade-offs clear to the consumer.
- Search should be a functionality owned and operated by the ISPs for profit and should be a low-level internet feature seen as an extension of DNS.
- Google basically IS the web and would never allow such a system to grow. Some of their competitors have already tried to subvert the web by the way they approached mobile.
btbuildem 1758 days ago
You don't remember how it was before search engines, do you?
It was like a dark maze, and sometimes you'd find a piece of the map.
Search coming online was a watershed moment -- like, "before search" and "after search"
[-]
- bduerst 1758 days ago
  Yep.
  - You had your web rings, which would cycle from site to site based on a category, some pages having multiple rings.
  - You had your "communities", organizing sites by URL structure, where similar pages were grouped together like a strip mall or something (i.e. neighborhoods for geocities).
  - You had scammy services that would submit your pages to multiple search engines, at a cost, but would guarantee you would show up in results.
  - You had your aggregators, like dogpile, where you would sift through pages of results from different search engines, hoping to find something different.
  It wasn't a good time. If you think about the problem that search engines solve today - connecting people with information that they want - we're currently at a peak.
chriswwweb 1758 days ago
Sorry, but this was too tempting: https://imgur.com/a/6UcAOnF
But seriously, I'm not sure it is feasible, I wish the internet could auto-index itself and still be decentralized, where any type of content can be "discovered" as soon as it is connected to the "grid".
The advantage would be that users could search any content without filters, without AI tempering with the order based on some rules ... BUT on the other hand, people use search engines because their results are relevant (what ever that means these days), so having an internet that is searchable by default would probably never be a good UX and hence not replace existing search engines. It not just about the internet being searchable, it would have to solve all the problems search engines have solved in the last ten years too
mhandley 1758 days ago
We could always ask a different question: what would it take for everyone to have a copy of the index? Humans can only produce new text-based content at a linear rate. If storage continues to grow at an exponential rate, eventually it becomes relatively cheap to hold a copy of the index.
Of course those assumptions may not be valid. Content may grow faster than linear. Content may not all be produced by humans. Storage won't grow exponentially forever. But good content probably grows linearly at most, and maybe even slower if old good content is more accessible. Already it's feasible to hold all of the English wikipedia on a phone. Doing the same for Internet content is certainly going to remain non-trivial for a while yet. But sometimes you have to ask the dumb questions...
[-]
- candu 1758 days ago
  Or: what would it take for everyone to have part of a copy of the index, according to their capacity / needs / etc., and for them to be able to easily search the parts of the index they don't have?
  To pre-empt the "you've just described BitTorrent" comment - only in the vaguest sense; you'd need search functionality on the chunks themselves, and ideally you wouldn't have to copy (or even stream) your peers' search index chunks to search them.
  I guess there's a trust / security issue here around "search index poisoning"; to resolve that, you'd probably have to lean on SSL verification and all of its attendant infrastructure for now.
- thekyle 1758 days ago
  > Humans can only produce new text-based content at a linear rate.
  True but that assumes a fixed number of humans. In reality, the number of humans is also increasing exponentially.
- tjansen 1758 days ago
  You may have the storage to store it, but do you have the bandwidth to receive everything that's being produced?
  [-]
  - mhandley 1758 days ago
    There are 8 billion people. If half of them were awake, and 10 percent of that half are actually typing at 40 words/minute, that would be about 13Gbit/s. I couldn't receive that feed today at home, but my work could. A satellite feed could work today too. And I wasn't really talking about today, but 10-20 years from now. Storage will be a problem for a lot longer than network capacity.
    [-]
    - JoeAltmaier 1758 days ago
      Storage grows geometrically, while network capacity has grown less so. It's possible we will soon have more storage than humans can create content to fill. No problem; automated systems also fill storage!
- BerislavLopac 1758 days ago
  More and more content is decidedly not created by humans.
tooop 1758 days ago
The question should be how can we create a new internet where we don't need a centralized 3rd party search engine not a new internet where there is no search engine. You can't find anything if there is no search (engine) and you can't change that.
GistNoesis 1758 days ago
Yes, download data, create indices on your data yourself as you see fit, execute SQL queries.
If you don't have the resources to do so yourself, then you'll have to trust something, in order to share the burden.
If you trust money, then gather enough interested people to share the cost of construction of the index, at the end everyone who trust you can enjoy the benefits of the whole for himself, and you now are a search engine service provider :)
Alternatively if you can't get people to part with their money, you can get by needing only their computations, by building the index in a decentralized fashion. The distributed index can then be trusted at a small computation cost by anyone who believe that at least k% of the actors constructing it are honest.
For example if you trust your computation and if you trust that x% of actors are honest :
You gather 1000 actors and have each one compute the index of 1000th of the data, and publish their results.
Then you have each actor redo the computation on the data of another actor picked at random ; as many times as necessary.
An honest actor will report the disagreement between computations and then you will be able to tell who is the bad actor that you won't ever trust again by checking the computation yourself.
The probability that there is still a bad actor lying is (1-x)^(x*n) with n the number of times you have repeated the verification process. So it can be made as small as possible, even if x is small by increasing n. (There is no need to have a majority or super-majority here like in byzantine algorithms, because you are doing the verification yourself which is doable because 1000th of the data is small enough).
Actors don't have the incentive to lie because if they do so, it will be exposed provably as liars forever.
Economically with decreasing cost of computation (and therefore decreasing cost of index construction), public collections of indices are inevitable. It will be quite hard to game, because as soon as there is enough interest gathered a new index can be created to fix what was gamed.
cf141q5325 1758 days ago
There is an even deeper problem then surveillance, the results of search engines get more and more censored with more and more governments putting pressure on them to censor results according to their individual wishes.
wlesieutre 1758 days ago
Taking a step back to before search engines were the main driver for finding content online, who remembers webrings?
Is there a way to update that idea of websites deliberately recommending each other, but without having it be an upvote/like based popularity contest driven by an enormous anonymous mob? It needs to avoid both easy to manipulate crowd voting like reddit and the SEO spam attacks that PageRank has been targeted by.
Some way to say "I value recommendations by X person," or even give individual people weight in particular types of content and not others?
[-]
- jefftk 1758 days ago
  > who remembers webrings?
  I recently configured openring [1] and am liking it a lot. Example of one of my pages with it [2]
  [1] https://git.sr.ht/~sircmpwn/openring
  [2] https://www.jefftk.com/p/adventures-in-upstreaming scroll down to "Recent posts on blogs I like"
- myself248 1758 days ago
  I think this is the most promising approach.
  I value ratings by X person because they've never upvoted spam. I devalue ratings by Y person because they're a spam shill, and everyone associated with Y person because they're all a hive of spammers. And then https://xkcd.com/810/ .
  It would devolve into semi-isolated enclaves of interconnected inter-trusting users, but as you discover them, you could "trust their trust" and instantly include their enclave by reference. Which I think is a good thing -- you'd find a community that's all about some topic, and instantly benefit from their years of content gathering.
topmonk 1757 days ago
What we should have is an open, freely accessible meta-information database of things like, whether user X liked/disliked, what other page/sites linked to this site, what their admins ranked this site as, if they did, etc., etc.
Then we have individual engines that take this data and choose for the user what to display for that user only. So if the user is unhappy with what they are seeing, they simply plug in another engine.
Probably a block chain would be good to store such a thing.
jonathanstrange 1758 days ago
There is still YaCy [1]. I'm not sure whether it's this one or another distributed search engine I tried 10 years ago, but the results were not very convincing. I believe that's to some extent because of a lack of critical mass, if more people would use these engines, they could improve their rankings and indexing based on usage.
[1] https://yacy.net/en/index.html
_Nat_ 1758 days ago
> In our current internet, we need a big brother like Google or Bing to effectively find any relevant information in exchange for sharing with them our search history, browsing habits etc.
Seems like you could access Google/Bing/etc. (or DuckDuckGo, which'd probably be a better start here) through an anonymizing service.
But, no, going without search engines entirely doesn't make much sense.
I suspect that what you'd really want is more control over what your computer shares about you and how you interact with services that attempt to track you. For example, you'd probably like DuckDuckGo more than Google. And you'd probably like Firefox more than Chrome.
---
With respect to the future internet...
I suspect that our connection protocols will get more dynamic and sophisticated. Then you might have an AI-agent try to perform a low-profile search for you.
For example, say that you want to know something about a sensitive matter in real life. You can start asking around without telling everyone precisely what you're looking for, right?
Likewise, once we have some smarter autonomous assistants, we can ask them to perform a similar sort of search, where they might try to look around for something online on your behalf without directly telling online services precisely what you're after.
gesman 1758 days ago
I think there is a grain of a good idea here.
As i see it - new, "free search" internet would be a specially formatted content for each page published that will make it content easily searchable. Likely some tags within existing HTML content to comply with new "free search" standard.
Open source, distributed agents would receive notifications about new, properly formatted "free search" pages and then index such page into the public indexed DB.
Any publisher could release content and notify closest "free search" agent.
Then - just like a blockchain - anyone could download such indexed DB to do instant local searches.
There will be multiple variations of such DB - from small ones (<1TB) to satisfy small users giving just "titles" and "extracts" to large ones who need detailed search abilities (multi TB capacity).
"Free search", distributed agents will provide clutter-free interface to do detailed search for anyone.
I think this idea could easily be pickup up pretty much by everyone - everyone would be interested to submit their content to be easily searchable and escape any middlemen monopoly that is trying to control aspects of searching and indexing.
hokus 1758 days ago
https://tools.ietf.org/html/rfc1436
salawat 1758 days ago
The problem isn't search engines per se.
The problem is closed algorithms, SEO, and advertising/marketing.
Think about it for a minute. Imagine a search engine that generates the same results for everyone. Since it gives the same results for everyone, the burden of looking for exactly what you're looking for is put back exactly where it needs to be, on the user.
The problem though, is you'll still get networks of "sink pages" that are optimized to show up in every conceivable search, that don't have anything to do with what you're searching for, but are just landing pages for links/ads.
Personally, I liked a more Yellow Pageish net. After you got a knack for picking out the SEO link sinks, and artificially disclose them, you were fine. I prefer this to a search provider doing it for you because it teaches you, the user, how to retrieve information better. This meant you were no longer dependant on someone else slurping up info on your browsing habits to try to made a guess at what you were looking for.
tablethnuser 1758 days ago
One way to replace search is to return to curation by trusted parties. Rather than anyone putting a web page up and then a passive crawler finding it and telling everyone about it, (why should I trust any search engine crawler,) we could "load" our search engine with lists of websites. These lists are published and maintained by curators that we have explicitly chosen to trust. When we type into the search box it can only return results from sites present on our personal lists.
e.g. someone's list of installed lists might look like:
- New York Public Library reference list
- Good Housekeeping list of consumer goods
- YCombinator list of tech news
- California education system approved sources
- Joe Internet's surprisingly popular list of JavaScript news and resources
How do you find out about these lists and add them? Word of mouth and advertising the old fashioned way. Marketplaces created specifically to be "curators of curators". Premium payments for things like Amazing Black Friday Deals 2019 which, if you liked, you'll buy again in 2020 and tell your friends.
There are two points to this. First, new websites only enter your search graph when you make a trust decision about a curator - trust you can revoke or redistribute whenever you want. Second, your list-of-lists serves as an overview of your own biases. You can't read conspiracy theory websites without first trusting "Insane Jake's Real Truth the Govt Won't Tell You". Which is your call to make! But at least you made a call rather than some outrage optimizing algorithm making it for you.
I guess this would start as a browser plugin. If there's interest let's build it FOSS.
Edit: Or maybe it starts as a layer on top of an existing search engine. Are you hiring, DDG? :P
[-]
- joaobeno 1758 days ago
  Kind of like the Good Old Days™? Jokes aside, Search engines are for stuff we don't know... I get my news from some sites, this one for example, and I don't depend o Google for that... But when I want to know about the viability of zipping a UTF-8 encoded text to save space on my DB, there is no way to get may answer without a search engine...
  [-]
  - tablethnuser 1758 days ago
    If there's anything the modern internet has taught me it's that the Good Old Days were doing some things right! We don't know how to scale trust to internet-sized communities yet so a little tribalism may be warranted.
    The way this solution solves the I Don't Know What I Don't Know Problem is by making you curate your own list of experts. For your example query, a colleague may have told you about a popular list that thousands of DBAs subscribe and contribute to. So when you search that query it has the sites to crawl and find the material
- greglindahl 1758 days ago
  You're re-inventing the blekko search engine.
dpacmittal 1758 days ago
Why don't we get rid of tracking instead of getting rid of search engines. Why can't I just have my ad settings set by myself. I should be able to say, I'm interested in tech, fashion, watches, online backup solutions etc. Show me only these ads. It would get rid of all kinds of tracking.
Can anyone tell me why such an approach wouldn't work?
[-]
- nexuist 1758 days ago
  I could put "aviation" as one of my interests but I'm nowhere close to being able to afford a plane or any aviation related accessories unless they're R/C models (even then that's pushing it).
  Just because I see ads I'm interested doesn't mean I'll want to buy what they're selling. Whereas if a system that tracked me can deduce that I'm a private pilot, it can make an educated guess towards my income and adjust the type of items it shows me correspondingly. I doubt many people would be willing to provide this information (demographics, location, income) that advertisers care about most.
8bitsrule 1757 days ago
IME, searching by collections of keywords has become a good strategy. Avoiding using vague/topical keywords ('music', 'chemical'), instead asking for specific words that should/must be found in the search results. If the results start to exclude an important keyword (e.g. '1872' or 'giant' or 'legend'), put a plus sign in front of it and resubmit.
I regularly use DDG (which claims privacy) for this, and requests can be quite specific. E.g. a quotation "these words in this order" may result in -no result at all-, which is preferable to being second-guessed by the engine.
I wonder how 'search engines are not required' would work without expecting the searcher to acquire expertise in drilling down through topical categories, as attempts like 'http://www.odp.org/' did.
gexla 1758 days ago
Good question. I'm going to run an experiment.
First "go-to" for search will be my browser history.
As long as the site I know I'm looking for is in my browser history, then I'll go there and use the search feature to find other items from that site.
Bookmark all the advanced search pages I can find for sites I find myself searching regularly.
Resist mindless searching for crap content which usually just takes up time as my brain is decompressing from other tasks.
For search which is more valuable to me, try starting my search from communities such as Reddit, Twitter or following links from other points in my history.
Maybe if it's not worth going through the above steps, then it's not valuable enough to look up?
NOTE: Sites such as Twitter may not be much better than Google, but I can at least see who is pushing the link. I can determine if this person is someone I would trust for recommendations.
I bet if I did all of the above, I could put a massive dent in the number of search engine queries I do.
Any other suggestions?
[-]
- roveo 1758 days ago
  I think you're onto something.
  You could create a local search index built around your browser history. Then you could create a digital fingerprint-profile around it (still local). And then query other people's histories, that are similar to yours, in a DHT-address fashion.
- KirinDave 1758 days ago
  > but I can at least see who is pushing the link. I can determine if this person is someone I would trust for recommendations.
  This doesn't seem true at all to me. Twitter dramatically shapes and modifies timelines to promote whatever they want. They're even more aggressive on modifying the search experience.
  All of those constraints are invisible. It's dangerous to think you have more control or insight there.
  [-]
  - gexla 1758 days ago
    Twitter has a social graph and communities much like Reddit. This adds more information. There are people posting information who I trust and even know IRL.
    > All of those constraints are invisible. It's dangerous to think you have more control or insight there.
    And yet you are commenting as if these results aren't invisible to you? The machinery behind Google search results aren't invisible? Are you trying to say that one invisible thing is more "X" than another invisible thing?
    [-]
    - KirinDave 1758 days ago
      > And yet you are commenting as if these results aren't invisible to you? The machinery behind Google search results aren't invisible?
      Because of my unique and fortunate work history I understand the internals of these systems better than many people do. I'm objecting to the distinction you're drawing, not suggesting an alternative order of transparency. There really isn't much difference between the two companies output in the regard we're discussing.
      [-]
      - gexla 1758 days ago
        I agree that Twitter may not be much different from Google if you are relying on the algo. Notice that I mentioned Twitter as a community next to Reddit though. Everyone has different usage patterns for these services. I follow (and have been an active participant) in a number of niche communities on Twitter. In some cases, I could do a search for a term and most results would be from people I have interacted with through Twitter and other channels. Each of those people carried a reputation within that niche and some I knew better than others. I wouldn't use Twitter as a general search returning a large number of untrusted results. Sure, even search results on a specific user could be biased, but at least it would be from familiar territory.
ex3xu 1758 days ago
Like others here I don't have too much problem with indexing.
What I would like to see is a human layer of infrastructure on top of algorithmic search, one the leverages the fact that there are billions of people who could be helping others find what they need. That critical mass wasn't available at the beginning of the internet, but it certainly is now.
You kind of have attempts at this function in efforts like the Stack Exchange network, Yahoo Questions, Ask Reddit, tech forums etc. but I'd like to see more active empowerment and incentivization of giving humans the capacity to help other humans find what they need, in a way that would be free from commercial incentives. I envision stuff like maintaining absolutely impartial focus groups, and for commercial search it would be nice to see companies incentivized to provide better quality goods to game search rather than better SEO optimization.
ntnlabs 1758 days ago
How about this: Internet as a service. Instead of looking for answers You will "broadcast" Your needs. Like "I need a study about cancer". And You will receive a list of sources that answered Your question. maybe with some sort of decentralised rating and maybe Country and author. How about that?
[-]
- pjc50 1758 days ago
  Broadcasting your searches seems, if anything, even worse for privacy, and an invitation for just-in-time spam.
  [-]
  - swalsh 1758 days ago
    That's a workable issue. If a random and unique guid is asking for results, it would be hard to correlate users.
    Of course there would defiantely be an issue with how you generate the guid (for example if it was generated by the users MAC + some predictable random number generator that might be reversible). So you would keep that in mind. But these seem like workable issues.
desc 1758 days ago
As others have commented, the problem here is the ranking algorithm and how it can be gamed. Essentially, trust.
'Web of trust' has its flaws too: a sufficiently large number of malicious nodes cooperating can subvert the network.
However, maybe we can exploit locality in the graph? If the user has an easy way to indicate the quality of results, and we cluster the graph of relevance sources, the barrier to subverting the network can be raised significantly.
Let's say that each ranking server indicates 'neighbours' which it considers relatively trustworthy. When a user first performs a search their client will pick a small number of servers at random, and generate results based on them.
* If the results are good, those servers get a bit more weight in future. We can assume that the results are good if the user finds what they're looking for in the top 5 or so hits (varying depending on how specific their query is; this would need some extra smarts).
* If the results are poor (the user indicates such, or tries many pages with no luck) those servers get downweighted.
* If the results are actively malicious (indicated by the user) then this gets recorded too...
There would need to be some way of distributing the weightings based on what the servers supplied, too. If someone's shovelling high weightings at us for utter crap, they need to get the brunt of the downweighting/malice markers.
Servers would gain or lose weighting and malice based on their advertised neighbours too. Something like PageRank? The idea is to hammer the trusting server more than the trusted, to encourage some degree of self-policing.
Users could also chose to trust others' clients, and import their weighting graph (but with a multiplier).
Every search still includes random servers, to try to avoid getting stuck in an echo chamber. The overall server graph could be examined for clustering and a special effort made to avoid selecting more than X servers in a given cluster. This might help deal with malicious groups of servers, which would eventually get isolated. It would be necessary to compromise a lot of established servers in order to get enough connections.
Of course, then we have the question of who is going to run all these servers, how the search algorithm is going to shard efficiently and securely, etc etc.
Anyone up for a weekend project? >_>
gist 1758 days ago
This is to broad a question to answer. There are really to many different uses of the Internet to try and fashion a solution that works in all areas. Not to mention the fact that it's to academic to begin with. How do you get such a large group of people to change a behavior that works for them already? And very generally most people are not bothered by the privacy aspect as much as tech people (always whining about things) are or even the media. People very generally like they can get things at no cost and don't (en masse) care anywhere near as much about being tracked as you have been led to believe. And that's when tracking is not even benefiting them which it is often. This is not 'how can we eliminate robo calls'. It's not even 'how can we eliminate spam'.
Havoc 1758 days ago
Seems unlikely. Search engines solves a key problem.
To me they are conceptually not the problem. Nor is advertising
This new wave of track you everywhere with ai brand of search engines is an issue though. They’ve taken it too far essentially.
Instead of respectable fishing they’ve gone for kilometer long trawling nets that leave nothing in their wake
hideo 1758 days ago
This isn't an entire solution, but Van Jacobson's Content-Centric Networking concept is fascinating, especially when you consider its potential social impact compared to the way the internet exists today
https://www.cs.tufts.edu/comp/150IDS/final_papers/ccasey01.2... http://conferences.sigcomm.org/co-next/2009/papers/Jacobson....
munchausen42 1758 days ago
To get rid of search engines like Google and Bing we don't need to build a new internet - we just need to build new search engines.
E.g., how about an open source spider/crawler that anyone can run on their own machine continuously contributing towards a distributed index that can be queried in a p2p fashion. (Kind of like SETI@home but for stealing back the internet).
Just think about all the great things that researchers and data scientists could do if they had access to every single public Facebook/Twitter/Instagram post.
Okayokay ... also think about what Google and FB could do if they could access any data visible to anyone (but let's just ignore that for a moment ;)
[-]
- asdff 1758 days ago
  You know google has been crawling for years and probably already has accessed any public data
nonwifehaver3 1758 days ago
Yes, out of sheer necessity. Search results have become either a crapshoot when looking for commercially adjacent content due to SEO, or “gentrified” when looking for anything even remotely political, obscure, or controversial. Google used to feel like doing a text search of the internet, but it sometimes acts like an apathetic airport newsstand shopkeeper now (& with access to only the same books and magazines).
Due to this I think people will have to use site-specific searches, directories, friend recommendations, and personal knowledge-bases to discover and connect things instead of search engines.
cy6erlion 1757 days ago
I think there is only two options.
1) Have an index created by a centralized entity like google 2) Have the nodes in the network create the index
The first option is the easiest but can be biased on who gets to be on the index and their position on the index.
Option two is hard because we need a sort of mechanism to generate the index from the subjective view of the nodes in the network and sync this to everyone in the network.
The core problem here is not really the indexing but the structure of the internet, domains/websites are relatively dumb they can not see the network topology, indexing is basically trying to create this topology.
JD557 1758 days ago
You could use something like Gnutella[1], where you flood the network with your query request and that request is then passed along nodes.
Unfortunately (IIRC and IIUC how Gnutella works), malicious actors can easily break that query schema : just reply to all query requests with your malicious link. I believe this is how pretty much every query in old Gnutella clients returned a bunch of fake results that were simply `search_query + ".mp3"`.
1: https://en.wikipedia.org/wiki/Gnutella
quickthrower2 1758 days ago
Search engines are not required: there are directories out there with tonnes of links. It is just that search engines are damn convenient. And googles search is light years ahead of any websites own search.
[-]
- barrystaes 1758 days ago
  I turn to search engines mostly when entering a new knowledge domain! E.g. learn about a specific product. Sometimes when im just lazy.
  For routine stuff i tend to have established resource starting points, like documentation, official/community sites, blogs/news feeds, and yes: link directories (like awesome lists).
oever 1758 days ago
The EU has a funding call open for Search and Discovery on the Next Generation Internet.
https://nlnet.nl/discovery/
inputcoffee 1758 days ago
It was thought that one way of finding information is to ask your network (Facebook and Twitter would be examples), and then they would pass on the message and a chain of trusted sources would get the information back to you.
I am being purposefully vague because I don't think people know what an effective version of that would look like, but its worth exploring.
If you have some data you might ask questions like:
1. Can this network reveal obscure information?
2. When -- if ever -- is it more effective than indexing by words?
[-]
- dbspin 1758 days ago
  This seems significantly laborious. Not sure that the utility of this kind of network recommendation scales to incentivise participation beyond a few people. i.e.: We already have user groups on sites like reddit, FB etc, where experts or enthusiasts answer questions when they feel like it. But this is a slow process that relies on a group that contains enough distributed knowledge, but isn't overwhelmed with inquiries. As a counter example, the /r/BuildaPC subreddit long ago exceeded the size where it could answer a significant proportion of build questions, and most remain unanswered despite significant community engagement.
  Not convinced any kind of formalised 'question answering network' could replace search. It would be both slow, and require an enormous asymmetric investment of time, for a diffuse and unspecified reward.
  [-]
  - inputcoffee 1758 days ago
    I don't think it would be questions.
    Suppose you like fountain pens, and you recommend certain ones. One of your friend looks for fountain pens that their friends recommend and finds the ones you like.
    That is just one example of things that don't require explicit questions.
    Another one might be you have searched for books or other things and then they follow the same "path". So long as you have similar interests it might work.
    People haven't solved this issue, but there is a lot of research out there on networks of connections potentially replacing certain kinds of search.
ninju 1758 days ago
I find myself not need to do a 'generic' Internet search that much anymore
For long-term facts and knowledge lookup: Wikipedia pages (with proper annotation)
For real-time World happens: A mix of direct news websites
For random 'social' news: <-- the only time I direct direct Google/Bing/DDG search
The results from the search engines nowadays are so filled with (labeled) promoted results and (un-labeled) SEO results that I have become cynical and jaded to the value of the results
jka 1758 days ago
There'd be a feedback loop problem, but are DNS query logs a potential source of ranking/priority?
Over time the domains that users genuinely organically visit (potentially geo-localized based on client location) should rise in query volume.
Caveats would include DNS record cache times, lookups from robots/automated services, and no doubt a multitude of inconsistent client behavior oddities.
A similar approach could arguably be applied even at a network connection log level.
mahnouel 1758 days ago
Maybe I'm missing the point. But Instagram, Facebook, Twitter - all of them are not mainly experienced through search but through a feed of endless content, curated by an algorithm. Most regular users don't even search that often, they consume. Maybe there could be an decentralized Internet where you follow specific handles and then they bring their content into your main "Internet" aka feed (= user friendlier RSS).
z3t4 1758 days ago
An idea I've had for a long time is a .well-known/search standard (REST) endpoint. Where your browser, or a search aggregator combines results from many sites like stack overflow, MDN, news sites, i duvidual blogs, etc. That way search engines doesnt have to create a index. It would be up to the sites to create the search result. This means searching would be parallel and distributed.
epynonymous 1757 days ago
my ideal internet would be more like a set of concentric rings per user, a ring would represent different preferences, filters, and data, i could choose to include certain users access to parts of my rings, and i could access other parts of other user's rings. obviously there should be an open ring that every user can access which would need a search engine run by a company or set of companies, this would be like today's internet, but that would not be the same ring, i could switch between rings with ease. i think this maybe somewhat what tim berners lee is doing with the decentralized web, or perhaps bits of dark net interwoven with the internet.
an example use case would be like a set of apps that my family could use for photo sharing, messaging, sending data, links to websites, etc. perhaps another set of apps for my friends, another for my company, or school. the protocols would not require public infrastructure, dns, etc. perhaps tethering of devices would be enough. there would be a need for indexing and search, email, etc.
sktrdie 1758 days ago
I feel like Linked Data Fragments provides a solution to this: http://linkeddatafragments.org/
You're effectively crawling portions of the web based on your query, at runtime! It's a pretty neat technique. But you obviously have to trust the sources and the links to provide you with relevant data.
Johny4414 1758 days ago
What about Xanadu? Internet is very broken but almost no one seems to care (for a reason). Idea of more p2p web is there for a while but at the end of the day user don't care to much about anything so it probably never happen.
https://en.wikipedia.org/wiki/Project_Xanadu
[-]
- coldtea 1758 days ago
  Xanadu assumes good players. It will be decimated by the very first spammer / advertiser that appears...
  It's a vision for an academic, small scale, network, not for a viable global web.
CapitalistCartr 1758 days ago
I've said this before: I dearly miss Alta Vista. It indexed, but the user had to provide the ranking, which required actually thinking about what was wanted. I would construct searches of the pattern (word OR word) And (word NEAR word) with great success. Naturally Google, requiring far less thinking to use, won.
politician 1758 days ago
Lately, I've been turning over an idea that in order to advance, the next generation of the Internet should be designed so that third-party advertising is impossible to implement. I believe that as a consequence, this requirement will prevent crawler-based search engines from operating which presents a source discovery problem.
Discovering new sources of information in this kind of environment is difficult, and basically boils down to another instance of the classic key distribution problem - out-of-band, word-of-mouth, and QR codes.
Search engines like Google and Bing solve the source discovery problem by presenting themselves as a single source; aggregating every other source through a combination of widespread copyright infringement and an opaque ranking algorithm.
Google and Bing used to do a great job of source discovery, but the quality of their results have deteriorated under relentless assaults from SEO and Wall Street.
I think it's time for another version of the Internet where Google is not the way that you reach the Internet (Chrome) or find what you're looking for on the Internet (Search) or how you pay for your web presence (Adsense).
BerislavLopac 1758 days ago
We already have it, and it's called BitTorrent. DNS as well.
What you call Internet is actually World Wide Web, just another protocol (HTTP) on top of Internet (TCP/IP), which was designed to be decentralised but lacked any worthwhile discovery mechanism before two students designed the BackRub protocol.
wsy 1758 days ago
To everybody who wants to tackle this challenge: start by considering how you would protect your 'new internet' against SPAM and SEO attacks.
For example, if you build on a decentralized network, ask yourself how you can prevent SEO companies from adding a huge amount of nodes to promote certain sites.
rayrrr 1758 days ago
There's been a few mentions of the PageRank algorithm already...FWIW, Google's patent just expired. https://patents.google.com/patent/US6285999B1/en
qazpot 1758 days ago
See Ted Nelson's Xanadu Project - https://en.wikipedia.org/wiki/Project_Xanadu#Original_17_rul...
Point 4 allows a user to search and retrieve documents on the network.
hayksaakian 1758 days ago
If you look at usage patterns, social media has replaced search engines for many use cases.
For example, if you want to know where to eat tonight, instead of searching "restaurants near me" you might ask your friends "where should I eat tonight" and get personalized suggestions.
weliketocode 1758 days ago
Your two points really don’t fit with your follow-up explanation.
If you don’t believe finding information is currently trivial using Google, that’s going to be a tough nut to crack.
What would you use for information retrieval that doesn’t involve indexing or a search engine?
garypoc 1758 days ago
We would still need search engines, but we could change the business model. For example we could make a protocol to associate URL with content and search keywords. Something similar to DNS associated with distributed Elasticsearch servers
lowcosthostings 1756 days ago
The good one post which you have to share. https://www.lowcostwebhostings.com/dealstore/webhostingpad
fooker 1758 days ago
I'll be pessimistic here and say no, that is an impossible pipe dream. For any such system design you can come up with, a centralized big brother controller system will be more more efficient and have better user experience.
siliconc0w 1758 days ago
You could make a browser plugin that effectively turned everyone into a spider that sent new chunks of the index to some decentralized blockchain-esque storage system for all to query with its own blockchain-esque micro payments
tmaly 1757 days ago
I think once really good AI becomes a commodity and can fit in your phone AND
Once we have really fast 5?G networks, there is a good possibility that some type of distributed mesh type search solution could replace the big players.
Advaith 1758 days ago
I think this is the long game with respect to blockchains and establishing trust in general.
You will be able to trust data and sources instantly. There will be no intermediaries and trust will be bootstrapped into each system.
blackflame7000 1758 days ago
What if we just make a program that Googles a bunch of random stuff constantly so that there is so much garbage in their algorithms that they can't effectively figure out real vs synthetic searches.
nobodyandproud 1758 days ago
We need an alternative internet where anonymity between two parties is impossible.
Not a place for entertainment, but where government or business transactions can be safely conducted.
A search engine would be of secondary importance.
reshie 1758 days ago
i guess if we had a highly regulated and one site for one type of service it would be possible but i would not really want that. you could have a algorithm that would parse your query and send you directly to a site of course it could get it wrong where you may need to refine you query just like now sometimes. of course thats still a search engine but more direct. bookmarks are already a form of web without re-searching.
it sounds like what you really want is a decentralized search engine and anonymous by default as apposed to no search engine.
[-]
- orky56 1758 days ago
  AOL [1] among others in the early days did exactly that. Without using a search engine, you could access whatever type of content you wanted. Similar to a communism vs capitalism argument, you don't quite the same amount of variety but you trade that for instant access to what you need.
  [1][https://www.trbimg.com/img-5320a78f/turbine/orl-0312aol-1996...]
paparush 1758 days ago
We could go back to Gopher.
Papirola 1758 days ago
I still remember gopher https://en.wikipedia.org/wiki/Gopher_(protocol)
[-]
- Jaruzel 1758 days ago
  Some of us still use it...
  Shameless plug: http://www.jaruzel.com/gopher/gopher-client-browser-for-wind...
Isamu 1758 days ago
That was the original Internet. Search engines evolved to make finding things possible.
Another original intent: that URLs would not need to be user-visible, and you wouldn't need to type them in.
truckerbill 1758 days ago
We could try and revive and improve the web-ring concept. Or more simply, convince the community to dedicate a page of their site linking to other related/relevant sites.
[-]
- pbhjpbhj 1758 days ago
  Webrings are still there, they're just implicit. People link within their content to the same resources over-again, or have more explicit footer blocks or aside link stacks.
  Search engines use this structure for domain authority.
  A search for "link:example.com -site:example.com" would have found that webring in the past.
thedevindevops 1758 days ago
You want to create another https://en.wikipedia.org/wiki/Deep_web ?
ken 1758 days ago
Is this the same as asking if we can create a telephone system with no phone books, or a city with no maps? Where is our shared understanding of the system's state?
kazinator 1758 days ago
Can you walk through a complete use case?
A user wants to find a "relevant document".
What is that? What information does the user provide to specify the document?
Why does the user trust the result?
bitL 1758 days ago
How can I help? Dumped most of centralized solutions in favor of self-hosted (mostly ActivityPub-based) services and still can't get rid of search.
comboy 1758 days ago
I'm too late, but yes, it is not easy but it definitely seems doable: http://comboy.pl/wot.html
I'm sorry it's a bit long, TL;DR you need to be explicit about people you trust. Those people do the same an then thanks to the small world effect you can establish your trust to any entity that is already trusted by some people.
No global ranking is the key. How good some information is, is relative and depends on who do you trust (which is basically form of encoding your beliefs). And yes, you can avoid information bubble much better than now but writing more when I'm so late to the thread seems a bit pointless.
FPurchess 1758 days ago
I wonder if we could rearranged the internet as decentralised nodes exchanging topic maps which then can be queried in a p2p fashion.
[-]
- dredds 1758 days ago
  I've always imagined a kinda cross between Solid (ontology mapping) and Zeronet (seed hosted) with perhaps Dat for social (mutability) where crowd navigation determines the relations as a feedback. (original pagerank was a simplified version of such)
otabdeveloper4 1758 days ago
Yes, it's called "Facebook", and it already exists.
Probably not what you had in mind, though. Be careful what you wish for.
xorand 1758 days ago
Two-way links would help. I can't locate the information now but it seems that it was proposed initially.
robot 1758 days ago
It is a huge problem. It's not possible to fix it some other way without putting in the same effort.
buboard 1758 days ago
Didnt we? It is called "ask your friends". It s a great way to turn your friends into enemies.
ISNIT 1758 days ago
Maybe we should all just learn a graph query language and live on WikiData ;)
amelius 1758 days ago
Are any academic groups still researching search engines?
ptah 1758 days ago
I guess nowadays the web IS the internet
sys_64738 1758 days ago
Yes because nobody will be using it.
peterwwillis 1758 days ago
tl;dr the problems are 1) relevancy, 2) integrity, 3) content management/curation.
If you've ever tried to maintain a large corpus of documentation, you realize how incredibly difficult it is to find "information". Even if I know exactly what I want.... where is it? With a directory, if I've "been to" the content before, I can usually remember the path back there... assuming nothing has changed. (The Web changes all the time) Then if you have new content... where does it go in the index? What if it relates to multiple categories of content? An appendix by keyword would get big, fast. And with regular change, indexes become stale quickly.
OTOH, a search engine is often used for documentation. You index it regularly so it's up to date, and to search you put in your terms and it brings up pages. Problem is, it usually works poorly because it's a simple search engine without advanced heuristics or PageRank-like algorithms. So it's often a difficult slog to find documentation (in a large corups), because managing information is hard.
But if what you actually want is just a way to look up domains, you still need to either curate an index, or provide an "app store" of domains (basically a search engine for domain names and network services). You'd still need some curation to weed out spammers/phishers/porn, and it would be difficult to find the "most relevant" result without a PageRank-style ordering based on most linked-to hosts.
What we have today is probably the best technical solution. I think the problem is how it's funded, and who controls it.
fergie 1758 days ago
Author of the npm module search-index here.
"1- Finding information is trivial"
The web already consists, for the most part, of marked up text. If speed is not a contraint, then we can already search through the entire web on demand, however, given that we dont want to use 5 years on every search we carry out, what we really need is a SEARCH INDEX.
Given that we want to avoid Big Brother like entities such as Google, Microsoft and Amazon, and also given, although this is certainly debatable, that government should stay out of the business of search, what we need is a DECENTRALISED SEARCH INDEX
To do this you are going to need AT THE VERY LEAST a gigantic reverse index that contains every searchable token (word) on the web. That index should ideally include some kind of scoring so that the very best documents for, say, "banana" come at the top of the list for searches for "banana" (You also need a query pipeline and an indexing pipeline but for the sake of simplicity, lets leave that out for now).
In theory a search index is very shardable. You can easily host an index that is in fact made up of lots of little indexes, so a READABLE DECENTRALISED SEARCH INDEX is feasable with the caveat that relevancy would suffer since relevancy algorithms such as TD-IDF and Page Rank generally rely on an awareness of the whole index and not just an individual shard in order to calculate score.
Therefore a READABLE DECENTRALISED SEARCH INDEX WITH BAD RELEVANCY is certainly doable although it would have Lycos-grade performance circa 1999.
CHALLENGES:
1) Populating the search index with be problematic. Who does it, how they get incentivized/paid, and how they are kept honest is a pretty tricky question.
2) Indexing pipelines are very tricky and require a lot of work to do well. There is a whole industry built around feeding data into search indexes. That said, this is certainly an area that is improving all the time.
3) How the whole business of querying a distributed search index would actually work is an open question. You would need to query many shards, and then do a Map-Reduce operation that glues together the responses. It may be possible to do this on users devices somehow, but that would create a lot of network traffic.
4) All of the nice, fancy schmancy latest Google functionality unrelated to pure text lookup would not be available.
"2- You don't need services indexing billions of pages to find any relevant document"
You need to create some kind of index, but there is a tiny sliver of hope that this could be done in a decentralized way without the need for half a handful of giant corporations. Therefore many entities could be responsible for their own little piece of the index.
sonescarol 1758 days ago
l
RyanAF7 1757 days ago
What a lot of these comments are missing is the "new" part of the OP's question.
The "internet" is a term used to describe connected devices which use a common networking protocol.
The "web" is the domain/namesever www. set of text pages available for access on the internet.
The "Google-verse" is what has become of the web and the internet in which the accessible sites are those that play the game Google created.
So, to answer your question, without getting stuck in the weeds, yes. You can create a new "internet" without Google.
Will anyone want to use it? Well... that depends on how you create it.
The ability to leverage a new Web depends on how well the innovation incentivizes adaption and thus creates an exponential network effect.
Search <>, !=, =\=, .NE. the internet.
nairobi222 1757 days ago
That would bring to an even bigger filter bubble issue, more precisely to a techno élite which is capable, willing and knowledgeable enough to feel the need go through the hassle, and all the rest navigating in such an indexed mess that would pave the way to all sort of new gatekeepers, belonging to the aforementioned tech élite. It’s not a simple issue to tackle, perhaps a public scrutiny on the ranking algorithms would be a good first step.
Timunlimitedla4 1755 days ago
Raymond Harleysville is reliable yet fast when it comes to Fake IDs with authentic Social security numbers, Buying airline tickets for cheap Security Breaches and Revenge Hacks (for cheating partners),Security and Penetration testing, Credit Fix and to increase School Grades, Increase Social media followers. *Blank Atm. Contact him on raycreditrebuild@gmail.com or text 231 419-4109 for more inquiries.
wjmao88 1758 days ago
A similar, simplified form of this question is: How would you navigate a large code base without using an IDE? I think it would have a high requirement on the structure of the files and the architecture of the code, neither of which can be achieved with what we have on the internet today.
sonnyblarney 1758 days ago
G is apparently losing a lot of product related search to Amazon, I suggest that the 'siloing' of the web, for better or worse, might yield some progress here.
i.e. when you search, you start in a relevant domain instead of Google so Amazon for products, Stack Exchange for CS questions.
Obviously not ideal either.
diminoten 1758 days ago
No. Search is a consequence of data volume.
beezlebubba 1758 days ago
AOL?
wfbarks 1758 days ago
a New New Internet
bluecitron 1758 days ago
I think so.
bluecitron 1758 days ago
kkk
codegladiator 1758 days ago
No
nojobs 1758 days ago
Also we should keep hiding from big brothers to save our data from companies and government and pay for it. VPN I mean. But first you need to find a proper one, I waste enough time on it. https://vpn-review.com/found one here
drenvuk 1758 days ago
Finding information has never been trivial and until you can read people's minds to see what they really mean when they search for 'cookies' when they really mean "how to clear my internet browsing history for the past hour" it will continue to be non-trivial. The work Google has done in the search space is damn near magical. Your question belittles the literal billions of dollars and millions of man hours that have gone into making the current and previous implementations of Google's search engine almost good enough.
This is not simple, and your Ask HN reeks of ideology and contempt without so much as an inkling of the technical realities that would have to be overcome for such a thing to happen. That goes for both old and new internet.
/rant
[-]
- boblebricoleur 1758 days ago
  > Your question belittles the literal billions of dollars and millions of man hours that have gone into making the current and previous implementations of Google's search engine
  I don't think this question belittles Google's work.
  I feel saying that would be like saying that animals that chose to live on the land were belittling millions of years of evolution in the water.
  People working at Google chose to spends their time building a search engine for the world wide web, fine. That does not mean that sharing information accross a network has to be done via { world wide web, google }.
  All of this is purely theroical of course, but I'm sure someone more creative than me would find another solution. Maybe not a solution that would exactly fit OP's description, maybe not a solution that would be practical with the current infrastructure.
  But a solution that would render Google as-is obsolete ? Yes, I think that would be possible.
- cf141q5325 1758 days ago
  Tbh I have the feeling search results get a lot worse as a result of google trying to guess what I am looking for. The occurrences of me not finding stuff where I know I used the right search terms increased a lot recently. If I search for cookies I want the results for cookies and nothing else. If the first few results ignore part of my search query something is seriously wrong.