Four years ago, before Alexa Skills or SiriKit were released, we hacked third-party commands into Siri without jailbreaking (https://www.wired.com/2014/04/googolplex/). It was the first App Store for voice commands. Since then, we’ve worked on natural language interfaces at Google and on Apple's Siri. Now we're tackling the next problem: products using NLP are fairly simplistic in what they can do for users. For example, systems like Siri still struggle to directly answer a basic question like "When is the Y Combinator application due?" because they can't understand and reason about where the answer may lie in a sentence on Y Combinator's website.
We’re approaching the problem differently by understanding the structure of language and relationships within text, instead of relying on more simplistic methods like keyword matching. We build a graph of entities and their relationships within a sentence along with other linguistic information. You can think of it as “Open Information Extraction” with a lot more information (https://www.plasticity.ai/api/demo).
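As a toy illustration of the output shape (not Plasticity's actual algorithm, which builds a much richer graph from a full parse), an Open IE system turns a sentence into subject–relation–object triples:

```python
# Toy Open IE sketch: extract a (subject, relation, object) triple
# from a simple subject-verb-object sentence by splitting on a known
# verb. Real systems derive this from a syntactic parse; this only
# illustrates the kind of structure returned.

def extract_triple(sentence, verbs):
    """Split a simple SVO sentence on its first known verb."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in verbs:
            subject = " ".join(words[:i])
            obj = " ".join(words[i + 1:])
            return (subject, word, obj)
    return None

triple = extract_triple("The band played Let It Be", {"played"})
print(triple)  # ('The band', 'played', 'Let It Be')
```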
Currently, we use a TensorFlow model to perform classical tasks like part-of-speech tagging, tokenization, and syntax dependency parsing. We built our own Wikipedia crawler for data to better handle chunking and disambiguation, which helps return more accurate results for multi-word entities in sentences like "The band played let it be by the beatles." We wrote our open IE algorithms from scratch, focusing on speed. Everything is written in C++, and we are adding more features every day.
Our public APIs are in beta right now. We’re constantly working to improve accuracy, and we’re looking forward to hearing feedback. We’d love to hear what the HN community is working on with NLP and how we can help!
Also: "What is the largest city in Europe?" -> "New York City".
"What is the largest city in the world?" -> "Gotham City"
So it seems to make KB lookup errors and probably can't do logic/set operations.
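To make concrete the kind of logic/set operation the parent is describing, answering "largest city in Europe" amounts to filtering a knowledge base by region and taking a maximum. A minimal sketch (the city data here is illustrative, not an authoritative KB):

```python
# Toy knowledge-base sketch: a superlative question as a set
# operation -- filter entities by region, then take the max by
# population. Population figures are illustrative only.

cities = [
    {"name": "Istanbul", "region": "Europe", "population": 15_462_000},
    {"name": "Moscow",   "region": "Europe", "population": 12_506_000},
    {"name": "London",   "region": "Europe", "population": 8_982_000},
    {"name": "Tokyo",    "region": "Asia",   "population": 13_960_000},
]

def largest_city(region):
    candidates = [c for c in cities if c["region"] == region]
    return max(candidates, key=lambda c: c["population"])["name"]

print(largest_city("Europe"))  # Istanbul
```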
How old is the French Prime-Minister? How old is the Portuguese President?
President always defaults to Trump and Prime-Minister to May (May also returns two different results even though they show the same text/source?). Also, in Sapien "Prime-Minister" wasn't recognized.
I'm very excited about technologies like this one.
Good catch on "Prime-Minister", we will patch that.
There are definitely some questions (e.g. earth age) that we aren't as good at right now, but we're improving those!
Fun to play with!
Winnie - The - Pooh
It feels like you've reinvented much by writing stuff from scratch. spaCy is fast, has tons of features, is frequently updated, free, and trained on the Common Crawl corpus. Why not just use that? I'm only curious, not critical.
Fair question. We think spaCy is great, but it made a lot of sense for us to start with the basics so that we could modify things as needed. For example, our tokenization and syntax dependency tree algorithms treat "let it be" in "The band played let it be by the beatles." as a single chunk, returning a more accurate syntax dependency tree, which Google Cloud NL and spaCy don't do out of the box today.
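One simple way to get that behavior, sketched here as a toy longest-match gazetteer (Plasticity's real chunking and disambiguation, trained on Wikipedia data, is far more involved):

```python
# Toy multi-word chunking sketch: greedily match the longest known
# entity from a gazetteer so "let it be" becomes one token instead
# of three.

KNOWN_ENTITIES = {("let", "it", "be"), ("the", "beatles")}
MAX_LEN = max(len(e) for e in KNOWN_ENTITIES)

def chunk_tokenize(sentence):
    words = sentence.rstrip(".").lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest possible span first, shrinking to 1 word.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in KNOWN_ENTITIES:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(chunk_tokenize("The band played let it be by the beatles."))
# ['the', 'band', 'played', 'let it be', 'by', 'the beatles']
```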
But there’s a dark side to being a Null, and you coders out there are way ahead of me on this. For those of you unwise in the ways of programming, the problem is that “null” is one of those famously “reserved” text strings in many programming languages. Making matters worse is that software programs frequently use “null” specifically to ensure that a data field is not empty, so it’s often rejected as input in a web form."
Maybe it's the use of the word `Null`? Not sure, but love what you're doing and thought I'd let you know about this.
I then tried "Who married the president?" and got the correct responses also.
The only thing I would change is at the bottom of the Plasticity demo you should have a big sign up button. And a link to your documentation.
Text simplification and summarization are great places this technology can be deployed for non-commercial usage. One example is https://newsela.com which provides articles on many different subjects at various reading levels for kids in school. For example, you can adjust the reading level on an article like this:
https://newsela.com/read/lib-convo-europe-invasion-dna/id/33...
Currently, this process is manual. But, our APIs could be used to help automate things like this in the near future. Quick reminder that our APIs are free for open-source or educational purposes. So, if anyone's interested in giving this a go for a hackathon project, you can e-mail me at ajay@plasticity.ai
But we'll offer both just in case.
Ignoring the grammar error: you're helping the government extract information from text? Where exactly? Do you mean the NSA? Do you mean helping the government look at public internet commentary to track citizens?
We don't do anything like that; in fact, we don't work with the government at all right now. We know there are huge applications of this technology in government beyond the Department of Defense. For example, other government agencies like the Census Bureau and the IRS might need to process large corpora of text data.
https://en.m.wikipedia.org/wiki/Evi_(software)
We think being able to understand the semantic meaning behind language, through our graph of relationships and entities in a sentence, is going to be critical in building more robust conversational interfaces. The companies we are talking to now want to use it for things like natural language search or messaging apps.
Of course, we think the knowledge graph is useful as well in democratizing the technology, since WolframAlpha is absurdly expensive ($25-50 CPM) and the Google Knowledge Graph API is limited to 100,000 queries a day, has no option to pay for more, and doesn't handle natural language question answering.
It will be interesting to see how other people apply this without competing with Google, Amazon, or Apple directly.