I notice that Cynthia Rudin is continuing to produce great stuff on explainable AI.
What else is going on that is not GPT/Diffusion/MultiModal?
- 3d scene reconstruction from a few images: https://dust3r.europe.naverlabs.com/
- gaussian avatars: https://shenhanqian.github.io/gaussian-avatars
- relightable gaussian codec: https://shunsukesaito.github.io/rgca/
- track anything: https://co-tracker.github.io/ https://omnimotion.github.io/
- segment anything: https://github.com/facebookresearch/segment-anything
- good human pose estimation models (YOLOv8, Google's MediaPipe models)
- realistic TTS: https://huggingface.co/coqui/XTTS-v2, bark TTS (hit or miss)
- great open STT (mostly Whisper-based)
- machine translation (e.g. SeamlessM4T from Meta)
It's crazy to see how much is coming out of Meta's R&D alone.
They have the money...
The Ones Who Walk Away from Omelas
Dunno how pasting a link works but here it is:
https://shsdavisapes.pbworks.com/f/Omelas.pdf
What does a simplistic moral set piece about the abhorrence of sacrificing the good of one for the good of many have to do with (checks notes) Facebook? Even as vague hand-wavey criticism, wouldn't Facebook be the inverse?
Re: "actually you should just ponder why you are a simpleton who doesn't get it, given other people derived value from how it relates to Facebook": There arent people here running around praising it. The comment 4 up was, and still is downvoted well below 0, there's barely anyone reading all the way down here. Only one other person even bothered replying.
I don't think me mentioning this is useful or fair, but I don't know how else to drive home how little contribution there is in a condescending "think harder; didn't you notice the crowd loves it and understands how it's just like Facebook".
Unless they think to hire new people.
I've played with Bark quite extensively a few months ago and I'm on the fence regarding that model: when it works, it's the best, but I found it to be pretty useless for most use cases I want TTS for, because of the high rate of bad or weird output.
I'm pretty happy with XTTS-v2 though. It's reliable and the output quality is still pretty good.
https://coqui.ai
In theory you can also animate such scenes but how to actually do that is still a research problem.
Whether this will end up being better than really well optimized polygon based systems like Nanite+photogrammetry is also an open question. The existing poly pipes are pretty damn good already.
My limited understanding is that NeRFs are compute-heavy because each cloud point is essentially a small neural network that can compute its value from a specific camera angle. Gaussian splats are interesting since they achieve almost the same effect with a much simpler mechanism, a Gaussian at each cloud point, and can be efficiently computed in real time on a GPU.
While a NeRF can be used to render a novel view of a scene, it can't do so in real time, while Gaussian splats can, which opens up lots of use cases.
There's no point cloud in NeRFs. A NeRF scene is a continuous representation in a neural network, i.e. the scene is represented by neural network weights, but (unlike with 3D Gaussian Splatting) there's no explicit representation of any points. Nobody can tell you what any of the network weights represent, and there's no part of it that explicitly tells you "we have a point at location (x, y, z)". That's why 3D Gaussian Splatting is much easier to work with and create editing tools for.
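To make the contrast concrete, here's a toy 2D "splatting" render: just a sum of explicit Gaussians into an image. It's 2D, isotropic, and unsorted, so far simpler than real 3D Gaussian Splatting, but it shows why the representation is directly editable (every number below is an arbitrary toy value):

    import numpy as np

    splats = [  # (x, y, amplitude, sigma) - explicit, individually editable
        (20.0, 20.0, 1.0, 4.0),
        (40.0, 30.0, 0.6, 6.0),
        (30.0, 45.0, 0.8, 3.0),
    ]

    yy, xx = np.mgrid[0:64, 0:64].astype(float)
    image = np.zeros((64, 64))
    for x, y, amp, sigma in splats:
        image += amp * np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))

    print(image.max())  # editing a splat = editing one tuple, unlike NeRF weights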
NeRFs: https://youtu.be/wKsoGiENBHU Gaussian splatting: https://youtu.be/VkIJbpdTujE
[1]: https://www.matthewtancik.com/nerf
I think this is pretty much settled unless we encounter fundamental new theory roadblocks on the path of scaling ML compute. Polygon-based systems like Nanite took 40+ years to develop. With Moore's law finally out of the way and Huang's law replacing it for ML, hardware development is no longer the issue. Neural visual computing today is where polygons were in the 80s. I have no doubt that it will revolutionize the industry, if only because it is so much easier to work with for artists and designers in principle. As a near-term intermediate we will probably see a lot of polygon renderers with neurally generated stuff in between, like DLSS or just artificially generated models/textures. But the stuff we have today is like the Wright brothers' first flight compared to the moon landing. I think in 40 years we'll have comprehensive real-time neural rendering engines. Possibly even rendering output directly to your visual cortex, if medical science can keep up.
http://proceedings.mlr.press/v139/hutchinson21a/hutchinson21...
The interface to sign up was very painless and straightforward
I signed up for a 2-week periodic digest
The first digest came instantly, and scanning through the titles alone was inspirational; I'm sure it will provide me with more than a few great papers to read over the coming years
There are quite a few advanced solutions already (predating LLMs/ML)
https://www.youtube.com/@OlliHuttunen78
edit - I just realized you want a mesh :) for which Gaussian splatting is not there yet! BUT there are multiple papers exploring adding Gaussians to a mesh that's progressively refined. I think it's inevitable, given what's needed for editing and use cases just like yours.
You could start exploring and compiling footage and testing and maybe it will work out but ...
Here is a news site focused on the field -
https://radiancefields.com/
[1]: https://robotics-transformer2.github.io
Prompt-guided labelling is also pretty cool, but still in its infancy (e.g. you can tell the model "label all the shadows"). SegGPT, for example. But now we're right back to LLMs...
On labelling, there is still a dearth of high quality niche datasets ($$$). Everyone tests on MS-COCO and the same 5-6 segmentation datasets. Very few papers provide solid instructions for fine tuning on bespoke data.
The key insight (Jakob Uszkoreit) behind using self-attention for language was that language is really more hierarchical than sequential, as indicated by linguists' tree diagrams for describing sentence structure. The leaves of one branch of a tree (or sub-tree) are independent of those in another sub-tree, allowing them to be processed in parallel (not in sequence). The idea of a multi-layer transformer is therefore to process this language hierarchy one level at a time, working from the leaves upwards through the layers of the transformer (processing smaller neighborhoods into increasingly larger neighborhoods).
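A minimal sketch of scaled dot-product self-attention in NumPy makes the parallelism concrete: every token attends to every other token in a couple of matrix multiplies, with no sequential dependence (all sizes here are arbitrary toy values):

    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model = 6, 8                          # 6 tokens, 8-dim embeddings (toy)

    X = rng.standard_normal((seq_len, d_model))      # token embeddings
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))

    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d_model)              # all token pairs scored at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    out = weights @ V                                # every token updated in parallel
    print(out.shape)                                 # (6, 8)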
My intuition says yes but what do I know.
Turns out that for their use case a small (weights fit in tens of KiB IIRC) multilayer perceptron works the best.
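For scale, a quick back-of-the-envelope check (my toy layer sizes, not the actual network referenced above) shows how easily a small MLP fits in tens of KiB:

    def mlp_bytes(layers):
        """Parameter memory of a dense MLP with the given layer widths."""
        params = sum(a * b + b for a, b in zip(layers, layers[1:]))  # weights + biases
        return params * 4  # float32

    # e.g. a 16 -> 64 -> 64 -> 4 network:
    print(mlp_bytes([16, 64, 64, 4]) / 1024, "KiB")  # ~21.5 KiB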
There is a lot of machine learning out in the world like that, but it doesn't grab the headlines.
"What is the 2024 Machine Learning Marathon (MLM24)?
This approximately 12-week summer event (exact dates TBA) is an opportunity for machine learning (ML) practitioners to learn and apply ML tools together and come up with innovative solutions to real-world datasets. There will be different challenges to select from — some suited for beginners and some suited for advanced practitioners. All participants, project advisors, and event organizers will gather on a weekly or biweekly basis to share tips with one another and present short demos/discussions (e.g., how to load and finetune a pretrained model, getting started with GitHub, how to select a model, etc.). Beyond the intrinsic rewards of skill enhancement and community building, the stakes are heightened by the prospect of a cash prize for the winning team."
More information here: https://datascience.wisc.edu/2024/03/19/crowdsourced-ml-for-...
We (humans) are following the last thing that worked (imagine if we could do true gradient descent on the algorithm space).
Good question, and I'm interested to hear the other responses.
They're mostly easy grant money and are being gamed by entire research groups worldwide to be seen as effective on the published papers. State of academia...
I suspect that this is because we've actually got a much more complex supervised training task than average (10k classes, multilabel), leading to much better supervised embeddings, and rather more intense needs for generalization (new species, new microphones, new geographic areas) than 'yet more humans on the internet.'
Here's some recent-ish work: https://www.nature.com/articles/s41598-023-49989-z
We also run a yearly kaggle competition on birdsong recognition, called birdclef. Should be launching this year's edition this week, in fact!
Here's this year's competition, which will be a dead link for now: https://www.kaggle.com/competitions/birdclef-2024
And last year's: https://www.kaggle.com/competitions/birdclef-2023
DeepMind just placed pretty high at the International Mathematical Olympiad. Here it does have to present reasoning.
https://arstechnica.com/ai/2024/01/deepmind-ai-rivals-the-wo...
And it's a couple years old, but AlphaFold was pretty impressive.
EDIT: Sorry, I said LLM but meant AI/ML/NN generally. People say a computer can't reason, but DeepMind is doing it.
I couldn't think of a better way to demonstrate that LLMs are poor at reasoning than using this crutch.
Eventually LLMs will be plugged into Vision Systems, and Symbolic Systems, and Motion Systems, etc., etc.
The LLM won't be the main 'thing', but rather the text interface.
Even the human brain is a bit segmented, with different faculties being 'processed' in different areas with different architectures.
Does seem more realistic to train something not on text but on actual reasoning/logic concepts and use that along with other models for something more general purpose. LLMs should really only be used to turn "thoughts" into text and to receive instructions, not to do the actual reasoning.
First, as you mentioned, Rudin continues to prove that the reason for using AI/ML is that we don't understand the problem well enough; otherwise we wouldn't even think to use it! So, by pushing our focus toward better understanding the problem, and then leveraging ML concepts and techniques (including "classical AI" and statistical learning), we're able to make something that not only outperforms some state of the art on most metrics, but is often much less resource-intensive to create and deploy (in compute, data, energy, and human labour), with added benefits from direct interpretability and post-hoc explanations. One example has been the continued primacy of tree ensembles on tabular datasets [0], even for the larger datasets, though they truly shine on the small to medium datasets that actually show up in practice, which from Tigani's observations [1] would include most of those who think they have big data.
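For anyone who hasn't tried the tabular baseline being described, a minimal sketch with scikit-learn (synthetic data and near-default hyperparameters, nothing from [0] specifically) looks like this:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # toy tabular dataset standing in for real small/medium data
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # gradient-boosted tree ensemble: cheap to train, directly inspectable
    clf = HistGradientBoostingClassifier(max_iter=200).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))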
Second, we're seeing practical examples of exactly this outside Rudin! In particular, people are using ML more to do live parameter fine-tuning that would otherwise need more exhaustive searches, human labour that is difficult to apply in real-time feedback loops, or copious human ingenuity to resolve in a closed-form solution. Opus 1.5 is introducing some experimental work here, as are a few approaches in video and image encoding. These are domains where, as in the first, we understand the problem, but also understand well enough that there are search spaces we simply don't know enough about to be able to dramatically reduce. Approaches like this have been bubbling out of other sciences (physics, complexity theory, bioinformatics, etc.), leading to some interesting work in distillation and extraction of new models from ML, or "physically aware" operators that dramatically improve neural nets, such as Fourier Neural Operators (FNO) [2], which embed FFTs rather than forcing them to be relearned (as has been found to often happen) for remarkable speed-ups with PDEs such as fluid dynamics, and which have already shown promise with climate modelling [3] and materials science [4]. There are also many more operators, which all work completely differently, yet bring human insight back to the problem, and sometimes lead to extracting a new model for us to use without the ML! Understanding begets understanding, so the "shifting goalposts" of techniques considered "AI" is a good thing!
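For a flavour of what "embedding the FFT" means, here's a minimal sketch of the spectral convolution layer at the heart of an FNO, in PyTorch. Shapes and sizes are illustrative only, and a real FNO stacks several of these with pointwise convolutions and nonlinearities:

    import torch

    class SpectralConv1d(torch.nn.Module):
        def __init__(self, in_ch, out_ch, modes):
            super().__init__()
            self.modes = modes
            scale = 1.0 / (in_ch * out_ch)
            # learned complex weights on the lowest `modes` frequencies
            self.weight = torch.nn.Parameter(
                scale * torch.randn(in_ch, out_ch, modes, dtype=torch.cfloat))

        def forward(self, x):                       # x: (batch, in_ch, n_points)
            x_ft = torch.fft.rfft(x)                # to frequency space
            out_ft = torch.zeros(x.size(0), self.weight.size(1),
                                 x_ft.size(-1), dtype=torch.cfloat)
            # linear mixing applied only to the retained low-frequency modes
            out_ft[..., :self.modes] = torch.einsum(
                "bim,iom->bom", x_ft[..., :self.modes], self.weight)
            return torch.fft.irfft(out_ft, n=x.size(-1))  # back to physical space

    layer = SpectralConv1d(in_ch=2, out_ch=2, modes=16)
    print(layer(torch.randn(4, 2, 128)).shape)      # torch.Size([4, 2, 128])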
Third, specifically on improvements in explainability, we've seen the Neural Tangent Kernel (NTK) [5] rapidly go from strength to strength since its introduction. While rooted in core explainability, vis-à-vis making neural nets more mathematically tractable to analysis, it has not only inspired other approaches [6] and behavioural understanding of neural nets [7, 8], but also novel ML itself [9], with ways to transfer the benefits of neural networks to far less resource-intensive techniques; [9]'s RFM kernel machine proves competitive with the best tree ensembles from [0], and even has an advantage on numerical data (plus outperforms prior NTK-based kernel machines). An added benefit is that the approach underpinning [9] itself leads to new interpretation and explanation techniques, similar to integrated gradients [10, 11] but perhaps more reminiscent of the idea in [6].
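For intuition, the *empirical* NTK of a finite network is just an inner product of parameter gradients; a sketch (toy sizes, not anything from [5] or [9] specifically):

    import torch

    net = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.Tanh(),
                              torch.nn.Linear(32, 1))

    def param_grad(x):
        """Gradient of the scalar output w.r.t. all parameters, flattened."""
        net.zero_grad()
        net(x).backward()
        return torch.cat([p.grad.flatten() for p in net.parameters()])

    x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
    # empirical NTK entry: k(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>
    ntk_12 = torch.dot(param_grad(x1), param_grad(x2))
    print(ntk_12.item())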
Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff! XAI in particular, yes, but also the myriad of interpretable models a la Rudin or the significant improvements found in hybrid approaches and reinforcement learning. Cicero [12], for example, does have an LLM component, but uses it in a radically different way compared to most people's current conception of LLMs (though, again, ironically closer to the "classic" LLMs for semantic markup), much like the AlphaGo series altered the way the deep learning component was utilised by embedding and hybridising it [13] (its successors obviating even the traditional supervised approach through self-play [14], and beyond Go). This is all without even mentioning the neurosymbolic and other approaches to embed "classical AI" in deep learning (such as RETRO [15]). Despite these successes, adoption of these approaches is still very far behind, especially compared to the zeitgeist of ChatGPT style LLMs (and general hype around transformers), and arguably much worse for XAI due to the barrier between adoption and deeper usage [16].
This is still early days, however, and again to harken Rudin, we don't understand the problem anywhere near well enough, and that extends to XAI and ML as problem domains themselves. Things we can actually understand seem a far better approach to me, but without getting too Monkey's Paw about it, I'd posit that we should really consider if some GPT-N or whatever is actually what we want, even if it did achieve what we thought we wanted. Constructing ML with useful and efficient inductive bias is a much harder challenge than we ever anticipated, hence the eternal 20 years away problem, so I just think it would perhaps be a better use of our time to make stuff like this, where we know what is actually going on, instead of just theoretically. It'll have a part, no doubt, Cicero showed that there's clear potential, but people seem to be realising "... is all you need" and "scaling laws" were just a myth (or worse, marketing). Plus, all those delays to the 20 years weren't for nothing, and there's a lot of really capable, understandable techniques just waiting to be used, with more being developed and refined every year. After all, look at the other comments! So many different areas, particularly within deep learning (such as NeRFs or NAS [17]), which really show we have so much left to learn. Exciting!
> Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff!
I am very curious to see which practical interpretability/explainability requirements enter into regulations. On one hand it's hard to imagine a one-size-fits-all approach, especially for applications incorporating LLMs, but Bordt et al. [1] demonstrate that you can provoke arbitrary feature attributions for a prediction if you can choose post-hoc explanations and parameters freely, making a case that it can't _just_ be left to the model developers either.
[1] "Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts", Bordt et al. 2022, https://dl.acm.org/doi/10.1145/3531146.3533153
So, it's an interesting unsolved area: how to put forward approaches that aren't quite one-size-fits-all, since that doesn't work, but that also make tailoring to the domain and moment tractable (otherwise we lose what ground we gain and people don't use it again!)... which is precisely the issue that regulation will have to tackle too! Having spoken with some people involved with the AI HLEG [1] that contributed towards the AI Act currently processing through the EU, there's going to have to be some specific tailoring within regulations to fit the domain. Classically, the higher-stakes and time-sensitive domains (like, say, healthcare) will need more stringent requirements to ensure compliance means it delivers as intended/promised, but it's not simply going to be a sliding scale from there, and too much complexity may prevent the very flexibility we actually desire; it's harder to standardise something fully general-purpose than something fitted to a specific problem.
But perhaps that's where things go hand in hand. An issue currently is the lack of standardisation in general; it's unreasonable to expect people to re-implement these things on their own given the mathematical nuance, yet many of my colleagues agree it's usually the most reliable way. Things like scikit had an opportunity, sitting as a de facto interface for the basics, but niche competitors then grew and grew, many of which simply ignored it. Especially with things like [0], there are a bunch of wholly different "frameworks" that cannot intercommunicate except by someone knuckling down and fudging some dataframes or ndarrays, and that's just within Python, let alone those in R (and there are many) or C++ (fewer, but notable). I'm simplifying somewhat, but it means that plenty of isolated approaches simply can't work together, so model developers may have little choice but to use whatever batteries are available! Unlike, say, Matplotlib, I don't see much chance for declarative/semi-declarative layers to take over here, the way pyplot and seaborn could, which empowered everything backed by Matplotlib "for free", with downstream benefits such as enabling intervals or live interaction via a lower-level plugin or upgrade. After all, scikit was meant to be exactly this for SciPy! Everything else like that is generally focused on either models (e.g. Keras) or explanations/interpretability (e.g. Captum or Alibi).
So it's going to be a real challenge figuring out how to get regulations that aren't so toothless that people don't bother or are easily satisfied by some token measure, but that also don't leave us open to other layers of issues, such as adversarial attacks on explanations or developer malfeasance. Naturally, we don't want something easily gamed that the ones causing the most trouble and harm can just bypass! So I think there's going to have to be a bit of give and take on this one: the regulators must step up while industry must step down, since there's been far too much "oh, you simply must regulate us, here, we'll help draft it" going around lately for my liking.

There will be a time for industry to come back to the fore, when we actually need to figure out how to build something that satisfies. Ideally, it's something we could engage in mutually, prototyping and developing both the regulations and the compliant implementations such that there are no moats, just a clearly better way to do things that would probably be more popular anyway even without any of the regulatory overhead; when has a clean break and a freshening of the air not benefited us?

We've got a lot of cruft in the way that's making everyone's jobs harder, to which we're only adding more and more layers, which is why so many are pursuing clean-ish breaks (bypass, say, PyTorch or Jax, and go straight to new, vectorised, Python-esque dialects). The issue is, of course, the 14-standards problem: so many are now competing that the number only grows, preventing the very thing all of these intended to do, which was to refresh things so we can get back to the actual task! So I think a regulatory push can help with that, and industry then has a once-in-a-lifetime chance to ride that through to the actual thing we need to get this stuff out there to millions, if not billions, of people.
A saying keeps coming back to mind for me: all models are wrong, some are useful. (Interpretable) AI, explanations, regulations, they're all models, so of course they won't be perfect... if they were, we wouldn't have this problem to begin with. What it all comes back to is usefulness. Clearly, we find these things useful, or we wouldn't have them, necessity being the mother of invention and all, but then we must actually make sure what we do is useful. Spinning wheels inventing one new framework after the next doesn't seem like that to me. Building tools that people can make their own, but know that no matter what, a hammer is still a hammer, and someone else can still use it? That seems a much more meaningful investment, if we're talking about the tooling/framework side of things.

Regulation will be much the same, and I do think there are some quite positive directions; things like [1] seem promising, even if only as a stop-gap measure until we solve the hard problems and have no need for it any more -- though they're not solved yet, so I wouldn't hold out for such a thing either. Regulations also have the nice benefit that, unlike much of the software we seem to write these days, they're actually vertically and horizontally composable, and different places and domains at different levels have a fascinating interplay and cross-pollination of ideas: sometimes we see nation-states following in the footsteps of municipalities or towns, other times a federal guideline inspires new institutional or industrial policies, and all such combinations. Plus, at the end of the day, it's still about people, so if a regulation needs fixing, well, it's not like you're trying to change the physics of the universe, are you?
The idea is that to solve these problems you need to solve the Schrödinger equation (1). But the Schrödinger equation scales really badly with the number of electrons and can't be computed directly for more than a few sample cases. Even Density Functional Theory (DFT), the most popular approximation that is still reasonably accurate, scales as N^3 with the number of electrons, with a pretty big prefactor. A reasonable rule of thumb would be 12 hours on 12 nodes (each node being 160 CPU cores) for 256 atoms. You can play with settings and increase your budget to maybe get to 2000 atoms (and only for a few timesteps), but good luck beyond that.
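Rough arithmetic on that scaling (the 256-atom figure is the rule of thumb above; the extrapolation is mine, with atom count standing in for electron count):

    base_node_hours = 12 * 12            # 12 hours on 12 nodes, for 256 atoms
    factor = (2000 / 256) ** 3           # cubic scaling in system size
    print(round(factor))                 # ~477x more work
    print(base_node_hours * factor / 24, "node-days")  # ~2860 node-days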
Machine learning seems to be really useful here. In my own work on aluminium alloys I was able to get the same simulations that would have needed hours on the supercomputer to run in seconds on a laptop, or to do simulations with tens of thousands of atoms for long periods of time on the supercomputer. The most famous application is probably AlphaFold from DeepMind.
There are a lot of interesting questions people are still working on:
What are the best input features? We don't have any nice equivalent to CNNs that is universally applicable, though some have tried 3D convnets. One of the best methods right now involves taking spherical-harmonic-based approximations of the local environment in some complex way I've never fully understood, but it is closer to the underlying physics.
Can we put physics into these models? Almost all these models fail in dumb ways sometimes. For example, if I begin to squish two atoms together they should eventually repel each other, and that repulsion force should scale really fast (ok, maybe they fuse into a black hole or something, but we're not dealing with that kind of esoteric physics here). But all machine learning potentials will by default fail to learn this and will only learn the repulsion down to the closest distance of any two atoms in their training set. Beyond that, they guess wildly. Some people are able to put this physics into the model directly, but I don't think we have it totally solved yet.
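A sketch of the baseline-plus-correction idea: predict energy as a fixed analytic short-range repulsion plus an ML correction, so close contacts can't become unphysically soft. The pair potential and the ML stand-in here are placeholders, not any specific published method:

    import numpy as np

    def repulsive_baseline(r, a=1.0, r0=2.0, p=12):
        """Simple analytic repulsion that blows up as r -> 0, by construction."""
        return a * (r0 / r) ** p

    def ml_correction(r):
        # stand-in for a trained model's smooth, bounded correction
        return -0.5 * np.exp(-((r - 2.5) ** 2))

    def energy(r):
        return repulsive_baseline(r) + ml_correction(r)

    for r in [0.5, 1.0, 2.0, 3.0]:
        print(r, energy(r))  # repulsion dominates at short range no matter what the ML does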
How do we know which atomic environments to simulate? These models can really only interpolate; they can't extrapolate. And while I can get an intuition for interpolation in low dimensions, once your training set consists of many features over many atoms in 3D space this becomes a high-dimensional problem. In my own experience, I can get really good energies for the shearing behavior of strengthening precipitates in aluminium without directly putting the structures in. But was this extrapolated or interpolated from the other structures? Not always clear.
(1) Sometimes also the relativistic Dirac equation, e.g. fast-moving electrons in some of the heavier elements move at relativistic speeds.
Context for the non-mat-sci crowd: numerically solving Schrödinger essentially means constructing a large matrix that describes all the electron interactions and computing its eigenvalues (iterated to convergence, because the electron interactions are interdependent with the solutions). Density functional theory (for solids) uses a Fourier expansion for each electron (these are the one-electron wave functions), so the complexity of each eigensolve is cubic in the number of valence electrons times the number of Fourier components.
The tight binding approximation is cool because it uses a small spherical harmonic basis set to represent the wavefunctions in real space - you still have the cubic complexity of the eigensolve, and you can model detailed electronic behavior, but the interaction matrix you’re building is much smaller.
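For the curious, the eigensolve being described is easy to play with in toy form: a 1D nearest-neighbour tight-binding chain (arbitrary on-site and hopping values), diagonalized densely, which is where the cubic cost comes from:

    import numpy as np

    N = 200                       # number of sites/orbitals (toy size)
    eps, t = 0.0, -1.0            # on-site energy and hopping amplitude
    H = eps * np.eye(N) + t * (np.eye(N, k=1) + np.eye(N, k=-1))
    eigvals = np.linalg.eigh(H)[0]   # dense O(N^3) eigensolve
    print(eigvals[:5])               # lowest few one-electron energies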
Back to the ML variant: it's a hard problem because ultimately you're trying to predict a matrix that has the same eigenvalues as your training data, but there are tons of degeneracies that lead to loads of unphysical local minima (in my experience anyway; this is where I got stuck with it). The papers I've seen deal with it by basically only modeling deviations from an existing tight-binding model, which in my opinion only kind of moves the problem upstream.
https://developer.nvidia.com/modulus
https://www.ansys.com/ai
Could you elaborate on this further? How exactly were the simulations sped up? From what I could understand, were the ML models able to effectively approximate the Schrödinger equation for larger systems?
Then you can use the trained method on new arbitrary structures. If you've done everything right you get good, or good enough results, but much much faster.
At a high level it's the same pipeline as in all ML. But some aspects are different; e.g., unlike image recognition, you can generate training data on the fly by running more DFT simulations.
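A runnable toy of that on-the-fly loop (everything here is a stand-in: a cheap function plays the role of DFT, and bootstrap polynomial fits play the role of the potential's uncertainty): train, find the candidate the model is least sure about, "label" it, and repeat:

    import numpy as np

    rng = np.random.default_rng(0)

    def expensive_dft(x):
        return np.sin(3 * x)                       # stand-in for a real DFT run

    X_pool = np.linspace(0.0, 3.0, 200)            # candidate "structures" (1D toy)
    X_tr = list(rng.choice(X_pool, size=5, replace=False))
    y_tr = [expensive_dft(x) for x in X_tr]

    for _ in range(10):
        X_a, y_a = np.array(X_tr), np.array(y_tr)
        # crude uncertainty: spread of an ensemble of bootstrap polynomial fits
        preds = []
        for _ in range(8):
            idx = rng.integers(0, len(X_a), len(X_a))
            coef = np.polyfit(X_a[idx], y_a[idx], deg=4)
            preds.append(np.polyval(coef, X_pool))
        worst = X_pool[np.std(preds, axis=0).argmax()]  # most uncertain candidate
        X_tr.append(worst)                              # "run DFT" on it and retrain
        y_tr.append(expensive_dft(worst))

    print(len(X_tr), "labelled structures after active learning")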
I suppose your process would be using ML to get pointed in the "right direction" and then confirming the model's theories using the expensive method?
If there is anything unclear you're interested in, just let me know. In my heart I feel I'm still just a McDonald's fry cook, and none of this is as scary as it might seem :)
I'd like to see something about other ML methods such as SVM, XGBoost, etc.