To my surprise I see a repeating theme:
- ChatGPT generates an ample amount of code that eventually runs after refinement through multiple prompts.
- The solution hits a hard edge case and we need to rework and refactor things, which takes even more prompts to get working (this is understandable, as more tech debt is introduced).
- The solution hits an edge case and ChatGPT-4 eventually gets stuck in a loop, claiming it has a new solution or has fixed something when it really hasn't.
- Manual inspection of the code and some simple logic removes that edge case from the solution.
To me this was interesting to see: ChatGPT-4 seems quite adept at handling requirements, but it is unable to spot very obvious mistakes or one-liner fixes (a hypothetical illustration follows below).
I've spent about a week developing an iOS app with ChatGPT-4, and these are my findings. I'm wondering if anybody else has experimented with generating code in a language they have no experience or knowledge of, and what your experience was like.
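To make the "obvious one-liner" point concrete, here's a minimal, entirely hypothetical sketch (not from the actual app, which was in Swift) of the kind of edge case that manual inspection catches in seconds while the reprompting loop never does:

```python
# Hypothetical illustration: the kind of bug that survives many
# reprompts yet is obvious on manual inspection.

def average(values):
    # The "fixed" versions keep reworking the math; the actual problem
    # is a missing one-line guard for an empty list (ZeroDivisionError).
    if not values:  # the obvious one-liner
        return 0.0
    return sum(values) / len(values)

print(average([1, 2, 3]))  # 2.0
print(average([]))         # 0.0 instead of a crash
```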
I've also been using Claude 3 Opus recently. Opus follows my style exactly and is better at making edits to longer code. When things get really complex, though, it tends to fail at tasks where GPT-4 Turbo still delivers results.
And there's Gemini 1.5 Advanced, which is a lot faster and sometimes the only one to come up with non-convoluted one-liners that meet the requirements exactly.
So, as of today, my experience is:
- best vision & complex reasoning: GPT
- best style match: Opus
- best speed: Gemini
Years later, the pandemic hit and I found myself with excess free time and nothing to do in the evenings. So I decided to take a series of Coursera classes to learn data science programming in Python and VBA. VBA became rather useful, as I was able to program sophisticated macros for automating tasks in SolidWorks CAD. But my programming knowledge remained rather limited compared to that of most professional programmers.
When ChatGPT came out and people started tinkering with it for programming, I was delighted to find that it was able to produce what appeared to be SolidWorks VBA scripts. But on further examination, the scripts it produced were often buggy and in need of rework. So I was rather skeptical for a while about ChatGPT's usefulness.
A few years later, I'm now in a job that requires a LOT of programming and scripting in a multitude of different languages (Python, Bash, and a few others) and the focus has shifted away from CAD. I'll humbly admit that ChatGPT has saved me multiple times in figuring out how to approach different problems. The code often works right on the first try. It's an essential tool for me now.
It once did manage to write a very simple Bash script that worked first time.
Calculators resulted in a generation that could not do arithmetic in their heads. Word processors with autocorrect resulted in a generation that could not spell. ChatGPT et al. will result in a generation of developers who cannot code. And then we'll be hoping that the Enterprise will come along and help us maintain our infrastructure.
If the local model got stuck, I'd try to abstract the problem and present GPT with the general case (e.g., given two columns A and B, find...).
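For illustration, here's a made-up instance of that kind of abstracted prompt and answer; the columns, data, and requirement are all hypothetical, not the actual problem:

```python
# Hypothetical abstracted prompt: "Given two columns A and B, find the
# rows where A is greater than B." The model answers the general case,
# and I map the answer back onto the real, domain-specific columns.
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3], "B": [2, 4, 3]})
print(df[df["A"] > df["B"]])  # the generic one-liner the model returns
```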
The idea is to use LLMs to generate glue code between applications/APIs, making it easy to connect different apps together (e.g., you can scrape websites directly into Google Sheets, or get it to classify Gmail messages).
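As a rough sketch of what "glue code" means here, assuming a simple scrape-to-CSV pipeline (the URL and the selector are placeholders; wiring the output into the Google Sheets API instead is the real use case):

```python
# Minimal glue-code sketch: scrape headings from a page into a CSV.
# https://example.com and the "h1" selector are placeholders.
import csv

import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com", timeout=10)
soup = BeautifulSoup(resp.text, "html.parser")

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for heading in soup.select("h1"):  # swap in the real selector
        writer.writerow([heading.get_text(strip=True)])
```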
Experience so far: model quality really matters. Claude Opus does surprisingly better than GPT-4 Turbo. Creating good abstractions matters. Syntax and type checking matter. Models do get stuck sometimes, and GPT-4 often gets lazy (placeholders/comments instead of actual code).
I've had the exact same experience as you: it starts great, but it cannot handle edge cases and starts to bloat the code with changes that have exactly zero effect on the output.
Same thing with other LLMs, so my understanding is that we are far from utopia.
I've simply had to modify things at the end myself, or talk to a very experienced developer friend of mine, who also pointed out that the code was clunky and could be cleaner.
A few more years and we might actually have something that is scary, but it doesn't seem like that's today.
So Gemini 1.5 Pro has enabled scenarios that previously went undone: it was too much work to deploy and keep bash/Python scripts updated on other people's computers, and it was not worth my time to develop these automations in Golang from the get-go. But with this, I hand out management scripts like candy to other users, dramatically reducing their dependence on me.
When I'm testing out a new LLM, this is usually my go-to: how far up the abstraction layers can I go?
You still have to check the code yourself, ask it to troubleshoot, etc., but I find this workflow pretty effective.
It can be frustrating trying to reprompt it if it goes off on a tangent, and sometimes I just can't get it to go down the path I know I want, but overall I'm fairly confident it's saved me time, particularly when dealing with things I'm not familiar with, such as jq or the AWS CLI.
It's really good at being a better autocomplete.
Sometimes it seems almost like you could do similar things with non-ML algorithms; I'm not sure if the AI is really smart, or if code just isn't as high-entropy as it seems.
I've never tried coding in a language I don't actually know, though. I have very minimal Rust experience; maybe that would be a fun experiment to learn more.
That said, I'm still in the free trial of GitHub Copilot and I'm really liking it.
I stick to one-liners and add functionality to them in small increments to make small scripts. The models still make lots of mistakes, so I try to keep it to fixing one mistake at a time (a hypothetical illustration of the workflow is below).
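A made-up illustration of that increment-by-increment workflow, with the prompts as comments (the task and the progression are hypothetical, not a real transcript):

```python
# v1 prompt: "one-liner to count the lines in a file"
#   print(sum(1 for _ in open("data.txt")))
# v2 prompt: "now skip blank lines"
#   print(sum(1 for l in open("data.txt") if l.strip()))
# v3 prompt: "take the filename from the command line" -- kept below:
import sys

path = sys.argv[1] if len(sys.argv) > 1 else "data.txt"
print(sum(1 for line in open(path) if line.strip()))
```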
ChatGPT used to be a lot better than it is now.
For old and extremely popular things it works very well.
For the rest it's pure shit; same experience with Copilot.
For example, it makes up what it imagines Apple's API should look like, when in fact the API does not look like that at all.
I'm just sitting back here shocked that I can produce a complete iOS app (albeit a very basic one) without studying Swift.
It takes longer to debug ChatGPT hallucinations than to write the code from the docs.
You could try RAG (retrieval-augmented generation), maybe, to help you search for information in the documentation.
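For instance, a minimal sketch of the retrieval half of RAG using TF-IDF similarity (the doc snippets are placeholders; a real setup would use proper embeddings and paste the top hit into the prompt):

```python
# Minimal RAG-style retrieval sketch: rank documentation snippets by
# similarity to the question, then use the best one as prompt context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [  # placeholder snippets, not real documentation
    "UIKit: UIViewController lifecycle, viewDidLoad and viewWillAppear.",
    "SwiftUI: declaring views by conforming to View and defining body.",
    "Combine: publishers, subscribers, and operator chains.",
]
question = "How do I declare a view in SwiftUI?"

vec = TfidfVectorizer().fit(docs + [question])
scores = cosine_similarity(vec.transform([question]), vec.transform(docs))[0]
print(docs[scores.argmax()])  # this snippet gets pasted into the LLM prompt
```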
> The solution hits an edge case and ChatGPT-4 eventually gets stuck in a loop, claiming it has a new solution or has fixed something when it really hasn't.
An infuriating subtype of this "loop" failure is when you point out a flaw, and it fixes it, but now there's a different flaw. You point that out, it fixes it, but now the original flaw is back again.
Also consider that $20/mo does not come close to the actual cost of running LLMs. The companies that own those things pour money into them and attract users to create a market they can dominate, spending investor money and taking advantage of public energy and water infrastructure. Once so-called AI has insinuated itself as a necessity the price will go up. Look at cloud hosting and software licenses for examples of the rent-seeking model. We're in the market-making stage now, overlooking the problems and limitations because of FOMO.
You can already get keyboards and mice with copilot keys. Every company seems to rush to add "AI" to their software with no evidence that it makes anything better. We will just come to accept it, like cookie pop-ups and clicking on photos of fire hydrants, without asking why or how it improves anything.