Ask HN: Do you use ChatGPT to generate code? What were your experiences?

So I was experimenting with Swift, without knowing any of the language, to see if I could generate an app.

To my surprise, I saw a repeating theme:

- ChatGPT generates an ample amount of code that eventually runs after refinement through multiple prompts.

- The solution hits a hard edge case and we need to rework and refactor, which involves even more prompts to get it working (this is understandable, as more tech debt is introduced).

- The solution hits an edge case and eventually GPT-4 gets stuck in a loop, claiming it has a new solution or has fixed something when it really hasn't.

- Manual inspection of the code and some simple logic removes that edge case from the solution.

To me this was interesting to see: GPT-4 seems quite adept at handling requirements, but it is unable to see very obvious mistakes or one-line fixes.

I've spent about a week developing an iOS app with GPT-4, and these are my findings. I'm wondering if anybody else has experimented with generating code in a language they have no experience with or knowledge of, and what your experience was like.

16 points | by spxneo 9 days ago

22 comments

  • cetinsert 9 days ago
    Have been using GPT-4 for code generation on RTCode.ai for months. It was magical before turbo. It is still the best at complex tasks and vision. GPT-4 just knew what my 10-year-old, extensible, type-and-source-polymorphic, non-linear applicative parser combinator library for F# 3+ would produce, given just the examples! (Not even sources or types.)

    Have also been using Claude 3 Opus recently. Opus follows my style exactly and is better at making edits to longer code. When things get really complex, it tends to fail at tasks where GPT-4 (turbo) still delivers results.

    Gemini 1.5 Advanced is a lot faster and is sometimes the only one to come up with non-convoluted one-liners that meet the requirements exactly.

    So, as of today, my experience is:

    - best vision & complex reasoning: GPT

    - best style match: Opus

    - best speed: Gemini

  • pupppet 9 days ago
    I find it works well, but you still need to know what you're doing and steer it in the right direction. For example, I just asked it to whip up some code for a Lambda function to copy DynamoDB documents from one table to another, just mundane stuff I don't want to think about. But I noticed it provided code that iterated and wrote each document one at a time, which doesn't make sense when batchWrite is available. So I suggested it redo the code using batchWrite, and of course it was all 'yes, this makes sense, that would be so much faster' (paraphrasing, obviously!).
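
    For illustration, here's a minimal sketch of the batched version, in Python with boto3 (the JS SDK's batchWrite corresponds to boto3's batch_writer; the table names are hypothetical):

        import boto3

        dynamodb = boto3.resource("dynamodb")
        source = dynamodb.Table("SourceTable")  # hypothetical table names
        dest = dynamodb.Table("DestTable")

        # Paginate through the source table and buffer the writes;
        # batch_writer() flushes them as BatchWriteItem calls of up to
        # 25 items, instead of one PutItem round trip per document.
        scan_kwargs = {}
        with dest.batch_writer() as batch:
            while True:
                page = source.scan(**scan_kwargs)
                for item in page["Items"]:
                    batch.put_item(Item=item)
                if "LastEvaluatedKey" not in page:
                    break
                scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
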
  • Fermatfan 9 days ago
    Code is simply there to achieve the logic set out by the parameters of your problem, required to get optimal results. https://en.wikipedia.org/wiki/Axiomatic_design A perfect prompt to an AI will surface either your own bias (a gap in understanding, or a fallacious argument) or a flaw in the company's training data. I was given lies-to-children arguments to give me a simpler understanding rather than make me more technically proficient at the thing I requested. It was regarding error handling, in early September 2023. I couldn't solve it in time for the fixed (but no-consequence) deadline the next day, so I turned to a tool that provided a solution outside the realm of my understanding. Which worked. I kind of got the logic, but was too scared to ask because of the consequences of how this went against the rules of my learning; I was just utilising a tool to finish the project. Not agile teaching; far more waterfall, in my opinion. ChatGPT 4 is not that good for SQL prompts, but I have a limited technical knowledge base regarding it. How do we use agile methods and prompts to develop output with AI tools? Code is not the end result in a business setting; it's just a tool within a tool. https://en.wikipedia.org/wiki/Computational_thinking
  • neonlights84 8 days ago
    I don't have a formal background in CS -- most of my schooling and professional background has been centered squarely on mechanical engineering. When I attended college in 2004-2009, my school spent just a couple of class sessions teaching MATLAB and LabVIEW. Neither language really appealed to me, and after some initial struggles I realized that our TAs weren't really checking our scripts... so I faked a bunch of non-functional homework scripts just to move forward in the class. The experience soured me on programming for years.

    Years later, the pandemic hit and I found myself with excess free time and nothing to do in the evenings. So I decided to take a series of Coursera classes to learn data science programming in Python and VBA. VBA became rather useful, as I was able to program sophisticated macros for automating tasks in SolidWorks CAD. But my programming knowledge has still been rather limited compared to that of most professional programmers.

    When ChatGPT came out and people started tinkering with it for programming, I was delighted to find that it was able to produce what appeared to be SolidWorks VBA scripts. But on further examination, the scripts it produced were often buggy and in need of rework. So I was rather skeptical for a while about ChatGPT's usefulness.

    A few years later, I'm now in a job that requires a LOT of programming and scripting in a multitude of different languages (Python, Bash, and a few others) and the focus has shifted away from CAD. I'll humbly admit that ChatGPT has saved me multiple times in figuring out how to approach different problems. The code often works right on the first try. It's an essential tool for me now.

  • johannesrexx 9 days ago
    No Rust code ChatGPT has ever generated for me has worked; it was a complete waste of time.

    It once did manage to write a very simple Bash script that worked the first time.

    Calculators resulted in a generation that could not do arithmetic in their heads. Word processors with autocorrect resulted in a generation that could not spell. ChatGPT et al. will result in a generation of developers who cannot code. And then we'll be hoping that the Enterprise will come along and help us maintain our infrastructure.

  • kiviuq 9 days ago
    Bots are great time savers when learning new technology and for doing tedious tasks. For example, I'm learning Rust, and a combination of GPT-4, local llama3, and Codeium has helped me write more Rust-like code. They can generate schema definitions from sample XML data that I use in Spark, and they generally help me write better, idiomatic PySpark/Spark queries. Afterward, you wouldn't be able to tell whether the code was written by someone with years of Spark/Python/Rust experience or by a noob like myself. As long as you're fluent in some related tech stack (like Scala or SQL), I find bots can get you there quickly. They also work best when combined with TDD / test-first. I get decent results when I let the bot focus on smaller aspects. When I'm unsure or the code has issues, I take it to the other bot to find alternative solutions. This way, I can quickly get an idea of what you can do in the language, including lesser-used features. Finally, I get unit tests and documentation for free.
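
    To make that concrete, here's a minimal sketch of the kind of schema definition a bot can derive from sample XML (the field names are hypothetical, and it assumes the spark-xml package is on the classpath):

        from pyspark.sql import SparkSession
        from pyspark.sql.types import IntegerType, StringType, StructField, StructType

        spark = SparkSession.builder.appName("xml-schema-demo").getOrCreate()

        # Hypothetical schema, as a bot might derive it from a sample <order> element.
        schema = StructType([
            StructField("id", IntegerType(), nullable=False),
            StructField("customer", StringType(), nullable=True),
            StructField("status", StringType(), nullable=True),
        ])

        # An explicit schema skips Spark's costly inference pass over the XML.
        df = (spark.read.format("xml")
              .option("rowTag", "order")
              .schema(schema)
              .load("orders.xml"))

        df.groupBy("status").count().show()
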
    • dmarchand90 9 days ago
      Could you clarify which parts you use llama for and which you use gpt4 on?
      • kiviuq 8 days ago
        I use llama for anything that falls under an NDA clause, e.g. data by which the client could be identified. Asking it to analyze existing code/algorithms, to document functions, or to parse data is done only locally.

        If the local model gets stuck, I try to abstract the problem and present GPT with the general case (e.g. given 2 columns A, B, find...).

  • jngiam1 8 days ago
    We've been hacking on using generated code at https://lutra.ai

    The idea is to use LLMs to generate glue code between applications/APIs, making it easy to connect different apps together (e.g., you can scrape websites directly into Google Sheets, get it to classify Gmail messages, etc.)

    Experience so far -- model quality really matters. Claude Opus does surprisingly better than GPT-4 Turbo. Creating good abstractions matters. Syntax and type checking matter. Models do get stuck sometimes, and GPT-4 often gets lazy (placeholders/comments instead of actual code).

  • shivc 9 days ago
    I've used it primarily for 2 things: dataset manipulation & visualization code and simpler scripts to deploy specific website solutions.

    I've had the exact same experience as you: it starts great, but it cannot handle edge cases and starts to bloat the code with changes that have exactly zero effect on the output.

    Same thing with other LLMs, so my understanding is we are still far from the utopia.

    I've simply had to modify things at the end myself, or talk to a very experienced developer friend of mine, who also pointed out the code was clunky and could be cleaner.

    A few more years and we might actually have something that is scary, but it doesn't seem like that's today.

  • ghoul2 8 days ago
    Gemini 1.5 Pro is very good. Far from perfect, but the long context window helps it stay focused when iterating on a code base. I tried ChatGPT, etc., but it was obvious (to me) that I was wasting more time trying to get those models to work right than I would have spent just coding it myself. Gemini 1.5 Pro has changed that. For moderate-sized stuff (mainly automation, ops scripts, etc.), it's now my go-to. And I discovered it's much better at golang than even at Python. I have converted quite a few fairly complex Python/shell scripts, with feature additions and cleanups, to golang code using Gemini 1.5 Pro. Just give it the original code, ask it to redo it in golang, and then keep prompting it to add feature after feature. One main advantage of this is that golang can be cross-compiled for other OSes and generates a single, stand-alone binary; this means I can distribute these scripts/utilities easily, which I could not do for Python or shell, as the people who use them are typically on Windows.

    So Gemini 1.5 Pro has enabled scenarios that previously went undone: it was too much work to deploy bash/Python scripts on other people's computers and keep them updated, and it was not worth my time to develop these automations in golang from the get-go. But with this, I hand out management scripts like candy to other users, dramatically reducing dependence on me.

  • f0e4c2f7 8 days ago
    I find that the key is working at the right size of abstraction. If you ask GPT-4 to write whole apps for you, it usually can't get there. But if you architect the app in your head, you can ask it to build it out one function at a time (with unit tests) pretty effectively.
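
    As a rough sketch of the size of unit I mean (the function and its pytest-style tests below are hypothetical, not from any real app), one prompt would ask for something like:

        import re

        def slugify(title: str) -> str:
            """Lowercase, collapse runs of non-alphanumerics to '-', trim dashes."""
            slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
            return slug.strip("-")

        # Tests requested in the same prompt, so the function arrives verifiable.
        def test_slugify_basic():
            assert slugify("Hello, World!") == "hello-world"

        def test_slugify_collapses_separators():
            assert slugify("a  --  b") == "a-b"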

    When I'm testing out a new LLM, this is usually my go-to: how far up the abstraction layers can I go?

    You still have to check the code yourself, ask it to troubleshoot, etc., but I find this workflow pretty effective.

  • davesmylie 9 days ago
    I've used it fairly extensively to create shell scripts and found it moderately useful. It'll generate a baseline that I can then take and run with, adding what I need.

    It can be frustrating trying to reprompt it when it goes off on a tangent, and sometimes I just can't get it to go down the path I know I want, but overall I'm fairly confident it's saved me time, particularly when dealing with things I'm not familiar with, such as jq or the AWS CLI.

  • eternityforest 9 days ago
    I don't use ChatGPT for anything directly, but I do use the Codeium plugin in VS Code.

    It's really good at being a better autocomplete.

    Sometimes it seems almost like you could do similar things with non-ML algorithms; I'm not sure if the AI is really smart, or if code just isn't as high-entropy as it seems.

    I've never tried coding in a language I don't actually know, though. I have very minimal Rust experience; maybe that would be a fun experiment to learn more.

  • kesavvaranasi 8 days ago
    I've been using ChatGPT to get a crash course in Next.js. I'm coming from a backend and data engineering background, so ChatGPT is helpful for getting advice on modifying Next.js components for the project I'm working on. Similarly, it's been helpful for styling SwiftUI views for iOS app dev.
  • meiraleal 9 days ago
    I use ChatGPT to refactor code all the time. I think about a big refactor, supply the code, and get it back looking like the code I would write, refactored. I won't ever code the same way as before; it's too boring. Unfortunately, though, sometimes it takes way longer to find the correct prompt than it would take to code it myself. But just sometimes.
  • bionhoward 9 days ago
    I used the service heavily for a while, but cancelled my sub and switched to Mistral due to the AI customer noncompete issue. I use Mistral Large occasionally when I'm either stumped or doing bulk stylistic rewriting. AI is most helpful in tandem with other static analysis tools / type systems, because those can help verify the AI code.
  • BMc2020 9 days ago
    Both ChatGPT and Gemini are very good; Copilot and Mistral Le Chat are noticeably worse.

    That said, I'm still in the free trial of Copilot for GitHub and I'm really liking it.

    I stick to one-liners, adding functionality to them in small increments to make small scripts. They still make lots of mistakes, so I try to keep it to fixing one mistake at a time.

    • Turing_Machine 8 days ago
      Now, that's the opposite of my experience. ChatGPT is better for general questions, while Copilot is better for code.

      ChatGPT used to be a lot better than it is now.

  • delduca 9 days ago
    I have used it for Python, Go, and TypeScript.

    For old and extremely popular things it works very well.

    For the rest it's pure shit; same experience with Copilot.

  • mikeocool 9 days ago
    With Swift specifically, I've found it to be very helpful 50% of the time, and to very convincingly hallucinate things the other 50% of the time.

    Like, it makes up what it imagines Apple's API should look like, when in fact the API does not look like that at all.

    • spxneo 9 days ago
      Interesting... I did not experience many hallucinations for my use case; also, I was building on top of an existing library, which could be why I had a better time.

      I'm just sitting back here, shocked that I can produce a complete iOS app (albeit a very basic one) without studying Swift.

  • hpeter 9 days ago
    Generating code is a nice gimmick, but for now you are better off following tutorials and googling how to build an app.

    It takes longer to debug ChatGPT hallucinations than to write the code from the docs.

    You could try RAG, maybe, to help you search for information in documentation.
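
    In case it helps, here's a minimal sketch of the retrieval idea (TF-IDF stands in for real embeddings, and the doc chunks and query are made up):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # Hypothetical documentation chunks to search over.
        doc_chunks = [
            "URLSession performs HTTP requests and returns data asynchronously.",
            "SwiftUI views are structs that declare their body property.",
            "Codable synthesizes JSON encoding and decoding for Swift types.",
        ]
        query = "how do I decode JSON in Swift?"

        # Score each chunk against the query and keep the best match.
        matrix = TfidfVectorizer().fit_transform(doc_chunks + [query])
        scores = cosine_similarity(matrix[-1], matrix[:-1])[0]
        best = doc_chunks[scores.argmax()]

        # Prepend the retrieved chunk to the prompt instead of hoping the
        # model remembers the docs; that's the whole trick behind RAG.
        prompt = f"Using this documentation:\n{best}\n\nAnswer: {query}"
        print(prompt)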

  • Turing_Machine 8 days ago
    It's great at straightforward boilerplate code, and I find it useful.

    > The solution hits an edge case and eventually GPT-4 gets stuck in a loop, claiming it has a new solution or has fixed something when it really hasn't.

    An infuriating subtype of this "loop" failure is when you point out a flaw, and it fixes it, but now there's a different flaw. You point that out, it fixes it, but now the original flaw is back again.

  • gregjor 9 days ago
    Meh. I would prefer hiring a junior programmer if I have to do that much review and mentoring. At least a real person would get the benefits rather than some tech giant that wants to take away jobs and profit from the work of others.
    • meiraleal 9 days ago
      You prefer to pay $3-5k to a person instead of $20 to a tech company?
      • gregjor 9 days ago
        Yes. You get what you pay for. ChatGPT et al. just waste my time. I don't need hundreds more lines of code. I need thinking and learning and solutions to business problems. The costs of AI code generation go much higher than $20, because I have to spend so much time "prompting" and checking and testing stuff that I could have found on Stack Overflow once, for free.

        Also consider that $20/mo does not come close to the actual cost of running LLMs. The companies that own those things pour money into them and attract users to create a market they can dominate, spending investor money and taking advantage of public energy and water infrastructure. Once so-called AI has insinuated itself as a necessity the price will go up. Look at cloud hosting and software licenses for examples of the rent-seeking model. We're in the market-making stage now, overlooking the problems and limitations because of FOMO.

        You can already get keyboards and mice with copilot keys. Every company seems to rush to add "AI" to their software with no evidence that it makes anything better. We will just come to accept it, like cookie pop-ups and clicking on photos of fire hydrants, without asking why or how it improves anything.