I'm a complete newb when it comes to AI, and I am getting pretty ashamed of it too. How do I take a model like this and use it in my day to day? Can I somehow use it in, say, VSCode? How do I point it at my code base, and use it to help me write new code?
You run most of these models in something that wraps them in an HTTP API. I use Ollama, which I think is the most popular, but I'm not in a great position to judge. My impression is that it handles running models on CPU better than the alternatives.
So you’d basically install Ollama, download one of the versions of this model off HuggingFace, create a Modelfile since this isn’t in the default Ollama repo, and then Ollama can answer prompts with the model. Modelfiles are very simple, based on Dockerfiles. It takes like 15 seconds to make one if you aren’t messing with the various parameters.
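To make that concrete, a minimal Modelfile is basically just a pointer at the GGUF you downloaded; something like this (the filename and temperature here are just examples):

    # Modelfile: point Ollama at a local GGUF file (filename is illustrative)
    FROM ./granite-8b-code-instruct.Q4_K_M.gguf

    # optional: nudge it toward more deterministic completions
    PARAMETER temperature 0.2

Then ollama create granite-code -f Modelfile registers it, and ollama run granite-code lets you prompt it.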
Once it’s in Ollama, just get one of the various GPT plugins for VSCode and give it the Ollama URL (http://localhost:11434 by default). I use continue.dev but there are many.
Continue takes over the tab autocomplete with the LLM, and has a chat window on the right where you can use keyboard shortcuts to copy code into the prompt and ask it to edit/generate code or ask questions about existing code.
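If you want to sanity-check the server before pointing an editor at it, you can hit the API directly with curl; a quick sketch, assuming you named the model granite-code as above:

    curl http://localhost:11434/api/generate -d '{
      "model": "granite-code",
      "prompt": "Write a binary search in Python",
      "stream": false
    }'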
Thank you so much! That sounds surprisingly straightforward. I expected a lot more fiddling to get going.
Where would I start if I wanted to use a model programmatically? Like, let's say I am building a chat bot. I have a large data set of replies I want the model to mimic, and I'd want to do this in Python. Of course, I'd probably use a different model than Granite.
This is stretching my own knowledge, so if someone else knowledgeable wants to take a stab here I would appreciate a response as well!
Before doing that, I would start basic. Pull llama3 and see what it does with your prompts. You may be surprised how much is already in there, and you may not need to involve your own data at all. If that doesn't work, check HuggingFace to see if someone has already made a model/finetune/LoRA for what you're trying to do. There are many; e.g. I found a Magic: The Gathering rules model the other day.
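As for the Python side of your question: the simplest route I know of is the ollama Python package (pip install ollama), which just wraps the same local HTTP API. A minimal sketch, assuming you've already pulled llama3:

    import ollama

    # Send a one-turn conversation to the local Ollama server and print the reply
    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Reply in a pirate voice: how do I reverse a list in Python?"}],
    )
    print(response["message"]["content"])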
If those fail, or you just want to play with your own data, you'll need to figure out what "mimic" means.
If the model does okay at generating content but the content is factually wrong or missing background, you may be able to just do RAG (retrieval augmented generation). Basically you run your documents through a model that converts them to embeddings (some kind of vector; I don't understand how they work). Then when you run a query, you search for related embeddings and pass them to the model so that it "knows" the content that was in the documents. This is the easiest option; open-webui (the Ollama web chat interface) has some RAG support. Danswer is open source and built from the ground up to do RAG, with built-in support for ingesting from Slack, Drive, etc, etc. OpenAI also offers embeddings as a service.
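To make that concrete, here's a toy sketch of the retrieval half: embed the documents, find the ones nearest the query, and stuff them into the prompt. The embedding model and the final ollama.chat call are assumptions on my part, not how open-webui or Danswer actually do it:

    import numpy as np
    import ollama
    from sentence_transformers import SentenceTransformer

    docs = [
        "Refunds are available within 30 days of purchase.",
        "Support hours are 9am to 5pm EST, Monday through Friday.",
        "Granite models were released by IBM under Apache 2.0.",
    ]

    # Embed every document once, up front
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(query, k=2):
        # With normalized vectors, cosine similarity is just a dot product
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    question = "When can I get a refund?"
    context = "\n".join(retrieve(question))

    # Pass the retrieved text to the model so it "knows" the documents
    reply = ollama.chat(model="llama3", messages=[
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
    print(reply["message"]["content"])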
A step up from that is making a LoRA. To my novice eyes, LoRAs are basically a diff of the model's parameters or weights. So rather than training a whole new model, you just add deltas to an existing one. These let you "teach" the model something while preserving the base generation capabilities of the underlying model. I.e. you won't have to worry about feeding it enough data that it can speak English properly, because it gets that from the base model; you only have to give it enough data to speak about whatever you're training it on.
If that doesn’t make any sense, go check CivitAI for Stable Diffusion (image model) LoRAs. The effects are way more obvious on image AIs.
Anyways, LoRAs are trained, so you're into training there. I think HuggingFace has tools that make this easy, but I don't know enough to say anything with confidence.
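The relevant HuggingFace tool is the peft library; as far as I understand it, attaching a LoRA looks roughly like this (the base model and target_modules are assumptions and vary per architecture):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

    # Only small low-rank "delta" matrices get trained; the base weights stay frozen
    config = LoraConfig(
        r=8,                                  # rank of the delta matrices
        lora_alpha=16,                        # scaling factor for the deltas
        target_modules=["q_proj", "v_proj"],  # which layers get adapters
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # usually well under 1% of the base model

From there you'd run a normal training loop (e.g. transformers' Trainer) over your own data.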
The last option, which you almost certainly don't want, is to train a new base model like llama3. You're starting from 0 there; you have no existing model, so you will have to teach it everything. It will take a ton of data, it will take forever to train, and it will likely be much worse than even a model picked at random on HuggingFace. Meta has spent who knows how much on Llama and it still hallucinates.
If you end up training, you'll probably end up doing it in the cloud unless you have tons of VRAM doing nothing. Prices are pretty reasonable; I think A100s are around $2/hr. I don't know how to gauge how long it needs to train, but I believe it's related to the amount of data you're training on. I believe it's pretty reasonable for LoRAs though; I'm guesstimating the $20-ish range, i.e. around 10 GPU-hours at that $2/hr rate?
Edit: oh, and I’m not affiliated in any way, but I found out last night that Fireworks’ new function calling model is free while it’s in beta, which is a neat/fun thing to play with. https://fireworks.ai/blog/firefunction-v1-gpt-4-level-functi... it’s also open weights if you want to run it locally, but it’s a 40B model so I can’t on my 3060
https://github.com/TabbyML/tabby can run self-hosted AI coding assistants. I tried it a while ago and it worked with Nvim pretty easily. There is a VS code extension too. The extension will just sort of "read" with you and provide suggestions from time to time. Anytime the suggestion is good you can press some key (<TAB> by default) to accept it. It's basically autocomplete on steroids.
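From what I remember of their README, getting Tabby running is a single Docker command, roughly like this (the model choice is just an example, and you can drop --device cuda to run on CPU):

    docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
      tabbyml/tabby serve --model StarCoder-1B --device cuda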
If you like Emacs (I use both Emacs and VSCode, for slightly different coding use cases), then the Emacs ellama [1] package is very nice. It is set up out of the box to use Ollama and to use M-x commands for code completion, summarization, and dozens of other useful functions. I love it; your mileage may vary.
[1] https://github.com/s-kostyaev/ellama
Based on their own numbers, the 8B seems decent, but the 34B doesn't seem worth it compared to general-purpose trained models, even on specific tasks. Which is an interesting result.
> Our process to prepare code pretraining data involves several stages. First, we collect a combination of publicly available datasets (e.g., GitHub Code Clean, Starcoder data), public code repositories, and issues from GitHub
I wonder why companies like IBM are jumping on the LLM bandwagon and training/releasing models that have no chance of competing with Llama/Mistral. To me it just looks like a complete waste of $$, because nobody will use them in any serious scenarios.
IBM made $60 billion in revenue last year. Where do you think it all came from? The same companies/governments that buy their overpriced crap are going to buy these new LLMs as well.
Their customers aren't going to build their own RAG and agent frameworks, vector DBs, data ingest pipelines, finetunes, high-scale inference serving solutions, etc, etc. There's an incredible amount of stuff to buy.
Right, but they can just use Llama/Mistral for free, instead of their inferior models, which I'm sure take quite a bit of resources to train in the first place.
Enterprises think differently. They want data provenance, privacy, ability to mitigate/transfer risk etc. If IBM is willing to offer that, there will be enterprises that bite.
IBM goes to great lengths to train models on clean data that has a lower risk of copyright or legal issues attached. Just take a look at the model description.
That data issue is important enough for some companies to pick a mediocre model over Llama or Mistral.
What if I told you that a lot of freely licensed code on GitHub is not clean? That the authors may have read something and rewritten it in a way that wasn’t transformative? So it basically has the same problems.
What if I told you the supposedly clean "The Stack" dataset contains at least one GPL repository inside, just because their license detection tool bugged out?
IBM and other big players are vigilant about these things, and this is what companies pay for.
Their software may not be better on some metrics, but it's cleaner on others, and their support contracts allow people to sleep tight at night.
This is what money buys. Peace of mind and continuity.
Indemnity is moving the goal posts, no? So you’re conceding that their data isn’t clean. But they say it’s clean.
This support contract stuff: what are you talking about? You download these models, you use them. What would you pay for? It's not clean data, they say it's clean: why would I pay liars? Let's game out the indemnity idea. I pay $10k/mo for 12 months. Then OpenAI loses v. NYTimes, it's ruled that LLM training is not fair use and needs express permission, and IBM pulls the models. What the hell did I pay $120k for? And by the way, you can pay a law student one beer to tell you OpenAI is going to lose because of Warhol v. Goldsmith. You can do whatever you want with your money, but I personally would not waste it on worthless indemnity.
First of all, "The Stack" is the dataset that models like StarCoder are trained on. I don't know what the data source for the IBM Granite family is.
I know the Stack is not clean, because they included my fork of GDM's greeter, which is GPL licensed.
My words about IBM were general. I can't tell anything about their models, because I didn't see any mention of "The Stack", and I don't know what their models are based on.
On the other hand, in my experience IBM doesn't like risk, so they would play it way safer than other companies.
If their data is not clean to begin with, then shame on them, and I hope their AI efforts burn to the ground.
BTW, LLM training is not fair use. For a start, fair use's definition automatically excludes "for profit" usage. Just because OpenAI has a non-profit part, and the training was done there, doesn't make them immune to the consequences of their for-profit operations.
Agreed. For example their research lab in Zurich has been absolutely world-leading in things like atomic force microscopy (AFM) for four decades, including the Nobel Prizes in Physics in 1986 (for the scanning tunneling microscope, the AFM's predecessor) and 1987 (high-temperature superconductivity). They also invented things like trellis coding and Token Ring.
PALO ALTO, Calif. – At the Hot Chips trade show, IBM defined a new interface for the 2020 version of its Power9 CPUs. The Open Memory Interface (OMI) will enable packing more main memory onto a server at higher bandwidth than DDR, and as a potential JEDEC standard could rival Gen-Z and Intel's CXL.
OMI basically removes the memory controller from the host, relying instead on a controller on a relatively small DIMM card. Microchip's Microsemi division already has a DDR controller running on cards in IBM's labs. The approach promises to deliver up to 4 TB of memory on a server at about 320 GB/s, or 512 GB at up to 650 GB/s sustained.
IBM doesn't have fabs, but they still do R&D into semiconductors that very much target future commercial processes. They do a fair bit on quantum computing too, to name just a couple of things.
“The South Korean technology giant Samsung Electronics was awarded a total of 6,165 United States patents in 2023, the most of any company. Qualcomm ranked second among companies, with 3,854 U.S. patents granted, followed by the likes of Taiwan Semiconductor Manufacturing Company and IBM.” — https://www.statista.com/statistics/274825/companies-with-th...
Those mainframes are actually pretty modern and interesting.
If IBM split off half of their mainframe division and let some competition get going, I think the segment could actually be something to contend with.
The basic idea of the IBM mainframe is almost perfect for what a lot of companies actually need (massively reliable hardware to support lots of middling software; most work is shunting data around) but everyone knows they're going to get locked into IBM.
On the contrary, the maintenance and continued improvement of an entire ISA and ISA-specific operating systems is exactly my idea of hardcore tech: continuing to pay a chip org to design new chips for said ISA every generation and implement new instructions, and continuing to pay OS and compiler programmers to work those into their OSes and compilers. I'm not sure where we draw the line on maintenance vs. continued development here, but I'm not sure I'd call that purely maintenance.
There really aren't a lot of companies out there that can claim to do similar (and of course besides s390x, an ancient and venerable CISC, IBM also has Power, so they are doing this 2x over). You'll find a lot of IBM employees contributing to what I'd consider "hardcore" tech like LLVM and the Linux kernel as a result, because they genuinely have a large amount of expertise in those and similar areas. And here I'm not even really including Red Hat, but if you include them then they are even more overweight in the hardcore tech category.
If anything, a lot of the rest of the tech industry has left "hardcore tech" behind due to efficiency concerns, as a result of a long-running, industry-wide process of consolidation and commodification that IBM has resisted for obvious reasons. IBM is hardcore to a fault, if anything.
TLDR: I actually think IBM punches above their weight in the "hardcore tech" area, so long as our definition is sufficiently low-level. If it includes, say, cloud services, then fair enough, you can probably fairly say they suck at that.
Here I've also chosen to entirely ignore IBM research.
When they pitch potential clients for their services, the slides on LLMs, AI, ML, etc. must show their own models. Whether they actually use them in the services does not matter. These are like the side projects that service companies release to help them close clients.
Same reason they jumped on the cloud bandwagon: it's the kind of offering you're expected to have when you're a company like that. Huge size, leading research departments, big enterprise customers.
They've been doing "AI" for ages. Notably Watson over the last couple of decades or so.
>I wonder why companies like IBM are jumping on the LLM bandwagon and training/releasing models that have no chance of competing with Llama/Mistral
Did you even read the benchmarks they post on that link? Assuming they're not outright lying, their 8B model is superior to Llama/Mistral models of the same size for coding tasks.
On the other hand, I spend my time wondering why people like you think someone should just throw away their ideas simply because there's already someone in the niche.
If you'd rather skip Ollama, llama.cpp's built-in server is here: https://github.com/ggerganov/llama.cpp/tree/master/examples/...
And you can search for any GGUF on HuggingFace to run with it.
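Launching it looks roughly like this (filename and flags from memory, so treat this as a sketch):

    # serve a local GGUF over HTTP on port 8080
    ./server -m ./granite-8b-code-instruct.Q4_K_M.gguf -c 4096 --port 8080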
> their support contracts allow people to sleep tight at night
And more importantly, IBM will guarantee it in the case that they're wrong. _That's_ what companies pay for.
So will OpenAI, according to Sam Altman [0]. Can they be trusted?
IBM has proven itself in various ways over the years, OpenAI hasn't.
While IBM is a behemoth of a money making machine, they put money where their mouth is. OpenAI does not.
So I'll trust IBM, but not OpenAI.
[0]: https://youtu.be/z8VhNF_0I5c
There will be a market for their services. Maybe a different one, but there will be.
> Nobody ever got fired for buying IBM
The gist is still current, but these days you'd fill in AWS as the uncontroversial choice.
But I’m pretty sure both models have “we’re not responsible” clauses.
> LLM training is not fair use
Citation needed.
All I've seen from them in my professional experience is actually legacy mainframe maintenance... Not shovelware, but very far from hardcore tech.
https://research.ibm.com/blog/albany-semiconductor-research-... etc
https://research.ibm.com/blog/ibm-molecule-generation-experi...
https://www.smithsonianmag.com/smart-news/ibm-engineers-push...
I've not seen any proper evaluations for Granite against, say, Llama or Mistral.
Until we do, it's probably too early to say they can't compete, at least in some areas where others perform poorly.
Previous Granite models were on the level of the first Llama in my benchmarks.
I'm expecting this version to be roughly comparable to Llama 2.