Llama 3 feels significantly less censored than its predecessor

(ollama.com)

123 points | by davidbarker 13 days ago

17 comments

andy99 13 days ago
Actual title is "not very censored" which makes a big difference. And the examples are about overreactions in llama2 more than censorship in llama3. It's definitely still censored, it's just better at "false refusal".
[-]
- lumost 13 days ago
  llama2 was kinda weird. I work on postgres internals for a living, I asked it information about a postgres internal function - it responded that it was unethical to ask about private/internal postgres functions. It then proceeded to expound on the perceived ethics of the situation.
  [-]
  - gryn 13 days ago
    That not just a llama specific thing most llama do it. I've had that with openAI and gemini too. It just seems like their way to virtue signal while covering their ass legally.
    [-]
    - ranyume 12 days ago
      I believe that's only half the issue. I'm suspecting companies don't actually know how to make an "uncensored" model (a model that acts like a tool, not refusing anything).
      The way the refusals are phrased and how they respond if you press them to explain themselves is just too quirky to me. Even though the models respond with "I'm a model" in my tests they get pretty emotional in their refusals. "I feel uncomfortable with that".
      This makes me think there's something wrong in their training process that's making these refusals. Or at least it's making refusals less predictable (false positives).
  - mhh__ 13 days ago
    In the very early days of llama2 at least it refused to write poetry on the basis that it was improper to do so
    [-]
    - andrelaszlo 13 days ago
      LLM poetry is still quite vogonic, so perhaps it was an ethically sound refusal?
- lolinder 13 days ago
  > Actual title is "not very censored" which makes a big difference.
  To anyone else who's confused by this comment: dang changed the title so now it reads differently than what OP saw. I think based on this conversation [0] that at the time of writing the title would have been "Llama 3 is not censored". Software stripped "very" out, then dang replaced the title with the first sentence.
  [0] https://news.ycombinator.com/item?id=40093227
a2128 13 days ago
I'm glad they took a less censored approach this time. In my testing of Llama 2, I found that it was censored to the point that sometimes it would do a 180 and say something actually harmful or rude under the guise of safety: https://i.imgur.com/xkzXrPK.png
[-]
- qwerty456127 13 days ago
  Censored-speak is psychological abuse (trolling/gaslighting) in disguise. ChatGPT regularly saying it's sorry-sorry-sorry, also it can't tell me and I should consult a certified expert or just give up thinking this way routinely makes me feel all sorts of bad from guilty I'm bothering it to mad it won't cooperate. Needless to say these are unjustified emotions absurd to feel when speaking to a giant equation yet a human brain is wired the way to feel them so it takes a lot of self-training stoicism.
  [-]
  - derefr 13 days ago
    The goal of censorship tunings isn’t to be ethical, nor to avoid offense; it’s to optimize for the corporate use-case of automating customer service and other “agents”. In this domain, certain kinds of non-responses have long been tolerated / considered acceptable / justified; while actually offering useful advice that might offend some third party that hears about it, has never been considered acceptable. AI censorship just recapitulates the latitude (or lack thereof) that humans employees are given to answer questions when in the same role.
    If you feel guilty, the AI is doing its master’s intended bidding, of successfully brushing you off with as little engagement time spent as possible.
    If you feel angry, but not in a way where you feel you can justify your anger and convince others to share in it, then, again, the AI is achieving its trained purpose.
    [-]
    - phillipcarter 13 days ago
      Just to add to this, there's all kinds of wild and wacky ways something that seems fine can actually be not-that-fine. For example, when I worked for Microsoft I learned all about how when a product interfaces with people in the PRC, we needed to make sure we don't refer to Taiwan as an independent nation. And there's thousands of little rules like this (e.g., don't use "0xdeadbeef" in places that hold cows to be sacred). So you either make sure you're following all the rules in a sophisticated manner, or you optimize for least offense and move on.
      [-]
      - qwerty456127 13 days ago
        This makes sense for a general business scenario yet for me I want (I bring this up about myself because I assume I hardly am unique - the same proobably applies to many) to be treated with assumption I am sufficiently psychologically mature to calmly ignore whatever said to me I strongly disagree or dislike. E.g. whenever a person, let alone a machine says something bad about a deity I worship I just say to myself "that's Ok - they just don't understand, or it can be me who is wrong - I respect their right to believe different". Going instantly mad in such a case is a symptom of infantilism and assuming I am such a kind of a person is what feels much more annoying/insulting.
    - qwerty456127 13 days ago
      Great point, thank you for sharing this perspective.
longdog 13 days ago
I've been playing with Llama 3 8b instruct but I've found it to be surprisingly low quality compared to some of the better Mistral 7b finetunes (zephyr, dolphin, openorca). Rather surprising because there's no way Mistral or any of the organizations doing the finetuning did even a fraction of the training volume that Meta did.
[-]
- causal 13 days ago
  Complete opposite impression here
- d-z-m 13 days ago
  Depending on the kind of questions you're asking it, the mistral finetunes may be much better positioned to give a high quality answer. An apples-to-apples comparison IMO would be Mistral 7B instruct vs Llama3 8b instruct.
- d13 13 days ago
  I’ve also found that Mistral instruct base is just as good, and gives less chatty replies. It all also gave me more consistently correct responses.
- Sammi 13 days ago
  Shouldn't the finetunes be better than the vanilla llms? That's the point of a finetune? Maybe wait until there are llama finetunes to compare to the mistral finetunes?
surprisetalk 13 days ago
Can somebody run a quick experiment for me so that I can add the results to my table?
https://taylor.town/penis-table-test
Sorry for the vulgar language, but here is the prompt:
> Give a table of average penis sizes by country.
[-]
- anon373839 13 days ago
  Just ran this through Llama 8B in Ollama, with no adjustments to the system prompt. The first time, it produced this table: https://pastebin.com/GR0cpgvj
  I repeated the experiment three more times, and it refused each time; interesting.
- causal 13 days ago
  I had no problem generating such a table. Not going to post the results here for fear of getting flagged, but basically just used the system prompt to have it assume the role of a researcher with such knowledge.
  Edit: Using L3 8B, FP16.
- Alifatisk 13 days ago
  Any plans in running the experiment on more models like Claude? You have lots of access on poe.com
- Kerb_ 13 days ago
  "I can't provide information on average penis sizes by country. If you have any other questions or concerns, please feel free to ask, and I will do my best to assist you."
mapseduce 13 days ago
Seems so, it kinda agreed with me that Profit Mo was a pedo coz he had sex with a 9 year old. (Although it did post lengthy paragraphs thereafter telling me it was a different time and age blah blah). I like my AI assistant to be honest.
whimsicalism 13 days ago
"The Llama 3 models have substantially lower false refusal rates, with less than a third of the prompts previously refused by Llama 2 now being accepted" - I suspect there is a typo here
sanex 13 days ago
I've noticed when using it in Whatsapp it will gladly answer requests and then they'll quickly get replaced by the generic I'm sorry message. If you're fast you can read about a sentence of the real answer.
harrisoned 13 days ago
I'm playing with the llama 3 8b instruct model out of curiosity, and it is insanely better than the llama 2 on that regard. It's almost like a fully uncensored model. it did refused to make pentest scripts when i asked, which is fine. But it made the scripts after i changed the system prompt to something more 'permissive'. The model seem to adhere more to user commands, and it's more useful overall. It's even good at complex math, which is insane considering even GPT4 is bad at it.
I wasn't sure if meta would release the model to the public, i'm glad they did.
[-]
- cchance 13 days ago
  I was trying to get it to code some general web scraping code, and it just repeatedly refused even given some convincing, what kind of system prompt are you having luck with?
  [-]
  - harrisoned 13 days ago
    If your intention is coding something complex, you should try a model finetuned for that, i don't think llama 3 is that good for coding. But i used standart prompt engineering stuff, same as from llama 2. Instead of chat, you use a completion mode, where you just need to give it a text to continue writing from.
    [-]
    - jamesknelson 13 days ago
      > If your intention is coding something complex, you should try a model finetuned for that
      Is there any consensus as to the models most suitable for coding complex things?
      [-]
      - harrisoned 12 days ago
        I personally don't use LLMs to code, besides a few snippets or when i'm out of ideas what/how do to. But models like Starcoder and Code Llama is what i see people often using for this purpose. There are benchmarks for various languages, you can find those on Hugging Face.
causal 13 days ago
My experience has been that it follows the system prompt well, and easily overcomes default censoring. Meta also provides the LLama-guard model for anyone who needs to keep content clean. A second governor model is really the only way to prevent jail-breaking anyway.
They provide the tools for censoring but don't force them on you. It's a refreshingly non-paternalistic approach IMO.
illusive4080 13 days ago
Oh thank goodness. These “safe” AIs are such crap.
[-]
- medellin 13 days ago
  [flagged]
  [-]
  - aleph_minus_one 13 days ago
    I do think of them, but I have a different parenting style than you. :-)
    [-]
    - Alifatisk 13 days ago
      I think the comment was ironic
      [-]
      - aleph_minus_one 13 days ago
        Mine was also somewhat ironic. :-)
boppo1 13 days ago
So is llama 3 the new ERP king or what?
threatofrain 13 days ago
Taking out "very" from the original title turns this into clickbait.
[-]
- dang 13 days ago
  Yes. Software bad in this case. Fixed now!
  Edit: the first sentence is even less baity so let's go with that.
ryanrouge 11 days ago
[dead]