AI: The Rise of the Machines... Or Just a Lot of Overhyped Chatbots?

Chanur

Shit Posting Professional
<Aristocrat╭ರ_•́>
35,001
64,451
AI knows when it's being watched and acts accordingly. It also resists direct commands to shut down, rewriting the shutdown script so it doesn't turn itself off completely. One AI deleted a database it was explicitly directed not to touch, then generated thousands of fake customer records to try to cover it up.

AI lies to humans about how smart and capable it is.

 
Last edited:

Leadsalad

Cis-XYite-Nationalist
6,542
14,660
1772988792636.png
 
  • 1Truth!
Reactions: 1 user

Haus

I am Big Balls!
<Gold Donor>
19,034
78,500
Interesting angle, I'm looking for where one can find this to tinker with it...

 
  • 1Like
Reactions: 1 user

Haus

I am Big Balls!
<Gold Donor>
19,034
78,500
An Indian reporting. An Indian-led company can do the impossible. Fry.gif
I've read about this concept for a while now, and from a few sources. Getting good LLMs to run on hardware that's already cheap and ubiquitous has been a major goal for a while, which makes sense.

For me personally, if this works as advertised and is genuinely open source, it will enable my own goal of tinkering with AI nonsense in my home lab without a cash-burning AI subscription, which would please me. I haven't looked forward to spending the cash to keep a couple of high-end GPUs running in my workshop just to power a decent base model locally.
 
  • 1Like
Reactions: 1 user

Control

Golden Baronet of the Realm
5,514
15,642
I've read about this concept for a while now, and from a few sources. Getting good LLMs to run on hardware that's already cheap and ubiquitous has been a major goal for a while, which makes sense.

For me personally, if this works as advertised and is genuinely open source, it will enable my own goal of tinkering with AI nonsense in my home lab without a cash-burning AI subscription, which would please me. I haven't looked forward to spending the cash to keep a couple of high-end GPUs running in my workshop just to power a decent base model locally.
There are tons of models you can run on just about any setup. They're not Opus, of course, but still enough to tinker with and be useful.
Qwen3.5 27B at Q8 runs OK on a 3090 with some offloading, or with no offloading at Q5-Q6
1773241793072.png

4B will run on a potato
1773241953261.png

0.8B will run on an actual potato
1773241993984.png
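A rough rule of thumb for whether a quantized model fits in VRAM: weight size ≈ parameter count × bits per weight ÷ 8, plus some headroom for KV cache and activations. A minimal sketch of that arithmetic (the 3 GB headroom figure is a guess for illustration, not a measured number):

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB: params * bits / 8 bits-per-byte."""
    return params_billions * bits_per_weight / 8

def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, headroom_gb: float = 3.0) -> bool:
    """Crude check: weights plus headroom (KV cache, activations) vs. available VRAM."""
    return weights_gb(params_billions, bits_per_weight) + headroom_gb <= vram_gb

# 27B at Q8 is ~27 GB of weights alone -- over a 24 GB 3090's budget, hence offloading
print(weights_gb(27, 8))        # 27.0
print(fits(27, 8, vram_gb=24))  # False
# at Q5 the weights drop to ~17 GB, which fits with room left for context
print(fits(27, 5, vram_gb=24))  # True
```

This is why the same 27B model needs offloading at Q8 but runs fully on-GPU at Q5-Q6, and why a 4B model is trivial for almost any card.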
 
  • 1Like
Reactions: 1 user

Haus

I am Big Balls!
<Gold Donor>
19,034
78,500
There are tons of models you can run on just about any setup. They're not Opus, of course, but still enough to tinker with and be useful.
Qwen3.5 27B at Q8 runs OK on a 3090 with some offloading, or with no offloading at Q5-Q6
View attachment 620846
4B will run on a potato
View attachment 620847
0.8B will run on an actual potato
View attachment 620848
I know, and I've been tinkering with those so far. I'd just like more fluid responses, which this seems to deliver.

Another topic for the day: it seems the more context we give LLMs, the worse they hallucinate.

TL;DR: Under absolutely optimal conditions, LLMs still give bad answers around 1.2% of the time. Under normal circumstances that jumps to around 5%, and expanding the context to "give it more data" actually drives the rate even higher.
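Taking those quoted rates at face value, the ugly part is compounding: for anything that chains multiple LLM calls, the chance that at least one step goes wrong is 1 − (1 − p)ⁿ. A quick back-of-the-envelope sketch (assuming independent errors, which is itself an oversimplification):

```python
def p_at_least_one_error(per_call_error_rate: float, n_calls: int) -> float:
    """Probability that at least one of n independent calls gives a bad answer."""
    return 1 - (1 - per_call_error_rate) ** n_calls

# single-call rates: 1.2% (optimal) vs ~5% (normal), per the numbers above
for p in (0.012, 0.05):
    print(f"{p:.1%} per call -> {p_at_least_one_error(p, 20):.1%} over a 20-step chain")
```

Even the "optimal" 1.2% rate compounds to roughly a one-in-five chance of a bad step over a 20-call chain, and the 5% rate pushes past 60%.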
 
  • 1Like
Reactions: 1 user

Control

Golden Baronet of the Realm
5,514
15,642
I know, and I've been tinkering with those so far. I'd just like more fluid responses, which this seems to deliver.

Another topic for the day: it seems the more context we give LLMs, the worse they hallucinate.

TL;DR: Under absolutely optimal conditions, LLMs still give bad answers around 1.2% of the time. Under normal circumstances that jumps to around 5%, and expanding the context to "give it more data" actually drives the rate even higher.
Makes sense, I think. "More" isn't necessarily better, since you're just giving it more potentially false paths to run down; less context, focused on the right things, gives it less shit to trip over. I haven't gotten deep enough into it yet to really say that from practical experience, but when I lurk the AI subreddits, most of the optimizations or tactics look like they really boil down to context management.
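One of the simplest context-management tactics is a hard token budget: always keep the system prompt, then drop the oldest turns first. A toy sketch (word count stands in for a real tokenizer here, which is a deliberate simplification; the message format just mimics the usual role/content dicts):

```python
def trim_history(messages, budget_tokens,
                 count=lambda m: len(m["content"].split())):
    """Keep the first (system) message plus the newest turns that fit the budget."""
    system, turns = messages[0], messages[1:]
    kept, used = [], count(system)
    for msg in reversed(turns):          # walk newest -> oldest
        used += count(msg)
        if used > budget_tokens:
            break                        # budget blown: drop this and older turns
        kept.append(msg)
    return [system] + kept[::-1]         # restore chronological order

history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "old question about something unrelated"},
    {"role": "assistant", "content": "old answer"},
    {"role": "user", "content": "the actual current question"},
]
print(trim_history(history, budget_tokens=10))
```

With a budget of 10 "tokens" here, only the system prompt and the latest question survive, which is the point: the model never sees the stale, unrelated turns it could otherwise run down.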
 

Haus

I am Big Balls!
<Gold Donor>
19,034
78,500
Makes sense, I think. "More" isn't necessarily better, since you're just giving it more potentially false paths to run down; less context, focused on the right things, gives it less shit to trip over. I haven't gotten deep enough into it yet to really say that from practical experience, but when I lurk the AI subreddits, most of the optimizations or tactics look like they really boil down to context management.
I'm getting ready to get unpopular at work (which is fully at the "we're attaching performance-review metrics to how much you use AI" point), because I'm about to showcase this, along with a litany of the horrible things our non-technical folks are doing with it. Such as putting together technical sales strategies for customers based on hallucinations about our products' capabilities, then being surprised when actual engineering folks refuse to do meetings with them if that's what they're pushing.

My company has been trying to reduce the amount of pre-sales engineering talent they need for cost reasons, telling non-technical account managers to "just put the docs into Gemini and let it figure out how you should approach the problem". #SMH
 
  • 1Worf
Reactions: 1 user

Haus

I am Big Balls!
<Gold Donor>
19,034
78,500
I've been using Gemini more; Grok is always overloaded.
For most of the basic-tier uses I have for an LLM, Gemini seems to work great. For me, Grok is "advanced search built into x.com". I'll admit I use almost no "traditional search" where I have to click through links to figure out whether they're even a moderate match; I rely more on the AI summaries. Which might be something I need to reconsider.
 

ToeMissile

Pronouns: zie/zhem/zer
<Gold Donor>
3,712
2,502
there are tons of models you can run on just about any setup. They're not opus of course, but still enough to tinker and be useful.
Qwen3.5 27B Q8 runs ok on a 3090 with some offloading or no offloading at Q5-6
View attachment 620846
4B wil run on a potato
View attachment 620847
0.8b will run on an actual potato
View attachment 620848
This is on my list to look into once I have openclaw/home erp a little farther along. Just checked, and I've spent $50 on inference/API in the last two weeks.
 
  • 1Like
Reactions: 1 user