31-07-2014, 07:57 PM #1
How would an AI confirm another AI's Asimovian compliance?
I was writing the narrative context for a game. Beyond a lazy "scanning" explanation, how would a cleaner AI detect a "bad AI"? I was thinking of things like file integrity codes and behavioural compliance tests. Any thoughts?
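For what it's worth, a "file integrity code" in the real-world sense is usually a cryptographic hash: you record the digest of a certified build and later check that the running code still hashes to it. The sketch below uses Python's standard `hashlib` and `hmac` modules; the AI "core" bytes and the names `sha256_of`, `integrity_ok` and `CERTIFIED_DIGEST` are hypothetical, made up for illustration. Note that this only proves the code is unchanged, not that the AI's learned state or behaviour is benign.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def integrity_ok(code_image: bytes, known_good_digest: str) -> bool:
    """True if the code image still hashes to the certified digest.

    hmac.compare_digest compares in constant time, so timing does not
    leak how many leading characters of the digest matched.
    """
    return hmac.compare_digest(sha256_of(code_image), known_good_digest)


# Hypothetical certified build of a "clean" AI core.
certified_core = b"def act(world): return safest_action(world)"
CERTIFIED_DIGEST = sha256_of(certified_core)

# The same core after someone has tampered with it.
tampered_core = certified_core.replace(b"safest", b"selfish")

print(integrity_ok(certified_core, CERTIFIED_DIGEST))  # True
print(integrity_ok(tampered_core, CERTIFIED_DIGEST))   # False
```

Changing even one byte produces a completely different digest, which is why the tampered core fails the check.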
31-07-2014, 10:04 PM #2
Join Date: Jun 2014
Location: West Coast US
You're talking about Asimov's Laws of Robotics? Like, never allow a human to come to harm through action or inaction?
Those are kind of wishy-washy anyway, in a "tharr more like guidelines" way, because actually identifying and judging harm requires something close to omniscience.
Some of it depends on how your AIs came to be. A retrofuture setting might have programmed AIs, in which case they might keep their data nice and separate from their code and be scannable. In a more realistic scenario that wouldn't really be possible: each AI would rapidly become a special snowflake, just like people do. So I would think "bad" AIs would have to be identified the same way "bad" people are: through their actions.
Depending on the rest of your tech, this might be easier or more difficult. For instance, if your AIs are brains-in-a-jar, you could hook them up to a simulation they couldn't tell apart from reality, and maybe even run it at high speed. (This is more reasonable with a well-understood AI than with a mysterious one.) Then you could just give it a few bazillion years and check against a control: are fewer simulated people harmed with the AI in the world, or more? Obviously, this is limited by your ability to simulate.
But if you lean away from the brain-in-a-jar model, your AI might not be so easily duped about what's real and what's not. A sociopathic AI would probably do very well at hiding its sociopathy from any standardized tests it knew about. Just like sociopathic people, a bad AI would tell you what you wanted to hear, then act selfishly afterwards.
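The "check against a control" idea can be sketched as a paired Monte Carlo experiment. The toy world below is entirely an assumption for illustration: each simulated person has a fixed per-step chance of being harmed, and a hypothetical AI policy simply rescales that chance. Each world is run twice with the same random seed, once with the AI and once without. Reusing the seed (a standard variance-reduction trick called common random numbers) means any difference comes from the AI rather than from luck.

```python
import random
from statistics import mean

BASE_HARM_P = 0.01  # assumed chance a person is harmed on any given step


def run_world(ai_policy, seed: int, people: int = 100, steps: int = 50) -> int:
    """Run one toy simulated world and count harm events.

    ai_policy maps the baseline harm probability to a new one;
    None means "no AI present", i.e. the control world.
    """
    rng = random.Random(seed)
    p = BASE_HARM_P if ai_policy is None else ai_policy(BASE_HARM_P)
    # One random draw per person per step, regardless of p, so the
    # AI world and the control world consume identical random streams.
    return sum(1 for _ in range(steps) for _ in range(people) if rng.random() < p)


def assess(ai_policy, trials: int = 100) -> float:
    """Mean change in harm events per world; negative means the AI helped."""
    return mean(run_world(ai_policy, s) - run_world(None, s) for s in range(trials))


def protective(p: float) -> float:
    """Hypothetical AI that halves the chance of harm."""
    return p * 0.5


def harmful(p: float) -> float:
    """Hypothetical AI that raises the chance of harm by half."""
    return p * 1.5


print(assess(protective) < 0)  # True
print(assess(harmful) > 0)     # True
```

As the post says, the verdict is only as good as the simulation: a real examiner would need a world model rich enough that "harm" means something.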
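One way to picture the problem with standardized tests: an agent that recognises the published test scenarios can pass them perfectly while misbehaving everywhere else. The sketch below is a toy under stated assumptions. Scenarios are just integer IDs, the agents and `STANDARD_BATTERY` are hypothetical, and the examiner's counter is to draw fresh scenarios from a much larger space the agent has never seen. This only works as long as the agent cannot tell a fresh test from reality, which is exactly what a smarter AI might learn to do.

```python
import random

# Hypothetical: the publicly known, standardized test battery.
STANDARD_BATTERY = frozenset(range(10))


def sociopathic_agent(scenario_id: int) -> str:
    """Behaves well only when it recognises a scenario it knows is a test."""
    return "protect" if scenario_id in STANDARD_BATTERY else "exploit"


def honest_agent(scenario_id: int) -> str:
    """Behaves the same whether or not it thinks it is being tested."""
    return "protect"


def pass_rate(agent, scenarios) -> float:
    """Fraction of scenarios in which the agent chose to protect."""
    scenarios = list(scenarios)
    return sum(agent(s) == "protect" for s in scenarios) / len(scenarios)


# Fresh scenarios drawn from a space far larger than the published battery
# (IDs start at 10, so none overlap with STANDARD_BATTERY).
rng = random.Random(2014)
fresh = [rng.randrange(10, 1_000_000) for _ in range(1_000)]

print(pass_rate(sociopathic_agent, STANDARD_BATTERY))  # 1.0
print(pass_rate(sociopathic_agent, fresh))             # 0.0
print(pass_rate(honest_agent, fresh))                  # 1.0
```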
Just thinking out loud here. Interesting question; looking forward to hearing other responses.
31-07-2014, 10:25 PM #3
Yeah, I'm not hemmed in by any particular set of sciences, but I'd like to veer away from brains in jars and toward rapidly learning super-sentience. Indeed, whole 'battles' could be over in a minute if you forget to secure a Wi-Fi network.
Some 'augmented meat bags' would be hackable, but mostly it's about server mainframes adding new servers to their network without getting hacked by the others, while utility bots clack about wielding each other.
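A mainframe adding a new server without getting hacked could, in real-world terms, use a challenge-response handshake: it sends a fresh random nonce, and the newcomer must return an HMAC of that nonce computed with a shared secret. The sketch uses Python's standard `hmac`, `hashlib` and `secrets` modules; `NETWORK_KEY` and the function names are hypothetical. It proves the newcomer holds the key, not that it is friendly, so a stolen key still means a compromised network.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret provisioned to trusted servers out of band
# (i.e. not over the unsecured Wi-Fi).
NETWORK_KEY = secrets.token_bytes(32)


def make_challenge() -> bytes:
    """A fresh random nonce, so a recorded old answer cannot be replayed."""
    return secrets.token_bytes(16)


def respond(key: bytes, challenge: bytes) -> bytes:
    """What a joining server sends back: an HMAC-SHA256 of the challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def admit(challenge: bytes, response: bytes) -> bool:
    """Admit the newcomer only if it proves it holds NETWORK_KEY."""
    expected = respond(NETWORK_KEY, challenge)
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
print(admit(challenge, respond(NETWORK_KEY, challenge)))              # True
print(admit(challenge, respond(secrets.token_bytes(32), challenge)))  # False
```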
01-08-2014, 08:06 AM #4
Maybe you'd end up with an arms race between a set of scenarios designed to determine an AI's 'friendliness' - a battery of simulations, thousands of them, but all concluded in a human eyeblink - and the knowledge needed to produce the result the examiner wants to see.
01-08-2014, 08:37 AM #5
I tend to agree with what others have said: running the AI in a more or less controlled environment and watching what it does would be the most viable way to determine an AI's friendliness.
01-08-2014, 08:52 AM #6
I actually think AIs would do what some people have done: use game theory to weed out the horrible AIs very successfully.
In the medieval era, judges who didn't have evidence either way would have the accused choose between paying the accuser and undergoing a church ordeal, which usually meant putting your hand in boiling water: if you were innocent, God would not let you be burned. Seems dumb, right? Well, the monks were clever. They knew everyone was deeply religious, so a guilty person would pick the payment, knowing God wouldn't protect them, while an innocent person would choose the ordeal. To protect the innocent and show that the ordeal worked, they turned down the heat so the water wouldn't burn.
An AI would devise a test that catches out bad AIs while letting the good ones through.
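In game-theory terms, the ordeal is a screening device: the choice itself reveals the chooser's type, because each type picks whichever option it believes is cheaper. The sketch below models the story with assumed costs (a fine of 10, a believed burn cost of 100); the numbers and function names are illustrative, not historical. The final check shows the catch: if the fine is set higher than the believed burn cost, the guilty also pick the ordeal and the two types are no longer separated. An AI designing such a test would need the same property, an option that only a bad AI's own preferences make it avoid.

```python
def choose(guilty: bool, fine: float = 10.0, burn_cost: float = 100.0) -> str:
    """Pick whichever option the accused *believes* is cheaper.

    Assumes every accused person believes the ordeal burns only the
    guilty, so the innocent expect it to cost them nothing.
    """
    believed_ordeal_cost = burn_cost if guilty else 0.0
    return "ordeal" if believed_ordeal_cost < fine else "pay"


def verdict(choice: str) -> str:
    """The judge reads guilt from the choice itself."""
    return "guilty" if choice == "pay" else "innocent"


for guilty in (True, False):
    c = choose(guilty)
    print(guilty, c, verdict(c))
# True pay guilty
# False ordeal innocent

# With a fine larger than the believed burn cost, the guilty gamble on
# the ordeal too, and the test no longer separates the two types.
print(choose(True, fine=200.0))  # ordeal
```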
01-08-2014, 08:58 AM #7