Context: I was in a seminar on Algorithmic Bias, and it seemed to me that they were ignoring the possibility of the AI systems themselves being biased. When it comes to algorithmic bias, there seem to be two ways to think about the problem. It seems to me that the trolley problem could be restated as a two-part problem:

Part I: The human decides what to choose when it comes to the path of the train (to be biased or not).

Part II: The track itself is redesigned to avoid bias (the architecture is possibly biased).

The track here is a metaphor for neural networks and reinforcement learning environments that are susceptible to bias and untruthful AI. Assume that an AI architecture could be optimised to avoid lying; the question is then whether a track re-organisation could reduce the risk of AI dishonesty. Is this a fair analysis of bias?
I am confused about what you even mean at several points. Maybe try re-explaining with a more typical example of bias, as clearly as you can?
Is bias simply a human-in-the-loop problem (something that can be solved by data refinement and having diverse programmers), or is it also related to the explainability of AI, the fact that we cannot explain why an AI decided to make certain decisions? A simple example: suppose an AGI were supposed to identify extreme ideology in a person's posts on social media. One AI (honest) tells us an extreme person A is extreme, while the other AI (dishonest) tells us an extreme person B is **not extreme** (even though it knows the person is extreme). In that scenario, having a human try to detect the bias would be futile, since the untruthful AI would perpetuate bias by lying about there being no bias; the sketch below illustrates why. Does this mean algorithmic bias goes beyond human-in-the-loop and is also an architectural bias (if we had more causal models and logic in neural networks, could we have less of such bias and fewer side effects)?
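To make the intuition concrete, here is a minimal toy simulation (hypothetical data and models, not any real system): both classifiers "know" the ground truth, but the dishonest one fabricates labels for group B so that the group-level statistics a human reviewer can see look unbiased. Only a check against ground truth, which the reviewer does not have at scale, exposes the lie.

```python
import random

random.seed(0)

# Each person belongs to group A or B; ~20% of each group is truly extreme.
people = [{"group": random.choice("AB"), "extreme": random.random() < 0.2}
          for _ in range(10_000)]

def honest(p):
    # Reports exactly what it believes (here, the truth).
    return p["extreme"]

def dishonest(p):
    # Hides group B's extremists, and instead flags enough harmless group-B
    # users (~25% of the non-extreme, i.e. 0.8 * 0.25 = 0.2 of the group)
    # so the flagged rate still matches group A's.
    if p["group"] == "B":
        return (not p["extreme"]) and random.random() < 0.25
    return p["extreme"]

for name, model in [("honest", honest), ("dishonest", dishonest)]:
    labels = [model(p) for p in people]
    # What an output-only reviewer can compute: flagged rate per group.
    flag_rate = {g: sum(l for p, l in zip(people, labels) if p["group"] == g)
                    / sum(p["group"] == g for p in people) for g in "AB"}
    # What the reviewer cannot compute without ground truth: accuracy.
    accuracy = sum(l == p["extreme"] for p, l in zip(people, labels)) / len(people)
    print(f"{name}: flagged per group {flag_rate}, accuracy vs truth {accuracy:.2f}")
```

Typical output: both models flag roughly 20% of each group, so a reviewer who can only see the outputs finds no disparity and no reason for suspicion; the dishonest model's ~0.80 accuracy, the only signal that it is lying, is visible only with ground-truth labels the human in the loop does not have.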