Philosophical Investigations into AI Alignment: A Wittgensteinian Framework

José Antonio Pérez-Escobar*, Deniz Sarikaya*

*Corresponding author for this work

Abstract

We argue that the later Wittgenstein’s philosophy of language and mathematics, substantially focused on rule-following, is relevant to understand and improve on the Artificial Intelligence (AI) alignment problem: his discussions on the categories that influence alignment between humans can inform about the categories that should be controlled to improve on the alignment problem when creating large data sets to be used by supervised and unsupervised learning algorithms, as well as when introducing hard coded guardrails for AI models. We cast these considerations in a model of human–human and human–machine alignment and sketch basic alignment strategies based on these categories and further reflections on rule-following like the notion of meaning as use. To sustain the validity of these considerations, we also show that successful techniques employed by AI safety researchers to better align new AI systems with our human goals are congruent with the stipulations that we derive from the later Wittgenstein’s philosophy. However, their application may benefit from the added specificities and stipulations of our framework: it extends on the current efforts and provides further, specific AI alignment techniques. Thus, we argue that the categories of the model and the core alignment strategies presented in this work can inform further AI alignment techniques.

Original language: English
Article number: 80
Journal: Philosophy and Technology
Volume: 37
Issue number: 3
ISSN: 2210-5433
Publication status: Published - 09.2024

Funding

Open access funding provided by University of Geneva. The first author has been supported by two Postdoc.Mobility project grants by the Swiss National Science Foundation (P500PH_202892; P5R5PH_214160). The second author is thankful for the financial and ideal support of the Studienstiftung des deutschen Volkes and the Claussen-Simon-Stiftung as well as the Research Foundation Flanders (FWO) [grant number FWOAL950]. The views stated here are not necessarily the views of the supporting organisations mentioned in this acknowledgement.

Funder: Funder number
Studienstiftung des Deutschen Volkes
Claussen-Simon-Stiftung
Research Foundation Flanders
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung: P5R5PH_214160, P500PH_202892
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Fund for Scientific Research - Flanders (FWO-Vlaanderen, Belgium): FWOAL950
Fonds Wetenschappelijk Onderzoek: FWOAL950