A Language for Studying Knowledge Networks: The Ethnography of LLMs
Kelsie Nabben
16 May, 2024
Introduction
This post outlines a research agenda focused on visions of LLMs that offer an alternative to monolithic, winner-takes-all models and are instead based on localised instances, interactions, networks, and outcomes. Adopting the lens of affordances of Large Language Models (LLMs), it sets out three core strands of investigation: (1) knowledge creation, (2) knowledge infrastructure, and (3) knowledge interoperability. The field for this study is an organisational experiment of a Knowledge Base-LLM integration, and an inter-organisational ‘knowledge network’ (Nabben, 2023; Zargham & Ben-Meir, 2023). The implications of this research include a deeper understanding of the politics of LLM infrastructure (for whom, and under what conditions, it operates) and of how LLMs augment and mediate human knowledge creation and interaction.
The Ethnographic Field Site(s)
A recently accepted working paper of mine sets out an approach to ‘AI as a constituted system’ to emphasise the social, technical, and institutional factors that contribute to responsible and accountable AI governance (Nabben, 2023). Through an ethnographic approach, the paper details the iterative processes of negotiation, decision-making, and reflection among stakeholders as they develop, implement, and manage an organisational Knowledge Management System (KMS) that is served to stakeholders via an LLM interface (designed by professional services organisation BlockScience, where I also work). In line with the colleagues who initiated the experiment, this research suggests a future where AI is not universally scaled but consists of localised, customised LLMs that are tailored to stakeholder interests (Nabben, 2023; Zargham & Ben-Meir, 2023; Zargham et al., 2024). It also includes a framework for analysing or designing AI as a constituted system. The working paper concludes with the next phase of the LLM experiment: transforming a knowledge agent into a knowledge network.
This experiment is underway, in conjunction with colleagues at an open research collective non-profit, Metagov, in which I am a volunteer participant/contributor (Metagov, n.d.; Rennie, 2024). The field sites of BlockScience, Metagov, and possibly others (related to this experiment or otherwise) create the opportunity for a multi-sited ethnography. At present, I am focusing on the BlockScience side (with RMIT University colleagues Brooke Ann Coco, who is focusing on Metagov, and Professor Ellie Rennie). This will allow us to conduct a comparative analysis of knowledge infrastructures across organisations (in terms of constitutive processes, purpose, and function), as well as analyse interactions between Metagov’s “KOI-Pond” LLM interface and BlockScience’s “KMS-GPT” LLM interface in relation to the concept of ‘knowledge networks’.
The Framing
This research adopts the theoretical framing of “affordances”, which refers to how artefacts, including technologies, “request, demand, allow, encourage, discourage, and refuse” (Davis and Chouinard, 2016, p. 241). This framing allows me to ask not what affordances LLMs as artefacts have, but how these artefacts afford, and for whom and under what circumstances (or conditions) they do so (Davis and Chouinard, 2016). This approach is similar to the Science and Technology Studies ‘Actor-Network Theory’ (ANT) approach to studying material and immaterial networks (Latour, 2007), in that it assumes that technologies and people exist together as co-constitutive assemblages that influence and shape one another (Davis, 2020, p. 46). However, it does not treat human and non-human actants as equal, but traces the role of human action in shaping and directing technologies (Schraube, 2009). The other assumption of the analytical approach of affordances is that technologies are embedded and imbued with politics and power (Davis, 2020).
Tracing technological affordances is a way to better understand the politics of infrastructure, which refers to the ways that technology directs particular forms of human action towards ordering in society (Winner, 1980). Arising from relationships and interactions between technology and people, the politics of infrastructure encompasses the ideas that modern politics is technologically mediated and built on material infrastructure, and that technology itself embodies specific forms of authority and power. Socio-technical infrastructures also release meaning and structure politics through the aesthetic, the sensorial, desire, and promise (Larkin, 2013). Awareness of the political dynamics of technology allows for greater sensitivity to the stakeholders, authorities, and subjective political processes that constitute and shape technological tools, which then magnify and embody certain meanings.
The Core Research Themes
The core themes that I am concerned with ethnographically tracing are knowledge creation, transfer, and organisation. More specifically, I identify these under the key themes of:
Knowledge Creation
Knowledge Infrastructure
Knowledge Interoperability.
Each theme is explored in the sections that follow.
Knowledge Creation
“The limits of my language mean the limits of my world.”
― Ludwig Wittgenstein, 1922 (2010, p. 74).
This investigation begins from the premise that knowledge can be conceived of as subjective forms of language. Language shapes our thoughts, beliefs, worldview, and interactions. In the above quote, philosopher Ludwig Wittgenstein suggests that the boundaries of our language define the boundaries of our understanding and engagement with the world. Wittgenstein argued that the meaning of words is not fixed or inherent but is shaped by their use in various forms of life.
Wittgenstein proposed that meaning in language emerges from its use within specific contexts of life, characterised by rule-governed activities or "language games" (Wittgenstein, 2010). A language game follows specific rules, and these rules determine the meanings of words within particular contexts. For example, the way language is used in a game of chess, where terms like "king" and "checkmate" have specific meanings, differs from how language is used in a grocery store or in a courtroom. This diversity in usage shows that understanding a language requires familiarity with the forms of life and contexts in which it is used. As a form of abstraction and world modelling, language is an interpretation of reality that can lead to shared understandings. Failure to understand the rules of a language game can also lead to misunderstanding.
Extending this framework to LLMs, these artificial systems are informed by, and potentially transform, the rules and contexts of human language games. LLMs function on the basic precept of using natural language to provide an interface to underlying data sets (the ‘knowledge base’). In line with the concept of LLMs as language games, cultural scientists suggest that AI is composed of “embeddings” of human language (representations of complex data as relational meanings and patterns, so that it can be processed by algorithms), or “embeddings of embeddings” (Potts, 2024). There are particular rules that govern and limit LLMs’ use of language, depending on the underlying (‘foundation’) model used, the architecture of a particular implementation, and any fine-tuning (such as for political correctness). This component of the research design investigates the affordances and effects of LLM-mediated communication as language games that interpret and inform ways of knowing.
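To make the idea of embeddings as “relational meanings” slightly more concrete, the following is a minimal, illustrative sketch in Python. It assumes three hand-written toy vectors (the numbers are hypothetical and do not come from any real model) and uses cosine similarity, one common way of comparing embeddings, to show how words from the same language game can sit closer together than words from different ones.

```python
import math

# Toy, hand-written vectors: hypothetical values for illustration only,
# not the output of any real embedding model.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "checkmate": [0.8, 0.9, 0.2],
    "grocery": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    others = [w for w in EMBEDDINGS if w != word]
    return max(
        others,
        key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]),
    )


if __name__ == "__main__":
    # In this toy space, the chess terms are each other's nearest neighbours.
    print(nearest("king"))
```

In this toy space, meaning is purely relational: "king" is defined by its proximity to "checkmate" and its distance from "grocery", which loosely echoes the idea that a word’s meaning depends on the game in which it is used.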
Knowledge Infrastructure
"Compute: the future’s most precious currency"
-- Sam Altman (Lex Fridman, 2024).
The second pillar of this research focuses on knowledge infrastructure, understood as the material elements that undergird LLMs, including computing hardware, cloud databases, and servers. Tracing the material elements that provide the foundational substrates on which LLMs operate is crucial to understanding how LLM infrastructure is constituted through social, technical, institutional, and economic means (Nabben, 2023).
The materiality of digital infrastructures refers not just to tangible, technical artefacts, their arrangements, and practices but also to the cultural significance of social practices of how people remediate, reconfigure, and reform them. Physical artefacts, such as machines and computers, are deliberately designed to shape and constrain human behaviour, including the decisions we make and the way we move through the world (Latour, 1992). These “interaction[s] between artefacts, practices, and social arrangements at the intersection of technical and human processes of continuous self-organisation (and reorganisation) of society and institutions…define, classify, circulate, and regulate power and knowledge” (Lievrouw, 2014, p. 45). A number of science and technology studies trace the materiality of computing technologies (including database technology) to uncover the influence of physical components on computing practices and software development as they form infrastructures that provide a basis for other activities (Star, 1999; Pinch and Swedberg, 2008).
Underpinning all data are the necessary infrastructures to generate, analyse, and put them to use (Williamson, 2022). Data economies are predicated on transforming data inputs into data products, through the processing of data intermediaries (Sisson et al., forthcoming; Nabben et al., forthcoming).
Information security is a crucial consideration in the creation of LLMs, knowledge networks, and data economies. How infrastructure is architected, what data is included, and who has access are fundamental questions for the ethics of LLMs, and also for shaping power relations, interoperability, and LLM futures. These decisions, and how they are communicated, dictate not only the technical capabilities of LLMs but also their social outcomes, including reliability in terms of interaction patterns that users expect or want, and literacy of a model as it relates to perceived trustworthiness.
In the discourse on Artificial Intelligence (AI), and with LLMs as a subset of this technology, computing power (or “compute”) is a key input to AI development, deployment, and thus governance (Dafoe, 2018; Sastry et al., 2024). Compute is a fundamental, infrastructural layer of materiality, resource competition, and politics in the composition and landscape of building large-scale LLMs. The compute technology stack consists of electricity, microchips, software, domain-specific languages that can be optimised for machine learning, data management software, and data centre infrastructure (Vipra & Myers West, 2023, p. 4). Unequal access to these resources as AI becomes more and more computationally intensive raises concerns about “de-democratizing” knowledge production (Ahmed and Wahed, 2020, p. 1). Demand for compute is shaping the AI industry, and control of it by large firms (such as cloud service providers) results in significant influence.
In contrast, the advent of local-scale, context-specific LLM-based knowledge networks opens up the possibility of further investigation into alternative models of compute. Decentralised computing refers to interlinked nodes in a peer-to-peer network architecture that are supported by an underlying messaging protocol, in which there is no central authority (Asharaf & Adarsh, 2017). Efforts to build such infrastructure are underway, such as the project ‘Lilypad’, which provides a serverless, distributed compute network that enables data processing for AI, ML, and other computation (Lilypad, n.d.). These patterns could represent a return to “on-premise” computing, and a more traditional, self-reliant approach (Fisher, 2018), which demands further investigation.
This research strand sets out to map the knowledge infrastructure that undergirds decentralised knowledge networks, including the importance of information security, and the possibility of decentralised compute.
Knowledge Interoperability
"It's not about the LLM. The LLM is just the interface. It is interoperability between Knowledge Infrastructures."
- (Michael Zargham, BlockScience, research interview).
The theme of interoperability allows for an investigation beyond the interface of the LLM to the underlying data and knowledge infrastructure that it is interpreting and representing. Interoperability refers to the ability of different systems, devices, applications, or organisations to effectively share information.
Interoperability functions at multiple levels of knowledge infrastructure. In the context of technology and computing, interoperability involves the ability of two or more systems to exchange information and use that information without requiring significant reconfiguration (IEEE, 1991). Technical interoperability is an obsession in the era of Big Data, to enhance the usability of outputs for workflow efficiency, productisation, and/or monetisation (Kadadi et al., 2014). It is also a crucial consideration for open science (Pagano et al., 2013).
Beyond this, semantic interoperability is about ensuring that the meaning of the data is preserved and correctly interpreted across different systems, so that diverse systems can understand and exchange data in a meaningful and consistent manner. Semiotic theory emphasises the dynamic nature of meaning-making, where signs do not exist in isolation but are part of interconnected networks within a "semiosphere" (Lotman, 1990). Lotman’s idea of the semiosphere as a space where multiple semiotic systems interact is particularly relevant to the development of interoperable systems. In the context of LLMs, semantic interoperability involves the ability of these models to interpret and generate text that maintains consistent meaning across different contexts and systems. It suggests that LLMs should be designed to navigate and interpret multiple overlapping systems of meaning, reflecting the real-world complexity of human language and communication.
Relatedly, the area of infrastructure studies has long been concerned with the (often invisible) role of standards and classifications in shaping worldviews and social interactions (Bowker & Star, 2000). Exploration of classification systems underscores the socio-technical dimensions of how information is organised, accessed, and used, revealing the inherent power dynamics and consequences of these systems. How data is classified and organised underpins the data sets used to train and/or inform the responses of LLMs. Concern with the particular worldviews that LLMs embody is crucial, as it can determine whether a knowledge infrastructure is semantically interoperable or not, as well as how knowledge infrastructure can be utilised but also sufficiently standardised to allow interoperability across instances (Zargham, Ben-Meir, & Nabben, 2024).
Data interoperability and automation techniques (including LLM interfaces) are relevant to domains of public policy and private governance. Numerous countries have data interoperability strategies and frameworks (Charalabidis et al., 2010). For example, the European data strategy aims to transform the European Union into a data-driven society. This vision involves creating a single market for data to “flow freely within the EU across sectors” and capturing cost savings (European Commission, n.d.). The initiative is supported by legislation on who can create value from data and how, such as the Data Governance Act (Eur-Lex, 2022), as well as the Interoperable Europe Act, which aims to strengthen cross-border interoperability and cooperation in the public sector (European Commission, 2022). While LLMs are one automation interface that can facilitate the user experience of data interoperability, policy makers are approaching LLMs with caution, as the legal, privacy, intellectual property, and cybersecurity implications, as well as the architecture of “digital laws”, are still unfolding (Novelli et al., 2024). Yet, the legislation supports the creation of regulatory sandboxes and GovTech cooperation, and experimentation with AI and other knowledge infrastructure technologies is underway in EU public services (Directorate-General for Digital Services et al., 2024), inviting scrutiny of both the benefits and “myths” they purport to offer (Janssen et al., 2012).
Meanwhile, knowledge management, human coordination, and automation are prevalent themes in domains of private governance. Private governance refers to the various forms of self-governance, self-regulation, and private enforcement that private individuals, companies and organisations use to create order (Stringham, 2017, p. 320). One example of such a domain is that of public blockchains (Berg et al., 2019). Experiments in these communities are already underway as to how information coordination, interpretation, and decision-automation can be applied in governance processes, such as with LLM-powered “governatooorrs” (Nabben and Zargham, 2023). These devices utilise language models to automate decision-making, enforce rules, and manage information flows within decentralised communities, aiming to enhance efficiency and reduce human attention costs. Both public and private domains of knowledge interoperability and automation invite further investigation as to their design, alignment, and implications for ways of knowing.
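As a rough illustration of the difference between technical and semantic interoperability, the sketch below imagines two organisations whose knowledge bases use different local terms for the same concepts. Everything in it is a hypothetical assumption made for illustration (the vocabularies, the shared concept identifiers, and the `translate` helper); it does not describe how KMS-GPT, KOI-Pond, or any real system works.

```python
# Hypothetical local vocabularies, each mapped to shared concept identifiers.
# The terms and identifiers are invented for this example.
ORG_A_TO_SHARED = {
    "proposal": "concept:governance-proposal",
    "member": "concept:participant",
    "steward": "concept:steward",
}
ORG_B_TO_SHARED = {
    "motion": "concept:governance-proposal",
    "contributor": "concept:participant",
}


def translate(term, source_map, target_map):
    """Translate a local term into the target vocabulary via a shared concept.

    Returns None when the source term is unknown, or when the target
    vocabulary has no term for the same shared concept.
    """
    shared = source_map.get(term)
    if shared is None:
        return None
    for target_term, target_shared in target_map.items():
        if target_shared == shared:
            return target_term
    return None


if __name__ == "__main__":
    print(translate("proposal", ORG_A_TO_SHARED, ORG_B_TO_SHARED))
    print(translate("steward", ORG_A_TO_SHARED, ORG_B_TO_SHARED))
```

The two systems can exchange the string "steward" without difficulty, which is technical interoperability, but the second call returns `None` because the receiving vocabulary has no equivalent concept. Who defines the shared identifiers, and whose categories they privilege, is the kind of question this research strand is concerned with.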
Conclusion/Next Steps…
This research agenda actively engages with and explores alternative LLM futures by investigating and developing frameworks for LLMs that prioritise localised applications and collaborative networks over monolithic models. By focusing on knowledge creation, infrastructure, and interoperability, the study will continue to delve into the politics and conditions under which LLMs operate, offering insights into their role in augmenting and mediating human knowledge. Next steps include rigorous empirical investigation within the organisational experiment and inter-organisational knowledge network. This post invites constructive engagement and feedback, particularly in relation to relevant field sites for this topic.
Acknowledgments:
With thanks to Professor Ellie Rennie from RMIT University, Dr. Michael Zargham from BlockScience, and Dr. Goran Gaber from European University Institute for inputs and/or feedback on this research.
References:
Ahmed, N., & Wahed, M. (2020). The De-democratization of AI: Deep learning and the compute divide in artificial intelligence research. arXiv. Available online: https://doi.org/10.48550/arXiv.2010.15581. Accessed 14 May, 2024.
Asharaf, S., & Adarsh, S. (2017). Decentralized Computing Using Blockchain Technologies and Smart Contracts: Emerging Research and Opportunities. IGI Global.
Berg, C., Davidson, S., & Potts, J. (2019). Understanding the blockchain economy: An introduction to institutional cryptoeconomics. Edward Elgar Publishing.
Birenbaum, M. (2023). The Chatbots’ Challenge to Education: Disruption or Destruction?. Education Sciences, 13(7), 711. https://www.mdpi.com/2227-7102/13/7/71
Bowker, G., and Star, S.L. (2000). Sorting Things Out: Classification and Its Consequences. The MIT Press. https://doi.org/10.7551/mitpress/6352.001.0001
Charalabidis, Y., Lampathaki, F., Kavalaki, A., & Askounis, D. (2010). A review of electronic government interoperability frameworks: Patterns and challenges. International Journal of Electronic Governance, 3(2), 189–221. https://doi.org/10.1504/IJEG.2010.034095
Dafoe, A. (2018). AI governance: a research agenda. Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK.
Davis, J. L. (2020). How Artifacts Afford: The Power and Politics of Everyday Things. The MIT Press. https://doi.org/10.7551/mitpress/11967.001.0001
Davis, J. L., & Chouinard, J. B. (2016). Theorizing Affordances: From Request to Refuse. Bulletin of Science, Technology & Society, 36(4), 241-248. https://doi.org/10.1177/0270467617714944
Directorate-General for Digital Services, European Commission, Brizuela, A., Montino, C., Galasso, G., Polli, G., Bosch, J. M., De Vizio, L., Tangi, L., Combetto, M., & Gori, M. (2024). Public Sector Tech Watch: Mapping innovation in the EU public services: A collective effort in exploring the applications of artificial intelligence and blockchain in the public sector. https://policycommons.net/artifacts/12293851/public-sector-tech-watch-mapping-innovation-in-the-eu-public-services/13188294/
Eur-Lex. (2022). Regulation (EU) 2022/868 of the European Parliament and of the Council of 30 May 2022 on European Data Governance and Amending Regulation (EU) 2018/1724 (Data Governance Act) (Text with EEA Relevance), 152 OJ L. http://data.europa.eu/eli/reg/2022/868/oj/eng
European Commission. (2022). “Interoperable Europe Act Proposal”. European Commission. Available online: https://commission.europa.eu/publications/interoperable-europe-act-proposal_en. Accessed 13 May, 2024.
European Commission. (n.d.). “European data strategy”. European Commission. Available online: https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/european-data-strategy_en. Accessed 13 May, 2024.
Fernandez, R. C., Elmore, A. J., Franklin, M. J., Krishnan, S., & Tan, C. (2023). How large language models will disrupt data management. Proceedings of the VLDB Endowment, 16(11), 3302-3309. https://dl.acm.org/doi/abs/10.14778/3611479.3611527
Fisher, C. (2018). Cloud versus on-premise computing. American Journal of Industrial and Business Management, 8(9), 1991-2006. DOI: 10.4236/ajibm.2018.89133
Frederick, D. E. (2023). ChatGPT: a viral data-driven disruption in the information environment. Library Hi Tech News, 40(3), 4-10. https://www.emerald.com/insight/content/doi/10.1108/LHTN-04-2023-0063/full/html
Fridman, L. (2024). “Sam Altman: OpenAI, GPT-5, Sora, Board Saga, Elon Musk, Ilya, Power & AGI.” Lex Fridman Podcast #419. March 19. Accessed 9 May, 2024.
IEEE (1991). IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Std 610, pp. 1-217, 18 Jan. 1991. doi: 10.1109/IEEESTD.
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, Adoption Barriers and Myths of Open Data and Open Government. Information Systems Management, 29(4), 258–268. https://doi.org/10.1080/10580530.2012.716740
Kadadi, A., Agrawal, R., Nyamful, C., & Atiq, R. (2014). “Challenges of data integration and interoperability in big data”. IEEE International Conference on Big Data (Big Data), Washington, DC, USA, pp. 38-40. doi:10.1109/BigData.2014.7004486.
Larkin, B. (2013). The Politics and Poetics of Infrastructure. Annual Review of Anthropology, 42(1), 327–343. https://doi.org/10.1146/annurev-anthro-092412-155522.
Latour, B. (1992). "Where Are the Missing Masses? The Sociology of a Few Mundane Artefacts." In Shaping Technology/Building Society: Studies in Sociotechnical Change, edited by Wiebe E. Bijker and John Law, 225–59. Cambridge, MA: The MIT Press, 1992.
Latour, B. (2007). Reassembling the Social: An Introduction to Actor-Network Theory. Oxford University Press.
Lievrouw, L. (2014). Media Technologies: Essays on Communication, Materiality, and Society. edited by Tarleton Gillespie, Pablo J. Boczkowski, Kirsten A. Foot. MIT Press.
Lilypad. (n.d.). Lilypad Home. Available online: https://docs.lilypad.tech/lilypad/. Accessed 9 May, 2024.
Lotman, J. (1990). Universe of the Mind: A Semiotic Theory of Culture. Translated by Ann Shukman, Indiana University Press.
Metagov. (n.d.). “KOI Pond”. Available online: https://metagov.org/projects/koi-pond?token=a9d5fd404118ec01d7028aeba5db36169a12c1c9. Accessed 10 May, 2024.
Nabben, K., & Zargham, M. (2023). “Governatooorr Guardrails: Practical considerations when introducing automated governance agents.” November 10. Substack (blog). Available online: https://kelsienabben.substack.com/p/governatooorr-guardrails-practical. Accessed 1 May, 2024.
Nabben, K. (2023). “AI as a Constituted System: Accountability Lessons from an LLM Experiment”. (September 1, 2023). SSRN. Available online: https://ssrn.com/abstract=4561433. Accessed 1 May, 2024.
Novelli, C., Casolari, F., Hacker, P., Spedicato, G., Floridi, L. (2024). “Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity”. arXiv. Available online: https://doi.org/10.48550/arXiv.2401.07348. Accessed 13 May, 2024.
Pagano, P., Candela, L., & Castelli, D. (2013). Data interoperability. Data Science Journal, 12, DOI: 10.2481/dsj.GRDI-004
Pinch, T., & Swedberg, R. (Eds.). (2008). Living in a Material World: Economic Sociology Meets Science and Technology Studies. The MIT Press.
Potts, J. (2024). “Embeddings”. SSRN. Available online: https://ssrn.com/abstract=47956. Accessed 14 May, 2024.
Rennie, E. (2024). “KOI-Pond: The creation of a synthetic deme”. Medium (blog). April 23. Available online: https://ellierennie.medium.com/koi-pond-the-creation-of-a-synthetic-deme-999a6f1f3426. Accessed 7 May, 2024.
Sastry, G., Heim, L., Belfield, H., Anderljung, M., Brundage, M., Hazell, J., O’Keefe, C., Hadfield, G. K., Ngo, R., Pilz, K., Gor, G., Bluemke, E., Shoker, S., Egan, J., Trager, R. F., Avin, S., Weller, A., Bengio, Y., & Coyle, D. (2024). Computing Power and the Governance of Artificial Intelligence (arXiv:2402.08797). arXiv. https://doi.org/10.48550/arXiv.2402.08797
Schraube, E. (2009). “Technology as Materialized Action and Its Ambivalences”. Theory & Psychology, 19(2), 296–312.
Star, S. L. (1999). “The Ethnography of Infrastructure”. American Behavioral Scientist, 43(3), 377-391. https://doi.org/10.1177/00027649921955326.
Stringham, E.P. (2017). Private Governance. In: The Routledge Handbook of Libertarianism by Brennan, J, van der Vossen, B., & Schmidtz, D (Eds.). https://doi.org/10.4324/9781317486794
Vipra, J., & Myers West, S. (2023). “Computational Power and AI.” AI Now Institute. Available online: https://ainowinstitute.org/publication/policy/compute-and-ai. Accessed 9 May, 2024.
Williamson, B. (2022). Governing through infrastructural control: Artificial intelligence and cloud computing in the data-intensive state. In The SAGE Handbook of Digital Society (pp. 521-540). Sage.
Winner, L. (1980). Do Artifacts Have Politics? Daedalus, 109(1), 121–136. https://www.jstor.org/stable/20024652
Wittgenstein, L. (2010). Philosophical Investigations. Translated by Ogden, C.K. Wiley. (Original publication, 1922).
Zargham, M. & Ben-Meir, I. (2023). “A Language for Knowledge Networks”. BlockScience (blog). Available online: https://blog.block.science/a-language-for-knowledge-networks/. Accessed 7 May, 2024.
Zargham, M., Ben-Meir, I., & Nabben, K. (2024). “Knowledge Networks and the Politics of Protocols”. BlockScience (blog). Available online: https://blog.block.science/knowledge-networks-and-the-politics-of-protocols/. Accessed 7 May, 2024.