“Not Your Data Storage Infrastructure, Not Your NFTs”:
Decentralised data storage as a resilience problem
Kelsie Nabben
31 January, 2022
This piece explores the issue of persistent storage of content as part of resilience in distributed networks. I focus on Non-Fungible Tokens (NFTs) as a general use case for content addressing and data storage. It sits in the broader context of my ongoing research, as a Protocol Labs Fellow, on the “socio-technical resilience” of more sensitive personal, cultural, or political data on IPFS.
In Bitcoin, a public, decentralised cryptocurrency protocol, the rule of ownership is “not your keys, not your coins”. Similarly, with decentralised data storage and utilisation, the rule is “not your storage architecture, not your NFTs”.
“The devil is in the metadata”
When I asked multiple high-profile NFT developers in Australia how they store their NFTs, they answered “IPFS”, which, of course, is a content addressing protocol, not a storage solution, or “I’m focusing more on the Ethereum side of web3 [development], rather than storage”. Yet how and where data is stored and utilised will become increasingly pertinent as Web3 grows in interest and value.
The spectre haunting the modern world of NFTs is data persistence.
On the 12th of March, 2021, “@jonty” on Twitter posted “The NFT token you bought either points to a URL on the internet, or an IPFS hash. In most circumstances it references an IPFS gateway on the internet run by the startup you bought the NFT from.”
An NFT is, in effect, a token that points to a JSON metadata file. In other words, NFTs are collateralised information.
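To make that concrete, below is a minimal sketch of what such a metadata file typically looks like, following the ERC-721 metadata JSON schema (the field values and CID here are illustrative, not from any real token):

```typescript
// Illustrative ERC-721 metadata object (the values and CID are made up).
// On-chain, the token holds only a URI pointing at a file like this.
const metadata = {
  name: "Example Artwork #1",
  description: "A one-of-one digital artwork.",
  // The token merely *points* at the image. If whatever hosts this URI
  // disappears, the token persists on-chain but the artwork does not resolve.
  image: "ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
};
```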
@jonty went on to outline that the $69 million Beeple NFT, sold by Christie’s auction house, is hosted on the Nifty NFT platform’s servers. This means that if the platform goes down, gets hacked, or goes out of business, the NFT could effectively disappear, throwing up the dreaded 404 error message because the content no longer exists. According to Associate Professor Marta Poblet, “the devil is in the metadata”.
Data persistence is a largely underrated undercurrent of the NFT craze.
When a friend later called to tell me he was “aping” into a “Pudgy Penguin” NFT copycat project, I asked him, “but where is the data stored?”. “No idea” he replied.
[Andrey Metelev, Unsplash]
Storage Architecture Matters
An NFT is a digital asset which represents a digital or a physical good by “pointing” to what it represents, such as an image, a piece of multimedia, or an in-game asset. The value of an NFT is based on what it represents or provides access to. Metadata is the descriptive data which provides information about what an NFT points to and represents.
NFTs are data wrapped in the ERC-721 token standard, which can then be collateralised, utilised, yield-farmed, and so on.
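As a sketch of what this pointing looks like in practice, the snippet below reads the metadata URI behind an ERC-721 token using the ethers.js library (v6 syntax); the RPC endpoint, contract address, and token ID are all placeholders:

```typescript
import { ethers } from "ethers";

// Minimal slice of the ERC-721 ABI: the metadata pointer sits behind tokenURI().
const erc721Abi = ["function tokenURI(uint256 tokenId) view returns (string)"];

async function whereDoesMyNftPoint(contractAddress: string, tokenId: bigint) {
  // Placeholder endpoint; any Ethereum JSON-RPC provider works here.
  const provider = new ethers.JsonRpcProvider("https://rpc.example.com");
  const nft = new ethers.Contract(contractAddress, erc721Abi, provider);

  // The URI may be an https:// URL on a platform's server, an ipfs:// URI,
  // or an IPFS gateway URL. That one string is what the token's value hangs off.
  const uri: string = await nft.tokenURI(tokenId);
  console.log(`Token ${tokenId} points to: ${uri}`);
  return uri;
}
```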
Knowing your decentralised storage solutions
IPFS is one peer-to-peer content addressing protocol that aims to address this issue.
It works by storing content on participants’ computers (nodes) in the network and allowing this content to be requested by others. When content is added to IPFS, it is assigned a “content identifier” (CID), which points to the specific file. IPFS nodes talk across the peer-to-peer network to find the closest node with a copy of that data and deliver it to your device.
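As an illustration, this is roughly what adding content and receiving a CID looks like with the ipfs-http-client JavaScript library, assuming a local IPFS daemon with its HTTP API on the default port:

```typescript
import { create } from "ipfs-http-client";

// Assumes a local IPFS daemon exposing its HTTP API on the default port.
const ipfs = create({ url: "http://127.0.0.1:5001" });

async function addAndAddress() {
  // The CID is derived from the content itself, so the same bytes
  // always resolve to the same identifier, wherever they are hosted.
  const { cid } = await ipfs.add("hello, persistent web");
  console.log(`Content addressed as: ${cid.toString()}`);
  return cid;
}
```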
There are multiple ways to access content on IPFS from the web browser, with varying levels of reliability and decentralisation (a sketch of the gateway approach follows this list). These include:

1. Via an HTTPS gateway, which offers a URL to content on the IPFS network (for example, https://ipfs.io/ipfs/QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz). There are multiple gateway providers, of which Cloudflare is the biggest. The main use case for this approach is to pull a piece of content from the network.

2. Opening your own gateway in the browser by running some lines of code. This has a number of limitations in terms of connectivity and centralisation, as it is still reliant on Domain Name System (DNS) providers for more fluid content routing. This is the main approach taken by NFT platforms to pull and save data locally and eventually publish to the IPFS network.

3. Via a web browser with IPFS support, such as Brave or Tor’s IPFS integration. Brave works either via a public HTTPS gateway (see 1) or by running a local IPFS node through the browser (see the options under “extensions”). A local node allows you to access previously accessed content while offline, verify content, and serve content to the rest of the network. This approach allows people to publish directly to the IPFS network and request content without going through a gateway. It is also more persistent, although a DNS provider could still block access for users.
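As a minimal sketch of the first option, the snippet below pulls a piece of content through a public gateway (the CID is the example above; any gateway provider could stand in for ipfs.io):

```typescript
// Pulling content through a public HTTPS gateway (option 1 above).
// Convenient, but you are trusting the gateway operator and DNS: if the
// gateway goes down or blocks the CID, this fetch fails even though the
// content may still exist elsewhere on the IPFS network.
const cid = "QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz";

async function fetchViaGateway(gateway = "https://ipfs.io"): Promise<string> {
  const res = await fetch(`${gateway}/ipfs/${cid}`);
  if (!res.ok) throw new Error(`Gateway returned ${res.status}`);
  return res.text();
}
```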
Alternatively, infrastructure management and maintenance can be delegated to third-party “pinning” services, such as “NFT.storage”, “Pinata”, and “Eternum”.
Understanding these subtle distinctions in infrastructure is often beyond the everyday user, but could be make-or-break for the value of your next NFT. In terms of resilience, the strongest guarantee that the IPFS network can serve your content back to you is if you run your own node, host a copy of that content, and maintain that data.
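A sketch of what that guarantee looks like, again assuming a local daemon and the ipfs-http-client library: pinning a CID tells your own node to keep a copy and exempt it from garbage collection.

```typescript
import { create } from "ipfs-http-client";

const ipfs = create({ url: "http://127.0.0.1:5001" });

// Pinning instructs your node to fetch and keep the content, so it is
// served back to you regardless of what any platform or gateway does.
async function pinMyNftMetadata(cid: string) {
  const pinned = await ipfs.pin.add(cid);
  console.log(`Pinned locally: ${pinned.toString()}`);
}
```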
Individual Ownership in Public Networks
Another common misconception is that decentralised digital infrastructure is inherently private. It is not: the IPFS network and public blockchains are…public. Big companies run deep analytics to track metadata to guide investment strategies, or to train and deploy devilish front-running bots to beat your trades.
For example, Nansen.ai has an “NFT leaderboard” to track the greatest spenders and profit takers, exposing names, wallets, trades, holdings, and gains. Their “top” person is currently “Pransky”, at 3.4 million USD. Yet, with this amount of wealth, and the ability to link one’s identity between an address, Twitter, and other identifiers, personal holdings are not necessarily something people want to show off.
People don’t realise that privacy on decentralised networks often requires additional know-how and steps. For example, IPFS transport is encrypted, but content is not encrypted or private. Greater privacy on the IPFS network requires additional steps, such as encryption and the use of an onion router like Tor.
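As a sketch of one such additional step, assuming a Node.js environment and a local daemon, content can be encrypted before it is added, so the network only ever carries ciphertext (key management then falls on the owner, not on IPFS):

```typescript
import { randomBytes, createCipheriv } from "node:crypto";
import { create } from "ipfs-http-client";

// Content on IPFS is public by default: anyone holding the CID can fetch it.
// Encrypting before adding means the network only ever sees ciphertext.
async function addEncrypted(plaintext: Buffer) {
  const key = randomBytes(32); // must be kept secret and safe by the owner
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const tag = cipher.getAuthTag();

  const ipfs = create({ url: "http://127.0.0.1:5001" });
  // Store iv + auth tag + ciphertext together; only the key holder can decrypt.
  const { cid } = await ipfs.add(Buffer.concat([iv, tag, ciphertext]));
  return { cid, key };
}
```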
At this point, how to ensure the resilience of stored data is not completely clear. If you run a node, how can you ensure redundancy if your computer is compromised? What is missing from data management (meaning content addressing and storage) in the decentralised web is a coherent framework that unifies data governance and infrastructure management, so that users can operate as participants and follow storage best practice in the context of their own data.
The importance of NFT storage best practice was highlighted in the Tezos community when the web page of “HicEtNunc” (or “HEN”), the largest NFT platform and most active app on Tezos, went down.
Case study: CID Recovery Mission
The infrastructure for HicEtNunc was dependent on the project’s founder, rather than being community owned and maintained.
Web3 infrastructure integration project “DNS.xyz” reports that “On November 12, @HicEtNunc2000 changed its profile description to discontinued. The founder, Rafael Lima, was unreachable by the community. Thousands of creators were wondering what would happen to their livelihood. Tens of thousands of collectors worried about their NFTs. The ecosystem was imperiled.” Even though the website was allegedly “yanked”, limiting people’s access to view NFTs on the marketplace, anyone who wanted to take down the NFTs on HicEtNunc could have done so by attacking one person’s box and deleting them. HicEtNunc was a “house of cards”.
Protocol Labs team members, who had seen users complain on Twitter, had been trying for months to encourage a senior HEN engineer to migrate to nft.storage. When the site went down, it was unclear whether people’s NFTs could be recovered, and the community was panicking about the loss of cultural value and the economic impact of no longer being able to access their NFTs. People started sharing “Hen R.I.P. Hen died.” on Twitter.
The team at DNS worked with “TezTools” to recover pre-existing lists of NFT CIDs, run a script to find the CIDs and pin them with Pinata, and set up a clone of the previous website using a pre-existing mirror. In the end, between 110 and 449 NFTs were lost, out of over 1.4 million CIDs. The crisis began on a Thursday; by Saturday evening, four terabytes of data had been migrated and the site, contract, and storage worked.
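The recovery script itself is not reproduced here, but its core loop would look something like the sketch below, which re-pins a salvaged list of CIDs through Pinata’s pinByHash endpoint (the API credentials are placeholders):

```typescript
// Illustrative reconstruction of a re-pinning loop: given a salvaged list
// of CIDs, ask a remote pinning service to pin each one. The endpoint and
// header names follow Pinata's pinByHash API; the credentials are placeholders.
const PINATA_API = "https://api.pinata.cloud/pinning/pinByHash";

async function repinAll(cids: string[]) {
  for (const cid of cids) {
    const res = await fetch(PINATA_API, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        pinata_api_key: "<YOUR_KEY>",
        pinata_secret_api_key: "<YOUR_SECRET>",
      },
      body: JSON.stringify({ hashToPin: cid }),
    });
    // Every CID that fails to pin here is a candidate for permanent loss.
    if (!res.ok) console.error(`Failed to pin ${cid}: ${res.status}`);
  }
}
```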
Now, the NFT data is mirrored three times on different machines and dedicated nodes on Pinata.cloud, and a new URL was deployed at HicEtNunc.art. Although some community members were quite aggressive in response to the unilateral move to recover HEN, the general response was positive, and the site is under the ownership of the TezTools.io team until a community DAO is in place to determine ownership. Following the stocktake of the website, the smart contract behind the marketplace, and the data storage, the teams involved and the community are still working on improving the platform.
Today, @hicetnunc-community reports in their Twitter bio that they are “continued”. “The fact that @hicetnunc2000 died and was respawned within hours is one of the most futuristic things I’ve seen in the #NFT community thus far,” states Justin Scordianos on Twitter.
Discussion & Findings:
There were a number of vulnerabilities that limited the resilience of HEN. NFTs on the HicEtNunc marketplace were being content addressed on IPFS and stored on “Infura”, an enterprise-level blockchain application layer for hosting, APIs, and development tools. Data was only being pinned to a single Infura instance: if nobody else ran a node with the same data pinned, it would be gone forever.
IPFS is “not a magic cloud where storage comes for free”. “Simply being on IPFS doesn’t make something decentralized,” states “Shokunin” from the DNS team, who thinks that people conflate data storage on IPFS with blockchains and don’t know how or where data is stored.
So far, NFT storage has not been a participatory practice in which people expect to engage with their storage architecture. In HEN, there was a missing governance layer to contextualise the role of users in taking responsibility for storing their own data. In a Web2.0 mindset, people are “users”, and it doesn’t matter if data is stored and utilised by a third party. In the Web3.0 mindset, people have to take responsibility for owning their own data by participating in peer-to-peer infrastructure.
The next step for HEN has been a community proposal for a DAO to collectively take responsibility for governing HEN infrastructure. The proposal holds promise for the resilience of the NFT CIDs going forward, as the community can actively participate in how data is stored and in governing the underlying infrastructure.
Decentralised Storage Best Practice
At present, there is no clear best-practice standard for how data should be governed and stored in Web3.0.
NFTs, although of huge monetary value and interest, are only one example of why resilient, long-term data persistence matters. Beyond NFTs as a cultural and business case, the decentralised web is used to store all kinds of content, including sensitive political or cultural information. There are several challenges here, beginning with who runs and maintains the nodes that host the data, if not the individuals themselves. @Stammy asks in relation to NFTs, “is the onus on the owner…?”.
It is also imperative for system designers to design with resilience, meaning long-term data persistence, in mind. @scott_lew_is on Twitter urges potential buyers to value NFTs based on whether they will outlast the issuing organisation.
The NFT recovery case study highlights the importance of governance frameworks that contextualise participation in, and responsibility for, data storage in decentralised digital infrastructure, and that guide how people interact with storage for resilience.
Rather than relying on third-party NFT marketplace platforms that are less incentivised to maintain that content, or that may be vulnerable to attack, NFT owners and communities could collectively govern and back up their metadata by claiming greater ownership over their data storage architecture. This could look like NFT-investing DAOs (such as FlamingoDAO and PoolTogether) ensuring that they create multiple back-ups of the content behind their collections’ CIDs. It could also look like holders of certain classes of NFTs (such as “CryptoPunks” and “Pudgy Penguins”) collaborating to make numerous, redundant copies of NFT metadata between DAO members to ensure persistent and resilient data storage, and collectively managing and governing storage at a local-first level.
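As a sketch of what such collective stewardship could look like, a DAO might periodically audit whether each CID in its collection still resolves through several independent public gateways, flagging anything that only a single provider can serve (the gateway list is illustrative):

```typescript
// Count how many independent gateways can still serve a given CID.
// A result of 0 means the content may already be unrecoverable.
const gateways = ["https://ipfs.io", "https://dweb.link", "https://cloudflare-ipfs.com"];

async function auditCid(cid: string): Promise<number> {
  let reachable = 0;
  for (const gw of gateways) {
    try {
      const res = await fetch(`${gw}/ipfs/${cid}`, { method: "HEAD" });
      if (res.ok) reachable++;
    } catch {
      // An unreachable gateway counts against redundancy.
    }
  }
  return reachable;
}
```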
Conclusion
This piece has explored data persistence in digital infrastructure as an essential component of socio-technical resilience, using the case study example of NFTs.
In IPFS, resilience is a function of the way people individually and collectively participate in the network to store and govern data. It is context specific, and requires the active participation of stakeholders in governing the storage of their own data.
Data storage resilience could be improved by leveraging existing Web3.0 data governance frameworks, such as Decentralised Autonomous Organisations (DAOs), to coordinate data storage infrastructure and utilisation in a way that lets the owners of data govern how it is stored and used. This theme is further explored in forthcoming writing on the content addressing and storage of culturally and politically sensitive data on IPFS, and on “DAOs as Data Trusts”.
Further Resources:
Further resources on NFT metadata infrastructure architecture:
https://docs.ipfs.io/how-to/best-practices-for-nft-data/
https://nftschool.dev/
https://nft.storage
https://checkmynft.com/
https://web3.storage
Suggested citation:
Nabben, K. 2022. ““Not Your Data Storage Infrastructure, Not Your NFTs”: Decentralised data storage as a resilience problem”. Available online: [URL]
Disclaimer:
I am a Protocol Labs PhD Fellow, which includes a stipend, and this research contributes to a broader project on socio-technical resilience in IPFS.
Acknowledgments:
Acknowledgements to my colleague Professor Marta Poblet at the RMIT Blockchain Innovation Hub, who flagged this issue of NFT metadata early and whose work continues to inspire mine. Thank you to Protocol Labs and others who participated in research interviews, and to Zargham for introductions. Thank you also to Ryan Miller and Dietrich Ayala for review.