Sara M. Watson | Metaphors of Big Data
Keywords: byproduct, corporations, data, digital economy, embodied, google, industrial product, james geary, johnson, lakoff, liquid, metaphors, natural resource, oil, personal data, quantified self, sally wyatt, sarah watson, trendy
Dissecting the industry-centric bias at the core of our cultural understanding of data.
Can you fathom the depths of big data? The word fathom is a unit for measuring the depth of the ocean, but it has also come to mean the ability to understand something. Fathom comes from faethm, meaning ‘the two arms outstretched.’ Its measurement of 6 feet, or 1.8 meters, is based on a standard human scale. The length of rope dropped overboard is handily measured across the span of a sailor’s armspread. The term makes the metaphorical jump to describe concepts that we are able to get our arms around; ideas are things to be grasped. As James Geary describes in his book on metaphor, “This is the primary purpose of metaphor: to carry over existing names or descriptions to things that are either so new that they haven’t been named or so abstract that they cannot be otherwise explained.”
Data has become so big it is difficult to fathom. As a technocratic, scientistically-oriented culture, we are in the midst of understanding computing on a new and ever-evolving scale. While we continue to take data for granted, designated as something that is “given,” data that began as embodied observation has become further and further removed from our lived experience. At the same time, data to which these metaphors refer are becoming ubiquitous in our lives—as the trace of our digital transactions, our bodies, and homes—making it all the more important to have an appropriate contextual model to frame our relationship to it.
Metaphors are helpful for understanding abstract concepts that, because of their complexity or scale, lie beyond our human comprehension. In their seminal work Metaphors We Live By, Lakoff and Johnson describe the conceptual metaphors that help with “referring, quantifying, identifying, setting goals, and motivating actions” for an abstract concept such as inflation. Given its ephemerality and abstraction, data is ripe for metaphoric description.
Metaphors have always helped with introducing new technologies in our everyday lives and finding ways to familiarize ourselves with the novelty. The television entered the living room, framing the cathode ray tube with wood, literally domesticating the technology as furniture in our homes. Early cinema echoed the proscenium arch of the theater as a remediated reference to the history of performance. Visual metaphors and skeuomorphism of legal pads, tape recorders, and felt gaming tables create analog analogues to introduce digital interfaces of the computers in our pockets, and fall away as the devices become more familiar. The internet has inspired many different metaphors, and they reflect changes in how we think about it, attributing the promise of the internet with “revolution, evolution, salvation, progress, universalism, and the ‘American dream.’” As with data, we surf, drown, and dive into content on the internet as our media landscape changes.
Though metaphors reveal truths by association, metaphors can just as easily obscure and misrepresent. Metaphors prime us to take for granted the ways we think about things. Most of the metaphors we use to talk about data in popular culture make sense to technocratic corporations and their leaders, those building and disseminating information technologies, but they are fundamentally dehumanizing. It is no wonder individuals continue to believe that they have “nothing to hide” in the face of big data, because we do not have the cognitive context to grasp how behemoth corporations use data. The dominant industrial metaphors for data do not privilege the position of the individual. Instead, they take power away from the person to whom the data refers and give it to those who have the tools to analyze and interpret data. Data then becomes obscured, specialized, and distanced.
We need a new framing of a personal, embodied relationship to data. Embodied metaphors have the potential to bring big data back down to a human scale and ground data in lived experience, which, in turn, will help to advance the public’s investment, interpretation, and understanding of our relationship to our data.
“The people who get to impose their metaphors on the culture get to define what we consider to be true.”
—Lakoff and Johnson
Dominant metaphors for data today
So what do we talk about when we talk about data? “Data is the new oil.” “We’re mining the data for insights.” “We’re learning to cope with the data deluge, and working out how best to tap it.” “The work of data scientists is janitorial.”
The dominant metaphors for understanding data are industrial. They start from Gartner and O’Reilly Conferences, make their way into the business section of The New York Times, and seep further into the style section as we parse our evolving relationship to technology. Journalism itself has become data-driven, as in explainer outlets like Vox and FiveThirtyEight. These metaphors are artifacts of an industrial complex that doesn’t privilege individuals or even populations of users.
Many of the metaphors we have for personal data today come from the big data industry. As Tim Hwang and Karen Levy have suggested, these metaphors describe data as a “natural, inexhaustible good—ripe for exploitation in the name of economic growth and private gain.” Also citing Lakoff and Johnson’s work, Cornelius Puschmann and Jean Burgess collect data metaphors into two categories: “as a natural force to be controlled and as a resource to be consumed.” Deborah Lupton points to another dominant metaphor, detailing the liquid qualities of data.
Data as a natural resource suggests that it has great value to be mined and refined but that it must be handled by experts and large-scale industrial processes. Data as a byproduct describes the transactional traces of digital interactions but suggests it is also wasteful, pollutive, and may not be meaningful without processing. Data has also been described as a fungible resource, as an asset class, suggesting that it can be traded, stored, and protected in a data vault. One programmatic advertising professional related to me that he thinks “data is the steel of the digital economy,” an image that avoids the negative connotations of oil while at the same time expressing concern about the monopolizing forces of firms like Google and Facebook.
The New York Times writes that “Personal data is the oil that greases the Internet. Each one of us sits on our own vast reserves.” We must then ask: What does it mean to “sit” on a “vast” “reserve” of data “oil?” While these metaphors elucidate the complex inner workings of digitally networked information exchange, they fundamentally occlude the issues of personal agency and identity in the process.
DATA IS A NATURAL RESOURCE
gold rush
ecosystem
gathered
raw
trove
DATA IS AN INDUSTRIAL PRODUCT
mining
refining
platform
breach
big data :: big pharma, big business
DATA IS A BYPRODUCT
exhaust
data trail
breadcrumbs
smog
janitor
cleanser
smoke signals
signal and noise
DATA IS A MARKET
economy
paying with data
currency
asset
vault
broker
DATA IS LIQUID
ocean
deluge
tsunami
torrent
wave
firehose
lake
DATA AS TRENDY
data is the new currency
data is the new black
data scientist is the sexiest job of the 21st century
frontier
revolution
wild west
Embodied metaphors for data
Industrial metaphors make business sense, but they do not make sense for us as individuals in a data-driven society. Jer Thorp attempts to bridge the connection between the oil metaphor and reality: “where oil is composed of the compressed bodies of long-dead micro-organisms, this personal data is made from the compressed fragments of our personal lives. It is a dense condensate of our human experience.”
The most effective metaphors—ones so fundamental that we forget they are metaphors—draw on embodied experience, or “embodied cognition,” fundamentally part of the way we think and act in the world. Industrial metaphors lack a connection to Lakoff and Johnson’s “basic domain of experience” of individuals, including our bodies, interactions with our physical environment, and interactions with other people. Industrial metaphors share an experiential perspective of a bodiless conglomerate technocratic actor, seeing like a Google, as it were. Embodied metaphors draw from the perspective of us as individual people.
Perhaps the most apt popular metaphor that ties data to the body is the description of data as a digital footprint, fingerprint, or a shadow. These metaphors acknowledge the presence of a person, yet point to the disjuncture between the person and their remaining traces. Still, this comparison relies on the DATA IS A BYPRODUCT construct, and it emphasizes the meaningful information about who we are or where we’ve been that can be deduced from our traces.
The metaphors used in the Quantified Self community offer a more personal, autobiographical, embodied, or practice-oriented conceptual model of data. Studying the early adopters of self-tracking technology, I’ve identified a set of emerging data metaphors starting from a personal, rather than industrial perspective. Some are still mechanistic, drawing on Taylorist theories about “managing what you measure.” But others are more sympathetic and focus on embodied experience and personal reflection.
DATA IS A MIRROR portrays data as something to reflect on and as a technology for seeing ourselves as others see us. But, like mirrors, data can be distorted, and can drive dysmorphic thought.
DATA IS A PRACTICE references the self-tracking process that has been criticized as navel-gazing, but which can also be a means of introspection and a practice toward self-knowledge. The quantified self motto “self-knowledge through numbers” is a misnomer; self-knowledge comes through the attentive process of choosing what to track and self-observation.
Can we push embodied data metaphors further? Data is blood? Data is DNA? We already think of DNA as biological information programming. Data is traces of digital existence like dust is a trace of the presence of our skin? Data is a fingerprint? This is an incomplete and imperfect list, but the start of a reframing of our position as individuals with a stake in where, how, and why data implicates our identity and existence.
DATA IS A BODY
footprint
fingerprint
shadow
blood
DNA
reflection
identity
portrait
profile
Embodied metaphors at work
“Metaphors not only help us to think about the future; they are a resource deployed by a variety of actors to shape the future…Metaphors can mediate between structure and agency, but it is actors who choose to repeat old metaphors and introduce new ones. Thus, it is important to continue to monitor the metaphors at work to understand exactly what work it is that they are doing.”
—Danger! Metaphors at Work in Economics, Geophysiology, and the Internet, Sally Wyatt
The rhetoric around data has granted it too much agency and authority: “with enough data, the numbers speak for themselves.” Personification in metaphor has the ability to transfer agency to inanimate subjects and affects our ability to make decisions about them. Tying data back to individuals, even at the metaphorical level, could change how we design the systems that manage it and the policies that protect it. What kind of accountability and responsibility do the default designs of systems have to the subjects of the data? What visibility do individuals have into the flows of their informational agents across an interconnected system? What control do we have over making sure our data accurately reflects our interests?
How we think about data—and more importantly what we do with it—will depend on the value systems that our conceptual metaphors capture and reify. Reframing metaphors for data in a more personal and embodied context will give us a better way to think of ourselves as information organisms, or “inforgs,” as philosopher Luciano Floridi suggests we are becoming. Our data profiles will act on our behalf, and we must be able to interact with and grasp their agency. Embodied data metaphors put more control in our hands as individuals, capable of interpreting and intervening in our own personal data management.
Embodied data metaphors will shape public consciousness and, in turn, shape policy positions, technology designs, and business models going forward. Joseph Grady argues that metaphors engage the public to bridge the gap between jargon-speaking experts and the issue at hand. The need for reframing data has recently inspired the evolution of Human-Computer Interaction to its latest incarnation: Human-Data Interaction. This emerging field of study aims to improve the legibility, agency, and negotiability of data-driven interactions between individuals and complex technical systems. An embodied understanding of our digital identities will only become more important as the thin boundary between the online and offline world dissolves.
How will the metaphors we use today shape our data society in the future? Will they still be industrially driven, consolidating power and authority in technocratic entities with particular data views on the world? Or will our data society be personal, humane, and reflective of our values in a distributed and individually empowering way?
Sara M. Watson is a technology critic and a Fellow at the Berkman Center for Internet and Society at Harvard University. She tweets @smwat.