
# 2019-05-03 Towards resurrecting the dead with the Panlog program Soft-Actor: networked Big Data and digital immortality // Big Data & Society, Volume 6, Issue 1, First Published 2019-04-23

We project the future accumulation of profiles belonging to deceased Facebook users. Our analysis suggests that a minimum of 1.4 billion users will pass away before 2100 if Facebook ceases to attract new users as of 2018. If the network continues expanding at current rates, however, this number will exceed 4.9 billion. In both cases, a majority of the profiles will belong to non-Western users. In discussing our findings, we draw on the emerging scholarship on digital preservation and stress the challenges arising from curating the profiles of the deceased. We argue that an exclusively commercial approach to data preservation poses important ethical and political risks that demand urgent consideration. We call for a scalable, sustainable, and dignified curation model that incorporates the interests of multiple stakeholders.

We, the Party, control all records, and we control all memories. Then we control the past, do we not? (Orwell, 1949: 313)

Internet users leave vast volumes of online data behind when passing away, commonly referred to as digital remains (Lingel, 2013). The phenomenon is gaining increasing traction within the academic community (Gotved, 2014). Scholars of law and related areas are investigating new dilemmas arising from the inheritance of digital estates (Banta et al., 2015; Craig et al., 2013) and issues of posthumous online privacy (Harbinja, 2014). Sociologists and anthropologists are increasingly turning their gaze towards the new types of ‘para-social’ relationships (Sherlock, 2013) and the ‘continuing bonds’ (Bell et al., 2015) that we form with the online dead. And in philosophy, there has been rising interest in the ontological (Steinhart, 2007; Stokes, 2012; Swan and Howard, 2012) and ethical (Öhman and Floridi, 2018; Stokes, 2015) status of digital remains. In short, online death has rapidly become a booming and diverse research area.

Despite this breadth of perspectives, few studies have thus far explored the macroscopic and quantitative aspects of online death. While research on micro- and meso-level philosophical aspects is illuminating, the global spread of the phenomenon, as well as its future development, remains uncertain. The absence of thorough empirical investigation at the macro level makes it difficult to formulate a critical analysis of the global impact of online death from either a long- or a short-term perspective. This is problematic, not only because researchers (including the authors of this study) often motivate the significance of the subject by alluding to its presumed size and growth (Acker and Brubaker, 2014: 10; Harbinja, 2014: 21; Öhman and Floridi, 2017: 640), but also because there is reason to believe that online death will increase in significance as more people around the world become connected and mortality numbers rise. It is important to get the picture straight. Is social media, as occasionally claimed (Ambrosino, 2015; Brown, 2016), turning into a ‘digital graveyard’? If so, how is the phenomenon geographically distributed? And perhaps more importantly, what ethical and political challenges would emerge from such a development? Despite the somewhat alarming nature of these questions, there have hitherto been few attempts to provide rigorous answers.

To address this lacuna and lay the groundwork for further macroscopic analysis, the current study sets out to estimate the growth of digital remains over the course of the 21st century, using the world’s largest platform – Facebook – as a case study. Facebook’s policy on deceased users has changed somewhat over the years, but the current approach is to allow next of kin to either memorialize or permanently delete the account of a confirmed deceased user (Facebook, n.d.).¹ The focus of this article, however, is not merely on memorialized profiles, but on all profiles belonging to deceased users, whether memorialized or not. We pose two research questions:

RQ1: How will the number of Facebook profiles belonging to dead users develop over the course of the 21st century?

Our analysis is conducted in two stages, henceforth referred to as Scenarios A and B. In Scenario A, we assume a global freeze on new users joining the network as of 2018 and predict the resulting accumulation of dead profiles for each nation in the world. This effectively sets a ‘floor’ on the possible growth of dead profiles on the network. To carry out the analysis, we use a public dataset of projected mortality from 2000 to 2100, distributed by age group and nationality (United Nations, Department of Economic and Social Affairs, 2017). These data are matched with current Facebook user totals, scraped from Facebook’s audience insights application programming interface (API) for each country and age group. This allows us to estimate the number of Facebook users expected to die in any given country-year. In Scenario B, we expand the analysis to a hypothetical scenario for Facebook’s future growth, assuming that the network will continue to grow at a pace of 13% per year (Facebook, 2018) until it reaches a penetration rate of 100% for each country-year-age group. While unlikely, this estimate provides a ‘ceiling’ for the accumulation of dead profiles. In conjunction with the ‘floor’ defined in Scenario A, this ceiling defines the window within which we can expect the true number of dead profiles to fall.

In concluding the study, we situate the findings within the larger context of digital preservation (Whitt, 2017). We raise concerns over the current dominance of commercial data management, and warn that it may limit future generations’ access to historical data. We argue that profiles of the deceased are valuable in ways that cannot be quantified in purely economic terms, which is why we advocate an explicitly multi-stakeholder approach. If data are preserved solely on the basis of corporate profitability, we warn that non-economic considerations – e.g., the ethical, religious, scientific, and historical value of digital remains – may be neglected. Our digital heritage is difficult to measure in dollars and cents.

Three types of data were used to carry out the analysis: projected mortality over the 21st century, distributed by age and nationality; projected population data over the 21st century, also distributed by age and nationality; and current Facebook user totals for each age group and country.

Mortality rates were calculated based on UN data, which provide the expected number of mortalities and total populations for every country in the world (United Nations, Department of Economic and Social Affairs, 2017). Numbers are available for each age group – 0 to 100, divided into five-year intervals – and all years from 2000 to 2100, likewise divided into five-year intervals. The estimates are based on official data from each country’s government, and in some cases external sources (esa.un.org/unpd/wpp/DataSources/). It is unclear from the data how precision varies by country and year. All projections are reported as point estimates, with no standard errors or confidence intervals. For a more detailed account of the UN data, see esa.un.org/unpd/wpp/.

Facebook data were scraped from the company’s Audience Insights page (facebook.com/ads/audience-insights/) using a custom Python script that extracts Facebook’s monthly active users by country and age. These estimates are based on the self-reported age of users. Facebook provides lower and upper bounds for user totals across all ages and nationalities. For example, there are between 15 and 20 million 25-year-old Indians on the network.² The width of these ranges increases with user counts, and both bounds are reported in round numbers divisible by 5 or 10, suggesting that they are not meant as serious estimates of standard errors or confidence intervals. We take the midpoint of each country-age window for our analysis.
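As a minimal illustration of the midpoint rule, the hypothetical helper below (`audience_midpoint` is our name, not part of the authors' scraper or of any Facebook API) collapses a reported lower and upper bound into a single value, using the example of 25-year-old Indians given above.

```python
def audience_midpoint(lower: float, upper: float) -> float:
    """Return the midpoint of a reported (lower, upper) audience range.

    Hypothetical helper for illustration; it is not the authors' scraper.
    """
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return (lower + upper) / 2


# Example from the text: between 15 and 20 million 25-year-old Indians.
india_age_25 = audience_midpoint(15_000_000, 20_000_000)
print(india_age_25)  # 17500000.0
```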

Facebook’s audience insights API provides by far the most comprehensive publicly available estimate of the network’s size and distribution. Nevertheless, we wish to draw attention to several limitations of this dataset. First, there are reasonable doubts about the accuracy of Facebook’s reported monthly active users. The site has recently been sued for allegedly inflating these numbers with the intent of overcharging advertisers (Todd, 2018), and Facebook explicitly notes that its estimates are not meant to be matched with population data. In addition to these concerns about false positives, we also expect false negatives due to users visiting the site less than once a month. Unfortunately, it is impossible to say exactly how this affects results without more fine-grained detail on the distribution of errors. Second, the data exclude users under 18, preventing us from evaluating network activity among 13–17-year-olds (Facebook requires all users to be at least 13). Due to the relatively low (although varied) mortality rate of this age group, the missing data should not have much impact on the projection until relatively late in the century. Third, users aged 65+ are all put into the same age category. This gives us less detailed data on penetration rates among the elderly. But as we show in the following section, this problem can be mitigated by extrapolating from a smooth curve fit to data from younger users.

Finally, we wish to emphasize that our model is devoted to the future development of death on Facebook, and therefore leaves out users who have already died and left profiles behind. Estimating the current number of dead profiles would require historical data on the age distribution of Facebook users in various countries, which are currently inaccessible through the site’s API. Furthermore, the aim of the study is to depict a larger, long-term trend, in which the current numbers play only an illustrative role.

Our methodological approach can be summarized by the following procedure for each country:

1. estimate a function f mapping age and year to expected mortality rates (see Figure 1(a));

2. estimate a function g mapping age to expected monthly active Facebook users (see Figure 1(b));

3. extend g across time under two alternative scenarios (details below);

4. multiply the outputs of f and g to estimate the number of Facebook profiles belonging to dead users of a given age in a given year (see Figure 1(c)); and

5. integrate this product across all age groups to estimate the number of dead profiles in a given year.

This pipeline is repeated for each country to get a global estimate. Projections are integrated over several years to get national or global estimates over time.
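Steps 4 and 5, together with the loop over countries, can be sketched as follows. This assumes the fitted functions are available as plain Python callables `f(year, age)` and `g(year, age)` and that ages are summed over integer values rather than integrated continuously; all names are hypothetical, and the authors' own analysis was carried out in R.

```python
from typing import Callable, Dict, Iterable, Tuple

# f(year, age) -> expected mortality rate; g(year, age) -> expected users.
AgeYearFn = Callable[[int, int], float]


def dead_profiles_by_year(
    f: AgeYearFn, g: AgeYearFn, years: Iterable[int], ages: Iterable[int]
) -> Dict[int, float]:
    """Steps 4 and 5 for one country: multiply f and g, then sum over ages."""
    ages = list(ages)
    return {t: sum(f(t, a) * g(t, a) for a in ages) for t in years}


def global_dead_profiles(
    countries: Dict[str, Tuple[AgeYearFn, AgeYearFn]],
    years: Iterable[int],
    ages: Iterable[int],
) -> Dict[int, float]:
    """Repeat the per-country pipeline and add the results year by year."""
    years, ages = list(years), list(ages)
    totals: Dict[int, float] = {}
    for f, g in countries.values():
        for t, deaths in dead_profiles_by_year(f, g, years, ages).items():
            totals[t] = totals.get(t, 0.0) + deaths
    return totals


# Toy usage with made-up constant rates for two hypothetical countries.
toy = {
    "country_x": (lambda t, a: 0.01, lambda t, a: 1000.0),
    "country_y": (lambda t, a: 0.02, lambda t, a: 500.0),
}
per_year = global_dead_profiles(toy, range(2018, 2021), range(18, 65))
cumulative = sum(per_year.values())  # summing over years gives a running total
```

Each toy country contributes 10 deaths per age for 47 ages, so every year totals roughly 940.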

It should be noted that this approach makes a substantive and potentially problematic assumption, namely that each country’s Facebook users constitute a representative sample of the population, at least with respect to mortality rates. It is well established that internet usage, especially in developing economies, is strongly correlated with education and income (PEW Research Centre, 2018: 15). These two variables are in turn correlated with life expectancy, which means there is reason to believe that current Facebook users will live slightly longer than non-users on average. Our model does not account for this potential bias, which may result in an overestimation of dead users in developing countries.

However, a recent PEW research report (2018: 15) indicates that the divide is rapidly shrinking. Between 2015 and 2017, social media penetration in countries such as Lebanon, Jordan and the Philippines rose by more than 20 percentage points, suggesting that connectivity is rapidly becoming more accessible. This trend is expected to continue throughout the 21st century, mitigating any potential confounding effects on projections years or decades out. Furthermore, the closer we get to full market saturation, the smaller the bias becomes, since people with high and low life expectancies alike are joining the network in large numbers. In light of this, it is important to stress that the value of the present study lies in the larger trends it identifies, not in the details of the immediate future. This should be kept in mind when assessing very short-term scenarios.

The model described in step (2) was trained on 2018 data. We vary projections for future Facebook growth according to two scenarios: (A) Shrinking: no new users join the network, and all current users remain until their death. (B) Growing: the network grows at 13% per year across all markets until usership reaches 100%. To help extrapolate beyond the age of 64, the oldest single age for which Facebook reports monthly active user totals (older users are grouped as 65+), we anchored all regressions with an extra data point of zero users aged 100. This is almost certainly true in all markets, at least to a first approximation. Alternative anchor points may be justified, but do not have a major impact on results.
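To see what the zero-users-at-100 anchor does, the sketch below adds the point (100, 0) to hypothetical user counts that stop at age 64 and fills in the missing ages. It uses simple linear interpolation as a stand-in for the GAM smooth the authors actually fit, so it illustrates only the role of the anchor, not the fitted curve; `fill_with_anchor` is a hypothetical name.

```python
from typing import Dict


def fill_with_anchor(
    users_by_age: Dict[int, float], anchor_age: int = 100
) -> Dict[int, float]:
    """Add a zero-user anchor at `anchor_age` and linearly interpolate gaps.

    Linear interpolation is a deliberate simplification standing in for
    the smooth regression fit described in the text.
    """
    known = dict(users_by_age)
    known[anchor_age] = 0.0
    ages = sorted(known)
    filled: Dict[int, float] = {}
    for lo, hi in zip(ages, ages[1:]):
        for a in range(lo, hi):
            w = (a - lo) / (hi - lo)
            filled[a] = (1 - w) * known[lo] + w * known[hi]
    filled[ages[-1]] = known[ages[-1]]
    return filled


# Hypothetical counts: only ages 63 and 64 are observed.
curve = fill_with_anchor({63: 3700.0, 64: 3600.0})
```

Without the anchor there would be nothing to pin down the curve between 64 and 100; with it, the toy curve falls to 1800 users at age 82 and to zero at 100.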

All statistical analysis was conducted in R, version 3.5.1 (R Core Team, 2018). Predictive functions were estimated using generalized additive models (GAMs), which provide a remarkably flexible framework for learning nonlinear smooths under a wide range of settings (Hastie and Tibshirani, 1990). Regressions were implemented using the mgcv package (Wood, 2017). A supplemental methods section, including data and code for reproducing all figures and results, can be found online at: https://github.com/dswatson/digital_graveyard.

We fit three separate models for each country:

$$
\begin{aligned}
\text{Mortality\_Rate} &= f_C(\text{Time}, \text{Age}) \\
\text{FB\_Users\_2018} &= g_C(\text{Time} = 2018, \text{Age}) \\
\text{Population} &= h_C(\text{Time}, \text{Age})
\end{aligned}
$$

The subscript C indicates that each model is country-specific. We omit the subscript for notational convenience moving forward.

The mortality and population models provide nonlinear interpolations so that we can make predictions for any age-year in the data without the limitations imposed by the UN’s binning strategy.

Under Scenario A, we extrapolate model g beyond 2018 by assuming that no new users join Facebook and current users leave the network if and only if they die. This means we see zero 18-year-olds on the network in 2019, zero 18- or 19-year-olds in 2020, and so on. Attrition from current users can be calculated recursively. For each year t and age a:

Scenario A

$$
\text{FB\_Users} = g(\text{Time} = t, \text{Age} = a) = g(\text{Time} = t-1, \text{Age} = a-1) \times \bigl(1 - f(\text{Time} = t-1, \text{Age} = a-1)\bigr)
$$
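The Scenario A recursion can be sketched as below, assuming a hypothetical 2018 age profile `users_2018` (keyed by age, 18 and over) and a mortality function `f(year, age)`; the function name `scenario_a` is ours. Because the 2018 data start at age 18, the recursion automatically yields zero 18-year-olds in 2019, zero 18- and 19-year-olds in 2020, and so on.

```python
from typing import Callable, Dict, Tuple


def scenario_a(
    users_2018: Dict[int, float],
    f: Callable[[int, int], float],
    end_year: int = 2100,
    max_age: int = 100,
) -> Dict[Tuple[int, int], float]:
    """Return expected users keyed by (year, age) under a freeze on new users.

    Each cohort ages by one year per step and is thinned by the previous
    year's mortality rate; no new users ever join.
    """
    users = {(2018, a): n for a, n in users_2018.items()}
    for t in range(2019, end_year + 1):
        for a in range(18, max_age + 1):
            prev = users.get((t - 1, a - 1), 0.0)  # 0 for cohorts never observed
            users[(t, a)] = prev * (1.0 - f(t - 1, a - 1))
    return users


# Toy usage: 1000 hypothetical 30-year-olds, constant 10% annual mortality.
toy_users = scenario_a({30: 1000.0}, lambda t, a: 0.1, end_year=2021)
deaths_2019 = toy_users[(2019, 31)] * 0.1  # users at risk times mortality rate
```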

In Scenario B, we extrapolate beyond g by assuming that Facebook will see constant growth of 13% per year in all markets until reaching a cap of 100% penetration. For each year t and age a:

Scenario B

$$
\begin{aligned}
\text{upper\_bound} &= h(\text{Time} = t, \text{Age} = a) \\
\text{FB\_proj} &= g(\text{Time} = t-1, \text{Age} = a-1) \times 1.13^{\,t-2018} \\
\text{FB\_Users} &= g(\text{Time} = t, \text{Age} = a) = \min(\text{upper\_bound}, \text{FB\_proj})
\end{aligned}
$$
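A simplified sketch of the growth-with-cap idea is given below. It follows the prose description (13% annual growth in every market, capped at 100% penetration) rather than transcribing the Scenario B equation term by term: it scales a hypothetical 2018 age profile by 1.13 per elapsed year and caps each cell at the projected population `h(year, age)`. The name `scenario_b` and the toy numbers are ours.

```python
from typing import Callable, Dict, Tuple


def scenario_b(
    users_2018: Dict[int, float],
    h: Callable[[int, int], float],
    end_year: int = 2100,
    growth: float = 1.13,
) -> Dict[Tuple[int, int], float]:
    """Grow a 2018 age profile by `growth` per year, capped at population h.

    One simplified reading of the prose description, not a term-by-term
    transcription of the Scenario B equation.
    """
    users: Dict[Tuple[int, int], float] = {}
    for t in range(2018, end_year + 1):
        factor = growth ** (t - 2018)
        for a, n in users_2018.items():
            users[(t, a)] = min(h(t, a), n * factor)  # penetration <= 100%
    return users


# Toy usage: 100 users aged 30 in a cohort of 150 people.
toy_b = scenario_b({30: 100.0}, lambda t, a: 150.0, end_year=2025)
```

In the toy case the cap binds from 2022 onward, when 100 × 1.13⁴ (about 163) would exceed the population of 150.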

In both cases, our true target is

$$
y = \int_{2018}^{2100} \int_{13}^{100} f(\text{Time}, \text{Age}) \, g(\text{Time}, \text{Age}) \, d(\text{Age}) \, d(\text{Time})
$$
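In practice the double integral has to be approximated numerically. A minimal sketch, assuming the fitted f and g can be evaluated at integer ages and years, is the two-dimensional trapezoidal rule below; the function names are hypothetical and the authors' own approximation may differ.

```python
from typing import Callable, List

AgeYearFn = Callable[[int, int], float]


def trapezoid_weights(n: int) -> List[float]:
    """Weights of the unit-spacing trapezoidal rule on n >= 2 grid points."""
    if n < 2:
        raise ValueError("need at least two grid points")
    return [0.5 if i in (0, n - 1) else 1.0 for i in range(n)]


def target_y(
    f: AgeYearFn,
    g: AgeYearFn,
    ages: range = range(13, 101),
    years: range = range(2018, 2101),
) -> float:
    """Approximate the double integral of f * g over age and time."""
    wa, wt = trapezoid_weights(len(ages)), trapezoid_weights(len(years))
    return sum(
        wt[i] * wa[j] * f(t, a) * g(t, a)
        for i, t in enumerate(years)
        for j, a in enumerate(ages)
    )


# Sanity check: with f * g == 1 the result is the area (100 - 13) * (2100 - 2018).
area = target_y(lambda t, a: 1.0, lambda t, a: 1.0)
```

The trapezoidal rule is exact for integrands that are linear in each variable, which makes constant and linear toy functions convenient checks.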

For the mortality rate model f, we used beta regression with a logit link function, a common choice for rate data. For the Facebook model g, we used negative binomial regression with a log link function, which is well suited for over-dispersed counts such as those observed in this dataset. We experimented with several alternatives for the population model h, ultimately getting the best results using Gaussian regression with a log link function. Parametric specifications for each model were evaluated using the Akaike information criterion (Akaike, 1974), a penalized likelihood measure. Age and time were incorporated as both main effects and interacting variables in models f and h, which were fit with tensor product interactions in a functional ANOVA structure (Wood, 2006). We use cubic regression splines for all smooths, with a maximum basis dimension of 10. Parameters were estimated using generalized cross-validation.
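For reference, the Akaike information criterion trades off fit against complexity as AIC = 2k - 2 ln L, where L is the maximized likelihood and k the number of parameters; for penalized models such as GAMs, effective degrees of freedom are typically used in place of k. The sketch below compares two hypothetical specifications using made-up log-likelihoods; none of the numbers come from the study.

```python
def aic(log_likelihood: float, k: float) -> float:
    """Akaike information criterion: 2k - 2 log L. Lower is better."""
    return 2.0 * k - 2.0 * log_likelihood


# Made-up values for two hypothetical candidate specifications.
candidates = {
    "gaussian_log_link": aic(log_likelihood=-1200.0, k=12.5),
    "gamma_log_link": aic(log_likelihood=-1210.0, k=11.0),
}
best = min(candidates, key=candidates.get)
print(best, candidates[best])  # gaussian_log_link 2425.0
```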

While there remains no good way to evaluate the precision of the underlying data – as noted above, neither the UN nor Facebook provides confidence intervals – we may quantify the uncertainty of the model using nonparametric techniques. GAMs provide straightforward standard errors for their predictions, but under both scenarios our true target y is a double integral of a product of two vectors. Unfortunately, there is no analytic method for calculating y’s variance as a function of those variables without making strong assumptions that almost certainly fail in this case.

For that reason, we measure uncertainty using a Bayesian bootstrap (Rubin, 1981). To implement this algorithm, we sample n weights (one per observation) from a flat Dirichlet prior and fit the models using these random weights. We repeat this procedure 500 times for each country and scenario, providing an approximate posterior distribution for all predictions, from which we compute standard errors. These numbers are reported in parentheses next to point estimates in the text, and in their own column in all table summaries.
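The Bayesian bootstrap can be sketched in a few lines. Flat Dirichlet weights are obtained by normalizing independent Gamma(1, 1) draws, and each set of weights is passed to a weighted statistic. Here a weighted mean of toy data stands in for the authors' weighted GAM refits and integrated target y; the function names are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence

WeightedStat = Callable[[Sequence[float], Sequence[float]], float]


def flat_dirichlet(n: int, rng: random.Random) -> List[float]:
    """Draw n weights from Dirichlet(1, ..., 1) via normalized Gamma(1, 1)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bayesian_bootstrap(
    data: Sequence[float], stat: WeightedStat, reps: int = 500, seed: int = 0
) -> List[float]:
    """Return `reps` draws from the approximate posterior of `stat`."""
    rng = random.Random(seed)
    return [stat(data, flat_dirichlet(len(data), rng)) for _ in range(reps)]


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    """Weighted mean, assuming the weights already sum to one."""
    return sum(xi * wi for xi, wi in zip(x, w))


posterior = bayesian_bootstrap([1.0, 2.0, 3.0, 4.0, 5.0], weighted_mean)
estimate = statistics.mean(posterior)
std_error = statistics.stdev(posterior)  # reported alongside point estimates
```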

As previously noted, the findings we present in this paper concern only the future accumulation of dead profiles (i.e., those of users who will die between 2018 and 2100). Naturally, many users have already left profiles behind when they passed away. This number is unknown, but whatever it is, it should be added to the totals we present for Scenarios A and B below.

### Scenario A

Our first scenario assumes that users will cease joining the network as of 2018. While unlikely, this defines the lower bound of possible developments, which we refer to as the floor (see Figure 2). Attached to the plot is a table with the exact numbers and the share of each continent (Table 1).

 Table 1. Geographical distribution of dead profiles (in millions) under Scenario A.


Under the assumptions of Scenario A, we estimate that some 1.4 billion (±11.15 million) Facebook users will die between 2018 and 2100 – fully 98% of the 1.43 billion users in our dataset. Under this scenario, the number of deaths per year on Facebook grows steadily for the next five decades, peaking at over 29 million (±0.31 million) in 2077 before decelerating through the rest of the century. The global sum of dead profiles exceeds 500 million (±3.86 million) in 2060 and 1 billion (±8.67 million) in 2079. Note that under these conservative assumptions, the dead will in fact overtake the living on Facebook in about 50 years. This corroborates popular claims in media (Ambrosino, 2015; Brown, 2013) about living profiles becoming a minority on the network within the (relatively) near future.

The plot further shows that Asia contains a growing plurality of deceased users for every year in the dataset, culminating with nearly 44% of the total by the end of the century. Nearly half of those profiles come from just two countries, India and Indonesia, which account for a cumulative 278.8 million (±9.8 million) Facebook mortalities by 2100 (see Table 2).

 Table 2. Geographical distribution of dead profiles (in millions) by country under Scenario A. Results for top ten countries shown.


### Scenario B

Scenario A is highly unlikely. For Facebook to see zero global growth as of 2019 would require some cataclysmic event(s) far more ruinous than the Cambridge Analytica scandal (Cadwalladr and Graham-Harrison, 2018), which revealed serious issues regarding the security and privacy of Facebook user data. To estimate how much higher the growth can possibly be, the second scenario sets a ‘ceiling’ on the development. We presume that Facebook will continue to see global growth of 13% per year until it reaches 100% penetration in all markets. As illustrated by Figure 3, this assumption drastically changes the total number of dead users by the end of the century.

A continuous growth rate of 13% per year increases the expected number of dead profiles on Facebook by a factor of 3.5, for a total sum of 4.9 billion (±97.23 million). Unlike Scenario A, the dead profiles do not show any signs of exceeding the living within this century. However, the proportion is still substantial, and the dead are likely to reach parity with the living in the first decades of the 22nd century.

A continuous 13% growth rate would change not just the total number of dead users, but also their geographical distribution (see Tables 3 and 4). The most notable shift is the considerably increased share of global Facebook mortalities contributed by African nations. Nigeria in particular becomes a major hub of Facebook user deaths under Scenario B – in fact, the second largest in the world, accounting for over 6% of the global total. The shift is evident in Figures 3 and 4. Niger, Mali and Burkina Faso also appear in the top 10 countries by dead profile count, while the United States is the only Western nation to crack the list. In other words, a minority of dead profiles will belong to Western users.

 Table 3. Geographical distribution of dead profiles (in millions) under Scenario B.


 Table 4. Geographical distribution of dead profiles (in millions) by country under Scenario B. Results for top ten countries shown.


To illustrate the geographical distribution more clearly, we have included a heatmap that visualizes deceased Facebook users per country (see Figure 4). Unsurprisingly, the map closely tracks the list of largest Facebook markets. However, it should be noted that only two Western countries (the US and the UK) make it to the top 10 list under either scenario. Thus, the maps clearly show that death online is a global phenomenon, reaching far wider than just Europe and America.

To summarize, both scenarios are implausible. The true number almost certainly falls somewhere between Scenarios A and B, but we can only speculate as to where. Assumptions regarding growth rates have a major impact on both absolute numbers and geographical distributions of dead profiles. While richer data sources may help produce more accurate projections, an exact estimate is almost beside the point. Even in the conservative Scenario A, numbers are large. Facebook will indubitably have hundreds of millions of dead users by 2060 if not sooner.

With regard to geographical distribution, a handful of countries account for a large proportion of the total in both scenarios: mainly India (owing to its large population) and the US (owing to its high penetration rates), although other countries, such as Nigeria and Brazil, will also be important stakeholders in this development. Next, we turn to a discussion of the challenges posed by the growth of death online.

Our projection of growth in dead Facebook users’ accounts marks the first step toward empirically exploring the macroscopic and quantitative aspects of death on social media. The results should be interpreted not as a prediction of the future, but as a commentary on the present, and an opportunity to respond with thoughtful and effective policy interventions.

Undoubtedly, there is a great deal of uncertainty in projections of this kind. In addition to the predictive variance discussed above, there is also uncertainty regarding the data underlying the model. For instance, we do not know if there will be a significant cultural shift among users towards deleting profiles (either one’s own or deceased relatives’, a possibility given to the appointed legacy contact). It is also possible that Facebook will unexpectedly go bankrupt in the foreseeable future, thus invalidating the assumptions underlying our models. As stressed by boyd (2006) among others, the longevity of social media sites depends on their ability to evolve, and despite the success of the past decade, we do not yet know how or if Facebook will manage to do this in the future.

But this has no bearing on our larger point – namely, that critical discussion of online death and its macroscopic implications is urgently needed (not least with regard to its geographical spread). Facebook is merely an example of what awaits any platform with similar connectivity and global reach. Furthermore, the sudden dissolution of Facebook would arguably make the subject even more important, as the company might be forced to sell or delete its user data. A sufficiently severe blow to Facebook’s finances could force a redesign of the platform with major implications for those currently using it as a memorial site (see, for instance, Arnold et al., 2018: 202 on how MySpace’s 2013 relaunch dropped features used by mourners). In what follows, we tentatively presume that Facebook or something like it will continue to exist for the foreseeable future.

Each individual who leaves a profile behind represents a unique event in its own right, which often leaves us with difficult questions of inheritance of digital assets (Banta et al., 2015; Craig et al., 2013) and posthumous online privacy (Harbinja, 2014). But when aggregated, the totality of these cases amounts to something beyond the sum of its parts. The personal digital heritage left by the online dead is, or will at least become, part of our shared cultural digital heritage (Cameron and Kenderdine, 2007), which may prove invaluable not only to future historians (Brügger and Schroeder, 2017; Pitsillides et al., 2012; Roland and Bawden, 2012) but also to future generations as part of their record and self-understanding. As stated by Matt Raymond, the former director of communications at the American Library of Congress, upon receiving a large data donation from Twitter: ‘Individually tweets might seem insignificant, but viewed in the aggregate, they can be a resource for future generations to understand life in the 21st century’ (Raymond, 2010). Such records can thus be thought of as a form of future public good (Waters, 2002: 83), without which we risk falling into a ‘digital dark age’ (Kuny, 1998; Smit et al., 2011).

Despite its seeming immortality, digital information is more fragile than is sometimes assumed, and future access is far from guaranteed (Whitt, 2017) – even for Facebook itself. File formats change, hardware must be updated, and data need to be continuously stewarded and organized in order to remain useful. As Jeff Rothenberg (1995) says, ‘Digital information lasts forever – or five years, whichever comes first.’ This is not primarily due to storage costs. ‘The real cost of storage’, as Palm (2006: 5) puts it, ‘is management.’ To maintain data utility, firms must routinely upgrade systems and tend to their contents, a costly and tedious undertaking for which Facebook’s current curation model was not designed. Lavoie and Dempsey (2004: 229) put it well:

Preserving our digital heritage is more than just a technical process of perpetuating digital signals over long periods of time. It is also a social and cultural process, in the sense of selecting what materials should be preserved, and in what form; it is an economic process, in the sense of matching limited means with ambitious objectives; it is a legal process, in the sense of defining what rights and privileges are needed to support maintenance of a permanent scholarship and cultural record … And perhaps most importantly, it is an ongoing, long-term commitment, often shared, and cooperatively met, by many stakeholders.

While Lavoie and Dempsey write primarily for an audience of librarians and archivists, their argument is equally applicable to the case of digital remains on Facebook: the cultural/ethical process of selecting whose data are worth preserving, and how to preserve them, is inseparable from the economic constraints that induce the question. But how is one to determine what is worth preserving? This requires a normative framework, one or several guiding principles that help us determine the value of data. There are many possible candidates for such a principle. An object can be appreciated for its sentimental, scientific, religious, or aesthetic values, to list just a few considerations of note. Furthermore, it is plausible that different regions, nations and other interest groups will appreciate different values in Facebook’s mounting historical record. Nevertheless, it is neither users nor their political representatives or religious groups who determine how their data is collectively managed – it is the corporate interests of Facebook.

For a firm, what makes data ‘worth preserving’ is ultimately their ability to directly or indirectly contribute to the company’s profit. Data belonging to deceased users may prove valuable for such purposes. For example, the memorialized profiles may still serve the function of attracting living users who visit the profile to mourn (Karppi, 2013). Indeed, time spent on Facebook can even be understood as a type of labour (Fuchs and Sevignani, 2013). While the (indirect) traffic generated by mourning relatives may not single-handedly result in enough clicks and exposure to cover the costs of curating the dead, it could still serve the indirect function of appropriating central social functions such as mourning and love (Öhman and Floridi, 2017). What is more, datasets of digital remains may also be used for training new models (Leaver, 2013) and extracting historical insight, which may provide a valuable market advantage. Few legal obstacles stand in the way of such experimentation, as deceased users are not, at least according to current legislation, protected the way living users are (see, for instance, the latest GDPR, which lacks any clear guidelines for handling digital remains).

While both the traffic generated by the bereaved and the internal training of new models are possible uses of digital remains, they do not guarantee long-term profitability. If the economic value of dead profiles were ever to become negative, market forces would compel a rationally self-interested firm to delete them. This seems to be the preferred option for many other social media platforms, including Twitter (Twitter.com, n.d.), and is also advocated on a normative basis by some scholars, perhaps most notably Mayer-Schönberger (2009). But thus far Facebook appears to have found the net value of dead profiles to be positive.

This is not to say that Facebook, or any other platform, values digital remains solely as a source of financial exploitation. In fact, Facebook has carefully considered the ethical implications of its policy (Brubaker and Callison-Burch, 2016), and has removed advertisements from the profiles of deceased users, thus effectively de-commercializing the space. But the de-commercialization itself may be interpreted as a response to market incentives, insofar as it is rational for firms to maintain the goodwill of their customers. Curating a deceased relative’s profile could keep some users on the platform, even if it is not the main source of the revenue they generate.

Market incentives may often overlap with the interests of researchers, consumers and future generations, but they are by no means identical. Markets have been discussed rather extensively in the digital preservation literature. For instance, Lavoie (2003: 15) identifies three ideal-type roles in the economics of digital preservation: Right holder, Archive and Beneficiary. Sometimes these roles are played by a single entity, sometimes by separate ones. Lavoie stresses that in so-called supply-side models (17), where the Right holder and the Archive are the same entity but the Beneficiary is external, there is a risk that the market does not create sufficient incentives for preservation. This is indeed a risk in the case of Facebook. The platform both holds the rights to the stored information and acts as the archiving entity. Moreover, it has little incentive to share that information (to say nothing of the complexity of posthumous privacy rights). The beneficiaries, in this case future generations and historians, can neither speak for themselves nor create any current incentives, which makes a purely free-market model inappropriate.

This situation requires what one may call a new macro-ethics of deletion (to borrow a term from Floridi, 2013), a curation model that encompasses and appreciates the various kinds of values involved. In line with Lavoie and Dempsey’s argument, we therefore conclude that multiple stakeholders must be considered. These stakeholders may include states, NGOs, universities, libraries, museums, and any other kind of institution that provides unique perspectives on the value of our digital heritage. The multi-stakeholder approach is not in itself a novel proposal. Indeed, the pioneering Task Force on Archiving of Digital Information (1996) was composed of a collection of individuals representing industry, museums, archives and libraries, publishers, scholarly societies and government. And newer initiatives, such as UNESCO’s strategic plan for software heritage (Di Cosmo and Zacchiroli, 2017: 4), have continued to stress the value of diversity in digital preservation:

We believe that, for Software Heritage, it is essential to build a not-for-profit foundation that has as its explicit objective the collection, preservation and sharing of our software commons. In order to minimize the risk of having a single point of failure at the institutional level, this foundation needs to be supported by various partners from civil society, academia, industry, and governments, and must provide value to all areas that may take advantage of the existence of the archive, ranging from the preservation of cultural heritage to research, from industry to education.

While the above quote deals mainly with software, the same can be said about the vast datasets accumulated by social media firms. It is important that historically significant data are preserved in a way that serves all of humanity, and this cannot be done by allocating the curation of historical social records to any one agent operating in its rational self-interest.

Finally, we wish to stress the importance of decentralizing control over aggregates of digital remains. Concentration of historical data in private hands may prove problematic for political reasons (Lor and Britz, 2012; Öhman, 2018). While it is true that one's digital remains are often distributed over multiple platforms and media (Cann, 2014; Pitsillides et al., 2012: 19), control of personal data (and hence of digital remains) appears to be increasingly concentrated in a small number of global actors, with Facebook itself owning several major services (e.g. WhatsApp, Messenger and Instagram). And, as Orwell so adroitly observed in *1984*, those who control our access to the past also control how we perceive the present. So, in order to prevent a possibly dystopian future of power asymmetries and distorted historical narratives, the task before us is to design a sustainable, dignified solution that takes into account multiple stakeholders and values. This inevitably requires a decentralization of control and ownership of our collective digital heritage.

Academic knowledge will be key in this process. Researchers are charged not just with providing macro-level analyses like this one, but also with providing qualitative knowledge of how individuals in different cultures and social settings make sense of death and the digital. When it comes to qualitative research, there is already a rich literature upon which to draw (Bell et al., 2015; Brubaker et al., 2016; Kasket, 2012). However, researchers have hitherto mainly focused on North American and European settings – with some exceptions (Choudhary, 2018). If the goal is to contribute to a fair and flexible system for curating digital remains, researchers must increasingly turn to non-western contexts, where the phenomenon is going to have the largest presence. While survey data from previous studies do not indicate any radical differences in attitudes toward online death across cultures (Grimm and Chiasson, 2014), a qualitative, nuanced understanding of this fast-evolving subject is required. We therefore encourage scholars of online death to widen the geographical scope of their research, and focus particularly on South Asia and Africa, where our models suggest the phenomenon will be most prevalent in the coming decades.

This study has provided the first rigorous projection of the accumulation of Facebook profiles belonging to the deceased. Will the dead, then, ‘take over’ Facebook? We have concluded that hundreds of millions of dead profiles will be added to the network in the next few decades alone, and that the dead may well outnumber the living before the end of the century, depending on how global user penetration rates evolve. Irrespective of how the network grows in the years to come, the vast majority of dead profiles will belong to users from non-western countries.

Considering its global reach, we have argued that the totality of deceased user profiles amounts to something beyond the sum of its parts. These profiles are becoming part of our collective record as a species, and may prove invaluable to future generations. We believe that a multi-stakeholder approach is the best way to curate such a vast archive. We have also stressed that in crafting a future curation model, qualitative understanding of how different cultures make sense of death and the digital will be key. Likewise, the development poses difficult ethical problems that require careful consideration. The onus is now on policymakers and industry to rise to these challenges. We look forward to taking part in the debates to come.

We wish to express our sincere gratitude to the four referees who reviewed this study. Their insights and input have substantially improved the final result. We would also like to express our thanks to Patrick Gildersleve for helping us with the Python script that scraped data from the Facebook API.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Despite this breadth of perspectives, few studies have thus far explored the macroscopic and quantitative aspects of online death. While research on philosophical micro- and meso-level aspects is illuminating, the global spread of the phenomenon, as well as its future development, remains uncertain. The absence of thorough empirical investigation at the macro level makes it difficult to formulate a critical analysis of the global impact of online death from either a long- or a short-term perspective. This is problematic, not only because researchers (including the authors of this study) often motivate the significance of the subject by alluding to its presumed size and growth (Acker and Brubaker, 2014: 10; Harbinja, 2014: 21; Öhman and Floridi, 2017: 640), but also because there is reason to believe that online death will increase in significance as more people around the world become connected and mortality numbers rise. It is important to get the picture straight. Is social media, as occasionally claimed (Ambrosino, 2015; Brown, 2016), turning into a ‘digital graveyard’? If so, how is the phenomenon geographically distributed? And perhaps more importantly, what ethical and political challenges would emerge from such a development? Despite the somewhat alarming nature of these questions, there have hitherto been few attempts to provide rigorous answers.

To address this lacuna and lay the groundwork for further macroscopic analysis, the current study sets out to estimate the growth of digital remains over the course of the 21st century, using the world’s largest platform – Facebook – as a case study. Facebook’s policy on deceased users has changed somewhat over the years, but the current approach is to allow next of kin to either memorialize or permanently delete the account of a confirmed deceased user (Facebook, n.d.).¹ The focus of this article, however, is not merely on the memorialized profiles, but on all profiles belonging to deceased users, be they memorialized or not. We pose two research questions:

RQ1: How will the number of Facebook profiles belonging to dead users develop over the course of the 21st century?

Our analysis is conducted in two stages, henceforth referred to as Scenarios A and B. In Scenario A, we assume a global freeze on new users joining the network as of 2018 and predict the resulting accumulation of dead profiles for each nation in the world. This effectively sets a ‘floor’ on the possible growth of dead profiles on the network. To carry out the analysis, we use a public dataset of projected mortality from 2000 to 2100, distributed by age group and nationality (United Nations, Department of Economic and Social Affairs, 2017). These data are matched with current Facebook user totals, scraped from Facebook’s audience insights application programming interface (API) for each country and age group. This allows us to estimate the number of Facebook users expected to die in any given country-year. In Scenario B, we expand the analysis to a hypothetical scenario for Facebook’s future growth, assuming that the network will continue to grow at a pace of 13% per year (Facebook, 2018) until it reaches a penetration rate of 100% for each country-year-age group. While unlikely, this estimate provides a ‘ceiling’ for the accumulation of dead profiles. In conjunction with the ‘floor’ defined in Scenario A, this ceiling defines the window within which we can expect the true number of dead profiles to fall.

In concluding the study, we situate the findings within the larger context of digital preservation (Whitt, 2017). We raise concerns over the current dominance of commercial data management, and warn that it may limit future generations’ access to historical data. We argue that profiles of the deceased are valuable in ways that cannot be quantified in purely economic terms, which is why we advocate an explicitly multi-stakeholder approach. If data are preserved solely on the basis of corporate profitability, we warn that non-economic considerations – e.g., the ethical, religious, scientific, and historical value of digital remains – may be neglected. Our digital heritage is difficult to measure in dollars and cents.

Three types of data were used to carry out the analysis: projected mortality over the 21st century, distributed by age and nationality; projected population data over the 21st century, also distributed by age and nationality; and current Facebook user totals for each age group and country.

Mortality rates were calculated based on UN data, which provide the expected number of mortalities and total populations for every country in the world (United Nations, Department of Economic and Social Affairs, 2017). Numbers are available for each age group – 0 to 100, divided into five-year intervals – and all years from 2000 to 2100, likewise divided into five-year intervals. The estimates are based on official data from each country’s government, and in some cases external sources (esa.un.org/unpd/wpp/DataSources/). It is unclear from the data how precision varies by country and year. All projections are reported as point estimates, with no standard errors or confidence intervals. For a more detailed account of the UN data, see esa.un.org/unpd/wpp/.

Facebook data were scraped from the company’s Audience Insights page (facebook.com/ads/audience-insights/) using a custom Python script that extracts Facebook’s monthly active users by country and age. These estimates are based on the self-reported age of users. Facebook provides lower and upper bounds for user totals across all ages and nationalities. For example, there are between 15 and 20 million 25-year-old Indians on the network.² The width of these windows increases with user counts, and both bounds are reported in round numbers divisible by 5 or 10, suggesting that they are not meant as serious estimates of standard errors or confidence intervals. We take the midpoint of each country-age window for our analysis.
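The midpoint step can be illustrated with a minimal sketch. It assumes that the scraped bounds are stored in a dictionary keyed by (country, age). That layout, and the country key `"IN"`, are hypothetical stand-ins and are not the structure of the authors' actual script.

```python
# Sketch: collapse Facebook's reported (lower, upper) audience bounds into
# point estimates by taking the midpoint of each country-age window.
# The {(country, age): (lower, upper)} layout is a hypothetical stand-in
# for whatever structure the scraped data actually has.

def midpoint_users(bounds):
    """Return {(country, age): midpoint} from {(country, age): (lower, upper)}."""
    return {key: (lower + upper) / 2 for key, (lower, upper) in bounds.items()}


# The example from the text: 15 to 20 million 25-year-old Indians.
scraped = {("IN", 25): (15_000_000, 20_000_000)}
estimates = midpoint_users(scraped)
print(estimates)  # {('IN', 25): 17500000.0}
```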

Facebook’s audience insights API provides by far the most comprehensive publicly available estimate of the network’s size and distribution. Nevertheless, we wish to draw attention to several limitations of this dataset. First, there are reasonable doubts about the accuracy of Facebook’s reported monthly active users. The site has recently been sued for allegedly inflating these numbers with the intent of overcharging advertisers (Todd, 2018), and Facebook explicitly notes that its estimates are not meant to be matched with population data. In addition to these concerns about false positives, we also expect false negatives due to users visiting the site less than once a month. Unfortunately, it is impossible to say exactly how this affects results without more fine-grained detail on the distribution of errors. Second, the data exclude users under 18, preventing us from evaluating network activity among 13–17-year-olds (Facebook requires all users to be at least 13). Due to the relatively low (although varied) mortality rate of this age group, the missing data should not have much impact on the projection until relatively late in the century. Third, users aged 65+ are all put into the same age category. This gives us less detailed data on penetration rates among the elderly. But as we show in the following section, this problem can be mitigated by extrapolating from a smooth curve fitted to data from younger users.

Finally, we wish to emphasize that our model is devoted to the future development of death on Facebook, and therefore leaves out users who have already died and left profiles behind. Estimating the current number of dead profiles would require historical data on the age distribution of Facebook users in various countries, which are currently inaccessible through the site’s API. Furthermore, the aim of the study is to depict a larger, long-term trend, in which the current numbers play only an illustrative role.

Our methodological approach can be summarized by the following procedure for each country:

1. estimate a function f mapping age and year to expected mortality rates (see Figure 1(a));

2. estimate a function g mapping age to expected monthly active Facebook users (see Figure 1(b));

3. extend g across time under two alternative scenarios (details below);

4. multiply the outputs of f and g to estimate the number of Facebook profiles belonging to dead users of a given age in a given year (see Figure 1(c)); and

5. integrate this product across all age groups to estimate the number of dead profiles in a given year.

This pipeline is repeated for each country to get a global estimate. Projections are integrated over several years to get national or global estimates over time.
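Steps 4 and 5, together with the aggregation across countries, can be sketched on a discrete year-by-age grid. This is a minimal illustration only. It assumes that f and g have already been evaluated at single-year ages, so that summing over ages approximates the integral. The toy arrays and the country codes `"AA"` and `"BB"` are hypothetical and are not study data.

```python
import numpy as np

# Toy stand-ins for the fitted models (hypothetical numbers, not study data).
# For each country, row k is year 2018 + k and column i is age 18 + i.
years = np.arange(2018, 2021)
ages = np.arange(18, 21)


def deaths_per_year(mortality, users):
    """Steps 4-5: multiply f and g cell by cell, then sum over ages."""
    return (mortality * users).sum(axis=1)


def global_dead_profiles(countries):
    """Run the per-country pipeline and aggregate.

    countries maps a country code to a (mortality, users) pair of
    equally shaped year-by-age arrays. Returns per-country yearly deaths,
    global yearly deaths and the cumulative global count of dead profiles.
    """
    per_country = {c: deaths_per_year(m, u) for c, (m, u) in countries.items()}
    yearly_total = np.sum(list(per_country.values()), axis=0)
    return per_country, yearly_total, np.cumsum(yearly_total)


shape = (len(years), len(ages))
toy = {
    "AA": (np.full(shape, 0.01), np.full(shape, 1000.0)),
    "BB": (np.full(shape, 0.02), np.full(shape, 500.0)),
}
per_country, yearly, cumulative = global_dead_profiles(toy)
# Each toy country contributes about 30 deaths a year (3 ages x 10 deaths).
```

The cumulative sum is the discrete counterpart of the accumulation curves reported for Scenarios A and B.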

It should be noted that this approach makes a substantive and potentially problematic assumption, namely that each country’s Facebook users constitute a representative sample of the population, at least with respect to mortality rates. It is well established that internet usage, especially in developing economies, is strongly correlated with education and income (PEW Research Centre, 2018: 15). These two variables are in turn correlated with life expectancy, which means there is reason to believe that current Facebook users will live slightly longer than non-users on average. Our model does not account for this potential bias, which may result in an overestimation of dead users in developing countries.

However, a recent PEW research report (2018: 15) indicates that the divide is rapidly shrinking. Between 2015 and 2017, social media penetration in countries such as Lebanon, Jordan and the Philippines rose by more than 20 percentage points, suggesting that connectivity is rapidly becoming more accessible. This trend is expected to continue throughout the 21st century, mitigating any potential confounding effects on projections years or decades out. Furthermore, the closer we get to full market saturation, the smaller the bias becomes, since people with both high and low life expectancies are joining the network in large numbers. Given this uncertainty, it is important to stress that the value of the present study lies in the larger trends it identifies, not in the details of the immediate future development. This should be kept in mind when assessing very short-term scenarios.

The model described in step (2) was trained on 2018 data. We vary projections for future Facebook growth according to two scenarios: (A) Shrinking. No new users join the network. All current users remain until their death. (B) Growing. The network grows at 13% per year across all markets until usership reaches 100%. To help extrapolate beyond the age of 64, the final age for which Facebook provides monthly active user totals, we anchored all regressions with an extra data point of zero users aged 100. This is almost certainly true in all markets, at least to a first approximation. Alternative anchor points may be justified, but do not have a major impact on results.
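The zero-user anchor at age 100 can be illustrated as follows. This minimal sketch swaps the study's GAM smooth for plain linear interpolation (`numpy.interp`). The user counts are hypothetical. The point is only to show how the extra point (100, 0) constrains the curve beyond the last reported age of 64.

```python
import numpy as np


def anchored_age_curve(ages, users, query_ages, anchor_age=100):
    """Extrapolate user counts past the last reported age via a zero anchor.

    Appends the point (anchor_age, 0) to the observed (age, users) pairs
    and interpolates linearly. Linear interpolation is a hypothetical
    stand-in for the GAM smooth used in the study.
    """
    x = np.append(np.asarray(ages, dtype=float), float(anchor_age))
    y = np.append(np.asarray(users, dtype=float), 0.0)
    return np.interp(query_ages, x, y)


# Hypothetical user counts at three reported ages, the last one at 64.
curve = anchored_age_curve([18, 40, 64], [100, 80, 36], [64, 82, 100])
# 36 users at age 64, 18 at age 82 (halfway to the anchor), 0 at age 100.
```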

All statistical analysis was conducted in R, version 3.5.1 (R Core Team, 2018). Predictive functions were estimated using generalized additive models (GAMs), which provide a remarkably flexible framework for learning nonlinear smooths under a wide range of settings (Hastie and Tibshirani, 1990). Regressions were implemented using the mgcv package (Wood, 2017). A supplemental methods section, including data and code for reproducing all figures and results, can be found online at: https://github.com/dswatson/digital_graveyard.

We fit three separate models for each country:

$$
\begin{aligned}
\text{Mortality\_Rate} &= f_C(\text{Time}, \text{Age})\\
\text{FB\_Users\_2018} &= g_C(\text{Time} = 2018, \text{Age})\\
\text{Population} &= h_C(\text{Time}, \text{Age})
\end{aligned}
$$

The subscript C indicates that each model is country-specific. We omit the subscript for notational convenience moving forward.

The mortality and population models provide nonlinear interpolations so that we can make predictions for any age-year in the data without the limitations imposed by the UN’s binning strategy.

Under Scenario A, we extrapolate model g beyond 2018 by assuming that no new users join Facebook and current users leave the network if and only if they die. This means we see zero 18-year-olds on the network in 2019, zero 18- or 19-year-olds in 2020, and so on. Attrition from current users can be calculated recursively. For each year t and age a:

Scenario A

$$
\begin{aligned}
\text{FB\_Users} = g(\text{Time} = t, \text{Age} = a) &= g(\text{Time} = t-1, \text{Age} = a-1)\\
&\quad \times \bigl(1 - f(\text{Time} = t-1, \text{Age} = a-1)\bigr)
\end{aligned}
$$
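The Scenario A recursion can be sketched on a year-by-age grid. The sketch assumes single-year age columns that start at some minimum age. It also assumes that survivors who age past the last column are simply dropped, which is a hypothetical simplification that is negligible if the grid ends near age 100. The toy inputs are illustrative only.

```python
import numpy as np


def scenario_a(users_2018, mortality):
    """Age a frozen 2018 user base forward, removing users only when they die.

    users_2018: 1-D array, users_2018[i] = g(2018, a_min + i).
    mortality:  2-D array, mortality[k, i] = f(2018 + k, a_min + i).
    Returns users with users[k, i] = g(2018 + k, a_min + i).
    """
    n_years = mortality.shape[0]
    users = np.zeros((n_years, users_2018.shape[0]))
    users[0] = users_2018
    for k in range(1, n_years):
        # g(t, a) = g(t-1, a-1) * (1 - f(t-1, a-1)). The youngest age stays at
        # zero because nobody new joins; survivors aging past the last column
        # are dropped (a simplifying assumption of this sketch).
        users[k, 1:] = users[k - 1, :-1] * (1.0 - mortality[k - 1, :-1])
    return users


toy_mortality = np.full((3, 3), 0.1)                 # hypothetical 10% rate
users = scenario_a(np.array([1000.0, 1000.0, 1000.0]), toy_mortality)
deaths = toy_mortality * users                       # f x g: dead profiles per cell
```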

In Scenario B, we extrapolate g beyond 2018 by assuming that Facebook will see constant growth of 13% per year in all markets until reaching a cap of 100% penetration. For each year t and age a:

Scenario B

$$
\begin{aligned}
\text{upper\_bound} &= h(\text{Time} = t, \text{Age} = a)\\
\text{FB\_proj} &= g(\text{Time} = t-1, \text{Age} = a-1) \times 1.13^{\,t-2018}\\
\text{FB\_Users} &= g(\text{Time} = t, \text{Age} = a) = \min(\text{upper\_bound}, \text{FB\_proj})
\end{aligned}
$$
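The Scenario B recursion can be sketched in the same year-by-age grid style. This sketch transcribes the printed formula literally: the factor 1.13 raised to t − 2018 multiplies the previous year's value, and the result is capped by the population model h. Neither choice is reinterpreted here. The formula contains no mortality term and does not specify how the youngest age is filled. Leaving that age at zero is a hypothetical boundary choice. The toy inputs are illustrative only.

```python
import numpy as np


def scenario_b(users_2018, population, growth=1.13):
    """Literal transcription of the printed Scenario B recursion.

    population[k, i] = h(2018 + k, a_min + i) caps the user count.
    For k >= 1 (t = 2018 + k):
        g(t, a) = min(h(t, a), g(t-1, a-1) * growth ** (t - 2018)).
    No mortality term is applied, matching the printed formula; the youngest
    age column is left at zero (a hypothetical boundary choice).
    """
    n_years, n_ages = population.shape
    users = np.zeros((n_years, n_ages))
    users[0] = users_2018
    for k in range(1, n_years):
        projected = users[k - 1, :-1] * growth ** k   # k equals t - 2018
        users[k, 1:] = np.minimum(population[k, 1:], projected)
    return users


# Hypothetical inputs: 100 users per age in 2018, population capped at 120.
users_b = scenario_b(np.array([100.0, 100.0, 100.0]), np.full((3, 3), 120.0))
```

In this toy run, the 2019 cohort grows to about 113 users. By 2020 the projection exceeds the population of 120, so the 100% penetration cap binds.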

In both cases, our true target is:

$$
y = \int_{2018}^{2100} \int_{13}^{100} f(\text{Time}, \text{Age})\, g(\text{Time}, \text{Age})\, d(\text{Age})\, d(\text{Time})
$$

For the mortality rate model f, we used beta regression with a logit link function, a common choice for rate data. For the Facebook model g, we used negative binomial regression with a log link function, which is well suited for over-dispersed counts such as those observed in this dataset. We experimented with several alternatives for the population model h, ultimately getting the best results using Gaussian regression with a log link function. Parametric specifications for each model were evaluated using the Akaike information criterion (Akaike, 1974), a penalized likelihood measure. Age and time were incorporated as both main effects and interacting variables in models f and h, which were fit with tensor product interactions in a functional ANOVA structure (Wood, 2006). We use cubic regression splines for all smooths, with a maximum basis dimension of 10. Parameters were estimated using generalized cross-validation.

While there remains no good way to evaluate the precision of the underlying data – as noted above, neither the UN nor Facebook provides confidence intervals – we may quantify the uncertainty of the model using nonparametric techniques. GAMs provide straightforward standard errors for their predictions, but under both scenarios our true target y is a double integral of a product of two vectors. Unfortunately, there is no analytic method for calculating y’s variance as a function of those variables without making strong assumptions that almost certainly fail in this case.

For that reason, we measure uncertainty using a Bayesian bootstrap (Rubin, 1981). To implement this algorithm, we sample n weights from a flat Dirichlet prior and fit the models using these random weights. We repeat this procedure 500 times for each country and scenario, providing an approximate posterior distribution for all predictions, from which we compute standard errors. These numbers are reported in parentheses next to point estimates in the text, and in their own column in all table summaries.
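The Bayesian bootstrap can be sketched generically. In the study, each draw refits the GAMs with random observation weights. In this minimal sketch a weighted mean stands in for the refit, purely for illustration. The `fit` callable and the toy data are hypothetical. Each draw takes weights from a flat Dirichlet distribution (`numpy.random.Generator.dirichlet` with all concentration parameters equal to 1), and the standard deviation of the 500 draws serves as the standard error.

```python
import numpy as np


def bayesian_bootstrap(fit, n, n_draws=500, seed=None):
    """Bayesian bootstrap (Rubin, 1981) for any weighted estimator.

    fit: callable that takes a length-n vector of observation weights and
         returns a prediction (scalar or array) from a model refit with them.
    Each draw uses weights from a flat Dirichlet(1, ..., 1) distribution.
    Returns the posterior draws and their standard deviation, which is
    used as the standard error.
    """
    rng = np.random.default_rng(seed)
    draws = np.array([fit(rng.dirichlet(np.ones(n))) for _ in range(n_draws)])
    return draws, draws.std(axis=0, ddof=1)


# Hypothetical example: a weighted mean stands in for refitting a GAM.
data = np.random.default_rng(1).normal(size=200)
draws, se = bayesian_bootstrap(lambda w: np.average(data, weights=w), n=data.size, seed=2)
```

For a weighted mean, the resulting standard error should be close to the classical value of the sample standard deviation divided by the square root of n. This makes it a convenient sanity check before plugging in a more expensive model refit.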

As previously noted, the findings we present in this paper concern only the future accumulation of dead profiles (i.e., those of users who will die between 2018 and 2100). Naturally, many users have already left profiles behind when they passed away. This number is unknown, but whatever it is, it should be added to the plots we present in Scenarios A and B below.

### Scenario A

Our first scenario assumes that users will cease joining the network as of 2018. While unlikely, this defines the minimum of the possible development, what we refer to as the floor (see Figure 2). Attached to the plot is a table with the exact numbers and share of each continent (Table 1).

 Table 1. Geographical distribution of dead profiles (in millions) under Scenario A.

Under the assumptions of Scenario A, we estimate that some 1.4 billion (±11.15 million) Facebook users will die between 2018 and 2100 – fully 98% of the 1.43 billion users in our dataset. Under this scenario, the number of deaths per year on Facebook grows steadily for the next five decades, peaking at over 29 million (±0.31 million) in 2077 before decelerating through the rest of the century. The global sum of dead profiles exceeds 500 million (±3.86 million) in 2060 and 1 billion (±8.67 million) in 2079. Note that under these conservative assumptions, the dead will in fact overtake the living on Facebook in about 50 years. This corroborates popular claims in media (Ambrosino, 2015; Brown, 2013) about living profiles becoming a minority on the network within the (relatively) near future.

The plot further shows that Asia contains a growing plurality of deceased users for every year in the dataset, culminating with nearly 44% of the total by the end of the century. Nearly half of those profiles come from just two countries, India and Indonesia, which account for a cumulative 278.8 million (±9.8 million) Facebook mortalities by 2100 (see Table 2).

 Table 2. Geographical distribution of dead profiles (in millions) by country under Scenario A. Results for top ten countries shown.

### Scenario B

Scenario A is highly unlikely. For Facebook to see zero global growth as of 2019 would require some cataclysmic event(s) far more ruinous than the Cambridge Analytica scandal (Cadwalladr and Graham-Harrison, 2018), which revealed serious issues regarding the security and privacy of Facebook user data. To estimate how much higher the growth can possibly be, the second scenario sets a ‘ceiling’ on the development. We presume that Facebook will continue to see global growth of 13% per year until it reaches 100% penetration in all markets. As illustrated by Figure 3, this assumption drastically changes the total number of dead users by the end of the century.

A continuous growth rate of 13% per year increases the expected number of dead profiles on Facebook by a factor of 3.5, for a total sum of 4.9 billion (±97.23 million). Unlike Scenario A, the dead profiles do not show any signs of exceeding the living within this century. However, the proportion is still substantial, and the dead are likely to reach parity with the living in the first decades of the 22nd century.

A continuous 13% growth rate would change not just the total number of dead users, but also their geographical distribution (see Tables 3 and 4). The most notable shift is the considerably increased share of global Facebook mortalities contributed by African nations. Nigeria in particular becomes a major hub of Facebook user deaths under Scenario B – in fact, the second largest in the world, accounting for over 6% of the global total. The shift is evident in Figures 3 and 4. Niger, Mali and Burkina Faso also appear in the top 10 countries by dead profile count, while the United States is the only Western nation to crack the list. In other words, only a minority of dead profiles will belong to Western users.

 Table 3. Geographical distribution of dead profiles (in millions) under Scenario B.

 Table 4. Geographical distribution of dead profiles (in millions) by country under Scenario B. Results for top ten countries shown.

To illustrate the geographical distribution more clearly, we have included a heatmap that visualizes deceased Facebook users per country (see Figure 4). Unsurprisingly, the map closely tracks the list of largest Facebook markets. However, it should be noted that only two Western countries (the US and the UK) make it to the top 10 list under either scenario. Thus, the maps clearly show that death online is a global phenomenon, reaching far wider than just Europe and America.

To summarize, both scenarios are implausible. The true number almost certainly falls somewhere between Scenarios A and B, but we can only speculate as to where. Assumptions regarding growth rates have a major impact on both absolute numbers and geographical distributions of dead profiles. While richer data sources may help produce more accurate projections, an exact estimate is almost beside the point. Even in the conservative Scenario A, numbers are large. Facebook will indubitably have hundreds of millions of dead users by 2060 if not sooner.

With regard to the geographical distribution, in both scenarios a handful of countries make up a large proportion of the total: mainly India (due to its large population) and the US (due to its high penetration rates), although other countries such as Nigeria and Brazil will also be important stakeholders in this development. Next, we turn to a discussion of the challenges posed by the growth of death online.

Our projection of growth in dead Facebook users’ accounts marks the first step toward empirically exploring the macroscopic and quantitative aspects of death on social media. The results should be interpreted not as a prediction of the future, but as a commentary on the present, and an opportunity to respond with thoughtful and effective policy interventions.

Undoubtedly, there is a great deal of uncertainty in projections of this kind. In addition to the predictive variance discussed above, there is also uncertainty regarding the data underlying the model. For instance, we do not know if there will be a significant cultural shift among users towards deleting profiles (either one’s own or deceased relatives’, a possibility given to the appointed legacy contact). It is also possible that Facebook will unexpectedly go bankrupt in the foreseeable future, thus invalidating the assumptions underlying our models. As stressed by boyd (2006) among others, the longevity of social media sites depends on their ability to evolve, and despite the success of the past decade, we do not yet know how or if Facebook will manage to do this in the future.

But this has no bearing on our larger point – namely, that critical discussion of online death and its macroscopic implications is urgently needed (not least in regard to its geographical spread). Facebook is merely an example of what awaits any platform with similar connectivity and global reach. Furthermore, the sudden dissolution of Facebook would arguably make the subject even more important, as the company may be forced to sell or delete its user data. A sufficiently severe blow to Facebook’s finances could force a redesign of the platform with major implications for those currently using it as a memorial site (see, for instance, Arnold et al., 2018: 202 on how the 2013 relaunch of MySpace dropped features used by mourners). In what follows, we tentatively presume that Facebook or something like it will continue to exist for the foreseeable future.

Each individual who leaves a profile behind represents a unique event in its own right, one that often raises difficult questions of inheritance of digital assets (Banta et al., 2015; Craig et al., 2013) and posthumous online privacy (Harbinja, 2014). But when aggregated, the totality of these cases amounts to something beyond the sum of its parts. The personal digital heritage left by the online dead is, or will at least become, part of our shared cultural digital heritage (Cameron and Kenderdine, 2007), which may prove invaluable not only to future historians (Brügger and Schroeder, 2017; Pitsillides et al., 2012; Roland and Bawden, 2012), but also to future generations as part of their record and self-understanding. As Matt Raymond, former director of communications at the US Library of Congress, stated when the Library received a large data donation from Twitter, 'Individually tweets might seem insignificant, but viewed in the aggregate, they can be a resource for future generations to understand life in the 21st century' (Raymond, 2010). Such records can thus be thought of as a form of future public good (Waters, 2002: 83), without which we risk falling into a 'digital dark age' (Kuny, 1998; Smit et al., 2011).

Despite its seeming immortality, digital information is more fragile than is sometimes assumed, and future access is far from guaranteed (Whitt, 2017) – even for Facebook itself. File formats change, hardware must be updated, and data need to be continuously stewarded and organized in order to remain useful. As Jeff Rothenberg (1995) says, ‘Digital information lasts forever – or five years, whichever comes first.’ This is not primarily due to storage costs. ‘The real cost of storage’, as Palm (2006: 5) puts it, ‘is management.’ To maintain data utility, firms must routinely upgrade systems and tend to their contents, a costly and tedious undertaking for which Facebook’s current curation model was not designed. Lavoie and Dempsey (2004: 229) put it well:

Preserving our digital heritage is more than just a technical process of perpetuating digital signals over long periods of time. It is also a social and cultural process, in the sense of selecting what materials should be preserved, and in what form; it is an economic process, in the sense of matching limited means with ambitious objectives; it is a legal process, in the sense of defining what rights and privileges are needed to support maintenance of a permanent scholarship and cultural record … And perhaps most importantly, it is an ongoing, long-term commitment, often shared, and cooperatively met, by many stakeholders.

While Lavoie and Dempsey write primarily for an audience of librarians and archivists, their argument applies equally to digital remains on Facebook: the cultural and ethical process of selecting whose data are worth preserving, and how to preserve them, is inseparable from the economic constraints that give rise to the question in the first place. But how is one to determine what is worth preserving? This requires a normative framework: one or several guiding principles that help us determine the value of data. There are many possible candidates for such a principle. An object can be appreciated for its sentimental, scientific, religious, or aesthetic value, to name just a few considerations. Furthermore, different regions, nations and other interest groups will plausibly value different aspects of Facebook's mounting historical record. Yet it is not users, their political representatives, or religious groups who determine how their data are collectively managed – it is Facebook's corporate interests.

For a firm, what makes data 'worth preserving' is ultimately their ability to contribute, directly or indirectly, to the company's profit. Data belonging to deceased users may prove valuable for such purposes. For example, memorialized profiles may still attract living users who visit them to mourn (Karppi, 2013). Indeed, time spent on Facebook can even be understood as a type of labour (Fuchs and Sevignani, 2013). While the traffic generated by mourning relatives may not by itself yield enough clicks and exposure to cover the costs of curating the dead, it could still serve the indirect function of appropriating central social functions such as mourning and love (Öhman and Floridi, 2017). What is more, datasets of digital remains may also be used for training new models (Leaver, 2013) and extracting historical insight, which may provide a valuable market advantage. Few legal obstacles stand in the way of such experimentation, as deceased users are not, at least under current legislation, protected the way living users are (see, for instance, the recently implemented GDPR, which lacks any clear guidelines for handling digital remains).

While both the traffic generated by the bereaved and the internal training of new models are possible uses of digital remains, neither guarantees long-term profitability. If the economic value of dead profiles were ever to become negative, market forces would compel a rationally self-interested firm to delete them. This seems to be the preferred option for many other social media platforms, including Twitter (Twitter.com, n.d.), and it is also advocated on normative grounds by some scholars, perhaps most notably Mayer-Schönberger (2009). Thus far, however, Facebook appears to have found the net value of dead profiles to be positive.

This is not to say that Facebook, or any other platform, values digital remains solely as a source of financial exploitation. In fact, Facebook has carefully considered the ethical implications of its policy (Brubaker and Callison-Burch, 2016), and has removed advertisements from deceased users' profiles, thus virtually de-commercializing the space. But the de-commercialization itself may be interpreted as a response to market incentives, insofar as it is rational for firms to maintain the goodwill of their customers. Curating a deceased relative's profile could keep some users on the platform, even if it is not the main source of the revenue they generate.

Market incentives may often overlap with the interests of researchers, consumers and future generations – but they are by no means identical. Markets have been discussed rather extensively in the digital preservation literature. For instance, Lavoie (2003: 15) identifies three ideal-type roles in the economics of digital preservation: Right holder, Archive and Beneficiary. Sometimes these roles are played by a single entity, sometimes by separate ones. Lavoie stresses that in so-called supply-side models (17), where the Right holder and the Archive are the same entity but the Beneficiary is external, there is a risk that the market does not create sufficient incentives for preservation. This is indeed a risk in the case of Facebook. The platform both holds the rights to the stored information and is the archiving entity. Moreover, it has little incentive to share (to say nothing of the complexity of posthumous privacy rights). The beneficiaries – in this case future generations and historians – can neither speak for themselves nor create any current incentives, which makes a purely free-market model inappropriate.

This situation requires what one may call a new macro-ethics of deletion (to borrow a term from Floridi, 2013): a curation model that encompasses and appreciates the various kinds of values involved. In line with Lavoie and Dempsey's argument, we therefore conclude that multiple stakeholders must be considered. These stakeholders may include states, NGOs, universities, libraries, museums, and any other kind of institution that provides unique perspectives on the value of our digital heritage. The multi-stakeholder approach is not in itself a novel proposal. Indeed, the pioneering Task Force on Archiving of Digital Information (1996) was composed of individuals representing industry, museums, archives and libraries, publishers, scholarly societies and government. And newer initiatives, such as UNESCO's strategic plan for software heritage (Di Cosmo and Zacchiroli, 2017: 4), have continued to stress the value of diversity in digital preservation:

We believe that, for Software Heritage, it is essential to build a not-for-profit foundation that has as its explicit objective the collection, preservation and sharing of our software commons. In order to minimize the risk of having a single point of failure at the institutional level, this foundation needs to be supported by various partners from civil society, academia, industry, and governments, and must provide value to all areas that may take advantage of the existence of the archive, ranging from the preservation of cultural heritage to research, from industry to education.

While the above quote deals mainly with software, the same can be said about the vast datasets accumulated by social media firms. It is important that historically significant data are preserved in a way that serves all of humanity, and this cannot be done by allocating the curation of historical social records to any one agent operating in its rational self-interest.

Finally, we wish to stress the importance of decentralizing control over aggregates of digital remains. Concentration of historical data in private hands may prove problematic for political reasons (Lor and Britz, 2012; Öhman, 2018). While it is true that one's digital remains are often distributed over multiple platforms and media (Cann, 2014; Pitsillides et al., 2012: 19), control of personal data (and hence of digital remains) appears to be increasingly concentrated among a small number of global actors (with several major services, e.g. WhatsApp, Messenger, and Instagram, owned by Facebook). And, as Orwell so adroitly observed in 1984, those who control our access to the past also control how we perceive the present. So, in order to prevent a possibly dystopian future of power asymmetries and distorted historical narratives, the task before us is to design a sustainable and dignified solution that takes into account multiple stakeholders and values. This inevitably requires decentralizing control and ownership of our collective digital heritage.

Academic knowledge will be key in this process. Researchers are charged not just with providing macro-level analyses like this one, but also with providing qualitative knowledge of how individuals in different cultures and social settings make sense of death and the digital. When it comes to qualitative research, there is already a rich literature upon which to draw (Bell et al., 2015; Brubaker et al., 2016; Kasket, 2012). However, researchers have hitherto mainly focused on North American and European settings, with some exceptions (e.g. Choudhary, 2018). If the goal is to contribute to a fair and flexible system for curating digital remains, researchers must increasingly turn to non-Western contexts, where the phenomenon is going to have the largest presence. While survey data from previous studies do not indicate any radical differences in attitudes toward online death across cultures (Grimm and Chiasson, 2014), a qualitative, nuanced understanding of this fast-evolving subject is still required. We therefore encourage scholars of online death to widen the geographical scope of their research, and to focus particularly on South Asia and Africa, where our models suggest the phenomenon will be most prevalent in the coming decades.

This study has provided the first rigorous projection of the accumulation of Facebook profiles belonging to the deceased. Will the dead, then, 'take over' Facebook? We have found that hundreds of millions of dead profiles will be added to the network in the next few decades alone, and that the dead may well outnumber the living before the end of the century, depending on how global user penetration rates evolve. Irrespective of how the network grows in the years to come, the vast majority of dead profiles will belong to users from non-Western countries.

Considering Facebook's global reach, we have argued that the totality of deceased user profiles amounts to something beyond the sum of its parts. These profiles are becoming part of our collective record as a species, and may prove invaluable to future generations. We believe that a multi-stakeholder approach is the best way to curate such a vast archive. We have also stressed that in crafting a future curation model, qualitative understanding of how different cultures make sense of death and the digital will be key. This development also poses difficult ethical problems that require careful consideration. The onus is now on policymakers and industry to rise to these challenges. We look forward to taking part in the debates to come.

We wish to express our sincere gratitude to the four referees who reviewed this study. Their insights and input have substantially improved the final result. We would also like to express our thanks to Patrick Gildersleve for helping us with the Python script that scraped data from the Facebook API.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.