Out from Under Too Much Data

Shelley Podolny’s March 12 New York Times column, “The Digital Pileup,” started me thinking. “Because electronic information seems invisible, we underestimate the resources it takes to keep it all alive.” Podolny reports global data usage of 1.2 zettabytes (roughly 1.2 trillion gigabytes). In the US alone, 3% of the national power supply supports “server farms,” the giant data centers with aisle after aisle of servers.
 
Three per cent may not seem like a lot. As the world reels from one natural disaster to another, at least some of which have to be connected to rising energy consumption, it adds up. On other occasions, I have commented on how the move to cloud computing—Salesforce, Google, and so on—reduces energy usage. It concentrates resources. Hmm. Maybe it only makes power demands more manageable against the rate of growth of data storage. 
 
Podolny suggests that by 2020, “the [information] volume  will be 44 times greater than it was in 2009. There finally may be, in fact, T.M.I.” It’s that last comment that prompted this reflection. My job often involves helping organizations collect more data—facilitating better contact management and data tracking, enabling and encouraging richer web content, and so on. For sure, planning involves balancing need against capacity. Yet the main trend is more. 
 
In this context, Podolny says, “Despite the conveniences our online lives provide, we end up being buried by data at home and at work. An overabundance of data makes important things harder to find and impedes good decision-making. Efficiency withers as we struggle to find and manage the information we need to do our jobs. Estimates abound on how much productivity is lost because of information overload, but all of them are in the hundreds of millions of dollars yearly.” 
 
Even as organizations trend toward collecting more, better data, we need to regularly ask when we have too much.  Some thoughts.

Taking the Time to Focus on What We Need

First, data needs regular cleansing. And no one is immune. Just this week, we purged 30% of the contacts in a key internal system. They had been mingled in when we upgraded from an older system, marked as inactive, and ignored. Yet however dormant, these records had gaps relative to our current data requirements, and those gaps hindered synchronization with our time tracking system. For a year, that added extra monthly steps.
 
When you bulk load old data into new systems, rules of data integrity and validation sometimes get bypassed to shoehorn the old data in. You may use Demand Tools, the Data Loader or other powerful data manipulation tools to get started with Salesforce. Yet you (or more likely we) may be tempted to leave some new data rules turned off until mounds of historical data find their new home.
 
Lesson one: Data storage may cost less than it used to, but it still bears an organizational price. Better to invest the time up front, and then in periodic reviews, to clear out old data clutter.
 
Second, while some still wrestle with getting more news up on their website, too much content holds others back. It’s really the same story for content as for data. Has your migration to Drupal or another modern system gotten pushed back and back because of the weight of old site content? Even less frequently used pages may need tons of attention to realign them with new site navigation and search tools.
 
Some possible lessons here about large web projects: Focus first on the 30, 40, or 50 pages, or whatever the number, that get used the most. Those probably need to be rewritten anyhow. Leave the 80-90% of pages that get used only 10 or 20% of the time in their old format until they can be pared down, reindexed if still needed, or archived into some other format.
 
Another lesson: If you see a new website coming, make sure now that you have good, usable analytics on the current site. When it comes time to begin active planning, you want to know which pages and resources get used the most. You also want to know which things visitors search for the most, and which of those may exist but don’t get accessed, suffering under the dead weight of the total information past.
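For those who like to see the idea worked through, here is a minimal sketch in Python of picking the "used the most" pages from an analytics export. The function name, the 80% threshold, and the page counts are all made up for illustration; your analytics tool will have its own export format, and the right cutoff is a judgment call.

```python
def pages_to_rewrite_first(page_views, share=0.8):
    """Return the most-viewed pages that together reach `share` of all views.

    `page_views` maps a page path to its view count (hypothetical export data).
    """
    total = sum(page_views.values())
    # Rank pages from most viewed to least viewed.
    ranked = sorted(page_views.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for page, views in ranked:
        # Stop once the pages chosen so far cover the target share of traffic.
        if total == 0 or running >= share * total:
            break
        selected.append(page)
        running += views
    return selected


# Made-up numbers, for illustration only.
example_views = {
    "/home": 500,
    "/donate": 300,
    "/about": 100,
    "/news/2009": 60,
    "/old-report": 40,
}
priority = pages_to_rewrite_first(example_views)
```

Everything not on the resulting list is a candidate to leave in its old format, pare down, or archive.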

Even Good Back-ups Proliferate Data Overload

Third, at a time when we need to pay more attention to back-up policies, we can’t neglect duplicative and redundant data storage. We understand this better about social media. Once a photo posts, copies may proliferate very quickly, made both by users and by the systems they sit on. Podolny estimates that 70% of information storage today is generated by individual use, and that 75% of all data storage is duplicative.
 
Lessons here: We need to make it easier for staff and constituents to forward links to content and use it online, and not just attach full document copies to group emails. As for formal back-up, the storage for a few hundred web pages, documents and multimedia needs to be multiplied by how many back-up copies exist. A “good” back-up policy may mean multiple copies in many places. Organizational email may also be multiply backed up, consuming gigabytes of storage. The lesson here is, make sure you do have redundant back-ups, yet also make sure that they haven’t grown so large and unwieldy as to be impractical when you need them.
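The multiplication is simple, but it is easy to forget to do it. A rough sketch in Python, assuming every back-up is a full copy (incremental or compressed back-ups would come out smaller); the function name and the figures are hypothetical:

```python
def total_storage_gb(live_gb, backup_copies):
    """Live data plus every full back-up copy of it, in gigabytes."""
    if backup_copies < 0:
        raise ValueError("backup_copies cannot be negative")
    # One live copy, plus one more copy for each full back-up.
    return live_gb * (1 + backup_copies)


# Hypothetical figures: 20 GB of site content and email, kept as
# nightly, weekly, and off-site copies (3 full back-ups).
footprint = total_storage_gb(20, 3)
```

With those assumed numbers, 20 GB of live data becomes an 80 GB footprint, and that is the amount you would have to sift through or restore from in a pinch.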
 
Fourth, with data back-up especially, we need to be sure of our confidential information. A client finally gave up on using Social Security numbers as a convenient contact search index. Removing the data point from active use is one thing. Now comes the hard part: finding all those back-ups, including ones in inconvenient archival formats, that still pose a data security risk.
 
As Podolny points out, “In the corporate realm, companies stockpile data because keeping it seems easier than figuring out what they can delete. This behavior has hidden costs and creates risks of security and privacy breaches as data goes rogue.” Data security laws, such as the new ones in Massachusetts, apply to nonprofit organizations just as they do to private businesses.
 
Podolny’s call to action makes sense: “We can live a productive digital life without hoarding information. As stockholders and consumers, we can demand that our companies and service providers aggressively engage in data-reduction strategies. We can clean up the stockpiles of dead data that live around us, be wiser data consumers, text less and talk more. We can try hitting delete more often.”
 
The overall message: whether for the environment or our sanity, even though technology is giving us “more” digital data, less may be more.
 
 

 

Comments

Been thinking about this myself

Like Heather commented, I'm usually the 'go to' person when someone has lost some file, years old maybe, because I usually have it and can find it. But I've been wondering too, about how useful some of this data is and what/when to get rid of it. Also thinking about cleaning old data out of databases, so that it just gets easier to think about and work with. Appreciate any lessons/rules that others have set up to help decide what data to get rid of.

Guilty as charged

I have been a data hoarder and actually took pride in the fact that I am providing in essence a living archive of all my past companies and clients' work. It does come in handy when a client from 5 years ago pops up to ask for a hi res version of their logo that I have resent several times over the years already, but that they lost again.

However, this year's computer crashes and restorations from back-ups have led me to rethink the amount of data I keep (emails, files, even program settings and bookmarks). One way I am attempting to create some order on a personal level is to create rules around what to keep and for how long. Sort of like our house tax documents rule - after the audit period expires, out they go.

The rules you suggest for migrating websites are spot on target with my thinking and should create new sites of greater value as well as reduce the load.

Thanks for the post Steve, it has motivated me to keep rethinking my own data bloat.