BIG DATA FOR ALL: Resources, references, links, and more.

Photo Credit: DARPA


I wrote this post before 
In a recent article in Forbes, "Big Data In the Enterprise: A Lesson or Two from Big Brother", Franz Arman, of SGI, points out that in March, the U.S. government spearheaded a Big Data initiative.

Why is this important?  

With advances in technology, a vast amount of data has been generated and stored, and much of this information has not yet been properly organized or analysed.   Decisions are made, across sectors, every day, without the data required to support them.   In some instances, such as health care or national defense, this can be a life or death matter.  

A main goal of the US government's Big Data initiative is to harness and utilize the enormous amount of information that is gathered across a number of agencies and departments, including archival data. Through this process, researchers and technologists from a variety of disciplines will be forging new territory while gaining related skills and expertise.  

Arman goes on to point out that there is a growing need for "big data specialists" within both the private and public sectors, and also a need for people to develop more "big data" skills. In his view, the private sector might be a bit behind the curve when it comes to Big Data decision-making and analytics, and he cites a 2012 global study of over 600 business leaders. According to the study, 85% of the respondents said that the main challenge they face with Big Data is the "ability to analyze and act on the data in real time." Quite a challenge!
(For more information, the slides from the study can be found near the end of this post.)

Big Data, Everywhere, for Everyone

Big Data plays a role in our lives in one way or another. For most people, Big Data is used by others to figure us out, not the other way around. More people than ever before freely share information that would rarely have been shared online 10 years ago: pictures, video, blog posts, tweets, interests, "likes". The information we share is analysed in order to generate the ads we see when we use Gmail or Facebook, and the product or service recommendations we come across when we are shopping online.

Big Data isn't just for retailers, corporations, government organizations, researchers, scientists, techies, financial analysts, or policy wonks. Big Data can be put to work in a number of ways.   It is for everyone, young and old.  The rest of us. My mom. Your grandpa.  My grandchild.  

Once the kinks are worked out, it is likely that we'll have Big Data in our pockets. We'll interact with (and share) "big data" across a number of devices, screens, and surfaces, all connected via the Internet of Everything - computers, tablets, smartphones, sensors, buildings, appliances, vehicles, roads, wildlife sanctuaries, and more.

Moving Towards Big Data Literacy
Large corporations such as Cisco, IBM, Microsoft, and others are preparing for what the future holds, and are spreading the word through television ads, corporate websites, videos, blog posts, presentations, and publications. One example that has had a lot of airplay is Cisco's "Tomorrow Starts Here" commercial. Instead of explaining the details of the company's goods and services, it focuses on informing us of the road ahead.

It takes some time to understand the concept of Big Data. Many people need training wheels. Revisiting your old statistics textbook is a step, but getting hands-on experience might be more exciting. One good place to start is IBM's Many Eyes website. You won't find streaming data on the site, but you can learn quite a bit about working with data sets and visualizations. Users can upload data sets to create a number of data visualizations. Data visualizations can be created from data-sets uploaded by other users as well. The website provides easy-to-understand FAQs, "how-to" information, and message boards.

Another good place to learn more about data is the U.S. website. It has a vast store of data-sets and apps, including mobile data apps. There is support for developers as well. The website includes a section designed specifically for the education community. There are links to a number of open data sites, including data from 36 states, 20 US cities and counties, 181 agencies and sub-agencies, and 41 international sites. The Next Generation provides government agencies with a way to share their public data in one place, using a cloud platform.

If you are tech-savvy, Google's BigQuery provides free access to existing sample data-sets, up to 100 GB a month, and it works fast. There is a fee for users who want to analyze more data and create their own tables. Amazon Web Services provides public data sets that are available to subscribers.

Below is a video from Microsoft that provides an overview of Big Data and the potential for the use of machine learning to support knowledge discovery. Machine learning algorithms allow the computer to efficiently and effectively determine patterns and correlations that would not be detected through traditional means.
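For readers who want a hands-on feel for what "finding correlations" means, here is a minimal sketch in Python. It is not taken from the Microsoft video, and it is far simpler than machine learning: it only computes the Pearson correlation coefficient between two short lists. The function name `pearson` and the numbers in `hours_online` and `ads_clicked` are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up numbers: one value per person.
hours_online = [1, 2, 3, 4, 5]
ads_clicked = [2, 4, 6, 8, 10]

r = pearson(hours_online, ads_clicked)
print(round(r, 3))
```

A value near 1 means the two lists rise together, a value near -1 means one falls as the other rises, and a value near 0 means no straight-line relationship. Machine learning tools look for much subtler patterns across far more data, but the basic idea of measuring how things move together is the same.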

It's about the data, but don't forget the humans!

An interesting attempt to encourage people to think about Big Data is the Human Face of Big Data project, which is discussed in detail in a recently published book of the same name.

According to information posted on the project's YouTube channel, "The Human Face of Big Data is a globally crowd sourced media project focusing on humanity's new ability to collect, analyze, triangulate and visualize vast amounts of data in real time."  


The Human Face of Big Data project's primary sponsor is EMC Corporation. Additional sponsors include Cisco, FedEx, VMware, Tableau, and Originate. The project is discussed in more detail in the video below:

A Human-Centered Approach
In my experience, most of the productivity applications I've used over the course of my career have been clunky and somewhat user-unfriendly, including data applications. I find that there are too many steps to follow. Too many clicks are required to get from point A to point B. Many applications still don't play well together or allow for easy collaboration with a colleague. This is true across many fields, and is a problem that must not be ignored.

Developers, designers, and engineers will need to think seriously about how "Big Data" applications, technologies, and systems will be used in a variety of contexts and scenarios. What sort of information, and in what format, will be needed for data exploration? Preparing presentations? Collaborating on a team project? Making mission-critical or life-and-death decisions? Building a shared knowledge base? Calling others to action?

Big Data Literacy is likely to become the next push in the workplace. More people will be required to be knowledgeable about data than ever before in order to carry out their jobs effectively. For this reason, Big Data R&D teams will need to include people with backgrounds in fields such as human-computer interaction, cognition, and the social sciences. What sort of interactions and interfaces will be required? Immersive 3D? Multi-touch and gestures? Audio displays? Tactile graphics?

Government and Big Data
Big Data is a complex subject. As mentioned previously, the US government, by providing the website and funding Big Data initiatives across government agencies, is moving in the right direction. But I do have concerns.

All of the data in the world is meaningless if it can't be understood or easily accessed, or if it can't provide the information needed to support effective and efficient decision-making processes. Even with machine learning algorithms, the data is meaningless if humans make incorrect assumptions when developing or choosing these algorithms. Even the most brilliant quants can stumble - think about the events surrounding the economic crisis. The old saying, "Garbage in, garbage out" still applies.

In my opinion, there are many questions that remain unanswered.   Who will lead the leaders towards Big Data Literacy? Can we trust that those in leadership positions in the private and public sector have a good grasp of the issues related to Big Data? 

Who can we entrust to make appropriate decisions about the way Big Data is gathered, filtered, archived, curated, and accessed?  Who is responsible for ensuring data quality and integrity?  What about data security and privacy concerns? 

It will be interesting to see how the Big Data story will unfold.

Examples of the U.S. Government's Big Data Initiative:

National Science Foundation and the National Institutes of Health - Core Techniques and Technologies for Advancing Big Data Science & Engineering
"'Big Data' is a new joint solicitation supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) that will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse data sets. This will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible. NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to health and disease." (White House Press Release, 3/29/12)

Department of Defense (DoD) Data-to-Decisions S&T Priority Initiative (pdf) Dr. Carey Schwartz, PSC Lead, Office of Naval Research

DoD/DARPA Mind's Eye Program
"The Mind's Eye program seeks to develop a capability for “visual intelligence” in machines. Whereas traditional study of machine vision has made progress in recognizing a wide range of objects and their properties—what might be thought of as the nouns in the description of a scene—Mind's Eye seeks to add the perceptual and cognitive underpinnings needed for recognizing and reasoning about the verbs in those scenes. Together, these technologies could enable a more complete visual narrative."

DARPA Calls for Advances in "Big Data" to Help the Warfighter
DARPA to engage Applied Mathematics, Computer Science and Data Visualization communities to develop "big data" analytics and usability solutions for warfighters. (DARPA Press Release, 3/29/12)

"The XDATA program aims to meet challenges presented by this volume of data by developing computational techniques and software tools for processing and analyzing the vast amount of mission-oriented information for Defense activities. As part of this exploration, XDATA aims to address the need for scalable algorithms for processing and visualization of imperfect and incomplete data. And because of the variety of DoD users, XDATA intends to create human-computer interaction tools that could be easily customized for different missions. Finally, to enable large scale data processing in a wide range of potential settings, XDATA plans to release open-source software toolkits to enable collaboration among the applied mathematics, computer science and data visualization communities."

National Institutes of Health: 1000 Genomes Project Data Available on Cloud
"The National Institutes of Health is announcing that the world's largest set of data on human genetic variation - produced by the international 1000 Genomes Project - is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes - the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs - the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publicly available data set for free and researchers only will pay for the computing services that they use."

Department of Energy: Scientific Discovery Through Advanced Computing
"...The SDAV Institute will bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the Department's supercomputers, which will further streamline the processes that lead to discoveries made by scientists using the Department's research facilities.  The need for these new tools has grown as the simulations running on the Department's supercomputers have increased in size and complexity."


DARPA Investigates Big Data's 'National Security Threat'
Mike Wheatly, Silicon Angle, 8/14/13
Obama Administration Unveils "Big Data" Initiative: Announces $200 Million in New R&D Investments (pdf)
Fact Sheet: Big Data Across the Federal Government (pdf)
NSF: Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA)
NSF-NIH Interagency Initiative
USGS Press Release: Big Data a Big Deal
Department of Defense: Mind's Eye Press Release 
Denny Lee, Microsoft: The New World of Data: IOT + Big Data + Cloud (pdf) (good overview of topic, from Microsoft's pov)
Big Data Experience in the Federal Government: 160+ Efforts Identified  Alex Rossino, GovWin Network Blog, 11/13/12
Datanami is a website dedicated to news and information about Big Data and related topics.
Open Data Foundation is a non-profit organization that focuses on the improvement of data and metadata accessibility and quality.
Data Quality Campaign (Educational Data Website)
The Convergence of mobile and Big Data, Will Kelly, TechRepublic, 12/17/12
Big Data Initiative Or Big Government Boondoggle?, Doug Henschen, InformationWeek, 4/2/12
Overview of Information Visualization Concepts
Dianna Xu & Deepak Kumar, Bryn Mawr:
CS380 Slides, Part 1 (pdf)
CS380 Slides, Part 2 (pdf)
Note: the above presentation references Robert Kosara, my Information Visualization professor from a graduate course I took a few years ago.  He now works as a Visual Analysis Researcher at Tableau Software.
