This article, written by Joe Yogerst, originally appeared on the Los Angeles Times website.
One of the nation’s leading authorities on big data retrieval, Erik Linstead is an assistant professor of computer science in the School of Computational Sciences at Chapman University’s Schmid College of Science and Technology. We asked him why big data retrieval is so important right now and what to expect in the not-so-distant future.
Q: Is there a shortage of big data scientists in the U.S.?
A: I was recently reading an article stating that within the next five years, the U.S. could face a shortage of almost 200,000 qualified data scientists. This is partly due to the diversity of skills required to mine big data effectively.
Q: What skills are required for effective big data mining?
A: One needs to be mathematically savvy as well as able to design and apply new computational approaches, because the volume of data being generated makes standard computational tools alone ineffective. Because every data set is domain-specific, data scientists must also be able to talk to experts from a wide variety of areas and know what questions to ask so the data can be effectively leveraged.
Q: Why has analyzing large data sets become important for U.S. companies and industries?
A: The volume of data being generated … in every industry has grown drastically. This data can be used to build predictive models, understand consumers, improve marketing, evolve business models, develop products and more. There is tremendous value in the data, but you have to be able to deal with the volume, and you have to be able to deal with noise in the data. That’s where big data comes into play. It allows us to find the “signal” in the data, as well as build models that explain that signal.
Q: How does big data create value for a company or industry?
A: The better you are at understanding your customers, products, services, etc., the more effectively you will be able to compete.
Q: What components are essential to develop a game plan for assembling, integrating and optimizing data?
A: It’s challenging. You need to have the right infrastructure to collect and store data, as well as the computational infrastructure to analyze it in a reasonable amount of time. More importantly, you need to know what it is you’re looking for in the data — how you want to leverage it. This guides the process of identifying the big data techniques that will be effective in answering the questions you are interested in. Once you have the analysis — the answer to those questions — you have to decide what it means for your business. It’s an iterative, collaborative process.
The good news is that the computational infrastructure problem is reasonably straightforward. There are several options, and many of them are affordable. Having the right people with the right mix of skills to integrate, model and interpret the data is a much harder problem. At Chapman University, our graduate programs in computational and data science are trying to attack this problem by producing students with the mathematical and computational backgrounds to effectively solve big data problems.