DataStreams’ Lee Young-sang: Making Sense of Big Data

Lee Young-sang, president and CEO of DataStreams

The explosion of data on the internet in recent years is mind-boggling, so much so that the actual number of bytes is meaningless to most people. It’s safe to say that many companies are at risk of drowning in their own data. Yet the power contained in this data is extremely meaningful to those who know what to do with it.

In fact, big data has been at the heart of almost every area of public and private enterprise, dramatically boosting productivity and growth and, ultimately, the bottom line. The retail, information, finance, and insurance sectors have all increased productivity not only by selling more but also by reducing mistakes and detecting fraud, thereby cutting costs. Big data, then, is a powerful builder of businesses and organizations. And now, with burgeoning IoT and AI technologies, the promise of big data is even more dazzling.

The problem, of course, is knowing how to make sense of it all.

Lee Young-sang is the President and CEO of DataStreams, a data management company headquartered in Seoul, Korea. His breakthrough product, TeraStream, was introduced to the Korean market in 2001 and quickly became the only viable competitor to imported products. According to DataStreams’ website, its client list includes 3 of the top 10 global manufacturing companies, 5 of the top 40 global insurance companies, and 90% of Korea’s banking industry.

Now, after over a decade of success in Korea and Asia, Mr. Lee has taken his high-speed and real-time technology into the U.S. market. Below is an in-depth conversation with Lee Young-sang.

Q: DataStreams began with TeraStream, a data integration program. What makes it such a successful product?

A: We started with large-scale data processing technology, and the speed and volume of data were the cornerstones of our business. At the time, relational databases like Oracle were too slow for large volumes of data, not to mention complicated to operate. Plus, their speed depended to a large extent on the user. We have a very high-speed engine that shortens data processing time, which is ideal for very large volumes of data, and we use this technology for data integration. When we first introduced it, it was quite an innovation. Our competitor, DataStage, was very slow: we were about four or five times faster, and in the first version about 20 times faster. This is why we were so dominant in that particular market, especially in finance. So it was basically all about speed.

Q: With your expertise in data integration, you are now looking to branch out into data analysis. You refer to this intermediary stage as data quality. What specifically does data quality entail? How difficult is it to achieve? And what sort of technical skills are required?

A: In industry terms, data quality means standardization. Because we use computers, and they aren’t as intelligent as AlphaGo, the data must be standardized so that the computer can distinguish between pieces of information that seem similar but are different. Humans do this from context, but computers lack this ability to contextualize. We take the data, integrate it, and perform a series of management tasks, for which “keys” are needed. The keys must be unique. A person’s name could serve as a key, for example, but when two people share the same name, the key becomes meaningless. To guarantee data quality, a data quality solution such as MetaStream is needed. So everything starts with metadata, the definitions of the data, and with data standardization. Then, to keep the data sound, it needs to be compared against the data in its standard form.
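To make the point about keys concrete, here is a minimal Python sketch, assuming hypothetical customer records and a simple normalization rule (collapse whitespace, then title-case). It is an illustration only, not how MetaStream works: once the names are standardized, two records that looked different turn out to share a name, so the name fails as a unique key while a dedicated customer ID does not.

```python
from collections import Counter

def standardize_name(raw: str) -> str:
    """Collapse repeated whitespace and normalize case, so 'kim  min-jun' and 'Kim Min-Jun' compare equal."""
    return " ".join(raw.split()).title()

def duplicate_keys(records, key_fields):
    """Return the key values that occur more than once, i.e. keys that fail to be unique."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [k for k, n in counts.items() if n > 1]

# Hypothetical records: two different customers whose names only match after standardization.
records = [
    {"customer_id": "C001", "name": standardize_name("kim  min-jun")},
    {"customer_id": "C002", "name": standardize_name("Kim Min-Jun")},
]

print(duplicate_keys(records, ["name"]))         # the name collides, so it cannot be the key
print(duplicate_keys(records, ["customer_id"]))  # the ID stays unique
```

Standardizing first matters: without it, the two spellings would look like distinct keys and the collision would go unnoticed.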

The metadata management repository doesn’t store the content itself, just what we refer to as the metadata of the content. So rather than a specific name, it would store “family name” and “given name.” Our metadata management solution has a data glossary with terms, context, and meanings, and every definition is stored in its repository. Another example is the address. When you move, the data collected about you becomes outdated, and unless it is updated, it becomes unusable. Our MetaStream software can standardize this data. It’s basically a process of comparisons, many, many comparisons. Because there are so many comparisons, it takes a lot of computing power, which limits the results: the data can deteriorate during the relatively long intervals between comparisons. So keeping the data in standardized form is very important, and MetaStream can do this.
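The idea that the repository holds definitions rather than content can be sketched in Python as follows. The glossary terms, their meanings, and the regular expressions are illustrative assumptions, not MetaStream’s actual schema; the function simply compares each field of a record against the standard form its definition describes and reports the fields that do not conform.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class GlossaryTerm:
    """One metadata entry: it describes a field and never stores actual customer values."""
    name: str
    meaning: str
    pattern: str  # regular expression for the field's standard form (assumed)

# Hypothetical glossary; terms, meanings, and patterns are illustrative only.
GLOSSARY = {
    "family_name": GlossaryTerm("family_name", "Surname of the customer", r"[A-Z][a-z]+"),
    "given_name": GlossaryTerm("given_name", "First name(s) of the customer", r"[A-Z][A-Za-z-]+"),
    "postal_code": GlossaryTerm("postal_code", "Five-digit Korean postal code", r"\d{5}"),
}

def nonconforming_fields(record: dict) -> list:
    """Compare each field with its glossary definition; list fields that are undefined or not in standard form."""
    bad = []
    for field, value in record.items():
        term = GLOSSARY.get(field)
        if term is None or not re.fullmatch(term.pattern, value):
            bad.append(field)
    return bad

print(nonconforming_fields({"family_name": "Lee", "given_name": "Young-sang", "postal_code": "04524"}))
print(nonconforming_fields({"postal_code": "135-080"}))  # an outdated six-digit format is flagged
```

Each check is cheap, but it has to be repeated for every field of every record whenever data changes, which is why the volume of comparisons grows so quickly.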

(From left) TeraStream, TeraStream Bass, Q-Track

Q: TeraStream has been the most in demand in Korea, but what about elsewhere? Does the popularity of your products vary from country to country, especially now that you’ve entered the US market?

A: We started with TeraStream, and we now have 11 or 12 product lines that all work together to achieve data governance and a complete data foundation. TeraStream has been our most proven solution for global customers, but we recently concluded quite a large contract in Japan, which means that we are expanding our territory beyond TeraStream to include DataStream.

Q: What is the development status of your big data platform?

A: It was natural and easy for us to expand into big data because we already had a data integration platform. So we added big data components like Hadoop to that same platform, and people don’t need to know how to operate Hadoop themselves. Hadoop works as part of TeraStream, so it is very user-friendly.

Q: How difficult was it for DataStreams to make that integration?

A: It wasn’t difficult. Hadoop is essentially file processing, just distributed file processing, so we added that technology as part of ours. Hadoop is a distributed file system, which means you have to organize the information. Think of it like this: if there are 10 nodes, then there are in essence 10 workers, and it is difficult to get those 10 workers to work harmoniously together. This requires a strategy, and this is where algorithms come in. Once you have set up the algorithm, you need to write a Java program to implement it. So it’s a very difficult process, and you need highly skilled engineers to develop a Hadoop program. Some companies find it difficult, but it was relatively easy for us to integrate it with our technology.
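The “10 workers” analogy can be sketched as a small, single-process simulation of the map, partition, and reduce steps that Hadoop distributes across nodes. This is plain Python standing in for Hadoop’s Java API, and the word-count task and hash-based partitioning are assumptions chosen for illustration. The partition step is the “strategy”: it decides which worker owns which key, so the workers’ results never overlap and can simply be merged.

```python
from collections import Counter
from zlib import crc32

NUM_NODES = 10  # the "10 workers" in the analogy

def map_phase(lines):
    """Turn each input line into (word, 1) pairs, as a mapper would emit."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def partition(pairs, num_nodes=NUM_NODES):
    """Route every key to exactly one node, so all counts for a word meet on the same worker.

    crc32 is used instead of hash() because Python's str hash is randomized per process.
    """
    buckets = [[] for _ in range(num_nodes)]
    for key, value in pairs:
        buckets[crc32(key.encode()) % num_nodes].append((key, value))
    return buckets

def reduce_phase(bucket):
    """Each node sums the values for the keys it owns."""
    totals = Counter()
    for key, value in bucket:
        totals[key] += value
    return totals

def word_count(lines):
    result = Counter()
    for bucket in partition(map_phase(lines)):
        result.update(reduce_phase(bucket))  # keys are disjoint across nodes, so this is a plain merge
    return dict(result)

print(word_count(["big data", "Big speed"]))
```

On a real cluster each bucket would be processed on a different machine; the hard engineering lies in moving the data and recovering from failed workers, which this sketch leaves out.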

Q: Data procurement isn’t just for businesses. What other applications are there?

A: Big data, and our technology in particular, can be used for many purposes, including national security. In the conventional approach, when attackers compromise software on users’ computers or at important institutions, the data is collected from system logs. But if you store that log or traffic data to analyze later, it is too late. We can pick up the data directly from the traffic and compare it in real time, so you can respond to the threat right away. This can be used to safeguard national security, but it is also highly applicable to maintaining infrastructure.
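Comparing traffic as it arrives, rather than after logs are stored, can be sketched as a streaming check. In this minimal Python example, the event format (timestamp, source address, suspicious flag), the threshold, and the window length are all hypothetical. It keeps only a short sliding window per source and raises an alert at the moment the threshold is crossed, without writing anything to storage first.

```python
from collections import defaultdict, deque

def detect_bursts(events, threshold=5, window_seconds=10.0):
    """Yield (timestamp, source) as soon as a source exceeds `threshold` suspicious events in the window.

    `events` is any time-ordered iterable of (timestamp, source_ip, is_suspicious) tuples,
    for example a live feed; each event is checked the moment it arrives.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of its recent suspicious events
    for ts, source, suspicious in events:
        if not suspicious:
            continue
        window = recent[source]
        window.append(ts)
        # Drop timestamps that have slid out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > threshold:
            yield ts, source

# Hypothetical feed: seven suspicious events from one address within seven seconds.
feed = [(i, "10.0.0.1", True) for i in range(7)]
for alert in detect_bursts(feed):
    print("alert:", alert)
```

Because `detect_bursts` is a generator, it can consume an unbounded stream and emit alerts while the stream is still flowing, which is the property the answer above emphasizes.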

Public facilities such as railroads, bridges, and construction sites would all benefit from this technology. You could, for example, use IoT sensors to collect data in real time, and the appropriate authorities could monitor that data as it arrives to assess any developing situation and respond. In health care, wearable devices could allow health professionals to monitor, say, elderly patients, so they can immediately recognize a dangerous situation and respond.

Q: DataStreams has done very well in the Korean market. You are now poised to take on the US market. How steep has the learning curve been?

A: In many ways we are a new startup in the American market, so we’ve experienced frustration, then hope, and the cycle repeats itself. But the important thing is that American customers understand and appreciate our technology. They recognize that it is very advanced compared with similar domestic products.

Q: So would you say that your edge really comes down to speed and real-time processing?

A: Speed and performance were our strong points when we were a startup in Korea, but these days it’s not only speed but also the variety and “concepts” of our products. We are competing with many other companies, and we have many types of technology because they are all needed for data governance: building a good data infrastructure requires all of them. So I’m very hopeful that this powerful technology will be recognized for what it is in the US market. We are also able to achieve better data governance. Customers need data governance for better analysis because it guarantees the integrity and credibility of the data, and we do this better than most other products.

Q: AI will be the basis for virtually every industry and will determine the success or failure of the security industry. What do you see as DataStreams’ role in all of this?

A: A terrorist could attack through the network, and the attack pattern could vary, making it difficult to catch. So you need an intelligent system to recognize it as an attack. But if you only insert rules, an attacker can evade them, because each rule is just a fixed pattern. We need a type of machine learning so that once we learn that something is an attack, it can be added to the pattern list as a new rule. The machine learning needs to recognize the type of attack even when people don’t provide that particular information.
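The idea of a rule list that grows as new attack variants are recognized can be illustrated with a deliberately simple instance-based learner: nearest-neighbor matching with Jaccard similarity over word tokens. This is a swapped-in technique for illustration, not DataStreams’ method, and the payload strings and the 0.5 threshold are assumptions. A payload close enough to a known rule is flagged and stored as a new rule, so later variants can match the learned rule even if they no longer resemble the original one.

```python
def tokens(payload: str) -> frozenset:
    """Represent a payload as the set of its lowercase words."""
    return frozenset(payload.lower().split())

def jaccard(a: frozenset, b: frozenset) -> float:
    """Similarity between two token sets: shared tokens divided by all tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

class AdaptiveRuleSet:
    def __init__(self, seed_rules, threshold=0.5):
        self.rules = [tokens(r) for r in seed_rules]  # fixed rules supplied by people
        self.threshold = threshold                    # assumed similarity cut-off

    def check(self, payload: str) -> bool:
        """Flag the payload if it matches or resembles a known rule; learn it as a new rule if new."""
        t = tokens(payload)
        if t in self.rules:  # exact match with an existing rule
            return True
        if any(jaccard(t, rule) >= self.threshold for rule in self.rules):
            self.rules.append(t)  # the recognized variant becomes an additional rule
            return True
        return False

rules = AdaptiveRuleSet(["drop table users"])
print(rules.check("drop table accounts"))      # True: similar to the seed rule, now learned
print(rules.check("truncate table accounts"))  # True: matches only the learned rule
print(len(rules.rules))                        # 3
```

Against the seed rule alone, “truncate table accounts” scores only 0.2 and would slip through; it is caught because the earlier variant was learned. Real systems would use far richer features and safeguards against learning false positives.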

Q: So you’ve cornered the market on data integration and data quality, and now you want to get into data analysis.

A: Yes, this is a very natural course for us, because after data integration and data quality, you have very high-quality data on your platform. Then you can put that data to use, and we have considerable experience with data warehouses. A data warehouse uses data for analysis, so we can expand that area into big data. We anticipate a lot of business in data analysis.

Q: AI has the ability to predict. Is that what you’re looking forward to next?

A: Not only prediction. Prediction is already possible with data warehouse techniques: for prediction you just need past data, historical data, to anticipate what happens next. But AI is a little different. It involves not only prediction but also understanding situations from data, which is very important in dynamic data analysis.

Q: Let’s move on now to the fintech [finance + technology] industry. Fintech is supporting the globalization of Korea’s financial industry, and the success of its drive into overseas markets again depends on big data. What do you see as DataStreams’ strengths in this move?

A: In the first stage of the fintech business, companies don’t really care about big data. They care more about linking businesses, like banking with IT services or home shopping with banking. But the most important thing is trust in e-commerce, so you need a system to calculate the trustworthiness, or credit, of the customer. Credit information is very important here, and it is produced with big data. That information is very valuable to a business, so you need a large-scale data analysis system to excel at fintech. To use big data, you need data governance and the skills to handle the data and infrastructure. That is what we are targeting.

Q: You are very active in the PMO and are currently its chair. Although software provides the basis for all industries, there’s barely any software coming out of Korea. What is your view on this, and how do you plan to use your position as chair of the PMO to resolve this problem?

A: For the Korean software industry to become more knowledgeable and expert at this business, you need PMO. For example, requests for proposal [RFP] should be very professional, but these days RFPs are ambiguous and vague, which creates some unfairness in the market. As a result, 16 companies banded together to fund the PMO to solve this problem. But it wasn’t easy, because there were no laws or regulations to protect the PMO business. It still isn’t easy, but we believe this is urgent for the Korean software industry.

Q: Do you think there should be more involvement from the government, such as more supportive policies?

A: Yes. Certainly.

Q: Something like this would boost the national economy so it seems like it would be in the interest of the government to do this.

A: They say the software industry could boost the Korean economy and create thousands of jobs for young people, but there are no practical efforts to promote the software business, and it’s difficult to find anyone in the government who understands the software business.

Q: Why do you think that is, if this would be good for the national economy?

A: Because this is a totally different type of industry. The industry they’re familiar with is the manufacturing industry and the business of conglomerates. But there’s not a lot of familiarity with knowledge-based businesses.

Q: I’ve found that one of the biggest problems is finding talent, because working with data requires advanced training in a number of areas. How difficult has it been for you to find talented workers in Korea or the US to improve your products?

A: In general, it is easier to find creative and logical employees in the U.S., although they cost a bit more. I’ve found it’s a bit easier for Americans to build up a theory or methodology. However, once the conditions have been set, developing the products is easier for Koreans. They are more efficient at this.

By Julia Yoo (julia@koreaittimes.com)