What’s on URA activities

To Search for Valuable Information in Big Data


Unit Name:
Data Engineering and Knowledge Discovery
Unit representative:
Professor Hiroyuki Kitagawa, Faculty of Engineering, Information, and Systems

Unit members:
10 (7 faculty members, no postdoctoral fellows, 3 from other organizations)

Key words:
data engineering, knowledge discovery, database, data mining, big data



With the growing sophistication of social information technologies, such as mobile and ubiquitous computing, and the diversification of information-intensive businesses and applications, the era of big data has arrived, in which people are required to handle unprecedented amounts of data. As large amounts of real-time data are handled at a global level, acquiring the necessary data and utilizing information effectively have become increasingly important. The research unit “Data Engineering and Knowledge Discovery”, led by Professor Hiroyuki Kitagawa, is devoted to research on technologies for the advanced management of big data and the utilization of the valuable information it contains (Figure 1).

Figure 1: “Data Engineering and Knowledge Discovery” research unit


Modern technology facilitating the utilization of big data

One of the technologies used to manage large amounts of data is the database. In the early days of databases, it was necessary to select, organize, and process only the data required for business operations, because computers had little storage capacity and storage was very expensive. Today, storage capacity is virtually unlimited and storage costs are low. Furthermore, anyone can publish information on the Internet. As a result, large amounts of fluctuating, diverse, and noisy data are present on the Internet as big data, and it is very difficult to identify the information that is actually useful.

To utilize big data, a wide range of technologies covering the entire life cycle of data, including its creation, application, and use, is required. Based on data engineering approaches, the research unit conducts research on a variety of technologies related to the management and utilization of big data. Our research subjects include infrastructure for utilizing stream data from sensors, which has become increasingly common in recent years; identification of outliers in noisy data; algorithms that use GPUs to speed up data mining by a factor of several dozen; and mining of useful information from Twitter and other social media. More specifically, to extract information on real-life events from Twitter, we are developing new methods to estimate the home locations of Twitter users. According to the experimental results, the accuracy of our method is much higher than that of previous methods (Figure 2).

Figure 2: Estimating home locations of Twitter users based on landmarks


Collaboration with other fields and external research institutions to apply technologies to practical settings

Core members of the research unit also belong to the Center for Computational Sciences. They conduct research in collaboration with other researchers specializing in particle physics, meteorology, astronomy and other fields. The results of the research and development include a particle simulation data search system using XML technology, databases for meteorological data (Figure 3), and technologies for similarity searches of X-ray astronomical data. The research unit also actively collaborates with external research institutions and private companies to conduct joint and commissioned research.

Figure 3: Research and development of GPV/JMA meteorological databases


Social contributions and achievements
● Joint research with a company to develop technologies for the integration and analysis of information with a multilayered structure essential for the development of smart communities
● Joint research with a company to develop basic technologies for the analysis of time-series data
● Joint research with a company for the identification of similar or abnormal data from stream data
● Technical advice on technologies for data privacy protection
● Industry-university cooperation to promote practical ICT human resource development
● Industry-university cooperation to promote human resource development in the field of large-scale data analysis
● Collaboration with a company to develop supporting technologies for software maintenance
● Promotion of academic activities in the fields of data engineering and machine learning in Japan and other countries

(Interviewed on September 5, 2013)

Research Administration/Management Office at U Tsukuba TEL 029-853-4434