Chapter One
With the rapid development of computer, network, and sensor technology, data acquisition and transmission in astronomy, the military, biology, medicine, management, and other fields are becoming much easier and faster. As data types grow more complex and data volumes keep increasing, many large-scale, high-dimensional data sets have been produced, in which data types are varied and data forms are heterogeneous. Complex data includes numerical data, nominal data, interval data, missing data, and set-valued data, as well as their combinations. Modeling, analyzing, and applying complex data have become main tasks of knowledge discovery in many practical fields, and the complexity of data is one of its key challenges. In short, complex data has become the main source of data for knowledge discovery in modern society.
Data modeling is the foundation of the analysis and application of complex data. In recent years, increasing attention has been paid to theories and methods of data modeling that draw on results from cognitive science. These studies are often classified into two groups: one aims to understand and simulate perception mechanisms, and the other aims to understand and simulate cognitive mechanisms. As one of the important characteristics of human cognition, granulation cognition plays a key role in complex data modeling. By introducing the human granulation cognition mechanism, we expect to develop novel theories and methods of data modeling. To study complex data modeling based on the granulation mechanism, the following three key problems must be solved.
1 How to effectively granulate complex data?
2 How to analyze granulation uncertainty?
3 How to perform data modeling via granulation mechanism?
Based on these considerations, and targeting complex data that includes numerical data, nominal data, interval data, missing data, and set-valued data, this paper investigates these three key problems from four viewpoints, namely information granulation, granulation uncertainty, modeling strategy, and model selection, using the human granulation cognition mechanism. The main results are as follows.
1. We have further established methods and algorithms of information granulation for complex data, and have revealed the granulation mechanism of complex data in depth. These results provide the foundation for modeling complex data via the granulation mechanism.
We have presented a new clustering issue, namely how to effectively organize data with measurement errors, and have proposed a corresponding strategy for it. Experimental results show that:
(a) clustering algorithms that take measurement errors into account may produce results much closer to the true classes of a data set than those that consider only the measured values;
(b) the error-number distance provides an effective method for measuring the difference between two objects with measurement errors.
We have developed a family of clustering algorithms based on selecting cluster representatives, called the k-representative algorithm. In the semi-supervised clustering setting, the k-representative algorithm shows advantages in accuracy, precision, recall, and number of iterations when clustering nominal data, set-valued data, and missing data. In particular, because the k-representative algorithm does not analyze the spatial structure of a data set, it can effectively organize both single-type data and mixed-type data, including numerical, nominal, set-valued, missing, and other types.
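The idea of clustering by selecting representatives, without relying on the spatial structure of the data, can be illustrated with a minimal sketch. This is not the paper's k-representative algorithm, whose details are not given here; it is a plain medoid-style procedure in which each cluster is represented by one of its own objects and only pairwise dissimilarities are used. The names `attr_dissim`, `dissim`, and `k_representative_sketch` are hypothetical, and the per-attribute dissimilarities (including the value 0.5 for a missing entry and the assumption that numerical attributes are pre-scaled to [0, 1]) are illustrative choices only.

```python
def attr_dissim(a, b):
    """Dissimilarity in [0, 1] between two attribute values (None = missing)."""
    if a is None or b is None:
        return 0.5  # assumption: a missing value counts as "half different"
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        union = a | b  # set-valued data: Jaccard distance
        return 1.0 - len(a & b) / len(union) if union else 0.0
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return min(abs(a - b), 1.0)  # assumption: numerical values scaled to [0, 1]
    return 0.0 if a == b else 1.0  # nominal data: simple matching


def dissim(x, y):
    """Average attribute-wise dissimilarity between two objects."""
    return sum(attr_dissim(a, b) for a, b in zip(x, y)) / len(x)


def assign(data, reps):
    """Label each object with the index of its nearest representative."""
    return [min(range(len(reps)), key=lambda c: dissim(x, data[reps[c]]))
            for x in data]


def k_representative_sketch(data, init_reps, max_iter=100):
    """Medoid-style clustering; init_reps are indices of the starting representatives."""
    reps = list(init_reps)
    for _ in range(max_iter):
        labels = assign(data, reps)
        new_reps = []
        for c in range(len(reps)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old representative of an empty cluster
                new_reps.append(reps[c])
                continue
            # New representative: the member closest, in total, to all other members.
            new_reps.append(min(
                members,
                key=lambda i: sum(dissim(data[i], data[j]) for j in members)))
        if new_reps == reps:
            break
        reps = new_reps
    return assign(data, reps), reps


# Mixed-type toy data: (numerical, nominal, set-valued), with missing entries.
data = [
    (0.10, "red", frozenset({"a", "b"})),
    (0.20, "red", frozenset({"a"})),
    (0.15, None, frozenset({"a", "b"})),
    (0.90, "blue", frozenset({"c"})),
    (0.80, "blue", frozenset({"c", "d"})),
    (0.85, "blue", None),
]
labels, reps = k_representative_sketch(data, init_reps=[0, 3])
```

Because the procedure only ever calls `dissim`, supporting another data type amounts to adding one branch to `attr_dissim`; no centroid or coordinate space is needed.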
2. We have established operations on granular spaces and characterized the structural properties of granular spaces from the algebraic and geometric viewpoints, respectively; and we have revealed the essence of information granularity, which provides theoretical constraints and methodological guidance for studying granulation uncertainty.