Title

Discretization of Continuous Valued Dimensions in OLAP Data Cubes

Author

Sellappan Palaniappan, Tan Kim Hong

Citation

Vol. 8, No. 11, pp. 116-126

Abstract

Continuous valued dimensions in OLAP data cubes are usually grouped into countable disjoint intervals using naïve methods such as equal width binning, histogram analysis, or splitting into intervals defined by domain experts according to their understanding of the data. This paper explores the integration of ‘intelligent’ discretization techniques currently available in data mining research into the construction of a SEER breast cancer survivability data cube with a continuous dimension. Observational and empirical evaluations of the resulting cube with discretized intervals show that ‘intelligent’ discretization methods provide the same benefits to OLAP data cubes as they do to data mining algorithms, that is, they are able to simplify the data representation with minimal or no loss of information. Additionally, it was found that an unsupervised discretization method using the k-means algorithm exhibited performance equivalent to that of its supervised counterparts, namely, the entropy-based (ID3) and χ²-based (CHAID) methods.
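To make the contrast in the abstract concrete, the following is a minimal sketch of equal width binning next to a one-dimensional k-means discretization. It is not the authors' implementation: the function names (`equal_width_edges`, `kmeans_1d_edges`, `assign_bins`), the quantile-based centroid initialization, and the sample values are all assumptions made for illustration, and the data is invented rather than taken from SEER.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], k: int) -> List[float]:
    """Return the k-1 interior cut points that split [min, max] into k equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def kmeans_1d_edges(values: Sequence[float], k: int, iterations: int = 100) -> List[float]:
    """Run Lloyd's k-means in one dimension and return the k-1 cut points
    that lie halfway between adjacent centroids."""
    data = sorted(values)
    n = len(data)
    # Assumption: deterministic start at evenly spaced positions in the sorted data.
    centroids = [float(data[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    centroids.sort()
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def assign_bins(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical continuous measurements with three natural groups.
values = [2, 3, 4, 5, 20, 22, 24, 60, 65, 70]

ew_edges = equal_width_edges(values, 3)
km_edges = kmeans_1d_edges(values, 3)

ew_bins = assign_bins(values, ew_edges)
km_bins = assign_bins(values, km_edges)

print("equal width:", ew_edges, ew_bins)
print("k-means:    ", km_edges, km_bins)
```

On this invented sample, equal width binning merges the two lower groups and leaves its middle interval empty, while the k-means cut points fall in the gaps between groups. The supervised methods named in the abstract (entropy-based and χ²-based) additionally use a class label to choose cut points and are not shown here.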

Keywords

OLAP, data mining, discretization, entropy, ID3, CHAID, k-means

URL

http://paper.ijcsns.org/07_book/200811/20081117.pdf