Robert T. Olszewski

Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data Degree Type: Ph.D. in Computer Science
Advisor(s): Roy Maxion, Daniel Siewiorek
Graduated: May 2001

Abstract:

Pattern recognition encompasses two fundamental tasks: description and classification. Given an object to analyze, a pattern recognition system first generates a description of it (i.e., the pattern) and then classifies the object based on that description (i.e., the recognition). Two general approaches for implementing pattern recognition systems, statistical and structural, employ different techniques for description and classification. Statistical approaches to pattern recognition use decision-theoretic concepts to discriminate among objects belonging to different groups based upon their quantitative features. Structural approaches to pattern recognition use syntactic grammars to discriminate among objects belonging to different groups based upon the arrangement of their morphological (i.e., shape-based or structural) features. Hybrid approaches to pattern recognition combine aspects of both statistical and structural pattern recognition.

Structural pattern recognition systems are difficult to apply to new domains because implementation of both the description and classification tasks requires domain knowledge. Knowledge acquisition techniques necessary to obtain domain knowledge from experts are tedious and often fail to produce a complete and accurate knowledge base. Consequently, applications of structural pattern recognition have been primarily restricted to domains in which the set of useful morphological features has been established in the literature (e.g., speech recognition and character recognition) and the syntactic grammars can be composed by hand (e.g., electrocardiogram diagnosis). To overcome this limitation, a domain-independent approach to structural pattern recognition is needed that is capable of extracting morphological features and performing classification without relying on domain knowledge. A hybrid system that employs a statistical classification technique to perform discrimination based on structural features is a natural solution. While a statistical classifier is inherently domain independent, the domain knowledge necessary to support the description task can be eliminated with a set of generally-useful morphological features. Such a set of morphological features is suggested as the foundation for the development of a suite of structure detectors to perform generalized feature extraction for structural pattern recognition in time-series data.

The ability of the suite of structure detectors to generate features useful for structural pattern recognition is evaluated by comparing the classification accuracies achieved when using the structure detectors versus commonly-used statistical feature extractors. Two real-world databases with markedly different characteristics and established ground truth serve as sources of data for the evaluation. The classification accuracies achieved using the features extracted by the structure detectors were consistently as good as or better than the classification accuracies achieved when using the features generated by the statistical feature extractors, thus demonstrating that the suite of structure detectors effectively performs generalized feature extraction for structural pattern recognition in time-series data.

Thesis Committee:
Roy Maxion (Co-chair)
Dan Siewiorek (Co-chair)
Christos Faloutsos
David Banks, DOT

Randy Bryant, Head, Computer Science Department
James Morris, Dean, School of Computer Science

Keywords:
Structural pattern recognition, classification, feature extraction, time series, Fourier transformation, wavelets, seminconductor fabrication, electrocardiography

CMU-CS-01-108.pdf (769.2 KB) ( 111 pages)
Copyright Notice