Optimizing land use classification using decision tree approaches
Vaibhav Walia+, under the guidance of Dr. Sameer Saran
Indian Institute of Remote Sensing (NRSA), Dehradun (UA)-248 001
+ Manipal Institute of Technology, Manipal (KA)
Abstract: Supervised classification is one of the important tasks in remote sensing image interpretation: image pixels are assigned to predefined land use/land cover classes based on their spectral reflectance values in different bands. In practice, some classes have very similar spectral reflectance values that overlap in feature space. This produces spectral confusion among the classes and results in inaccurately classified images. Removing such confusion requires additional spectral and spatial knowledge. This report presents a decision tree classifier approach that extracts knowledge from spatial data in the form of classification rules, using the Gini Index and Shannon Entropy (Shannon and Weaver, 1949) to evaluate splits. The report also covers the calculation of the optimal dataset size required for rule generation, in order to avoid redundant input/output and processing.
· Improving land use classification methods to achieve better classification
· Optimizing the size of the training dataset needed to generate classification rules
· Developing an application to generate classification rules, given a particular dataset and information about the attributes
- Better classification was achieved by:
I. using a decision tree algorithm instead of classical approaches such as maximum likelihood classification (MLC)
II. using the “Gini Index” as the attribute selection criterion when “Information gain” fails
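The two split measures named above can be sketched in C++ as follows. This is a minimal illustration rather than the report's actual code: the function names (giniIndex, shannonEntropy, weightedImpurity, informationGain) and the representation of a node as a vector of per-class tuple counts are assumptions made for the example. The report does not say what counts as information gain "failing", so the sketch only computes both measures and leaves that decision to the caller.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-class tuple counts at a tree node (hypothetical representation).
typedef std::vector<std::size_t> ClassCounts;

// Gini index: 1 - sum(p_i^2), where p_i is the fraction of tuples in class i.
double giniIndex(const ClassCounts& counts) {
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    if (total == 0) return 0.0;
    double sumSquares = 0.0;
    for (std::size_t c : counts) {
        const double p = static_cast<double>(c) / total;
        sumSquares += p * p;
    }
    return 1.0 - sumSquares;
}

// Shannon entropy in bits: -sum(p_i * log2(p_i)); empty classes contribute 0.
double shannonEntropy(const ClassCounts& counts) {
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    if (total == 0) return 0.0;
    double entropy = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / total;
        entropy -= p * std::log2(p);
    }
    return entropy;
}

// Impurity of a split: each branch's impurity weighted by its share of tuples.
double weightedImpurity(const std::vector<ClassCounts>& branches,
                        double (*impurity)(const ClassCounts&)) {
    std::size_t total = 0;
    for (const ClassCounts& b : branches)
        for (std::size_t c : b) total += c;
    if (total == 0) return 0.0;
    double result = 0.0;
    for (const ClassCounts& b : branches) {
        std::size_t n = 0;
        for (std::size_t c : b) n += c;
        result += static_cast<double>(n) / total * impurity(b);
    }
    return result;
}

// Information gain: parent entropy minus the weighted entropy of the branches.
double informationGain(const ClassCounts& parent,
                       const std::vector<ClassCounts>& branches) {
    return shannonEntropy(parent) - weightedImpurity(branches, shannonEntropy);
}
```

Under both measures a lower weighted impurity indicates a better split, so a candidate attribute can be scored by calling weightedImpurity with giniIndex or by calling informationGain on the same branch counts.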
- The optimum dataset size was found by extracting and comparing decision rules for increasingly large subsets of the same dataset.
Example: ‘X’ tuples are read and converted to rules in the first pass, and ‘X + jump’ tuples are read and converted to rules in the second pass. The resulting rules are compared, and the procedure is repeated for at least another ‘width’ tuples, where ‘jump’ and ‘width’ are user-defined variables. If the rules stay the same throughout the ‘width’, then ‘number of tuples read minus width’ is the optimum dataset size required for rule generation.
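The stopping rule in the example above can be sketched in C++ roughly as follows. The names RuleSet, RuleGenerator and optimumDatasetSize are hypothetical, and the rule generator is passed in as a callback that stands in for the decision tree step. Representing rules as a set of strings and returning 0 when no stable size is found are assumptions made for this sketch, not details taken from the report.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>

// A rule set is modelled as a set of rule strings (assumption for this sketch).
typedef std::set<std::string> RuleSet;

// Callback that builds rules from the first n tuples of the dataset.
typedef std::function<RuleSet(std::size_t)> RuleGenerator;

// Returns the optimum dataset size, or 0 if the rules never stay unchanged
// for 'width' consecutive tuples before the dataset is exhausted.
std::size_t optimumDatasetSize(std::size_t totalTuples,
                               std::size_t start,   // 'X' in the text
                               std::size_t jump,
                               std::size_t width,
                               const RuleGenerator& generateRules) {
    if (start == 0 || jump == 0 || start > totalTuples) return 0;

    RuleSet previous = generateRules(start);
    std::size_t stableFrom = start;  // size at which the current rules first appeared

    for (std::size_t n = start + jump; n <= totalTuples; n += jump) {
        RuleSet current = generateRules(n);
        if (current != previous) {
            // Rules changed: restart the stability window at this size.
            previous = std::move(current);
            stableFrom = n;
        } else if (n - stableFrom >= width) {
            // Rules were identical throughout 'width' tuples:
            // "number of tuples read minus width".
            return n - width;
        }
    }
    return 0;
}
```

For instance, with start = 100, jump = 50 and width = 200, a generator whose rules stop changing once 300 tuples have been read gives a result of 300: the rules are first seen unchanged at 350 tuples, the window is satisfied at 500, and 500 minus 200 is 300.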
- The decision tree algorithm was implemented in C++, using Nokia/Trolltech’s Qt framework for the GUI and “QCustomPlot”, an open-source library, for plotting graphs.