Training Set Selection to Improve Crop Classification
Abstract
In some classification problems, acquiring class labels is much more expensive than collecting attribute data. Crop classification from satellite imagery is one such problem. Random sampling is one way to choose the data to be labeled, but we demonstrate that targeted training set selection can be beneficial when class labels are acquired in sets. In crop classification, the number of data points is given by the number of pixels in the imagery, and all data points within one field can be assumed to share the same class label. Each data point is constructed from multiple images taken throughout the growing season, so each field corresponds to a multi-dimensional distribution of data points. We show that using clustering to select the fields for class label collection can substantially improve crop classification for partially labeled data.
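The idea of clustering fields to decide where to collect labels can be illustrated with a minimal sketch. This is not the procedure used in the paper, whose details are not given in the abstract. It assumes that each field is summarized by the mean of its pixel feature vectors, that the fields are grouped with plain k-means, and that the field closest to each cluster centroid is the one sent for labeling. The function names `field_summaries`, `kmeans`, and `select_fields_to_label` are hypothetical.

```python
import numpy as np

def field_summaries(pixels, field_ids):
    """Summarize each field by the mean of its pixel feature vectors.

    Assumption: the mean is an adequate stand-in for the field's
    multi-dimensional distribution of data points.
    """
    fields = np.unique(field_ids)
    means = np.stack([pixels[field_ids == f].mean(axis=0) for f in fields])
    return fields, means

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns centroids and cluster assignments."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Recompute assignments so they match the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def select_fields_to_label(pixels, field_ids, n_select, seed=0):
    """Pick at most n_select fields: the one nearest each cluster centroid."""
    fields, means = field_summaries(pixels, field_ids)
    centroids, labels = kmeans(means, n_select, seed=seed)
    chosen = []
    for j in range(n_select):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue  # an empty cluster contributes no field
        d = np.linalg.norm(means[members] - centroids[j], axis=1)
        chosen.append(fields[members[d.argmin()]])
    return chosen

# Synthetic example: 6 fields, 20 pixels each, 4 features per pixel
# (standing in for measurements from 4 images over the growing season).
rng = np.random.default_rng(1)
field_ids = np.repeat(np.arange(6), 20)
pixels = rng.normal(size=(120, 4)) + field_ids[:, None]
print(select_fields_to_label(pixels, field_ids, n_select=3))
```

The selected fields would then be visited to obtain their crop labels, and every pixel in a labeled field inherits that label for training. Other field summaries or clustering methods could be substituted without changing the overall scheme.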