In many remote-sensing projects, the analyst is interested in only a small
number of the land-cover classes present in a study area, not in all the
land-cover classes that make up the landscape. Previous studies in
supervised classification of satellite images have tackled the specific class
mapping problem by isolating the classes of interest, merging all other
classes into one large class (usually called "others"), and training
a binary classifier to discriminate the class of interest from the
others. Here, this is called the focused approach. The strength of
the focused approach lies in decomposing the original multi-class supervised
classification problem into a binary classification problem, focusing
the process on the discrimination of the class of interest. Previous
studies have shown that this method discriminates the classes of
interest more accurately than the standard multi-class supervised
approach. However, it may be susceptible to data imbalance in the
training data set, since the classes of interest are often a small part
of the training set. As a result, the classification may be biased towards
the largest classes and, thus, be sub-optimal for the discrimination of
the classes of interest. This study
presents a way to minimize the effects of data imbalance in
specific class mapping using cost-sensitive learning. In this approach,
errors committed in the minority class are treated as costlier than
errors committed in the majority class. Cost-sensitive approaches are
typically implemented by weighting training data points according
to their importance to the analysis. By changing the weight of individual
data points, it is possible to shift weight from the larger classes
to the smaller ones, balancing the data set. To illustrate the use of the
cost-sensitive approach to map specific classes of interest, a series of
experiments using a weighted support vector machine classifier and
Landsat Thematic Mapper data was conducted to discriminate two
types of mangrove forest (high-mangrove and low-mangrove) in the
Saloum estuary, Senegal, a United Nations Educational, Scientific and
Cultural Organisation World Heritage site. Results suggest an increase
in overall classification accuracy with the cost-sensitive method
(97.3%) over the standard multi-class approach (94.3%) and the focused
approach (91.0%). In particular, the cost-sensitive method yielded higher
sensitivity and specificity values for the classes of interest than the
standard multi-class and focused approaches.
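The relabelling and weighting steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions and is not the authors' implementation. The abstract does not say how the weights were chosen, so the sketch assumes the common heuristic of weights inversely proportional to class frequency, which gives each class the same total weight. It also uses a linear weighted hinge-loss objective as a stand-in for the kernel weighted SVM. The function names `focused_labels`, `balanced_weights` and `weighted_svm_objective` and the toy pixels are hypothetical.

```python
from collections import Counter


def focused_labels(labels, class_of_interest):
    """Focused approach: class of interest -> +1, every other class -> -1 ("others")."""
    return [1 if c == class_of_interest else -1 for c in labels]


def balanced_weights(labels):
    """ASSUMED weighting scheme: w_i = n / (k * n_c), inversely proportional to class size.

    Every class ends up with the same total weight (n / k), so the minority
    class is no longer swamped by the majority class.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[c]) for c in labels]


def weighted_svm_objective(w, b, xs, ys, sample_weights, C=1.0):
    """Primal soft-margin linear SVM objective with per-sample costs C * s_i:

    0.5 * ||w||^2 + C * sum_i s_i * max(0, 1 - y_i * (w . x_i + b))
    """
    regulariser = 0.5 * sum(wj * wj for wj in w)
    hinge = 0.0
    for x, y, s in zip(xs, ys, sample_weights):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        hinge += s * max(0.0, 1.0 - margin)
    return regulariser + C * hinge


# Hypothetical toy data: 2 mangrove pixels and 8 pixels from other classes.
classes = ["mangrove"] * 2 + ["water"] * 4 + ["soil"] * 4
xs = [[0.9], [0.8], [0.1], [0.2], [0.1], [0.3], [0.4], [0.2], [0.3], [0.1]]
ys = focused_labels(classes, "mangrove")
unit = [1.0] * len(ys)
balanced = balanced_weights(ys)

# A degenerate classifier that labels every pixel "others" (w = 0, b = -1).
print(weighted_svm_objective([0.0], -1.0, xs, ys, unit))      # 4.0
print(weighted_svm_objective([0.0], -1.0, xs, ys, balanced))  # 10.0
```

With balanced weights, the two mangrove pixels carry the same total weight (5.0) as the eight "others" pixels. The cost of the degenerate all-"others" classifier therefore rises from 4.0 to 10.0, so predicting the majority class everywhere is no longer an attractive solution. In practice, per-class costs of this kind are exposed by LIBSVM's `-wi` option, which sets C of class i to weight × C. They are also available through scikit-learn's `SVC(class_weight="balanced")`, which applies the same n / (k · n_c) rule.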
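The comparison of methods rests on overall accuracy plus per-class sensitivity and specificity. The sketch below shows one way these measures can be computed from reference and predicted labels. It treats the class of interest as positive and every other class as negative. The labels are hypothetical, and the function names are illustrative rather than taken from the study.

```python
def overall_accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest rates for a single class of interest.

    sensitivity = TP / (TP + FN): share of the class that is correctly mapped
    specificity = TN / (TN + FP): share of the other classes correctly excluded
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference and predicted labels for six pixels.
reference = ["high", "high", "low", "water", "water", "soil"]
predicted = ["high", "low", "low", "water", "high", "soil"]
print(overall_accuracy(reference, predicted))                 # 4 of 6 correct
print(sensitivity_specificity(reference, predicted, "high"))  # (0.5, 0.75)
```

Because the rates are computed one class at a time, the same function applies to the output of a multi-class, focused, or cost-sensitive classifier.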
Disciplines:
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author:
Silva, Joel; NOVA Information Management School, Universidade Nova de Lisboa, Lisboa, Portugal
Bacao, Fernando; NOVA Information Management School, Universidade Nova de Lisboa, Lisboa, Portugal