Statistical Analysis of the Soil Chemical Survey Data
Sauptik Dhar, Vladimir Cherkassky
Report no. Mn/DOT 2010-22
This report describes data-analytic modeling of the Minnesota soil chemical data produced by the 2001 metro soil survey and the 2003 statewide survey. The chemical composition of the soil is characterized by the concentrations of many metal and non-metal constituents, which yields high-dimensional data. This high dimensionality, together with possible unknown (nonlinear) correlations among the constituents, makes the data difficult to analyze and interpret using standard statistical techniques. This project applies a machine learning technique, the Self-Organizing Map (SOM), to present the high-dimensional soil data in a 2D format suitable for human understanding and interpretation. The SOM representation enables analysis of soil chemical concentration trends within the metro area and across the state of Minnesota. These trends are important to the Minnesota regulatory agencies concerned with the concentration of polluting chemical elements due to both (a) human activities, i.e., different industrial land usage, and (b) natural geological factors, such as the geomorphic codes and provenance of glacial sediments.
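To make the idea of projecting high-dimensional samples onto a 2D grid concrete, the following is a minimal sketch of a generic SOM training loop. It is not the implementation used in this project. The function names (train_som, best_matching_unit), the 3x3 grid, the linear decay schedules, and the four-constituent synthetic samples are all assumptions chosen for illustration, and no survey data are used.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the grid node whose weight vector is closest to x."""
    best_pos, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best_pos, best_dist = (r, c), dist
    return best_pos


def train_som(samples, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM and return weights[row][col] -> vector."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Random initial weight vectors (assumes inputs are scaled to about [0, 1]).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood width shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood measured on the 2D grid, not in data space.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Synthetic stand-in data: two groups of 4-constituent samples, NOT survey values.
data_rng = random.Random(1)
low = [[0.1 + data_rng.uniform(-0.05, 0.05) for _ in range(4)] for _ in range(10)]
high = [[0.9 + data_rng.uniform(-0.05, 0.05) for _ in range(4)] for _ in range(10)]

som = train_som(low + high)
low_cells = {best_matching_unit(som, x) for x in low}
high_cells = {best_matching_unit(som, x) for x in high}
print("low-concentration samples map to cells:", sorted(low_cells))
print("high-concentration samples map to cells:", sorted(high_cells))
```

Each sample is assigned to the grid cell of its best matching unit, so samples with similar chemical profiles land in the same or neighboring cells of the 2D map. Real concentration data would likely need per-constituent scaling before training, since the sketch assumes inputs in roughly the same range.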