The large number of GIS models available for estimating the
probability of landslide occurrence makes it difficult to select
a specific one. The present study focuses on the application of two
quantitative models: the logistic and the BSA models. The comparative
analysis of the results aims at identifying the most suitable
model. The territory corresponding to the Niraj Mic Basin
(87 km
Landslides are one of the main natural hazards affecting the territory of Romania: they have a high spatial and temporal frequency, damage transport infrastructure and buildings, and cause environmental changes (Bălteanu and Micu, 2009; Bilaşo et al., 2011; Năsui and Petreuş, 2014).
The EEA European Directive from 2004 underlines the need to map and identify areas vulnerable to landslides using indirect techniques, in both the European and the national context (Guzzetti, 2006; Van Westen et al., 2006; Magliulio et al., 2008; Polemio and Petruci, 2010).
Thus, studies that determine the probability of landslide occurrence are highly valuable for reducing the potential negative effects of landslides. Among the methods used for determining the spatial probability of landslides, statistical methods are recommended because of their very good results and high validation rates (Zezere et al., 2004; Petrea et al., 2014; Roşca et al., 2015a, b).
Given the growing number of data processing options and the evolution of methods in the GIS environment, various methods of landslide susceptibility assessment have been developed, of which logistic regression and bivariate statistical analysis are among the most frequently used (Harrell, 2001; Kleinbaum and Klein, 2002; Ayalew and Yamagishi, 2004, 2005; Dai and Lee, 2002; Lee, 2010; Cuesta et al., 2010; Chiţu, 2010; Mancini et al., 2010; Wang et al., 2011; Guns and Vanacker, 2012; Jurchescu, 2013; Măguţ et al., 2013; Akbari et al., 2014; Van den Eeckhaut et al., 2010). This analysis starts from the hypothesis that the combination of factors which led to the occurrence of landslides in the past will have the same effect in the future (Crozier and Glade, 2005).
Among the advantages of this method are the possibility of simultaneously integrating both quantitative and qualitative data in the model and the testing of v [...] represent dependent variables, while their triggering and preparing factors are the independent (explanatory) variables.
The purpose of this study is to identify the large scale susceptibility of landslide occurrence by applying the logistic model in the sub-basin of the Small Niraj (Fig. 1). The database included a complete landslide inventory and the descriptive data of 16 causing factors used for generating the model. These factors describe the morphometrical, geological and the hydroclimatic characteristics of the territory under analysis.
The study area is located in the north-east of the Transylvanian
Depression, Romania, and has recorded important economic and
environmental losses over the last two years: 67 persons,
45 houses, 115
According to the Romanian National Meteorological Administration Institute,
the mean temperature varies between
GIS spatial analysis models are built upon complex structures and databases generated from varied sources. One of the main problems to solve when building a spatial analysis model that locates areas with different landslide susceptibility values is identifying the actual format of the model, along with building and managing its input data in an integrated way.
The large variety of databases serving as input for the landslide susceptibility model means that the resolution of the different model structures depends on the model scale. Bearing in mind that the models fall within the large-scale category, the authors built a database containing both vector data (landslide areas, geology, seismicity, land use) and raster data (slope angle, aspect, fragmentation depth, fragmentation density, elevation, CTI, SPI, plan and profile curvature, etc.) (Table 1).
The spatial distribution of the 16 factors included in the model was determined using the spatial analysis functions of the ArcGIS software.
The different database sources made their validation mandatory so as to ensure an accurate representation. The validation of the databases was done using the comparison technique (the database was compared to field data) as well as using observation (by visual identification of the correspondence existing between the cartographic representation and the existing situation in the field). Having the certainty that a valid and accurate database is used, the logical schemas of the BSA and logistic model were subsequently completed in order to be used for determining the probability of landslide occurrence.
The BSA model identifies landslide-susceptible areas by considering the statistical value specific to each class of the factors included in the initial database, without taking into account the importance of each factor within the information flow of the model. The statistical model based on bivariate probability analysis was applied to predict the spatial distribution of landslides by estimating the probability of landslide occurrence, on the assumption that the prediction should start from the existing landslides (Chung et al., 1995; Dhakal et al., 2000; Saha, 2002; Sarkar and Kanungo, 2004; Magiulio et al., 2008; etc.).
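The per-class statistical value can be illustrated with a short sketch. It assumes a commonly used form of the bivariate statistical index, namely the natural logarithm of the landslide density within a class divided by the landslide density over the whole study area. The function name, the pixel counts and the handling of landslide-free classes are illustrative assumptions, not the authors' implementation.

```python
import math

def class_statistical_value(landslide_px_in_class, class_px,
                            landslide_px_total, total_px):
    """Return ln(class landslide density / overall landslide density).

    Assumed form of the bivariate statistical index; a class without
    landslide pixels has no defined logarithm, so None is returned.
    """
    if class_px <= 0 or total_px <= 0 or landslide_px_total <= 0:
        raise ValueError("pixel counts must be positive")
    if landslide_px_in_class == 0:
        return None
    class_density = landslide_px_in_class / class_px
    overall_density = landslide_px_total / total_px
    return math.log(class_density / overall_density)

# Hypothetical (landslide pixels, class pixels) for three slope-angle
# classes; these are not values from the study area.
slope_classes = {
    "0-5 deg": (10, 4000),
    "5-15 deg": (150, 5000),
    "15-25 deg": (40, 1000),
}
total_landslide_px = sum(ls for ls, _ in slope_classes.values())
total_px = sum(px for _, px in slope_classes.values())

values = {
    name: class_statistical_value(ls, px, total_landslide_px, total_px)
    for name, (ls, px) in slope_classes.items()
}
```

Under this assumed form, a positive value marks a class with a higher landslide density than the study-area average, and a negative value marks a class with a lower one.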
The statistical value of each factor class included in the bivariate
model was calculated using the equation proposed by Yin and Yan
(1988), as well as Jade and Sarkar (1993):
By using Eq. (
In order to predict landslide susceptibility at pixel level in the
study area, a logistic regression model was also applied.
This method was mathematically described by Harrell (2001):
represents the set of points (pixels from the study area);
Thus, the probability of occurrence for a new landslide event is
represented by:
One can notice that the probability of occurrence becomes a linear
function of each variable included in the model (Kleinbaum and Klein,
2002). In order to estimate the parameters, a logarithmic
transformation of the odds was necessary (the odds being the
ratio of the probability of success to the probability of failure)
which changes the variation interval from (0, 1) to a sigmoid curve,
in the interval (
The
The coefficient values (
The multiplication coefficient of each variable was determined by
applying the logistic regression (Table 2). The
A value below 0.05 is considered optimal, representing the threshold
for the data acceptable within the model database. A statistical
threshold value of
The goodness of fit was determined by generating the area under the ROC curve using the training data, while the prediction capacity of the model was identified using the validation data set (Hosmer and Lemeshow, 2000; Guzzetti, 2006). The quality of the information included in the input variables for the landslide susceptibility model as well as the number of variables need to be considered in the process of variable selection, in order to reduce redundancy (Chiţu, 2010).
The 16 variables (elevation, slope angle, average precipitation, slope aspect, drainage density, drainage depth, hydrological soil classes, distance to streams, distance to roads and settlements, Stream Power Index (SPI), land use, lithology, plan curvature and profile curvature, Topographic Wetness Index (CTI)) were included in the model, and their selection was performed according to their statistical relevance in the logistic regression.
Establishing the research methodology for the present study requires a comparative approach to both the methods and the results obtained by implementing the previously mentioned models.
The comparison of the spatial analysis methods integrated within the two models emphasises the difference among the necessary databases, as well as the complexity and implementation possibility of the models. The comparative approach of the results on the different levels of the modelling process as well as of the final results shows the practical utility of such databases within each model, as well as the accuracy of the representation.
The statistical correlation between the mapped landslides from the Niraj River Basin and their causing factors was determined for the logistic model using the statistical software R. The training variables were included in the logistic regression and the AIC was used to perform an automated stepwise selection of the best model, namely the combination of variables which best explains the occurrence of landslides in the analysed territory.
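The selection criterion can be sketched as follows. The example uses the standard definition AIC = 2k - 2 ln L for a binary outcome model. The candidate models, their fitted probabilities and the function names are hypothetical, and the sketch only compares candidates that are already fitted; it does not reproduce the stepwise search performed in R.

```python
import math

def bernoulli_log_likelihood(observed, predicted):
    """Log-likelihood of 0/1 observations given predicted probabilities."""
    total = 0.0
    for y, p in zip(observed, predicted):
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_parameters - 2 * log_likelihood

# Hypothetical landslide presence (1) / absence (0) for six pixels.
observed = [1, 1, 0, 0, 1, 0]

# Hypothetical candidates: (number of estimated parameters,
# fitted probabilities for the six pixels).
candidates = {
    "slope only": (2, [0.7, 0.6, 0.4, 0.3, 0.6, 0.4]),
    "slope + land use": (3, [0.9, 0.8, 0.2, 0.1, 0.8, 0.2]),
}

scores = {
    name: aic(bernoulli_log_likelihood(observed, probs), k)
    for name, (k, probs) in candidates.items()
}
best_model = min(scores, key=scores.get)
```

The extra parameter of the second candidate is penalised by the 2k term, so it is preferred only because its gain in log-likelihood outweighs that penalty.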
The model with the best AIC value (AIC
For the interpretation of the results, the odds difference plays
a very important role (Table 2). For example, keeping all the input
variables constant while the average precipitation value is set at
650
Thus, the highest increase in the odds of landslide occurrence is recorded when comparing the south-western slopes with the reference class of level areas (195 %), indicating a strong dependence of landslide occurrence on south-western slopes.
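The link between a regression coefficient and such a percentage can be sketched in a few lines. The sketch assumes that the "odds difference" denotes the usual percentage change in odds, (exp(b) - 1) × 100, where b is the coefficient of the variable or dummy class. The function names are illustrative, and the coefficient is derived from the reported 195 % rather than read from Table 2.

```python
import math

def percent_change_in_odds(coefficient):
    """Percentage change in the odds for a one-unit increase, or for a
    dummy class versus its reference class: (exp(b) - 1) * 100."""
    return (math.exp(coefficient) - 1.0) * 100.0

def coefficient_from_percent(percent):
    """Inverse mapping: the coefficient implied by a percentage change."""
    return math.log(1.0 + percent / 100.0)

# Under the assumption above, a 195 % increase corresponds to an odds
# ratio of 2.95 relative to the reference class.
b_sw = coefficient_from_percent(195.0)
odds_ratio_sw = math.exp(b_sw)
```

Note that this is a change in odds, not in probability: the same odds ratio produces different probability changes depending on the baseline probability of the reference class.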
The resulting coefficients were multiplied with their corresponding 13
raster files using Raster Calculator according to Eq. (4):
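A minimal sketch of this map-algebra step is given here in plain Python rather than in Raster Calculator. It assumes the standard logistic form, P = 1 / (1 + exp(-z)) with z equal to the intercept plus the sum of coefficient-raster products. The two rasters, the coefficients and the function names are hypothetical and do not come from Table 2.

```python
import math

def logistic_probability(intercept, coefficients, values):
    """P = 1 / (1 + exp(-z)) with z = b0 + sum(b_i * x_i)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

def susceptibility_raster(intercept, coefficients, rasters):
    """Apply the logistic model cell by cell to equally sized rasters.

    `rasters` is a list of 2-D grids (lists of rows), one per variable,
    in the same order as `coefficients`.
    """
    n_rows, n_cols = len(rasters[0]), len(rasters[0][0])
    return [
        [
            logistic_probability(
                intercept, coefficients, [r[i][j] for r in rasters]
            )
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]

# Hypothetical 2 x 2 rasters and coefficients (not study values).
slope_deg = [[2.0, 12.0], [18.0, 25.0]]
is_sw_aspect = [[0, 1], [1, 0]]  # dummy variable for one aspect class
prob = susceptibility_raster(-3.0, [0.1, 1.0], [slope_deg, is_sw_aspect])
```

Each output cell is a probability between 0 and 1, which is the kind of per-pixel value that is later grouped into susceptibility classes.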
The goodness of fit and the predictability of the model were determined using the ROC curve for the model sample and the testing sample, respectively. The sensitivity of the model represents the true positive rate (pixels with a high probability of landslide occurrence that are validated by real landslides), while the complement of the model specificity (1 − specificity) represents the false positive rate, that is, the proportion of landslide-free areas identified as highly susceptible to landslides (Hosmer and Lemeshow, 2000).
The area under the ROC (Receiver Operating Characteristic) curve is 0.86 for the training data set and 0.63 for the testing (validation) data set. The first value indicates the goodness of model fit, while the second represents the predictability of the model, or its capacity to predict future events (Fig. 4).
The large area under the ROC curve indicates a high sensitivity of the model as well as a low false positive rate, which together account for a satisfactory precision of the results. The smaller area under the ROC curve in the case of the validation data, though still above the threshold of 0.5, is due to the smaller landslide set available for validation.
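How these quantities are obtained from pixel-level results can be sketched as follows. The area under the ROC curve is computed here through its pairwise interpretation (the probability that a landslide pixel receives a higher score than a stable pixel), which is equivalent to integrating the curve. The labels, scores, threshold and function names are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen landslide pixel gets a
    higher score than a randomly chosen stable pixel (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def sensitivity_specificity(labels, scores, threshold):
    """True positive rate and true negative rate at one threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical validation pixels: 1 = landslide, 0 = stable.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
tpr, tnr = sensitivity_specificity(labels, scores, 0.5)
```

An area of 0.5 corresponds to a model that ranks landslide and stable pixels no better than chance, which is why 0.5 serves as the lower reference threshold.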
The classification of the results into the final susceptibility classes was based on the success rate (Chung and Fabbri, 1999, 2003, 2008; Van Westen et al., 2003; Remondo et al., 2003), resulting in the map in Fig. 5.
Processing the derived and modelled database in the ArcGIS software, using its specific functions of conversion, analysis and spatial integration, led to the generation of landslide susceptibility maps and their corresponding raster databases according to the statistical values of each coefficient class.
The results of the models are included in a raster database which
highlights the probability of landslide occurrence for each pixel of
the analysed area with a statistical value ranging from
When analysing the classified susceptibility map one can note the vast expansion of the high and very high susceptibility classes (65 % of the analysed area) which correspond to the slopes from the upper river basin of the Small Niraj (in the administrative territory of the Şirea Nirajului settlement), as well as in the hilly sector of the lower river basin (in the administrative territories of Miercurea Nirajului, Drojdi and Maia).
The validation of the results was performed in a first stage using the
percentage of the landslide areas in each class (Fig. 6). Thus, there
is a very good validation of the results, as the largest proportion of
the active landslides (71.23 %) is included in the very high
susceptibility class, which also represents the second largest area in
the Small Niraj River Basin (28.3
By comparing the two databases it becomes obvious that 92.8 % of the active landslides overlie the high and very high susceptibility areas and only 6.55 % are included in the medium susceptibility class. This high degree of model fit is reflected in the large area under the ROC curve (0.983), which indicates a good correlation between the model results and the landslides in the field (Fig. 6).
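The overlay behind these percentages can be sketched as a simple cross-tabulation of the classified susceptibility raster with the landslide inventory raster. The grids, class labels and function name below are hypothetical and much smaller than any real raster.

```python
from collections import Counter

def landslide_share_by_class(class_raster, landslide_raster):
    """Percentage of landslide cells falling in each susceptibility class."""
    counts = Counter()
    for class_row, slide_row in zip(class_raster, landslide_raster):
        for cls, is_slide in zip(class_row, slide_row):
            if is_slide:
                counts[cls] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no landslide cells in the inventory raster")
    return {cls: 100.0 * n / total for cls, n in counts.items()}

# Hypothetical 3 x 4 grids (not study data).
classes = [
    ["low", "medium", "high", "very high"],
    ["medium", "high", "very high", "very high"],
    ["low", "low", "high", "very high"],
]
landslides = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
share = landslide_share_by_class(classes, landslides)
```

A share concentrated in the high and very high classes supports the map only when those classes do not also cover most of the study area, so the class areas should be read alongside these percentages.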
The spatial distribution of the susceptibility classes on the map generated with the logistic model is similar for the middle slope sectors of the lower and middle river basin, in the administrative territory of Miercurea Nirajului, Eremitu and Maia, but on the western slope of Măgherani Hill there are some obvious differences (Fig. 7).
The results differ between the application of the BSA model and the logistic model (Fig. 8). By applying the BSA model in which all the classes of the 16 factors were included in the model, namely all the 72 dummy variables, there is an overestimation of the high susceptibility class (32.7 %) and of the very high susceptibility class (32.5 %). By applying the logistic model, these values decrease to 15.2 % for the high susceptibility class and to 10.9 % for the very high susceptibility class, as the variables corresponding to statistically insignificant classes were eliminated.
When comparing the input databases for the two models, there is
a decrease in the initial number of variables (16) in the case of the
logistic regression due to the application of the likelihood test
(Table 6.21). Hence, the variable classes with a very limited spatial
extent were excluded from the model as they would lead to
additional errors (for example: the territories ranging between 700
and 800
Another series of variable classes were excluded from the analysis,
for example the territories with a drainage density between
0.5–1
As a result of the landslide susceptibility assessment performed with the two quantitative models (bivariate statistical analysis and logistic regression), the areas with a high probability of landslide occurrence, as well as the stable territories, were highlighted in the study area. These results are considerably superior to previous analyses which used the legislative semi-quantitative Romanian methodology (H.G. 447/2003) (Roşca et al., 2015a). However, it is still necessary to increase the quality of the databases corresponding to the causing factors and the number of landslides included in the modelling processes, and to analyse the relationships between the parameters more thoroughly.
The two models under analysis in the present study, the logistic and the BSA models, have shown the high complexity of the databases involved, the multiple correlation between several factors determining landslide activation as well as the obvious practical utility of the logistic model in future similar studies.
The use of the logistic model allowed the testing of variable interdependencies, leading to a reduction of the input data and hence a shorter modelling time. The BSA model operates with all databases, 16 variables represented as 72 dummy variables; hence it takes longer to implement and leads to increased data redundancy, while database management is slower and needs better software and hardware resources. One needs to consider that database quality is essential for creating the model and that the inventory of active landslides used in this study needs to be completed in order to validate the BSA model in the same way as the logistic model was validated here.
However, the better validation results given by the BSA model (0.98), as compared to the 0.86 value resulting from the logistic model, indicate a better fit of the BSA model. This is explained by the fact that the BSA model uses as input all the active digitised landslides, which were also used to determine the landslide density for each of the existing classes of the variables, namely their statistical value. This can be viewed from two perspectives: it is an advantage when evaluating the ability of the model to correctly determine the presence or absence of the phenomenon, although with a slight overestimation of the results, and it is a disadvantage when a prediction is desired, as in the present study.
Database structure.
Regression coefficients of the input variables. The bolded data represents the variables considered representatives.
Spatial distribution of susceptibility classes.
Spatial distribution of susceptibility classes.
Comparative statistical values (for BSA and logistic regression).
0 – excluded classes due to low sample size. 0 (bold) – excluded classes due to lack of statistical significance. Bold values represent the classes included in the model due to their statistical significance. The italic values (ex.
Geomorphological map of the Small Niraj catchment and geographical position of the study area (1 – flood plain, 2 – slopes and connecting surfaces, 3 – slopes with complex modellation, 4 – active landslides, 5 – permanent hydrographic network, 6 – temporary hydrographic network, 7 – watershed divide, 8 – settlements).
Applied methodological flow chart.
Landslide susceptibility map generated using the logistic model.
Area under the ROC curve for the training data (left panel) and the testing data (right panel).
Landslide susceptibility map generated using the BSA model.
Percentage distribution of active landslides across the probability classes and ROC curve value.
Regional differences between the susceptibility classes obtained with the BSA model and with the logistic model.
Comparative percentage distribution on susceptibility classes obtained by applying BSA model