Issue 8.2 | November 2004

ITEP Test Trials for Detection Reliability Assessment of Metal Detectors

by Christina Mueller, Mate Gaal, Martina Scharmach and Sylke Bär, BAM Berlin; Adam Lewis and Tom Bloodworth, JRC Ispra; Dieter Guelle, Secretariat, ITEP; and Peter-Th. Wilrich, FU Berlin

Abstract

The total detection reliability of a mine-searching system is governed by the following three elements:

  • Intrinsic capability, which describes the basic physical-technical capability of the method.
  • Application factors, which include those due to environment.
  • Human factor, which is the effect of human operators on the detection reliability.

Some of these can be determined in simple laboratory measurements in which the effect on detection capability of individual parameters is measured. However, the human factor and some aspects of the effects of environmental conditions on the system need to be treated statistically.

By far the most common "mine-searching system" in use today is the metal detector. The test and evaluation procedures for metal detectors described in European Committee for Standardization (CEN) Workshop Agreement (CWA) 14747: 2003 include the above ideas. This is why, in addition to parameter tests, they include detection reliability or blind field tests under local conditions with local personnel.

A series of three field trials was performed in the International Test and Evaluation Program for Humanitarian Demining (ITEP) project 2.1.1.2, "Reliability Model for Test and Evaluation of Metal Detectors," in order to specify the optimal conditions to obtain reliable trial results with affordable effort. Each set of specific working conditions is characterized in terms of a combination of one mine type in one soil with one detector handled by local personnel. For each set of conditions, the searching system will deliver a working performance, expressed as mine-detection rates as a function of mine depth, and a certain overall false alarm rate (FAR). During the ITEP trials in Benkovac and Oberjettenberg, the authors learned to determine this function separately for each mine type in each soil. This is especially important for low-metal mines in soil that can influence metal detectors, as will be illustrated for the case of the PMA-2. Two discussion points still remain: how representative the trials are of field conditions and what statistical setup is required if we are to distinguish between the capabilities of individual detectors.

Introduction and Background

The CEN Working Group 07 began the process of standardizing test and evaluation methods for metal detectors in humanitarian demining, including both laboratory measurements of detection capability and blind field trials (reliability tests). In reliability tests, the probability of detection (POD) and receiver operating characteristics (ROC) curves help to summarize the performance results. Under the umbrella of ITEP, a number of test trials with metal detectors have been conducted. The aim was to specify the trial setup and the statistical rules necessary to achieve true, repeatable and reproducible results under representative field conditions. The trial scenarios ranged from straightforward detection of a large, metallic anti-tank mine buried near the surface in a soil that does not give metal-detector signals to the most difficult challenge of detecting low-metal anti-personnel mines deeply buried in magnetic soil that affects detectors strongly. Individual human factors, such as training and currency of skills, were assessed. A full report about the trial conditions and results, including rules for minimum number of targets, operators, and test repetitions necessary to achieve true and reproducible results, was published on the ITEP website in the summer of 2004 (see http://www.itep.ws/reports/last_reports.php?projectid=293).

In order to ensure that the requirements of practical demining are met and that the analysis is performed on a sound scientific basis, the authors organized an international workshop to discuss the problems of reliability test trials in December 2003.

Workshop on Reliability Tests for Demining

About 100 international experts in demining met for the "Workshop on Reliability Tests for Demining." The proceedings (published at http://www.kb.bam.de/ITEP-workshop-03/) contain presentations of the oral sessions in which the general national, European and international concepts in demining are described as well as the main activities and results of the ITEP trials. An up-to-date series of lessons learned and problems to be solved was presented by international mine action centres. In four focused sessions, the authors and a number of competent international experts discussed the following specific topics: configuration of test lanes and test target (mine) selection, soil influence and ground compensation, human factors, and rules for test planning and statistical evaluation.

A highlight of the workshop was the second session, which addressed the problem of soils that influence metal detectors—such soils are described variously as "noisy" (CWA 14747:2003), "uncooperative" or "difficult." These effects were due principally to magnetic properties of the soil—both the magnitude of the magnetic susceptibility and its frequency dependence (see especially the presentation by S. Billings, et al.1). The fundamental magnetic properties were related to the empirical "ground reference height" measurement, developed by D. Gülle: the maximum distance above the ground at which a calibrated, static-mode detector gives an alarm due to that ground.

Further presentations dealt with conclusions for future practical activities, such as the Geneva International Centre for Humanitarian Demining (GICHD) Manual Demining Study (T. Lardner) or a worldwide accident database (A. L. Smith). One of the conclusions for future research requirements was that a more comprehensive understanding of soil influences is still needed (S. Billings, et al.). Finally, the workshop assembly issued "findings and recommendations" on how to deal further with the topic of reliability and with modelling for the improvement of demining techniques.

POD and ROC—Summary of Rates for Detection and False Alarms

 
Figure 1: Explanation for ROC and POD diagrams.

The ROC of a mine detection system2 shows the detection rate or probability of detection versus the FAR or number of false alarms per unit area (Figure 1). The ROC shows how successful the system is in distinguishing between a real signal from a mine and a noise signal arising from any other possible perturbation (from the soil, from other buried artefacts, from the electronics). The closer to the upper left corner the position of a ROC point is, the better the system.
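As a minimal sketch of how a single ROC point could be computed from blind-trial tallies, the following Python example pools detections and false alarms over several lanes. The `LaneResult` record, the `roc_point` function and all numbers are hypothetical and chosen only for illustration; they are not taken from the ITEP trials or from CWA 14747:2003.

```python
from dataclasses import dataclass


@dataclass
class LaneResult:
    """Tally for one operator pass over one lane (hypothetical record layout)."""
    targets_buried: int    # detection opportunities in the lane
    targets_detected: int  # targets correctly indicated
    false_alarms: int      # indications not attributable to any target
    area_m2: float         # searched area in square metres


def roc_point(results):
    """Return (false alarms per square metre, detection rate) pooled over results."""
    buried = sum(r.targets_buried for r in results)
    detected = sum(r.targets_detected for r in results)
    alarms = sum(r.false_alarms for r in results)
    area = sum(r.area_m2 for r in results)
    if buried == 0 or area <= 0:
        raise ValueError("need at least one target and a positive searched area")
    return alarms / area, detected / buried


# Illustrative numbers only, not measured data.
results = [
    LaneResult(targets_buried=20, targets_detected=18, false_alarms=3, area_m2=30.0),
    LaneResult(targets_buried=20, targets_detected=14, false_alarms=5, area_m2=30.0),
]
far, detection_rate = roc_point(results)
print(f"FAR = {far:.3f} per m^2, detection rate = {detection_rate:.2f}")
```

Each detector setting or set of working conditions would yield one such point; plotting the detection rate against the FAR for several of them gives a diagram of the kind described above.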

In the case discussed here, the mine-detection systems being tested are metal detectors. Whether detection alarms caused by metal pieces in the ground are considered "true" or "false" detections depends upon the aims of the detection reliability trial. An ideal mine-detection system would, in principle, be able to distinguish between a mine and a piece of scrap metal. Unfortunately, metal detectors currently used in demining do not have this capability.

When land is cleared of mines where minimum-metal mines are the main threat, the "metal-free" procedure is sometimes used. This means that detectors are used on maximum sensitivity and all metallic pieces found are removed from the ground. In trials for metal detectors to be used in this way, any metal piece found should be considered a true detection, not a false alarm.

In some mine/UXO clearance operations, relatively large metal objects are sought. In this scenario, it is often possible to reduce metal detector sensitivity to avoid detecting all of the possible metallic clutter that may be present, while still having the detection capability to find the targets. In trials designed for this type of operating procedure, it is possible to consider detection of extraneous small pieces of metal as a false call. However, the validity of this approach depends upon the sizes of metal pieces in the test lanes. If metal pieces are present that have an equivalent response to the targets, then the test becomes rather meaningless because reporting these detections as false calls does not indicate that the detector is not performing as required.
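The effect of the two scoring conventions can be illustrated with a short Python sketch. The labels `"mine"`, `"clutter"` and `"nothing"`, the `score_indications` function and the sample list are assumptions made for this example only; a real trial would record indications against surveyed target and clutter positions.

```python
from collections import Counter

VALID_LABELS = {"mine", "clutter", "nothing"}


def score_indications(indications, metal_free):
    """Return (true detections, false alarms) for a list of operator indications.

    indications: labels saying what was found at each indicated spot:
                 "mine", "clutter" (a non-target metal piece) or "nothing".
    metal_free:  True scores clutter finds as true detections (metal-free
                 procedure); False scores them as false alarms.
    """
    counts = Counter(indications)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if metal_free:
        return counts["mine"] + counts["clutter"], counts["nothing"]
    return counts["mine"], counts["nothing"] + counts["clutter"]


# Hypothetical indications from one lane, for illustration only.
indications = ["mine", "mine", "clutter", "nothing", "clutter", "mine"]
metal_free_score = score_indications(indications, metal_free=True)
reduced_sensitivity_score = score_indications(indications, metal_free=False)
print(metal_free_score, reduced_sensitivity_score)
```

The same set of indications therefore produces different false alarm counts depending on the aim of the trial, which is why the scoring convention has to be fixed before the results are evaluated.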


Figure 2: Typical POD Curves.

For a fixed number of false alarms, the ROC point (the operating point of the system at a fixed sensitivity) can be taken and further analysed for its dependence on the main influencing factors, such as the mine depth or the metal content of the mine (Figure 2). All these points and curves need to be interpreted together with the corresponding confidence bounds, which account for the scatter of results. This scatter depends on the underlying statistical basis (the number of opportunities to detect the mine) and the natural variability of the factors. The smooth POD or detection rate curves presented in Figure 2 were determined by an advanced logistic regression model.3 A simple way of obtaining the detection rate curves is by plotting the mean values of the experimentally measured detection rates for each step of burial depth. 4, 5, 6, 8
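The simple approach of averaging the measured detection rates at each burial depth can be sketched as follows. The choice of the Wilson score interval for the confidence bounds is an assumption made for this example (the text does not say which interval the authors used), and the function names and trial data are hypothetical.

```python
import math


def wilson_interval(detected, opportunities, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95 %)."""
    if opportunities <= 0:
        raise ValueError("need at least one detection opportunity")
    p = detected / opportunities
    denom = 1 + z * z / opportunities
    centre = (p + z * z / (2 * opportunities)) / denom
    half = z * math.sqrt(
        p * (1 - p) / opportunities + z * z / (4 * opportunities ** 2)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def detection_rate_by_depth(trials):
    """Map each depth to (mean detection rate, lower bound, upper bound).

    trials: iterable of (depth_cm, detected) pairs, one per detection opportunity.
    """
    counts = {}
    for depth, detected in trials:
        hits, n = counts.get(depth, (0, 0))
        counts[depth] = (hits + int(detected), n + 1)
    return {
        depth: (hits / n, *wilson_interval(hits, n))
        for depth, (hits, n) in sorted(counts.items())
    }


# Hypothetical outcomes: ten opportunities at each of three depths.
trials = (
    [(0, True)] * 10
    + [(10, True)] * 8 + [(10, False)] * 2
    + [(20, True)] * 4 + [(20, False)] * 6
)
curve = detection_rate_by_depth(trials)
for depth, (rate, lower, upper) in curve.items():
    print(f"{depth:>2} cm: rate {rate:.2f}, bounds [{lower:.2f}, {upper:.2f}]")
```

With only ten opportunities per depth the bounds are wide, which illustrates why the number of opportunities to detect the mine matters for the statistical basis of a trial.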

Overview of the Parameter Matrix of the Trials

The main aim of the trials was to investigate how the device performance manifests itself in different application circumstances. The authors organized three sets of trials for which the main parameter setup can be seen in Figure 3. The first and third took place in Oberjettenberg WTD 52 on the testing ground of the German army.


Figure 3: Test parameters.

The conditions for the first trial in May 2003 were representative of poor circumstances likely to yield low performance: inexperienced operators with a short training period and test lanes with significant metal contamination. Three neutral soils were used and a fourth lane was artificially made "uncooperative" by adding a layer of magnetic blast-furnace slag. (With the benefit of hindsight, we would not recommend this technique because the slag was found to contain metallic particles, creating additional metal contamination.) The buried mines were characterized by a medium to large metal content. Some generic International Test Operations Procedure (ITOP) targets were also used, irregularly distributed over a predefined depth range.

The second trial set was organized in Benkovac, Croatia, with eight experienced Croatian operators, three of whom were active deminers at the time of the trials. A brief training period (half a day for each detector) was given. There were three types of soil on eight lanes: neutral soil, homogeneous uncooperative soil and heterogeneous uncooperative soil. The last two had frequency-dependent susceptibility. The mines had large, medium or small metal content and were systematically distributed over depths between zero and 20 cm to allow statistical analysis. For testing metal detectors, the normal target depth should extend to the limits of the physical detection capability in the soil. The depth of 20 cm was chosen because it is the required depth for mine clearance under Croatian law. The lanes were "almost" clean of metal pieces.
