2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012)
Statistical Inference on Recall, Precision and Average Precision under Random Selection
Peng Zhang
Department of Mathematics
Zhejiang University
Hangzhou, Zhejiang, 310027 China
pengzzju@gmail.com

Wanhua Su
Department of Mathematics and Statistics
Grant MacEwan University
Edmonton, Alberta, Canada T5J 4S2
Abstract—The objective of a rare target detection problem is to identify the rare targets as early as possible. Recall, precision and average precision are three popular performance measures for evaluating different detection methods. However, there is little literature on the statistical properties of these three measures. We develop a framework for conducting statistical inference on recall, precision and average precision through establishing their asymptotic properties. Simulations are used to illustrate the idea. The proposed methods can also be applied in other areas where ranking systems need to be evaluated, such as information retrieval.
I. INTRODUCTION

A. Information Retrieval

Information retrieval is the science of searching for information relevant to a given topic from a huge database. The information can be in the form of text documents, photos, videos, audio clips or combinations of these media. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines (e.g., Google). The system then computes a numeric score on how well each object in the database matches the query; this score is called the relevance score. The objects are ranked and shown to the user according to the relevance score, with higher-scoring objects appearing earlier in the list.

B. Rare Target Detection

In a typical rare target detection problem, we have data {y_i, x_i}, where x_i is the predictor vector of the ith observation and y_i ∈ {0, 1} is its class label. The objective is to identify the rare class-1 observations (targets) as early as possible. In general, the selected targets are passed to the next stage for further investigation. Three typical applications of rare target detection are:

1) Drug Discovery: Here x_i is a vector of descriptors for a chemical compound and y_i indicates whether the compound is considered an active drug agent for a certain disease. Most compounds are inactive and we are interested in detecting the active ones.

2) Credit Card Fraud Detection [1]: Here x_i is a vector of descriptors for a credit card transaction and y_i indicates whether the transaction is fraudulent. Most transactions are not fraudulent and we are interested in catching the frauds.

3) Direct Marketing: Here x_i is a vector of descriptors for a potential customer and y_i indicates whether the customer will respond to the advertisement or not. Most customers will not respond and we are interested in identifying those most likely to respond.

Rare target detection is analogous to information retrieval by treating the rare targets as relevant objects.

C. Performance Measures
Many measures for evaluating the performance of information retrieval systems have been proposed, among which recall and precision are the most popular ones. Suppose there are n objects in the database, among which m are relevant. Let s_1, ..., s_n be the relevance scores of the objects, s_(1), ..., s_(n) be the corresponding scores sorted in descending order, and y_(1), ..., y_(n) be binary variables indicating whether the ith ranked object is relevant to the query or not. That is,

y_(i) = 1 if the ith object is relevant; 0 if the ith object is irrelevant.

A retrieved item is called a hit if it is a relevant object. Define the hit function h(t) = Σ_{i=1}^t y_(i) as the number of hits among the top t objects. Recall at rank t, r(t), is defined as the fraction of relevant objects that are successfully retrieved among the top t objects. Precision at rank t, p(t), is the fraction of the retrieved objects that are relevant to the query. That is,

r(t) = h(t)/m,   p(t) = h(t)/t.   (1)
Recall and precision can also be used in rare target detection problems. As in information retrieval, the first step in solving a rare target detection problem is to rank the items by a score function, which can be modelled by K-nearest neighbours (KNN, e.g., [2], [3]), support vector machines (SVM, e.g., [4]), LAGO [5] or other methods. Then, according to one's budget, the top Q items with the largest scores are classified as class 1 and the remaining items are classified as class 0. The classification result is usually presented in a confusion matrix, as in Table I.
978-1-4673-0024-7/10/$26.00 ©2012 IEEE
                       Classification result
True label          1                      0
    1        TP (true positive)    FN (false negative)
    0        FP (false positive)   TN (true negative)

TABLE I
CONFUSION MATRIX OF CLASSIFICATION
In the context of rare target detection,

recall = TP/(TP + FN),   precision = TP/(TP + FP).
Given Q, the ranking system that gives the larger r(Q) or p(Q) is the better one. However, there is a trade-off between recall and precision, in that recall r(t) generally increases with t while precision p(t) decreases with t. More details about this relationship between recall and precision can be found in [6] and [7]. Therefore, recall and precision should be taken into account simultaneously when we evaluate or compare different ranking systems. One common way to combine recall and precision is the average precision (AP), which is defined as the average of the precisions at the hits. That is,

AP = (1/m) Σ_{t=1}^n y_(t) h(t)/t = (1/m) Σ_{t=1}^n y_(t) (Σ_{j=1}^t y_(j))/t.   (2)
Suppose we need to rank eight items, among which only three belong to class 1. Table II shows the calculation of the hit function h(t), recall r(t) and precision p(t) at each rank t = 1, ..., 8.
Result at Each Rank

Rank (t)             1    2    3    4    5    6    7    8
Hit list             1    1    0    1    0    0    0    0
Hit function h(t)    1    2    2    3    3    3    3    3
Recall r(t)         1/3  2/3  2/3  3/3  3/3  3/3  3/3  3/3
Precision p(t)      1/1  2/2  2/3  3/4  3/5  3/6  3/7  3/8

TABLE II
CALCULATION OF HIT FUNCTION, RECALL AND PRECISION
Recall that average precision is calculated as the average of the precisions at the hits. Since hits appear at the first, second and fourth places, the average precision is

AP = (1/3) (1/1 + 2/2 + 3/4).
Plotting p(t) against r(t) gives the recall-precision curve; average precision is an approximation to the area under this curve (e.g., [8]).
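As a sketch of these calculations (our own code, not from the paper; the function name is illustrative), recall, precision and average precision can be computed directly from a ranked 0/1 hit list:

```python
def recall_precision_ap(hits, m):
    """Compute r(t), p(t) for t = 1..n and average precision
    from a ranked 0/1 hit list with m relevant items."""
    h = 0                     # running hit function h(t)
    r, p, ap = [], [], 0.0
    for t, y in enumerate(hits, start=1):
        h += y
        r.append(h / m)       # r(t) = h(t)/m, equation (1)
        p.append(h / t)       # p(t) = h(t)/t, equation (1)
        if y:                 # AP averages the precisions at the hits
            ap += h / t
    return r, p, ap / m

# The eight-item example of Table II: hits at ranks 1, 2 and 4
r, p, ap = recall_precision_ap([1, 1, 0, 1, 0, 0, 0, 0], m=3)
```

For this hit list the function returns AP = (1/3)(1/1 + 2/2 + 3/4) = 11/12, matching the worked example above.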
D. Existing Methods

To the best of our knowledge, little has been done on statistical inference for these performance measures. Yeh [9] suggested using paired-t, signed-rank and Wilcoxon tests for comparing recalls, and a computationally intensive randomization test for precision; average precision was not addressed.
Goutte and Gaussier [10] handled this problem from a Bayesian perspective. They argued that the joint distribution of (TP, FN, FP, TN) follows a multinomial distribution; that the conditional distribution of the true positives given the number of retrieved items is TP | TP+FP ~ Binomial(TP+FP, p); and that the conditional distribution of the true positives given the total number of class-1 observations is TP | TP+FN ~ Binomial(TP+FN, r). Adopting conjugate Beta priors for precision p and recall r, inference on p and r can be made from the posterior distributions. For comparing two ranking systems, each object falls into one of the following categories:

∙ System 1 gives the correct assignment, system 2 fails;
∙ System 2 gives the correct assignment, system 1 fails;
∙ Both systems yield the same assignment.

Let π1, π2 and π3 be the probabilities that these events happen. A Dirichlet prior is assigned to (π1, π2, π3), and the posterior distribution of (π1, π2, π3) is another Dirichlet distribution. By drawing samples from the posterior distribution, one can estimate the probability that system 1 is superior to system 2, i.e., Pr(π1 > π2), empirically by the Monte Carlo method.
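A minimal Monte Carlo sketch of this comparison (our illustration; the counts below are hypothetical, not from [10]): with a Dirichlet(1, 1, 1) prior and observed counts (n1, n2, n3) for the three events, the posterior is Dirichlet(1 + n1, 1 + n2, 1 + n3), and Pr(π1 > π2) is estimated from posterior draws.

```python
import random

random.seed(0)

def dirichlet(alphas, rng=random):
    """One draw from a Dirichlet distribution via normalized Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

# Hypothetical counts: system 1 right/system 2 wrong, the reverse, and ties
n1, n2, n3 = 30, 18, 452

# Dirichlet(1, 1, 1) prior -> Dirichlet(1 + n1, 1 + n2, 1 + n3) posterior
post = [1 + n1, 1 + n2, 1 + n3]

draws = 20000
wins = sum(1 for _ in range(draws)
           if (d := dirichlet(post))[0] > d[1])
p_better = wins / draws   # empirical estimate of Pr(pi_1 > pi_2)
```

With these counts the estimate lands well above 0.5, i.e., system 1 is very likely the better one.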
The rest of the paper is organized as follows. Section II gives the distributions of recall, precision and average precision under random selection. Our approach to making inference on recall, precision and average precision is also presented. Section III presents a simulation study. We end with a discussion in section IV.
II. DISTRIBUTIONS OF RECALL, PRECISION AND AVERAGE PRECISION UNDER RANDOM SELECTION
A. Distributions of Recall and Precision
As given in Equation (1), recall and precision at rank t are defined as

r(t) = h(t)/m,   p(t) = h(t)/t,
where the hit function h(t) is the number of hits among the top t retrieved objects. Suppose there are n objects in total, among which m are targets (class 1). Under random selection, every item has the same probability of appearing in the top-t list; therefore, the number of hits among the top t items, h(t), follows a hypergeometric distribution with parameters (n, m, t). As a result,

E{h(t)} = tm/n,   Var{h(t)} = (tm/n) · (n − m)(n − t)/(n(n − 1)).
Hence, the expected values of recall r(t) and precision p(t) are

E{r(t)} = t/n,   E{p(t)} = m/n;   (3)

and correspondingly the variances of r(t) and p(t) are

Var{r(t)} = t(n − m)(n − t)/(m n²(n − 1)),   Var{p(t)} = m(n − m)(n − t)/(t n²(n − 1)).   (4)
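The moments in (3) and (4) can be checked by simulation; a sketch with our own parameter choices (t = m = 100, n = 1000, matching the first setting used later in Section III):

```python
import random

random.seed(1)

n, m, t = 1000, 100, 100      # n objects, m targets, top-t list
reps = 10000

labels = [1] * m + [0] * (n - m)
recalls = []
for _ in range(reps):
    # random selection: a uniformly random top-t subset of the n items
    h = sum(random.sample(labels, t))   # hypergeometric number of hits
    recalls.append(h / m)               # r(t) = h(t)/m

mean_r = sum(recalls) / reps
var_r = sum((x - mean_r) ** 2 for x in recalls) / (reps - 1)

# Theory: E{r(t)} = t/n = 0.1 and
# Var{r(t)} = t(n-m)(n-t)/(m n^2 (n-1)) = 0.00081...
```

The empirical mean and variance agree with t/n and equation (4) up to sampling error.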
B. The Distribution of AP Under Random Selection
Under random selection, the first two moments of average precision can be calculated exactly by establishing the marginal and joint distributions of the locations of the hits. Let L_1, L_2, ..., L_m be the ranks (locations) of the hits; the average precision given in (2) can then be rewritten as

AP = (1/m) (1/L_1 + 2/L_2 + ... + m/L_m).   (5)
In our example, L_1 = 1, L_2 = 2, L_3 = 4. Viewing average precision in this way, we can calculate its moments by finding the marginal distributions of the L_i's. Suppose there are n observations in total, among which m belong to class 1. Fixing L_i at location k, the total number of distinct ways to arrange these m class-1 objects is C(n−1, m−1), where C(a, b) denotes the binomial coefficient "a choose b". The first hit location L_1 can take values 1, 2, ..., n−m+1; the second, L_2, takes 2, 3, ..., n−m+2; and the mth, L_m, takes m, ..., n.
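These ranges can be confirmed by enumerating a small case (our sketch, using n = 8 and m = 3 as in the Table II example):

```python
from itertools import combinations

n, m = 8, 3

# Every equally likely arrangement of m hits among n ranks,
# recorded as a sorted tuple of hit locations (L1, ..., Lm)
arrangements = list(combinations(range(1, n + 1), m))

L1_values = sorted({a[0] for a in arrangements})    # locations the 1st hit can take
Lm_values = sorted({a[-1] for a in arrangements})   # locations the mth hit can take
```

Here there are C(8, 3) = 56 arrangements, L_1 ranges over 1, ..., 6 = n − m + 1 and L_3 over 3, ..., 8, as stated above.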
In general, the probability that the jth hit appears at the kth position, P_jk, is given by

P_jk = P(L_j = k) = C(k−1, j−1) C(n−k, m−j) / C(n−1, m−1).   (6)

This is due to the fact that the jth hit being at the kth position implies j−1 hits among the previous k−1 locations and the remaining m−j hits among the remaining n−k positions.
Similarly, the joint distribution of L_i and L_j for i < j can be calculated as

P(L_i = s, L_j = t) = C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2),   (7)

which means i−1 hits appear among the first s−1 locations, j−i−1 hits between the sth and tth positions, and m−j among the last n−t positions. The denominator is the total number of ways of arranging the remaining m−2 hits among n−2 locations once L_i and L_j are fixed.
Given the marginal distribution (6) and the joint distribution (7), it is straightforward to calculate the expectation and variance of AP given in (5). That is,

E(AP) = (1/m) Σ_{i=1}^m E(i/L_i),   Var(AP) = E(AP²) − E²(AP),

where

E(AP²) = (1/m²) { Σ_{i=1}^m E(i²/L_i²) + 2 Σ_{i=1}^{m−1} Σ_{j=i+1}^m E(ij/(L_i L_j)) }

with

E(1/L_i) = Σ_{t=1}^n (1/t) P(L_i = t),

E(1/(L_i L_j)) = Σ_{s=1}^n Σ_{t=1}^n (1/(st)) P(L_i = s, L_j = t),

E(1/L_i²) = Σ_{t=1}^n (1/t²) P(L_i = t).
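For small n and m, these exact moments can equivalently be obtained by brute-force enumeration of all equally likely hit-location vectors (our sketch; it sidesteps formulas (6) and (7) entirely):

```python
from itertools import combinations

def ap_moments_exact(n, m):
    """Exact E(AP) and Var(AP) under random selection, by enumerating
    all C(n, m) equally likely hit-location vectors (L1, ..., Lm)."""
    aps = []
    for locs in combinations(range(1, n + 1), m):
        # AP = (1/m) sum_i i / L_i, as in equation (5)
        aps.append(sum(i / L for i, L in enumerate(locs, 1)) / m)
    e_ap = sum(aps) / len(aps)
    var_ap = sum((a - e_ap) ** 2 for a in aps) / len(aps)
    return e_ap, var_ap

e_ap, var_ap = ap_moments_exact(8, 3)   # the small example: n = 8, m = 3
```

Enumeration grows combinatorially in n and m, so it is only feasible for small cases.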
This method, however, is computationally expensive when m and n are large. The following approach, based on an approximate normal distribution, is then appropriate. We present the result here and leave the detailed derivation to the appendix. With straightforward algebra we have
E(L_i) = in/m,   (8)

E(L_i²) = in(in + i + n − m)/(m(m + 1)),

Var(L_i) = in(in + i + n − m)/(m(m + 1)) − i²n²/m²,

E(L_i L_j) = i(n − 1)(nj + n − m)/(m(m − 1)),   (9)

Cov(L_i, L_j) = E(L_i L_j) − E(L_i) E(L_j),   (10)

where E(L_i), E(L_j) and E(L_i L_j) are given by equations (8) and (9).
The moments of 1/L_i are obtained with second-order Taylor expansions of the functions f(x) = 1/x and g(x) = 1/x² at x_0 = E(L_i). The covariance Cov(1/L_i, 1/L_j) is obtained using the delta method at (E(L_i), E(L_j)):
E(1/L_i) ≈ (1/E(L_i)) {1 + Var(L_i)/E²(L_i)},   (11)

E(1/L_i²) ≈ (1/E²(L_i)) {1 + 3 Var(L_i)/E²(L_i)},

Var(1/L_i) ≈ (Var(L_i)/E⁴(L_i)) {1 − Var(L_i)/E²(L_i)},

Cov(1/L_i, 1/L_j) ≈ Cov(L_i, L_j)/(E²(L_i) E²(L_j)),
where E(L_i), E(L_j) and Cov(L_i, L_j) are given by equations (8) and (10), respectively. Because AP = (1/n) Σ_{i=1}^m E(L_i)/L_i by equation (8), and E{E(L_i)/L_i} ≈ 1 + Var(L_i)/E²(L_i) by equation (11), which goes to 1 as m goes to ∞, it can be shown, using a central limit theorem for weakly dependent variables, that AP is approximately normally distributed, AP ~ N(μ, σ²), where

μ = (1/m) Σ_{i=1}^m E(i/L_i),   (12)

σ² = (1/m²) Σ_{i=1}^m Var(i/L_i) + (2/m²) Σ_{i<j} Cov(i/L_i, j/L_j).   (13)
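A sketch of the normal-approximation calculation (our code, written by plugging the moments (8)-(10) and expansions (11) into (12) and (13); it is an illustration, not the authors' implementation):

```python
def ap_normal_params(n, m):
    """Approximate mean and variance of AP under random selection,
    plugging the moments (8)-(10) into the expansions (11)-(13)."""
    E = [i * n / m for i in range(m + 1)]                  # E(L_i), equation (8)
    E2 = [i * n * (i * n + i + n - m) / (m * (m + 1)) for i in range(m + 1)]
    V = [E2[i] - E[i] ** 2 for i in range(m + 1)]          # Var(L_i)

    # mu = (1/m) sum_i E(i/L_i), with E(1/L_i) from equation (11)
    mu = sum(i / E[i] * (1 + V[i] / E[i] ** 2) for i in range(1, m + 1)) / m

    # sigma^2 = (1/m^2) sum_i Var(i/L_i) + (2/m^2) sum_{i<j} Cov(i/L_i, j/L_j)
    sigma2 = sum(i * i * V[i] / E[i] ** 4 * (1 - V[i] / E[i] ** 2)
                 for i in range(1, m + 1)) / m ** 2
    for i in range(1, m + 1):
        for j in range(i + 1, m + 1):
            ELiLj = i * (n - 1) * (n * j + n - m) / (m * (m - 1))  # equation (9)
            cov = ELiLj - E[i] * E[j]                              # equation (10)
            sigma2 += 2 * i * j * cov / (E[i] ** 2 * E[j] ** 2) / m ** 2
    return mu, sigma2

mu1, s1 = ap_normal_params(1000, 100)   # first simulation setting
mu2, s2 = ap_normal_params(2000, 500)   # second simulation setting
```

For m = 100, n = 1000 this gives μ ≈ 0.1037, and for m = 500, n = 2000 it gives μ ≈ 0.2522, the theoretical means reported in the simulation study below.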
III. SIMULATION STUDY
We carry out a simulation study to verify the validity of our approach to recall, precision and average precision presented in Section II. We choose two settings, m = 100, n = 1000 and m = 500, n = 2000, and go through the following steps:

1) Generate a random permutation;
2) Calculate the values of recall and precision at t = m (they are equal) and of average precision;
3) Repeat steps 1) and 2) 10000 times to obtain a sampling distribution of recall (precision) and average precision;
4) Find the empirical mean and variance of those 10000 recalls and average precisions.
The theoretical means and variances of recall and precision are calculated from equations (3) and (4), which give E{r(100)} = E{p(100)} = 0.1 and Var{r(100)} = Var{p(100)} = 0.00081 for m = 100, n = 1000, and E{r(500)} = 0.25 and Var{r(500)} = 0.00028 for m = 500, n = 2000. The corresponding empirical means and variances, obtained from the simulation, are 0.099985 and 0.000824 for m = 100, n = 1000 and 0.250023 and 0.0002824 for m = 500, n = 2000, respectively. Table III lists the theoretical and empirical means and variances, as well as selected quantiles, of the distribution of the simulated average precisions, with the first two rows showing the first setting and the next two rows presenting results for the second setting. The histogram of the simulated average precisions for the second setting is presented in Figure 1 with the normal density curve of the limiting distribution overlaid. The agreement is clearly satisfactory.
                            mean     variance    2.5%    50%     97.5%
m=100, n=1000  Empirical    .1037    .0001286    .0876   .1044   .1321
               Theoretical  .1037    .0001392    .0806   .1037   .1268
m=500, n=2000  Empirical    .2527    .000096     .2347   .2521   .2731
               Theoretical  .2522    .000142     .2288   .2522   .2755

TABLE III
COMPARISON BETWEEN EMPIRICAL AND THEORETICAL DISTRIBUTIONS OF SIMULATED AVERAGE PRECISIONS, ON MEANS, VARIANCES AND QUANTILES.
[Figure 1: histogram of the simulated average precisions (x-axis: Average Precision, 0.22 to 0.30; y-axis: Density) with the limiting normal density overlaid.]

Fig. 1. Histogram of simulated average precisions with theoretical density.
IV. CONCLUSION
Recall, precision and average precision are three performance measures for evaluating ranking algorithms in rare target detection problems. Little has been done on the statistical properties of these three measures. In this article, we derived the sampling distributions of recall, precision and average precision under random selection. More importantly, we provided a theoretical way to calculate the limiting distribution of average precision when the number of hits is large. Based on the limiting distribution, hypothesis tests can be conducted to determine whether one ranking algorithm significantly outperforms another in detecting the rare targets.
APPENDIX

1) Derivation of the moments of L_i:

E(L_i) = Σ_{k=i}^{n−m+i} k C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
       = Σ_{k=i}^{n−m+i} i C(k, i) C(n−k, m−i) / ((m/n) C(n, m))
       = in/m.

E(L_i²) = Σ_{k=i}^{n−m+i} k² C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
        = Σ_{k=i}^{n−m+i} k(k+1) C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
          − Σ_{k=i}^{n−m+i} k C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
        = Σ_{k=i}^{n−m+i} i(i+1) C(k+1, i+1) C(n−k, m−i) / ((m(m+1)/(n(n+1))) C(n+1, m+1)) − E(L_i)
        = i(i+1) n(n+1)/(m(m+1)) − in/m
        = in(in + i + n − m)/(m(m+1)).

Var(L_i) = E(L_i²) − E²(L_i) = in(in + i + n − m)/(m(m+1)) − i²n²/m².

For the cross moment, write st = s(t−s) + s(s+1) − s, so that

E(L_i L_j) = Σ_s Σ_t st C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2) = A + B − C,

where

A = Σ_s Σ_t s(t−s) C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = Σ_s Σ_t i(j−i) C(s, i) C(t−s, j−i) C(n−t, m−j) / ((m(m−1)/(n(n−1))) C(n, m))
  = i(j−i) n(n−1)/(m(m−1)),

B = Σ_s Σ_t s(s+1) C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = Σ_s Σ_t i(i+1) C(s+1, i+1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = i(i+1) n(n−1)/(m(m−1)),

C = Σ_s Σ_t s C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = i(n−1)/(m−1).

Hence

E(L_i L_j) = A + B − C = i(n−1)(nj + n − m)/(m(m−1)).

2) Moments of 1/L_i: The second-order Taylor expansion of f(x) = 1/x at x_0 = E(L_i) is given by

f(x) ≈ 1/x_0 − (x − x_0)/x_0² + (x − x_0)²/x_0³.

Hence

1/L_i ≈ 1/E(L_i) − {L_i − E(L_i)}/E²(L_i) + {L_i − E(L_i)}²/E³(L_i)

and

E(1/L_i) ≈ 1/E(L_i) + Var(L_i)/E³(L_i) = (1/E(L_i)) {1 + Var(L_i)/E²(L_i)}.

The second-order Taylor expansion of g(x) = 1/x² at x_0 = E(L_i) is given by

g(x) ≈ 1/x_0² − 2(x − x_0)/x_0³ + 3(x − x_0)²/x_0⁴.

Hence

1/L_i² ≈ 1/E²(L_i) − 2{L_i − E(L_i)}/E³(L_i) + 3{L_i − E(L_i)}²/E⁴(L_i)

and

E(1/L_i²) ≈ 1/E²(L_i) + 3 Var(L_i)/E⁴(L_i) = (1/E²(L_i)) {1 + 3 Var(L_i)/E²(L_i)}.

Then

Var(1/L_i) = E(1/L_i²) − E²(1/L_i) = (Var(L_i)/E⁴(L_i)) {1 − Var(L_i)/E²(L_i)}.

REFERENCES

[1] Bolton, R. J. and Hand, D. J. (2002), Statistical Fraud Detection: A Review, Statistical Science, 17, 235–255.
[2] Fix, E. and Hodges, J. L. (1951), Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties, Report Number 4, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas.
[3] Cover, T. and Hart, P. (1967), Nearest Neighbor Pattern Classification, IEEE Transactions on Information Theory, IT-13, 21–27.
[4] Cortes, C. and Vapnik, V. (1995), Support-Vector Networks, Machine Learning, 20, 273–294.
[5] Zhu, M., Su, W. and Chipman, H. A. (2006), LAGO: A Computationally Efficient Approach for Statistical Detection, Technometrics, 48, 193–205.
[6] Buckland, M. and Gey, F. (1994), The Relationship Between Recall and Precision, Journal of the American Society for Information Science, 45(1), 12–19.
[7] Zhu, M. (2004), Recall, Precision and Average Precision, unpublished technical report, Department of Statistics and Actuarial Science, University of Waterloo.
[8] Aslam, J. A., Yilmaz, E. and Pavlu, V. (2005), A Geometric Interpretation of R-precision and Its Correlation with Average Precision, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 573–574, Salvador, Brazil.
[9] Yeh, A. (2000), More Accurate Tests for the Statistical Significance of Result Differences, Proceedings of the 18th Conference on Computational Linguistics, Volume 2, 947–953.
[10] Goutte, C. and Gaussier, E. (2005), A Probabilistic Interpretation of Precision, Recall and F-score, with Implication for Evaluation, Advances in Information Retrieval, 345–359, Springer.