2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012)
Statistical Inference on Recall, Precision and Average Precision under Random Selection
Peng Zhang Department of Mathematics
Zhejiang University Hangzhou, Zhejiang, 310027 China
pengzzju@gmail.com
Wanhua Su Department of Mathematics and Statistics
Grant MacEwan University Edmonton, Alberta, Canada T5J 4S2
Abstract—The objective of a rare target detection problem is to identify the rare targets as early as possible. Recall, precision and average precision are three popular performance measures for evaluating different detection methods. However, there is little literature on the statistical properties of these three measures. We develop a framework for conducting statistical inference on recall, precision and average precision through establishing their asymptotic properties. Simulations are used to illustrate the idea. The proposed methods can also be applied in other areas where ranking systems need to be evaluated, such as information retrieval.
I. INTRODUCTION

A. Information Retrieval

Information retrieval is the science of searching a huge database for information relevant to a given topic. The information can be in the form of text documents, photos, videos, audio clips or combinations of these media. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines (e.g., Google). The system then computes a numeric score on how well each object in the database matches the query; this score is called the relevance score. The objects are ranked and shown to the user according to their relevance scores, with higher-scoring objects appearing earlier in the list.

B. Rare Target Detection

In a typical rare target detection problem, we have data {y_i, x_i}, where x_i is the predictor vector of the i-th observation and y_i ∈ {0, 1} is its class label. The objective is to identify the rare class-1 observations (targets) as early as possible. In general, the selected targets are passed to the next stage for further investigation. Three typical applications of rare target detection are:

1) Drug Discovery: Here x_i is a vector of descriptors for a chemical compound and y_i indicates whether the compound is considered an active drug agent for a certain disease. Most compounds are inactive and we are interested in detecting the active ones.

2) Credit Card Fraud Detection [1]: Here x_i is a vector of descriptors for a credit card transaction and y_i indicates whether the transaction is fraudulent. Most transactions are not fraudulent and we are interested in catching the frauds.

3) Direct Marketing: Here x_i is a vector of descriptors for a potential customer and y_i indicates whether the customer will respond to the advertisement. Most customers will not respond and we are interested in identifying those most likely to respond.

Rare target detection is analogous to information retrieval by treating the rare targets as relevant objects.
C. Performance Measures
Many measures for evaluating the performance of information retrieval systems have been proposed, among which recall and precision are the most popular ones. Suppose there are n objects in the database, among which m are relevant. Let s_1, ..., s_n be the relevance scores of the objects, s_(1), ..., s_(n) be the corresponding scores sorted in descending order, and y_(1), ..., y_(n) be binary variables indicating whether the i-th ranked object is relevant to the query or not. That is,

    y_(i) = 1 if the i-th ranked object is relevant; y_(i) = 0 if it is irrelevant.

A retrieved item is called a hit if it is a relevant object. Define the hit function h(t) = Σ_{i=1}^{t} y_(i) as the number of hits among the top t objects. Recall at rank t, r(t), is defined as the fraction of relevant objects that are successfully retrieved among the top t objects. Precision at rank t, p(t), is the fraction of the retrieved objects that are relevant to the query. That is,

    r(t) = h(t)/m,    p(t) = h(t)/t.    (1)
Recall and precision can also be used in rare target detection problems. Similar to information retrieval, the first step in solving a rare target detection problem is to rank the items by a score function, which can be modelled by K-nearest neighbours (KNN, e.g., [2], [3]), support vector machines (SVM, e.g., [4]), LAGO [5] or other methods. Then, according to one's budget, the top Q items with the largest scores are classified as class 1 and the remaining items are classified as class 0. The classification result is usually summarized in a confusion matrix, as presented in Table I.
978-1-4673-0024-7/10/$26.00 ©2012 IEEE. Authorized licensed use limited to: Technische Informationsbibliothek (TIB). Downloaded on March 25, 2025 at 12:44:19 UTC from IEEE Xplore. Restrictions apply.
True label    Classification result
              1                     0
1             TP (true positive)    FN (false negative)
0             FP (false positive)   TN (true negative)

TABLE I
CONFUSION MATRIX OF CLASSIFICATION
In the context of rare target detection,

    recall = TP / (TP + FN),    precision = TP / (TP + FP).
Given Q, the ranking system with the larger r(Q) or p(Q) is the better one. However, there is a trade-off between recall and precision: recall r(t) generally increases with t while precision p(t) decreases with t. More details about this relationship between recall and precision can be found in [6] and [7]. Therefore, recall and precision should be taken into account simultaneously when we evaluate or compare different ranking systems. One common way to combine recall and precision is the average precision (AP), which is defined as the average of the precisions at the hits. That is,
    AP = (1/m) Σ_{t=1}^{n} y_(t) h(t)/t = (1/m) Σ_{t=1}^{n} y_(t) (Σ_{j=1}^{t} y_(j)) / t.    (2)
Suppose we need to rank eight items, among which only
three belong to class 1. Table II shows the calculation of the hit function h(t), recall r(t) and precision p(t) at each rank t = 1, ..., 8.
Rank t             1    2    3    4    5    6    7    8
Hit list           1    1    0    1    0    0    0    0
Hit function h(t)  1    2    2    3    3    3    3    3
Recall r(t)        1/3  2/3  2/3  3/3  3/3  3/3  3/3  3/3
Precision p(t)     1/1  2/2  2/3  3/4  3/5  3/6  3/7  3/8

TABLE II
CALCULATION OF HIT FUNCTION, RECALL AND PRECISION AT EACH RANK
Recall that average precision is calculated as the average of the precisions over the hits. Since hits appear at the first, second and fourth places, the average precision is

    AP = (1/3) (1/1 + 2/2 + 3/4).
Plotting p(t) against r(t) gives the recall-precision curve; average precision is an approximation to the area under this curve (e.g., [8]).
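The quantities in Table II and the AP value above can be computed directly from the ranked hit list; a minimal sketch using the eight-item example from the text:

```python
from fractions import Fraction

# Ranked hit list from the eight-item example: hits at ranks 1, 2 and 4.
hits = [1, 1, 0, 1, 0, 0, 0, 0]
n = len(hits)   # total number of objects
m = sum(hits)   # number of relevant objects (targets)

# Hit function h(t): number of hits among the top t objects.
h = [sum(hits[:t]) for t in range(1, n + 1)]

# Recall r(t) = h(t)/m and precision p(t) = h(t)/t at each rank t.
recall = [Fraction(h[t - 1], m) for t in range(1, n + 1)]
precision = [Fraction(h[t - 1], t) for t in range(1, n + 1)]

# Average precision: average of the precisions at the hit positions, eq. (2).
ap = sum(p for y, p in zip(hits, precision) if y == 1) / m

print(ap)  # 11/12, i.e. (1/1 + 2/2 + 3/4)/3
```

Exact fractions are used so the values match Table II term by term.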
D. Existing Methods
To the best of our knowledge, little has been done on making statistical inference on these performance measures. Yeh [9] suggested using the paired t-test, the signed-rank test and the Wilcoxon test for comparing recalls, and a computationally intensive randomization test for precision; average precision was not addressed.
Goutte & Gaussier [10] handled this problem from a Bayesian perspective. They argued that the joint distribution of (TP, FN, FP, TN) follows a multinomial distribution; that, given the number of retrieved items, TP ∼ Binomial(TP + FP, p); and that, given the total number of class-1 observations, TP ∼ Binomial(TP + FN, r). Adopting conjugate Beta priors for precision p and recall r, inference on p and r can be based on the posterior distributions. For comparing two ranking systems, each object falls into one of the following categories:
∙ System 1 gives the correct assignment, system 2 fails;
∙ System 2 gives the correct assignment, system 1 fails;
∙ Both systems yield the same assignment.
Let π1, π2 and π3 be the probabilities that the above events happen. A Dirichlet prior is assigned to (π1, π2, π3), and the posterior distribution of (π1, π2, π3) is another Dirichlet distribution. By drawing samples from the posterior distribution, one can calculate the probability that system 1 is superior to system 2, i.e., Pr(π1 > π2), empirically by the Monte Carlo method.
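The Monte Carlo comparison described above can be sketched with a few lines of Python; the outcome counts x1, x2, x3 below are hypothetical illustrative numbers (not from the paper), and a uniform Dirichlet(1, 1, 1) prior is assumed:

```python
import random

random.seed(0)

# Hypothetical outcome counts for comparing two ranking systems:
# x1: system 1 correct while system 2 fails; x2: the reverse; x3: both agree.
x1, x2, x3 = 30, 18, 152

# With a Dirichlet(1, 1, 1) prior on (pi1, pi2, pi3), the posterior is
# Dirichlet(1 + x1, 1 + x2, 1 + x3).
alpha = (1 + x1, 1 + x2, 1 + x3)

def dirichlet_sample(alpha):
    # A Dirichlet draw is a vector of Gamma draws normalized to sum to 1.
    g = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

draws = 20000
wins = 0
for _ in range(draws):
    pi = dirichlet_sample(alpha)
    wins += pi[0] > pi[1]

prob = wins / draws  # Monte Carlo estimate of Pr(pi1 > pi2 | data)
```

Since x1 > x2 in this hypothetical data, the estimated posterior probability favours system 1.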
The rest of the paper is organized as follows. Section II gives the distributions of recall, precision and average precision under random selection. Our approach to making inference on recall, precision and average precision is also presented. Section III presents a simulation study. We end with a discussion in section IV.
II. DISTRIBUTIONS OF RECALL, PRECISION AND AVERAGE PRECISION UNDER RANDOM SELECTION
A. Distributions of Recall and Precision
As given in Equation (1), recall and precision at rank t are defined as

    r(t) = h(t)/m,    p(t) = h(t)/t,
where the hit function h(t) is the number of hits among the top t retrieved objects. Suppose there are n objects in total, among which m are targets (class 1). Under random selection, every item has the same probability of being retrieved in the top-t list; therefore, the number of hits among the top t items, h(t), follows a hypergeometric distribution with parameters (n, m, t). As a result,
    E{h(t)} = tm/n,    Var{h(t)} = tm(n−m)(n−t) / (n²(n−1)).
Hence, the expected values of recall r(t) and precision p(t) are

    E{r(t)} = t/n,    E{p(t)} = m/n;    (3)

and correspondingly the variances of r(t) and p(t) are

    Var{r(t)} = t(n−m)(n−t) / (m n²(n−1)),    Var{p(t)} = m(n−m)(n−t) / (t n²(n−1)).    (4)
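The hypergeometric moments behind equations (3) and (4) can be checked numerically by direct enumeration of the probability mass function; the values of n, m and t below are arbitrary illustrative choices:

```python
from math import comb

# h(t) ~ Hypergeometric(n, m, t): P(h(t) = k) = C(m,k) C(n-m,t-k) / C(n,t).
n, m, t = 50, 10, 15

pmf = {k: comb(m, k) * comb(n - m, t - k) / comb(n, t)
       for k in range(max(0, t - (n - m)), min(m, t) + 1)}

mean = sum(k * p for k, p in pmf.items())
var = sum(k * k * p for k, p in pmf.items()) - mean ** 2

# E{h(t)} = tm/n and Var{h(t)} = tm(n-m)(n-t) / (n^2 (n-1)).
assert abs(mean - t * m / n) < 1e-9
assert abs(var - t * m * (n - m) * (n - t) / (n ** 2 * (n - 1))) < 1e-9

# Induced moments of recall and precision, equations (3)-(4):
assert abs(mean / m - t / n) < 1e-9   # E{r(t)} = t/n
assert abs(mean / t - m / n) < 1e-9   # E{p(t)} = m/n
```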
B. The Distribution of AP Under Random Selection

Under random selection, the first two moments of average precision can be exactly calculated by establishing the marginal and joint distributions of the locations of the hits. Recall that average precision is calculated as the average of the precisions over the hits. In our example, hits appear at the first, second and fourth places, so the average precision is

    AP = (1/3) (1/1 + 2/2 + 3/4).
Let L_1, L_2, ..., L_m be the ranks (locations) of the m hits. The average precision given in (2) can then be rewritten as

    AP = (1/m) (1/L_1 + 2/L_2 + ... + m/L_m).    (5)

In our example, L_1 = 1, L_2 = 2, L_3 = 4. Viewing average precision in this way, we can calculate its moments by finding the marginal distributions of the L_i's. Suppose there are n observations in total, among which m belong to class 1. Fixing L_i at location k, the total number of distinct ways to arrange the remaining m − 1 class-1 objects among the other n − 1 positions is C(n−1, m−1). The first hit location L_1 can take the values 1, 2, ..., n − m + 1; the second, L_2, takes 2, 3, ..., n − m + 2; and the m-th, L_m, takes m, ..., n.
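The rewriting in equation (5) can be checked against the rank-sum definition (2) on the running eight-item example; a small sketch:

```python
from fractions import Fraction

hits = [1, 1, 0, 1, 0, 0, 0, 0]   # the eight-item example: hits at 1, 2, 4
n, m = len(hits), sum(hits)

# Equation (2): AP = (1/m) * sum_t y_(t) * h(t)/t.
h = 0
ap2 = Fraction(0)
for t, y in enumerate(hits, start=1):
    h += y
    if y:
        ap2 += Fraction(h, t)
ap2 /= m

# Equation (5): AP = (1/m) * sum_i i/L_i, with L_1 = 1, L_2 = 2, L_3 = 4.
locs = [t for t, y in enumerate(hits, start=1) if y == 1]
ap5 = sum(Fraction(i, L) for i, L in enumerate(locs, start=1)) / m

assert ap2 == ap5 == Fraction(11, 12)
```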
In general, the probability that the j-th hit appears at the k-th position, P_jk, is given by

    P_jk = P(L_j = k) = C(k−1, j−1) C(n−k, m−j) / C(n−1, m−1).    (6)

This is due to the fact that the j-th hit being at the k-th position implies j − 1 hits appearing in the previous k − 1 locations and the remaining m − j hits appearing in the last n − k positions. Similarly, the joint distribution of L_i and L_j for i < j can be calculated as

    P(L_i = s, L_j = t) = C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2),    (7)

which means i − 1 hits appear among the first s − 1 locations, j − i − 1 hits between the s-th and t-th positions, and m − j among the last n − t positions. The denominator is the total number of ways of arranging the remaining m − 2 hits among n − 2 locations, fixing L_i and L_j.
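The sampling model underlying these distributions, a uniformly random arrangement of the m targets among n positions, is easy to simulate; a sketch checking the elementary case P(L_1 = 1) = m/n (the chance that a target occupies the first position) for the small example n = 8, m = 3:

```python
import random

random.seed(1)

# Under random selection the m hit locations are a uniform random
# m-subset of {1, ..., n}.
n, m, reps = 8, 3, 40000

hits_at_1 = 0
for _ in range(reps):
    locs = sorted(random.sample(range(1, n + 1), m))  # L_1 < L_2 < L_3
    hits_at_1 += (locs[0] == 1)

# P(L_1 = 1) = m/n = 0.375 for this example.
est = hits_at_1 / reps
```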
Given the marginal distribution (6) and the joint distribution (7), it is straightforward to calculate the expectation and variance of AP given in (5). That is,

    E(AP) = (1/m) Σ_{i=1}^{m} E(i/L_i),

    Var(AP) = E(AP²) − E²(AP),

where

    E(AP²) = (1/m²) { Σ_{i=1}^{m} E(i²/L_i²) + 2 Σ_{i=1}^{m−1} Σ_{j=i+1}^{m} E(ij/(L_i L_j)) },

with

    E(1/L_i) = Σ_{t=1}^{n} (1/t) P(L_i = t),

    E(1/L_i²) = Σ_{t=1}^{n} (1/t²) P(L_i = t),

    E(1/(L_i L_j)) = Σ_{s=1}^{n} Σ_{t=1}^{n} (1/(st)) P(L_i = s, L_j = t).
This method, however, is computationally expensive when m and n are large. The following approach, based on an approximate normal distribution, is then appropriate. We present the result here and leave the detailed derivation to the appendix. With straightforward algebra we have
    E(L_i) = in/m,    (8)

    E(L_i²) = in(in + i + n − m) / (m(m+1)),

    Var(L_i) = in(in + i + n − m) / (m(m+1)) − i²n²/m²,

    E(L_i L_j) = i(n−1)(nj + n − m) / (m(m−1)),    (9)

    Cov(L_i, L_j) = E(L_i L_j) − E(L_i) E(L_j),    (10)

where E(L_i), E(L_j) and E(L_i L_j) are given by equations (8) and (9).
The moments of 1/L_i are obtained with a second-order Taylor expansion of the functions f(x) = 1/x and g(x) = 1/x² at x₀ = E(L_i). The covariance Cov(1/L_i, 1/L_j) is obtained using the delta method at (E(L_i), E(L_j)).
    E(1/L_i) ≈ (1/E(L_i)) {1 + Var(L_i)/E²(L_i)},    (11)

    E(1/L_i²) ≈ (1/E²(L_i)) {1 + 3 Var(L_i)/E²(L_i)},

    Var(1/L_i) ≈ (Var(L_i)/E⁴(L_i)) {1 − Var(L_i)/E²(L_i)},

    Cov(1/L_i, 1/L_j) ≈ Cov(L_i, L_j) / (E²(L_i) E²(L_j)),

where E(L_i), E(L_j) and Cov(L_i, L_j) are given by equations (8) and (10), respectively.

Because AP = (1/n) Σ_{i=1}^{m} E(L_i)/L_i by equation (8), and E{E(L_i)/L_i} ≈ 1 + Var(L_i)/E²(L_i) by equation (11), which goes to 1 when m goes to ∞, it can be shown, using a central limit theorem for weakly dependent variables, that AP is approximately normally distributed, AP ∼ N(μ, σ²), where

    μ = (1/m) Σ_{i=1}^{m} E(i/L_i),    (12)

    σ² = (1/m²) Σ_{i=1}^{m} Var(i/L_i) + (2/m²) Σ_{i<j} Cov(i/L_i, j/L_j).    (13)
III. SIMULATION STUDY
We carry out a simulation study to verify the validity of our approach to recall, precision and average precision presented in Section II. We choose two settings, with m = 100, n = 1000 and with m = 500, n = 2000, and go through the following steps:

1) Generate a random permutation;
2) Calculate the value of recall and precision at t = m (they are the same) and the average precision;
3) Repeat steps 1) and 2) 10000 times to obtain a sampling distribution of recall (precision) and average precision;
4) Find the empirical mean and variance of those 10000 recalls and average precisions.
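The simulation steps above can be sketched in Python; this sketch runs only the first setting and uses fewer repetitions than the paper's 10000, to keep the example fast:

```python
import random

random.seed(2)

# First setting, with a reduced number of repetitions.
n, m, reps = 1000, 100, 2000

labels = [1] * m + [0] * (n - m)
recalls, aps = [], []
for _ in range(reps):
    random.shuffle(labels)               # step 1: a random permutation
    # step 2: recall (= precision) at t = m, and average precision
    recalls.append(sum(labels[:m]) / m)
    h, ap = 0, 0.0
    for t, y in enumerate(labels, start=1):
        if y:
            h += 1
            ap += h / t                  # precision at each hit
    aps.append(ap / m)

mean_r = sum(recalls) / reps             # step 4: empirical moments
mean_ap = sum(aps) / reps
```

With the full 10000 repetitions this reproduces the empirical values reported below.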
The theoretical means and variances of recall and precision are calculated with equations (3) and (4), which give E{r(100)} = E{p(100)} = 0.1 and Var{r(100)} = Var{p(100)} = 0.00081 for m = 100, n = 1000, and E{r(500)} = 0.25 and Var{r(500)} = 0.00028 for m = 500, n = 2000. The corresponding empirical means and variances, obtained from the simulation, are 0.099985 and 0.000824 for m = 100, n = 1000, and 0.250023 and 0.0002824 for m = 500, n = 2000, respectively. Table III lists the theoretical and empirical means and variances, as well as selected quantiles of the distribution of the simulated average precisions, with the first two rows showing the first setting and the next two rows presenting results for the second setting. The histogram of the simulated average precisions for the second setting is presented in Figure 1, with the normal density curve of the limiting distribution overlaid. The agreement is clearly satisfactory.
Setting                       mean    variance   2.5%    50%     97.5%
m=100, n=1000  Empirical      .1037   .0001286   .0876   .1044   .1321
               Theoretical    .1037   .0001392   .0806   .1037   .1268
m=500, n=2000  Empirical      .2527   .000096    .2347   .2521   .2731
               Theoretical    .2522   .000142    .2288   .2522   .2755

TABLE III
COMPARISON BETWEEN EMPIRICAL AND THEORETICAL DISTRIBUTIONS OF SIMULATED AVERAGE PRECISIONS, ON MEANS, VARIANCES AND QUANTILES.
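The theoretical means in Table III can be reproduced from equations (8), (11) and (12); a minimal sketch computing only the mean μ:

```python
def ap_mean(m, n):
    """mu = (1/m) sum_i E(i/L_i), with E(i/L_i) approximated via (8) and (11)."""
    total = 0.0
    for i in range(1, m + 1):
        e = i * n / m                                     # E(L_i), equation (8)
        e2 = i * n * (i * n + i + n - m) / (m * (m + 1))  # E(L_i^2)
        var = e2 - e * e                                  # Var(L_i)
        total += (i / e) * (1 + var / (e * e))            # E(i/L_i), equation (11)
    return total / m

print(round(ap_mean(100, 1000), 4))  # 0.1037, matching Table III
print(round(ap_mean(500, 2000), 4))  # 0.2522, matching Table III
```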
[Figure: histogram of the simulated average precisions for the second setting, with the theoretical normal density overlaid; x-axis: Average Precision (0.22 to 0.30), y-axis: Density.]

Fig. 1. Histogram of simulated average precisions with theoretical density.
IV. CONCLUSION
Recall, precision and average precision are three performance measures for evaluating different ranking algorithms in rare target detection problems. Little has been done on the statistical properties of these three measures. In this article, we derived the sampling distributions of recall, precision and average precision under random selection. More importantly, we provided a theoretical way to calculate the limiting distribution of average precision when the number of hits is large. Based on the limiting distribution, hypothesis tests can be conducted to determine whether one ranking algorithm significantly outperforms another in detecting the rare targets.
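One such test can be sketched as a one-sided z-test against the null hypothesis that the ranking is no better than random selection, using the limiting distribution N(μ, σ²); the null moments below are those of Table III's first setting, and the observed AP value is hypothetical:

```python
from math import erf, sqrt

def ap_z_test(ap_observed, mu, sigma2):
    """One-sided z-test of H0: the ranking is no better than random selection,
    using the limiting distribution AP ~ N(mu, sigma2) under H0."""
    z = (ap_observed - mu) / sqrt(sigma2)
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # P(Z > z) for Z ~ N(0, 1)
    return z, p_value

# Null moments from Table III (m = 100, n = 1000) and a hypothetical
# observed AP of 0.16 from some ranking algorithm.
z, p = ap_z_test(0.16, 0.1037, 0.0001392)
```

A small p-value indicates the algorithm ranks the rare targets significantly earlier than random selection would.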
APPENDIX

1) Derivation of the moments of L_i: Using the marginal distribution (6) and the identity k C(k−1, i−1) = i C(k, i),

    E(L_i) = Σ_{k=i}^{n−m+i} k C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
           = Σ_{k=i}^{n−m+i} i C(k, i) C(n−k, m−i) / C(n−1, m−1)
           = in/m.

Writing k² = k(k+1) − k,

    E(L_i²) = Σ_{k=i}^{n−m+i} k² C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
            = Σ_{k=i}^{n−m+i} k(k+1) C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1) − E(L_i)
            = Σ_{k=i}^{n−m+i} i(i+1) C(k+1, i+1) C(n−k, m−i) / C(n−1, m−1) − E(L_i)
            = i(i+1) n(n+1) / (m(m+1)) − in/m
            = in(in + i + n − m) / (m(m+1)),

and hence

    Var(L_i) = E(L_i²) − E²(L_i) = in(in + i + n − m) / (m(m+1)) − i²n²/m².

For the cross moment, write st = s(t−s) + s(s+1) − s, so that

    E(L_i L_j) = Σ_{s<t} s t C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2) = A + B − C,

where

    A = Σ_{s<t} s(t−s) C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
      = Σ_{s<t} i(j−i) C(s, i) C(t−s, j−i) C(n−t, m−j) / C(n−2, m−2)
      = in(j−i)(n−1) / (m(m−1)),

    B = Σ_{s<t} s(s+1) C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
      = Σ_{s<t} i(i+1) C(s+1, i+1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
      = in(i+1)(n−1) / (m(m−1)),

    C = Σ_{s<t} s C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
      = Σ_{s<t} i C(s, i) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
      = i(n−1) / (m−1).

Combining the three terms,

    E(L_i L_j) = A + B − C = i(n−1)(nj + n − m) / (m(m−1)).

2) Moments of 1/L_i: The second-order Taylor expansion of f(x) = 1/x at x₀ = E(L_i) is given by

    f(x) ≈ 1/x₀ − (x − x₀)/x₀² + (x − x₀)²/x₀³.

Hence

    1/L_i ≈ 1/E(L_i) − {L_i − E(L_i)}/E²(L_i) + {L_i − E(L_i)}²/E³(L_i)

and

    E(1/L_i) ≈ 1/E(L_i) + Var(L_i)/E³(L_i) = (1/E(L_i)) {1 + Var(L_i)/E²(L_i)}.

The second-order Taylor expansion of g(x) = 1/x² at x₀ = E(L_i) is given by

    g(x) ≈ 1/x₀² − 2(x − x₀)/x₀³ + 3(x − x₀)²/x₀⁴.

Hence

    1/L_i² ≈ 1/E²(L_i) − 2{L_i − E(L_i)}/E³(L_i) + 3{L_i − E(L_i)}²/E⁴(L_i)

and

    E(1/L_i²) ≈ 1/E²(L_i) + 3 Var(L_i)/E⁴(L_i) = (1/E²(L_i)) {1 + 3 Var(L_i)/E²(L_i)}.

Then

    Var(1/L_i) = E(1/L_i²) − E²(1/L_i) ≈ (Var(L_i)/E⁴(L_i)) {1 − Var(L_i)/E²(L_i)}.

REFERENCES

[1] Bolton, R. J. and Hand, D. J. (2002), Statistical Fraud Detection: A Review, Statistical Science, 17, 235–255.
[2] Fix, E. and Hodges, J. L. (1951), Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties, Report Number 4, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas.
[3] Cover, T. and Hart, P. (1967), Nearest Neighbor Pattern Classification, IEEE Transactions on Information Theory, IT-13, 21–27.
[4] Cortes, C. and Vapnik, V. (1995), Support-Vector Networks, Machine Learning, 20, 273–297.
[5] Zhu, M., Su, W. and Chipman, H. A. (2006), LAGO: A Computationally Efficient Approach for Statistical Detection, Technometrics, 48, 193–205.
[6] Buckland, M. and Gey, F. (1994), The Relationship Between Recall and Precision, Journal of the American Society for Information Science, 45(1), 12–19.
[7] Zhu, M. (2004), Recall, Precision and Average Precision, unpublished technical report, Department of Statistics and Actuarial Science, University of Waterloo.
[8] Aslam, J. A., Yilmaz, E. and Pavlu, V. (2005), A Geometric Interpretation of R-precision and Its Correlation with Average Precision, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 573–574, Salvador, Brazil.
[9] Yeh, A. (2000), More Accurate Tests for the Statistical Significance of Result Differences, Proceedings of the 18th Conference on Computational Linguistics, Volume 2, 947–953.
[10] Goutte, C. and Gaussier, E. (2005), A Probabilistic Interpretation of Precision, Recall and F-score, with Implication for Evaluation, Advances in Information Retrieval, 345–359, Springer.