2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012)
Statistical Inference on Recall, Precision and Average Precision under Random Selection
Peng Zhang
Department of Mathematics
Zhejiang University
Hangzhou, Zhejiang, 310027 China
pengzzju@gmail.com

Wanhua Su
Department of Mathematics and Statistics
Grant MacEwan University
Edmonton, Alberta, Canada T5J 4S2
Abstract—The objective of a rare target detection problem is to identify the rare targets as early as possible. Recall, precision and average precision are three popular performance measures for evaluating different detection methods. However, there is little literature on the statistical properties of these three measures. We develop a framework for conducting statistical inference on recall, precision and average precision through establishing their asymptotic properties. Simulations are used to illustrate the idea. The proposed methods can also be applied in other areas where ranking systems need to be evaluated, such as information retrieval.
I. INTRODUCTION

A. Information Retrieval

Information retrieval is the science of searching for information relevant to a given topic from a huge database. The information can be in the form of text documents, photos, videos, audio clips or combinations of these media. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines (e.g., Google). The system then computes a numeric score on how well each object in the database matches the query; this score is called the relevance score. The objects are ranked and shown to the user according to the relevance score, with higher-scoring objects appearing earlier in the list.

B. Rare Target Detection

In a typical rare target detection problem, we have data {y_i, x_i}, where x_i is the predictor vector of the ith observation and y_i ∈ {0, 1} is its class label. The objective is to identify the rare class-1 observations (targets) as early as possible. In general, the selected targets are passed to the next stage for further investigation. Three typical applications of rare target detection are:

1) Drug Discovery: Here x_i is a vector of descriptors for a chemical compound and y_i indicates whether the compound is considered an active drug agent for a certain disease. Most compounds are inactive and we are interested in detecting the active ones.

2) Credit Card Fraud Detection [1]: Here x_i is a vector of descriptors for a credit card transaction and y_i indicates whether the transaction is fraudulent. Most transactions are not fraudulent and we are interested in catching the frauds.

3) Direct Marketing: Here x_i is a vector of descriptors for a potential customer and y_i indicates whether the customer will respond to the advertisement or not. Most customers will not respond and we are interested in identifying those most likely to respond.

Rare target detection is analogous to information retrieval by treating the rare targets as relevant objects.

C. Performance Measures
Many measures for evaluating the performance of information retrieval systems have been proposed, among which recall and precision are the most popular ones. Suppose there are n objects in the database, among which m are relevant. Let s_1, ..., s_n be the relevance scores of the objects, s_(1), ..., s_(n) be the corresponding scores sorted in descending order, and y_(1), ..., y_(n) be binary variables indicating whether the ith ranked object is relevant to the query or not. That is,

y_(i) = 1 if the ith object is relevant; 0 if the ith object is irrelevant.

A retrieved item is called a hit if it is a relevant object. Define the hit function h(t) = Σ_{i=1}^t y_(i) as the number of hits among the top t objects. Recall at rank t, r(t), is defined as the fraction of relevant objects that are successfully retrieved among the top t objects. Precision at rank t, p(t), is the fraction of the retrieved objects that are relevant to the query. That is,

r(t) = h(t)/m,   p(t) = h(t)/t.   (1)
Recall and precision can also be used in rare target detection problems. As in information retrieval, the first step in solving a rare target detection problem is to rank the items by a score function, which can be modelled by K-nearest neighbours (KNN, e.g., [2], [3]), support vector machines (SVM, e.g., [4]), LAGO [5] or other methods. Then, according to one's budget, the top Q items with the largest scores are classified as class 1 and the remaining items are classified as class 0. The classification result is usually presented in a confusion matrix, as in Table I.
978-1-4673-0024-7/10/$26.00 ©2012 IEEE
                       Classification result
True label          1                      0
    1        TP (true positive)    FN (false negative)
    0        FP (false positive)   TN (true negative)

TABLE I
CONFUSION MATRIX OF CLASSIFICATION
In the context of rare target detection,

recall = TP/(TP + FN),   precision = TP/(TP + FP).
Given Q, the ranking system that gives the larger r(Q) or p(Q) is the better one. However, there is a trade-off between recall and precision, in that recall r(t) generally increases with t while precision p(t) decreases with t. More details about this relationship between recall and precision can be found in [6] and [7]. Therefore, recall and precision should be taken into account simultaneously when we evaluate or compare different ranking systems. One common way to combine recall and precision is the average precision (AP), which is defined as the average of the precisions at the hits. That is,

AP = (1/m) Σ_{t=1}^n y_(t) h(t)/t = (1/m) Σ_{t=1}^n y_(t) (Σ_{j=1}^t y_(j))/t.   (2)
Suppose we need to rank eight items, among which only three belong to class 1. Table II shows the calculation of the hit function h(t), recall r(t) and precision p(t) at each rank t = 1, ..., 8.
Result at Each Rank

Rank (t)             1    2    3    4    5    6    7    8
Hit list             1    1    0    1    0    0    0    0
Hit function h(t)    1    2    2    3    3    3    3    3
Recall r(t)         1/3  2/3  2/3  3/3  3/3  3/3  3/3  3/3
Precision p(t)      1/1  2/2  2/3  3/4  3/5  3/6  3/7  3/8

TABLE II
CALCULATION OF HIT FUNCTION, RECALL AND PRECISION
Recall that average precision is calculated as the average of the precisions at the hits. Since hits appear at the first, second and fourth places, the average precision is

AP = (1/3) (1/1 + 2/2 + 3/4).
Plotting p(t) against r(t) gives the recall-precision curve; average precision is an approximation to the area under this curve (e.g., [8]).
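As a sketch of these calculations (our own code, not from the paper; the function name is illustrative), recall, precision and average precision can be computed directly from a ranked 0/1 hit list:

```python
def recall_precision_ap(hits, m):
    """Compute r(t), p(t) for t = 1..n and average precision
    from a ranked 0/1 hit list with m relevant items."""
    h = 0                     # running hit function h(t)
    r, p, ap = [], [], 0.0
    for t, y in enumerate(hits, start=1):
        h += y
        r.append(h / m)       # r(t) = h(t)/m, equation (1)
        p.append(h / t)       # p(t) = h(t)/t, equation (1)
        if y:                 # AP averages the precisions at the hits
            ap += h / t
    return r, p, ap / m

# The eight-item example of Table II: hits at ranks 1, 2 and 4
r, p, ap = recall_precision_ap([1, 1, 0, 1, 0, 0, 0, 0], m=3)
```

For this hit list the function returns AP = (1/3)(1/1 + 2/2 + 3/4) = 11/12, matching the worked example above.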
D. Existing Methods

To the best of our knowledge, little has been done on statistical inference for these performance measures. Yeh [9] suggested using paired-t, signed-rank and Wilcoxon tests for comparing recalls, and a computationally intensive randomization test for precision; average precision was not addressed.
Goutte and Gaussier [10] handled this problem from a Bayesian perspective. They argued that the joint distribution of (TP, FN, FP, TN) follows a multinomial distribution; that the conditional distribution of the true positives given the number of retrieved items is TP | TP+FP ~ Binomial(TP+FP, p); and that the conditional distribution of the true positives given the total number of class-1 observations is TP | TP+FN ~ Binomial(TP+FN, r). Adopting conjugate Beta priors for precision p and recall r, inference on p and r can be made from the posterior distributions. For comparing two ranking systems, each object falls into one of the following categories:

∙ System 1 gives the correct assignment, system 2 fails;
∙ System 2 gives the correct assignment, system 1 fails;
∙ Both systems yield the same assignment.

Let π1, π2 and π3 be the probabilities that these events happen. A Dirichlet prior is assigned to (π1, π2, π3), and the posterior distribution of (π1, π2, π3) is another Dirichlet distribution. By drawing samples from the posterior distribution, one can estimate the probability that system 1 is superior to system 2, i.e., Pr(π1 > π2), empirically by the Monte Carlo method.
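A minimal Monte Carlo sketch of this comparison (our illustration; the counts below are hypothetical, not from [10]): with a Dirichlet(1, 1, 1) prior and observed counts (n1, n2, n3) for the three events, the posterior is Dirichlet(1 + n1, 1 + n2, 1 + n3), and Pr(π1 > π2) is estimated from posterior draws.

```python
import random

random.seed(0)

def dirichlet(alphas, rng=random):
    """One draw from a Dirichlet distribution via normalized Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

# Hypothetical counts: system 1 right/system 2 wrong, the reverse, and ties
n1, n2, n3 = 30, 18, 452

# Dirichlet(1, 1, 1) prior -> Dirichlet(1 + n1, 1 + n2, 1 + n3) posterior
post = [1 + n1, 1 + n2, 1 + n3]

draws = 20000
wins = sum(1 for _ in range(draws)
           if (d := dirichlet(post))[0] > d[1])
p_better = wins / draws   # empirical estimate of Pr(pi_1 > pi_2)
```

With these counts the estimate lands well above 0.5, i.e., system 1 is very likely the better one.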
The rest of the paper is organized as follows. Section II gives the distributions of recall, precision and average precision under random selection. Our approach to making inference on recall, precision and average precision is also presented. Section III presents a simulation study. We end with a discussion in section IV.
II. DISTRIBUTIONS OF RECALL, PRECISION AND AVERAGE PRECISION UNDER RANDOM SELECTION
A. Distributions of Recall and Precision
As given in Equation (1), recall and precision at rank t are defined as

r(t) = h(t)/m,   p(t) = h(t)/t,
where the hit function h(t) is the number of hits among the top t retrieved objects. Suppose there are n objects in total, among which m are targets (class 1). Under random selection, every item has the same probability of appearing in the top-t list; therefore, the number of hits among the top t items, h(t), follows a hypergeometric distribution with parameters (n, m, t). As a result,

E{h(t)} = tm/n,   Var{h(t)} = (tm/n) · (n − m)(n − t)/(n(n − 1)).
Hence, the expected values of recall r(t) and precision p(t) are

E{r(t)} = t/n,   E{p(t)} = m/n;   (3)

and correspondingly the variances of r(t) and p(t) are

Var{r(t)} = t(n − m)(n − t)/(m n²(n − 1)),   Var{p(t)} = m(n − m)(n − t)/(t n²(n − 1)).   (4)
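The moments in (3) and (4) can be checked by simulation; a sketch with our own parameter choices (t = m = 100, n = 1000, matching the first setting used later in Section III):

```python
import random

random.seed(1)

n, m, t = 1000, 100, 100      # n objects, m targets, top-t list
reps = 10000

labels = [1] * m + [0] * (n - m)
recalls = []
for _ in range(reps):
    # random selection: a uniformly random top-t subset of the n items
    h = sum(random.sample(labels, t))   # hypergeometric number of hits
    recalls.append(h / m)               # r(t) = h(t)/m

mean_r = sum(recalls) / reps
var_r = sum((x - mean_r) ** 2 for x in recalls) / (reps - 1)

# Theory: E{r(t)} = t/n = 0.1 and
# Var{r(t)} = t(n-m)(n-t)/(m n^2 (n-1)) = 0.00081...
```

The empirical mean and variance agree with t/n and equation (4) up to sampling error.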
B. The Distribution of AP Under Random Selection
Under random selection, the first two moments of average precision can be calculated exactly by establishing the marginal and joint distributions of the locations of the hits. Let L_1, L_2, ..., L_m be the ranks (locations) of the hits; the average precision given in (2) can then be rewritten as

AP = (1/m) (1/L_1 + 2/L_2 + ... + m/L_m).   (5)
In our example, L_1 = 1, L_2 = 2, L_3 = 4. Viewing average precision in this way, we can calculate its moments by finding the marginal distributions of the L_i's. Suppose there are n observations in total, among which m belong to class 1. Fixing L_i at location k, the total number of distinct ways to arrange these m class-1 objects is C(n−1, m−1), where C(a, b) denotes the binomial coefficient "a choose b". The first hit location L_1 can take values 1, 2, ..., n−m+1; the second, L_2, takes 2, 3, ..., n−m+2; and the mth, L_m, takes m, ..., n.
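These ranges can be confirmed by enumerating a small case (our sketch, using n = 8 and m = 3 as in the Table II example):

```python
from itertools import combinations

n, m = 8, 3

# Every equally likely arrangement of m hits among n ranks,
# recorded as a sorted tuple of hit locations (L1, ..., Lm)
arrangements = list(combinations(range(1, n + 1), m))

L1_values = sorted({a[0] for a in arrangements})    # locations the 1st hit can take
Lm_values = sorted({a[-1] for a in arrangements})   # locations the mth hit can take
```

Here there are C(8, 3) = 56 arrangements, L_1 ranges over 1, ..., 6 = n − m + 1 and L_3 over 3, ..., 8, as stated above.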
In general, the probability that the jth hit appears at the kth position, P_jk, is given by

P_jk = P(L_j = k) = C(k−1, j−1) C(n−k, m−j) / C(n−1, m−1).   (6)

This is due to the fact that the jth hit being at the kth position implies j−1 hits among the previous k−1 locations and the remaining m−j hits among the remaining n−k positions.
Similarly, the joint distribution of L_i and L_j for i < j can be calculated as

P(L_i = s, L_j = t) = C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2),   (7)

which means i−1 hits appear among the first s−1 locations, j−i−1 hits between the sth and tth positions, and m−j among the last n−t positions. The denominator is the total number of ways of arranging the remaining m−2 hits among n−2 locations once L_i and L_j are fixed.
Given the marginal distribution (6) and the joint distribution (7), it is straightforward to calculate the expectation and variance of AP given in (5). That is,

E(AP) = (1/m) Σ_{i=1}^m E(i/L_i),   Var(AP) = E(AP²) − E²(AP),

where

E(AP²) = (1/m²) { Σ_{i=1}^m E(i²/L_i²) + 2 Σ_{i=1}^{m−1} Σ_{j=i+1}^m E(ij/(L_i L_j)) }

with

E(1/L_i) = Σ_{t=1}^n (1/t) P(L_i = t),

E(1/(L_i L_j)) = Σ_{s=1}^n Σ_{t=1}^n (1/(st)) P(L_i = s, L_j = t),

E(1/L_i²) = Σ_{t=1}^n (1/t²) P(L_i = t).
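For small n and m, these exact moments can equivalently be obtained by brute-force enumeration of all equally likely hit-location vectors (our sketch; it sidesteps formulas (6) and (7) entirely):

```python
from itertools import combinations

def ap_moments_exact(n, m):
    """Exact E(AP) and Var(AP) under random selection, by enumerating
    all C(n, m) equally likely hit-location vectors (L1, ..., Lm)."""
    aps = []
    for locs in combinations(range(1, n + 1), m):
        # AP = (1/m) sum_i i / L_i, as in equation (5)
        aps.append(sum(i / L for i, L in enumerate(locs, 1)) / m)
    e_ap = sum(aps) / len(aps)
    var_ap = sum((a - e_ap) ** 2 for a in aps) / len(aps)
    return e_ap, var_ap

e_ap, var_ap = ap_moments_exact(8, 3)   # the small example: n = 8, m = 3
```

Enumeration grows combinatorially in n and m, so it is only feasible for small cases.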
This method, however, is computationally expensive when m and n are large. The following approach, based on an approximate normal distribution, is then appropriate. We present the result here and leave the detailed derivation to the appendix. With straightforward algebra we have
E(L_i) = in/m,   (8)

E(L_i²) = in(in + i + n − m)/(m(m + 1)),

Var(L_i) = in(in + i + n − m)/(m(m + 1)) − i²n²/m²,

E(L_i L_j) = i(n − 1)(nj + n − m)/(m(m − 1)),   (9)

Cov(L_i, L_j) = E(L_i L_j) − E(L_i) E(L_j),   (10)

where E(L_i), E(L_j) and E(L_i L_j) are given by equations (8) and (9).
The moments of 1/L_i are obtained with second-order Taylor expansions of the functions f(x) = 1/x and g(x) = 1/x² at x_0 = E(L_i). The covariance Cov(1/L_i, 1/L_j) is obtained using the delta method at (E(L_i), E(L_j)):
E(1/L_i) ≈ (1/E(L_i)) {1 + Var(L_i)/E²(L_i)},   (11)

E(1/L_i²) ≈ (1/E²(L_i)) {1 + 3 Var(L_i)/E²(L_i)},

Var(1/L_i) ≈ (Var(L_i)/E⁴(L_i)) {1 − Var(L_i)/E²(L_i)},

Cov(1/L_i, 1/L_j) ≈ Cov(L_i, L_j)/(E²(L_i) E²(L_j)),
where E(L_i), E(L_j) and Cov(L_i, L_j) are given by equations (8) and (10), respectively. Because AP = (1/n) Σ_{i=1}^m E(L_i)/L_i by equation (8), and E{E(L_i)/L_i} ≈ 1 + Var(L_i)/E²(L_i) by equation (11), which goes to 1 as m goes to ∞, it can be shown, using a central limit theorem for weakly dependent variables, that AP is approximately normally distributed, AP ~ N(μ, σ²), where

μ = (1/m) Σ_{i=1}^m E(i/L_i),   (12)

σ² = (1/m²) Σ_{i=1}^m Var(i/L_i) + (2/m²) Σ_{i<j} Cov(i/L_i, j/L_j).   (13)
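A sketch of the normal-approximation calculation (our code, written by plugging the moments (8)-(10) and expansions (11) into (12) and (13); it is an illustration, not the authors' implementation):

```python
def ap_normal_params(n, m):
    """Approximate mean and variance of AP under random selection,
    plugging the moments (8)-(10) into the expansions (11)-(13)."""
    E = [i * n / m for i in range(m + 1)]                  # E(L_i), equation (8)
    E2 = [i * n * (i * n + i + n - m) / (m * (m + 1)) for i in range(m + 1)]
    V = [E2[i] - E[i] ** 2 for i in range(m + 1)]          # Var(L_i)

    # mu = (1/m) sum_i E(i/L_i), with E(1/L_i) from equation (11)
    mu = sum(i / E[i] * (1 + V[i] / E[i] ** 2) for i in range(1, m + 1)) / m

    # sigma^2 = (1/m^2) sum_i Var(i/L_i) + (2/m^2) sum_{i<j} Cov(i/L_i, j/L_j)
    sigma2 = sum(i * i * V[i] / E[i] ** 4 * (1 - V[i] / E[i] ** 2)
                 for i in range(1, m + 1)) / m ** 2
    for i in range(1, m + 1):
        for j in range(i + 1, m + 1):
            ELiLj = i * (n - 1) * (n * j + n - m) / (m * (m - 1))  # equation (9)
            cov = ELiLj - E[i] * E[j]                              # equation (10)
            sigma2 += 2 * i * j * cov / (E[i] ** 2 * E[j] ** 2) / m ** 2
    return mu, sigma2

mu1, s1 = ap_normal_params(1000, 100)   # first simulation setting
mu2, s2 = ap_normal_params(2000, 500)   # second simulation setting
```

For m = 100, n = 1000 this gives μ ≈ 0.1037, and for m = 500, n = 2000 it gives μ ≈ 0.2522, the theoretical means reported in the simulation study below.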
III. SIMULATION STUDY
We carry out a simulation study to verify the validity of our approach to recall, precision and average precision presented in Section II. We choose two settings, m = 100, n = 1000 and m = 500, n = 2000, and go through the following steps:

1) Generate a random permutation;
2) Calculate the values of recall and precision at t = m (they are equal) and of average precision;
3) Repeat steps 1) and 2) 10000 times to obtain a sampling distribution of recall (precision) and average precision;
4) Find the empirical mean and variance of those 10000 recalls and average precisions.
The theoretical means and variances of recall and precision are calculated from equations (3) and (4), which give E{r(100)} = E{p(100)} = 0.1 and Var{r(100)} = Var{p(100)} = 0.00081 for m = 100, n = 1000, and E{r(500)} = 0.25 and Var{r(500)} = 0.00028 for m = 500, n = 2000. The corresponding empirical means and variances, obtained from the simulation, are 0.099985 and 0.000824 for m = 100, n = 1000 and 0.250023 and 0.0002824 for m = 500, n = 2000, respectively. Table III lists the theoretical and empirical means and variances, as well as selected quantiles, of the distribution of the simulated average precisions, with the first two rows showing the first setting and the next two rows presenting results for the second setting. The histogram of the simulated average precisions for the second setting is presented in Figure 1 with the normal density curve of the limiting distribution overlaid. The agreement is clearly satisfactory.
                            mean     variance    2.5%    50%     97.5%
m=100, n=1000  Empirical    .1037    .0001286    .0876   .1044   .1321
               Theoretical  .1037    .0001392    .0806   .1037   .1268
m=500, n=2000  Empirical    .2527    .000096     .2347   .2521   .2731
               Theoretical  .2522    .000142     .2288   .2522   .2755

TABLE III
COMPARISON BETWEEN EMPIRICAL AND THEORETICAL DISTRIBUTIONS OF SIMULATED AVERAGE PRECISIONS, ON MEANS, VARIANCES AND QUANTILES.
[Figure 1: histogram of the simulated average precisions (x-axis: Average Precision, 0.22 to 0.30; y-axis: Density) with the limiting normal density overlaid.]

Fig. 1. Histogram of simulated average precisions with theoretical density.
IV. CONCLUSION
Recall, precision and average precision are three performance measures for evaluating ranking algorithms in rare target detection problems. Little has been done on the statistical properties of these three measures. In this article, we derived the sampling distributions of recall, precision and average precision under random selection. More importantly, we provided a theoretical way to calculate the limiting distribution of average precision when the number of hits is large. Based on the limiting distribution, hypothesis tests can be conducted to determine whether one ranking algorithm significantly outperforms another in detecting the rare targets.
APPENDIX

1) Derivation of the moments of L_i:

E(L_i) = Σ_{k=i}^{n−m+i} k C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
       = Σ_{k=i}^{n−m+i} i C(k, i) C(n−k, m−i) / ((m/n) C(n, m))
       = in/m.

E(L_i²) = Σ_{k=i}^{n−m+i} k² C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
        = Σ_{k=i}^{n−m+i} k(k+1) C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
          − Σ_{k=i}^{n−m+i} k C(k−1, i−1) C(n−k, m−i) / C(n−1, m−1)
        = Σ_{k=i}^{n−m+i} i(i+1) C(k+1, i+1) C(n−k, m−i) / ((m(m+1)/(n(n+1))) C(n+1, m+1)) − E(L_i)
        = i(i+1) n(n+1)/(m(m+1)) − in/m
        = in(in + i + n − m)/(m(m+1)).

Var(L_i) = E(L_i²) − E²(L_i) = in(in + i + n − m)/(m(m+1)) − i²n²/m².

For the cross moment, write st = s(t−s) + s(s+1) − s, so that

E(L_i L_j) = Σ_s Σ_t st C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2) = A + B − C,

where

A = Σ_s Σ_t s(t−s) C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = Σ_s Σ_t i(j−i) C(s, i) C(t−s, j−i) C(n−t, m−j) / ((m(m−1)/(n(n−1))) C(n, m))
  = i(j−i) n(n−1)/(m(m−1)),

B = Σ_s Σ_t s(s+1) C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = Σ_s Σ_t i(i+1) C(s+1, i+1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = i(i+1) n(n−1)/(m(m−1)),

C = Σ_s Σ_t s C(s−1, i−1) C(t−s−1, j−i−1) C(n−t, m−j) / C(n−2, m−2)
  = i(n−1)/(m−1).

Hence

E(L_i L_j) = A + B − C = i(n−1)(nj + n − m)/(m(m−1)).

2) Moments of 1/L_i: The second-order Taylor expansion of f(x) = 1/x at x_0 = E(L_i) is given by

f(x) ≈ 1/x_0 − (x − x_0)/x_0² + (x − x_0)²/x_0³.

Hence

1/L_i ≈ 1/E(L_i) − {L_i − E(L_i)}/E²(L_i) + {L_i − E(L_i)}²/E³(L_i)

and

E(1/L_i) ≈ 1/E(L_i) + Var(L_i)/E³(L_i) = (1/E(L_i)) {1 + Var(L_i)/E²(L_i)}.

The second-order Taylor expansion of g(x) = 1/x² at x_0 = E(L_i) is given by

g(x) ≈ 1/x_0² − 2(x − x_0)/x_0³ + 3(x − x_0)²/x_0⁴.

Hence

1/L_i² ≈ 1/E²(L_i) − 2{L_i − E(L_i)}/E³(L_i) + 3{L_i − E(L_i)}²/E⁴(L_i)

and

E(1/L_i²) ≈ 1/E²(L_i) + 3 Var(L_i)/E⁴(L_i) = (1/E²(L_i)) {1 + 3 Var(L_i)/E²(L_i)}.

Then

Var(1/L_i) = E(1/L_i²) − E²(1/L_i) = (Var(L_i)/E⁴(L_i)) {1 − Var(L_i)/E²(L_i)}.

REFERENCES

[1] Bolton, R. J. and Hand, D. J. (2002), Statistical Fraud Detection: A Review, Statistical Science, 17, 235–255.
[2] Fix, E. and Hodges, J. L. (1951), Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties, Report Number 4, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas.
[3] Cover, T. and Hart, P. (1967), Nearest Neighbor Pattern Classification, IEEE Transactions on Information Theory, IT-13, 21–27.
[4] Cortes, C. and Vapnik, V. (1995), Support-Vector Networks, Machine Learning, 20, 273–294.
[5] Zhu, M., Su, W. and Chipman, H. A. (2006), LAGO: A Computationally Efficient Approach for Statistical Detection, Technometrics, 48, 193–205.
[6] Buckland, M. and Gey, F. (1994), The Relationship Between Recall and Precision, Journal of the American Society for Information Science, 45(1), 12–19.
[7] Zhu, M. (2004), Recall, Precision and Average Precision, unpublished technical report, Department of Statistics and Actuarial Science, University of Waterloo.
[8] Aslam, J. A., Yilmaz, E. and Pavlu, V. (2005), A Geometric Interpretation of R-precision and Its Correlation with Average Precision, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 573–574, Salvador, Brazil.
[9] Yeh, A. (2000), More Accurate Tests for the Statistical Significance of Result Differences, Proceedings of the 18th Conference on Computational Linguistics, Volume 2, 947–953.
[10] Goutte, C. and Gaussier, E. (2005), A Probabilistic Interpretation of Precision, Recall and F-score, with Implication for Evaluation, Advances in Information Retrieval, 345–359, Springer.