This commit is contained in:
Johannes Paehr
2025-10-26 17:12:09 +01:00
parent c4354c0441
commit 376edbdfbc
19 changed files with 3256 additions and 253 deletions

Binary file not shown.

View File

@@ -1 +1 @@
{"pageIndex":0,"scale":"page-width","top":579,"left":-14,"scrollMode":0,"spreadMode":0}
{"pageIndex":0,"scale":"page-width","top":579,"left":-9,"scrollMode":0,"spreadMode":0}

View File

@@ -0,0 +1,473 @@
J. Mol. Biol. (1970) 48, 443-453
A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins
SAUL B. NEEDLEMAN AND CHRISTIAN D. WUNSCH Department of Biochemistry, Northwestern University, and
Nuclear Medicine Service, V. A. Research Hospital Chicago, Ill. 60611, U.S.A.
(Received 21 July 1969)
A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed. From these findings it is possible to determine whether significant homology exists between the proteins. This information is used to trace their possible evolutionary development.
The maximum match is a number dependent upon the similarity of the sequences. One of its definitions is the largest number of amino acids of one protein that can be matched with those of a second protein allowing for all possible interruptions in either of the sequences. While the interruptions give rise to a very large number of comparisons, the method efficiently excludes from consideration those comparisons that cannot contribute to the maximum match.
Comparisons are made from the smallest unit of significance, a pair of amino acids, one from each protein. All possible pairs are represented by a two-dimensional array, and all possible comparisons are represented by pathways through the array. For this maximum match only certain of the possible pathways must be evaluated. A numerical value, one in this case, is assigned to every cell in the array representing like amino acids. The maximum match is the largest number that would result from summing the cell values of every pathway.
1. Introduction
The amino acid sequences of a number of proteins have been compared to determine whether the relationships existing between them could have occurred by chance. Generally, these sequences are from proteins having closely related functions and are so similar that simple visual comparisons can reveal sequence coincidence. Because the method of visual comparison is tedious and because the determination of the significance of a given result usually is left to intuitive rationalization, computer-based statistical approaches have been proposed (Fitch, 1966; Needleman & Blair, 1969).
Direct comparison of two sequences, based on the presence in both of corresponding amino acids in an identical array, is insufficient to establish the full genetic relationships between the two proteins. Allowance for gaps (Braunitzer, 1965) greatly multiplies the number of comparisons that can be made but introduces unnecessary and partial comparisons.
2. A General Method for Sequence Comparison
The smallest unit of comparison is a pair of amino acids, one from each protein. The maximum match can be defined as the largest number of amino acids of one protein that can be matched with those of another protein while allowing for all possible deletions.
The maximum match can be determined by representing in a two-dimensional array all possible pair combinations that can be constructed from the amino acid sequences of the proteins, A and B, being compared. If the amino acids are numbered from the N-terminal end, Aj is the jth amino acid of protein A and Bi is the ith amino acid of protein B. The Aj represent the columns and the Bi the rows of the two-dimensional array, MAT. Then the cell, MATij, represents a pair combination that contains Aj and Bi.
Every possible comparison can now be represented by pathways through the array. An i or j can occur only once in a pathway because a particular amino acid cannot occupy more than one position at one time. Furthermore, if MATmn is part of a pathway including MATij, the only permissible relationships of their indices are m > i, n > j or m < i, n < j. Any other relationships represent permutations of one or both amino acid sequences which cannot be allowed since this destroys the significance of a sequence. Then any pathway can be represented by MATab . . . MATyz, where a ≥ 1, b ≥ 1, the i and j of all subsequent cells of MAT are larger than the running indices of the previous cell and y ≤ K, z ≤ M, the total number of amino acids comprising the sequences of proteins A and B, respectively. A pathway is signified by a line connecting cells of the array. Complete diagonals of the array contain no gaps. When MATij and MATmn are part of a pathway, i - m ≠ j - n is a sufficient, but not necessary, condition for a gap to occur. A necessary pathway through MAT is defined as one which begins at a cell in the first column or the first row. Both i and j must increase in value; either i or j must increase by only one but the other index may increase by one or more. This leads to the next cell in a MAT pathway. This procedure is repeated until either i or j, or both, equal their limiting values, K and M, respectively. Every partial or unnecessary pathway will be contained in at least one necessary pathway.
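The pathway rules above can be restated as a small check. The following Python sketch is illustrative only and is not taken from the paper: the function names `is_permissible_pathway` and `count_gaps` are hypothetical, and cells are assumed to be given as (i, j) index pairs in pathway order.

```python
def is_permissible_pathway(cells):
    """Check the step rule for consecutive cells of a pathway.

    cells: list of (i, j) index pairs in pathway order (an assumed encoding).
    """
    for (i, j), (m, n) in zip(cells, cells[1:]):
        # Both running indices must increase from one cell to the next.
        if not (m > i and n > j):
            return False
        # One index advances by exactly one; the other by one or more.
        if m - i != 1 and n - j != 1:
            return False
    return True


def count_gaps(cells):
    """Count steps that leave the current diagonal (i - m differs from j - n)."""
    return sum(1 for (i, j), (m, n) in zip(cells, cells[1:]) if m - i != n - j)
```

For instance, the pathway (1, 1), (2, 2), (3, 5) satisfies the rule and leaves its diagonal once, whereas (1, 1), (3, 3) is not a single permissible step because neither index advances by exactly one.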
In the simplest method, MATij is assigned the value, one, if Aj is the same kind of amino acid as Bi; if they are different amino acids, MATij is assigned the value, zero. The sophistication of the comparison is increased if, instead of zero or one, each cell value is made a function of the composition of the proteins, the genetic code triplets representing the amino acids, the neighboring cells in the array, or any theory concerned with the significance of a pair of amino acids. A penalty factor, a number subtracted for every gap made, may be assessed as a barrier to allowing the gap. The penalty factor could be a function of the size and/or direction of the gap. No gap would be allowed in the operation unless the benefit from allowing that gap would exceed the barrier. The maximum-match pathway then, is that pathway for which the sum of the assigned cell values (less any penalty factors) is largest. MAT can be broken up into subsections operated upon independently. The method also can be expanded to allow simultaneous comparison of several proteins using the amino acid sequences of n proteins to generate an n-dimensional array whose cells represent all possible combinations of n amino acids, one from each protein.
The maximum-match pathway can be obtained by beginning at the terminals of the sequences (i = y, j = z) and proceeding toward the origins, first by adding to the value of each cell possessing indices i = y - 1 and/or j = z - 1, the maximum value from among all the cells which lie on a pathway to it. The process is repeated for indices i = y - 2 and/or j = z - 2. This increment in the indices is continued until all cells in the matrix have been operated upon. Each cell in this outer row or column will contain the maximum number of matches that can be obtained by originating
any pathway at that cell and the largest number in that row or column is equal to the maximum match; the maximum-match pathway in any row or column must begin at this number. The operation of successive summations of cell values is illustrated in Figures 1 and 2.
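The operation of successive summations can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' FORTRAN program: the function name `max_match_array` is hypothetical, identical pairs score one and all others zero, no penalty is applied, and both sequences are assumed non-empty. The two sequences are the ones shown along the borders of the array in Figures 1 and 2.

```python
def max_match_array(a, b):
    """Return the summed array; columns follow sequence a, rows follow sequence b."""
    rows, cols = len(b), len(a)
    # Cell value one for identical pairs, zero otherwise.
    s = [[1 if b[i] == a[j] else 0 for j in range(cols)] for i in range(rows)]
    # Work from the last row towards the first. Cells in the last row and
    # last column have no successors and keep their original values.
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 2, -1, -1):
            # Search the subrow below and the subcolumn to the right.
            best = max(
                max(s[i + 1][n] for n in range(j + 1, cols)),
                max(s[m][j + 1] for m in range(i + 1, rows)),
            )
            s[i][j] += best
    return s


summed = max_match_array("ABCNJROCLCRPM", "AJCJNRCKCRBP")
# Largest number in the first row or first column.
maximum_match = max(summed[0] + [row[0] for row in summed])
```

For these two sequences `maximum_match` evaluates to 8, the value quoted in the caption of Figure 2.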
[Figure 1: comparison array, partially summed; column sequence ABCNJROCLCRPM.]
FIG. 1. The maximum-match operation for necessary pathways. The number contained in each cell of the array is the largest number of identical pairs that can be found if that cell is the origin for a pathway which proceeds with increases in running indices. Identical pairs of amino acids were given the value of one. Blank cells which represent non-identical pairs have the value, zero. The operation of successive summations was begun at the last row of the array and proceeded row-by-row towards the first row. The operation has been partially completed in the R row. The enclosed cell in this row is the site of the cell operation which consists of a search along the subrow and subcolumn indicated by borders for the largest value, 4 in subrow C. This value is added to the cell from which the search began.
[Figure 2: completed comparison array; column sequence ABCNJROCLCRPM, row sequence AJCJNRCKCRBP.]
FIG. 2. Contributors to the maximum match in the completed array. The alternative pathways that could form the maximum match are illustrated. The maximum match terminates at the largest number in the first row or first column, 8 in this case.
It is apparent that the above array operation can begin at any of a number of points along the borders of the array, which is equivalent to a comparison of N-terminal residues or C-terminal residues only. As long as the appropriate rules for pathways are followed, the maximum match will be the same. The cells of the array which contributed to the maximum match may be determined by recording the origin of the number that was added to each cell when the array was operated upon.
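Recording the origin of each added number could look like the sketch below. It is an illustration under assumptions rather than the paper's program: the names `max_match_with_origins` and `trace_pathway` are hypothetical, cells score one for identical pairs and zero otherwise, no penalty is applied, and ties are resolved in favour of the diagonal neighbour.

```python
def max_match_with_origins(a, b):
    """Successive summation that also remembers which cell was added."""
    rows, cols = len(b), len(a)
    score = [[1 if b[i] == a[j] else 0 for j in range(cols)] for i in range(rows)]
    origin = [[None] * cols for _ in range(rows)]
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 2, -1, -1):
            # Diagonal neighbour first, so ties prefer the gap-free step.
            candidates = [(i + 1, n) for n in range(j + 1, cols)]
            candidates += [(m, j + 1) for m in range(i + 2, rows)]
            best = max(candidates, key=lambda c: score[c[0]][c[1]])
            origin[i][j] = best
            score[i][j] += score[best[0]][best[1]]
    return score, origin


def trace_pathway(score, origin):
    """Follow recorded origins from the largest border value to the end."""
    rows, cols = len(score), len(score[0])
    border = [(0, j) for j in range(cols)] + [(i, 0) for i in range(1, rows)]
    cell = max(border, key=lambda c: score[c[0]][c[1]])
    path = []
    while cell is not None:
        path.append(cell)
        cell = origin[cell[0]][cell[1]]
    return path
```

The cells on the returned path whose two residues are identical are the contributors to the maximum match; only one of the possibly several alternative pathways is reported.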
3. Evaluating the Significance of the Maximum Match
A given maximum match may represent the maximum number of amino acids matched, or it may just be a number that is a complex function of the relationship between sequences. It will, however, always be a function of both the amino acid compositions of the proteins and the relationship between their sequences. One may ask whether a particular result found differs significantly from a fortuitous match between two random sequences. Ideally, one would prefer to know the exact probability of obtaining the result found from a pair of random sequences and what fraction of the total possibilities are less probable, but that is prohibitively difficult, especially if a complex function were used for assigning a value to the cells.
As an alternative to determining the exact probabilities, it is possible to estimate the probabilities experimentally. To accomplish the estimate one can construct two sets of random sequences, a set from the amino acid composition of each of the proteins compared. Pairs of random sequences can then be formed by randomly drawing one member from each set. Determining the maximum match for each pair selected will yield a set of random values. If the value found for the real proteins is significantly different from the values found for the random sequences, the difference is a function of the sequences alone and not of the compositions. Alternatively, one can construct random sequences from only one of the proteins and compare them with the other to determine a set of random values. The two procedures measure different probabilities. The first procedure determines whether a significant relationship exists between the real sequences. The second procedure determines whether the relationship of the protein used to form the random sequences to the other proteins is significant. It bears reiterating that the integral amino acid composition of each random sequence must be equal to that of the protein it represents.
The amino acid sequence of each protein compared belongs to a set of sequences which are permutations. Sequences drawn randomly from one or both of these sets are used to establish a distribution of random maximum-match values which would include all possible values if enough comparisons were made. The null hypothesis, that any sequence relationship manifested by the two proteins is a random one, is tested. If the distribution of random values indicates a small probability that a maximum match equal to, or greater than, that found for the two proteins could be drawn from the random set, the hypothesis is rejected.
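The second procedure described above (randomizing only one of the two sequences) might be sketched as follows. Everything named here is an assumption made for illustration: `lcs_length` is a stand-in scoring function that counts identical pairs with all gaps allowed, `random_values` and `standardized_value` are hypothetical helper names, and the sample standard deviation (n - 1 denominator) is assumed because the source does not say which estimator was used.

```python
import random
import statistics


def lcs_length(a, b):
    """Stand-in maximum match: identical pairs count one, gaps are free."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for k, y in enumerate(b):
            cur.append(prev[k] + 1 if x == y else max(prev[k + 1], cur[k]))
        prev = cur
    return prev[-1]


def random_values(seq_to_shuffle, other, score=lcs_length, n=10, seed=0):
    """Maximum-match values for n shuffled copies of one sequence."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        letters = list(seq_to_shuffle)
        rng.shuffle(letters)  # a permutation keeps the composition unchanged
        values.append(score("".join(letters), other))
    return values


def standardized_value(real, values):
    """(real - mean of random values) / estimated standard deviation.

    Raises ZeroDivisionError if all random values are identical.
    """
    return (real - statistics.mean(values)) / statistics.stdev(values)
```

A large positive standardized value would then be the basis for rejecting the null hypothesis that the observed relationship is a random one.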
4. Cell Values and Weighting Factors
To provide a theoretical framework for experiments, amino acid pairs may be classified into two broad types, identical and non-identical pairs. From 20 different amino acids one can construct 190 possible non-identical pairs. Of these, 75 pairs of amino acids have codons (Marshall, Caskey & Nirenberg, 1967) whose bases differ at only one position (Eck & Dayhoff, 1966). Each change is presumably the result of a
single-point mutation. The majority of non-identical pairs have a maximum of only one or zero corresponding bases. Due to the degeneracy of the genetic code, pair differences representing amino acids with no possible corresponding bases are uncommon even in randomly selected pairs. If cells are weighted in accordance with the maximum number of corresponding bases in codons of the represented amino acids, the maximum match will be a function of identical and non-identical pairs. For comparisons in general, the cell weights can be chosen on any basis.
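The weighting by corresponding bases can be illustrated with a short Python sketch. It assumes the standard genetic code written with DNA letters (T in place of U) in the conventional TCAG ordering, with the three stop codons left out; the name `pair_type` is hypothetical, and the codon table actually used by the authors may differ in detail.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each one-letter amino acid code to its list of codons.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    if aa != "*":
        CODONS.setdefault(aa, []).append("".join(codon))


def pair_type(x, y):
    """Maximum number of corresponding bases over all codon pairs for x and y."""
    return max(
        sum(p == q for p, q in zip(c1, c2))
        for c1 in CODONS[x]
        for c2 in CODONS[y]
    )
```

Under this table an identical pair gives 3, methionine with isoleucine gives 2, methionine with tryptophan gives 1, and methionine with cysteine gives 0. Counting the non-identical pairs for which `pair_type` returns 2 is one way to check the figure of 75 quoted above.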
If every possible sequence gap is allowed in forming the maximum match, the significance of the maximum match is enhanced by decreasing the weight of those pathways containing a large number of gaps. A simple way to accomplish this is to assign a penalty factor, a number which is subtracted from the maximum match for each gap used to form it. The penalty is assigned before the maximum match is formed. Thus the pathways will be weighted according to the number of gaps they contain, but the nature of the contributors to the maximum match will be affected as well. In proceeding from one cell to the next in a maximum-match pathway, it is necessary that the difference between each cell value and the penalty, be greater than the value for a cell in a pathway that contains no gap. If the value of the penalty were zero, all possible gaps could be allowed. If the value were equal to the theoretical value for the maximum match between two proteins, it would be impossible to allow a gap and the maximum match would be the largest of the values found by simply summing along the diagonals of the array; this is the simple frame-shift method.
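The two limiting cases described above (zero penalty and a prohibitively large penalty) can be explored with the following sketch. It is a simplified illustration, not the authors' code: the name `max_match` is hypothetical, cells score one or zero, no cell value is negative, and a single constant penalty is charged for each gap regardless of its size or direction.

```python
def max_match(a, b, penalty=0.0):
    """Maximum match with a constant penalty subtracted for every gap."""
    rows, cols = len(b), len(a)
    s = [[1.0 if b[i] == a[j] else 0.0 for j in range(cols)] for i in range(rows)]
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 2, -1, -1):
            # Continuing along the diagonal introduces no gap.
            no_gap = s[i + 1][j + 1]
            # Any other successor in the subrow or subcolumn needs a gap.
            gapped = [s[i + 1][n] for n in range(j + 2, cols)]
            gapped += [s[m][j + 1] for m in range(i + 2, rows)]
            s[i][j] += max([no_gap] + [g - penalty for g in gapped])
    # The maximum match is the largest value in the first row or column.
    return max(s[0] + [row[0] for row in s])
```

With `penalty=0` every gap is free; with a penalty larger than any attainable match the search never leaves a diagonal, so the result reduces to the best sum along a single diagonal, which corresponds to the frame-shift method.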
5. Application of the Method
To illustrate the role of weighting factors in evaluating a maximum match, two proteins expected to show homology, whale myoglobin (Edmundson, 1965) and human β-hemoglobin (Konigsberg, Goldstein & Hill, 1963), and two proteins not expected to exhibit homology, bovine pancreatic ribonuclease (Smyth, Stein & Moore, 1963) and hen's egg lysozyme (Canfield, 1963) were chosen for comparisons.
The FORTRAN programs used in this study were written for the CDC 3400 computer. The operations employed in forming the maximum match are those for the special case when none of the cells of the array have a value less than zero. Four types of amino acid pairs were distinguished and variable sets consisting of values to be assigned to each type of pair and a value for the penalty were established. The pair types are as follows:
Type 3. Identical pairs: those having a maximum of three corresponding bases in their codons.
Type 2. Pairs having a maximum of two corresponding bases in their codons.
Type 1. Pairs having a maximum of one corresponding base in their codons.
Type 0. Pairs having no possible corresponding base in their codons.
The value for type 3 pairs was 1.0 and the value for type 0 pairs was zero for all variable sets.
At program execution time, the amino acids (coded by two-digit numbers) of the sequences to be compared were read into the computer, and were followed by a twenty-by-twenty symmetrical array, the maximum correspondence array, analogous to one used by Fitch (1966), that contained all possible pairs of amino acids and identified each pair as to type. The RNA codons for amino acids used to construct the maximum-correspondence array were taken from a single Table (Marshall et al.,
1967). The UGA, UAA and UAG codons were not used, but UUG was used as a codon for leucine. The subsequent data cards indicated the numerical values for a variable set.
The two-dimensional comparison array was generated row-by-row. The amino acid code numbers for Ai and Bj referenced the correspondence array to determine the type of amino acid pair constituted by Ai and Bj. The type number referenced a short array, the variable set, containing the type values, and the appropriate value from that set was assigned to the appropriate cell of the comparison array. The maximum match was then determined by the procedure of successive summations.
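The two-step lookup described above (residue codes into the correspondence array, then pair type into the variable set) is sketched below. The three-letter alphabet, the `CODES` mapping and the `CORRESPONDENCE` entries are invented purely for illustration and do not reproduce the paper's twenty-by-twenty array; the type values are those listed for variable set 3 in Table 1.

```python
# Hypothetical residue codes and pair types for a toy three-letter alphabet.
CODES = {"A": 0, "B": 1, "C": 2}
CORRESPONDENCE = [
    [3, 2, 0],
    [2, 3, 1],
    [0, 1, 3],
]

# Values assigned to pair types 3, 2, 1 and 0 in variable set 3 of Table 1.
VARIABLE_SET_3 = {3: 1.0, 2: 0.67, 1: 0.33, 0: 0.0}


def comparison_array(seq_a, seq_b, variable_set):
    """Build the comparison array row by row (rows follow seq_b, columns seq_a)."""
    array = []
    for residue_b in seq_b:
        row = []
        for residue_a in seq_a:
            # Symmetric array, so the order of the two codes does not matter.
            pair = CORRESPONDENCE[CODES[residue_b]][CODES[residue_a]]
            row.append(variable_set[pair])
        array.append(row)
    return array
```

Keeping the pair types and the type values in separate tables means a different variable set can be tried without rebuilding the correspondence array.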
Following the determination of the maximum match for the real proteins, the amino acid sequence of only one member of the protein pair was randomized and the match was repeated. The sequences of β-hemoglobin and ribonuclease were the ones randomized. The randomization procedure was a sequence shuffling routine based on computer-generated random numbers. A cycle of sequence randomization-maximum-match determination was repeated ten times in all of the experiments in this report, giving the random values used for comparison with the real maximum-match. The average and standard deviation for the random values of each variable set was estimated.
6. Results and Discussion
The use of a small random sample size (ten) was necessary to hold the computer time to a reasonable level. The maximum probable error in a standard deviation estimate for a sample this small is quite large and the results should be judged with this fact in mind. For each set of variables, it was assumed that the random values would be distributed in the fashion of the normal-error curve; therefore, the values of the first six random sets in the β-hemoglobin-myoglobin comparison were converted to standard measure, five was added to the result, and these values were plotted as one group against their calculated probit. The results of the plot are shown in Figure 3. The fit is good indicating the probable adequacy of the measured standard deviations for these variable sets in estimating distribution functions for random values through two standard deviations. The above fit indicates no bias in the randomization procedure. In other words, randomization of the sequence was complete before the maximum match was determined for any sequence in a random set.
The results obtained in the comparison of β-hemoglobin with myoglobin are summarized in Table 1 and the results for the ribonuclease-lysozyme comparison are in Table 2. These Tables indicate the values assigned to the pair types, the penalty factor used in forming each of the maximum matches, and the statistical results obtained. The number of gaps roughly characterizes the nature of the pathway that formed the maximum match. A large number is indicative of a devious pathway through the array. One gap means that all of the pathway may be found on only two partial diagonals of the array.
The most important information is obtained from the standardized value of the maximum match for the real proteins, the difference from the mean in standard-deviation units. For this sample size all deviations greater than 3.0 were assumed to include less than 1% of the true random population and to indicate a significant difference. As might be expected, all matches of myoglobin and β-hemoglobin show a significant deviation. Among the sets of variables, set 1, which results in a search for identical amino acid pairs while allowing for all deletions, indicates that 63
[Figure 3: probit plot; horizontal axis: variable in standardized measure.]
FIG. 3. Probit plot for six grouped random samples. The solid line indicates the plot that would result from a probit analysis on an infinite number of samples from a normally-distributed population. The points represent the results of probit calculations on 60 random maximum match values that were assumed to have come from one population.
TABLE 1
β-Hemoglobin-myoglobin maximum matches

           Match values for              Maximum-match
Variable   pair types                    value sum                            Minimum deletions
set        2        1        Penalty     Real      Random†    s       X       Real    Random†
1          0        0        0           63.00     55.60      1.80    4.11    35      36.2
2          0        0        1.00        38.00     27.80      2.09    4.88    4       5.5
3          0.67     0.33     0           97.00     91.47      1.55    3.57    18      24.3
4          0.67     0.33     1.03        89.63     80.25      1.11    8.46    1       3.6
5          0.25     0.05     0           71.55     64.78      1.59    4.27    46      45.0
6          0.25     0.05     1.05        51.95     40.54      1.46    7.80    3       7.5
7          0.25     0.05     25          47.30     33.80      1.52    8.87    0       0

s is the estimated standard deviation; X, the standardized value, (real - random)/s, of the maximum match of the real proteins. The values for type 3 and type 0 pairs were 1.0 and 0, respectively, in each variable set.
† An average value from 10 samples.
TABLE 2
Ribonuclease-lysozyme maximum matches

           Match values for              Maximum-match
Variable   pair types                    value sum                            Minimum deletions
set        2        1        Penalty     Real      Random†    s       X       Real    Random†
1          0        0        0           48.00     44.20      2.56    1.48    34      20.2
2          0        0        1.00        23.00     22.00      1.73    0.58    5       [illegible]
3          0.67     0.33     0           78.33     76.17      0.82    2.64    21      18.8
4          0.67     0.33     1.03        67.93     67.37      1.27    0.43    2       2.2
5          0.25     0.05     0           56.00     52.26      2.12    1.77    35      33.5
6          0.25     0.05     1.05        33.70     33.02      1.66    0.41    8       6.8
7          0.25     0.05     25          28.15     27.67      1.75    0.22    0       0

s is the estimated standard deviation; X, the standardized value, (real - random)/s, of the maximum match of the real proteins. The values for type 3 and type 0 pairs were 1.0 and 0, respectively, in each variable set.
† An average value from 10 samples.
amino acids in β-hemoglobin and myoglobin can be matched. To attain this match, however, it is necessary to permit at least 35 gaps. In contrast, when two gaps are allowed according to Braunitzer (1965), it is possible to match only 37 of the amino acids. Curiously, when this variable set was used for comparing human myoglobin (Hill, personal communication) with human β-hemoglobin, the maximum match obtained was not significant. Differences between real and random values were highly significant, however, when other variable sets were used.
Variable set 2 attaches a penalty equal to the value of one identical amino acid pair to the search for identical amino acid pairs. This penalty will exclude from consideration any possible pathway that leaves and returns to a principal diagonal, thereby needing two gaps, in order to add only one or two amino acids to the maximum match. This set results in a total of 38 + 4 = 42 amino acids matched (the maximum-match value plus the penalties subtracted for the gaps, whose number is reduced to four) and the significance of the result relative to set 1 appears to be increased. Braunitzer's comparison would have a value of 37 - 2 = 35 using this variable set, hence it was not selected by the method.
Variable sets 3 and 4 have an interesting property. Their maximum-match values can be related to the minimum number of mutations needed to convert the selected parts of one amino acid sequence into the selected parts of the other. The minimum number of mutations concept in protein comparisons was first advanced by Fitch (1966). If the type values for these sets are multiplied by three, they become equal to their pair type and directly represent the maximum number of corresponding bases in the codons for a given amino acid pair. Thus the maximum match and penalty factors may be multiplied by three, making it possible to calculate the maximum number of bases matched in the combination of amino acid pairs selected by the maximum-match operation.
β-Hemoglobin, the smaller of the two proteins, contains 146 amino acids; consequently the highest possible maximum match (disregarding integral amino-acid composition data) with myoglobin is 146 × 3 = 438. Insufficient data are available
to analyze the result from set 3 on the basis of mutations. If it is assumed that the gap in set 4 does not exclude any part of β-hemoglobin from the comparison, this set has a maximum of 3(89.63 + 1.03) = 272 bases matched, indicating a minimum of 438 - 272 = 166 point mutations in this combination. Using this variable set and placing gaps according to Braunitzer, a score of 88.6 was obtained, thus his match was not selected. Again it may be observed that the penalty greatly enhanced the significance of the maximum match.
Variable sets 5 and 6 have no intrinsic meaning and were chosen because the weight attached to type 2 and type 1 pairs is intermediate in value with respect to sets 1 and 2 and sets 3 and 4. The maximum match for set 6 is seen to have a highly significant value.
The data of set 7 are results that would be obtained from using the frame-shift method to select a maximum match; the penalty was large enough to prevent any gaps in the comparisons. The slight differences in significance found among the maximum-match values of β-hemoglobin and myoglobin resulting from use of sets 4, 6 and 7 are probably meaningless due to small sample size and errors introduced by the assumptions about the distribution functions of random values. Finding a value in set 7 that is approximately equal to those from sets 4 and 6 in significance is not surprising. A larger penalty factor would have increased the difference from the mean in sets 4 and 6 because almost every random value in each set was the result of more gaps than were required to form the real maximum match. Further, the gaps that are allowed are at the N-terminal ends so that about 85% of the comparison can be made without gaps. If an actual gap were present near the middle of one of the sequences, it would have caused a sharp reduction in the significance of the frame-shift type of match.
Set 3 is the only variable set in Table 2 that shows a possible difference. Assuming the value is accurate, other than chance, there is no simple explanation for the difference. A small but meaningful difference in any comparison could represent evolutionary divergence or convergence. It is generally accepted that the primary structure of proteins is the chief determinant of the tertiary structure. Because certain features of tertiary structure are common to proteins, it is reasonable to suppose that proteins will exhibit similarities in their sequences, and that these similarities will be sufficient to cause a significant difference between most protein pairs and their corresponding randomized sequences, being an example of submolecular evolutionary convergence. Further, the interactions of the protein backbone, side chains, and the solvent that determine tertiary structure are, in large measure, forces arising from the polarity and steric nature of the protein side-chains. There are conspicuous correlations in the polarity and steric nature of type 2 pairs. Heavy weighting of these pairs would be expected to enhance the significance of real maximum-match values if common structural features are present in proteins that are compared. The presence of sequence similarities does not always imply common ancestry in proteins. More experimentation will be required before a choice among the possibilities suggested for the result from set 3 can be made. If several short sequences of amino acids are common to all proteins, it seems remarkable that the relationship of ribonuclease to lysozyme in six of the seven variable sets appears to be truly a random one. It should be noted, however, that the standard value of the real maximum-match is positive in each variable set in this comparison.
This method was designed for the purpose of detecting homology and defining its nature when it can be shown to exist. Its usefulness for the above purposes depends in
part upon assumptions related to the genetic events that could have occurred in the evolution of proteins. Starting with the assumption that homologous proteins are the result of gene duplication and subsequent mutations, it is possible to construct several hypothetical amino-acid sequences that would be expected to show homology. If one assumes that following the duplication, point mutations occur at a constant, or variable rate, but randomly, along the genes of the two proteins, after a relatively short period of time the protein pairs will have nearly identical sequences. Detection of the high degree of homology present can be accomplished by several means. The use of values for non-identical pairs will do little to improve the significance of the results. If no, or very few, deletions (insertions) have occurred, one could expect to enhance the significance of the match by assigning a relatively high penalty for gaps. Later on in time the hypothetical proteins may have a sizable fraction of their codons changed by point mutations, the result being that an attempt to increase the significance of the maximum match will probably require attaching substantial weight to those pairs representing amino acids still having two of the three original bases in their codons. Further, if a few more gaps have occurred, the penalty should be reduced to a small enough value to allow areas of homology to be linked to one another. At a still later date in time more emphasis must be placed on non-identical pairs, and perhaps a very small or even negative penalty factor must be assessed. Eventually, it will be impossible to detect the remaining homology in the hypothetical example by using the approach detailed here.
From consideration of this simple model of protein evolution one may deduce that the variables which maximize the significance of the difference between real and random proteins give an indication of the nature of the homology. In the comparison of human β-hemoglobin to whale myoglobin, the assignment of some weight to type 2 pairs considerably enhances the significance of the result, indicating substantial evolutionary divergence. Further, few deletions (additions) have apparently occurred.
It is known that the evolutionary divergence manifested by cytochrome (Margoliash, Needleman & Stewart, 1963) and other heme proteins (Zuckerkandl & Pauling, 1965) did not follow the simple model outlined above. Their divergence is the result of non-random mutations along the genes. The degree and type of homology can be expected to differ between protein pairs. As a consequence of the difference there is no a priori best set of cell and operation values for maximizing the significance of a maximum-match value of homologous proteins, and as a corollary to this fact, there is no best set of values for the purpose of detecting only slight homology. This is an important consideration, because whether the sequence relationship between proteins is significant depends solely upon the cell and operation values chosen. If it is found that the divergence of proteins follows one or two simple models, it may be possible to derive a set of values that will be most useful in detecting and defining homology.
The most common method for determining the degree of homology between protein pairs has been to count the number of non-identical pairs (amino acid replacements) in the homologous comparison and to use this number as a measure of evolutionary distance between the amino acid sequences. A second, more recent concept has been to count the minimum number of mutations represented by the non-identical pairs. This number is probably a more adequate measure of evolutionary distance because it utilizes more of the available information and theory to give some measure of the number of genetic events that have occurred in the evolution of the proteins. The approach outlined in this paper supplies either of these numbers.
This work was supported in part by grants to one of us (S.B.N.) from the U.S. Public Health Service (1 501 FR 05370 02) and from Merck Sharp & Dohme.
REFERENCES
Braunitzer, G. (1965). In Evolving Genes and Proteins, ed. by V. Bryson & H. J. Vogel, p. 183. New York: Academic Press.
Canfield, R. (1963). J. Biol. Chem. 238, 2698.
Eck, R. V. & Dayhoff, M. O. (1966). Atlas of Protein Sequence and Structure. Silver Spring, Maryland: National Biomedical Research Foundation.
Edmundson, A. B. (1965). Nature, 205, 883.
Fitch, W. (1966). J. Mol. Biol. 16, 9.
Konigsberg, W., Goldstein, J. & Hill, R. J. (1963). J. Biol. Chem. 238, 2028.
Margoliash, E., Needleman, S. B. & Stewart, J. W. (1963). Acta Chem. Scand. 17, S 250.
Marshall, R. E., Caskey, C. T. & Nirenberg, M. (1967). Science, 155, 820.
Needleman, S. B. & Blair, T. H. (1969). Proc. Nat. Acad. Sci., Wash. 63, 1127.
Smyth, D. G., Stein, W. H. & Moore, S. (1963). J. Biol. Chem. 238, 227.
Zuckerkandl, E. & Pauling, L. (1965). In Evolving Genes and Proteins, ed. by V. Bryson & H. J. Vogel, p. 97. New York: Academic Press.

View File

@@ -0,0 +1,13 @@
Title: PII: 0022-2836(70)90057-4
Creator: Acrobat 4.05 Capture Plug-in for Windows
Producer: Acrobat 4.05 Import Plug-in for Windows
CreationDate: 08/06/03 23:58:18
ModDate: 09/17/03 19:37:56
Tagged: no
Form: none
Pages: 11
Encrypted: no
Page size: 468 x 684 pts (rotated 0 degrees)
File size: 865922 bytes
Optimized: yes
PDF version: 1.3

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,13 @@
Title: Beyond (Multi-) Media
Author: Peter Hoffmann
Producer: Springer-i
CreationDate: 08/09/25 10:24:41
ModDate: 08/09/25 17:31:57
Tagged: yes
Form: AcroForm
Pages: 249
Encrypted: no
Page size: 476.22 x 680.315 pts (rotated 0 degrees)
File size: 24690258 bytes
Optimized: yes
PDF version: 1.7

View File

@@ -0,0 +1,243 @@
J. Mol. Biol. (1981) 147, 195-197
Identification of Common Molecular Subsequences
The identification of maximally homologous subsequences among sets of long
sequences is an important problem in molecular sequence analysis. The problem is
straightforward only if one restricts consideration to contiguous subsequences
(segments) containing no internal deletions or insertions. The more general problem
has its solution in an extension of sequence metrics (Sellers 1974; Waterman et al.,
1976) developed to measure the minimum number of “events” required to convert
one sequence into another.
These developments in modern sequence analysis began with the heuristic
homology algorithm of Needleman & Wunsch (1970) which first introduced an
iterative matrix method of calculation. Numerous other heuristic algorithms have
been suggested including those of Fitch (1966) and Dayhoff (1969). More mathemat-
ically rigorous algorithms were suggested by Sankoff (1972), Reichert et al. (1973)
and Beyer et al. (1979) but these were generally not biologically satisfying or
interpretable. Success came with Sellers' (1974) development of a true metric measure
of the distance between sequences. This metric was later generalized by Waterman
et al. (1976) to include deletions/insertions of arbitrary length. This metric
represents the minimum number of “mutational events” required to convert one
sequence into another. It is of interest to note that Smith et al. (1980) have recently
shown that under some conditions the generalized Sellers metric is equivalent to the
original homology algorithm of Needleman & Wunsch (1970).
In this letter we extend the above ideas to find a pair of segments, one from each of
two long sequences, such that there is no other pair of segments with greater
similarity (homology). The similarity measure used here allows for arbitrary length
deletions and insertions.
Algorithm
The two molecular sequences will be $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$. A similarity $s(a,b)$ is given between sequence elements $a$ and $b$. Deletions of length $k$ are given weight $W_k$. To find pairs of segments with high degrees of similarity, we set up a matrix $H$. First set
$$H_{k0} = H_{0l} = 0 \quad \text{for } 0 \le k \le n \text{ and } 0 \le l \le m.$$
Preliminary values of $H$ have the interpretation that $H_{ij}$ is the maximum similarity of two segments ending in $a_i$ and $b_j$, respectively. These values are obtained from the relationship
$$H_{ij} = \max\Bigl\{ H_{i-1,j-1} + s(a_i,b_j),\ \max_{k \ge 1}\{H_{i-k,j} - W_k\},\ \max_{l \ge 1}\{H_{i,j-l} - W_l\},\ 0 \Bigr\}, \tag{1}$$
$1 \le i \le n$ and $1 \le j \le m$.
0022-2836/80/090195-03 $02.00/0
© 1980 Academic Press Inc. (London) Ltd.
The formula for $H_{ij}$ follows by considering the possibilities for ending the segments at any $a_i$ and $b_j$.
(1) If $a_i$ and $b_j$ are associated, the similarity is
$H_{i-1,j-1} + s(a_i,b_j)$.
(2) If $a_i$ is at the end of a deletion of length $k$, the similarity is
$H_{i-k,j} - W_k$.
(3) If $b_j$ is at the end of a deletion of length $l$, the similarity is
$H_{i,j-l} - W_l$.
(4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to $a_i$ and $b_j$.†
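The four cases above translate almost directly into a matrix-filling loop. The following is a minimal sketch and not the authors' program: the function name `similarity_matrix` and its parameter names are illustrative assumptions, and the default weights are simply the example values quoted in this letter (match 1, mismatch -1/3, deletion weight 1 + k/3).

```python
def similarity_matrix(a, b, match=1.0, mismatch=-1.0 / 3.0,
                      gap_weight=lambda k: 1.0 + k / 3.0):
    """Fill the matrix H of eqn (1) for sequences a and b.

    H has (len(a) + 1) rows and (len(b) + 1) columns; row 0 and
    column 0 stay at zero, as required by the boundary condition.
    """
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Case (1): a_i and b_j are associated.
            diagonal = H[i - 1][j - 1] + s
            # Case (2): a_i ends a deletion of length k.
            vertical = max(H[i - k][j] - gap_weight(k) for k in range(1, i + 1))
            # Case (3): b_j ends a deletion of length l.
            horizontal = max(H[i][j - l] - gap_weight(l) for l in range(1, j + 1))
            # Case (4): zero prevents negative similarity.
            H[i][j] = max(diagonal, vertical, horizontal, 0.0)
    return H


H = similarity_matrix("AAUGCCAUUGACGG", "CAGCCUCGCUUAG")
best = max(max(row) for row in H)
```

Because every deletion length is tried in cases (2) and (3), this direct form needs on the order of n·m·(n + m) steps; it is meant to mirror the formula, not to be fast.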
The pair of segments with maximum similarity is found by first locating the
maximum element of H. The other matrix elements leading to this maximum value
are then sequentially determined with a traceback procedure ending with an
element of H equal to zero. This procedure identifies the segments as well as
produces the corresponding alignment. The pair of segments with the next best
similarity is found by applying the traceback procedure to the second largest
element of H not associated with the first traceback.
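The traceback just described can be sketched by remembering, for every matrix element, which predecessor produced it. This is a hedged illustration rather than the published procedure: the name `local_align`, the `back` pointer table and the use of "-" for deleted positions are assumptions made for the example, and ties are resolved in favour of the zero case and then the diagonal.

```python
def local_align(a, b, match=1.0, mismatch=-1.0 / 3.0,
                gap_weight=lambda k: 1.0 + k / 3.0):
    """Return (segment_of_a, segment_of_b, score) for the best segment pair."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # back[i][j] is the predecessor cell, or None where the value is zero.
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [(0.0, None), (H[i - 1][j - 1] + s, (i - 1, j - 1))]
            candidates += [(H[i - k][j] - gap_weight(k), (i - k, j))
                           for k in range(1, i + 1)]
            candidates += [(H[i][j - l] - gap_weight(l), (i, j - l))
                           for l in range(1, j + 1)]
            H[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Locate the maximum element of H.
    i, j = max(((r, c) for r in range(n + 1) for c in range(m + 1)),
               key=lambda rc: H[rc[0]][rc[1]])
    score = H[i][j]

    # Follow the pointers until an element equal to zero is reached.
    top, bottom = [], []
    while back[i][j] is not None:
        pi, pj = back[i][j]
        if pi < i and pj < j:          # a_i associated with b_j
            top.append(a[pi])
            bottom.append(b[pj])
        elif pj == j:                  # residues of a face a deletion
            top.append(a[pi:i])
            bottom.append("-" * (i - pi))
        else:                          # residues of b face a deletion
            top.append("-" * (j - pj))
            bottom.append(b[pj:j])
        i, j = pi, pj
    return "".join(reversed(top)), "".join(reversed(bottom)), score


top, bottom, score = local_align("AAUGCCAUUGACGG", "CAGCCUCGCUUAG")
```

Finding the next-best pair would require masking the elements used by the first traceback before searching for the next maximum; that step is left out of the sketch.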
A simple example is given in Figure 1. In this example the parameters $s(a_i,b_j)$ and
$W_k$ required were chosen on an a priori statistical basis. A match, $a_i = b_j$, produced
an $s(a_i,b_j)$ value of unity while a mismatch produced a minus one-third. These values have an average for long, random sequences over an equally probable four letter set
of zero. The deletion weight must be chosen to be at least equal to the difference
between a match and a mismatch. The value used here was $W_k = 1.0 + 1/3 \cdot k$.
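Both statements about the parameter choice can be checked with a short calculation. The expectation symbol below is notation introduced here for illustration; the numbers are those given in the text, with a match probability of 1/4 for four equally probable letters.

```latex
\[
\mathbb{E}[s] \;=\; \tfrac{1}{4}\,(1) \;+\; \tfrac{3}{4}\left(-\tfrac{1}{3}\right) \;=\; 0,
\qquad
W_1 \;=\; 1 + \tfrac{1}{3} \;=\; \tfrac{4}{3} \;=\; 1 - \left(-\tfrac{1}{3}\right).
\]
```

So the shortest deletion costs exactly the difference between a match and a mismatch, and longer deletions cost more.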
      Δ    C    A    G    C    C    U    C    G    C    U    U    A    G
Δ    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
A    0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
A    0.0  0.0  1.0  0.7  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.7
U    0.0  0.0  0.0  0.7  0.3  0.0  1.0  0.0  0.0  0.0  1.0  1.0  0.0  0.7
G    0.0  0.0  0.0  1.0  0.3  0.0  0.0  0.7  1.0  0.0  0.0  0.7  0.7  1.0
C    0.0  1.0  0.0  0.0  2.0  1.3  0.3  1.0  0.3  2.0  0.7  0.3  0.3  0.3
C    0.0  1.0  0.7  0.0  1.0  3.0  1.7  1.3  1.0  1.3  1.7  0.3  0.0  0.0
A    0.0  0.0  2.0  0.7  0.3  1.7  2.7  1.3  1.0  0.7  1.0  1.3  1.3  0.0
U    0.0  0.0  0.7  1.7  0.3  1.3  2.7  2.3  1.0  0.7  1.7  2.0  1.0  1.0
U    0.0  0.0  0.3  0.3  1.3  1.0  2.3  2.3  2.0  0.7  1.7  2.7  1.7  1.0
G    0.0  0.0  0.0  1.3  0.0  1.0  1.0  2.0  3.3  2.0  1.7  1.3  2.3  2.7
A    0.0  0.0  1.0  0.0  1.0  0.3  0.7  0.7  2.0  3.0  1.7  1.3  2.3  2.0
C    0.0  1.0  0.0  0.7  1.0  2.0  0.7  1.7  1.7  3.0  2.7  1.3  1.0  2.0
G    0.0  0.0  0.7  1.0  0.3  0.7  1.7  0.3  2.7  1.7  2.7  2.3  1.0  2.0
G    0.0  0.0  0.0  1.7  0.7  0.3  0.3  1.3  1.3  2.3  1.3  2.3  2.0  2.0
FIG. 1. $H_{ij}$ matrix generated from the application of eqn (1) to the sequences A-A-U-G-C-C-A-U-U-G-A-C-G-G and C-A-G-C-C-U-C-G-C-U-U-A-G. The underlined elements indicate the traceback path from the maximal element 3.30.
† Zero need not be included unless there are negative values of $s(a,b)$.
Note, in this simple example, that the alignment obtained:
-G-C-C-A-U-U-G-
-G-C-C- -U-C-G-
contains both a mismatch and an internal deletion. It is the identification of the latter which has not been previously possible in any rigorous manner.
This algorithm not only puts the search for pairs of maximally similar segments on a mathematically rigorous basis but it can be efficiently and simply programmed on a computer.
T. F. SMITH
Northern Michigan University
M. S. WATERMAN
Los Alamos Scientific Laboratory, P.O. Box 1663, Los Alamos, N. Mex. 87545, U.S.A.
Received 14 July 1980
REFERENCES
Beyer, W. A., Smith, T. F., Stein, M. L. & Ulam, S. M. (1979). Math. Biosci. 19, 9-25.
Dayhoff, M. O. (1969). Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Silver Springs, Maryland.
Fitch, W. M. (1966). J. Mol. Biol. 16, 9-13.
Needleman, S. B. & Wunsch, C. D. (1970). J. Mol. Biol. 48, 443-453.
Reichert, T. A., Cohen, D. N. & Wong, A. K. C. (1973). J. Theoret. Biol. 42, 245-261.
Sankoff, D. (1972). Proc. Nat. Acad. Sci., U.S.A. 61, 44.
Sellers, P. H. (1974). J. Appl. Math. (Siam), 26, 787-793.
Smith, T. F., Waterman, M. S. & Fitch, W. M. (1981). J. Mol. Evol. In the press.
Waterman, M. S., Smith, T. F. & Beyer, W. A. (1976). Advan. Math. 20, 367-387.
Note added in proof: A weighting similar to that given above was independently developed by Walter Goad of Los Alamos Scientific Laboratory.

View File

@@ -0,0 +1,13 @@
Title: PII: 0022-2836(81)90087-5
Creator: Acrobat 4.05 Capture Plug-in for Windows
Producer: Acrobat 4.05 Import Plug-in for Windows
CreationDate: 08/18/03 18:13:35
ModDate: 09/19/03 18:46:57
Tagged: no
Form: none
Pages: 3
Encrypted: no
Page size: 468 x 684 pts (rotated 0 degrees)
File size: 179287 bytes
Optimized: yes
PDF version: 1.3

View File

View File

@@ -0,0 +1,14 @@
Title: Binder3.pdf
Author: sschimke
Creator: PScript5.dll Version 5.2.2
Producer: Acrobat Distiller 6.0.1 (Windows)
CreationDate: 04/04/06 17:39:24
ModDate: 04/04/06 17:39:24
Tagged: no
Form: none
Pages: 4
Encrypted: no
Page size: 595 x 842 pts (A4) (rotated 0 degrees)
File size: 883254 bytes
Optimized: yes
PDF version: 1.4

View File

@@ -0,0 +1,676 @@
Disambiguating Complex Visual Information: Towards Communication of Personal Views of a Scene
Article in Perception · February 1996
DOI: 10.1068/p250931 · Source: PubMed
Marc Pomplun (a), Helge Ritter (a), Boris Velichkovsky (a, b)
(a) Department of Neuroinformatics, University of Bielefeld, Germany
(b) Unit of Applied Cognitive Research, Dresden University of Technology, Germany
email: impomplu@techfak.uni-bielefeld.de
Abstract. Two experiments on perception and eye-movement scanning of a set of 6 overtly ambiguous pictures are reported. In the first experiment it was shown that specific perceptual interpretations of an ambiguous picture usually correlate with parameters of the gaze-position distributions. In the second experiment these distributions were used for an image-processing of initial pictures in such a way that in regions which attracted fewer fixations the brightness of all elements was lowered. The pre-processed pictures were then shown to a group of 150 naïve subjects for an identification. The results of this experiment demonstrated that in 4 out of 6 pictures it was possible to influence perception of other persons in the predicted way, i.e. to shift spontaneous reports of naïve subjects in the direction of interpretations that accompanied gaze-position data used for the pre-processing of initial pictures. Possible reasons for a failure of such a communication of personal views in two cases are also discussed.
1 Introduction
Pictures and scenes are notoriously ambiguous. Culture, experience, attention, functional state and dozens of other factors determine that two persons may have completely different subjective perception of one and the same physical situation. For any educated psychologist this is a well-established basic fact which certainly deserves investigation but cannot be changed. There is a long tradition of illustrating this multistable and idiosyncratic character of individual perceptive consciousness with the help of ambiguous figures both in history of art (Gombrich, 1969; Chapman, 1987) and in experimental psychology where perception of ambiguous pictures became the goal of countless studies (Vicholkovska, 1906; Boring, 1942; Velichkovsky, Luria & Zinchenko, 1973; Cooper, 1994; Rock, Hall & Davis, 1994). In other disciplines, e.g. computer science, pictures and scenes are processed and transformed but usually from a physicalist point of view, although the active vision approach and the neural computation paradigm can be regarded as signs of change in the tradition (see Lee & Bajcsy, 1992; Ritter, Martinetz & Schulten, 1992). In this paper we are going to demonstrate that an interdisciplinary convergence of these two lines of research is possible and welcome for practical reasons: The transformation of physical pictures from the perspective of their perception by an active observer can support an unambiguous communication of the subjective views to other persons.
In order to approach this problem experimentally we used the most reliable (albeit certainly not ideal, see Zinchenko & Vergiles, 1972) objective index of visual perceptual activity, namely the data about eye movements of an observer. Since Yarbus (1967) and other earlier investigations it is generally accepted that gaze position data are a fairly sensitive index of individual preferences and task attitudes. Our interest in eye movements was based on pragmatic considerations and had no direct relation to hypotheses about their possible causal role in detection, recognition and identification of visual information (e.g. Noton & Stark, 1971). In recent years there were several studies demonstrating the importance of clustering gaze position data (Nodine, Kungel, Toto & Krupinsky, 1992; Pillalamari, Barnette & Birkmire, 1993) for explication of subject's knowledge and strategies. However, these studies did not change the situation of a passive registration of eye movements in principle. The gaze-contingent change of local characteristics of visual displays in dependence on parameters of eye movements remained one of the paradigms of investigation that was predominantly used in the field of reading research (see e.g. Rayner, Well, Polatsek & Bertera, 1982).
Our intention was to make a further step: The use of the information about gaze position in order to process the picture/scene for reconstruction of its outlook as it could be available to the active observer who produced the eye movements. In other words, we want to approach if not the famous question of a philosopher "What is it like to be a bat?" then at least "What is it like to be Mrs/Mr. Smith's (visual) perceptual homunculus?". The answer to this last question can be of practical importance as many forms of non-verbal expertise, for instance, in interpretation of medical images (Norman, Coblentz, Brooks & Babcook, 1992), are still hardly available to an objective analysis and public communication. Fluctuations in perception of classic ambiguous pictures seem to provide a suitable experimental model for the study, because variants of their perceptual interpretation are well known and, in addition, eye movements have been investigated in numerous previous studies. In particular, these studies have demonstrated that the eye movement parameters can be specific to the different subjective interpretations of the ambiguous figures (e.g. Ellis & Stark, 1978; Gale & Findlay, 1983).
Recently Garcia-Perez (1992) proposed that eye movements during perception of ambiguous figures, such as the Necker cube and the Boring figure, can lead to a kind of spatial frequency filtration favoring the interpretation which corresponds to the location of gaze in a corresponding "focal area" of the figure (for other similar proposals, see Kawabata & Mori, 1992; Tsal & Kolbet, 1985). The method developed for the present investigation can be helpful in empirically testing hypotheses of this kind. The study was based on the use of an advanced imaging eye-tracker as well as on our previous work on eye-movement mediated communication of attention in cooperative problem solving (Velichkovsky, 1995).
2 Experiment 1: Eye-movement characteristics in perception of ambiguous pictures
This experiment was performed for collection of eye-movement data and evaluation of their specificity to different perceptual interpretations of ambiguous pictures.
2.1 Method
2.1.1 Apparatus
The system used in our experiments (Stampe, 1993) is an example of non-invasive imaging eye-trackers. It is based on the use of ISCAN RK-416PC pupil-tracking boards and two video cameras as inputs of information about the position of the head within the environment and the position of the pupil within the head. Fast calibration that remains stable over the whole period of study and does not suffer from accidental blinks (which are detected and described as such), free head with permitted deviation from the straight-ahead position up to 15°, and finally a practically unrestricted field of view (80° in the horizontal dimension and 60° in the vertical) as well as the possibility to run experiments under normal illumination conditions made this eye-tracker a perfect device for basic and applied studies. The average absolute precision of the gaze-position measurement with the help of the eye-tracker lies within the range of 0.6 to 0.8°. By using a new calibration interface based on parametrized artificial neural networks, we improved the precision of measurement by up to 0.4°. This made it possible to recruit even subjects wearing spectacles (see Pomplun, Velichkovsky & Ritter, 1994).
2.1.2 Subjects
Six subjects naïve about the purpose of the study participated in Experiment 1. They were students and co-workers of the computer science department at the University of Bielefeld. Four of them had normal and two corrected-to-normal vision.
2.1.3 Material and procedure
As the stimuli for this experiment we chose 6 pictures. These pictures are shown in Fig. 1 left. Two of them, the Necker cube and the Boring figure, are classical examples of ambiguous figures with a long history of investigation (Boring, 1942; Garcia-Perez, 1992, among many others). Two others were fragments of "Earth" by Giuseppe Arcimboldo and Maurits Cornelis Escher's "Circle Limit IV". Though popular as examples of perceptual bi-stability, these pictures were not used earlier in connection with eye-movement studies, as far as we know. Another picture was a fragment of Albrecht Duerer's "View of the Val d'Arco". Strictly speaking, this picture can hardly be considered as ambiguous, because the alternative interpretations are unequal: Almost all observers first see a landscape with the castle and discover only after a long delay the possibility to see the rock on the left side as the profile of a human face. It was included nevertheless, because of the realism of this situation (which made this picture especially interesting for a transfer of the method to such domains as medical imaging). The last picture was the product of our own morphing of a woman's and a man's faces.
We also prepared two unambiguous versions of each picture. They are shown in Fig. 1 in the middle and right columns.
All stimuli were presented on a high-resolution 17" colour monitor (ViewSonic 7) with a screen resolution of 640 × 480 pixels. The distance of observation was 60 cm. The pictures were about 480 pixels high and from 330 (Boring figure) to 620 (Duerer painting) pixels wide. At the beginning of every session two unambiguous and one ambiguous version of the Necker cube were shown. They were used to introduce the task. After the explanation all pictures were shown in such a way that two unambiguous versions of a picture always preceded the corresponding ambiguous version. The presentation time of every unambiguous version was 20 sec; the time of presentation of an ambiguous picture was 60 sec. Intervals between the variants of the same picture were 10 sec, intervals between different classes of pictures 60 sec. This time of 60 sec also included the time of re-calibrating the eye-tracker, which took less than 10 sec. The order of presentation of all pictures was counterbalanced across the subjects.
Figure 1: The original pictures (left column) and their unambiguous variants for interpretations A and B (middle and right column, respectively)
The subjects received the special task of manually reporting their perception while viewing the ambiguous version of each picture, and therefore had to put their hand on a two-button computer mouse. The task was to push the button on the left as soon as they saw a certain interpretation of the picture (interpretation A) and keep it pressed as long as this state of perception lasted. When seeing the interpretation B they had to push the button on the right. The experimenter told the subjects about the corresponding buttons for each interpretation shortly before the next ambiguous picture was shown.
After the experiment we divided the fixations which were recorded during the presentation of the ambiguous pictures according to the button response data into two sets, namely the fixations belonging to the interpretations A and B, respectively. With additional data obtained during the presentation of the unambiguous variants, there were four different fixation sets derived from every thematic class of pictures: The sets A and B from the two unambiguous variants and the sets A' and B' which were obtained after dividing the pool of fixation data from observation of the ambiguous picture.
2.2 Results
All subjects were able to differentiate the perceptual states of all the pictures without apparent difficulties. The transitions from one perceptual state to another as manifested in manual responses of the subjects were almost instantaneous, i.e. with temporal gaps or overlaps of less than 500 msec, in about 90% of cases. The intermediate state "not one/not other" extended over less than 5% of the observation time of ambiguous pictures. The duration of perceiving a constant interpretation was found to lie between 3 and 13 seconds, varying significantly between subjects, but not between pictures. In the following analysis we used as a reference point the moment of a button press signifying the transition into the corresponding perceptual state.
In the few cases with extensive history of previous investigation with the help of eye-movement recording, our results partially replicated previous data (Ellis & Stark, 1978; Gale & Findlay, 1983). Thus, the perception of the Necker cube was mostly connected with the saccades along its main diagonal. The change of a perceptual state correlated with a shift of the fixations to another "core area" of the picture. However, we could not confirm the previous suggestion that these phenomenal changes coincide with longer, so-called "organizational fixations" or other parameters of individual fixations or saccades (cf. Ellis & Stark, 1978). Corresponding data for fixation length as well as average size of pupil are shown in Fig. 2 and Fig. 3, respectively. The same lack of correspondence between parameters of separate fixations and the instants of phenomenal changes was typical also in the case of all other pictures. In order to evaluate stability and possible specificity of eye-movement patterns to perceptual interpretation of pictures in a more objective way a measure of similarity s between two fixation sets was used, which yields similarity values in the interval [0, 1]. This is described in detail in appendix A.
Figure 2: Average duration tF of fixations as a function of the time tC relative to changes of interpretations [axes: tF (ms) against tC (ms)]
Figure 3: Average pupil size AP as a function of tC (measured as the number of pixels in the digital picture of the eye camera) [axes: AP (pixel) against tC (ms)]
Comparison   Necker Cube   Duerer   Boring   Escher   Faces   Arcimboldo
A vs. B         64.3        45.4     76.1     15.2     72.2     40.5
A' vs. B'       77.5        27.7     48.5     28.6     34.9     67.5
A vs. A'        61.4        78.6     87.7     72.0     78.2     66.1
B vs. B'        37.5        93.5     88.1     88.2     64.8     83.0
A vs. B'        59.7        39.8     51.4     24.3     48.2     29.9
B vs. A'        48.8        35.6     71.3     15.8     50.3     62.9
Table 1: Similarities of fixation sets in %
The degree of similarity of fixation data for both variants of perception of the same picture was computed and compared between themselves and with the corresponding parameters for every unambiguous variant of the pictures. The computation was performed for individual and group data. The similarity of individual data for identical pictures was in the interval between 76% and 95%. The results of the comparison of group data (i.e. cumulated fixation data across subjects) are presented in Table 1.
For almost all pictures the computed similarity coefficients demonstrate systematic changes which become more prominent when visualized as gray values in matrices shown in Fig. 4. Each 4 × 4 matrix shows the subset of similarity coefficients pertaining to the 4 × 4 pairings of the four variants A, B, A', B' of each picture (these matrices are symmetric since the order in a pair is irrelevant, and the main diagonal represents the pairings of each pattern with itself, which is not relevant for our discussion). The brightness of each matrix element increases with the similarity coefficient of the corresponding comparison.
If there was no significant difference between the "statically" (A, B) and "dynamically" (A', B') derived fixation patterns, but there was a difference between fixation patterns for different interpretations, the comparisons A vs. A' and B vs. B' should demonstrate higher similarity than all others. Obviously, in this case the matrices would present checker-board patterns. And in fact, the checker-board patterns can be easily seen in every box, with the sole exception of the box with data of the Necker cube. They are exactly those which can be expected on the basis of the hypothesis about specificity of fixation distributions to the type of phenomenal interpretation of a picture.
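The checker-board expectation can be illustrated with the group values of Table 1. The sketch below is not the authors' analysis: it uses a simplified criterion, chosen here as an assumption, that the mean of the two same-interpretation comparisons (A vs. A', B vs. B') exceeds the mean of the four remaining comparisons. The names `PICTURES`, `TABLE_1` and `shows_checkerboard` are illustrative.

```python
PICTURES = ["Necker Cube", "Duerer", "Boring", "Escher", "Faces", "Arcimboldo"]

# Group similarities in % as listed in Table 1, one list per comparison class.
TABLE_1 = {
    "A vs. B":   [64.3, 45.4, 76.1, 15.2, 72.2, 40.5],
    "A' vs. B'": [77.5, 27.7, 48.5, 28.6, 34.9, 67.5],
    "A vs. A'":  [61.4, 78.6, 87.7, 72.0, 78.2, 66.1],
    "B vs. B'":  [37.5, 93.5, 88.1, 88.2, 64.8, 83.0],
    "A vs. B'":  [59.7, 39.8, 51.4, 24.3, 48.2, 29.9],
    "B vs. A'":  [48.8, 35.6, 71.3, 15.8, 50.3, 62.9],
}

SAME_INTERPRETATION = ("A vs. A'", "B vs. B'")


def shows_checkerboard(picture):
    """True if same-interpretation pairs are, on average, more similar."""
    column = PICTURES.index(picture)
    same = [TABLE_1[c][column] for c in SAME_INTERPRETATION]
    cross = [TABLE_1[c][column] for c in TABLE_1 if c not in SAME_INTERPRETATION]
    return sum(same) / len(same) > sum(cross) / len(cross)


result = {picture: shows_checkerboard(picture) for picture in PICTURES}
```

Under this averaged criterion only the Necker cube fails. A stricter reading, in which both same-interpretation values must exceed every other value, would be harder to satisfy for some pictures.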
The effect of higher similarities of A vs. A' and B vs. B' on the whole data set can be visualized also by a cumulative plot (Fig. 5). Here, the similarity coefficients for each class of comparison are presented in ascending order.
The computing of similarity coefficients finally allowed us to approach the classic problem of objective indices of the phenomenal changes and the temporal relationships between a change of phenomenal state and the moment of manual report. For all the subjects and all the pictures, the minimal values of the similarity coefficient for fixation patterns A' and B' during a specific perceptual interpretation of an ambiguous picture were achieved if one takes into account a certain "response time" of approximately 900 to 1000 ms. In order to investigate the subject's response time, we changed the way of deriving the fixation sets A' and B' from showing the ambiguous picture. We added a constant "time shift" to every manual report of all subjects, pretending the reports happened earlier (negative time shift) or later (positive time shift) than registered. Now the similarity of the fixation sets A' and B' for individual pictures was computed using different time shifts.
Figure 4: The similarity matrices of the six pictures
Figure 5: Cumulative plot of similarity values s for the six classes of comparisons (A vs. B, A' vs. B', A vs. A', B vs. B', A vs. B', B vs. A') [axes: s (%) against position]
Figure 6: Similarity s of A' and B' for individual pictures (Necker Cube, Duerer, Boring, Escher, Faces, Arcimboldo) and different "time shifts" tS [axes: s (%) against tS (ms)]
Fig. 6 shows the average similarity of the fixation patterns A' and B' as a function of the time shift used for the separation of the fixations. The Necker cube values demonstrate no significant dependence on the underlying time shift, but the other pictures indicate a more or less distinct "U"-shape. If one takes as a temporal reference point the moment located about 900 ms before the subject's manual report, then the similarity functions reach their absolute minimum.
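The time-shift manipulation amounts to re-assigning fixations to the two sets after moving every report interval by a constant. The sketch below only illustrates that bookkeeping under assumed data layouts: fixations as (time in ms, x, y) tuples, reports as (start, end, label) intervals, and the function name `split_fixations` are all hypothetical. The similarity measure itself is defined in the paper's appendix and is not reproduced here.

```python
def split_fixations(fixations, reports, shift_ms=0):
    """Assign fixations to sets A' and B' using shifted report intervals.

    fixations: list of (time_ms, x, y)
    reports:   list of (start_ms, end_ms, label), label being "A" or "B"
    shift_ms:  negative values pretend the reports happened earlier
    """
    sets = {"A": [], "B": []}
    for fixation in fixations:
        time_ms = fixation[0]
        for start_ms, end_ms, label in reports:
            if start_ms + shift_ms <= time_ms < end_ms + shift_ms:
                sets[label].append(fixation)
                break  # a fixation belongs to at most one interval
    return sets["A"], sets["B"]


# Hypothetical recording: three fixations and two report intervals.
fixations = [(500, 10, 10), (1500, 40, 12), (2500, 80, 30)]
reports = [(1000, 2000, "A"), (2000, 3000, "B")]

unshifted = split_fixations(fixations, reports)
shifted = split_fixations(fixations, reports, shift_ms=-900)
```

Repeating the split for a range of shifts and computing the similarity of the two resulting sets each time would give one curve of the kind plotted in Fig. 6.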
The empirical data on the fixation patterns A' and B' that corresponded to the different perceptual interpretations of the same pictures were further used in Experiment 2 of the study.
3 Experiment 2: Visualization and transfer of subjective views of the pictures
The aim of this experiment was to attempt an objective reconstruction of different subjective views of the ambiguous pictures on the basis of the eye-movement data collected in Experiment 1.
3.1 Method
3.1.1 Subjects
150 subjects naïve about the purpose participated in this study. All of them were students of natural sciences and mathematics at the University of Bielefeld.
3.1.2 Material and procedure
In order to process pictures in a gaze-dependent way, one should decide what the form of the visibility function connected with such fixations is. Three lines of research can be of relevance for the answer to this question: "useful field of view" and "useful resolution" studies (Ball, Beard, Roenker, Miller & Griggs, 1988; Mackworth, 1976; Shioiri & Ikeda, 1989), investigation of asymmetry in dynamic distribution of attention in dependence on the direction of eye movements in reading (Rayner, Well, Polatsek & Bertera, 1982) and experiments with images stabilized on the retina that demonstrate a kind of dissociation between anatomical and "functional" fovea (Zinchenko & Vergiles, 1972). Unfortunately, it is impossible to use these data directly, because all of them were obtained under rather specific conditions. Therefore we assumed the relatively restricted and conservative hypothesis that the average efficient field of view coincides with the idealized anatomical fovea (Hood & Finkelstein, 1986). According to this working hypothesis the visibility function is a two-dimensional Gaussian distribution with the center at the registered fixation point and the standard deviation of one degree of visual angle.
Our further hypothesis was that the visibility functions of individual fixations can be collapsed without taking into account their temporal order. The aim of our image processing is to emphasize the regions of a picture which received the highest attention from the subject. There are many methods to achieve this, for example: lowering of brightness, enhancing of brightness, reduction of contrast, reduction of optical resolution in the "valleys" of attentional landscapes, i.e. outside of the peaks of the gaze-position clusters. More information about our image processing can be found in appendix B.
In this experiment we only used the method "lowering of brightness". The processing was based on the corresponding fixation sets A' and B', respectively, derived from showing the ambiguous pictures in Experiment 1. The resulting pictures are presented in Fig. 7. These 12 pictures were used together with 6 originals as material in Experiment 2. Subjects were individually presented with counterbalanced subsets of 6 pictures which included 2 originals and 4 processed pictures representing all different thematic classes of pictures used in the current study one and only one time. Subjects were asked to describe the content of the pictures. The descriptions were then subjected to a blind forced choice evaluation, so that a consistent "first sight interpretation" for every stimulus and every subject was agreed between three experts.
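The working hypothesis above (one Gaussian per fixation, collapsed without regard to order, brightness lowered in the valleys) can be sketched as follows. This is a hypothetical illustration, not the processing described in the paper's appendix: the function names, the `floor` parameter that limits how dark the valleys become, and the pixel density of 20 pixels per cm used to convert one degree of visual angle at 60 cm into pixels are all assumptions.

```python
import math


def degrees_to_pixels(degrees, distance_cm, pixels_per_cm):
    """Convert a visual angle into a distance on the screen, in pixels."""
    return math.tan(math.radians(degrees)) * distance_cm * pixels_per_cm


def attention_map(fixations, width, height, sigma_px):
    """Sum one 2-D Gaussian per fixation and normalise the peak to 1."""
    amap = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                amap[y][x] += math.exp(-d2 / (2.0 * sigma_px ** 2))
    peak = max(max(row) for row in amap)
    if peak > 0.0:
        amap = [[v / peak for v in row] for row in amap]
    return amap


def lower_brightness(image, amap, floor=0.2):
    """Keep brightness at attention peaks, scale it down towards `floor` elsewhere."""
    return [
        [pixel * (floor + (1.0 - floor) * weight)
         for pixel, weight in zip(image_row, map_row)]
        for image_row, map_row in zip(image, amap)
    ]


# One degree of visual angle at 60 cm, assuming 20 pixels per cm.
sigma = degrees_to_pixels(1.0, distance_cm=60.0, pixels_per_cm=20.0)

# Tiny 5 x 5 grey image with a single fixation in the centre and a
# deliberately small sigma so that the fall-off is visible.
image = [[100.0] * 5 for _ in range(5)]
processed = lower_brightness(image, attention_map([(2, 2)], 5, 5, sigma_px=1.0))
```

The same attention map could drive the other listed methods, for instance by scaling contrast or blur radius instead of brightness.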
Figure 7: The "highlighted" pictures for each interpretation
         Necker Cube   Duerer   Boring   Escher   Faces   Arcimboldo
IA(O)        39          49       23       41      11        38
IA(A)        37          50       15       44      39        50
IA(B)        38          23        7       11       4         8
IB(O)        11           1       27        9      39        12
IB(A)        13           0       35        6      11         0
IB(B)        12          27       43       39      46        42
Table 2: Results of Experiment 2
3.2 Results
The results of this experiment are summarized in Table 2. The value of IA(O), e.g., tells us how many of the subjects came to interpretation A when the original picture was shown to them. Because the decision was always either for A or for B, the equation IA(x) + IB(x) = 50 holds for each of the 18 presented pictures. Statistical analysis of the data was performed with the help of a one-sided four-fields χ² test (Lienert, 1973). The analysis demonstrates that the processing of the initial pictures in terms of the distribution of fixations had a significant and predicted influence on their further perception, although this influence was not documented in all cases. In particular, both variants of processing (i.e. in the direction of the interpretations A or B) had no influence on the perception of the Necker cube (uA = 0.58, p > 0.05; uB = 0.23, p > 0.05). In the case of the Boring figure the effect was only significant for the enhancing of the interpretation "old woman" (uB = 3.49, p < 0.001). Paradoxically, the processing towards perception of the young woman rather diminished the frequency of this interpretation (uA = 1.65). In all other cases, the effect of transfer of perceptual experience was fairly strong. When the base-line frequency was initially shifted towards one of the interpretations, an appropriate processing either made the hidden version obvious or, at least as a tendency, additionally enhanced the dominating version of perception: Duerer's painting (uA = 1.43, 0.05 < p < 0.10; uB = 5.51, p < 0.001), Escher (uA = 1.11, 0.05 < p < 0.10; uB = 5.80, p < 0.001). For the remaining two pictures the results were even more homogeneous: Faces (uA = 5.61, p < 0.001; uB = 1.96, p < 0.05) and Arcimboldo (uA = 3.87, p < 0.001; uB = 5.81, p < 0.001).
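As a rough illustration of how a four-fields comparison of this kind can be computed from the counts in Table 2, the sketch below applies the standard Pearson χ² formula for a 2×2 table (original versus processed picture, interpretation A versus B) and takes the square root as the test statistic. The function name `four_field_u` and the choice of the uncorrected Pearson formula are assumptions made for this example; the exact procedure from Lienert (1973) used in the study may differ, so the result need not coincide with every reported value.

```python
import math

def four_field_u(a, b, c, d):
    """Square root of the Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Rows are the two conditions (e.g. original vs. processed picture),
    columns are the two interpretations (A vs. B).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.sqrt(chi2)

# Boring figure, counts taken from Table 2:
# original picture:       IA(O) = 23, IB(O) = 27
# processed towards B:    IA(B) = 7,  IB(B) = 43
u_boring_b = four_field_u(23, 27, 7, 43)
print(round(u_boring_b, 2))
```

For the Boring figure processed towards interpretation B, this sketch gives a value of about 3.49.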
4 Discussion
The present study brought about some old as well as some new results. In line with earlier work we were able to confirm in Experiment 1 that, in the case of several pictures allowing more than one interpretation, there are specific "focal areas" whose fixation correlates with specific perceptual interpretations (Gale & Findlay, 1983). In addition, we have demonstrated that a general change in the distribution of gaze position patterns, as evaluated with the help of a new holistic measure of similarity, usually preceded the manual response about the change of phenomenal perception by about 900 ms. For experiments on perceptual identification (see e.g. Posner, 1978) this is a reasonably long reaction time to suppose that the manual report indeed is a reaction to the phenomenal changes. The result in general corresponds to the introspective observation that phenomenal changes, while being expected, often slightly astonished observers. The phenomenal changes themselves, of course, can coincide with, precede or perhaps follow the changes in eye movements. In contrast with one previous report, characteristics of separate saccades or fixations (as well as fluctuations of the pupil size) were insufficient for a differentiation of alternative perceptual interpretations of pictures from our set (cf. Ellis & Stark, 1978).
In Experiment 2 we attempted to use the data about eye fixation patterns of ambiguous pictures for the visualization of actual perception. This processing was done in such a way that the regions of the pictures which attracted the gaze fixations during specific interpretations were highlighted. In the present study this processing was based on the simplest assumption about the form and size of the visibility function, whereby we equated the parameters of the "functional fovea" with the idealized anatomical fovea, i.e. the Gaussian function with a standard deviation of one degree of visual angle. Despite this oversimplification the experiment was basically successful: In four out of six pictures we found a clear transfer effect of such a processing on the perceptual interpretation of naïve subjects. All the pictures that demonstrated this transfer were relatively complex, colorful stimuli with several levels of contrast.
From these data it seems to follow that both line drawings in our set, the Necker cube and the Boring figure, have a special status. Although exactly these figures were considered earlier from the perspective of their dependence on the eye movement based filtering of spatial frequencies (Garcia-Perez, 1992), the rather similar transformation used in the present study did not lead to the expected control of phenomenal experience. What are possible reasons for such a failure?
In the case of the Necker cube, for instance, there seems to be a built-in deceit: The very shift of the focus of attention to the "focal area" of an alternative interpretation creates a higher probability of reversal in the opposite direction. Indeed, in the middle of both "focal areas" one finds a vertex which has to be perceived as a component of the back(ground) plane of the corresponding 3D-interpretation. The fixation of the vertex can however provide it with a figure status and therefore provoke the reversal of the whole configuration. In the Boring figure there was an unexpected shift of initially more or less evenly distributed probabilities of both interpretations to one of them as a result of the image processing. The shading-out of the surrounding information obviously limits the possibility to see a young woman. This is the perceptual interpretation which is mostly conveyed by global information about the posture of the body as a whole and not so much by details like eye or mouth. An additional reason for the relative failure of our procedure in the case of black-and-white line drawings may lie in the fact that the introduced modulation of brightness was too weak to be integrated into the main graphical elements of such pictures.
This study is only a first attempt at eliciting perceptual experience on the basis of eye movement data. Several additional problems should be solved before the outlined approach could become a more reliable method. First of all, the shape of the visibility function has to be considered anew with the possibility that it can vary depending on objective and subjective factors. The second in the list is the problem of temporal characteristics of processing: to what extent can the temporal order information be ignored in such studies (cf. Hacisalihzade, Stark & Allen, 1992), and what is the possible window size of accumulation of fixations to the "attention landscapes"? Another problem to be solved is an adjustment of our processing algorithms to the spatial frequency characteristics of pictures and to the corresponding perceptual attitudes of observers, e.g. as it would be necessary in the case of the Boring figure (for an investigation of related issues, see Caelli, 1988). A combination of our approach with methods of visual scene parsing and depth planes analysis from computer vision research (Ballard & Brown, 1982) could also be fruitful. Finally, one should of course be aware that not every fixation is "filled with attention", so states of "empty gaze" have to be differentiated. It seems that this problem could be solved on the basis of an analysis of eye movements themselves. However, the possible key to the solution may be situated in a slightly different domain, namely in the domain of micro eye movements (Gippenreiter & Romanov, 1972).
Non-verbal visual expertise plays an important role in everyday life, technology and medicine (see e.g. Norman, Coblentz, Brooks & Babcook, 1992; Velichkovsky, Pomplun & Rieser, 1995). The demonstrated fact that it is possible to convey to other persons a specific perceptual interpretation made by other people, even in the case of relatively complex pictures, gives grounds to believe in an applied significance of the gaze-dependent processing approach. Being well aware of the shortcomings of the present study, we believe that future methods which, like ours, unite traditional perceptual research with contemporary image processing possibilities will support a more subject-oriented phase in the development of information and communication technologies. This will in turn open the way to communication of not only declarative knowledge but also practical expertise.
Acknowledgements. We wish to thank Larry Stark, Vladimir Zinchenko, and last but not least Richard Gregory for discussing the results of this study and for encouraging us to present them to PERCEPTION. Two anonymous reviewers helped us in improving the final version of the text. Thomas Clermont and Peter Munsche participated in supervising and running the experiments. Our special thanks are due to Eyal Reingold and Dave Stampe for the development of the eye-tracker used in our experiments. This study was supported by a grant from the German Science Foundation (DFG SFB 360/B4).
Appendix A: A similarity measure for the comparison of distributions of fixations
In order to derive a similarity measure for two fixation patterns which depends on the holistic distribution of attention and not on separate eye movements, we first subdivided the monitor screen into $n_x \times n_y$ squares. Then the sum $s_n$ of the total duration of the fixations in $F'$ which were located in the $n$th square had to be calculated for each square $n$, $n = 1, \ldots, n_x n_y$. We obtained a distribution vector $\vec{v}'$ consisting of the values $s_n$ and therefore having $n_x n_y$ dimensions. Our hope was that this vector contained sufficient information about how much attention, or at least dwell time, was spent in each of the squares.
To compute the similarity of two fixation sets $F_1$ and $F_2$, first the distribution vectors $\vec{v}_1$ and $\vec{v}_2$, respectively, had to be determined in the way described above. Then the cosine of the angle $\varphi$ between these two vectors was calculated according to the following simple equation:

\[ \cos\varphi = \frac{\vec{v}_1 \cdot \vec{v}_2}{|\vec{v}_1|\,|\vec{v}_2|} \]

The value of $\cos\varphi$ was taken as the similarity measure. In fact, it has several important features. It yields similarity values in the interval [0, 1], since both $\vec{v}_1$ and $\vec{v}_2$ have only nonnegative components. It does not take into account the number of fixations, but only their distribution over the screen. In addition, it can be easily weighted or corrected for the duration of fixations.
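The construction of the distribution vectors and the cosine measure can be sketched in a few lines. This is only an illustration under stated assumptions: fixations are represented as `(x, y, duration)` tuples in pixel coordinates, and the function names `distribution_vector` and `cosine_similarity` are hypothetical, not taken from the original software.

```python
import math

def distribution_vector(fixations, width, height, cell):
    """Sum fixation durations per square of a grid laid over the screen.

    fixations: iterable of (x, y, duration) tuples in pixel coordinates.
    cell: side length of one square in pixels.
    Returns a flat list with one entry per grid square.
    """
    nx = math.ceil(width / cell)
    ny = math.ceil(height / cell)
    v = [0.0] * (nx * ny)
    for x, y, duration in fixations:
        if 0 <= x < width and 0 <= y < height:  # ignore off-screen samples
            col = int(x // cell)
            row = int(y // cell)
            v[row * nx + col] += duration
    return v

def cosine_similarity(v1, v2):
    """Cosine of the angle between two distribution vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: an empty fixation set is similar to nothing
    return dot / (norm1 * norm2)

# Two fixation sets with the same spatial distribution but different
# total dwell times (hypothetical data).
set_1 = [(10, 10, 200), (110, 10, 300)]
set_2 = [(20, 20, 100), (120, 30, 150)]
v1 = distribution_vector(set_1, width=200, height=100, cell=100)
v2 = distribution_vector(set_2, width=200, height=100, cell=100)
print(cosine_similarity(v1, v2))
```

Because the two hypothetical sets distribute their dwell time over the squares in the same proportion, the vectors are parallel and the similarity is (up to rounding) 1, regardless of the number or total duration of the fixations.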
Nevertheless, this measure still has an unpleasant property: Its values depend on the size and position of the squares on the picture. This dependency on position could be nearly removed by calculating similarity coefficients for different x- and y-offsets of the whole square grid and by taking the average similarity as the result. We used 10 equidistant x-offsets, which were chosen in order to allow a maximum global shift of the length of one square's side. These x-offsets were combined with 10 analogous y-offsets, so that 100 "elementary" similarities have to be computed in total to derive the position-invariant measure.
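The averaging over grid offsets can be sketched as follows. The exact offsets are not given in the text, so the choice of shifts 0, cell/10, ..., 9·cell/10 in each direction is an assumption, as are the function names and the sparse dictionary representation of the distribution vector.

```python
import math
from collections import defaultdict

def sparse_vector(fixations, cell, ox, oy):
    """Duration sums per grid square for a grid shifted by (ox, oy) pixels."""
    v = defaultdict(float)
    for x, y, duration in fixations:
        key = (math.floor((x + ox) / cell), math.floor((y + oy) / cell))
        v[key] += duration
    return v

def cosine(v1, v2):
    """Cosine between two sparse vectors stored as {square: duration}."""
    dot = sum(val * v2.get(key, 0.0) for key, val in v1.items())
    n1 = math.sqrt(sum(val * val for val in v1.values()))
    n2 = math.sqrt(sum(val * val for val in v2.values()))
    return dot / (n1 * n2) if n1 > 0 and n2 > 0 else 0.0

def position_invariant_similarity(f1, f2, cell, steps=10):
    """Average the cosine over steps x steps grid offsets (100 by default)."""
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            ox = i * cell / steps  # assumed equidistant x-offsets
            oy = j * cell / steps  # assumed equidistant y-offsets
            total += cosine(sparse_vector(f1, cell, ox, oy),
                            sparse_vector(f2, cell, ox, oy))
    return total / (steps * steps)

# Hypothetical fixation sets: (x, y, duration)
a = [(100, 100, 250), (300, 120, 400)]
b = [(105, 95, 300), (500, 500, 200)]
print(position_invariant_similarity(a, b, cell=40))
```

Two nearby fixations that fall into different squares for one grid position share a square for most other offsets, which is why the average is less sensitive to the grid position than any single elementary similarity.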
And how can we avoid the dependency on the square's size? Fig. 8 illustrates the functional relationship between square size (or "granularity") and calculated similarity coefficients for different fixation patterns. As an example the data for the Boring and the Arcimboldo picture are displayed. The Boring picture causes differences between the fixation sets A' and B' on a small scale, the Arcimboldo picture on a large scale. This fact will be discussed later in the text; at this point these sets should be considered as "technical" examples.
Obviously all similarity values generally increase with the underlying square's size. This can easily be explained by two extreme cases: If we used a square size of only one pixel, the similarity would be very low, because only very few fixations would be located in corresponding squares. On the other hand, if we used squares as large as the screen, the similarity value would always be 100%, because all fixations would lie in the same (single) square. Fig. 8 demonstrates another two important facts: First, the order of similarities remains invariant for the four fixation patterns with respect to granularity, at least for the investigated range from 5 to 300 pixels. This confirms the stability of our measure. Second, the maximum difference between similarities A vs. A' and A' vs. B' varies with the pictures. The Boring picture causes a maximum difference at a granularity of about 25 pixels (small scale), the Arcimboldo picture at about 60 pixels (large scale). To derive a "fair" measure we decided to use the average similarity coefficient for comparison on 25, 40, and 64 pixel granularity, which is a geometric series in the relevant range. The use of smaller squares is not sensible, since human foveal vision has a certain extent and, in addition, the eye-tracker accuracy itself is limited (see Pomplun, Velichkovsky & Ritter, 1994). Larger squares cannot improve the measure, because no further important information can be found on the scale of the presented pictures. Our final similarity measure now uses 300 elementary distribution vector comparisons (3 granularities × 100 grid offsets). It has all desired properties and its stability was confirmed in various tests.
[Figure 8 plot: similarity s (%) on the vertical axis against granularity of measure (pixels) on the horizontal axis, with curves for the Arcimboldo and Boring fixation sets]
Figure 8: Similarity of specific fixation sets as a function of the granularity used for the similarity measure
Figure 9: The original picture
Appendix B: Methods of gaze-dependent image processing
The important precondition for the gaze-contingent image processing is a continuous "attention function" a(x, y) which is defined all over the picture and is built on the basis of the recorded fixations. In order to find a suitable function we define a two-dimensional Gaussian distribution centered at the current fixation point, where the standard deviation is one degree of visual angle. Then we simply sum up these Gaussian distributions for all recorded fixations, weighted by their durations, to obtain the desired function a(x, y).
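A minimal sketch of such an attention function is given below. It assumes that fixations are `(x, y, duration)` tuples in pixel coordinates and that the standard deviation is passed in pixels; how many pixels correspond to one degree of visual angle depends on the viewing setup and is not specified here. The function name is hypothetical, and the constant normalization factor of the Gaussian is omitted because it is the same for every fixation.

```python
import math

def attention_landscape(fixations, width, height, sigma):
    """Sum of duration-weighted 2D Gaussians, one per fixation.

    fixations: iterable of (x, y, duration) tuples in pixels.
    sigma: standard deviation in pixels (assumed pixel equivalent of
           one degree of visual angle).
    Returns a list of rows, indexed as a[y][x].
    """
    a = [[0.0] * width for _ in range(height)]
    for fx, fy, duration in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                a[y][x] += duration * math.exp(-d2 / (2.0 * sigma ** 2))
    return a

# Hypothetical example: one long and one short fixation on a tiny 12 x 8 image.
landscape = attention_landscape([(3, 4, 400), (9, 2, 100)],
                                width=12, height=8, sigma=1.5)
peak = max(max(row) for row in landscape)
print(peak)
```

The straightforward double loop is slow for full-size pictures; it is used here only to keep the correspondence with the verbal description obvious.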
In order to illustrate the procedure, let us consider a test picture (Fig. 9). Its cumulative "attentional landscape" is shown in Fig. 10. This form of representation is derived from empirical fixations, and the peaks of this function corresponding to the eyes and the mouth in the woman's picture are significant.
The gaze-contingent processing can be realized in several different ways, depending on the chosen type of image processing function $f_P : A \times O \to P$, which combines the attentional landscape A and the original picture O into the resulting picture P. The effect of four different functions is illustrated in Fig. 11 to 14, where the gaze-contingent processing of a prototype picture was coupled with (a) lowering of brightness, (b) enhancing of brightness, (c) reduction of contrast, or (d) reduction of optical resolution in regions with lower values of the attentional landscape. The last of these possibilities was already considered as a prospective method of disambiguation of ambiguous pictures, however, without considering eye movements (see Shiori & Ikeda, 1989).
Figure 10: "Attentional landscape" distributed over the monitor screen as obtained from a subject watching the picture shown in Fig. 9
Many different combinations of these procedures are easily realizable, either between themselves or with different modes of processing. For the present study we used the first of the described procedures: The brightness outside of attended regions was reduced according to the following transformation equation (1), which is applied to every pixel (x, y) of the picture:

\[ \vec{p}_{xy} = \lambda_{xy}\,\vec{o}_{xy} \tag{1} \]

Here, $\vec{p}_{xy}$ and $\vec{o}_{xy}$ are the RGB vectors (i.e. the red, green and blue components of a colour) of pixel (x, y) in the processed and the original picture, respectively. The transformation factors $\lambda_{xy}$ can be calculated by equation (2):

\[ \lambda_{xy} = m + (1 - m)\,\frac{a(x, y)}{a_{\max}} \tag{2} \]

where a(x, y) is the value of the "attentional landscape" for pixel (x, y), $a_{\max}$ is the maximal value in the whole picture, and m is a constant which determines the minimum brightness remaining in the processed picture. If m, e.g., is set to 0.1, the regions of the picture with attention value 0 will keep 10% of their initial brightness; if m = 1 the picture will not change at all. In this experiment we always set m = 0.1.
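The brightness reduction of equations (1) and (2) can be sketched as below. The picture is assumed to be a list of rows of `(r, g, b)` tuples and the attentional landscape a list of rows of floats of the same shape; both representations, the function names, and the handling of an all-zero landscape are assumptions made for this illustration.

```python
def transformation_factor(a_xy, a_max, m=0.1):
    """Equation (2): factor between m (no attention) and 1 (maximal attention)."""
    if a_max <= 0:
        # Assumption: with no recorded attention, leave the picture unchanged.
        return 1.0
    return m + (1.0 - m) * a_xy / a_max

def lower_brightness(original, landscape, m=0.1):
    """Equation (1): scale each RGB pixel by its transformation factor."""
    a_max = max(max(row) for row in landscape)
    processed = []
    for pixel_row, a_row in zip(original, landscape):
        new_row = []
        for (r, g, b), a_xy in zip(pixel_row, a_row):
            f = transformation_factor(a_xy, a_max, m)
            new_row.append((round(r * f), round(g * f), round(b * f)))
        processed.append(new_row)
    return processed

# Hypothetical 1 x 2 picture: the left pixel was never attended,
# the right pixel carries the maximal attention value.
picture = [[(200, 100, 50), (200, 100, 50)]]
attention = [[0.0, 4.0]]
print(lower_brightness(picture, attention, m=0.1))
```

With m = 0.1 the unattended pixel keeps 10% of its brightness, (20, 10, 5), while the maximally attended pixel stays at (200, 100, 50).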
Figure 11: After a partial decrease of brightness one of the face regions seems to be "highlighted" (variant a).
Figure 12: The less inspected areas seem to disappear behind a veil of mist after enhancing brightness (variant b).
Figure 13: The differences in colour decrease in the peripheral regions after reducing contrast (variant c).
Figure 14: The areas of lower attention are blurred, so the figure looks like a camera picture focused on the woman's face (variant d).
Appendix C: The picture series
C.1 The Necker cube series
CUBE1.GIF, CUBE2.GIF, CUBE3.GIF, CUBE4.GIF, CUBE5.GIF
C.2 The Duerer series
DUERER1.GIF, DUERER2.GIF, DUERER3.GIF, DUERER4.GIF, DUERER5.GIF
C.3 The Boring series
BORING1.GIF, BORING2.GIF, BORING3.GIF, BORING4.GIF, BORING5.GIF
C.4 The Escher series
ESCHER1.GIF, ESCHER2.GIF, ESCHER3.GIF, ESCHER4.GIF, ESCHER5.GIF
C.5 The faces series
FACES1.GIF, FACES2.GIF, FACES3.GIF, FACES4.GIF, FACES5.GIF
C.6 The Arcimboldo series
ARCIMB1.GIF, ARCIMB2.GIF, ARCIMB3.GIF, ARCIMB4.GIF, ARCIMB5.GIF
References
Ball K K, Beard B L, Roenker D L, Miller R L, Griggs D S, 1988 "Age and visual search: Expanding the useful field of view" Journal of the Optical Society of America A 5 2210-2219
Ballard D, Brown C, 1982 Computer vision (Englewood Cliffs: Prentice Hall)
Boring E G, 1942 Sensation and perception in the history of experimental psychology (New York: Irvington)
Caelli T M, 1988 "An adaptive computational model for texture segmentation" IEEE Transactions on Systems, Man, and Cybernetics 18 9-17
Chapman J (Ed), 1987 The Arcimboldo Effect: Transformation of the face from the sixteenth to twentieth century (Milano: Fratelli Fabri Editori)
Cooper L, 1994 "Mental representation of visual objects and events" in International perspectives on psychological science: The state of the art Eds G d'Ydewalle, P Eelen & P Bertelson (Hove, UK/Hillsdale, NJ: Lawrence Erlbaum Associates)
Ellis S R, Stark L, 1978 "Eye movements during the viewing of Necker cubes" Perception 7 575-581
Gale A G, Findlay J M, 1983 "Eye movement patterns in viewing ambiguous figures" in Eye movements and psychological functions: International views Eds R Groner, C Menz, D F Fisher & R A Monty (Hillsdale, NJ: Lawrence Erlbaum Associates)
Garcia-Perez M A, 1992 "Eye movements and perceptual multistability" in The role of eye movements in perceptual processes Eds E Chekaluk & K R Llowellyn (Amsterdam: North Holland)
Gippenreiter Yu B, Romanov V Ya, 1972 "A method of investigation of the internal form of visual activity" in R MacLeod and H L Pick, Jr, 1974 Perception: Essays in honor of James J Gibson (Ithaca: Cornell University Press)
Gombrich E H, 1969 Art and illusion: A study in the psychology of pictorial representation 2nd ed (Princeton, NJ: Princeton University Press)
Hacisalihzade S S, Stark L W, Allen J S, 1992 "Visual perception and sequences of eye movement fixations" IEEE Transactions on Systems, Man, and Cybernetics 22 474-481
Hood D C, Finkelstein M A, 1986 "Sensitivity to light" in Handbook of perception and human performance, Vol 1: Sensory processes and perception Eds K R Boff, L Kaufman, J P Thomas (New York: John Wiley and Sons)
Lee S W, Bajcsy R, 1992 "Detection of specularity using color and multiple views" Image and Vision Computing 10 643-653
Lienert G, 1973 Verteilungsfreie Methoden in der Biostatistik 2. Auflage, Bd 1 (Meisenheim/Glan: Verlag Anton Hain)
Mackworth N H, 1976 "Stimulus density limits the useful field of view" in Eye movements and psychological processes Eds R A Monty, J W Senders (New York: John Wiley and Sons)
Nodine C F, Kungel H L, Toto L C, Krupinsky E A, 1992 "Recording and analysing eye-position data using a microcomputer workstation" Behavioral Research Methods, Instruments, and Computers 24 475-485
Norman G R, Coblentz C L, Brooks L R, Babcook C J, 1992 "Expertise in visual diagnostics: A review of the literature" Academic Medicine Rime Supplement 67 78-83
Noton D, Stark L W, 1971 "Scanpaths in eye movements during pattern perception" Science 171 308-311
Pillalamari R S, Barnette B D, Birkmire D, 1993 "Cluster: A program for the identification of eye-fixation-cluster characteristics" Behavioral Research Methods, Instruments, and Computers 25 9-15
Pomplun M, Velichkovsky B M, Ritter H, 1994 "An artificial neural network for high precision eye movement tracking" in Lecture notes in artificial intelligence: AI-94 Proceedings Eds B Nebel & L Dreschler-Fischer (Berlin: Springer Verlag)
Posner M, 1978 Chronometric exploration of mind (Hillsdale, NJ: Lawrence Erlbaum Associates)
Rayner K, Well A D, Polatsek A, Bertera J H, 1982 "The availability of useful information to the right of fixation in reading" Perception and Psychophysics 31 537-550
Ritter H, Martinetz T, Schulten K, 1992 Neural computation and self-organizing maps (Reading, MA: Addison-Wesley)
Rock I, Hall S, Davis J, 1994 "Why do ambiguous figures reverse?" Acta Psychologica 87 33-59
Stampe D M, 1993 "Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems" Behavioral Research Methods, Instruments, and Computers 25 137-142
Stampe D M, Reingold E, 1993 "Eye movement as a response modality in psychological research" in Proceedings of the Seventh European Conference on Eye Movements, Durham, University of Durham, 31st of August - 3rd of September
Velichkovsky B M, 1995 "Communicating attention: Gaze-position transfer in cooperative problem solving" Pragmatics and Cognition 3(2) 199-222
Velichkovsky B M, Luria A R, Zinchenko V P, 1973 Psychology of perception (Moscow: Moscow University Press [in Russian])
Velichkovsky B M, Pomplun M, Rieser J, 1995 in press "Attention and Communication: Eye-Movement-Based Research Paradigms" in Visual Attention and Cognition Eds W H Zangemeister, H S Stiehl & C Freksa (Amsterdam: Elsevier Science Publishers)
Vicholkovska A, 1906 "Illusion of reversible perspective" Psychological Review 13 276-290
Yarbus A, 1967 Eye movements and vision (New York: Plenum Press)
Zinchenko V P, Vergiles N Yu, 1972 Formation of visual image: Studies of stabilized retinal images (New York: Plenum Press)

View File

@@ -0,0 +1,11 @@
Producer: ESP Ghostscript 815.02
CreationDate: 11/07/07 09:49:41
ModDate: 11/07/07 09:49:41
Tagged: no
Form: none
Pages: 42
Encrypted: no
Page size: 612 x 792 pts (letter) (rotated 0 degrees)
File size: 2896995 bytes
Optimized: no
PDF version: 1.3

View File

@@ -1,15 +1,15 @@
{
"translatorID": "f20f91fe-d875-47e7-9656-0abb928be472",
"translatorType": 4,
"label": "HAL Archives Ouvertes",
"creator": "Sebastian Karcher",
"target": "^https://(hal\\.archives-ouvertes\\.fr|hal\\.science)\\b",
"label": "HAL",
"creator": "Sebastian Karcher and Abe Jellinek",
"target": "^https://([^/.]+\\.)?hal\\.science/",
"minVersion": "3.0",
"maxVersion": null,
"priority": 100,
"inRepository": true,
"browserSupport": "gcsibv",
"lastUpdated": "2023-07-12 09:15:00"
"lastUpdated": "2025-10-23 17:10:00"
}
/*
@@ -36,11 +36,30 @@
*/
function detectWeb(doc, url) {
if (/\/search\/index\//.test(url)) return "multiple";
if (/\/hal-\d+/.test(url)) return findItemType(doc, url);
if (getSearchResults(doc, true)) {
return 'multiple';
}
else if (doc.querySelector('.typdoc')) {
return findItemType(doc, url);
}
return false;
}
function getSearchResults(doc, checkOnly) {
var items = {};
var found = false;
var rows = doc.querySelectorAll('.results-table td > a:first-child');
for (let row of rows) {
let href = row.href;
let title = ZU.trimInternal(row.textContent);
if (!href || !title) continue;
if (checkOnly) return true;
found = true;
items[href] = title;
}
return found ? items : false;
}
function findItemType(doc, url) {
var itemType = text(doc, '.typdoc')
// do some preliminary cleaning
@@ -82,73 +101,64 @@ function findItemType(doc, url) {
else return "journalArticle";
}
function doWeb(doc, url) {
var articles = [];
if (detectWeb(doc, url) == "multiple") {
var items = {};
var titles = doc.evaluate('//strong/a[@data-original-title="Display the resource" or @data-original-title="Voir la ressource"]', doc, null, XPathResult.ANY_TYPE, null);
var title;
while ((title = titles.iterateNext())/* assignment */) {
items[title.href] = title.textContent;
async function doWeb(doc, url) {
if (detectWeb(doc, url) == 'multiple') {
let items = await Zotero.selectItems(getSearchResults(doc, false));
if (!items) return;
for (let url of Object.keys(items)) {
await scrape(await requestDocument(url));
}
Zotero.selectItems(items, function (items) {
if (!items) {
return true;
}
for (var i in items) {
articles.push(i);
}
Zotero.Utilities.processDocuments(articles, scrape);
return true;
});
}
else if (/\/document$/.test(url)) { // work on PDF pages
var articleURL = url.replace(/\/document$/, "");
// Z.debug(articleURL)
ZU.processDocuments(articleURL, scrape);
else {
await scrape(doc, url);
}
else scrape(doc, url);
}
function scrape(doc, url) {
async function scrape(doc, url = doc.location.href) {
if (/\/document$/.test(url)) { // work on PDF pages
var articleURL = url.replace(/\/document$/, "");
// Z.debug(articleURL)
await scrape(await requestDocument(articleURL));
return;
}
var bibtexUrl = url.replace(/#.+|\/$/, "") + "/bibtex";
var abstract = text(doc, '.abstract-content');
var pdfUrl = attr(doc, "#viewer-detailed a[download]", "href");
// Z.debug("pdfURL " + pdfUrl)
ZU.doGet(bibtexUrl, function (bibtex) {
// Z.debug(bibtex)
var translator = Zotero.loadTranslator("import");
translator.setTranslator("9cb70025-a888-4a29-a210-93ec52da40d4");
translator.setString(bibtex);
translator.setHandler("itemDone", function (obj, item) {
if (abstract) {
item.abstractNote = abstract.replace(/^(Abstract|Résumé)\s*:/, "");
}
if (pdfUrl) {
item.attachments = [{
url: pdfUrl,
title: "HAL PDF Full Text",
mimeType: "application/pdf"
}];
}
else {
item.attachments = [{
document: doc,
title: "HAL Snapshot",
mimeType: "text/html"
}];
}
let detectedType = detectWeb(doc, url);
if (detectedType == "artwork" || detectedType == "presentation") {
item.itemType = detectedType;
}
if (detectedType == 'presentation' && text(doc, 'div.label-POSTER')) {
item.presentationType = 'Poster';
}
item.complete();
});
translator.translate();
let bibtex = await requestText(bibtexUrl);
// Z.debug(bibtex)
var translator = Zotero.loadTranslator("import");
translator.setTranslator("9cb70025-a888-4a29-a210-93ec52da40d4");
translator.setString(bibtex);
translator.setHandler("itemDone", function (obj, item) {
if (abstract) {
item.abstractNote = abstract.replace(/^(Abstract|Résumé)\s*:/, "");
}
if (pdfUrl) {
item.attachments = [{
url: pdfUrl,
title: "HAL PDF Full Text",
mimeType: "application/pdf"
}];
}
else {
item.attachments = [{
document: doc,
title: "HAL Snapshot",
mimeType: "text/html"
}];
}
let detectedType = detectWeb(doc, url);
if (detectedType == "artwork" || detectedType == "presentation") {
item.itemType = detectedType;
}
if (detectedType == 'presentation' && text(doc, 'div.label-POSTER')) {
item.presentationType = 'Poster';
}
item.complete();
});
await translator.translate();
}
/** BEGIN TEST CASES **/
@@ -197,19 +207,20 @@ var testCases = [
"creatorType": "author"
},
{
"firstName": "F.",
"firstName": "Fernand",
"lastName": "Karcher",
"creatorType": "author"
}
],
"date": "March 2006",
"date": "2006-03",
"DOI": "10.5194/acp-6-1033-2006",
"abstractNote": "The MOZAIC programme collects ozone and water vapour data using automatic equipment installed on board five long-range Airbus A340 aircraft flying regularly all over the world since August 1994. Those measurements made between September 1994 and August 1996 allowed the first accurate ozone climatology at 912 km altitude to be generated. The seasonal variability of the tropopause height has always provided a problem when constructing climatologies in this region. To remove any signal from the seasonal and synoptic scale variability in tropopause height we have chosen in this further study of these and subsequent data to reference our climatology to the altitude of the tropopause. We define the tropopause as a mixing zone 30 hPa thick across the 2 pvu potential vorticity surface. A new ozone climatology is now available for levels characteristic of the upper troposphere (UT) and the lower stratosphere (LS) regardless of the seasonal variations of the tropopause over the period 19942003. Moreover, this new presentation has allowed an estimation of the monthly mean climatological ozone concentration at the tropopause showing a sine seasonal variation with a maximum in May (120 ppbv) and a minimum in November (65 ppbv). Besides, we present a first assessment of the inter-annual variability of ozone in this particular critical region. The overall increase in the UTLS is about 1%/yr for the 9 years sampled. However, enhanced concentrations about 1015 % higher than the other years were recorded in 1998 and 1999 in both the UT and the LS. This so-called \"19981999 anomaly\" may be attributed to a combination of different processes involving large scale modes of atmospheric variability, circulation features and local or global pollution, but the most dominant one seems to involve the variability of the North Atlantic Oscillation (NAO) as we find a strong positive correlation (above 0.60) between ozone recorded in the upper troposphere and the NAO index. 
A strong anti-correlation is also found between ozone and the extremes of the Northern Annular Mode (NAM) index, attributing the lower stratospheric variability to dynamical anomalies. Finally this analysis highlights the coupling between the troposphere, at least the upper one, and the stratosphere, at least the lower one.",
"issue": "4",
"itemID": "thouret:hal-00328427",
"libraryCatalog": "HAL Archives Ouvertes",
"pages": "1051",
"publicationTitle": "Atmospheric Chemistry and Physics",
"url": "https://hal.archives-ouvertes.fr/hal-00328427",
"url": "https://hal.science/hal-00328427",
"volume": "6",
"attachments": [
{
@@ -233,7 +244,7 @@ var testCases = [
"creators": [
{
"firstName": "Henry",
"lastName": "De Lumley",
"lastName": "de Lumley",
"creatorType": "author"
},
{
@@ -248,7 +259,7 @@ var testCases = [
"libraryCatalog": "HAL Archives Ouvertes",
"numPages": "637 p.",
"publisher": "Éditions Recherche sur les Civilisations",
"url": "https://hal.archives-ouvertes.fr/hal-00472553",
"url": "https://hal.science/hal-00472553",
"attachments": [
{
"title": "HAL Snapshot",
@@ -280,11 +291,12 @@ var testCases = [
"creatorType": "author"
}
],
"date": "March 2014",
"date": "2014-03",
"abstractNote": "It seems that the Caisse des Dépôts et Consignations in partnership with the Conference of University Presidents have well taken the measure of this inexorable trend. That is why it \"is committed to supporting higher education institutions\" in the definition and implementation of their digital strategy and wider support them in their efforts to modernize. \" It is indeed in this modernization process that the University of Haute Alsace is committed to registration by engaging in a project to build a Learning Centre. The objective of this project is the modernization and rationalization of these support teaching and research services. There has to work at UHA innovation process its accompanying device in teaching learning and research which it is likely that this change will not be without effect on profit actors are students but also teachers. This research report aims to provide some ideas for reflection to support accompanying the opening of the Learning Centre to encourage future users to operate the premises.",
"itemID": "coulibaly:hal-00973502",
"libraryCatalog": "HAL Archives Ouvertes",
"shortTitle": "Learning Centre de l'UHA",
"url": "https://hal.archives-ouvertes.fr/hal-00973502",
"url": "https://hal.science/hal-00973502",
"attachments": [
{
"title": "HAL PDF Full Text",
@@ -292,14 +304,30 @@ var testCases = [
}
],
"tags": [
"Bibliothèque universitaire",
"ICT appropriation",
"Learning Centre",
"Pedagogy",
"University Library",
"appropriation TICE",
"innovation",
"pédagogie universitaire"
{
"tag": "Bibliothèque universitaire"
},
{
"tag": "ICT appropriation"
},
{
"tag": "Learning Centre"
},
{
"tag": "Pedagogy"
},
{
"tag": "University Library"
},
{
"tag": "appropriation TICE"
},
{
"tag": "innovation"
},
{
"tag": "pédagogie universitaire"
}
],
"notes": [
{
@@ -324,23 +352,33 @@ var testCases = [
"creatorType": "author"
}
],
"date": "March 2012",
"abstractNote": "Description : Children performing for a crowd of passersby in a park in Kunming. (Enfants jouant dans un parc à Kunming Photo d'enfants jouant dans un parc à Kunming",
"date": "2012-03",
"abstractNote": "Children performing for a crowd of passersby in a park in Kunming. (Enfants jouant dans un parc à Kunming Photo d'enfants jouant dans un parc à Kunming",
"itemID": "gipouloux:medihal-00772952",
"libraryCatalog": "HAL Archives Ouvertes",
"url": "https://medihal.archives-ouvertes.fr/medihal-00772952",
"url": "https://media.hal.science/medihal-00772952",
"attachments": [
{
"title": "HAL PDF Full Text",
"mimeType": "application/pdf"
"title": "HAL Snapshot",
"mimeType": "text/html"
}
],
"tags": [
"China",
"Kunming",
"children",
"park",
"town"
{
"tag": "China"
},
{
"tag": "Kunming"
},
{
"tag": "children"
},
{
"tag": "park"
},
{
"tag": "town"
}
],
"notes": [],
"seeAlso": []
@@ -349,7 +387,7 @@ var testCases = [
},
{
"type": "web",
"url": "https://hal.archives-ouvertes.fr/search/index/q/%2A/docType_s/THESE/",
"url": "https://hal.science/search/index?q=test",
"items": "multiple"
},
{
@@ -370,8 +408,7 @@ var testCases = [
"abstractNote": "First results about[i] in vitro[/i] bud neoformation on haploid apple leaves. The impact of biotechnology in agriculture. The meeting point between fundamental and applied in vitro culture research",
"extra": "Published: The impact of biotechnology in agriculture. The meeting point between fundamental and applied in vitro culture research",
"itemID": "duron:hal-01600136",
"presentationType": "Poster",
"url": "https://hal.archives-ouvertes.fr/hal-01600136",
"url": "https://hal.science/hal-01600136",
"attachments": [
{
"title": "HAL PDF Full Text",
@@ -424,6 +461,76 @@ var testCases = [
"seeAlso": []
}
]
},
{
"type": "web",
"url": "https://theses.hal.science/tel-05056628",
"items": [
{
"itemType": "thesis",
"title": "Modélisation et simulation des impacts de gouttes et de sprays sur des surfaces liquides",
"creators": [
{
"firstName": "Syphax",
"lastName": "Fereka",
"creatorType": "author"
}
],
"date": "2025-01",
"abstractNote": "Sprays are ubiquitous in various industrial applications, such as combustion, surface coating, and system cooling. Understanding the dynamics associated with these phenomena is crucial for energy optimization and industrial safety. Several aspects warrant further study and comprehension. However, this thesis primarily focuses on the interactions between sprays and deep liquid substrates, with particular attention to the disturbance of the liquid substrate (deposition and re-ejection).Classical modeling approaches for spray/surface interactions, often based on experimental data or empirical extrapolations from isolated droplet impact data, have limitations when applied across a wide range of regimes (substrate quality, We, Re, etc.). To overcome these constraints, we use multiphase numerical simulations with the in-house code Fugu, employing direct numerical simulation (DNS) at the droplet scale. This methodology provides precise control over key parameters (impact velocity, polydispersity, etc.) and enables an in-depth analysis of the associated physical and statistical phenomena. This thesis presents: (1) a literature review on droplet and spray impacts, (2) details of the numerical methods used, (3) validation of simulations for cases involving isolated or multiple droplet impacts, and (4) results on spray impacts on thick liquid films under various impact regimes. This work paves the way for a deeper understanding of spray/substrate interaction phenomena and advances in their numerical modeling",
"itemID": "fereka:tel-05056628",
"libraryCatalog": "HAL Archives Ouvertes",
"thesisType": "Theses",
"university": "Université Gustave Eiffel",
"url": "https://theses.hal.science/tel-05056628",
"attachments": [
{
"title": "HAL PDF Full Text",
"mimeType": "application/pdf"
}
],
"tags": [
{
"tag": "Bubbles"
},
{
"tag": "Bulles"
},
{
"tag": "Drops"
},
{
"tag": "Gouttes"
},
{
"tag": "Multi-Scale"
},
{
"tag": "Multi-Échelle"
},
{
"tag": "Multiphase flow"
},
{
"tag": "Spray"
},
{
"tag": "Spray"
},
{
"tag": "Vof"
},
{
"tag": "Volume of fluid"
},
{
"tag": "Écoulement polyphasique"
}
],
"notes": [],
"seeAlso": []
}
]
}
]
/** END TEST CASES **/


@@ -1,21 +1,21 @@
{
"translatorID": "1b052690-16dd-431d-9828-9dc675eb55f6",
"translatorType": 4,
"label": "Papers Past",
"creator": "Philipp Zumstein and Abe Jellinek",
"creator": "Philipp Zumstein, Abe Jellinek, and Jason Murphy",
"target": "^https?://(www\\.)?paperspast\\.natlib\\.govt\\.nz/",
"minVersion": "3.0",
"maxVersion": "",
"minVersion": "5.0",
"maxVersion": null,
"priority": 100,
"inRepository": true,
"translatorType": 4,
"browserSupport": "gcsibv",
"lastUpdated": "2021-07-12 17:17:15"
"lastUpdated": "2025-10-21 16:40:00"
}
/*
***** BEGIN LICENSE BLOCK *****
Copyright © 2017-2021 Philipp Zumstein and Abe Jellinek
Copyright © 2025 Philipp Zumstein, Abe Jellinek, and Jason Murphy
This file is part of Zotero.
@@ -35,15 +35,14 @@
***** END LICENSE BLOCK *****
*/
function detectWeb(doc, url) {
if (/\/newspapers\/.+\.\d+\.\d+/.test(url)) {
return "newspaperArticle";
}
if (/[?&]query=/.test(url) && getSearchResults(doc, true)) {
return "multiple";
}
else if (ZU.xpathText(doc, '//h3[@itemprop="headline"]')) {
if (url.includes('/newspapers/')) {
return "newspaperArticle";
}
if (url.includes('/periodicals/')) {
return "journalArticle";
}
@@ -57,14 +56,13 @@ function detectWeb(doc, url) {
return false;
}
function getSearchResults(doc, checkOnly) {
var items = {};
var found = false;
var rows = doc.querySelectorAll('.search-results .article-preview__title a');
for (var i = 0; i < rows.length; i++) {
var href = rows[i].href;
var title = ZU.trimInternal(rows[i].textContent);
for (let row of rows) {
var href = row.href;
var title = ZU.trimInternal(row.textContent);
if (!href || !title) continue;
if (checkOnly) return true;
found = true;
@@ -73,31 +71,108 @@ function getSearchResults(doc, checkOnly) {
return found ? items : false;
}
function doWeb(doc, url) {
async function doWeb(doc, url) {
if (detectWeb(doc, url) == "multiple") {
Zotero.selectItems(getSearchResults(doc, false), function (items) {
if (!items) {
return;
}
var articles = [];
for (var i in items) {
articles.push(i);
}
ZU.processDocuments(articles, scrape);
});
let items = await Zotero.selectItems(getSearchResults(doc, false));
if (!items) return;
for (let url of Object.keys(items)) {
scrape(await requestDocument(url));
}
}
else {
scrape(doc, url);
}
}
function scrape(doc, url = doc.location.href) {
var type = detectWeb(doc, url);
if (type == "newspaperArticle") {
scrapeNewspaper(doc, url);
}
else if (type) {
scrapeLegacy(doc, url);
}
}
function scrape(doc, url) {
function scrapeNewspaper(doc, url) {
var item = new Zotero.Item("newspaperArticle");
var ld = getJSONLD(doc);
var news = null;
for (var i = 0; i < ld.length; i++) {
if (/NewsArticle|Article/i.test(ld[i]['@type'])) {
news = ld[i];
break;
}
}
var meta = collectMeta(doc);
var titles = [];
if (news && news.headline) {
titles.push(ZU.trimInternal(news.headline));
}
if (meta.hw.citation_title) {
titles.push(ZU.trimInternal(meta.hw.citation_title));
}
if (meta.dc["DC.title"]) {
titles.push(ZU.trimInternal(meta.dc["DC.title"]));
}
var rawTitle = dedupeFirst(titles);
item.title = fixTitleCase(rawTitle);
item.publicationTitle = (news && news.isPartOf && news.isPartOf.name)
|| meta.hw.citation_journal_title
|| meta.dc["DC.publisher"]
|| meta.dc["DC.source"]
|| "";
item.date = ZU.strToISO((news && news.datePublished) || meta.hw.citation_date || meta.dc["DC.date"] || "");
var pageStart = (news && news.pageStart) || meta.hw.citation_firstpage || "";
var pageEnd = (news && news.pageEnd) || meta.hw.citation_lastpage || "";
var pagesMeta = meta.hw.citation_pages || "";
item.pages = pagesFrom(pageStart, pageEnd, pagesMeta);
item.language = (news && news.inLanguage) || meta.hw.citation_language || meta.dc["DC.language"] || "";
item.rights = (news && news.copyrightNotice) || meta.dc["DC.rights"] || "";
var cleanUrl = canonicalURL(doc) || (news && news.url) || meta.hw.citation_fulltext_html_url || meta.dc["DC.source"] || url;
item.url = cleanUrl.split('?')[0].split('#')[0];
var bib = parseBibliographicDetails(doc);
if (!item.publicationTitle && bib.publicationTitle) {
item.publicationTitle = bib.publicationTitle;
}
if (!item.date && bib.date) {
item.date = ZU.strToISO(bib.date);
}
if (!item.pages && bib.pages) {
item.pages = bib.pages;
}
var vol = (news && news.isPartOf && news.isPartOf.volumeNumber ? String(news.isPartOf.volumeNumber) : "") || meta.hw.citation_volume || bib.volume || "";
var iss = (news && news.isPartOf && news.isPartOf.issueNumber ? String(news.isPartOf.issueNumber) : "") || meta.hw.citation_issue || bib.issue || "";
var extraParts = [];
if (vol) extraParts.push("Volume: " + vol);
if (iss) extraParts.push("Issue: " + iss);
if (extraParts.length > 0) {
item.extra = extraParts.join("\n");
}
item.creators = [];
item.attachments = [{
title: "Snapshot",
document: doc
}];
item.libraryCatalog = "Papers Past";
item.complete();
}
function scrapeLegacy(doc, url) {
var type = detectWeb(doc, url);
var item = new Zotero.Item(type);
var title = ZU.xpathText(doc, '//h3[@itemprop="headline"]/text()[1]');
item.title = ZU.capitalizeTitle(title.toLowerCase(), true);
var title = doc.querySelector('[itemprop="headline"]').firstChild.textContent;
item.title = fixTitleCase(title);
if (type == "journalArticle" || type == "newspaperArticle") {
var nav = doc.querySelectorAll('#breadcrumbs .breadcrumbs__crumb');
@@ -126,7 +201,6 @@ function scrape(doc, url) {
if (type == "letter") {
var author = ZU.xpathText(doc, '//div[@id="researcher-tools-tab"]//tr[td[.="Author"]]/td[2]');
// e.g. 42319/Mackay, James, 1831-1912
if (author && !author.includes("Unknown")) {
author = author.replace(/^[0-9/]*/, '').replace(/[0-9-]*$/, '').replace('(Sir)', '');
item.creators.push(ZU.cleanAuthor(author, "author"));
@@ -136,29 +210,32 @@ function scrape(doc, url) {
recipient = recipient.replace(/^[0-9/]*/, '').replace(/[0-9-]*$/, '').replace('(Sir)', '');
item.creators.push(ZU.cleanAuthor(recipient, "recipient"));
}
item.date = ZU.xpathText(doc, '//div[@id="researcher-tools-tab"]//tr[td[.="Date"]]/td[2]');
item.language = ZU.xpathText(doc, '//div[@id="researcher-tools-tab"]//tr[td[.="Language"]]/td[2]');
}
item.abstractNote = text(doc, '#tab-english');
item.url = ZU.xpathText(doc, '//div[@id="researcher-tools-tab"]/input/@value');
if (!item.url) item.url = text('#researcher-tools-tab p');
if (!item.url || !item.url.startsWith('http')) item.url = url;
if (!item.url) {
item.url = text(doc, '#researcher-tools-tab p');
}
if (!item.url || !item.url.startsWith('http')) {
item.url = url;
}
item.libraryCatalog = "Papers Past";
item.attachments.push({
title: "Snapshot",
document: doc
});
let imagePageURL = attr(doc, '.imagecontainer a', 'href');
var imagePageURL = attr(doc, '.imagecontainer a', 'href');
if (imagePageURL) {
ZU.processDocuments(imagePageURL, function (imageDoc) {
item.attachments.push({
title: 'Image',
mimeType: 'image/jpeg',
title: "Image",
mimeType: "image/jpeg",
url: attr(imageDoc, '.imagecontainer img', 'src')
});
item.complete();
@@ -169,6 +246,111 @@ function scrape(doc, url) {
}
}
function getJSONLD(doc) {
var out = [];
var nodes = doc.querySelectorAll('script[type="application/ld+json"]');
for (var i = 0; i < nodes.length; i++) {
try {
var data = JSON.parse(nodes[i].textContent);
if (Array.isArray(data)) {
for (var j = 0; j < data.length; j++) {
out.push(data[j]);
}
}
else if (data) {
out.push(data);
}
}
catch (e) {}
}
return out;
}
function collectMeta(doc) {
var hw = {};
var dc = {};
var metas = doc.querySelectorAll("meta[name]");
for (var i = 0; i < metas.length; i++) {
var name = metas[i].getAttribute("name");
var content = metas[i].getAttribute("content") || "";
if (!name) continue;
if (/^citation_/i.test(name)) {
if (name === "citation_author") {
if (!hw[name]) hw[name] = [];
hw[name].push(content);
}
else {
hw[name] = content;
}
continue;
}
if (/^DC\./.test(name) || /^dc\./.test(name)) {
dc[name.replace(/^dc\./, "DC.")] = content;
}
}
return { hw: hw, dc: dc };
}
function parseBibliographicDetails(doc) {
var textContent = text(doc, '#researcher-tools-tab .citation, .tabs-panel .citation, p.citation') || "";
var out = { publicationTitle: "", volume: "", issue: "", date: "", pages: "" };
if (!textContent) return out;
var pubMatch = textContent.match(/^\s*([^,]+),/);
if (pubMatch) out.publicationTitle = ZU.trimInternal(pubMatch[1]);
var volMatch = textContent.match(/Volume\s+([^,]+),/i);
if (volMatch) out.volume = ZU.trimInternal(volMatch[1]);
var issMatch = textContent.match(/Issue\s+([^,]+),/i);
if (issMatch) out.issue = ZU.trimInternal(issMatch[1]);
var dateMatch = textContent.match(/Issue\s+[^,]+,\s*([^,]+),\s*Page/i) || textContent.match(/,\s*([^,]+),\s*Page/i);
if (dateMatch) out.date = ZU.trimInternal(dateMatch[1]);
var pageMatch = textContent.match(/Page\s+([0-9A-Za-z-]+)/i);
if (pageMatch) out.pages = ZU.trimInternal(pageMatch[1]);
return out;
}
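
Review note: the regex chain in `parseBibliographicDetails` is easier to check against a concrete string. The sketch below applies the same patterns to a hypothetical citation line. The citation text is an assumption about how the page formats it, `parseCitation` and `grab` are illustrative names, and `String.prototype.trim` stands in for `ZU.trimInternal`.

```javascript
// Hypothetical citation line; the real page text may be formatted differently.
const citation = "Evening Post, Volume CXXXVII, Issue 41, 18 February 1944, Page 5";

// Illustrative re-implementation of the regex chain (not the translator itself).
function parseCitation(textContent) {
	const out = { publicationTitle: "", volume: "", issue: "", date: "", pages: "" };
	// Return the first capture group of a pattern, or "" when it does not match.
	const grab = (re) => {
		const m = textContent.match(re);
		return m ? m[1].trim() : "";
	};
	out.publicationTitle = grab(/^\s*([^,]+),/);
	out.volume = grab(/Volume\s+([^,]+),/i);
	out.issue = grab(/Issue\s+([^,]+),/i);
	// Prefer the field between "Issue N," and "Page"; otherwise take whatever precedes "Page".
	out.date = grab(/Issue\s+[^,]+,\s*([^,]+),\s*Page/i) || grab(/,\s*([^,]+),\s*Page/i);
	out.pages = grab(/Page\s+([0-9A-Za-z-]+)/i);
	return out;
}

const parsedCitation = parseCitation(citation);
console.log(parsedCitation);
```

Every pattern splits on commas, so a publication title that itself contains a comma would be cut at the first one. That seems acceptable here because these values are only used as fallbacks when JSON-LD and meta tags leave a field empty.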
function dedupeFirst(arr) {
return arr.find(Boolean) || "";
}
function fixTitleCase(str) {
if (!str) return str;
var letters = str.replace(/[^A-Za-z]/g, "");
if (!letters) return str;
var uppers = (letters.match(/[A-Z]/g) || []).length;
var upperRatio = uppers / letters.length;
if (upperRatio > 0.6) {
return ZU.capitalizeTitle(str.toLowerCase(), true);
}
return str;
}
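
Review note: `fixTitleCase` only re-cases a title when more than 60% of its letters are uppercase, which is what lets mixed-case titles such as the "McLean" letter title pass through unchanged. A minimal sketch of that heuristic follows. `simpleTitleCase` is a hypothetical stand-in for `ZU.capitalizeTitle` (the real utility applies more rules), and `fixTitleCaseSketch` is an illustrative name.

```javascript
// Hypothetical stand-in for ZU.capitalizeTitle(): uppercases the first letter of each word.
function simpleTitleCase(str) {
	return str.replace(/(^|\s)([a-z])/g, (match, pre, ch) => pre + ch.toUpperCase());
}

// Same threshold logic as fixTitleCase(), using the stand-in above.
function fixTitleCaseSketch(str) {
	if (!str) return str;
	const letters = str.replace(/[^A-Za-z]/g, "");
	if (!letters) return str;
	const uppers = (letters.match(/[A-Z]/g) || []).length;
	// Only "shouted" titles (mostly uppercase) are lowercased and re-capitalized.
	return uppers / letters.length > 0.6 ? simpleTitleCase(str.toLowerCase()) : str;
}

const shouted = fixTitleCaseSketch("COUP IN ARGENTINA");
const mixed = fixTitleCaseSketch("1 page written 19 Jun 1873 by James Mackay");
console.log(shouted, "|", mixed);
```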
function pagesFrom(start, end, meta) {
var s = ZU.trimInternal(start);
var e = ZU.trimInternal(end);
var m = ZU.trimInternal(meta);
if (m) return m;
if (s && e && s !== e) return s + "-" + e;
if (s) return s;
return "";
}
function canonicalURL(doc) {
var link = doc.querySelector('link[rel="canonical"]');
if (link && link.href) return link.href;
var og = doc.querySelector('meta[property="og:url"]');
if (og && og.content) return og.content;
return "";
}
/** BEGIN TEST CASES **/
var testCases = [
{
@@ -185,9 +367,11 @@ var testCases = [
"title": "Coup in Argentina",
"creators": [],
"date": "1944-02-18",
"extra": "Volume: CXXXVII\nIssue: 41",
"libraryCatalog": "Papers Past",
"pages": "5",
"publicationTitle": "Evening Post",
"rights": "Stuff Ltd is the copyright owner for the Evening Post. You can reproduce in-copyright material from this newspaper for non-commercial use under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence (CC BY-NC-SA 4.0). This newspaper is not available for commercial use without the consent of Stuff Ltd. For advice on reproduction of out-of-copyright material from this newspaper, please refer to the Copyright guide.",
"url": "https://paperspast.natlib.govt.nz/newspapers/EP19440218.2.61",
"attachments": [
{
@@ -203,17 +387,19 @@ var testCases = [
},
{
"type": "web",
"url": "https://paperspast.natlib.govt.nz/newspapers/NZH19360721.2.73.1?query=argentina",
"url": "https://paperspast.natlib.govt.nz/newspapers/MT19390701.2.6.3",
"items": [
{
"itemType": "newspaperArticle",
"title": "La Argentina",
"title": "Inter-School Basketball And Rugby Football",
"creators": [],
"date": "1936-07-21",
"date": "1939-07-01",
"extra": "Volume: 64\nIssue: 153",
"libraryCatalog": "Papers Past",
"pages": "9",
"publicationTitle": "New Zealand Herald",
"url": "https://paperspast.natlib.govt.nz/newspapers/NZH19360721.2.73.1",
"pages": "2",
"publicationTitle": "Manawatu Times",
"rights": "Stuff Ltd is the copyright owner for the Manawatu Times. You can reproduce in-copyright material from this newspaper for non-commercial use under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence (CC BY-NC-SA 4.0). This newspaper is not available for commercial use without the consent of Stuff Ltd. For advice on reproduction of out-of-copyright material from this newspaper, please refer to the Copyright guide.",
"url": "https://paperspast.natlib.govt.nz/newspapers/MT19390701.2.6.3",
"attachments": [
{
"title": "Snapshot",
@@ -259,7 +445,7 @@ var testCases = [
"items": [
{
"itemType": "letter",
"title": "1 Page Written 19 Jun 1873 by James Mackay in Hamilton City to Sir Donald Mclean in Wellington",
"title": "1 page written 19 Jun 1873 by James Mackay in Hamilton City to Sir Donald McLean in Wellington",
"creators": [
{
"firstName": "Mackay",
@@ -292,6 +478,32 @@ var testCases = [
"seeAlso": []
}
]
},
{
"type": "web",
"url": "https://paperspast.natlib.govt.nz/parliamentary/AJHR1899-I.2.4.2.3",
"items": [
{
"itemType": "report",
"title": "Rabbits and Rabbitskins, Exported from Colony During Years 1894 to 1898, and Number and Value Thereof.",
"creators": [],
"libraryCatalog": "Papers Past",
"url": "https://paperspast.natlib.govt.nz/parliamentary/AJHR1899-I.2.4.2.3",
"attachments": [
{
"title": "Snapshot",
"mimeType": "text/html"
},
{
"title": "Image",
"mimeType": "image/jpeg"
}
],
"tags": [],
"notes": [],
"seeAlso": []
}
]
}
]
/** END TEST CASES **/


@@ -3,12 +3,12 @@
"translatorType": 4,
"label": "Prime 9ja Online",
"creator": "VWF",
"target": "^https?://(www\\.)?prime9ja\\.com\\.ng/\\d{4}/\\d{2}/[^/]+\\.html",
"target": "^https?://(www\\.|pidgin\\.)?prime9ja\\.com\\.ng/",
"minVersion": "5.0",
"maxVersion": null,
"priority": 100,
"inRepository": true,
"lastUpdated": "2025-08-08 17:40:00"
"lastUpdated": "2025-10-24 19:45:00"
}
/*
@@ -34,38 +34,66 @@
***** END LICENSE BLOCK *****
*/
function detectWeb(doc, _url) {
let jsonLdNodes = doc.querySelectorAll('script[type="application/ld+json"]');
for (let node of jsonLdNodes) {
function meta(doc, nameOrProp) {
let m = doc.querySelector('meta[property="' + nameOrProp + '"]')
|| doc.querySelector('meta[name="' + nameOrProp + '"]');
return m ? m.getAttribute('content') : '';
}
function parseJSONLD(doc) {
let nodes = doc.querySelectorAll('script[type="application/ld+json"]');
for (let node of nodes) {
let txt = node.textContent.trim();
if (!txt) continue;
try {
let data = JSON.parse(node.textContent);
let type = data['@type'];
if (typeof type === 'string' && type.endsWith('NewsArticle')) {
return 'newspaperArticle';
let parsed = JSON.parse(txt);
let candidates = [];
if (Array.isArray(parsed)) {
candidates = parsed;
}
if (Array.isArray(type) && type.some(t => typeof t === 'string' && t.endsWith('NewsArticle'))) {
return 'newspaperArticle';
else if (parsed['@graph'] && Array.isArray(parsed['@graph'])) {
candidates = parsed['@graph'];
}
else if (parsed.mainEntity) {
candidates = [parsed.mainEntity, parsed];
}
else {
candidates = [parsed];
}
for (let cand of candidates) {
if (!cand) continue;
let t = cand['@type'] || cand.type;
if (!t) continue;
if (typeof t === 'string') {
if (t.includes('NewsArticle')) {
return cand;
}
}
else if (Array.isArray(t)) {
for (let tt of t) {
if (typeof tt === 'string' && tt.includes('NewsArticle')) {
return cand;
}
}
}
}
}
catch (e) {
// ignore JSON parsing errors
// ignore malformed JSON-LD
}
}
if (getSearchResults(doc, true)) {
return 'multiple';
}
return false;
return null;
}
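
Review note: `parseJSONLD` accepts four JSON-LD shapes (top-level array, `@graph`, `mainEntity`, single object) and returns the first node typed as a NewsArticle. The sketch below isolates that selection step so it can be run on sample payloads. `findNewsArticle` is an illustrative name, both payloads are made up, and `[].concat(...)` is used here as a compact way to treat a string or an array `@type` uniformly.

```javascript
// Returns the first candidate whose @type mentions NewsArticle, or null.
function findNewsArticle(parsed) {
	let candidates;
	if (Array.isArray(parsed)) candidates = parsed;
	else if (Array.isArray(parsed['@graph'])) candidates = parsed['@graph'];
	else if (parsed.mainEntity) candidates = [parsed.mainEntity, parsed];
	else candidates = [parsed];

	for (const cand of candidates) {
		if (!cand) continue;
		// Normalize a string or an array of types into one array.
		const types = [].concat(cand['@type'] || cand.type || []);
		if (types.some(t => typeof t === 'string' && t.includes('NewsArticle'))) {
			return cand;
		}
	}
	return null;
}

// Hypothetical payloads, for illustration only.
const graphDoc = {
	"@context": "https://schema.org",
	"@graph": [
		{ "@type": "WebSite", "name": "Example" },
		{ "@type": ["Article", "NewsArticle"], "headline": "Example headline" }
	]
};
const hit = findNewsArticle(graphDoc);
const miss = findNewsArticle({ "@type": "WebPage" });
console.log(hit && hit.headline, miss);
```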
function getSearchResults(doc, checkOnly) {
let items = {};
let found = false;
let rows = doc.querySelectorAll('a.entry-title[href*="/202"]');
// generic year pattern in path for article links
let rows = doc.querySelectorAll('a[href*="/20"]');
for (let row of rows) {
let href = row.href;
let title = ZU.trimInternal(row.textContent);
let title = ZU.trimInternal(row.textContent || row.title || '');
if (!href || !title) continue;
if (checkOnly) return true;
found = true;
@@ -74,66 +102,186 @@ function getSearchResults(doc, checkOnly) {
return found ? items : false;
}
async function doWeb(doc, url) {
if (detectWeb(doc, url) === 'multiple') {
let items = await Zotero.selectItems(getSearchResults(doc, false));
if (!items) return;
for (let url of Object.keys(items)) {
await scrape(await requestDocument(url));
}
}
else {
await scrape(doc, url);
}
function isIndexURL(url) {
return url && url.includes('/search/label/');
}
async function scrape(doc, url = doc.location.href) {
let item = new Zotero.Item('newspaperArticle');
let jsonLdNodes = doc.querySelectorAll('script[type="application/ld+json"]');
let data = null;
function detectWeb(doc, url) {
url = url || doc.location.href;
for (let node of jsonLdNodes) {
try {
let parsed = JSON.parse(node.textContent);
let type = parsed['@type'];
if (
type && typeof type === 'string' && type.endsWith('NewsArticle')
|| Array.isArray(type) && type.some(t => typeof t === 'string' && t.endsWith('NewsArticle'))
) {
data = parsed;
break;
}
}
catch (e) {}
// 1) JSON-LD NewsArticle -> single article
let j = parseJSONLD(doc);
if (j) {
return 'newspaperArticle';
}
if (data) {
item.title = ZU.unescapeHTML(data.headline || text(doc, 'h1.entry-title'));
item.ISSN = '3092-8907';
item.abstractNote = ZU.unescapeHTML(data.description || '');
item.date = data.datePublished || '';
item.language = data.inLanguage || 'en';
item.url = data.url || url;
item.publicationTitle = (data.publisher && data.publisher.name) || 'Prime 9ja Online';
item.place = 'Nigeria';
// 2) explicit index/list URL
if (isIndexURL(url)) {
return 'multiple';
}
// 3) Use the standard getSearchResults() heuristic for listing pages
if (getSearchResults(doc, true)) {
// If page also clearly looks like an article, prefer article
if (meta(doc, 'article:published_time') || (meta(doc, 'og:type') || '').toLowerCase() === 'article' || text(doc, 'h1.entry-title') || doc.querySelector('[itemprop="articleBody"]')) {
return 'newspaperArticle';
}
return 'multiple';
}
// 4) meta-based hints
if (meta(doc, 'article:published_time')) {
return 'newspaperArticle';
}
let ogType = (meta(doc, 'og:type') || '').toLowerCase();
if (ogType === 'article') {
return 'newspaperArticle';
}
// 5) fallback selectors
if (text(doc, 'h1.entry-title')
|| text(doc, 'h1.s-title')
|| doc.querySelector('[itemprop="articleBody"]')
|| doc.querySelector('article.post')) {
return 'newspaperArticle';
}
return false;
}
async function doWeb(doc, url) {
url = url || doc.location.href;
let mode = detectWeb(doc, url);
if (mode === 'multiple') {
let items = getSearchResults(doc, false);
if (!items) return;
let selected = await Zotero.selectItems(items);
if (!selected) return;
for (let u of Object.keys(selected)) {
await scrape(await requestDocument(u));
}
}
else if (mode === 'newspaperArticle') {
await scrape(doc, url);
}
// else do nothing
}
async function scrape(doc, url) {
url = url || doc.location.href;
let item = new Zotero.Item('newspaperArticle');
let data = parseJSONLD(doc);
// If JSON-LD present, prefer it
if (data) {
item.title = ZU.unescapeHTML(
data.headline
|| data.name
|| meta(doc, 'og:title')
|| text(doc, 'h1.entry-title')
|| text(doc, 'h1.s-title')
|| ''
);
item.abstractNote = ZU.unescapeHTML(
data.description
|| meta(doc, 'og:description')
|| ''
);
item.url = data.url || meta(doc, 'og:url') || url;
item.language = data.inLanguage || meta(doc, 'og:locale') || 'en';
// --- date: use ZU.strToISO() to normalize if possible ---
let rawJsonDate = data.datePublished || data.dateCreated || '';
if (rawJsonDate) {
// Prefer Zotero's normalization (handles many formats and keeps timezone when present)
let isoFromZU = ZU.strToISO(rawJsonDate);
if (isoFromZU) {
item.date = isoFromZU;
}
else {
// if ZU couldn't parse, keep raw (often already ISO with TZ)
item.date = rawJsonDate;
}
}
// --- authors from JSON-LD (skip organisations) ---
if (data.author) {
if (Array.isArray(data.author)) {
for (let author of data.author) {
if (author.name) {
item.creators.push(ZU.cleanAuthor(author.name, 'author'));
let authors = Array.isArray(data.author) ? data.author : [data.author];
for (let a of authors) {
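// Accept plain-string authors; for objects use only a name field, so that
// name.toString() can never yield "[object Object]"
let name = typeof a === 'string' ? a : (a && (a.name || a['@name'])) || '';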
if (name) {
let lower = name.toString().toLowerCase();
if (/news agency|agency|news desk|publish desk|prime 9ja|prime9ja|online media|media|staff|bureau/i.test(lower)) {
// skip org-like bylines
}
else {
item.creators.push(ZU.cleanAuthor(name.toString(), 'author'));
}
}
}
else if (data.author.name) {
item.creators.push(ZU.cleanAuthor(data.author.name, 'author'));
}
}
// DOM/meta fallbacks for anything missing
if (!item.title || !item.title.trim()) {
item.title = ZU.unescapeHTML(
meta(doc, 'og:title')
|| text(doc, 'h1.entry-title')
|| text(doc, 'h1.s-title')
|| text(doc, 'title')
|| ''
);
}
if (!item.abstractNote || !item.abstractNote.trim()) {
item.abstractNote = ZU.unescapeHTML(
meta(doc, 'og:description')
|| meta(doc, 'description')
|| ''
);
}
// If date still empty, try article:published_time meta (often ISO)
if (!item.date || !item.date.trim()) {
let metaDate = meta(doc, 'article:published_time');
if (metaDate) {
let isoDate = ZU.strToISO(metaDate);
if (isoDate) {
item.date = isoDate;
}
else {
item.date = metaDate;
}
}
else {
let authorText = text(doc, 'span[itemprop="name"]');
if (authorText) {
item.creators.push(ZU.cleanAuthor(authorText, 'author'));
}
}
if (!item.url || !item.url.trim()) {
item.url = meta(doc, 'og:url') || url;
}
if (!item.publicationTitle) {
item.publicationTitle = 'Prime 9ja Online';
}
if (!item.ISSN) {
item.ISSN = '3092-8907';
}
// If no creators yet, try common DOM byline selectors (skip org-like)
if (item.creators.length === 0) {
let cand = meta(doc, 'article:author')
|| text(doc, '.meta-author-author')
|| text(doc, '.meta-author')
|| text(doc, '.author-name')
|| text(doc, '.byline a')
|| text(doc, '.meta-el.meta-author a');
if (cand && !/news agency|agency|news desk|publish desk|prime 9ja|prime9ja|online media|media|staff|bureau/i.test(cand.toLowerCase())) {
item.creators.push(ZU.cleanAuthor(cand, 'author'));
}
}
@@ -142,6 +290,8 @@ async function scrape(doc, url = doc.location.href) {
title: 'Snapshot'
});
item.place = 'Nigeria';
item.complete();
}
@@ -161,10 +311,9 @@ var testCases = [
"creatorType": "author"
}
],
"date": "2025-05-24T18:10:00+01:00",
"date": "2025-05-24",
"ISSN": "3092-8907",
"abstractNote": "AKURE —  The Ondo State Governorship Election Petitions Tribunal will deliver its verdict on June 4 in the series of suits challenging the election of Governor Lucky Aiyedatiwa, who emerged victorious in the last gubernatorial poll. Justice Benson Ogbu, wh…",
"language": "en",
"abstractNote": "AKURE —  The Ondo State Governorship Election Petitions Tribunal will deliver its verdict on June 4 in the series of suits challenging the e...",
"libraryCatalog": "Prime 9ja Online",
"place": "Nigeria",
"publicationTitle": "Prime 9ja Online",
@@ -195,10 +344,9 @@ var testCases = [
"creatorType": "author"
}
],
"date": "2025-05-27T01:11:00+01:00",
"date": "2025-05-27",
"ISSN": "3092-8907",
"abstractNote": "On “CFMF” — the fourth track from Davidos 2025 album 5ive —\n the artist trades club-ready bravado for inward reflection. Featuring\n songwriting contributions from DIENDE and Victony, the track is a slow,\n measured entry in the Afro-R&B lane, b…",
"language": "en",
"abstractNote": "On “CFMF” — the fourth track from Davidos 2025 album 5ive the artist trades club-ready bravado for inward reflection. Featuri...",
"libraryCatalog": "Prime 9ja Online",
"place": "Nigeria",
"publicationTitle": "Prime 9ja Online",
@@ -230,10 +378,9 @@ var testCases = [
"creatorType": "author"
}
],
"date": "2025-05-23T22:38:00+01:00",
"date": "2025-05-23",
"ISSN": "3092-8907",
"abstractNote": "ABUJA — A major network of cybercriminals allegedly responsible for infiltrating the Computer-Based Testing (CBT) infrastructure of Nigerias national examinations has been dismantled, with over 20 suspects currently in custody, security officials have c…",
"language": "en",
"abstractNote": "ABUJA — A major network of cybercriminals allegedly responsible for infiltrating the Computer-Based Testing (CBT) infrastructure of Nigeria...",
"libraryCatalog": "Prime 9ja Online",
"place": "Nigeria",
"publicationTitle": "Prime 9ja Online",
@@ -250,40 +397,6 @@ var testCases = [
"seeAlso": []
}
]
},
{
"type": "web",
"url": "https://www.prime9ja.com.ng/2025/03/china-begins-trial-of-mrna-tb-vaccine.html",
"items": [
{
"itemType": "newspaperArticle",
"title": "China Begins Trial of mRNA TB Vaccine",
"creators": [
{
"firstName": "News Agency of",
"lastName": "Nigeria",
"creatorType": "author"
}
],
"date": "2025-03-24T16:58:00+01:00",
"ISSN": "3092-8907",
"abstractNote": "A newly developed mRNA vaccine for tuberculosis, created in China, has entered clinical trials at Beijing Chest Hospital. The trial, which commenced on Monday, marks a significant step in the countrys efforts to combat tuberculosis, according to the Bei…",
"language": "en",
"libraryCatalog": "Prime 9ja Online",
"place": "Nigeria",
"publicationTitle": "Prime 9ja Online",
"url": "https://www.prime9ja.com.ng/2025/03/china-begins-trial-of-mrna-tb-vaccine.html",
"attachments": [
{
"title": "Snapshot",
"mimeType": "text/html"
}
],
"tags": [],
"notes": [],
"seeAlso": []
}
]
}
]
/** END TEST CASES **/


@@ -9,13 +9,13 @@
"priority": 100,
"inRepository": true,
"browserSupport": "gcsibv",
"lastUpdated": "2024-07-22 19:10:00"
"lastUpdated": "2025-10-20 16:20:00"
}
/*
***** BEGIN LICENSE BLOCK *****
Copyright © 2010-2023 Jonas Schrieb and contributors
Copyright © 2010-2025 Jonas Schrieb and contributors
This file is part of Zotero.
@@ -154,7 +154,7 @@ async function scrape(doc, url = doc.location.href) {
case "ps":
// There are entries where a format button is present, but the URL points to the ePrint home page
if (format.href.slice(-3) != ".ps") continue;
attachment.mimeType = "application/ps";
attachment.mimeType = "application/postscript";
break;
default:
// For security reasons, avoid adding unknown formats (allowlist approach)
@@ -520,7 +520,7 @@ var testCases = [
"url": "https://eprint.iacr.org/2002/195",
"attachments": [
{
"mimeType": "application/ps",
"mimeType": "application/postscript",
"title": "Full Text PS"
}
],
@@ -580,7 +580,7 @@ var testCases = [
"title": "Full Text PDF"
},
{
"mimeType": "application/ps",
"mimeType": "application/postscript",
"title": "Full Text PS"
}
],

Binary file not shown.
