Open access peer-reviewed chapter

Progress of Studies of Citations and PageRank

Written By

Wataru Souma and Mari Jibu

Submitted: 27 February 2018 Reviewed: 21 April 2018 Published: 18 July 2018

DOI: 10.5772/intechopen.77389

From the Edited Volume

Scientometrics

Edited by Mari Jibu and Yoshiyuki Osabe

Abstract

The number of citations has long been used to measure the value of a paper. Recently, however, Google’s PageRank has also been applied extensively to quantify the worth of papers. In this chapter, we summarize the recent progress of studies on citations and PageRank. We also present our latest investigation of a citation network consisting of 34,666,719 articles and 591,321,826 citations. We propose the generalized beta distribution of the second kind to explain the distribution of citations and introduce a stochastic model with an aging effect and super preferential attachment. Furthermore, we clarify the positive linear relation between citations and Google’s PageRank. By using this relationship as a benchmark to classify papers, we extract extremely prestigious papers, popular papers, and rising papers.

Keywords

  • citation
  • PageRank
  • SCI-E
  • fat tail
  • stochastic model
  • prestigious papers
  • popular papers
  • rising papers

1. Introduction

Citation analysis has a long history. Recently, Hou [1] applied a new method called reference publication year spectroscopy (RPYS) to 2543 papers, including 56,392 references, on citation analysis in the Science Citation Index Expanded (SCI-E) and Social Science Citation Index (SSCI) data from 1970 to July 2016. This investigation clarified that the development of citation analysis is divided into five periods: before 1900, 1901–1950, 1951–1970, 1971–2000, and 2001–2016. In this chapter, we focus on the distribution of citations, which was introduced by Price [2] and extensively investigated in the third period, that is, 1951–1970. We consider that the number of citations expresses the popularity of papers.

The fifth period, that is, 2001–2016, is a period of rapid expansion and diversified directions. In this period, many concepts have been introduced, for example, scientific evaluation indices, citation networks, information visualization, and citing behaviors. A variety of new impact measures has been proposed based on social network analysis in sociology and on network science, which originated in physics, mathematics, and information science. Bollen et al. [3] summarized 39 impact measures and investigated the correlation between them by using principal component analysis. They indicated that scientific impact is a multidimensional construct that cannot be adequately measured by any single indicator, although some measures are more suitable than others.

In this chapter, we focus on Google’s PageRank, which was first proposed by Brin and Page [4] to obtain a list of useful web pages for user queries. If we defined the usefulness of a web page as the number of links it receives from other web pages, the search engine would return a list of portal sites, that is, popular web pages, and such a list is of little use to web users. To overcome this problem, Brin and Page [4] used the concept of voting and defined the usefulness of a web page as the number of votes it receives from the web pages linking to it. In the PageRank algorithm, the number of ballots a web page holds is proportional to its usefulness, that is, a useful web page has many ballots. As a result, a useful web page collects votes from other useful web pages. Thus, Google’s PageRank expresses the prestige of web pages. We consider that this characteristic of Google’s PageRank also holds for citation networks.

This chapter is organized as follows. In Section 2, we explain the characteristics of the dataset used in this chapter. The distribution of citations and the stochastic model of the citation network are discussed in Section 3. In Section 4, we introduce Google’s PageRank and calculate it. We consider the correlation between citations and PageRank in Section 5. Section 6 is devoted to conclusions.

2. Data

In this chapter, we use the Science Citation Index Expanded (SCI-E) provided by Clarivate Analytics Co., Ltd. This dataset contains bibliographic information on scientific papers published from 1900 to the present. However, owing to the limited research budget of the authors, we use the data from 1981 to 2015. This subset contains 34,666,719 papers and 591,321,826 citations.

We denote the number of papers published in the year t as $n_t$. Figure 1 depicts the change of $n_t$. In this figure, $n_t$ increases almost monotonically from 1981 to 2013 and decreases after 2013. However, this decrease is an artifact: the dataset was compiled at the beginning of 2016 and only partially contains papers published in 2014 and 2015, because it takes a few years for all papers to be included in SCI-E.

Figure 1.

Yearly change of the number of e-articles.

If we consider papers as nodes and regard citations from a citing paper to a cited paper as directed links, we can treat the citation dataset as a directed network. We call such a network the citation network. The citation network consists of many connected components. We denote the number of nodes contained in a connected component as c and the frequency of c as $F(c)$. Figure 2 depicts $F(c)$ and shows that there is a single largest connected component. It consists of 34,428,322 nodes, which is 99.3% of the total number of papers in the dataset, and 591,177,607 links, which is 99.98% of the total number of citations in the dataset. In the following sections, we focus on the largest connected component.

Figure 2.

Distribution of the size of connected components.
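
As a minimal sketch of how the component sizes c and their frequency could be computed, the following Python code uses a union-find structure. It assumes that "connected component" means a weakly connected component (link direction ignored), which the chapter does not state explicitly; the function name `component_sizes` and the toy edge list are hypothetical.

```python
from collections import Counter

def component_sizes(num_nodes, edges):
    """Return the sizes of weakly connected components, largest first."""
    parent = list(range(num_nodes))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for citing, cited in edges:
        root_a, root_b = find(citing), find(cited)
        if root_a != root_b:
            # Assumption: the direction of the citation is ignored here.
            parent[root_a] = root_b

    sizes = Counter(find(node) for node in range(num_nodes))
    return sorted(sizes.values(), reverse=True)

# Toy citation list of (citing paper, cited paper); the node ids are made up.
toy_edges = [(1, 0), (2, 0), (2, 1), (4, 3)]
sizes = component_sizes(6, toy_edges)
frequency = Counter(sizes)  # number of components that have c nodes
print(sizes)
print(dict(frequency))
```

For a network with tens of millions of nodes, the same idea applies, but the edge list would have to be streamed from disk rather than held in a Python list.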

3. Distribution and dynamics of citations

In this section, we discuss the distribution of citations and the stochastic models that generate the citation network.

3.1. Distribution

The number of citations of a paper is the in-degree, k, of the corresponding node. Figure 3 is a double-logarithmic plot of the rank size distribution, $R(k)$, of citations. The right-tail part of the distribution decreases almost linearly in this plot, which means that this part follows a power-law distribution, that is, $R(k) \propto k^{-\mu}$. Here, the exponent μ is called the Pareto exponent, named after the Italian economist Vilfredo Pareto. The dashed line in Figure 3 is a reference line representing the power-law distribution with μ=2, that is, $R(k) \propto k^{-2}$.

Figure 3.

Rank size distribution, $R(k)$, of the number of citations, k.
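
A rank size distribution and its log-log slope can be sketched as follows. This is only an illustration on synthetic data constructed to follow a power law with μ=2; the helper names `rank_size` and `loglog_slope` are hypothetical, and a plain least-squares fit over all points is a simplification (for real citation data only the right tail would be fitted, and ties in k would need care).

```python
import math

def rank_size(citations):
    """Return (k, rank) pairs, where rank 1 is the most cited paper."""
    ordered = sorted(citations, reverse=True)
    return [(k, rank) for rank, k in enumerate(ordered, start=1)]

def loglog_slope(points):
    """Least-squares slope of log(rank) against log(k)."""
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(rank) for _, rank in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic "citations": k = 1000 / sqrt(rank), so rank is proportional to k^-2.
synthetic = [1000.0 / math.sqrt(rank) for rank in range(1, 1001)]
points = rank_size(synthetic)
mu_estimate = -loglog_slope(points)  # the Pareto exponent is minus the slope
print(round(mu_estimate, 3))
```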

Pareto [5] first investigated the fat-tail behavior of the right-tail part of personal income and wealth distributions. After Pareto, many types of distribution functions were proposed, mainly in the field of economics and especially in the investigation of personal income distribution (e.g., see [6, 7]). In the field of scientometrics, Price [2] first applied the power-law distribution to the citation network and found that the distribution of the number of citing papers (the out-degree in terms of network science) follows the power-law distribution with μ=1 and that of the number of citations (the in-degree in terms of network science) obeys the power-law distribution with μ=1.5 or μ=2. The latter result is the same as the reference line in Figure 3.

Redner [8] investigated papers published in 1981 and cataloged by the Institute for Scientific Information (783,339 papers) and 20 years of publications in Physical Review D, vols. 11–50 (24,296 papers), and found that the right-tail part of both citation distributions follows the power-law distribution with μ=2. This result is the same as that of Price [2] and the reference line in Figure 3. Redner [9] investigated 110 years (from July 1893 through June 2003) of publications in Physical Review, the topical journals Physical Review A-E, Physical Review Letters, Reviews of Modern Physics, and Physical Review Special Topics: Accelerators and Beams (353,268 papers and 3,110,839 citations) and found that the entire distribution of the number of citations follows a log-normal distribution.

Albarrán and Ruiz-Castillo [10] studied 5 years (1998–2002) of publications in Web of Science (3.7 million papers) and found that a power-law distribution for the right-tail part of the citation distribution is not rejected for 17 of the 22 scientific fields of Web of Science. Albarrán et al. [11] investigated the same dataset as Albarrán and Ruiz-Castillo [10] and found that a power-law distribution for the right-tail part of the citation distribution is not rejected for 140 of the 219 scientific sub-fields of Web of Science. Recently, Brzezinski [12] investigated scientific papers published between 1998 and 2002 drawn from Scopus and found that the power-law hypothesis is rejected for half of the Scopus fields of science.

Although many studies exist besides those mentioned above, none of them has used as vast an amount of data as this chapter to approach the overall picture of the citation distribution. The light gray line in Figure 3 is the best fit by the generalized beta distribution of the second kind (GB2), also called the beta prime distribution (e.g., see [13, 14]), with the probability density function:

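$$f(k; a, b, \mu, \nu) = \frac{a\,k^{a\mu-1}}{b^{a\mu}\,B(\mu,\nu)\left[1+\left(k/b\right)^{a}\right]^{\mu+\nu}}, \tag{1}$$

with a=0.7, b=15.2, μ=2.0, ν=3.0. Here, $B(\mu,\nu)$ is the Beta function.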
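
The following sketch evaluates a GB2 density with the fitted parameter values quoted above. It assumes the standard GB2 (generalized beta prime) parametrization, in which μ and ν are the two arguments of the Beta function, a is a shape parameter, and b is a scale parameter; if the chapter uses a different convention, the exponents would change accordingly. The function name `gb2_pdf` is hypothetical.

```python
import math

def gb2_pdf(k, a, b, mu, nu):
    """GB2 density in the standard parametrization (an assumption here):
    f(k) = a k^(a*mu-1) / (b^(a*mu) B(mu, nu) [1 + (k/b)^a]^(mu+nu)).
    """
    # log B(mu, nu) computed from log-gamma for numerical stability.
    log_beta = math.lgamma(mu) + math.lgamma(nu) - math.lgamma(mu + nu)
    x = k / b
    log_f = (math.log(a) - math.log(b)
             + (a * mu - 1.0) * math.log(x)
             - (mu + nu) * math.log1p(x ** a)
             - log_beta)
    return math.exp(log_f)

# Parameter values quoted in the text.
params = dict(a=0.7, b=15.2, mu=2.0, nu=3.0)

# Local log-log slope of the density far in the right tail.
k_low, k_high = 1e8, 1e9
tail_slope = (math.log(gb2_pdf(k_high, **params))
              - math.log(gb2_pdf(k_low, **params))) / math.log(k_high / k_low)
print(round(tail_slope, 3))
```

Under this assumed form, the density decays as a power law with exponent a·ν+1 for large k, so the tail slope is a quick sanity check of an implementation.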

Table 1 lists the top 20 papers by citations. In this table, $r_k$ is the citation rank, k is the number of citations at the beginning of 2016, and k', which is enclosed in parentheses, is the number of citations at the beginning of January 2018. The characteristics of this list are that the subjects of the papers are mostly Biochemistry & Molecular Biology and that the publication years of the papers are relatively old.

| $r_k$ | k (k') | First author | Title | Journal, Year | Subject |
|---|---|---|---|---|---|
| 1 | 60,967 (62,404) | P. Chomczynski | Single-step method of RNA isolation by … | Analytical Biochemistry, 1987 | Biochemistry & Molecular Biology; Chemistry |
| 2 | 55,143 (65,452) | A.D. Becke | Density-functional thermochemistry. 3… | Journal of Chemical Physics, 1993 | Chemistry; Physics |
| 3 | 52,035 (61,637) | C.T. Lee | Development of the Colle-Salvetti correlation… | Physical Review B, 1988 | Physics |
| 4 | 45,349 (64,127) | G.M. Sheldrick | A short history of SHELX | Acta Crystallographica Section A, 2008 | Chemistry; Crystallography |
| 5 | 44,915 (64,682) | J.P. Perdew | Generalized gradient approximation… | Physical Review Letters, 1996 | Physics |
| 6 | 42,407 (46,286) | J.D. Thompson | Clustal-W – Improving the sensitivity of … | Nucleic Acids Research, 1994 | Biochemistry & Molecular Biology |
| 7 | 39,281 (44,765) | S.F. Altschul | Gapped BLAST and PSI-BLAST: a new… | Nucleic Acids Research, 1997 | Biochemistry & Molecular Biology |
| 8 | 37,133 (48,832) | S.F. Altschul | Basic local alignment search tool | Journal of Molecular Biology, 1990 | Biochemistry & Molecular Biology |
| 9 | 36,988 (56,581) | K.J. Livak | Analysis of relative gene expression data… | Methods, 2001 | Biochemistry & Molecular Biology |
| 10 | 32,657 (37,653) | N. Saitou | The neighbor-joining method – A new … | Molecular Biology and Evolution, 1987 | Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity |
| 11 | 30,032 (33,046) | Z. Otwinowski | Processing of X-ray diffraction data collected… | Macromolecular Crystallography, 1997 | Biochemistry & Molecular Biology |
| 12 | 29,615 (34,235) | A.D. Becke | Density-functional exchange-energy … | Physical Review A, 1988 | Physics |
| 13 | 25,987 (29,094) | J.D. Thompson | The CLUSTAL_X windows interface: flexible… | Nucleic Acids Research, 1997 | Biochemistry & Molecular Biology |
| 14 | 25,880 (33,287) | R.M. Baron | The moderator mediator variable distinction… | Journal of Personality and Social Psychology, 1986 | Psychology |
| 15 | 25,696 (29,809) | J.M. Bland | Statistical methods for assessing agreement… | Lancet, 1986 | General & Internal Medicine |
| 16 | 25,340 (30,673) | T. Mosmann | Rapid colorimetric assay for cellular … | Journal of Immunological Methods, 1983 | Biochemistry & Molecular Biology; Immunology |
| 17 | 24,308 (28,923) | S. Iijima | Helical microtubules of graphitic carbon | Nature, 1991 | Science & Technology - Other Topics |
| 18 | 23,894 (34,400) | G. Kresse | Efficient iterative schemes for ab initio total-energy calculations using … | Physical Review B, 1996 | Physics |
| 19 | 23,294 (27,062) | J. Felsenstein | Confidence-limits on phylogenies – an approach using the bootstrap | Evolution, 1985 | Environmental Sciences & Ecology; Evolutionary Biology; Genetics & Heredity |
| 20 | 21,456 (21,529) | A.P. Feinberg | A technique for radiolabeling DNA restriction endonuclease fragments … | Analytical Biochemistry, 1983 | Biochemistry & Molecular Biology; Chemistry |

Table 1.

Top 20 papers by number of citations.

3.2. Stochastic models

Simon [15] proposed a stochastic model, the so-called Simon’s model, to explain several empirical distributions: the distribution of words in prose samples by their frequency of occurrence, of scientists by number of papers published, of cities by population, of incomes by size, and of biological genera by number of species. Although the assumptions of Simon’s model are written in terms of word frequencies, we can express them in terms of network science as follows. Assumption I: the probability that a node gets a new link is proportional to its degree, that is, the rich get richer, or the Matthew effect (e.g., see [16]). Assumption II: a new node is added with a constant probability γ. Simon’s model explains the fact that the right-tail part of the distribution follows the power-law distribution with $\mu = 1/(1-\gamma)$.
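
The two assumptions can be sketched as a small simulation. This is an illustrative toy version in the original word-frequency spirit, not the authors' code: the function name `simulate_simon`, the single starting node, and the fixed random seed are assumptions made for the example.

```python
import random
from collections import Counter

def simulate_simon(steps, gamma, seed=0):
    """Simulate Simon's model and return a Counter mapping node -> degree."""
    rng = random.Random(seed)
    # One list entry per unit of degree, so a uniform draw from this list
    # selects a node with probability proportional to its degree.
    endpoints = [0]  # assumption: start from a single node of degree 1
    next_id = 1
    for _ in range(steps):
        if rng.random() < gamma:
            # Assumption II: a new node enters with probability gamma.
            endpoints.append(next_id)
            next_id += 1
        else:
            # Assumption I: an existing node gains a link with probability
            # proportional to its current degree.
            endpoints.append(rng.choice(endpoints))
    return Counter(endpoints)

degrees = simulate_simon(steps=20000, gamma=0.5)
print(len(degrees), max(degrees.values()))
```

With γ=1/2 the relation quoted above gives μ=2, which could be checked on a longer run by measuring the log-log slope of the rank size distribution of the simulated degrees.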

Price [17] generalized Simon’s model into the so-called Price’s model to explain the growth of citation networks. Barabási and Albert [18] introduced a stochastic model, the so-called BA model, based on two concepts, preferential attachment and growth, which correspond to assumptions I and II of Simon’s model, respectively. The BA model is the case γ=1/2 of Simon’s model and yields the power-law distribution with μ=2. Jeong et al. [19] extended the BA model to include an aging effect and a class of homogeneous connection kernels. Golosovsky and Solomon [20, 21] further extended it to include the effect of initial attractivity.

Here, we use the model proposed by Jeong et al. [19] and check the aging effect and the homogeneity of the growth of the citation network. If we denote the degree of node i as $k_i$, the time evolution of $k_i$ is given by

$$\frac{dk_i}{dt} = A_i(t)\,k_i^{\alpha}. \tag{2}$$

Here, $A_i(t)$ is an aging factor and α>0 is an unknown scaling exponent. Krapivsky et al. [22] have shown that, without the aging factor, the model with α=1 (linear preferential attachment) is the same as the BA model and yields the power-law distribution with μ=2. For α<1, the model yields a stretched exponential distribution, and for α>1 (super preferential attachment), a single node connects to nearly all other nodes, akin to gelation.

If we discretize the model and consider Δt=1 year, Eq. (2) is written as

$$\Delta k_i = A_i\,k_i^{\alpha}. \tag{3}$$

We investigate the dynamics of growth for 44,932 papers published in 1985. The left panel of Figure 4 depicts the double-logarithmic scatter plot of the number of citations, $k_i$ ($i = 1, 2, \ldots, 44932$), as of 1988 and the change of the number of citations, $\Delta k_i$, from 1988 to 1999. If we divide $k_i$ into bins of logarithmically equal width, denote the mean of $k_i$ in each bin as $\bar{k}$, and calculate the average value of $\Delta k_i$ for each bin, $\Delta\bar{k}$, we obtain the red dots depicted in the right panel of Figure 4. With these manipulations, Eq. (3) is written as

$$\Delta\bar{k} = A(t)\,\bar{k}^{\alpha}. \tag{4}$$

Figure 4.

Left: Correlation between the number of citations and increase of the number of citations. Right: Change of the relation between mean citation and mean difference of citation.

The solid red line in the right panel of Figure 4 corresponds to the linear regression of the red dots by Eq. (4) on the double-logarithmic scale. The slope of this line corresponds to α, and its intercept corresponds to $A(t)$. In Figure 4, the blue, green, and magenta dots show the same analysis for the years 1993, 2003, and 2010, respectively.
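
The binning and regression steps can be sketched as follows. Taking logarithms of Eq. (4) gives log Δk̄ = log A + α log k̄, so an ordinary least-squares line through the binned points estimates α (slope) and A (exponential of the intercept). The bin width of five bins per decade, the function names, and the synthetic data are assumptions for illustration only.

```python
import math
from collections import defaultdict

def log_binned_means(k_values, dk_values, bins_per_decade=5):
    """Average k and delta-k inside bins of equal width in log10(k)."""
    bins = defaultdict(lambda: [0.0, 0.0, 0])  # sum of k, sum of dk, count
    for k, dk in zip(k_values, dk_values):
        if k <= 0:
            continue  # papers without citations cannot be log-binned
        index = math.floor(math.log10(k) * bins_per_decade)
        entry = bins[index]
        entry[0] += k
        entry[1] += dk
        entry[2] += 1
    means = []
    for index in sorted(bins):
        sum_k, sum_dk, count = bins[index]
        if sum_dk > 0:  # the logarithm needs a positive mean increment
            means.append((sum_k / count, sum_dk / count))
    return means

def fit_power_law(points):
    """Fit dk = A * k**alpha by least squares on the log-log scale."""
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(dk) for _, dk in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return math.exp(mean_y - alpha * mean_x), alpha

# Synthetic data that follow dk = 0.05 * k^1.2 exactly (made-up values).
k_values = list(range(1, 2001))
dk_values = [0.05 * k ** 1.2 for k in k_values]
a_hat, alpha_hat = fit_power_law(log_binned_means(k_values, dk_values))
print(round(a_hat, 3), round(alpha_hat, 3))
```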

The left panel of Figure 5 depicts the change of $A(t)$. The solid line in this figure corresponds to the regression by the power-law function $A(t) \propto t^{-1.15}$. The right panel of Figure 5 depicts the change of α. This figure shows that α>1 for the entire period we investigated. From this analysis, we find that the citation network has the characteristics of super preferential attachment; therefore, a single node would be expected to connect to nearly all other nodes. However, the aging effect prevents the citation network from becoming an oligopolistic network.

Figure 5.

Left: Change of the aging effect. Right: Change of homogeneous factor.

4. Distribution of PageRank

Google’s PageRank was proposed by Brin and Page [4]. The Google number, $G_i$, of paper i is defined by the recursion formula (from Chen et al. [23]):

$$G_i = (1-d)\sum_{j\ \mathrm{nn}\ i}\frac{G_j}{k_j} + \frac{d}{N}. \tag{5}$$

Here, N=34,428,322 is the total number of articles contained in the largest connected component of the citation network. The sum is over the neighboring nodes j from which a link points to node i, and $k_j$ in Eq. (5) is the number of outgoing links of node j. In Eq. (5), d is a free parameter that controls the convergence and effectiveness of the recursion calculation. In the original Google’s PageRank [4], d=0.15 is adopted as appropriate for the World Wide Web. On the other hand, d=0.5 is adopted in [23] as appropriate for citation networks.
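
A minimal sketch of iterating Eq. (5) on a toy network is given below. It follows the recursion literally, so papers without outgoing links simply pass nothing on and no dangling-node correction is applied; that choice, the function name `google_numbers`, the stopping tolerance, and the toy edge list are assumptions of this example.

```python
def google_numbers(num_nodes, edges, d=0.5, tol=1e-12, max_iter=1000):
    """Iterate G_i = (1 - d) * sum_j G_j / k_j + d / N until it settles.

    edges holds (citing, cited) pairs; k_j is the out-degree of node j.
    """
    out_degree = [0] * num_nodes
    for citing, _ in edges:
        out_degree[citing] += 1

    g = [1.0 / num_nodes] * num_nodes  # uniform starting guess
    for _ in range(max_iter):
        new_g = [d / num_nodes] * num_nodes
        for citing, cited in edges:
            # Each citing paper shares its Google number equally
            # among the papers it cites.
            new_g[cited] += (1.0 - d) * g[citing] / out_degree[citing]
        change = sum(abs(a - b) for a, b in zip(new_g, g))
        g = new_g
        if change < tol:
            break
    return g

# Toy network with made-up node ids: 1 -> 0, 2 -> 0, 2 -> 1, 3 -> 2.
toy_edges = [(1, 0), (2, 0), (2, 1), (3, 2)]
g = google_numbers(4, toy_edges, d=0.5)

# PageRank order: position 0 of the list is the node with the largest G.
ranking = sorted(range(4), key=lambda node: g[node], reverse=True)
print([round(value, 6) for value in g])
print(ranking)
```

Running the same function with d=0.15 on a real edge list would allow a comparison of the two rankings, in the spirit of the comparison between d=0.5 and d=0.15.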

Figure 6 depicts the double-logarithmic plot of the rank size distribution of the Google number, $R(G)$. In this figure, filled circles correspond to the case of d=0.5 and open squares to that of d=0.15. The dashed line in this figure is a reference line representing the power-law distribution with μ=2. This value of the exponent is the same as that of the citation distribution depicted in Figure 3. Although the rank size distribution of the Google number depends on d, the PageRank ranking, $r_G$, is almost the same for both values of d, as depicted in Figure 7. This figure is a double-logarithmic plot of $r_G$, where the abscissa is $r_G$ for d=0.5 and the ordinate is $r_G$ for d=0.15.

Figure 6.

Rank size distribution, $R(G)$, of the Google number, G.

Figure 7.

Correlation between the PageRank, $r_G$, in the case of d=0.5 and d=0.15.

Table 2 lists the top 20 papers by Google’s PageRank. The characteristics of this list are that the papers belong to many subjects and that their publication years are relatively old.

| $r_G$ | $G \times 10^{5}$ | $r_k$ | k (k') | $r_k/r_G$ | First author | Title | Journal, Year | Subject |
|---|---|---|---|---|---|---|---|---|
| 1 | 7.1314 | 4 | 45,349 (64,127) | 4 | G.M. Sheldrick | A short history of SHELX | Acta Crystallographica Section A, 2008 | Chemistry; Crystallography |
| 2 | 3.4074 | 1 | 60,967 (62,404) | 0.5 | P. Chomczynski | Single-step method of RNA isolation by acid… | Analytical Biochemistry, 1987 | Biochemistry & Molecular Biology; Chemistry |
| 3 | 3.1210 | 26 | 18,109 (18,789) | 8.67 | G.M. Sheldrick | Phase annealing in SHELX-90 – direct methods for… | Acta Crystallographica Section A, 1990 | Chemistry; Crystallography |
| 4 | 2.8852 | 2 | 55,143 (65,452) | 0.5 | A.D. Becke | Density-functional thermochemistry. 3… | Journal of Chemical Physics, 1993 | Chemistry; Physics |
| 5 | 2.8578 | 64 | 12,824 (14,640) | 12.8 | J. Kennedy | Particle swarm optimization | IEEE International Conference, 1995 | Computer Science |
| 6 | 2.7879 | 15 | 25,696 (29,809) | 2.5 | J.M. Bland | Statistical methods for assessing agreement… | Lancet, 1986 | General & Internal Medicine |
| 7 | 2.6547 | 3 | 52,035 (61,637) | 0.43 | C.T. Lee | Development of the Colle-Salvetti correlation… | Physical Review B, 1988 | Physics |
| 8 | 2.5745 | 76 | 11,685 (18,640) | 9.5 | D.G. Lowe | Distinctive image features from scale-invariant… | International Journal of Computer Vision, 2004 | Computer Science |
| 9 | 2.4425 | 5 | 44,915 (64,682) | 0.56 | J.P. Perdew | Generalized gradient approximation made… | Physical Review Letters, 1996 | Physics |
| 10 | 2.3890 | 46 | 14,128 (17,990) | 4.6 | S. Kirkpatrick | Optimization by simulated annealing | Science, 1983 | Science & Technology - Other Topics |
| 11 | 2.3430 | 11 | 30,032 (33,046) | 1 | Z. Otwinowski | Processing of X-ray diffraction data… | Macromolecular Crystallography, 1997 | Biochemistry & Molecular Biology |
| 12 | 2.3236 | 97 | 10,368 (11,590) | 8.08 | F.H. Allen | Table of bond lengths determined by X-RAY… | Journal of the Chemical Society-Perkin Transactions 2, 1987 | Chemistry |
| 13 | 2.2868 | 6 | 42,407 (46,286) | 0.46 | J.D. Thompson | Clustal-W - improving the sensitivity of… | Nucleic Acids Research, 1994 | Biochemistry & Molecular Biology |
| 14 | 2.1787 | 8 | 37,133 (48,832) | 0.57 | S.F. Altschul | Basic local alignment search tool | Journal of Molecular Biology, 1990 | Biochemistry & Molecular Biology |
| 15 | 2.1481 | 7 | 39,281 (44,765) | 0.47 | S.F. Altschul | Gapped BLAST and PSI-BLAST: a new… | Nucleic Acids Research, 1997 | Biochemistry & Molecular Biology |
| 16 | 2.0319 | 10 | 32,657 (37,653) | 0.63 | N. Saitou | The neighbor-joining method – a new method… | Molecular Biology and Evolution, 1987 | Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity |
| 17 | 1.9081 | 17 | 24,308 (28,923) | 1 | S. Iijima | Helical microtubules of graphitic carbon | Nature, 1991 | Science & Technology - Other Topics |
| 18 | 1.8685 | 107 | 9775 (10,827) | 5.94 | H.D. Flack | On enantiomorph-polarity estimation | Acta Crystallographica Section A, 1983 | Chemistry; Crystallography |
| 19 | 1.8001 | 82 | 11,242 (12,850) | 4.32 | A.L. Spek | Single-crystal structure validation with the… | Journal of Applied Crystallography, 2003 | Chemistry; Crystallography |
| 20 | 1.7796 | 129 | 8818 (8849) | 6.45 | N. Walker | An empirical-method for correcting… | Acta Crystallographica Section A, 1983 | Chemistry; Crystallography |

Table 2.

Top 20 papers by Google’s PageRank.

5. Correlation between citation and PageRank

Bollen and Rodriguez [24] described the Institute for Scientific Information (ISI) impact factor (IF), which is defined as the mean number of citations a journal receives over a two-year period, as a metric of popularity, and Google’s PageRank as a metric of prestige. This concept was also proposed by Chen et al. [23] and Maslov and Redner [25], who investigated all publications in the Physical Review family of journals from 1893 to 2003 and found a linear relation between the Google number and the number of citations. Furthermore, Refs. [23, 25] found that some outliers from this linear relation, especially papers whose PageRank ranking is remarkably high while their citation ranking is only moderately high, are universally familiar to physicists, and called such papers scientific “gems.” Ma et al. [26] applied the concept of [23, 24, 25] to the field of biochemistry and molecular biology from 2000 to 2005. Whereas these studies investigated citation networks of selected scientific fields, this chapter investigates a citation network covering all scientific fields.

Figure 8 depicts the double-logarithmic plot of the correlation between the number of citations, k, and the Google number, G. In this figure, the solid gray line represents the mean value of G calculated for bins of k with logarithmically equal width. This figure shows that the mean G versus k is smooth and increases linearly with k for $k \gtrsim 500$. Thus, the Google number and the number of citations are similar measures of the importance of papers. This result means that prestige (Google number) is proportional to popularity (citations) in many cases.

Figure 8.

Correlation between the number of citations, k, and the Google number, G.

However, there are outliers that have high prestige compared with their popularity. These papers are located above the solid gray line in Figure 8 and are regarded as extremely prestigious papers. If we denote the citation rank as $r_k$ and the Google’s PageRank as $r_G$, these extremely prestigious papers are extracted in order of Google’s PageRank with a constraint on the ratio $r_k/r_G$. Table 3 lists the top 20 extremely prestigious papers selected by using the constraint $r_k/r_G>10$. The characteristic of this list is that the subjects of the papers are mostly related to information science.

| $r_G$ | $G \times 10^{5}$ | $r_k$ | k (k') | $r_k/r_G$ | First author | Title | Journal, Year | Subject |
|---|---|---|---|---|---|---|---|---|
| 5 | 2.8578 | 64 | 12,824 (14,640) | 12.8 | J. Kennedy | Particle swarm optimization | Proceedings of IEEE International Conference, 1995 | Computer Science |
| 22 | 1.6861 | 240 | 6500 (7458) | 10.91 | S.M. Alamouti | A simple transmit diversity technique for wireless… | IEEE Journal on Selected Areas in Communications, 1998 | Engineering; Telecommunications |
| 25 | 1.5103 | 516 | 4465 (6605) | 20.64 | I.F. Akyildiz | Wireless sensor networks: a survey | Computer Networks, 2002 | Computer Science; Engineering; Telecommunications |
| 33 | 1.4160 | 481 | 4611 (6276) | 14.58 | Z. Pawlak | Rough sets | International Journal of Computer & Information Sciences, 1982 | Information Science & Library Science |
| 36 | 1.3169 | 784 | 3740 (5402) | 21.78 | I.F. Akyildiz | A survey on sensor networks | IEEE Communications Magazine, 2002 | Engineering; Telecommunications |
| 43 | 1.2155 | 998 | 3309 (4707) | 23.21 | T.R. Gruber | A translation approach to portable ontology… | Knowledge Acquisition, 1993 | Computer Science; Information Science & Library Science |
| 48 | 1.1432 | 828 | 3656 (4463) | 17.25 | P. Gupta | The capacity of wireless networks | IEEE Transactions on Information Theory, 2000 | Computer Science; Engineering |
| 49 | 1.1387 | 1916 | 2441 (2839) | 39.10 | S. Floyd | Random early detection gateways for congestion… | IEEE-ACM Transactions on Networking, 1993 | Computer Science; Engineering; Telecommunications |
| 53 | 1.1102 | 1247 | 2991 (3879) | 23.53 | G. Bianchi | Performance analysis of the IEEE 802.11 distributed… | IEEE Journal on Selected Areas in Communications, 2000 | Engineering; Telecommunications |
| 60 | 1.0626 | 608 | 4149 (5968) | 10.13 | S. Haykin | Cognitive radio: Brain-empowered wireless… | IEEE Journal on Selected Areas in Communications, 2005 | Engineering; Telecommunications |
| 76 | 0.9431 | 967 | 3360 (3961) | 12.72 | T. Murata | Petri nets - properties, analysis and applications | Proceedings of the IEEE, 1989 | Engineering |
| 79 | 0.9388 | 1758 | 2535 (3702) | 22.25 | W.B. Heinzelman | An application-specific protocol architecture for… | IEEE Transactions on Wireless Communications, 2002 | Engineering; Telecommunications |
| 90 | 0.8884 | 1190 | 3048 (4075) | 13.22 | R. Ahlswede | Network information flow | IEEE Transactions on Information Theory, 2000 | Computer Science; Engineering |
| 93 | 0.8767 | 1565 | 2691 (3401) | 16.83 | T. Wiegand | Overview of the H.264/AVC video coding standard | IEEE Transactions on Circuits and Systems for Video Technology, 2003 | Engineering |
| 97 | 0.8598 | 1045 | 3245 (4674) | 10.77 | M. Dorigo | Ant system: Optimization by a colony of… | IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, 1996 | Automation & Control Systems; Computer Science |
| 116 | 0.7923 | 2736 | 2052 (2426) | 23.59 | D. Harel | Statecharts - a visual formalism for… | Science of Computer Programming, 1987 | Computer Science |
| 120 | 0.7838 | 4059 | 1705 (2982) | 33.83 | M. Weiser | The Computer for the 21st-century | Scientific American, 1991 | Science & Technology - Other Topics |
| 121 | 0.7796 | 1406 | 2840 (4011) | 11.62 | S. Deerwester | Indexing by latent semantic analysis | Journal of the American Society for Information Science, 1990 | Computer Science; Information Science & Library Science |
| 128 | 0.7584 | 3165 | 1914 (1948) | 24.73 | A.E. Leviton | Standards in herpetology and ichthyology… | Copeia, 1985 | Zoology |
| 129 | 0.7582 | 7409 | 1274 (1478) | 57.43 | X.Y. Wang | Room-temperature all-semiconducting… | Physical Review Letters, 2008 | Physics |

Table 3.

Top 20 extremely prestigious papers.

On the other hand, there are also outliers that have low prestige compared with their popularity. These articles are located below the solid gray line in Figure 8 and are regarded as extremely popular papers. They are extracted in order of citation rank with a constraint on the ratio $r_G/r_k$. Table 4 lists the top 20 extremely popular papers selected by using the constraint $r_G/r_k>5$. These articles are divided into two groups. One group contains papers published in Nature, Science, and the Proceedings of the National Academy of Sciences of the United States of America (PNAS). These papers were published roughly 10 or more years ago, and the growth rate of their citations, k'/k, is low. The other group includes papers that are mainly published in Cell and were published relatively recently. The growth rate of citations, k'/k, of these papers is extremely high. Thus, we can regard these papers as rising papers.

| $r_k$ | k (k') | $r_G$ | $G \times 10^{5}$ | $r_G/r_k$ | First author | Title | Journal, Year | Subject |
|---|---|---|---|---|---|---|---|---|
| 125 | 8890 (17,192) | 627 | 0.3250 | 5.02 | D. Hanahan | Hallmarks of Cancer: The Next Generation | Cell, 2011 | Biochemistry & Molecular Biology; Cell Biology |
| 297 | 5817 (10,877) | 1580 | 0.2042 | 5.32 | D.W. Huang | Systematic and integrative analysis of large gene list… | Nature Protocols, 2008 | Biochemistry & Molecular Biology |
| 304 | 5747 (9681) | 1608 | 0.2023 | 5.29 | Y. Zhao | The M06 suite of density functionals for main… | Theoretical Chemistry Accounts, 2008 | Chemistry |
| 327 | 5533 (8874) | 1810 | 0.1897 | 5.54 | D.P. Bartel | MicroRNAs: Target Recognition and… | Cell, 2009 | Biochemistry & Molecular Biology; Cell Biology |
| 375 | 5147 (6894) | 2128 | 0.1757 | 5.67 | B.P. Lewis | Conserved seed pairing, often flanked by… | Cell, 2005 | Biochemistry & Molecular Biology; Cell Biology |
| 414 | 4912 (5825) | 2506 | 0.1619 | 6.05 | T. Jenuwein | Translating the histone code | Science, 2001 | Science & Technology - Other Topics |
| 419 | 4895 (5350) | 2123 | 0.1759 | 5.07 | P. Li | Cytochrome c and dATP-dependent formation… | Cell, 1997 | Biochemistry & Molecular Biology; Cell Biology |
| 535 | 4382 (4604) | 2802 | 0.1534 | 5.24 | Z.G. Xia | Opposing effects of ERK and JNK-P38 map… | Science, 1995 | Science & Technology - Other Topics |
| 543 | 4343 (5864) | 3120 | 0.1447 | 5.75 | R.C. Lee | The C. elegans heterochronic gene… | Cell, 1993 | Biochemistry & Molecular Biology; Cell Biology |
| 547 | 4327 (4633) | 2865 | 0.1517 | 5.24 | A. Hall | Rho GTPases and the actin cytoskeleton | Science, 1998 | Science & Technology - Other Topics |
| 600 | 4164 (5479) | 3269 | 0.1411 | 5.45 | S. Akira | Pathogen recognition and innate immunity | Cell, 2006 | Biochemistry & Molecular Biology; Cell Biology |
| 611 | 4144 (4888) | 3585 | 0.1348 | 5.87 | B.D. Strahl | The language of covalent histone modifications | Nature, 2000 | Science & Technology - Other Topics |
| 640 | 4063 (5604) | 3359 | 0.1390 | 5.25 | M.E. Raichle | A default mode of brain function | PNAS, 2001 | Science & Technology - Other Topics |
| 645 | 4054 (5303) | 3326 | 0.1398 | 5.16 | E.K. Miller | An integrative theory of prefrontal cortex function | Annual Review of Neuroscience, 2001 | Neurosciences & Neurology |
| 657 | 4026 (4967) | 3572 | 0.1351 | 5.44 | R.O. Hynes | Integrins: Bidirectional, allosteric signaling… | Cell, 2002 | Biochemistry & Molecular Biology; Cell Biology |
| 661 | 4005 (4335) | 4096 | 0.1262 | 6.20 | S.R. Datta | Akt phosphorylation of BAD couples survival… | Cell, 1997 | Biochemistry & Molecular Biology; Cell Biology |
| 706 | 3912 (5288) | 4825 | 0.1166 | 6.83 | T. Kouzarides | Chromatin modifications and their function | Cell, 2007 | Biochemistry & Molecular Biology; Cell Biology |
| 751 | 3806 (5109) | 4096 | 0.1262 | 5.45 | M. Corbetta | Control of goal-directed and stimulus-driven… | Nature Reviews Neuroscience, 2002 | Neurosciences & Neurology |
| 752 | 3805 (4313) | 4446 | 0.1213 | 5.91 | A. Brunet | Akt promotes cell survival by phosphorylating and… | Cell, 1999 | Biochemistry & Molecular Biology; Cell Biology |
| 785 | 3739 (4363) | 3972 | 0.1280 | 5.06 | J.D. Fontenot | Foxp3 programs the development and… | Nature Immunology, 2003 | Immunology |

Table 4.

Top 20 extremely popular papers.
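
The rank-ratio constraints used for Tables 3 and 4 can be sketched as a small classifier. The thresholds 10 and 5 are the ones quoted in the text; the function names, the label "near benchmark", and the choice of three example rows (taken from Tables 2 and 4) are assumptions of this illustration. The growth rate k'/k is only reported, because the text gives no numerical threshold for "rising" papers.

```python
def classify(r_k, r_g, prestige_ratio=10.0, popular_ratio=5.0):
    """Label a paper from its citation rank r_k and PageRank rank r_g."""
    if r_k / r_g > prestige_ratio:
        return "extremely prestigious"  # far better PageRank rank than citation rank
    if r_g / r_k > popular_ratio:
        return "extremely popular"      # far better citation rank than PageRank rank
    return "near benchmark"

def growth_rate(k_2016, k_2018):
    """Ratio k'/k of citation counts at the two observation dates."""
    return k_2018 / k_2016

# (first author, r_k, r_G, k, k') copied from the tables in this chapter.
examples = [
    ("J. Kennedy", 64, 5, 12824, 14640),
    ("D. Hanahan", 125, 627, 8890, 17192),
    ("G.M. Sheldrick", 4, 1, 45349, 64127),
]
labels = {}
for author, r_k, r_g, k, k_prime in examples:
    labels[author] = classify(r_k, r_g)
    print(author, labels[author], round(growth_rate(k, k_prime), 2))
```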

6. Conclusions

We investigated papers published from 1981 to 2015 and contained in SCI-E. The total number of papers is 34,666,719 and that of citations is 591,321,826. We extracted the largest connected component from this dataset. The obtained citation network consists of 34,428,322 nodes (articles) and 591,177,607 links (citations).

The right-tail part of the rank size distribution of citations follows the power law distribution with exponent μ=2, that is, Rkk2. Furthermore, we introduced the generalized beta distribution of the second kind (GB2) as the best-fit function to the whole range of citation distribution. We introduced the stochastic model with growth, preferential attachment, and aging effect. Through the numerical analysis, we obtained the value of the parameter set.

Although the number of citations represent the popularity of papers, Google’s PageRank reflects the prestige of papers. We evaluated Google’s PageRank for the largest connected component which consists of 34,428,322 articles and 591,177,607 link citations. We found that the citations and Google numbers have a positive linear relation. We consider this positive linear relation as a benchmark and selected extremely prestigious and extremely popular papers. We found that the subject of extremely prestigious papers is almost information science. Furthermore, we found that extremely popular papers are divided into popular papers and rising papers.

We conclude this chapter by describing two remaining issues. One concerns the stochastic model. Although we introduced GB2 as the best-fit function over the whole range of the citation distribution, there is no stochastic model that explains GB2. The other concerns the weight of links in the citation network. Almost all studies have treated citation networks as unweighted networks. However, it is possible to assign weights to links, for example, based on the similarity between papers.

Acknowledgments

This work is supported by Nihon University College of Science and Technology Grants-in-Aid 2012 and 2016. The authors thank the Yukawa Institute for Theoretical Physics at Kyoto University. Discussions during the YITP workshop YITP-W-17-14 on “Econophysics 2017” were useful in completing this work.

References

  1. Hou J. Exploration into the evolution and historical roots of citation analysis by referenced publication year spectroscopy. Scientometrics. 2017;110:1437-1452. DOI: 10.1007/s11192-016-2206-9
  2. de Solla Price DJ. Networks of scientific papers. Science. 1965;149:510-515. DOI: 10.2307/1716232
  3. Bollen J, Van de Sompel H, Hagberg A, Chute R. A principal component analysis of 39 scientific impact measures. PLoS One. 2009;4:e6022. DOI: 10.1371/journal.pone.0006022
  4. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems. 1998;30:107-117. DOI: 10.1016/S0169-7552(98)00110-X
  5. Pareto V. Cours d'économie politique: professé à l'Université de Lausanne. Tome second. Lausanne: F. Rouge; 1897
  6. Arnold BC. Pareto Distributions. 2nd ed. US: CRC Press; 2015. p. 456. ISBN: 9781466584846
  7. Aoyama H, Fujiwara Y, Ikeda Y, Iyetomi H, Souma W, Yoshikawa H. Macro-Econophysics: New Studies on Economic Networks and Synchronization. UK: Cambridge University Press; 2017. pp. 53-96. ISBN: 9781107198951
  8. Redner S. How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B. 1998;4:131-134. DOI: 10.1007/s100510050359
  9. Redner S. Citation statistics from 110 years of Physical Review. Physics Today. 2005;58:49-54. DOI: 10.1063/1.1996475
  10. Albarrán P, Ruiz-Castillo J. References made and citations received by scientific articles. Journal of the American Society for Information Science and Technology. 2011;62:40-49. DOI: 10.1002/asi.21448
  11. Albarrán P, Crespo JA, Ortuño I, Ruiz-Castillo J. The skewness of science in 219 sub-fields and a number of aggregates. Scientometrics. 2011;88:385-397. DOI: 10.1007/s11192-011-0407-9
  12. Brzezinski M. Power laws in citation distributions: Evidence from Scopus. Scientometrics. 2015;103:213-228. DOI: 10.1007/s11192-014-1524-z
  13. McDonald JB. Some generalized functions for the size distribution of income. Econometrica. 1984;52:647-663. DOI: 10.2307/1913469
  14. Kleiber C, Kotz S. Statistical Size Distributions in Economics and Actuarial Sciences. John Wiley and Sons; 2003. DOI: 10.1002/0471457175.ch2
  15. Simon HA. On a class of skew distribution functions. Biometrika. 1955;42:425-440. DOI: 10.2307/2333389
  16. Merton RK. The Matthew effect in science. Science. 1968;159:56-63. DOI: 10.1126/science.159.3810.56
  17. de Solla Price DJ. A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science. 1976;27:292-306. DOI: 10.1002/asi.4630270505
  18. Barabási A-L, Albert R. Emergence of scaling in random networks. Science. 1999;286:509-512. DOI: 10.1126/science.286.5439.509
  19. Jeong H, Néda Z, Barabási A-L. Measuring preferential attachment in evolving networks. Europhysics Letters. 2003;61:567-572. DOI: 10.1209/epl/i2003-00166-9
  20. Golosovsky M, Solomon S. Stochastic dynamical model of a growing citation network based on a self-exciting point process. Physical Review Letters. 2012;109:098701. DOI: 10.1103/PhysRevLett.109.098701
  21. Golosovsky M, Solomon S. Growing complex network of citations of scientific papers: Modeling and measurements. Physical Review E. 2017;95:012324. DOI: 10.1103/PhysRevE.95.012324
  22. Krapivsky P, Redner S, Leyvraz F. Connectivity of growing random networks. Physical Review Letters. 2000;85:4629-4632. DOI: 10.1103/PhysRevLett.85.4629
  23. Chen P, Xie H, Maslov S, Redner S. Finding scientific gems with Google’s PageRank algorithm. Journal of Informetrics. 2007;1:8-15. DOI: 10.1016/j.joi.2006.06.001
  24. Bollen J, Rodriguez MA, Van de Sompel H. Journal status. Scientometrics. 2006;69:669-687. DOI: 10.1007/s11192-006-0176-z
  25. Maslov S, Redner S. Promise and pitfalls of extending Google’s PageRank algorithm to citation networks. The Journal of Neuroscience. 2008;28:11103-11105. DOI: 10.1523/JNEUROSCI.0002-08.2008
  26. Ma N, Guan J, Zhao Y. Bringing PageRank to the citation analysis. Information Processing & Management. 2008;44:800-810. DOI: 10.1016/j.ipm.2007.06.006
