The number of citations has long been used to measure the value of a paper. Recently, however, Google’s PageRank has also been extensively applied to quantify the worth of papers. In this chapter, we summarize recent progress in studies of citations and PageRank. We also present our latest investigation of a citation network consisting of 34,666,719 articles and 591,321,826 citations. We propose the generalized beta distribution of the second kind to explain the distribution of citations and introduce a stochastic model with an aging effect and super preferential attachment. Furthermore, we clarify the positive linear relation between citations and Google’s PageRank. Using this relation as a benchmark to classify papers, we extract extremely prestigious papers, popular papers, and rising papers.
- fat tail
- stochastic model
- prestigious papers
- popular papers
- rising papers
Citation analysis has a long history. Recently, Hou applied a new method called reference publication year spectroscopy (RPYS) to 2543 papers, including 56,392 references, regarding citation analysis in the Science Citation Index Expanded (SCI-E) and Social Sciences Citation Index (SSCI) data from 1970 to July 2016. This investigation clarified that the development of citation analysis is divided into five periods: before 1900, 1901–1950, 1951–1970, 1971–2000, and 2001–2016. In this chapter, we focus on the distribution of citations, which was introduced by Price and extensively investigated in the third period, that is, the 1950s–1970s. We consider that the number of citations expresses the popularity of papers.
The fifth period, that is, 2001–2016, is characterized by rapid expansion and diversified directions. In this period, many concepts were introduced, for example, scientific evaluation indices, citation networks, information visualization, and citing behaviors. A variety of new impact measures have been proposed based on social network analysis in sociology and on network science, which originated from physics, mathematics, and information science. Bollen summarized 39 impact measures and investigated the correlations between them using principal component analysis. Bollen then indicated that the notion of scientific impact is a multidimensional construct that cannot be adequately measured by any single indicator, although some measures are more suitable than others.
In this chapter, we focus on Google’s PageRank, which was first proposed by Brin and Page to obtain a list of useful web pages in response to user queries. If we define the usefulness of a web page as the number of links it receives from other web pages, a search engine would return a list of portal sites, that is, popular web pages, and such a list is of little use to web users. To overcome this problem, Brin and Page, building on the concept of voting, defined the usefulness of a web page as the number of votes it receives from the web pages linking to it. In the PageRank algorithm, the number of ballots a web page can cast is proportional to its usefulness, that is, a useful web page has many ballots. As a result, a useful web page collects votes from other useful web pages. Thus, Google’s PageRank expresses the prestige of web pages. We consider that this characteristic of Google’s PageRank also holds for citation networks.
This chapter is organized as follows. In Section 2, we explain the characteristics of the dataset used in this chapter. The distribution of citations and the stochastic model of the citation network are elucidated in Section 3. In Section 4, we introduce Google’s PageRank and calculate it. We consider the correlation between citations and PageRank in Section 5. Section 6 is devoted to conclusions.
In this chapter, we use the Science Citation Index Expanded (SCI-E) provided by Clarivate Analytics Co., Ltd. This dataset contains bibliographic information on scientific papers published from 1900 to the present. However, owing to the limited research budget of the authors, we use the data from 1981 to 2015. This subset contains 34,666,719 papers and 591,321,826 citations.
Figure 1 depicts the change in the number of papers published per year. The number of papers increased almost monotonically from 1981 to 2013 and decreased after 2013. However, this decrease is an artifact: the dataset was compiled at the beginning of 2016 and only partially contains papers published in 2014 and 2015, because it takes a few years for all papers to be included in SCI-E.
If we regard papers as nodes and citations from a citing paper to a cited paper as directed links, the citation dataset forms a directed network, which we call the citation network. The citation network consists of many connected components. Figure 2 depicts the frequency distribution of component sizes, where the size of a component is the number of nodes it contains. We find that there is one dominant largest connected component. It consists of 34,428,322 nodes, which are 99.3% of the papers in the dataset, and 591,177,607 links, which are 99.98% of the citations in the dataset. In the following sections, we focus on this largest connected component.
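The component decomposition described above can be sketched with a union-find structure. This is a minimal illustration, not the procedure actually used for the SCI-E data: it assumes that papers are labelled by consecutive integers, that link direction is ignored when forming components, and the function name and toy data are hypothetical.

```python
from collections import Counter


def component_sizes(num_papers, citations):
    """Sizes of the connected components when link direction is ignored.

    num_papers: number of nodes, labelled 0 .. num_papers - 1 (assumed labelling).
    citations:  iterable of (citing, cited) pairs.
    """
    parent = list(range(num_papers))

    def find(node):
        # Path halving keeps the trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for citing, cited in citations:
        root_a, root_b = find(citing), find(cited)
        if root_a != root_b:
            parent[root_a] = root_b

    return list(Counter(find(node) for node in range(num_papers)).values())


# Toy data: papers 0-3 are linked, papers 4-5 are linked, paper 6 is isolated.
toy_citations = [(1, 0), (2, 0), (3, 2), (5, 4)]
sizes = sorted(component_sizes(7, toy_citations), reverse=True)
size_frequency = Counter(sizes)  # how many components have each size
```

Here `sizes[0]` plays the role of the largest connected component, and `size_frequency` corresponds to the kind of histogram plotted in Figure 2.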
3. Distribution and dynamics of citations
In this section, we discuss the distribution of citations and the stochastic models that lead to the citation network.
The number of citations of a paper is given by the in-degree of the corresponding node. Figure 3 is a double-logarithmic plot of the rank-size distribution of citations. The right-tail part of the distribution decreases almost linearly on this plot, which means that this part follows a power-law distribution. The exponent of this power law is called the Pareto exponent, after the Italian economist Vilfredo Pareto. The dashed line in Figure 3 is a reference line representing a power-law distribution.
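The rank-size analysis described above can be sketched as follows. This is only an illustration under stated assumptions: the symbol `mu` for the Pareto exponent, the cutoff `k_min`, and the function names are hypothetical, and a least-squares fit in log-log space is a crude stand-in for whatever estimator was actually used.

```python
import math


def rank_size(citation_counts):
    """Return (rank, citations) pairs; rank 1 is the most cited paper."""
    ordered = sorted(citation_counts, reverse=True)
    return list(enumerate(ordered, start=1))


def tail_exponent(citation_counts, k_min):
    """Estimate mu in rank ∝ citations^(-mu) from papers with citations >= k_min."""
    points = [(math.log(k), math.log(r))
              for r, k in rank_size(citation_counts) if k >= k_min]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx  # the slope is negative for a decaying tail


# Synthetic check: counts constructed so that rank = (1000 / count)^2 exactly.
toy_counts = [1000.0 / math.sqrt(r) for r in range(1, 201)]
mu_hat = tail_exponent(toy_counts, k_min=1.0)
```

On real data the estimate depends on the choice of `k_min`, so it is best read as a visual aid for the straight-line region of the double-logarithmic plot rather than as a formal test.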
Pareto first investigated the fat-tail behavior of the right-tail part of personal income and wealth distributions. After Pareto, many types of distribution functions were proposed, mainly in the field of economics and especially in the investigation of personal income distribution (e.g., see [6, 7]). In the field of scientometrics, Price first applied the power-law distribution to the citation network and found that both the number of references made by a paper (the out-degree in terms of network science) and the number of citations received by a paper (the in-degree in terms of network science) follow power-law distributions. The latter result is the same as the reference line in Figure 3.
Redner investigated papers published in 1981 and cataloged by the Institute for Scientific Information (783,339 papers) and 20 years of publications in Physical Review D, vols. 11–50 (24,296 papers), and found that the right-tail part of both citation distributions follows a power-law distribution. This result is the same as that of Price and the reference line in Figure 3. Redner also investigated 110 years (from July 1893 through June 2003) of publications in Physical Review, the topical journals Physical Review A–E, Physical Review Letters, Reviews of Modern Physics, and Physical Review Special Topics: Accelerators and Beams (353,268 papers and 3,110,839 citations) and found that the entire distribution of the number of citations follows a log-normal distribution.
Albarrán and Ruiz-Castillo studied 5 years (1998–2002) of publications in Web of Science (3.7 million papers) and found that a power-law distribution for the right-tail part of the citation distribution is not rejected for 17 of the 22 scientific fields of Web of Science. Albarrán et al. investigated the same dataset as Albarrán and Ruiz-Castillo and found that a power-law distribution for the right-tail part of the citation distribution is not rejected for 140 of the 219 scientific sub-fields of Web of Science. Recently, Brzezinski investigated scientific papers published between 1998 and 2002 drawn from Scopus and found that the power-law hypothesis is rejected for half of the Scopus fields of science.
Although there are many studies besides those described above, none has used such a vast amount of data to approach the overall picture of the citation distribution, as this chapter does. The light gray line in Figure 3 is the best fit by the generalized beta distribution of the second kind (GB2), also called the generalized beta prime distribution (e.g., see [13, 14]), with the probability density function:
with , , , . Here, is the Beta function.
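For reference, the GB2 density in its commonly used four-parameter form can be evaluated as in the sketch below. The parameter names `a`, `b`, `p`, and `q` are assumptions taken from the standard literature on this distribution, not necessarily the notation or the fitted values of this chapter. In this parametrization the density is f(x) = a x^(ap-1) / (b^(ap) B(p, q) (1 + (x/b)^a)^(p+q)), and its right tail decays as x^(-(aq+1)).

```python
import math


def gb2_pdf(x, a, b, p, q):
    """Density of the generalized beta distribution of the second kind.

    a, p, q are shape parameters and b is a scale parameter (assumed names).
    The computation is done in log space to avoid overflow for large x.
    """
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    z = (x / b) ** a
    log_pdf = (math.log(a)
               + (a * p - 1) * math.log(x)
               - a * p * math.log(b)
               - log_beta
               - (p + q) * math.log1p(z))
    return math.exp(log_pdf)


# With a = b = p = q = 1 the density reduces to 1 / (1 + x)^2.
value_at_one = gb2_pdf(1.0, 1.0, 1.0, 1.0, 1.0)

# Tail check for a = 2, q = 1.5: doubling x should scale the density by 2^-(a*q+1).
tail_ratio = gb2_pdf(2e6, 2.0, 1.0, 1.0, 1.5) / gb2_pdf(1e6, 2.0, 1.0, 1.0, 1.5)
```

The power-law tail is what allows a single GB2 curve to follow both the body of the citation distribution and its fat right tail.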
Table 1 lists the top 20 papers by number of citations. In this table, papers are ordered by citation rank, and the number of citations is counted at the beginning of 2016, with the number at the beginning of January 2018 enclosed in parentheses. The list is characterized by subjects that are mostly Biochemistry & Molecular Biology and by relatively old publication years.
|First author||Title||Journal, Year||Subject|
|P. Chomczynski||Single-step method of RNA isolation by …||Analytical Biochemistry, 1987||Biochemistry & Molecular Biology;|
|A.D. Becke||Density-functional thermochemistry. 3…||Journal of Chemical Physics, 1993||Chemistry; Physics|
|C.T. Lee||Development of the Colle-Salvetti correlation…||Physical Review B, 1988||Physics|
|G.M. Sheldrick||A short history of SHELX||Acta Crystallographica Section A, 2008||Chemistry; Crystallography|
|J.P. Perdew||Generalized gradient approximation…||Physical Review Letters, 1996||Physics|
|J.D. Thompson||Clustal-W – Improving the sensitivity of …||Nucleic Acids Research, 1994||Biochemistry & Molecular Biology|
|S.F. Altschul||Gapped BLAST and PSI-BLAST: a new…||Nucleic Acids Research, 1997||Biochemistry & Molecular Biology|
|S.F. Altschul||Basic local alignment search tool||Journal of Molecular Biology, 1990||Biochemistry & Molecular Biology|
|K.J. Livak||Analysis of relative gene expression data…||Methods, 2001||Biochemistry & Molecular Biology|
|N. Saitou||The neighbor-joining method – A new …||Molecular Biology and Evolution, 1987||Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity|
|Z. Otwinowski||Processing of X-ray diffraction data collected…||Macromolecular Crystallography, 1997||Biochemistry & Molecular Biology|
|A.D. Becke||Density-functional exchange-energy …||Physical Review A, 1988||Physics|
|J.D. Thompson||The CLUSTAL_X windows interface: flexible…||Nucleic Acids Research, 1997||Biochemistry & Molecular Biology|
|R.M. Baron||The moderator mediator variable distinction…||Journal of Personality and Social Psychology, 1986||Psychology|
|J.M. Bland||Statistical methods for assessing agreement…||Lancet, 1986||General & Internal Medicine|
|T. Mosmann||Rapid colorimetric assay for cellular …||Journal of Immunological Methods, 1983||Biochemistry & Molecular Biology;|
|S. Iijima||Helical microtubules of graphitic carbon||Nature, 1991||Science & Technology - Other Topics|
|G. Kresse||Efficient iterative schemes for ab initio total-energy calculations using …||Physical Review B, 1996||Physics|
|J. Felsenstein||Confidence-limits on phylogenies – an approach using the bootstrap||Evolution, 1985||Environmental Sciences & Ecology; Evolutionary Biology; Genetics & Heredity|
|A.P. Feinberg||A technique for radiolabeling DNA restriction endonuclease fragments …||Analytical Biochemistry, 1983||Biochemistry & Molecular Biology;|
3.2. Stochastic models
Simon proposed a stochastic model, the so-called Simon’s model, to elucidate several empirical distributions: the distribution of words in prose samples by their frequency of occurrence, the distribution of scientists by number of papers published, the distribution of cities by population, the distribution of incomes by size, and the distribution of biological genera by number of species. Although the assumptions of Simon’s model are written in terms of word frequencies, we can express them in terms of network science as follows. Assumption I: the probability that a node gets a new link is proportional to its degree, that is, the rich get richer, also known as the Matthew effect. Assumption II: a new node is added with a constant probability. Simon’s model explains why the right-tail part of the distribution follows a power-law distribution.
Price generalized Simon’s model into the so-called Price’s model to explain the growth of citation networks. Barabási and Albert introduced a stochastic model, the so-called BA model, based on two concepts: preferential attachment and growth, which correspond to assumptions I and II of Simon’s model, respectively. The BA model is a special case of Simon’s model and yields a power-law distribution. Jeong et al. extended the BA model to include an aging effect and a class of homogeneous connection kernels. Golosovsky and Solomon [20, 21] further extended the model to include the effect of initial attractivity.
Here, we use the model proposed by Jeong et al. and examine the aging effect and the homogeneity of the growth of the citation network. In this model, the rate at which the degree of a node grows is the product of a time-dependent aging factor and the current degree raised to an unknown scaling exponent.
For the case without the aging factor, Krapivsky et al. have shown the following. When the scaling exponent equals 1 (linear preferential attachment), the model is the same as the BA model and yields a power-law distribution. When the exponent is smaller than 1, the model yields a stretched exponential distribution, and when it is larger than 1 (super preferential attachment), a single node connects to nearly all other nodes, akin to gelation.
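A form consistent with the description above can be written as follows. The notation is an assumption chosen for illustration: $k_i(t)$ for the number of citations (in-degree) of paper $i$ at time $t$, $A(t)$ for the aging factor, and $\beta$ for the scaling exponent. Only the roles of these quantities come from the text, and the three regimes refer to the case without aging, that is, constant $A$.

```latex
\frac{\mathrm{d}k_i(t)}{\mathrm{d}t} = A(t)\,k_i(t)^{\beta},
\qquad
\begin{cases}
\beta < 1: & \text{stretched exponential distribution},\\
\beta = 1: & \text{power-law distribution (BA model)},\\
\beta > 1: & \text{a single node attracts nearly all links (gelation-like)}.
\end{cases}
```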
If we discretize the model with a time step of one year, Eq. (2) states that the yearly change in the number of citations of a paper equals the aging factor multiplied by a power of its current number of citations; we refer to this discretized form as Eq. (3).
We investigate the dynamics of growth for 44,932 papers published in 1985. The left panel of Figure 4 is a double-logarithmic scatter plot of the number of citations as of 1988 against the change in the number of citations from 1988 to 1999. If we divide the number of citations into bins of equal logarithmic width and calculate the average change for each bin, we obtain the red dots depicted in the right panel of Figure 4. After these manipulations, Eq. (3) becomes a linear relation between the logarithm of the average change and the logarithm of the number of citations, which we refer to as Eq. (4).
The solid red line in the right panel of Figure 4 is the linear regression of the red dots by Eq. (4). The slope of this line corresponds to the scaling exponent, and its intercept is determined by the aging factor. In Figure 4, the blue, green, and magenta dots show the same analysis for the years 1993, 2003, and 2010, respectively.
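The binning and regression steps described above can be sketched as follows. The sketch assumes the log-linear relation log⟨Δk⟩ = β log k + log A, where `beta` and `log_a` are assumed names for the scaling exponent and the logarithm of the aging factor. The function name, the bin width, and the toy data are hypothetical.

```python
import math


def fit_attachment_kernel(k_start, delta_k, bins_per_decade=5):
    """Estimate (beta, log_a) from log<delta_k> = beta * log(k) + log_a.

    k_start: citations of each paper at the start of the interval.
    delta_k: citations gained by each paper during the interval.
    """
    bins = {}
    for k, dk in zip(k_start, delta_k):
        if k <= 0:
            continue  # log(k) is undefined for uncited papers
        index = math.floor(math.log10(k) * bins_per_decade)
        bins.setdefault(index, []).append((math.log(k), dk))

    xs, ys = [], []
    for members in bins.values():
        mean_dk = sum(dk for _, dk in members) / len(members)
        if mean_dk > 0:
            xs.append(sum(log_k for log_k, _ in members) / len(members))
            ys.append(math.log(mean_dk))

    # Ordinary least squares on the binned points.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    log_a = mean_y - beta * mean_x
    return beta, log_a


# Synthetic check: gains generated exactly as 0.5 * k^1.2.
toy_k = [1, 1, 10, 10, 100, 100, 1000]
toy_dk = [0.5 * k ** 1.2 for k in toy_k]
beta_hat, log_a_hat = fit_attachment_kernel(toy_k, toy_dk)
```

Repeating the fit for each starting year would give one pair of estimates per year, which is the kind of time series that Figure 5 tracks.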
The left panel of Figure 5 depicts the change of the aging factor over time. The solid line in this figure is a regression by a power-law function. The right panel of Figure 5 depicts the change of the scaling exponent. This figure shows that the exponent remains larger than 1 for the entire period that we investigated. From this analysis, we find that the citation network has the characteristics of super preferential attachment; therefore, a single node would be expected to connect to nearly all other nodes. However, the aging effect prevents the citation network from becoming such an oligopolistic network.
4. Distribution of PageRank
In Eq. (5), the total number of articles is that of the largest connected component of the citation network, and the sum runs over the neighboring nodes that have a link pointing to the node under consideration. Eq. (5) also contains a free parameter that controls the convergence and effectiveness of the recursive calculation. The original Google’s PageRank adopts a value of this parameter that is appropriate for the World Wide Web, whereas a different value, appropriate for citation networks, has also been adopted.
Figure 6 is a double-logarithmic plot of the rank-size distribution of the Google number. In this figure, filled circles and open squares correspond to the two values of the free parameter. The dashed line is a reference line representing a power-law distribution whose exponent is the same as that of the citation distribution depicted in Figure 3. Although the rank-size distribution of the Google number depends on the free parameter, the Google’s PageRank, that is, the ranking by Google number, is almost the same for the two values, as depicted in Figure 7. That figure is a double-logarithmic plot of the PageRank, with the value for one parameter choice on the abscissa and the value for the other on the ordinate.
Table 2 lists the top 20 papers by Google’s PageRank. The list is characterized by papers belonging to many subjects and by relatively old publication years.
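The recursion described above can be sketched with a plain power iteration. The sketch assumes the widely used form G_i = (1 - d) Σ_j G_j / k_j^out + d / N, where the sum runs over papers j that cite paper i, N is the number of papers, and d is the free parameter (named `teleport` below). The symbols, the default value of 0.5, and the uniform redistribution of the score held by papers without outgoing citations are assumptions of this illustration, not details confirmed by the text.

```python
def google_numbers(num_papers, citations, teleport=0.5, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a citation network.

    citations: iterable of (citing, cited) pairs with node labels 0 .. num_papers - 1.
    teleport:  assumed name for the free parameter of the recursion.
    """
    out_degree = [0] * num_papers
    cited_by = [[] for _ in range(num_papers)]  # cited_by[i] lists the papers citing i
    for citing, cited in citations:
        out_degree[citing] += 1
        cited_by[cited].append(citing)

    g = [1.0 / num_papers] * num_papers
    for _ in range(max_iter):
        # Papers that cite nothing spread their score uniformly (an assumption).
        dangling = sum(g[j] for j in range(num_papers) if out_degree[j] == 0)
        new_g = []
        for i in range(num_papers):
            inflow = sum(g[j] / out_degree[j] for j in cited_by[i])
            new_g.append((1 - teleport) * (inflow + dangling / num_papers)
                         + teleport / num_papers)
        change = sum(abs(a - b) for a, b in zip(new_g, g))
        g = new_g
        if change < tol:
            break
    return g


# Toy network: papers 1 and 2 cite paper 0, and paper 3 cites paper 1.
toy_g = google_numbers(4, [(1, 0), (2, 0), (3, 1)])
```

In the toy network, paper 1 scores higher than paper 2 although both cite the same paper, because paper 1 itself receives a citation; this is the sense in which the score reflects prestige rather than a raw count.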
|Citation rank / PageRank||First author||Title||Journal, Year||Subject|
|4||G.M. Sheldrick||A short history of SHELX||Acta Crystallographica Section A, 2008||Chemistry; Crystallography|
|0.5||P. Chomczynski||Single-step method of RNA isolation by acid…||Analytical Biochemistry, 1987||Biochemistry & Molecular Biology;|
|8.67||G.M. Sheldrick||Phase annealing in SHELX-90 – direct methods for…||Acta Crystallographica Section A, 1990||Chemistry; Crystallography|
|0.5||A.D. Becke||Density-functional thermochemistry. 3…||Journal of Chemical Physics, 1993||Chemistry; Physics|
|12.8||J. Kennedy||Particle swarm optimization||IEEE International Conference, 1995||Computer Science|
|2.5||J.M. Bland||Statistical methods for assessing agreement…||Lancet, 1986||General & Internal Medicine|
|0.43||C.T. Lee||Development of the Colle-Salvetti correlation…||Physical Review B, 1988||Physics|
|9.5||D.G. Lowe||Distinctive image features from scale-invariant…||International Journal of Computer Vision, 2004||Computer Science|
|0.56||J.P. Perdew||Generalized gradient approximation made…||Physical Review Letters, 1996||Physics|
|4.6||S. Kirkpatrick||Optimization by simulated annealing||Science, 1983||Science & Technology - Other Topics|
|1||Z. Otwinowski||Processing of X-ray diffraction data…||Macromolecular Crystallography, 1997||Biochemistry & Molecular Biology|
|8.08||F.H. Allen||Table of bond lengths determined by X-RAY…||Journal of the Chemical Society-Perkin Transactions 2, 1987||Chemistry|
|0.56||J.D. Thompson||Clustal-W – improving the sensitivity of…||Nucleic Acids Research, 1994||Biochemistry & Molecular Biology|
|0.57||S.F. Altschul||Basic local alignment search tool||Journal of Molecular Biology, 1990||Biochemistry & Molecular Biology|
|0.47||S.F. Altschul||Gapped BLAST and PSI-BLAST: a new…||Nucleic Acids Research, 1997||Biochemistry & Molecular Biology|
|0.63||N. Saitou||The neighbor-joining method – a new method…||Molecular Biology and Evolution, 1987||Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity|
|1||S. Iijima||Helical microtubules of graphitic carbon||Nature, 1991||Science & Technology - Other Topics|
|5.94||H.D. Flack||On enantiomorph-polarity estimation||Acta Crystallographica Section A, 1983||Chemistry; Crystallography|
|4.32||A.L. Spek||Single-crystal structure validation with the…||Journal of Applied Crystallography, 2003||Chemistry; Crystallography|
|6.45||N. Walker||An empirical-method for correcting…||Acta Crystallographica Section A, 1983||Chemistry; Crystallography|
5. Correlation between citation and PageRank
Bollen and Rodriguez described the Institute for Scientific Information (ISI) impact factor (IF), which is defined as the mean number of citations a journal receives over a two-year period, as a metric of popularity and Google’s PageRank as a metric of prestige. This concept was also proposed by Chen et al. and Maslov and Redner, who investigated all publications in the Physical Review family of journals from 1893 to 2003 and found a linear relation between the Google number and the number of citations. Furthermore, [23, 25] found that some outliers from this linear relation, especially papers whose PageRank ranking is remarkably high while their citation ranking is only moderately high, are universally familiar to physicists; [23, 25] called such papers scientific “gems.” Ma et al. applied the concept of [23, 24, 25] to the fields of biochemistry and molecular biology from 2000 to 2005. Whereas these studies investigated the citation networks of selected scientific fields, this chapter investigates the citation network consisting of all scientific fields.
Figure 8 is a double-logarithmic plot of the correlation between the number of citations and the Google number. In this figure, the solid gray line represents the mean Google number calculated for bins of the number of citations with logarithmically equal width. This figure shows that the mean Google number varies smoothly and increases linearly with the number of citations. Thus, the Google number and the number of citations are similar measures of the importance of papers. This result means that prestige (Google number) is proportional to popularity (citations) in many cases.
However, there are outliers whose prestige is high compared with their popularity. These papers are located above the solid gray line in Figure 8 and are regarded as extremely prestigious papers. They are extracted in order of Google’s PageRank under a constraint on the ratio of the citation rank to the PageRank. Table 3 lists the top 20 extremely prestigious papers selected using this constraint. The list is characterized by subjects that are mostly related to information science.
|PageRank||Google number||Citation rank||Citations, 2016 (2018)||Citation rank / PageRank||First author||Title||Journal, Year||Subject|
|12.8||J. Kennedy||Particle swarm optimization||Proceedings of IEEE International Conference, 1995||Computer Science|
|22||1.6861||240||6500 (7458)||10.91||S.M. Alamouti||A simple transmit diversity technique for wireless…||IEEE Journal on Selected Areas in Communications, 1998||Engineering; Telecommunications|
|25||1.5103||516||4465 (6605)||20.64||I.F. Akyildiz||Wireless sensor networks: a survey||Computer Networks, 2002||Computer Science;|
|33||1.4160||481||4611 (6276)||14.58||Z. Pawlak||Rough sets||International Journal of Computer & Information Sciences, 1982||Information Science & Library Science|
|36||1.3169||784||3740 (5402)||21.78||I.F. Akyildiz||A survey on sensor networks||IEEE Communications Magazine, 2002||Engineering; Telecommunications|
|43||1.2155||998||3309 (4707)||23.21||T.R. Gruber||A translation approach to portable ontology…||Knowledge Acquisition, 1993||Computer Science; Information Science & Library Science|
|48||1.1432||828||3656 (4463)||17.25||P. Gupta||The capacity of wireless networks||IEEE Transactions on Information Theory, 2000||Computer Science;|
|49||1.1387||1916||2441 (2839)||39.10||S. Floyd||Random early detection gateways for congestion…||IEEE-ACM Transactions on Networking, 1993||Computer Science;|
|53||1.1102||1247||2991 (3879)||23.53||G. Bianchi||Performance analysis of the IEEE 802.11 distributed…||IEEE Journal on Selected Areas in Communications, 2000||Engineering; Telecommunications|
|60||1.0626||608||4149 (5968)||10.13||S. Haykin||Cognitive radio: Brain-empowered wireless…||IEEE Journal on Selected Areas in Communications, 2005||Engineering; Telecommunications|
|76||0.9431||967||3360 (3961)||12.72||T. Murata||Petri nets - properties, analysis and applications||Proceedings of the IEEE, 1989||Engineering|
|79||0.9388||1758||2535 (3702)||22.25||W.B. Heinzelman||An application-specific protocol architecture for…||IEEE Transactions on Wireless Communications, 2002||Engineering; Telecommunications|
|90||0.8884||1190||3048 (4075)||13.22||R. Ahlswede||Network information flow||IEEE Transactions on Information Theory, 2000||Computer Science;|
|93||0.8767||1565||2691 (3401)||16.83||T. Wiegand||Overview of the H.264/AVC video coding standard||IEEE Transactions on Circuits and Systems for Video Technology, 2003||Engineering|
|97||0.8598||1045||3245 (4674)||10.77||M. Dorigo||Ant system: Optimization by a colony of…||IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, 1996||Automation & Control Systems;|
|116||0.7923||2736||2052 (2426)||23.59||D. Harel||Statecharts - a visual formalism for…||Science of Computer Programming, 1987||Computer Science|
|120||0.7838||4059||1705 (2982)||33.83||M. Weiser||The Computer for the 21st-century||Scientific American, 1991||Science & Technology - Other Topics|
|121||0.7796||1406||2840 (4011)||11.62||S. Deerwester||Indexing by latent semantic analysis||Journal of the American Society for Information Science, 1990||Computer Science; Information Science & Library Science|
|128||0.7584||3165||1914 (1948)||24.73||A.E. Leviton||Standards in herpetology and ichthyology…||Copeia, 1985||Zoology|
|129||0.7582||7409||1274 (1478)||57.43||X.Y. Wang||Room-temperature all-semiconducting…||Physical Review Letters, 2008||Physics|
On the other hand, there are also outliers whose prestige is low compared with their popularity. These articles are located below the solid gray line in Figure 8 and are regarded as extremely popular papers. They are extracted in order of citation rank under a constraint on the ratio of the PageRank to the citation rank. Table 4 lists the top 20 extremely popular papers selected using this constraint. These articles are divided into two groups. One group contains papers published in Nature, Science, and the Proceedings of the National Academy of Sciences of the United States of America (PNAS). These papers were published more than about 10 years ago, and their citation growth rates are low. The other group includes papers that were mainly published in Cell and relatively recently, and their citation growth rates are extremely high. Thus, we can regard the papers in the latter group as rising papers.
|PageRank||Google number||PageRank / citation rank||First author||Title||Journal, Year||Subject|
|627||0.3250||5.02||D. Hanahan||Hallmarks of Cancer: The Next Generation||Cell, 2011||Biochemistry & Molecular Biology;|
|1580||0.2042||5.32||D.W. Huang||Systematic and integrative analysis of large gene list…||Nature Protocols, 2008||Biochemistry & Molecular Biology|
|1608||0.2023||5.29||Y. Zhao||The M06 suite of density functionals for main…||Theoretical Chemistry Accounts, 2008||Chemistry|
|1810||0.1897||5.54||D.P. Bartel||MicroRNAs: Target Recognition and…||Cell, 2009||Biochemistry & Molecular Biology;|
|2128||0.1757||5.67||B.P. Lewis||Conserved seed pairing, often flanked by…||Cell, 2005||Biochemistry & Molecular Biology;|
|2506||0.1619||6.05||T. Jenuwein||Translating the histone code||Science, 2001||Science & Technology - Other Topics|
|2123||0.1759||5.07||P. Li||Cytochrome c and dATP-dependent formation…||Cell, 1997||Biochemistry & Molecular Biology;|
|2802||0.1534||5.24||Z.G. Xia||Opposing effects of ERK and JNK-P38 map…||Science, 1995||Science & Technology - Other Topics|
|3120||0.1447||5.75||R.C. Lee||The C. elegans heterochronic gene…||Cell, 1993||Biochemistry & Molecular Biology;|
|2865||0.1517||5.24||A. Hall||Rho GTPases and the actin cytoskeleton||Science, 1998||Science & Technology - Other Topics|
|3269||0.1411||5.45||S. Akira||Pathogen recognition and innate immunity||Cell, 2006||Biochemistry & Molecular Biology;|
|3585||0.1348||5.87||B.D. Strahl||The language of covalent histone modifications||Nature, 2000||Science & Technology - Other Topics|
|3359||0.1390||5.25||M.E. Raichle||A default mode of brain function||PNAS, 2001||Science & Technology - Other Topics|
|3326||0.1398||5.16||E.K. Miller||An integrative theory of prefrontal cortex function||Annual Review of Neuroscience, 2001||Neurosciences & Neurology|
|3572||0.1351||5.44||R.O. Hynes||Integrins: Bidirectional, allosteric signaling…||Cell, 2002||Biochemistry & Molecular Biology;|
|4096||0.1262||6.20||S.R. Datta||Akt phosphorylation of BAD couples survival…||Cell, 1997||Biochemistry & Molecular Biology;|
|4825||0.1166||6.83||T. Kouzarides||Chromatin modifications and their function||Cell, 2007||Biochemistry & Molecular Biology;|
|4096||0.1262||5.45||M. Corbetta||Control of goal-directed and stimulus-driven…||Nature Reviews Neuroscience, 2002||Neurosciences & Neurology|
|4446||0.1213||5.91||A. Brunet||Akt promotes cell survival by phosphorylating and…||Cell, 1999||Biochemistry & Molecular Biology;|
|3972||0.1280||5.06||J.D. Fontenot||Foxp3 programs the development and…||Nature Immunology, 2003||Immunology|
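The two selection rules used in this section can be sketched as follows. The thresholds of 10 and 5 are assumptions suggested by the ratios tabulated above, which are all at least 10 for the extremely prestigious papers and at least 5 for the extremely popular papers. The function names and the toy data are hypothetical.

```python
def ranks(scores):
    """Rank 1 is the largest score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        rank[i] = position
    return rank


def classify(citations, google, prestige_ratio=10.0, popularity_ratio=5.0):
    """Return (prestigious, popular) lists of paper indices.

    prestigious: citation rank / PageRank >= prestige_ratio, ordered by PageRank.
    popular:     PageRank / citation rank >= popularity_ratio, ordered by citation rank.
    Both thresholds are assumed values, not parameters confirmed by the text.
    """
    citation_rank, page_rank = ranks(citations), ranks(google)
    papers = range(len(citations))
    prestigious = sorted(
        (i for i in papers if citation_rank[i] / page_rank[i] >= prestige_ratio),
        key=lambda i: page_rank[i])
    popular = sorted(
        (i for i in papers if page_rank[i] / citation_rank[i] >= popularity_ratio),
        key=lambda i: citation_rank[i])
    return prestigious, popular


# Toy data: paper 0 is rarely cited but has the top Google number,
# and paper 1 is the most cited but has the lowest Google number.
toy_citations = [1, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20]
toy_google = [0.9, 0.01, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]
prestigious, popular = classify(toy_citations, toy_google)
```

Splitting the popular group further into long-established and rising papers would require a citation growth rate per paper, which this sketch does not model.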
We investigated papers published from 1981 to 2015 and contained in SCI-E. The total number of papers is 34,666,719 and that of citations is 591,321,826. We extracted the largest connected component from this dataset. The obtained citation network consists of 34,428,322 nodes (articles) and 591,177,607 links (citations).
The right-tail part of the rank-size distribution of citations follows a power-law distribution. Furthermore, we introduced the generalized beta distribution of the second kind (GB2) as the best-fit function to the whole range of the citation distribution. We introduced a stochastic model with growth, preferential attachment, and an aging effect, and estimated its parameters through numerical analysis.
Whereas the number of citations represents the popularity of papers, Google’s PageRank reflects their prestige. We evaluated Google’s PageRank for the largest connected component, which consists of 34,428,322 articles and 591,177,607 links (citations). We found that citations and Google numbers have a positive linear relation. We used this positive linear relation as a benchmark and selected extremely prestigious and extremely popular papers. We found that the subjects of the extremely prestigious papers are mostly related to information science. Furthermore, we found that extremely popular papers are divided into popular papers and rising papers.
We conclude this chapter by describing two remaining issues. One concerns the stochastic model. Although we introduced GB2 as the best-fit function to the whole range of the citation distribution, there is no stochastic model that explains GB2. The other concerns the weight of links in the citation network. Almost all studies have investigated citation networks as unweighted networks. However, it is possible to define link weights, for example, based on the similarity between papers.
This work is supported by Nihon University College of Science and Technology Grants-in-Aid 2012 and 2016. The authors thank the Yukawa Institute for Theoretical Physics at Kyoto University. Discussions during the YITP workshop YITP-W-17-14 on “Econophysics 2017” were useful to complete this work.