Assessment of Creativity: Theories and Methods

The history of creativity assessment is as old as the concept itself. Researchers from various cultures and disciplines have attempted to define creativity and to offer a valid way to assess it. Creativity is generally defined as the ability to produce work that is novel and appropriate. Researchers in the field have attempted to measure creativity from different perspectives and have tried to answer questions such as "What are the mental processes involved in creative thought?", "Which personality traits are associated with creativity?", "How can a product be judged to be creative?" and "What are the external forces that affect creativity?". The answers to these questions underlie the most commonly used creativity assessment instruments. This chapter presents a brief overview of the assessment of creativity from the psychometric perspective and discusses the strengths and weaknesses of various instruments used in the field.


Introduction
The belief that creativity is too difficult to measure is still a dominant myth [1] and can be considered a byproduct of definitional issues. Researchers from various cultures and disciplines have attempted to define creativity and to offer a valid way to assess it. As creativity is a multifaceted phenomenon, defining and operationalizing it is a complicated task. For the sake of the discussion, one should start by defining "creativity". The usefulness of higher order cognitive constructs is related to the degree of clarity of their definitions [2]. Unfortunately, most creativity research overlooks the importance of this point. In a content analysis of articles published in two major creativity research journals, Creativity Research Journal and Journal of Creative Behavior, researchers found that only 34% of the selected articles provided an explicit definition of creativity [3]. To examine a concept scientifically, we should rely on operationalized definitions, and the relatively low rate of explicit definitions of creativity constitutes a major problem for the field. I will therefore use the following definition, provided in Ref. [3], to clarify my perspective for this chapter: creativity is "the interaction among aptitude, process, and environment by which an individual or group produces a perceptible product that is both novel and useful as defined within a social context".
Starting with a definition helps but does not answer the question at hand: why assess creativity? Although this question may have hundreds of answers, the most basic and extensive one would be: because creativity is the apex of human evolution and the most desirable skill in the information age. Creative thinking was the main ability that helped humans move from using a hand ax to building complicated machines and producing complex language algorithms. Furthermore, creativity has become one of the most sought-after skills in schools and organizations. The World Economic Forum, in its Future of Jobs report, ranked creativity third among the ten most important skills for the fourth industrial revolution [4], and creativity is also listed among the competencies of 21st century skills. Today, supporting creativity is a common goal of kindergartens, research institutes and the biggest corporations in the world. The importance of creativity is anticipated to increase in the future due to various societal and economic trends, as explained in Ref. [5].
2. Product development cycles have shortened due to information and communication technologies (for example, any manufactured product is now redesigned within 5-10 years, and this period decreases to 6-12 months if the product is a technological device).
3. More and more jobs are automated if they do not require creativity.
As the job market demanded more creativity, schools started to restructure their goals and curricula to meet those needs too. In the educational context, assessment of creativity is mostly about recognizing creativity and creating ideal conditions to nurture it, not about categorizing students as "creative" or "not creative". Possible purposes of creativity assessment have been discussed in Ref. [6]; these can be summarized as follows:
1. Guide individuals to recognize their own strengths and support them in nourishing those strengths.
2. Develop a better understanding of human abilities such as intelligence and creativity. By doing so, we will gain insight into the working structures of these complicated concepts.
3. Restructure the curriculum and learning experiences in accordance with the needs of the students. If educators understand their students' strengths and weaknesses regarding creativity, they can tailor educational opportunities to support creativity.
Table 1. Four-C's of creativity.
techniques to assess it. The reader can find information on more than 70 different creativity assessments on the Center for Creative Learning's web page (see Ref. [22]). However, the variety of definitions and assessment techniques does not mean that creativity research has no consensus at all. Researchers have tried to identify the psychological factors that best predict creative outcomes and have proposed several assessment techniques that use these factors as a means of measurement [13]. Indeed, we can even argue that the field of creativity assessment has never been so prosperous.

The psychometric perspective in creativity research
Today it is accepted that creativity is a combination of cognitive, conative and emotional factors that interact dynamically with the environment. As all of these factors are present in human beings and all of them affect us to a certain degree, it can be argued that a specific combination of them results in creativity. Historically, several researchers have tried to investigate the nature of creativity through the lens of the aforementioned factors. The 4P framework (process, person, product, press) proposed by Rhodes [23] is a widely accepted categorization in the psychometric study of creativity.
• Process: Mental processes involved in creative thought or creative work.
• Person: Personality traits or personality types associated with creativity.
• Product: Products which are judged to be creative by a relevant social group.
• Press (Environment): The external forces that affect the creative person or process (e.g. sociocultural context, trauma).
In this section, historical and recent research in the field of creativity assessment will be presented. Not every creativity test, scale or rating will be discussed; instead, the focus will be on the historical milestones and contemporary methods of creativity assessment. This chapter adopts the integrative review approach with the aim of assessing, critiquing and synthesizing the literature on the assessment of creativity.

Assessing the creative process
Psychometric measures of creative process and potential have been extensively applied in the field. These processes involve cognitive factors that lead to creative production, such as finding and solving problems, selective encoding (i.e. selecting information that is relevant to the problem and ignoring distractions), evaluation of ideas, associative thinking, flexibility and divergent thinking. Nevertheless, out of this long list of cognitive factors, assessment of the creative process has relied mostly on divergent thinking. The researchers in Ref. [24] even underlined an irony in the study of creativity: although creativity itself requires novel and original solutions to a problem, researchers have mostly focused on divergent thinking (DT) tasks. Not only were major efforts put into developing DT tests, but even the earliest DT tests are still widely used in creativity research and education. Divergent thinking can be explained as a thought process used to generate creative ideas by searching for many possible solutions, whereas convergent thinking is the ability to arrive at the "correct" solution. Guilford [25], who came up with these concepts, clearly underlined the difference between them:
In convergent thinking tests, the examinee must arrive at one right answer. The information given generally is sufficiently structured so that there is only one right answer… An example with verbal material would be: "What is the opposite of hard?" In divergent thinking, the thinker must do much searching around, and often a number of answers will do or are wanted. If you ask the examinee to name all the things he can think of that are hard, also edible, also white, he has a whole class of things that might do. It is in the divergent thinking category that we find the abilities that are most significant in creative thinking and invention (p. 8).
In divergent thinking it is important to produce as many responses to verbal or figural stimuli as possible; in DT, more is better. After the examinee comes up with various answers, testers score them. The scoring is based on originality (uniqueness of the responses to a given stimulus), fluency (number of responses produced to a given stimulus), flexibility (number and/or uniqueness of the categories of responses to a given stimulus) and elaboration (the detail added to the ideas produced for a given stimulus) [25,26]. As Guilford pioneered research on creativity, the initial efforts to assess it also came from him and his colleagues. There were, however, others who developed test batteries to measure creative thinking abilities and who focused mostly on process components (e.g., Kogan and Wallach, Torrance, Mednick).
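To make the scoring criteria above concrete, the following minimal Python sketch scores one examinee's responses to a single prompt for fluency, flexibility and originality. It is an illustration only, not the scoring procedure of any published test: the function name score_dt, the category labels, the reference-sample counts and the 5% rarity cutoff are all assumptions introduced here, and elaboration is left out because it requires rater judgment.

```python
def score_dt(responses, categories, norm_counts, norm_size, rare_cutoff=0.05):
    """Score one examinee's answers to a single divergent thinking prompt.

    responses   -- list of answers given by the examinee
    categories  -- dict mapping each answer to a rater-assigned category
    norm_counts -- dict: how many people in a reference sample gave each answer
    norm_size   -- number of people in that reference sample
    rare_cutoff -- answers given by at most this share of the sample count as original
    (all names and the cutoff are hypothetical, chosen for illustration)
    """
    # Normalise and de-duplicate while keeping the order of first appearance.
    unique = list(dict.fromkeys(r.strip().lower() for r in responses))

    # Fluency: number of distinct responses.
    fluency = len(unique)
    # Flexibility: number of distinct categories the responses fall into.
    flexibility = len({categories[r] for r in unique})
    # Originality: number of responses that are rare in the reference sample.
    originality = sum(
        1 for r in unique if norm_size * rare_cutoff >= norm_counts.get(r, 0)
    )
    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Hypothetical answers to "list uses for a brick"; all data are invented.
answers = ["build a wall", "doorstop", "Doorstop", "paperweight", "grind into pigment"]
categories = {
    "build a wall": "construction",
    "doorstop": "weight",
    "paperweight": "weight",
    "grind into pigment": "art material",
}
norm_counts = {"build a wall": 80, "doorstop": 35, "paperweight": 20, "grind into pigment": 2}

scores = score_dt(answers, categories, norm_counts, norm_size=100)
print(scores)
```

Because originality is counted per response in this sketch, an examinee who simply produces more responses has more chances to score on originality, which is one way to see why fluency and originality scores tend to be entangled.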
Structure of Intellect Divergent Thinking Test: Guilford's famous Structure of Intellect (SOI) model was mainly about defining and analyzing the factors that constitute intelligence, and he proposed 24 distinct types of DT [27]. His model covers 180 (6 × 5 × 6) intellectual abilities organized along three dimensions, namely operations (evaluation, convergent production, divergent production, memory recording, memory retention, cognition), contents (visual, auditory, symbolic, semantic, behavioral) and products (units, classes, relations, systems, transformations, implications). Guilford's SOI battery included several DT tasks. For example, in figural implications, examinees were required to add lines to simple figures to create a new figure; in semantic units, they listed commonly mentioned consequences of an impossible event, such as people not needing to sleep. Other examples include the Making Objects task (fluency with figural systems), in which participants make a new object from four provided figures using at least two of them, and the Name Grouping task (flexibility with symbolic classes), in which participants are given a set of names and form subgroups based on different rules.
"Guilfordian" Tests: Guilford's work was so influential that it was followed, replicated and reinterpreted by different researchers in the 1960s. Wallach and Kogan [28] argued that creativity tests should be administered in a game-like environment and without time limits. With this in mind, they focused on assessing creativity in children and developed the Instances Test (e.g., list as many things as you can that move on wheels or that make noise) and the Uses Test (e.g., tell me the different ways you can use a knife or a tire or, as in Ref. [29], toothpicks, a chair or bricks). Wallach and Kogan's perspective differed from Guilford's not in the content of the tests but in the target age group and the manner of administration (for a detailed discussion of the effects of different testing environments, see Ref. [30]). Testing the divergent thinking ability of children would allow educators and educational institutions to recognize creatively able children and to provide the necessary support and enrichment in their education.
Torrance Tests of Creative Thinking (TTCT): If we were to make a hit list of creativity assessment tests, the TTCT would most probably be number one. Torrance's name has become synonymous with the assessment of creativity, although assessment was not his major goal. The TTCT were developed for research and to provide a tool that can be used to individualize instruction [31,32]. The TTCT, which are based mainly on the SOI battery, are the most widely used and studied creativity tests [33,34] and continue to attract attention at the international level [35,36]. Over the years, the TTCT have been refined in terms of scoring and administration and have been re-normed, which may account for their popularity. The TTCT consist of two different tests, the TTCT-Verbal and the TTCT-Figural, and each test has two parallel forms, allowing them to be used as pre- and posttests in experimental settings. TTCT scores were originally expressed in terms of four factors: fluency, originality, flexibility and elaboration. After the streamlined scoring system was introduced, the Figural tests were scored for resistance to premature closure and abstractness of titles in addition to originality, fluency and elaboration. Flexibility was removed because of the close correlation between fluency and flexibility scores [37]. Like Wallach and Kogan, the TTCT recommend administration in a game-like environment, but they apply time limits.
The TTCT-Verbal is entitled "Thinking Creatively with Words" and the TTCT-Figural is entitled "Thinking Creatively with Pictures". The verbal form consists of six activities, whereas the figural form consists of three (see Table 2).
Remote Associates Test: Mednick [39] proposed a different perspective on creativity assessment: instead of focusing solely on divergent thinking, he argued that convergent thinking should be taken into consideration too. Mednick believed that creative people are able to produce original ideas because they have the ability to form associations in their minds. He analyzed the creative process from a stimulus-response (S-R) perspective; he thought that producing unusual or original responses to a stimulus required creativity, and he defined creativity from this point of view:
… define the creative thinking process as the forming of associative elements into new combinations which either meet specified requirements or are in some way useful. The more mutually remote the elements of the new combination, the more creative the process or solution ([39], p. 221).
TTCT-Figural
• Picture Construction: Participant uses a basic shape and expands on it to create a picture.
• Picture Completion: Participant is asked to finish and title incomplete drawings.
• Lines/Circles: Participant is asked to modify many different series of lines and circles.
TTCT-Verbal
• Asking: Participant asks as many questions as possible about the picture.
• Guessing Causes: Participant lists possible causes for the pictured action.
• Guessing Consequences: Participant lists possible consequences for the pictured action.
• Product Improvement: Participant is asked to make changes to improve a toy.
• Unusual Uses: Participant is asked to think of many different possible uses for an ordinary item.
• Unusual Questions: Participant asks as many questions as possible about an ordinary item (this item does not appear in later editions).
• Just Suppose: Participant is asked to "just suppose" that an improbable situation has happened and then list possible ramifications.
Table 2. Activities of the TTCT-Figural and TTCT-Verbal forms.
Mednick argued that people can achieve a creative solution through serendipity, similarity and mediation. His analysis showed that people's associative hierarchies, or sets of responses to stimulus situations, differ. Noncreative people have steep hierarchies, with a strong or dominant response to a given situation. For example, if someone says "pros" and I cannot think of anything besides "cons", that will be my dominant response to that stimulus and I will display a steep associative hierarchy. The creative person, in contrast, has a flat associative hierarchy with multiple responses to a given stimulus. For example, for the stimulus word "table", a creative person might come up with associations such as chair, class, wood, leg and food, whereas a noncreative person might come up with only the strongest associative links, such as chair, class and wood, and get stuck there.
As the operational definition of his theory, Mednick developed the Remote Associates Test (RAT). The RAT originally consisted of 30 items; each item included three stimulus words, and the participant was required to find a fourth word that links them all. For example, given the stimulus set 'book/shelf/telephone', the fourth word that links them all would be 'book'. Some have argued that, because the test requires a single correct answer, it does not seem to require creative thinking [40]. However, one should note that the RAT does not aim to measure creative thinking directly; it measures the capacity to think creatively, and one must also think divergently in order to reach the single answer. Weisberg [41] joined this discussion with the example of a marathon runner: if one wants to identify a runner who has the potential to be a good marathon runner, one should measure lung capacity instead of running speed.
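Because each RAT item has a single keyed answer, scoring reduces to counting matches, as the short Python sketch below shows. The three word triples are illustrative compound-word items of the kind commonly quoted in the literature, not items taken from this chapter or from Mednick's original form, and the function name score_rat and the exact-match rule are assumptions.

```python
def score_rat(items, answers):
    """Count the items for which the examinee gave the keyed linking word.

    items   -- list of dicts with the three cue words and the keyed answer
    answers -- list of the examinee's answers, in item order (None = no answer)
    """
    correct = 0
    for item, given in zip(items, answers):
        # Exact match after trimming and lower-casing (an assumed scoring rule).
        if given is not None and given.strip().lower() == item["key"]:
            correct += 1
    return correct


# Illustrative items: each key forms a compound or phrase with all three cues.
items = [
    {"cues": ("cottage", "swiss", "cake"), "key": "cheese"},
    {"cues": ("cream", "skate", "water"), "key": "ice"},
    {"cues": ("show", "life", "row"), "key": "boat"},
]
answers = ["Cheese", "ice", "house"]  # hypothetical examinee responses

rat_score = score_rat(items, answers)
print(rat_score, "of", len(items), "items correct")
```

The contrast with divergent thinking scoring is visible in the code: there is no counting of alternatives, only a check against one key per item.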
The Test for Creative Thinking - Drawing Production (TCT-DP): The discussion of the TCT-DP should start with the note that it is not based solely on measuring creative processes (especially in the manner of traditional divergent thinking tests) but is instead designed to mirror a more holistic concept of creativity. However, as the theoretical basis of the test mostly reflects the cognitive processes involved in creative production, I prefer to discuss it under this heading. Urban [42] described the approach taken in developing the TCT-DP as a more holistic and gestalt-oriented one that aimed to consider not only divergent thinking but also aspects such as content, gestalt, composition, elaboration, mental risk taking, breaking of boundaries, unconventionality and humor. The TCT-DP was developed by Jellen and Urban [43], and the test consists of a "big square frame" with five fragments inside the square and one fragment outside it. The participants are required to complete the drawing as they wish. The TCT-DP has two parallel forms, and although participants are not informed about the time limit during administration, each form has a duration of fifteen minutes. The TCT-DP can be administered both individually and in groups and can be used with test-takers of most ages, from 4 to 95 years. The evaluation manual for the TCT-DP includes a set of 14 key criteria ([42,43], see Table 3).
Evaluation of Potential Creativity (EPoC): EPoC, like the TCT-DP, is not solely a process assessment; although it has strong cognitive components, it synthesizes several traditions of measurement. The developers [44] adopted the multivariate approach proposed in Ref. [45], according to which a combination of cognitive, conative-affective and environmental factors influences creative capacity. EPoC was developed for children between 5 and 12 years of age and aims to evaluate the creative potential of school-aged children. The test has two parallel forms; measurement relates to two fields of expression, graphic and verbal, and involves divergent-exploratory (finding numerous original responses to a given stimulus) and convergent-integrative (producing an original work that integrates several elements in a creative synthesis) ways of thinking [13,44]. EPoC's forms are composed of eight subtests that are administered individually, and EPoC is considered to be a modular, domain-specific tool (see Table 4). EPoC is the most up-to-date creativity assessment instrument, and the team is working on extending the test battery to new domains of creativity such as music and science.
For convenience, the TCT-DP and EPoC have been presented under assessing the creative process, and the discussion of their psychometric evidence is included in the next part along with the other process assessment tools. As the reader may guess, there exist numerous tools for creativity assessment. Furthermore, there is growing interest in domain-specific creativity assessment, but domain-specific measures of creative potential are beyond the scope of this chapter; interested readers may consult the suggested sources (for example, see [46-48]).

Issues of reliability and validity in creativity assessment
• Continuations (Cn): Any use, continuation or extension of the six given figural fragments.
• Completion (Cm): Any additions, completions, complements, supplements made to the used, continued or extended figural fragments.
• New elements (Ne): Any new figure, symbol or element.
• Connections made with a line (Cl): Between one figural fragment or figure and another.
• Connections made to produce a theme (Cth): Any figure contributing to a compositional theme or "gestalt".
• Boundary breaking that is fragment dependent (Bfd): Any use, continuation or extension of the "small open square" located outside the square frame.
• Boundary breaking that is fragment independent (Bfi): Any use or extension located outside the square frame independent of the "small open square".
• Perspective (Pe): Any breaking away from two-dimensionality.
• Humor and affectivity (Hu): Any drawing which elicits a humorous response, shows affection, emotion, or strong expressive power.
• Unconventionality, a (Uc, a): Any manipulation of the material.
• Unconventionality, b (Uc, b): Any surrealistic, fictional and/or abstract elements or drawings.
• Unconventionality, c (Uc, c): Any usage of symbols or signs.
• Speed (Sp): A breakdown of points, beyond a certain score-limit, according to the time spent on the drawing production.
Table 3. Evaluation criteria for TCT-DP (source [42,43]).
The most important question regarding any measurement instrument, whether it is a thermometer or a test of creative thinking, is whether it is reliable: does it produce consistent outcomes? To ensure reliability, psychometric instruments must show consistent results in tests such as test-retest reliability and split-half reliability. Research studies have shown that divergent thinking tests are reliable [30]. However, there are important points for further consideration. For example, some studies have found that performance on DT tasks is affected by instructions (if you instruct people to be creative, they score higher). Weisberg [41] highlighted this situation by asking, 'If you instruct the examinee to be smart in the IQ test, will he be smarter?'. Weisberg himself answers the question: as children are used to answering the kinds of questions found in IQ tests, their scores will not change with the instruction to be smart. Questions in creativity tests, however, are different in nature; most of them do not have a single correct answer, and children are not familiar with this kind of question. Thus, additional instruction might not be a flaw in tests of creativity.
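The split-half check mentioned above can be sketched in a few lines of Python. The sketch assumes an odd-even split of the items and the Spearman-Brown correction for test length, both standard psychometric choices that this chapter does not itself specify; the item scores are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_scores -- one list of item scores per examinee
    Returns (correlation between the two halves, corrected reliability).
    """
    first_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(first_half, second_half)
    # Spearman-Brown: estimate reliability of the full-length test.
    return r_half, 2 * r_half / (1 + r_half)


# Invented scores: four examinees, four items each.
item_scores = [
    [1, 1, 1, 1],
    [2, 2, 2, 3],
    [3, 3, 3, 3],
    [4, 5, 4, 4],
]
r_half, reliability = split_half_reliability(item_scores)
print(round(r_half, 3), round(reliability, 3))
```

Test-retest reliability can be estimated with the same pearson_r helper by correlating total scores from two administrations of the same form to the same examinees.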

Once the reliability of a testing instrument is established, questions about validity arise. Validity is a complex concept that can be examined in a testing instrument through different analyses, such as those of discriminant, face, criterion and predictive validity. Tests of creative potential are reliable, yet major debates and doubts exist about their predictive and discriminant validity.
To start with Guilford's SOI model, an enormous amount of assessment data exists and the archives are still available. SOI data have been analyzed extensively over the years, with mixed conclusions: some results generally supported the model [49,50], some researchers argued that revisions were needed [51], and others concluded that the model has serious problems [52]. The results are much the same for the Wallach and Kogan tests: although the tests are reliable, there are mixed results about their validity.
The TTCT have been the most widely used and researched tests of creativity and thus have extensive data bearing on their reliability and validity. Research on the TTCT reports good scoring and test-retest reliability [53,54]. The majority of the predictive validity studies for the TTCT were run by Torrance himself. Beginning in 1958, they included all students in grades 1 to 6 in two Minnesota elementary schools, and in 1959 all students in grades 7-12 took the TTCT. These students were followed up at four points in time (after 7, 12, 22 and 40 years), and data were collected about their creative achievements. The longitudinal studies have shown [20,37,55,56] that TTCT results correlate with adult creative achievement and thus have predictive validity (for a detailed discussion see [57]). However, Baer [58] raised questions about the relevance of the criterion variables (e.g., subscribing to a professional journal, learning a foreign language): are the questions asked about creative achievements in adult life related solely to creativity? One can justifiably argue that these criterion variables are strongly related to intelligence too. In addition, as the Torrance tests also correlate with intelligence, the prediction of creative achievement might be based on intelligence rather than on divergent thinking ability [41]. On the other hand, Plucker [59] presented more positive results concerning the predictive validity of divergent thinking tests. He used multiple regression analysis to reanalyze the Torrance data, examined its predictive power and provided support for the tests' usefulness. Weisberg and Baer make other criticisms, including of the design of the study, and interested readers should refer to these sources (see [41,58]).
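The concern that intelligence may account for part of the link between divergent thinking scores and later achievement can be illustrated with a first-order partial correlation, sketched below in Python. This is not the analysis used in any of the cited studies; the three input correlations are invented for illustration and are not taken from the Torrance data, and the function name is hypothetical.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z.

    r_xy -- correlation between DT score (x) and creative achievement (y)
    r_xz -- correlation between DT score (x) and intelligence (z)
    r_yz -- correlation between creative achievement (y) and intelligence (z)
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented values, chosen only to show the direction of the effect.
r_dt_achievement = 0.40
r_dt_iq = 0.50
r_achievement_iq = 0.60

r_partial = partial_corr(r_dt_achievement, r_dt_iq, r_achievement_iq)
print(round(r_partial, 3))
```

With these made-up inputs, the association between divergent thinking and achievement shrinks noticeably once intelligence is held constant, which is the pattern the critics have in mind; whether real data behave this way is an empirical question addressed in the cited sources.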
Mednick's Remote Associates Test enjoys mixed support in terms of reliability and validity too. Although the RAT has been shown to be reliable [60], the validity of the test is problematic [61]. It is important to note that the criterion and predictive validity of the RAT, TCT-DP and EPoC have been subject to less investigation than those of divergent thinking tests such as the SOI or TTCT. The TCT-DP has been normed for different age groups in several countries, such as Germany, Korea, Poland and Australia. Reliability studies have shown fair to very good scores in terms of parallel test, scoring and differential reliability [42,43]. Urban stated that the question of validity is hard to answer for the TCT-DP, as there are no instruments directly comparable to it [42]. The developers therefore examined correlations with intelligence and with verbally oriented divergent thinking tests, expecting low or slightly positive correlations as evidence of the instrument's validity, and obtained supportive findings [42]. As a modern creativity assessment instrument, EPoC was initially developed and validated in France with a French sample. Internal validity was acceptable, and for external validity researchers reached satisfactory results by showing that EPoC scores are independent of intelligence scores, moderately correlated with personality-relevant dimensions such as openness to experience and highly correlated with classic divergent thinking tests [13,44]. Although EPoC shows promising validity results, extensive research is needed to support its criterion and predictive validity.
The extensive discussion regarding the reliability and validity of creativity assessment is mostly based on divergent thinking tasks and tests. One major problem concerns the scoring systems: several studies have shown that fluency can act as a contaminating factor on originality scores [62]. To resolve the fluency problem, a new calculation named the Creativity Quotient (CQ) was proposed [63]. The CQ formula rewards response pools that are highly fluent and flexible at the same time. The discussion on fluency scoring is ongoing, and some researchers argue that fluency is a more complex construct than was originally thought.
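The chapter does not state the CQ formula. The Python sketch below assumes a log-based form in which each category contributes log2(1 + n), where n is the number of responses falling in that category; this form should be checked against Ref. [63] before use. Under that assumption, the sketch shows how spreading the same number of responses over more categories raises the score. The function name and the example counts are hypothetical.

```python
from math import log2


def creativity_quotient(category_counts):
    """Assumed CQ: sum over categories of log2(1 + responses in the category).

    category_counts -- number of responses the examinee gave in each category
    """
    return sum(log2(1 + n) for n in category_counts)


# Two invented response pools with the same fluency (8 responses each).
narrow_pool = [8]           # all eight responses in a single category
spread_pool = [2, 2, 2, 2]  # eight responses spread over four categories

cq_narrow = creativity_quotient(narrow_pool)
cq_spread = creativity_quotient(spread_pool)
print(round(cq_narrow, 2), round(cq_spread, 2))
```

Because the logarithm grows slowly, each additional response within an already used category adds less than a response that opens a new category, so fluency alone cannot inflate the score.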
The debate on the predictive validity of divergent thinking tests is still ongoing. There seem to be two camps of researchers, one supporting the predictive power of DT [59,64] and the other opposing it [41,58]. In an extensive review, Kaufman and his colleagues [24] summarized the methodological issues in studies of the predictive validity of DT tests and pointed out that scores may be susceptible to intervention effects; administration procedures can affect originality and fluency scores; statistical procedures may be inadequate; score distributions often violate the statistical assumption of normality; and creative achievement in adulthood may be domain specific, whereas the DT tests used are almost always domain general. Runco [65], with all these criticisms in mind, advocated for DT tests:
Theorists who dismiss divergent thinking as entirely unimportant have ignored recent empirical research. . . . Additionally, some critics seem to expect too much from divergent thinking. Again, divergent thinking is not synonymous with creativity. Divergent thinking tests are, however, very useful estimates of the potential for creative thought. Although a high score on a divergent thinking test does not guarantee outstanding performance in the natural environment, these tests do lead to useful predictions about who is capable of such performances. . . . Divergent thinking is a predictor of original thought, not a criterion of creative ability. (p. 16)
In the 1960s and 1970s, creativity assessment was pretty much equal to DT tests; however, after many years and hundreds of studies, the field should embrace a wider perspective. We now have more complex systems theories of creativity, and it would be more fruitful for the field if upcoming research focused more on developing and testing contemporary instruments.

Assessing the creative person
Autonomous, self-confident, open to new experiences, independent and original are some of the character traits that creative persons possess, and the assessment of the creative person deals with these traits. Measures that focus on the characteristics of the creative person are self-reports or external ratings of past behavior or personality traits, and they have been reviewed extensively in the literature [66]. Creative personality traits are diverse and can be perceived as both positive and negative; they include perseverance, tolerance for ambiguity, risk taking, psychoticism, dominance and non-conformity. One of the leading theories of personality is the five-factor theory. The five factors are neuroticism, extraversion, openness to experience, conscientiousness and agreeableness. Openness to experience is highly associated with creativity measures such as self-reports [67], verbal creativity [68] and psychometric tests [69].
Researchers study the common personality characteristics and past behaviors of people who are accepted as creative and develop instruments to measure the personality correlates of creative behavior. There exist numerous personality scales and attitude checklists, such as the Khatena-Torrance Creative Perception Inventory, the Group Inventory for Finding Creative Talent, the Creative Achievement Questionnaire and the Runco Ideational Behavior Scale.
The Khatena-Torrance Creative Perception Inventory: This inventory consists of two self-rating scales called What Kind of Person Are You? (WKOPAY) and Something About Myself (SAM). It is designed to identify creative people aged 10 years or older [70]. Each scale contains 50 forced-choice items; test takers are asked, for example, whether they have courage for what they believe, or to select true or false for sentences such as "I have made a new dance or song". The reliability evidence for the inventory is satisfactory and the validity evidence is moderate.
Group Inventory for Finding Creative Talent (GIFT): GIFT is a self-report instrument for students in grades 1-6 that assesses their creative potential [71]. Students give yes/no answers to a series of statements that aim to assess flexibility, curiosity, perseverance or hobbies, such as "I like to take things apart to see how they work". Later, in 1982, Davis and Rimm developed a new personality scale called the Group Inventories for Finding Interests (I and II), known as GIFFI. These instruments were designed for junior and senior high school students and are very similar to GIFT [72]. The reliability and validity evidence for GIFT and GIFFI is moderate, and researchers have stressed that additional data are needed to support their psychometric properties.
The NEO Personality Inventory and NEO Five-Factor Inventory: Costa and McCrae's [73,74] inventories are among the most popular measures based on the five-factor theory of personality. For the openness to experience factor, they used down to earth-imaginative, uncreative-creative, conventional-original and prefer routine-prefer variety as adjective definers, and fantasy, esthetics, feelings, actions, ideas and values as scale definers [73]. Items of this type have been used in numerous studies. Most of these studies did not find personality differences among cultures, although some showed that European-American cultures tended to be more open to experience than Asian-African cultures (for a detailed discussion see [24]).
Creative Achievement Questionnaire (CAQ): Self-reports of activities and attainments can be used to measure creativity. The CAQ was developed by the researchers in Ref. [75] and assesses achievement across 10 domains of creativity. It is a self-report checklist consisting of 96 items that load onto an Arts factor (Drama, Writing, Humor, Music, Visual Arts and Dance) and a Science factor (Invention, Science and Culinary). Respondents indicate the extent to which the phrases in the items describe them. For example, within the Scientific Discovery scale, items range from "I do not have training or recognized ability in this field" to "I have won a prize at a science fair or other local competition", to "My work has been cited by other scientists in national publications." The CAQ possesses high levels of evidence of reliability and acceptable evidence of validity [75] and has been used in several studies (see [76,77]).

Runco Ideational Behavior Scale (RIBS): In everyday life, generating creative ideas is a sign of creative performance, and the purpose of the RIBS is to measure this idea generation. Ideation involves generating ideas and attributing value to them; thus, it can be an adequate criterion of creativity. Runco and his colleagues developed a set of 100 items and reduced it to 23 to measure ideational behavior [78]. Sample items include "I am able to think about things intensely for many hours" and "I often find that one of my ideas has led me to other ideas that have led me to other ideas, and I end up with an idea and do not know where it came from". The psychometric integrity of the RIBS in terms of reliability and validity has been shown to be adequate [78], and the RIBS has been used in several studies and adapted to other languages as well (see [79,80]).
The "person" perspective, or conative factors, in creativity assessment mainly assumes that significant personal characteristics and existing creative behavior are the best predictors of future creative behavior. Feist, an influential personality researcher, for example investigated the personality characteristics of scientists versus nonscientists, more creative versus less creative scientists, and artists versus nonartists. In general, he showed that creative people are more open to new experiences, less conventional and less conscientious, and more self-confident, self-accepting, ambitious, dominant, hostile and impulsive [81,82].
In sum, self-reported creativity has attracted considerable attention in the field because it is fast and easy to score. However, researchers willing to use these instruments should take into account the validity issues and the possibility that respondents may not be telling the truth. Self-assessments of all kinds generally correlate with each other, but the data on their correlation with performance assessments are contradictory [83-85]. Thus, as stated in Ref. [24], "although self-assessments have a function and purpose, they are not useful in any type of high-stakes assessment".

Assessing the creative product
Think about the Nobel, Oscar or Grammy awards: how are the winners chosen? For example, does the Nobel committee require the nominees to take the TTCT or fill in creativity questionnaires, or would a taxi driver's opinion count as an expert opinion in determining the nominees for chemistry? As explained in the theories of Csikszentmihalyi and Amabile, for any idea or product to be seen as creative, it should be valued by others or by recognized experts in that field [86,87]. Measuring the creativity of a product may be the most important aspect of creativity assessment, yet it has not received as much attention as process or personality variables. Some researchers even consider product assessment probably the most appropriate assessment of creativity and refer to it as the "gold standard" [88]. Researchers have developed several instruments to evaluate creative products, such as the Creative Product Semantic Scale and the Student Product Assessment Form. These instruments ask educators to rate specific features of students' products. Above all, however, the Consensual Assessment Technique is the most popular way of assessing products. A brief explanation of each is provided below.
Creative Product Semantic Scale (CPSS): The CPSS is based on a theoretical model that conceptualizes three dimensions of product attributes: novelty (the product is original, surprising and germinal), resolution (the product is valuable, logical, useful, understandable) and elaboration and synthesis (the product is organic, elegant, complex and well-crafted) [89]. The instrument relies on the idea that untrained judges can evaluate the creativity of a product by using a validated and reliable instrument [90]. The CPSS is scored on a 7-point Likert-type scale, ranging from 1 to 7 between bipolar adjectives such as old-new. The CPSS has been shown to have adequate reliability values.
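As a rough illustration of how ratings on such a scale could be summarized, the sketch below averages one judge's 1 to 7 ratings within each of the three dimensions. It is only a sketch under stated assumptions: the item labels simply reuse the facet descriptors listed above and are not the published CPSS item set, the ratings are invented, and the function name `score_product` is hypothetical. The actual instrument's items and scoring rules should be taken from [89,90].

```python
from statistics import mean

# Hypothetical item-to-dimension mapping. The labels reuse the facet
# descriptors quoted in the text; they are NOT the published CPSS items.
DIMENSIONS = {
    "novelty": ["original", "surprising", "germinal"],
    "resolution": ["valuable", "logical", "useful", "understandable"],
    "elaboration_and_synthesis": ["organic", "elegant", "complex", "well_crafted"],
}


def score_product(ratings):
    """Return the mean 1-7 rating per dimension for one judge's ratings."""
    for item, value in ratings.items():
        if not 1 <= value <= 7:
            raise ValueError(f"rating for {item!r} must be between 1 and 7")
    return {
        dim: mean(ratings[item] for item in items)
        for dim, items in DIMENSIONS.items()
    }


# Invented ratings from one judge for one product (illustration only).
example = {
    "original": 6, "surprising": 5, "germinal": 4,
    "valuable": 5, "logical": 6, "useful": 7, "understandable": 6,
    "organic": 3, "elegant": 4, "complex": 5, "well_crafted": 4,
}
scores = score_product(example)
print(scores)
```

Averaging within dimensions keeps the three attributes separate, so a product can be rated as highly novel yet weak on resolution, which a single total score would hide.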
Student Product Assessment Form (SPAF): The SPAF was developed by Renzulli and Reis [91] to assess the various types of products developed by students in enrichment programs. The SPAF is designed for use with gifted learners and provides ratings of nine creative product traits (e.g. problem focusing, appropriateness of resources, originality, action orientation, audience) [92]. Like the CPSS, the SPAF has evidence of reliability, although validity issues remain to be addressed.
Consensual Assessment Technique (CAT): Researchers need external criteria in creativity research to obtain evidence of validity, but an absolute criterion of creativity is not readily available (the criterion problem) [24]. In the CAT, the creativity of a product is judged by experts in that field. These experts can range from a group of mathematics professors to a group of kindergarten teachers, depending on the product at hand. The CAT was formulated by Amabile [87,93] and has since been applied extensively in creativity research. When using the CAT, participants are asked to produce something (an actual product such as a haiku, collage or poem), and experts rate the creativity of these products according to their own perception of a creative product. The CAT procedure resembles the way creativity is judged in the real world; it does not provide standard scores, so only comparative scoring is possible.
The CAT has been shown to be reliable in several studies [58,85,88,93,94], with inter-rater reliabilities ranging from .70 to .90. The average number of judges involved in the CAT studies run by Amabile [93] was just over ten. Using between 5 and 10 expert judges is recommended: fewer than 5 experts may result in low inter-rater reliability, and using more than 10 (although desirable) can be expensive and difficult. Although the CAT steadily shows high reliability in various studies, using experts in creativity assessment is not without controversy. For example, Amabile states that determining the necessary level of expertise for judges is important, and it is recommended that the experts have formal training and experience in the target domain. Furthermore, researchers have reported mixed results about expert and novice ratings. For example, Kaufman and his colleagues showed low correlations between novice and expert raters [95], whereas another study reported higher correlations [96]. In more recent work, researchers approached the expertise problem from a different perspective and argued that expertise should be understood as a continuum [88]. The CAT also possesses strong face validity, yet face validity (an instrument's appearance of measuring what it is intended to measure) is not sufficient. For example, experts can agree that a product is not creative and still be wrong (e.g. van Gogh was not valued as a creative artist by the experts of his time). The discussion of predictive validity is even more complicated. It has been shown that CAT scores do predict later CAT scores, meaning that they are stable across time in the same domain. However, does this mean that CAT scores can predict later creative achievement? Historiometric research data support this argument; for example, an analysis of Mozart's early musical pieces predicted his later creative achievement [97].
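To make the reliability figures and the advice on the number of judges more concrete, the following sketch computes an inter-rater reliability coefficient for a small table of judge ratings and projects how reliability changes with the number of judges. It rests on assumptions that are not taken from the chapter: coefficient alpha with judges treated as items is used as one common choice of statistic, the Spearman-Brown formula (which assumes interchangeable judges) is used for the projection, and both the ratings and the single-judge reliability of .30 are invented for illustration.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Coefficient alpha for a products-by-judges table of ratings.

    `ratings` is a list of rows: one row per product, one column per judge.
    """
    n_judges = len(ratings[0])
    if n_judges < 2:
        raise ValueError("need at least two judges")
    # Variance of each judge's ratings across products.
    judge_vars = [pvariance([row[j] for row in ratings]) for j in range(n_judges)]
    # Variance of the summed ratings each product received.
    total_var = pvariance([sum(row) for row in ratings])
    return n_judges / (n_judges - 1) * (1 - sum(judge_vars) / total_var)


def spearman_brown(single_judge_reliability, n_judges):
    """Projected reliability of the mean rating of `n_judges` judges."""
    r = single_judge_reliability
    return n_judges * r / (1 + (n_judges - 1) * r)


# Invented creativity ratings: 5 products (rows) rated by 3 judges (columns).
ratings = [
    [2, 3, 2],
    [4, 4, 5],
    [6, 5, 6],
    [3, 2, 3],
    [5, 6, 5],
]
alpha = cronbach_alpha(ratings)
print(f"alpha for 3 judges: {alpha:.2f}")

# Hypothetical single-judge reliability of .30, projected to larger panels.
for k in (3, 5, 10):
    print(k, "judges ->", round(spearman_brown(0.30, k), 2))
```

Under these assumptions the projection rises quickly up to about 10 judges and more slowly afterwards, which is consistent with the recommendation above that panels larger than 10 are desirable but may not justify their cost.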

Assessing the creative press
Various environmental factors contribute to creative potential and have deep effects on it. Parental practices, trauma, birth order, culture, teaching practices and group interactions may all affect creativity. Following the previous example of Mozart, we know that he was born in Salzburg to a musical family (his father was a music teacher, composer, conductor and violinist). Imagine what would have happened to the same Mozart if he had been born in a small village in the Alps as the son of a shepherd: would he have been able to develop as a musical prodigy? Although creativity is highly related to cognitive factors, it is impossible to disregard the impact of the environment.
As environmental factors are identified as important contributors to creative potential, studies that aim to determine the presence or absence of these factors in an individual's environment become especially important. There are instruments for assessing the classroom and learning environment, such as the Classroom Activities Questionnaire (CAQ) (cited in [13]). However, the majority of the instruments for assessing environmental effects on creativity concern organizational structures, such as KEYS: Assessing the Climate for Creativity [98]. The Classroom Activities Questionnaire has not been widely applied in research studies and therefore lacks psychometric data. KEYS, on the other hand, which was designed to "assess individuals perceptions and influence of those perceptions on the creativity of their work" ([98], p. 1157), possesses evidence of reliability and validity and is widely applied in the organizational creativity field.

Conclusion
Creativity has various definitions and theories, and it is therefore understood and assessed in many ways. Enhancing students' creative thinking skills has become one of the major goals of education. Unfortunately, Kim's comprehensive research on the TTCT is disquieting. The normative data of the TTCT from 1974, 1984, 1990, 1998 and 2008 (272,599 participants) were re-analyzed, and it was found that creative thinking scores either remained static or decreased, starting at the sixth grade [99]. There can be many reasons behind this failure. The inability to embed creativity in classroom practices can be one reason, whereas shortcomings in the development and implementation of up-to-date creativity assessment can be another. The field should move toward using comprehensive theories as the basis of assessment, renew the norms of existing creativity tests such as the TTCT and pay more attention to validity studies of creativity assessment instruments.
This chapter presented a brief overview of existing tools for creativity assessment. To move toward a "perfect" measure, researchers should take the strengths and weaknesses of these approaches and instruments into account (a brief overview is provided in Table 5).
Furthermore, Sternberg's [100] argument that the evaluation of creativity is always local has to be kept in mind. Judging any thought or product is relative to some set of norms, and this perspective raises questions for tests like the TTCT or Unusual Uses, because these tests assume that some sort of universal creativity exists and that they measure it. Sternberg believes that creativity should be assessed locally because it has culture-dependent elements, just like intelligence, and he suggests that "we should agree that our evaluations of what usually is viewed as constituting creativity -novel, surprising, and compelling ideas or products -represent local norms" ([100], p. 399).

Table 5. Columns: Type of Assessment; Examples; Advantages; Disadvantages. First row: Process based assessment (e.g. divergent thinking tests).