The Multi-Tier Instrument in the Area of Chemistry and Science

Knowledge of students’ unscientific understanding before they learn a new topic, known as students’ preconception or prior knowledge, is vital for helping the teacher design a proper teaching strategy. Meanwhile, knowledge of students’ understanding after teaching gives the teacher a way to evaluate the effectiveness of his/her teaching. For these reasons, science educators should investigate students’ understanding over time. Studying students’ understanding requires a proper and powerful instrument, such as a multi-tier instrument. This paper describes the history of multi-tier instruments, which began with the two-tier instrument and has recently extended to a five-tier instrument, the procedure for developing such an instrument, and how to use it to identify students’ unscientific understanding. Our recent study describing the development of a four-tier instrument of electrolyte and non-electrolyte solution (FTI-ENES) is also presented.


Introduction
Students' in-depth understanding, mainly their unscientific knowledge, has been investigated for decades. Teachers' knowledge of students' understanding, including their prior knowledge or preconceptions and their understanding after teaching, is valuable. Knowledge of students' preconceptions is essential in helping educators provide effective teaching and learning. Many studies have demonstrated the contribution of students' prior knowledge to teaching success [1,2]. Several instruments have been used to uncover students' conceptions in science, including concept mapping [3], interviews [4], and multiple-choice tests [5,6]. A proper and effective instrument must be used to investigate students' understanding. A typical instrument such as a multiple-choice question (MCQ) cannot uncover deep understanding in science [7], particularly students' unscientific understanding or misconceptions. These earlier instruments have known disadvantages. Concept mapping relies on students' mastery of vocabulary [8], while interviews are time-consuming [9]. For multiple-choice questions, students' test-wiseness skills [10] can affect the reliability and validity indices, and the reasons for students' answers cannot be fully uncovered [11]. Guessing also often plays a dominant role in multiple-choice questions [12].
Insights Into Global Engineering Education After the Birth of Industry 5.0
Because of these disadvantages, diagnostic tools in the multi-tier format have recently become among the most frequently applied instruments in science education studies. Our previous study [13] examined the instruments used in studies of students' understanding of chemistry and other science disciplines (biology and physics) published in Indonesian journals. We found that multi-tier instruments, particularly four-tier instruments, are the most accepted and widely applied by Indonesian researchers for identifying students' unscientific understanding.
In this paper, several terms are used, including students' conception, students' understanding, students' scientific understanding, students' scientific knowledge, students' unscientific understanding, and misconceptions. Students' conception reflects students' ideas and mental processes regarding natural phenomena. These ideas may be relevant or irrelevant to the concept accepted by the scientific community [14]. For this reason, the terms students' conception and students' understanding are used interchangeably in this paper. Ideas that adhere to the concept accepted by the scientific community are called scientific knowledge. In contrast, those that differ from the view held by the scientific community are called unscientific understanding.
The incorrect ideas harbored by a person have been described with several different terms in the scientific literature, including wrong knowledge, misconception, erroneous ideas, unscientific understanding, alternative conception, misunderstanding, erroneous concepts, naïve idea, alternative frameworks, naïve concept, misinterpretation, and oversimplifications. Although these terms are interchangeable, the term "unscientific understanding" is preferred in this paper because it reflects the nature of students' incorrect ideas or concepts.

Two-tier instrument: The milestone of multi-tier instruments
The use of multi-tier instruments in science education was initiated by Treagust [15], particularly for investigating students' unscientific understanding. An example of the two-tier instrument from this initial development is provided in Figure 1.
The first tier of the initial format shown in Figure 1 consists of a multiple-choice question (MCQ) with only two options (one correct answer and one incorrect answer). This two-option MCQ format is uncommon in science assessment, where at least four options are usual. The second tier consists of four statements covering the reasons for students' answers to the first tier. The four reasons consist of one valid or scientific reason and three wrong or unscientific reasons. The combination of an incorrect answer and an incorrect reason is the basis for revealing students' unscientific understanding or misconception. All incorrect reasons in the reason tier are composed from students' actual unscientific understanding obtained from preliminary tests, interviews, and the literature. The next generation of the two-tier instrument employs a more standard MCQ in the first tier, as depicted in Figure 2.

Three-tier instrument
After the two-tier instrument had been applied in many studies, science education researchers realized that it has deficiencies. On certain occasions, students selected the correct answer and the correct reason by chance without holding a scientific understanding of the relevant concept. Guessing and actual unscientific understanding are difficult to distinguish in a two-tier instrument [21,22].
To overcome the two-tier instrument's drawback, a three-tier instrument was developed with an additional confidence rating tier, as shown in Figure 3. The third tier requires students to state whether they are sure or unsure of their answer and reason. A correct answer and reason with a sure expression imply a scientific understanding. Meanwhile, an incorrect answer and reason with a sure expression imply an unscientific understanding or misconception. An incorrect answer and reason with an unsure expression imply that the incorrect answer is not a result of misconception or unscientific understanding; instead, it results from a lack of knowledge or from guessing. This aspect distinguishes the three-tier format from the previous format. The same three-tier pattern portrayed in Figure 3 has been used in subsequent studies [11,24,25].

Figure 1.
Example of the two-tier instrument developed by Treagust [15].

Figure 2.
Example of the next generation of the two-tier instrument developed by Chandrasegaran et al. [9].

The subsequent development of a three-tier instrument utilized a more flexible confidence rating with a broader range of confidence, as displayed in Figure 4. This pattern seems to have been influenced by the standard confidence rating scales applied in many four-tier instruments that had been published before this three-tier work was carried out.
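The interpretation rules for the sure/unsure three-tier format described in this section can be summarized as a small decision function. The following Python sketch is illustrative only: the function name and the label strings are assumptions, and the two cases the text does not spell out (a correct response given without confidence, and a mismatch between answer and reason) are returned with explicitly assumed labels.

```python
def interpret_three_tier(answer_correct: bool, reason_correct: bool, sure: bool) -> str:
    """Interpret one three-tier response (answer, reason, sure/unsure)."""
    if answer_correct and reason_correct:
        # The text defines only the 'sure' case; the 'unsure' label is an assumption.
        return "scientific understanding" if sure else "correct but unsure (assumed label)"
    if not answer_correct and not reason_correct:
        # Sure -> misconception; unsure -> lack of knowledge or guessing.
        return "unscientific understanding" if sure else "lack of knowledge or guessing"
    # Mixed answer/reason combinations are not covered by the three-tier rules above.
    return "mixed answer/reason (assumed label)"


# Hypothetical response: wrong answer, wrong reason, student is sure.
print(interpret_three_tier(False, False, True))
```

Keeping the unspecified cases as separate labels, rather than folding them into one of the three defined categories, avoids overstating what the three-tier rules actually say.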

Four-tier instruments
The confidence rating index (CRI), which is attached only to the third tier of the three-tier instrument, leaves it unclear whether students have the same or different confidence levels for their answer and their reason [23]. For this reason, many science education researchers developed and applied the four-tier instrument. The first tier, called the Answer-tier (A-tier), consists of an MCQ with several options (commonly four). The second tier is the confidence rating for the A-tier. The third tier, called the Reason-tier (R-tier), consists of several statements with one correct statement relevant to the selected answer and several unscientific statements. The fourth tier is the confidence rating for the R-tier.
The confidence rating index (CRI) for the A-tier and R-tier ranged from 1 (just guessing) to 6 (absolutely confident). This wider range was then adopted in some studies that use three-tier instruments, as shown in Figure 4. In our recent works [7], we prefer a five-point confidence rating scale to a six-point scale (Figure 5).
Using a five-point CRI scale makes it easier to differentiate students' levels of confidence. For example, the difference between 'confident' (4), 'very confident' (5), and 'absolutely confident' (6) in a six-point CRI format is difficult to recognize. In contrast, 'quite confident' (4) and 'very confident' (5) in the five-point format are easier to understand. When a student is 100% sure of his/her answer, he/she will state very confident. Meanwhile, when he/she is not 100% sure of his/her answer, he/she will state quite confident. 'Average' (3), which is not available in the six-point format, expresses an equal portion of sure and unsure. 'Very unconfident' (1) expresses 100% unsure, including guessing or having absolutely no knowledge of the concept, while 'not very confident' (2) expresses an unsure response with a small feeling that the answer may be correct. For this reason, we suggest using a five-point CRI scale instead of a six-point scale (Figure 6).
The most recent development in multi-tier instruments is the five-tier instrument published by Anam et al. [28], with an additional fifth tier in which students are required to provide a drawing or pictorial representation of their answer. This additional drawing ensures that the students' mental model can be uncovered. Even though work on five-tier instruments is still limited, we believe that it offers a more powerful tool in this regard. The pictorial tool is supported by cognitive psychology theory as an aid that helps students solve multistep tasks [29].

Example of four-tier instrument with six confidence ratings [27].

The procedure in developing a multi-tier instrument
The procedure proposed by Treagust [15] for developing the two-tier instrument is the foundation for developing the later generations of multi-tier instruments, including three-tier and four-tier instruments. Treagust [15] employed ten steps in three broad categories. The first four steps concern defining the content. Steps 5, 6, and 7 concern obtaining information about students' misconceptions. The last three steps concern developing a diagnostic test.
When we developed a four-tier instrument in the area of chemical kinetics named FTDICK [7], we simplified the procedure to the following six steps. This procedure is applicable to developing multi-tier instruments.

Step 1: Mapping concept
In this step, the essential concepts in a particular topic are identified with reference to the scope of each concept in the relevant curriculum. For example, when we developed a four-tier instrument to identify secondary school students' understanding of thermochemistry, we considered the competence mastery indicator document (Indikator Pencapaian Kompetensi, IPK) in the Indonesian secondary school chemistry syllabus. System and surroundings, enthalpy, exothermic reaction, and endothermic reaction are essential concepts in the Indonesian curriculum. When we developed a four-tier instrument of chemical kinetics for first-year chemistry students, we considered the university chemistry curriculum. Rate law, the relation between reactant concentration and time, temperature and rate, activation energy, and reaction mechanisms are essential concepts for first-year university students.

Step 2: Developing the multiple-choice question with free responses (MCQ-FR)
Each essential concept should be represented by two or more questions to ensure that the questions reflect all the competence and knowledge that should be mastered for that concept. Figure 7 depicts an example of an MCQ-FR on chemical kinetics, particularly rate law and the relation between concentration and rate.

Step 3: Validating the MCQ-FR
Before the MCQ-FR is used to collect preliminary data, its content, relevance to the curriculum, and language clarity are assessed by experts in the field. Their feedback is the basis for revising the MCQ-FR.

Step 4: Testing and collecting students' unscientific understanding
The revised MCQ-FR is then used to collect preliminary data, namely students' unscientific understanding or illogical reasons. For example, in answering the question in Figure 7, some students believed that option D would give the highest rate because the concentrations of the two reactants (H₂ and I₂) are the same. These illogical reasons are then collected and used as the basis for developing the prototype multi-tier instrument.

Step 5: Developing the prototype multi-tier instrument
An unscientific understanding should be demonstrated by a significant number of students before it is used as a reason option. Students' responses to the MCQ-FR are also used to measure its quality in terms of validity, reliability, distractor effectiveness, discriminatory index, and difficulty level. The unscientific understanding described above is used as a reason option in the multi-tier instrument (Figure 6, Reason B).

Step 6: Validating the prototype and refining the final multi-tier instrument
The next step is to test the prototype multi-tier instrument with a group of students to measure its validity, reliability, distractor effectiveness, discriminatory index, and difficulty level (five parameters). This step is also called empirical validation. The formulae for calculating these parameters can be found in references on educational evaluation and measurement. The analysis of the five parameters' values is the basis for revising the prototype and producing the final multi-tier instrument, which can then be applied in the broader community.

Grading students' responses and how to determine students' unscientific understanding level

Treatment of data
Students' responses to the multi-tier questions yield four combinations of answers and reasons: Correct Answer and Correct Reason (CACR), representing good scientific understanding; Correct Answer and Wrong Reason (CAWR), representing a false positive; Wrong Answer and Correct Reason (WACR), representing a false negative; and Wrong Answer and Wrong Reason (WAWR), representing an actual unscientific understanding. The first three categories are not discussed widely in this paper. WAWR is the central aspect discussed here and the prime category used in interpreting students' unscientific understanding.
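The four answer-reason combinations can be assigned mechanically once each tier has been scored as correct or wrong. The sketch below is a minimal illustration, not part of the original procedure; the function names and the sample responses are hypothetical.

```python
from collections import Counter


def combination(answer_correct: bool, reason_correct: bool) -> str:
    """Return CACR, CAWR, WACR or WAWR for one student response."""
    return ("CA" if answer_correct else "WA") + ("CR" if reason_correct else "WR")


def tally(responses):
    """Count each combination over (answer_correct, reason_correct) pairs."""
    return Counter(combination(a, r) for a, r in responses)


# Hypothetical scored responses from five students to one question.
responses = [(True, True), (False, False), (True, False), (False, False), (False, True)]
counts = tally(responses)
# Share of students giving the WAWR combination for this question.
print(counts["WAWR"] / len(responses))
```

The WAWR share computed this way is only the starting point; the confidence ratings attached to those responses are still needed to judge the unscientific understanding.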

Parameters to classify students' unscientific understanding
Students' unscientific understanding is determined from students' WAWR combinations. Several parameters and terms have been used to determine the level of students' unscientific understanding based on the confidence ratings, or confidence rating index (CRI), attached to WAWR responses. Caleon & Subramaniam [21] employed a six-point confidence rating scale and classified unscientific understanding or misconception as follows. A genuine unscientific understanding is one expressed with a CRI ≥ 3.5, whereas a spurious unscientific understanding is one expressed with a CRI < 3.5. Genuine unscientific understanding is further categorized into moderate (expressed with a medium CRI between 3.5 and 4.0) and high (expressed with a CRI of 4.0 and above). Literature using this 1-6 scale takes 3.5, the midpoint between unconfident and confident, as the limit of a genuine misconception.
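The six-point classification attributed to Caleon & Subramaniam [21] can be written as a threshold function. In this sketch the function name is hypothetical, and averaging the individual ratings of students who share a WAWR response is an assumption about how a single CRI value is obtained; only the cut-offs 3.5 and 4.0 come from the text.

```python
from statistics import mean


def classify_six_point(cri: float) -> str:
    """Classify an unscientific understanding from its CRI on a 1-6 scale."""
    if cri < 3.5:
        return "spurious"
    if cri < 4.0:
        return "genuine (moderate)"
    return "genuine (high)"  # a CRI of 4.0 and above


# Hypothetical confidence ratings of four students sharing one WAWR response.
ratings = [4, 4, 3, 4]
print(classify_six_point(mean(ratings)))
```

A value such as 3.5 can only arise from averaging, since individual ratings are whole numbers.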
The use of a decimal limit (3.5) invites criticism because all points on the CRI scale are whole numbers, so the rationale for a decimal limit is questionable. For this reason, we suggest the following parameters for classifying students' unscientific understanding with a multi-tier instrument that employs a five-point CRI scale (Table 1).
An example of how to determine students' unscientific understanding is taken from our work in the area of thermochemistry, which is in press elsewhere. The question in Figure 8 was intended to investigate students' understanding of the system and surroundings, particularly the difference between open, closed, and isolated systems.
In answering the question in Figure 8, 34.43% of students demonstrated the unscientific understanding that the water drops on the bottle's outer wall come from the ice melting in the bottle. This unscientific understanding was demonstrated by students who gave the WAWR combination as well as those who gave the CAWR combination. The WAWR combination was Answer A with Reason B, while the CAWR combination was mostly Answer B with Reason B. To judge whether the unscientific understanding is genuine or spurious, the CRI must be taken into account. If the CRI of those who gave the WAWR and/or CAWR combinations is 4.0, the unscientific understanding can be declared genuine and falls in the moderate category. If the CRI of those who gave the WAWR and/or CAWR combinations is 3.0, the unscientific understanding can be declared spurious and is a result of a lack of knowledge rather than a misconception.

Figure 8.
Example of a four-tier instrument in thermochemistry [30].

Development of four-tier instrument in the topic of electrolyte and non-electrolyte solution (FTI-ENES): an empirical study
This section will present our current study in this area involving the development of a four-tier instrument in the topic of electrolyte and non-electrolyte solution. The instrument that was produced in this study is named the Four-Tier Instrument of Electrolyte and Non-Electrolyte Solution (FTI-ENES).

Method
This research employed the six-step procedure proposed by Habiddin & Page [7], as explained in Section 3 above. In the first step (mapping concept), it was found that differentiating electrolyte and non-electrolyte solutions based on their electrical conductivity is the essential concept for secondary schools in Indonesia. The essential concept covers three indicators of competence: (1) identifying the electrical conductivity of the solution of an ionic compound, (2) identifying the electrical conductivity of the solution of a covalent compound, and (3) identifying the electrical conductivity of the solution of a polar covalent compound.
Next, 22 MCQ-FR questions were constructed to measure students' unscientific understanding regarding the three indicators. An example of a question in the MCQ-FR is presented in Figure 9. Before being used for data collection, the questions were assessed by a chemistry lecturer and a school teacher in terms of the scope of chemistry content and language clarity. The suggestions and feedback obtained were the basis for revising the MCQ-FR.
In this study, the questions focused on the conceptual type and avoided the algorithmic type. The initial data collection involved five groups of students (153 in total) from two public secondary schools in Malang, East Java, Indonesia: two groups from SMA Negeri 3 Malang (Public Secondary School 3 in Malang) and three groups from SMA Negeri 8 Malang (Public Secondary School 8 in Malang). All of them had already taken the subject of electrolyte and non-electrolyte solutions.
Students' responses to the MCQ-FR of electrolyte and non-electrolyte solutions were categorized into scientific responses, unscientific responses, and random responses. The unscientific responses were the basis for producing the FTI-ENES with 13 questions, which subsequently underwent content validation. Next, the FTI-ENES was validated empirically with two groups of students (62 in total) from SMAN 2 Ponorogo (Public Secondary School 2 in Ponorogo), East Java, Indonesia. The parameters used in the empirical validation were reliability, validity, difficulty level, discriminatory index, and distractor effectiveness. Based on the values of these parameters, revisions were made to refine the FTI-ENES and produce its final version.

Revealing students' unscientific understanding in the topic of electrolyte and non-electrolyte solution
In the initial data collection, several instances of students' unscientific understanding were uncovered using the MCQ-FR. Examples include the beliefs that C₁₂H₂₂O₁₁(aq) is electrically conductive, is partially ionized in water, and contains hydrogen bonding. These unscientific understandings were then adopted in the reason tier of the FTI-ENES, as shown in Figure 10.

The empirical validity of the FTI-ENES
The quality of the FTI-ENES is primarily reflected in the values of two parameters, validity and reliability. These two parameters are the most valuable aspects in assessing the quality of a question [31]. The other three parameters, namely difficulty level, discriminatory index, and distractor effectiveness, are also essential, particularly for formative and summative tests.

Validity
All the questions of the FTI-ENES instrument are valid with high validity indices. The average validity indices for the A-tier, R-tier, and B-tier are 0.46, 0.45, and 0.53, respectively. These values confirm that the FTI-ENES is a powerful tool for identifying students' unscientific understanding in the area of electrolyte and non-electrolyte solutions. The detailed values for each question and each tier are provided in Table 2.

Reliability
The reliability index of the FTI-ENES was measured using Cronbach's alpha. The reliability indices for the A-tier, R-tier, and B-tier are 0.69, 0.66, and 0.78, respectively. These values indicate that the instrument will produce consistent results when it is employed over time.
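For readers unfamiliar with the statistic, Cronbach's alpha is commonly computed as α = k/(k−1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below applies this standard formula to a small hypothetical score matrix; the function name and the data are assumptions and do not reproduce the FTI-ENES data or the software used in the study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per student, one column per item."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical dichotomous scores (1 = correct, 0 = wrong): 4 students, 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Population variance is used for both the item and total-score variances; using sample variance for both would give the same ratio.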

Difficulty level
The difficulty level index (P) ranges from 0 to 1 and represents the proportion of students answering the question correctly. The higher the P value, the larger the share of students answering the question correctly, and vice versa. Table 3 shows that most questions fall in the "moderate" category of difficulty. On average, the P values for the A-tier, R-tier, and B-tier are 0.58, 0.53, and 0.42, respectively, and fall in the "moderate" category. These values imply that the level of the questions is appropriate for secondary school students.
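As a rough illustration, P can be computed as the fraction of correct responses to a question. In the sketch below, the function names and the sample data are hypothetical, and the category cut-offs of 0.30 and 0.70 are a commonly used convention assumed here, not values stated in this paper.

```python
def difficulty_index(item_scores):
    """Proportion of students answering one question correctly (1 = correct)."""
    return sum(item_scores) / len(item_scores)


def difficulty_category(p, low=0.30, high=0.70):
    """Label a P value; the cut-offs are assumed conventions, not from the paper."""
    if p < low:
        return "difficult"
    if p > high:
        return "easy"
    return "moderate"


# Hypothetical scores of five students on one question.
p = difficulty_index([1, 1, 0, 0, 1])
print(p, difficulty_category(p))
```

Under these assumed cut-offs, the average values reported above (0.58, 0.53, and 0.42) would all be labelled moderate, which agrees with the categorization in the text.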

Discriminatory index
The discriminatory index (D) compares the number of high-achieving students who answer a question correctly with the number of low-achieving students who do so. The higher the D value, the larger the share of correct answers that comes from high-achieving students, and vice versa (Table 4).
On average, the D values for the A-tier, R-tier, and B-tier are 0.53, 0.52, and 0.62, respectively, and fall in the "moderate" category. These values imply that the instrument can differentiate between students with high achievement and those with low achievement.
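One common way to compute D, sketched below, is the upper-lower group method: students are ranked by total score, and the proportion answering the question correctly in the lowest-scoring group is subtracted from that in the highest-scoring group. The paper does not state which formula or group size was used, so the 27% group fraction, the function name, and the data are all assumptions.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one question.

    item_scores: 1/0 score of each student on the question.
    total_scores: each student's total test score, in the same order.
    fraction: share of students in each extreme group (assumed convention).
    """
    n = len(total_scores)
    group = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending by total
    low, high = order[:group], order[-group:]
    p_high = sum(item_scores[i] for i in high) / group
    p_low = sum(item_scores[i] for i in low) / group
    return p_high - p_low


# Hypothetical data: ten students; only the five highest scorers answer correctly.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
d = discrimination_index(item, totals)
print(d)
```

A question that every student answers correctly yields D = 0 under this method, since it does not separate the two groups.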

Distractor effectiveness
The distractor effectiveness parameter indicates whether each wrong option in the A and R tiers is functional. An option is considered functional when it is chosen by at least one student [32]. Table 5 demonstrates that all the options are functional, implying the homogeneity of the options.

Table 4.
Discriminatory indices of questions of the FTI-ENES.
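The criterion for a functional distractor stated in this section (chosen by at least one student) is simple to check programmatically. The sketch below lists the wrong options that no student selected; the function name, option labels, and responses are hypothetical.

```python
from collections import Counter


def nonfunctional_distractors(choices, options, key):
    """Return the wrong options that were chosen by no student.

    choices: option selected by each student for one tier of one question.
    options: all options offered in that tier.
    key: the correct option.
    """
    counts = Counter(choices)
    return [opt for opt in options if opt != key and counts[opt] == 0]


# Hypothetical A-tier choices of eight students; 'A' is the correct answer.
choices = ["A", "B", "B", "C", "A", "D", "A", "B"]
print(nonfunctional_distractors(choices, ["A", "B", "C", "D"], key="A"))
```

An empty list means every distractor is functional under this criterion.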

Conclusions
The two-tier instrument initially developed by Treagust [15] is the pioneer of multi-tier instruments. The next generations of multi-tier instruments, including three-tier, four-tier, and five-tier instruments, respond to the drawback of the two-tier instrument, namely its inability to distinguish actual unscientific understanding from guessing. We also believe that an additional drawing tier, as shown in the work of Anam et al. [28], is a rational addition for future assessment purposes. By adapting the procedure of two-tier development, we suggest a more straightforward procedure for developing a multi-tier instrument: mapping concept, developing the multiple-choice question with free responses (MCQ-FR), validating the MCQ-FR, testing and collecting students' unscientific understanding, developing the prototype multi-tier instrument, and validating the prototype and refining the final multi-tier instrument. A wrong answer-wrong reason (WAWR) combination accompanied by a high confidence rating index (CRI) is the parameter for judging the level of students' unscientific understanding. In this paper, we suggest employing a five-point CRI scale instead of a six-point scale because it makes it clearer for students to express their level of confidence. We also suggest that using a CRI of 3 as the limit between genuine and spurious unscientific understanding will ensure a robust distinction between students' unscientific understanding and lack of knowledge.
The FTI-ENES instrument developed in this study consists of 13 questions covering the topic of electrolyte and non-electrolyte solutions. The instrument's validity and reliability show that it is suitable for identifying students' understanding of electrolyte and non-electrolyte solutions. Even though the scope of the concepts covered in this study is relevant for secondary school chemistry, the instrument may also be transferable to first-year university students, particularly to identify the basic chemistry knowledge they gained in secondary school. Other detailed examples of the application of this procedure in developing multi-tier instruments can be found in our previous works, including chemical kinetics [7], acid-base properties of salt solution [33,34], and thermochemistry [30].

Table 5.
Distractor effectiveness for each option of each question of the FTI-ENES.