Establishment of FTIR Database of Roselle Raw Material Originated From Western Coastline in Peninsular Malaysia

Herbs from different geographical regions may differ qualitatively and quantitatively; hence, it is crucial to determine the active components of herbs from different regions and build a reference database. This study focused on establishing a database for authenticating the raw material of roselle (Hibiscus sabdariffa) collected at seven selected locations along the western coastline of Peninsular Malaysia. Unknown samples were validated at the end of the study to verify the accuracy of the established database. The inter-material distance (IMD) is presented as the mean distance between the spheres created by the batches of data from the different locations. The spectra were grouped into separate folders by location and discriminated by the soft independent modelling of class analogy (SIMCA) algorithm. All materials from the seven farms achieved a 100% separation rate. The average IMD of the seven locations was 9.04. The FTIR technique established in this study can be used to distinguish the geographical origin of the selected H. sabdariffa farm samples.


Introduction
The genus Hibiscus (Malvaceae) is distributed in tropical and subtropical zones [1]. Hibiscus sabdariffa (L.) planted in Malaysia endures high humidity and warm climates. The main part of the plant with medicinal use is the edible red to pale yellow calyces or sepals, which contain anthocyanin [2]. The colour tone of the calyx depends on the planting location and the composition of the soil. Factors such as genotype, type and intensity of light, orchard temperature, crop load and agronomic practices, including agrochemical application, irrigation, pruning and fertilisation, all play a role in the growth and product quality of the roselle plant. Most roselle plantations in Malaysia are on Beach Ridges Interspersed with Swales (BRIS) soil [3]. This type of soil is generally not suitable for planting because of its high surface temperature and infiltration rate and its low organic matter, nutrient content and water retention. Naimah et al. [4] reported that 20% regulated deficit irrigation (80% irrigation) was required to enhance roselle yield and sustain plant growth on BRIS soil without adversely affecting calyx quality.
According to the 2016 statistics on roselle as an industrial crop [5], in which most of the listed states lie along the western coastline of Peninsular Malaysia, Johor had the largest planted area of roselle and the highest production, followed by Penang, Selangor, Perak and Kedah. Roselle can be grown commercially throughout the year in Malaysia. Several constraints limit roselle production, including climatic variability such as floods and drought in certain districts. The limited availability of suitable land is another factor.
Juhari et al. reported that discrepancies in the anthocyanin content of H. sabdariffa reflected differences in the geographic origin of the plants, which were selected randomly in the experiments, as the composition of anthocyanin depended on the geographic origin of the plants [10]. The anthocyanin content, however, reached 1.7-2.5% of the dry weight of the calyces in all the strains examined [11]. Both biomass production and anthocyanin biosynthesis also depend on nutritional factors, which include the type and concentration of the carbon source, the nitrogen source and the phosphate level [12].
Commercial H. sabdariffa products in various forms have been mushrooming in the market. The quality of these commercial products in terms of anthocyanin content is a major concern, since herbs from different geographical regions may differ qualitatively and quantitatively [13]. In addition, differences in processing, including the harvest period, the sample material used and the time of delivery, could affect the quality of roselle products. Hence, it is crucial to determine the active components of herbs from different regions qualitatively and to build a reference database. Many quality control technologies are now available. The commonly used chromatographic methods are high-performance liquid chromatography, gas chromatography-mass spectrometry and liquid chromatography-mass spectrometry. Fourier transform infrared (FTIR) spectroscopy is widely used as a newer technology for many purposes [14][15][16], such as the analysis of anthocyanin [17]. FTIR is rapid, less destructive and cost-saving. The information acquired can be used to develop a reference database of H. sabdariffa that provides basic information on the product for authentication, as the spectrum of a product can be rapidly matched to validate its geographical origin and to predict its anthocyanin content. This study therefore focused on establishing a database for the authentication of roselle raw materials collected from seven selected locations along the western coastline of Peninsular Malaysia.

Plant material
Only one variety of H. sabdariffa L. was obtained from seven different farms recognised by the State Agriculture Department along the western coastline of Peninsular Malaysia. The calyces of each individual plant were randomly collected (Table 1).

Sample processing
Each of the calyces collected was processed individually. After the seeds were removed, the calyces were washed and air-dried at room temperature. Once about 80% dryness was achieved, the calyces were further dried in an oven at 50°C for 3-4 days. The dried calyces were pulverised with a blender to the finest possible particle size for further use. The processing was repeated for all the calyces collected from the seven locations.

FTIR method
The measurements were carried out using a Fourier transform infrared (FTIR) spectrometer (Spectrum GX, Perkin-Elmer Ltd., England) equipped with a deuterated triglycine sulphate (DTGS) detector. Infrared spectra were recorded with 32 scans over the range 4000-400 cm⁻¹ at a resolution of 4 cm⁻¹ [18]. The dried calyces were ground with potassium bromide (KBr) powder at a ratio of 1:200 in the lowest-humidity environment available. The KBr and sample mixture was pressed at not more than 10 psi to form a thin disc, which was scanned to obtain the mid-infrared spectrum. Spectra that achieved more than 60% transmission were chosen for further use [19]. Three discs were produced from the calyces of each plant and scanned.
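The preparation ratio and the acceptance criterion described above can be written as a short sketch. The Python below is illustrative only and is not part of the instrument software: the function names are hypothetical, the spectra are synthetic, and reading "more than 60% transmission" as the highest transmittance in the scanned range exceeding 60% is an assumption.

```python
import numpy as np

def kbr_mass_mg(sample_mg, ratio=200.0):
    """KBr mass needed for a 1:ratio sample-to-KBr mixture (1:200 in this study)."""
    return sample_mg * ratio

def passes_transmission_check(transmittance_pct, threshold=60.0):
    """Return True if the highest transmittance in the spectrum exceeds the threshold.

    Assumption: the 60% criterion refers to the maximum (baseline) transmittance.
    """
    return float(np.max(transmittance_pct)) > threshold

# Synthetic spectra for illustration only (not measured data):
# a flat baseline with one absorption band near 1630 cm^-1.
wavenumbers = np.linspace(4000.0, 400.0, 901)
absorption_band = 30.0 * np.exp(-((wavenumbers - 1630.0) / 60.0) ** 2)
good_disc = 75.0 - absorption_band   # baseline at 75% transmittance
poor_disc = 55.0 - absorption_band   # baseline at 55% transmittance

accepted = [passes_transmission_check(s) for s in (good_disc, poor_disc)]
```

In this sketch only the first disc would be kept for further analysis; a disc that fails the check would be re-pressed and re-scanned.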

Assured ID for chemometric analysis
The Assured ID software (Assured ID Method Explorer 2015, PerkinElmer) was used for chemometric analysis. The SIMCA chemometric algorithm was chosen, with the wavenumber range of 1900-515 cm⁻¹ selected (Figure 1), instead of the "COMPARE" function of the software. Outlying spectra were excluded from the developed method (Figure 2) during troubleshooting under the Coomans skill (Figures 3 and 4).
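For readers unfamiliar with SIMCA, the general idea can be sketched as follows. This is a minimal textbook-style illustration in Python with NumPy, not the Assured ID implementation, whose internals are not described here. The class and function names, the number of principal components and the synthetic spectra are all assumptions made for the example. Each location gets its own principal component model, and a new spectrum is assigned to the location whose model leaves the smallest relative residual.

```python
import numpy as np

def select_window(wavenumbers, spectra, high=1900.0, low=515.0):
    """Keep only the 1900-515 cm^-1 region used for the SIMCA method."""
    mask = (wavenumbers <= high) & (wavenumbers >= low)
    return spectra[:, mask]

class ClassModel:
    """One PCA model per location (hypothetical helper, not Assured ID code)."""

    def __init__(self, spectra, n_components=2):
        n, p = spectra.shape
        self.k = n_components
        self.mean = spectra.mean(axis=0)
        centered = spectra - self.mean
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.loadings = vt[: self.k]
        residuals = centered - centered @ self.loadings.T @ self.loadings
        dof = (n - self.k - 1) * (p - self.k)
        # Pooled residual standard deviation of the training spectra.
        self.s0 = np.sqrt((residuals ** 2).sum() / dof)

    def distance_ratio(self, spectrum):
        """Residual of one spectrum relative to the class residual."""
        centered = spectrum - self.mean
        residual = centered - centered @ self.loadings.T @ self.loadings
        s = np.sqrt((residual ** 2).sum() / (spectrum.size - self.k))
        return float(s / self.s0)

def classify(spectrum, models):
    """Assign the spectrum to the class with the smallest distance ratio."""
    ratios = {name: m.distance_ratio(spectrum) for name, m in models.items()}
    return min(ratios, key=ratios.get), ratios

# Synthetic demonstration data (not measured spectra).
rng = np.random.default_rng(0)
wavenumbers = np.linspace(4000.0, 400.0, 901)

def simulate(centre, n):
    """n noisy spectra with a single band at the given wavenumber."""
    band = np.exp(-((wavenumbers - centre) / 40.0) ** 2)
    return band + 0.01 * rng.normal(size=(n, wavenumbers.size))

models = {
    "farm_a": ClassModel(select_window(wavenumbers, simulate(1600.0, 20))),
    "farm_b": ClassModel(select_window(wavenumbers, simulate(1050.0, 20))),
}
unknown = select_window(wavenumbers, simulate(1600.0, 1))[0]
label, ratios = classify(unknown, models)
```

A Coomans-type diagnostic compares the distance ratios of the same spectra to two class models at a time, which is why spectra with large ratios to their own class stand out as candidates for exclusion.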

Validation on unknown location sample
Validation was done on three batches of roselle samples provided by a colleague to test the established database. The roselle samples in each batch were labelled A, B, C, D, E and F. Validation was also performed on a roselle sample purchased from a Chinese shop in Georgetown, Penang, Malaysia.
The sample was in dried form and was pulverised with a blender. The finest particles were obtained by sieving with a 150-μm sieve (Standard Test Sieve, "CE"). The fine powder was mixed with KBr and processed following the FTIR procedure described in Section 2.3. The spectrum of each unknown sample was copied seven times, labelled in series (such as A-1, A-2, A-3, A-4, A-5, A-6 and A-7) and imported into the established database. Each copy of the spectrum was then assigned to one of the seven locations of the established database. The specified material total distance ratios (SMTDR) of the generated results were used to predict the geographical origin of the sample. The system has a default specified material distance ratio limit of 1.000, estimated as the ratio of the edge of the sphere to the diameter of the sphere. When the SMTDR is less than 1.000, the spectrum is considered to lie within the sphere.
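The decision rule described above can be sketched in a few lines of Python. The function name and the example ratios are hypothetical placeholders, not values taken from Table 2; only the limit of 1.000 and the rule of choosing the location with the lowest SMTDR come from the procedure.

```python
def predict_origin(ratios, limit=1.000):
    """Pick the location with the lowest SMTDR.

    Returns (location, smtdr, inside_sphere), where inside_sphere is True
    only when the lowest SMTDR is below the software's default limit.
    """
    if not ratios:
        raise ValueError("at least one location ratio is required")
    location = min(ratios, key=ratios.get)
    value = ratios[location]
    return location, value, value < limit

# Placeholder ratios for one unknown spectrum tested against three locations.
example = {"Batu Pahat": 5.666, "Kepala Batas": 6.8, "Sik": 9.1}
prediction = predict_origin(example)
```

Returning the inside-sphere flag alongside the location keeps the two questions separate: which database location is nearest, and whether the spectrum actually falls within that location's sphere.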

Classification and performance report
The Assured ID software successfully separated the spectra of the H. sabdariffa samples from the seven locations into different clusters of spheres. Samples with extreme data (1.04% of the data) were excluded from the system. All the materials from the seven farms achieved a 100% rejection rate (Figure 5), showing that the H. sabdariffa spectra from each location were distinguishable from those of the other locations once the software drew a boundary around the group of spectra from the same location. The spectra of 125 roselle samples from Penang were used to derive a mean spectrum that served as a reference, whereas the spectra of 88 samples from Kedah were incorporated into another mean spectrum. Roselle sample spectra from the other locations were also included in this database.
All the raw data were tested with the SIMCA algorithm. The analysis showed that only the group of spectra from Johor (Muar) achieved a 100% (69/69) recognition rate. The lowest recognition rate (92%) was obtained for the samples from Perak (Lenggong): out of a total of 108 spectra from Lenggong, 99 were recognised as belonging to the Lenggong cluster. The other nine spectra were considered different from the Lenggong cluster. These nine spectra did not overlap with any other cluster; nevertheless, they were not incorporated into the Lenggong cluster. Samples from Sabak Bernam, Dengkil and Batu Pahat fell 3-6% short of a perfect recognition rate. Figure 5 shows the tabulated IMD of all the locations along the western coastline of Peninsular Malaysia.
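The recognition rates quoted above follow directly from the spectrum counts. As a quick check, a small Python helper (the function name is hypothetical) reproduces the two reported figures:

```python
def recognition_rate(recognised, total):
    """Percentage of spectra assigned to their own location's cluster."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * recognised / total

muar_rate = recognition_rate(69, 69)       # Johor (Muar)
lenggong_rate = recognition_rate(99, 108)  # Perak (Lenggong), about 91.7
```

The Lenggong value of about 91.7% rounds to the 92% reported in the text.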

Inter-material distances (IMD)
The inter-material distance is the mean model distance computed by the software for a cluster of spectra, including the residual, compared with another cluster of spectra in the same model. The IMD indicates the average separation distance between two clusters of spectra. A greater IMD suggests that the clusters are far apart and that their components are possibly different. Conversely, an IMD of zero indicates that the clusters possess similar components.
The 3D principal component graph (Figure 6) illustrates the position of each cluster of spheres, viewed from different directions since the inter-material distances vary. The 3D graph is defined by three axes: PC1, PC2 and PC3. Each sphere was developed from the group of samples from one location. The spectrum of each sample is represented as a dot. The dots are surrounded by the residuals, and the whole sphere represents the mean of all spectra in the group. The spheres are separated according to the inter-material distance measured from the centre of each sphere. When the inter-material distance is small, the two spheres overlap. Since most of the inter-material distances were greater than zero, the software was able to differentiate each group of samples. The sizes of the spheres vary and depend on the deviation of the spectra from the mean spectrum. The smaller the sphere, the smaller the differences between the dots in the group and the mean spectrum, and vice versa. Figure 6 shows that the seven spheres were closely associated in the three-dimensional graph. A high IMD reflects a wide separation between spheres. Some of the spheres partly overlapped, meaning that they had a very small IMD. The average inter-material distance of the seven locations was 9.04. The highest inter-material distance was 20.1, between the samples from Kedah (Sik) and Selangor (Sabak Bernam). This suggests that H. sabdariffa from Sik in Kedah and from Sabak Bernam in Selangor may develop differently because of their growing environments. The IMD between the Perak (Lenggong) and Johor (Batu Pahat) samples was the lowest (4.07), showing that they shared 97.84% similarity of components, the roselle having been grown under similar conditions of soil, water, pH and weather (Table 1).
The analysis by the Assured ID software also indicated indirectly that the samples from these two locations had very similar spectra and that the constituents of the calyces were produced under similar conditions. Samples from Kepala Batas showed IMD values of less than 5.00 with respect to samples from Lenggong, Muar and Batu Pahat. Samples of H. sabdariffa from Kepala Batas might therefore have a chemical content comparable to that of samples from these three locations. The IMD values of Muar and Batu Pahat were almost the same, as the two locations are only 60 km apart and differ little in soil condition, water and climate. The IMD of more than 10 with respect to samples from Selangor (Sabak Bernam) showed that samples from Kepala Batas differed in quality from them. Samples from Sik showed their lowest IMD (6.52) with Batu Pahat when compared with the other locations. Samples from Lenggong had a higher IMD with respect to samples from Dengkil, possibly because of the organic fertiliser and soil used at the Dengkil farm. A higher rate of organic fertiliser increases the stem diameter and height, the number and area of leaves, the biomass and the number of calyces [20]. This could explain why the samples from Dengkil had higher IMD values among all the samples, even though the samples from Sabak Bernam were obtained from the same state. In comparison, samples from Muar showed a lower IMD with respect to Batu Pahat and Sabak Bernam, as these two locations are in the middle of the western coastline of Peninsular Malaysia. However, samples from Batu Pahat and Sabak Bernam still produced an IMD greater than 10. This could be due to other factors, such as the spread of roselle disease [21] in the two locations. Such diseases affect the yield and products of roselle, as they cause leaf spot, stem rot and root rot.
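The way a pairwise IMD table is summarised into an average, a closest pair and a farthest pair can be illustrated with a short Python sketch. The formula Assured ID uses to compute each IMD is not given here, so the sketch only summarises a table that is already available. The function name is hypothetical, the 3 x 3 matrix contains placeholder values rather than the measured IMDs, and averaging over unique location pairs is an assumption about how the mean of 9.04 was obtained.

```python
import numpy as np

def summarise_imd(labels, imd):
    """Mean, closest pair and farthest pair of a symmetric IMD table."""
    imd = np.asarray(imd, dtype=float)
    size = len(labels)
    if imd.shape != (size, size) or not np.allclose(imd, imd.T):
        raise ValueError("IMD table must be square and symmetric")
    # Use each unordered pair once (upper triangle, diagonal excluded).
    rows, cols = np.triu_indices(size, k=1)
    values = imd[rows, cols]
    lo, hi = int(np.argmin(values)), int(np.argmax(values))
    return {
        "mean": float(values.mean()),
        "closest": (labels[rows[lo]], labels[cols[lo]], float(values[lo])),
        "farthest": (labels[rows[hi]], labels[cols[hi]], float(values[hi])),
    }

# Placeholder table for three sites (illustrative values only).
labels = ["site_1", "site_2", "site_3"]
toy_imd = [
    [0.0, 4.0, 10.0],
    [4.0, 0.0, 7.0],
    [10.0, 7.0, 0.0],
]
summary = summarise_imd(labels, toy_imd)
```

With seven locations the same function would operate on a 7 x 7 table covering 21 unique pairs.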

Validation of unknown sample
The three batches of raw roselle samples all showed SMTDR values of more than 1.000 (Table 2). This could be because the raw material used included many overlapping spectral points. The spectra used for the database have a wide range of variation; thus, the spheres that were built vary in size. The exclusion process was carried out to reduce this variation. During the troubleshooting step, discarding rare spectral points from the system also changed the average size and diameter of the sphere, so that another spectral point could then appear as an outlier and need to be excluded. Therefore, exclusion plays a key role in validation.
Since no SMTDR was less than 1.000, the prediction in the validation was based on the lowest SMTDR value as the best result. Strictly speaking, the system does not define a range for SMTDR values greater than 1.000. There is no setting for SMTDR values greater than 1.000 because the database is intended to be built from pure compounds, and theoretically a sample with an SMTDR of less than 1.000 is validated as lying within that specific sphere. The validation of the samples therefore needs to be conducted case by case. In the first batch, only sample F was predicted correctly: it is from Batu Pahat (Johor) and had the lowest SMTDR (5.6660). The predictions for the rest of the samples were inaccurate, with SMTDR values in the range 6.000-9.000. Sample B alone had the highest SMTDR, far outside this range, indicating that it was not from any location in the database. The result showed that more than half of the samples were assigned to Batu Pahat (Johor). In the second batch, sample E was correctly validated as coming from Batu Pahat (Johor). Sample B was also assigned to Johor but originated from Mersing, another district; its SMTDR was lower than that of sample E, showing that the established database could not distinguish a sample from another district even though its SMTDR was lower. The prediction of the location of an unknown sample relied entirely on the SMTDR value. Sample F had the highest SMTDR (28.9541) and was clearly identified as a sample not from the western coastline. The other samples were validated with SMTDR values of around 5.000-9.000.
The pattern of results for the third batch of validation samples was similar to that of the first and second batches. Sample B was validated correctly as coming from Batu Pahat (Johor). Sample A, which originated from Kuala Rompin (Pahang), had the highest SMTDR. The rest of the samples had SMTDR values in the range 3.000-8.000. In summary, most of the validation results pointed to the larger spheres, in this case Batu Pahat (Johor) and Kepala Batas (Penang). The SMTDR was around 3.000-9.000 on average for these batches of roselle samples. A roselle sample whose calculated SMTDR falls outside this range is considered to come from a distant location.
Validation of selected samples against the established database revealed the limitations and the reliability of the method. The large variation among the samples from different locations in the database produced spheres of different sizes in the 3D graph. This could affect the outcome, as the predictions favour the larger spheres. The limitations of the established database include inaccuracy in determining the actual origin of a sample, since the outcome is based only on the SMTDR calculated by the software.

Conclusion
H. sabdariffa is a herbal plant adaptable to almost every state in Malaysia. It is easy to grow and prefers mineral soil with lower acidic pH. The calyces of H. sabdariffa are made into herbal tea and consumed by local Malaysians. Their anthocyanin content has been reported as the key component in therapeutic studies. This project sampled roselle farms along the western coastline of Peninsular Malaysia. Several points should be considered when establishing a database with Assured ID. Sample preparation is important in ensuring accurate determination. Firstly, the number of KBr discs should be more than 50, bearing in mind that the exclusion of extreme spectra may reduce the sample size. This is crucial to ensure that the data are representative of the actual condition of the samples in the area. Secondly, the sample processing procedures must be simple and time-saving. The selected wavenumber region must include the fingerprint range of the sample, as exhibited in the raw material spectrum. The IMD of the samples must be more than one. It is preferable to collect samples over a wide area in order to minimise the error in determining the location of an unknown sample. When the location of an unknown sample cannot be determined from the established database, its SMTDR value may lie outside the average range.
In this study, a spectrum database of roselle raw material was established by importing the spectrum of each individual plant into the system. The sample spectra from each location took up their own positions in the 3D principal component graph and combined to form spheres separated by the IMD. Validation of the given samples was used to test the accuracy of the established database. The validation showed that only one out of six samples from each batch was validated correctly, indicating a success rate of only 17%. On the other hand, the method successfully discriminated sample locations along the western coastline. It is concluded that, with this established database, more than 50% of the validations placed the sample within the range of the western coastline.
The established Assured ID database of roselle can be used as a reference database for roselle samples from unknown geographical locations in Malaysia, with a few limitations, but further improvement is needed.