Partitioning Error Sources for Quality Control and Comparability Analysis in Biological Monitoring and Assessment



Introduction
Rationally, as scientists, we recognize that documented standard procedures constitute the first requirement for developing consistency within and among datasets; the second step is putting the procedures into practice. If the procedures were implemented as perfectly as they are written, there would be no need to question data. However, we are also cognizant of the fact that humans (a group of organisms to which we cannot deny holding membership) are called upon to use the procedures, and the consistency and rigor with which the procedures are applied are directly affected by an individual's skill, training, attention span, energy, and focus (Edwards, 2004). In fact, we fully expect inconsistency due to human foibles, and often substantial portions of careers are spent in efforts to recognize, isolate, correct, and minimize future occurrences of error. Many public and private organizations in the United States (US) and other countries collect aquatic biological data using a variety of sampling and analysis methods (Gurtz & Muir, 1994; ITFM, 1995a; Carter & Resh, 2001), often for meeting regulatory requirements, for example, by the United States' Clean Water Act (CWA) of 1972 (USGPO, 1989). While the information collected by an individual organization is usually directly applicable to a specific question or site-specific issue, the capacity for using it more broadly for comprehensive assessment has been problematic due to unknown data quality produced by [...]

Fig. 2. Total error or variability (s²) associated with a biological assessment is a combined result of that for each component of the process (Flotemersch et al. 2006).

Indicators
All aquatic ecosystems are susceptible to cumulative impacts from human-induced disturbances including inorganic and organic chemical pollution, hydrologic alteration, channelization, overharvest, invasive species, and land cover conversion. Because they live in the presence of existing water chemistry and physical habitat conditions, the aquatic life of these systems (fish, insects, plants, shellfish, amphibians, reptiles, etc.) integrates cumulative effects of multiple stressors that are produced by both point and non-point source (NPS) pollution. The most common organism groups that are used by routine biological monitoring and assessment programs are benthic macroinvertebrates (aquatic insects, snails, mollusks, crustaceans, worms, and mites), fish, and/or algae, with indicators most often taking the form of a multimetric Index of Biological Integrity (IBI; Karr et al., 1986; Hughes et al., 1998; Barbour et al., 1999; Hill et al., 2000, 2003) or a predictive observed/expected (O/E) model based on the River Invertebrate Prediction and Classification System (RIVPACS; Clarke et al., 1996, 2003; Hawkins et al., 2000; Hawkins, 2006). Of these three groups, benthic macroinvertebrates (BM) are commonly used because the protocols are most well-established, the level of effort required for field sampling is reasonable (Barbour et al., 1999), and taxonomic expertise is relatively easily accessible. Thus, examples of QC tests and corrective actions discussed in this chapter are largely focused on benthic macroinvertebrates in the context of multimetric indexes, though similar procedures for routine monitoring with algae and fish could be developed. Stribling et al. (2008) also used some of these procedures for documenting performance of O/E models.

Potential error sources in indicators

Field sampling
Whether the target assemblage is benthic macroinvertebrates, fish, or algae, the first step of biological assessment is to use standard field methods to gather a sample representing the taxonomic diversity and functional composition of a reach, zone, or other stratum of a waterbody. The actual dimensions of the sampling area ultimately depend on technical objectives and programmatic goals of the monitoring activity (Flotemersch et al., 2010). The spatial area from which the biological sample is drawn is that segment or portion of the waterbody the sample is intended to represent; for analyses and higher level interpretation, biological indicators are considered equivalent to the site. For its national surveys of lotic waters (streams and rivers), the U.S. Environmental Protection Agency defines a sample reach as 40x the mean wetted width (USEPA, 2004a); many individual states use a fixed 100 m as the sampling reach. Benthic macroinvertebrate samples are collected along 11 transects evenly distributed throughout the reach length, and a D-frame net with 500-µm mesh openings is used to sample multiple habitats (Klemm et al., 1998; USEPA, 2004a; Flotemersch et al., 2006). An alternative approach to transects is to estimate the proportion of different habitat types in a defined reach (e.g., 100 m), and distribute a fixed level of sampling effort in proportion to their frequency of occurrence throughout the reach (Barbour et al., 1999, 2006). For both approaches, organic and inorganic sample material (leaf litter, small woody twigs, silt, and sand) are composited in one or more containers, preserved with 95% denatured ethanol, and delivered to laboratories for processing. A composite sample over multiple habitats in a reach is a common protocol feature of many monitoring programs throughout the US (Carter & Resh, 2001).
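The proportional-allocation alternative to transects can be illustrated with a short sketch. This is only an illustration and not part of any cited protocol: the function name, the habitat names, the percentages, and the total of 20 sampling units are assumptions chosen for the example. Leftover units after rounding down are given to the habitats with the largest fractional shares.

```python
def allocate_effort(habitat_percent, total_units=20):
    """Distribute a fixed number of sampling units among habitat types in
    proportion to their estimated share of the reach.

    habitat_percent: dict of habitat name -> estimated percent of the reach
    (hypothetical field estimates). Uses largest-remainder rounding so the
    allocation always sums to total_units.
    """
    total = sum(habitat_percent.values())
    if total <= 0:
        raise ValueError("habitat percentages must sum to a positive value")
    exact = {h: total_units * p / total for h, p in habitat_percent.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    leftover = total_units - sum(alloc.values())
    # Hand the remaining units to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical reach: percentages estimated visually by the field team.
reach = {"cobble": 50, "snags": 25, "undercut banks": 15, "macrophytes": 10}
print(allocate_effort(reach))
```

Integer percentages are used for the inputs so that the example does not depend on floating-point rounding of the habitat shares.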

Laboratory processing
Processing of benthic macroinvertebrate samples is a 3-step process. Sorting and subsampling serves to 1) isolate individual organisms from nontarget material, such as leaf litter and other detritus, bits of woody material, silt, and sand, and 2) prepare the sample (or subsample) for taxonomic identification. Taxonomic identification serves to match nomenclature to specimens in the sample, and enumeration provides the actual counts, by taxon, of everything contained within the sample. Although it is widely recognized that subsampling helps to manage the level of effort associated with bioassessment laboratory work (Carter & Resh, 2001), the practice has been the subject of much debate (Courtemanch, 1996; Barbour & Gerritsen, 1996; Vinson & Hawkins, 1996). Fixed organism counts vary among monitoring programs (Carter & Resh, 2001), with 100, 200, 300 and 500 counts being most often used (Barbour et al., 1999; Cao & Hawkins, 2005; Flotemersch et al., 2006). Flotemersch & Blocksom (2005) concluded that a 500-organism count was most appropriate for large/nonwadeable river systems, based on examination of the relative increase in richness metric values (< 2%) between successive 100-organism counts. However, they also suggested that a 300-organism count is sufficient for most study needs. Others have recommended higher fixed counts, including a minimum of 600 in wadeable streams (Cao & Hawkins, 2005). The subsample count used for the USEPA national surveys is 500 organisms (USEPA, 2004b); many states use 200 or 300 counts.

If organisms are missed during the sorting process, bias is introduced in the resulting data. Thus, the primary goal of sorting is to completely separate organisms from organic and inorganic material (e.g., detritus, sediment) in the sample. A secondary goal of sorting is to provide the taxonomist with a sample for which the majority of specimens are identifiable. Note that the procedure described here assumes that the sorter and the taxonomist are different personnel. Although it is not the decision of the sorter whether an organism is identifiable, straightforward rules can be applied that minimize specimen loss. For example, "counting rules" can be part of the standard operating procedures (SOP) for both the sorting/subsampling and taxonomic identification, such as specifying what not to count:

• Non-benthic organisms, such as free-swimming gyrinid adults (Coleoptera) or surface-dwelling veliids (Heteroptera)
• Empty mollusk shells (Mollusca: Bivalvia and Gastropoda)
• Non-headed worm fragments
• Damaged insects and crustaceans that lack at least a head and thorax
• Incidental collections, such as terrestrial insects or aquatic vertebrates (fish, frogs or tadpoles, snakes, or other)
• Non-macroinvertebrates, such as copepods, cladocera, and ostracods
• Exuviae (molted "skins")
• Larvae or pupae where internal tissue has broken down to the point of floppiness

If a sorter is uncertain about whether an organism is countable, the specimen should be placed in the vial and not added to the rough count total.

The sorting/subsampling process is based on randomly selecting portions of the sample detritus spread over a gridded Caton screen (Caton, 1991; Barbour et al., 1999; see also Figures 6-4a, b of Flotemersch et al., 2006 [note that an individual grid square is 6 cm x 6 cm, or 36 cm², not 6 cm² as indicated in Figure 6-4b]). Prior to beginning the sorting/subsampling process, it is important that the sample be mixed thoroughly and distributed evenly across the sorting tray to reduce the effect of organism clumping that may have occurred in the sample container. The grids are randomly selected, individually removed from the screen, placed in a sorting tray, and all organisms removed with forceps; the process is repeated until the rough count by the sorter exceeds the target subsample size. There should be at least three containers produced per sample, all of which should be clearly labeled: 1) subsample to be given to taxonomist, 2) sort residue to be checked for missed specimens, and 3) unsorted sample remains to be used for additional sorting, if necessary.

The next step of the laboratory process is identifying the organisms within the subsample. A major question associated with taxonomy for biological assessments is the hierarchical target levels required of the taxonomist, including order, family, genus, species or the lowest practical taxonomic level (LPTL). While family level is used effectively in some monitoring programs (Carter & Resh 2001), the taxonomic level primarily used in most routine monitoring programs is genus. However, even with genus as the target, many programs often treat selected groups differently, such as midges (Chironomidae) and worms (Oligochaeta), due to the need for slide-mounting. Slide-mounting specimens in these two groups is usually (though not always) necessary to attain genus level nomenclature, and sometimes even tribal level for midges. Because taxonomy is a major potential source of error in any kind of biological monitoring data sets (Stribling et al., 2003, 2008a; Milberg et al., 2008; Bortolus, 2008), it is critical to define taxonomic expectations and to treat all samples consistently, both by a single taxonomist and among multiple taxonomists. This, in part, requires specifying both hierarchical targets and counting rules. An example list of taxonomic target levels is shown in Table 1. These target levels define the level of effort that should be applied to each specimen. If it is not possible to attain these levels for certain specimens due to, for example, the presence of early instars, damage, or poor slide mounts, the taxonomist provides a more coarse-level identification.

When a taxonomist receives samples for identification, depending upon the rigor of the sorting process (see above), the samples may contain specimens that either cannot be identified, or non-target taxa that should not be included in the sample. The final screen of sample integrity is the responsibility of the taxonomist, who determines which specimens should remain unrecorded (for any of the reasons stated above). Beyond this, the principal responsibility of the taxonomist is to record and report the taxa in the sample and the number of individuals of each taxon.

Programs should use the most current and accepted keys and nomenclature. An Introduction to the Aquatic Insects of North America (Merritt et al., 2008) is useful for identifying the majority of aquatic insects in North America to genus level. By their very nature, most taxonomic keys are obsolete soon after publication; however, research taxonomists do not discontinue research once keys are available. Thus, it is often necessary to have access to and be familiar with ongoing research in different taxonomic groups. Other keys are also necessary for non-insect benthic macroinvertebrates that will be encountered, such as Oligochaeta, Mollusca, Acari, Crustacea, Platyhelminthes, and others. Klemm et al. (1990) and Merritt et al. (2008) provide an exhaustive list of taxonomic literature for all major groups of freshwater benthic macroinvertebrates. Although it is not current for all taxa, the integrated taxonomic information system (ITIS; http://www.itis.usda.gov/) has served as a clearinghouse for accepted nomenclature, including validity, authorship and spelling.
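The fixed-count subsampling step described in this section (random selection of grid squares, each sorted completely, until the rough count reaches the target) can be sketched as a small simulation. The function name, the 30-square tray, and the per-grid organism counts are assumptions for illustration only; in practice the counts are not known until each grid has been sorted.

```python
import random


def fixed_count_subsample(grid_counts, target=200, seed=None):
    """Simulate fixed-count subsampling from a gridded screen.

    grid_counts: organisms present in each grid square (hypothetical values).
    Grid squares are drawn at random without replacement and each selected
    square is sorted completely; drawing stops once the rough count reaches
    or exceeds the target. Returns (selected grid indices, rough count).
    """
    rng = random.Random(seed)
    order = list(range(len(grid_counts)))
    rng.shuffle(order)
    selected, rough_count = [], 0
    for grid in order:
        selected.append(grid)
        rough_count += grid_counts[grid]  # the whole square is always sorted
        if rough_count >= target:
            break
    return selected, rough_count


# Hypothetical evenly spread sample: 30 grid squares, 12 organisms in each.
grids, count = fixed_count_subsample([12] * 30, target=200, seed=1)
print(len(grids), count)
```

Because the final grid is always sorted in full, the rough count usually overshoots the target slightly. If the whole tray holds fewer organisms than the target, every grid is selected and the sample is effectively fully sorted.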

Data entry
Taxonomic nomenclature and counts are usually entered into the data management system directly from handwritten bench or field sheets. Depending on the system used, there may be an autocomplete function that helps prevent misspellings, but which can also contribute to errors. For example, entering the letters 'hydro' could potentially autocomplete as either Hydropsyche or Hydrophilus, and the data entry technician on autopilot might continue as normal. E-tablets are also increasingly used for entering field observation data, as is direct entry of laboratory data into spreadsheets, obviating the need for hardcopy paper backup.
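One way a data entry system could guard against the autocomplete problem is to refuse to complete a name while the typed prefix is still ambiguous. The sketch below is a hypothetical illustration, not a description of any particular data management system; the function name and the short taxa list are assumptions.

```python
def safe_autocomplete(prefix, taxa):
    """Return the single taxon that starts with prefix, or None.

    None is returned when the prefix matches no taxon or more than one,
    so the technician has to keep typing instead of accepting a guess.
    """
    wanted = prefix.strip().lower()
    matches = [name for name in taxa if name.lower().startswith(wanted)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical master taxa list.
taxa_list = ["Hydropsyche", "Hydrophilus", "Baetis"]
print(safe_autocomplete("hydro", taxa_list))    # ambiguous prefix
print(safe_autocomplete("hydrops", taxa_list))  # unique prefix
```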

Data reduction/indicator calculation
There is a large number of potential metrics that monitoring programs can use (Barbour et al., 1999; Blocksom & Flotemersch, 2005; Flotemersch et al., 2006), requiring testing, calibration, and final selection before being appropriate for routine application. Blocksom & Flotemersch (2005) tested 42 metrics relative to different sampling methods, mesh sizes, and habitat types, some of which are based on taxonomic information, as well as stressor tolerance, functional feeding group, and habit. Other workers and programs have tested more and different ones. For example, the US state of Montana calibrated a biological indicator for wadeable streams of the "mountains" site class (Montana DEQ 2006), resulting in a multimetric index comprised of seven metrics (Table 2). This discussion assumes that the indicator terms have already been calibrated and selected, and deals specifically with their calculation. For this purpose, the raw data are taxa lists and counts; their conversion into metrics is data reduction usually performed with computer spreadsheets or in relational databases.
To ensure that database queries are correct and result in the intended metric values, a subset of values should be recalculated by hand: one metric is recalculated for all samples, and all metrics are recalculated for one sample. When recalculated values differ from those values in the matrix, the reasons for the disagreement are determined and corrections are made. Reports on performance include the number of reduced values checked as a percentage of the total, how many errors were found in the queries, and the corrective actions specifically documented.
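The recalculation check can be sketched as follows. The function names, sample identifiers, metric names, and values are hypothetical; the sketch only assumes that database and hand-calculated values are both available keyed by (sample, metric).

```python
def recalculation_subset(samples, metrics, check_metric, check_sample):
    """(sample, metric) pairs to recalculate by hand: one metric for all
    samples, plus all metrics for one sample."""
    pairs = {(s, check_metric) for s in samples}
    pairs |= {(check_sample, m) for m in metrics}
    return pairs


def compare_recalculated(db_values, hand_values, tol=1e-9):
    """Return the pairs where database and hand-calculated values disagree,
    and the percentage of all reduced values that were checked."""
    mismatches = sorted(
        key for key, hand in hand_values.items()
        if abs(db_values[key] - hand) > tol
    )
    pct_checked = 100.0 * len(hand_values) / len(db_values)
    return mismatches, pct_checked


# Hypothetical metric matrix from the database (3 samples x 2 metrics).
db = {
    ("S1", "EPT taxa"): 12, ("S1", "Total taxa"): 30,
    ("S2", "EPT taxa"): 8,  ("S2", "Total taxa"): 25,
    ("S3", "EPT taxa"): 15, ("S3", "Total taxa"): 33,
}
to_check = recalculation_subset(["S1", "S2", "S3"],
                                ["EPT taxa", "Total taxa"],
                                check_metric="EPT taxa", check_sample="S1")
# Hypothetical hand recalculation; S2 disagrees with the database query.
hand = {key: db[key] for key in to_check}
hand[("S2", "EPT taxa")] = 9
print(compare_recalculated(db, hand))
```

Any pair reported as a mismatch would then be traced back to the query or the hand calculation before a correction is made.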

Indicator reporting
Regardless of whether the indicator is based on a multimetric framework or multivariate predictive model, the ultimate goal is to translate the quantitative, numeric result, the score, into some kind of narrative that provides the capacity for broad communication. The final assessment for a site is usually determined based on a site score relative to the distribution of reference site scores to reflect degrees of biological degradation; the more similar a test site is to reference, the less degradation is being exhibited. Depending on the calibration process and how many condition categories are structured, narratives for individual sites can come from two categories (degraded, nondegraded), three (good, fair, poor), four (good, fair, poor, very poor), or five (very good, good, fair, poor, or very poor). There also may be other frameworks a program chooses to use, but the key is to have the individual categories quantitatively defined.
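Translating a score into a narrative category amounts to a lookup against quantitatively defined cutoffs. The sketch below assumes a three-category framework with cutoffs of 40 and 60 on a 0 to 100 index scale; these numbers and the function name are invented for illustration, and real cutoffs would come from a program's own calibration against its reference distribution.

```python
from bisect import bisect_right


def narrative_rating(score, cutoffs, labels):
    """Map an index score to a narrative category.

    cutoffs: ascending category boundaries (hypothetical values).
    labels: category names ordered from most to least degraded; there must
    be exactly one more label than cutoffs. A score equal to a cutoff is
    placed in the higher (less degraded) category.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_right(cutoffs, score)]


cutoffs = [40, 60]                 # assumed boundaries, not calibrated values
labels = ["poor", "fair", "good"]
for site_score in (25, 40, 59.9, 72):
    print(site_score, narrative_rating(site_score, cutoffs, labels))
```

The same function handles two, four, or five categories by changing the lists of cutoffs and labels.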

Measurement quality objectives (MQO)
For each step of the biological assessment process there are different performance characteristics that can be documented, some of which are quantitative and others that are qualitative (Table 3). Measurement quality objectives (MQO) are control points above (or below) [...] (Diamond et al., 2006; Flotemersch et al., 2006; Stribling et al., 2003, 2008a, b; Herbst & Silldorf, 2006), and are roughly analogous to the Shewhart (1939) [...]

Table 3. Error partitioning framework for biological assessments and biological assessment protocols for benthic macroinvertebrates. There may be additional activities and performance characteristics, and they may be quantitative (), qualitative (∆) or not applicable (na).
Specific MQO should be selected based on the distribution of values attained, particularly the minima and maxima. Importantly, for environmental monitoring programs, special studies should never be the basis upon which a particular MQO is selected; rather, they should reflect performance expectations when routine techniques and monitoring personnel are used. Consider MQO that are established using data from the best field team, or the taxonomist with the most years of experience, or the dissolved oxygen measurements taken using the most expensive field probes. When those people or equipment are no longer available to the program, how useful would the database be to future or secondary users? Defensibility would potentially be diminished. Values that are >MQO are not automatically taken to be unacceptable data points; rather, such values are targeted for closer scrutiny to determine possible reasons for exceedance and might indicate a need for corrective actions (Stribling et al. 2003, Montana DEQ 2006). Simultaneously, they can be used to help quantify performance of the field teams in consistently applying the methods.

Field sampling
Quantitative performance characteristics for field sampling are precision and completeness (Table 3). Repeat samples for purposes of calculating precision of field sampling are obtained by sampling two adjacent reaches, shown as 500 m in this example (Figure 3), and can be done by the same field team for intra-team precision, or by different teams for inter-team precision. For benthic macroinvertebrates, samples from the adjacent reaches (also called duplicate or quality control [QC] samples) must be laboratory-processed prior to data being available for precision calculations. Assuming acceptable laboratory error, these precision values are statements of the consistency with which the sampling protocols 1) characterized the biology of the stream or river and 2) were applied by the field team, and thus, reflect a combination of natural variability and systematic error inherent in the dataset.
The number of reaches for which repeat samples are taken varies, but a rule-of-thumb is 10%, randomly selected from the total number of sampling reaches constituting a sampling effort (whether yearly, programmatic routine, or individual project). Because they are the ultimate indicators to be used in addressing the question of ecological conditions, the metric and index values are used to calculate different precision estimates. Root mean square error (RMSE) (formula 1), coefficient of variability (CV) (formula 2), and confidence intervals (formula 3) (Table 4) are calculated on multiple sample pairs, and are meaningful in that context. Documented values for field sampling precision (Table 5) demonstrate differences among individual metrics and the overall multimetric index (Montana MMI; mountain site class). Relative percent difference (RPD) (formula 4) (Table 4) can have meaning for individual sample pairs. For example, for the composite index, median RPD was 8.0 based on 40 sample pairs (Stribling et al., 2008b). MQO recommendations for routine field sampling in that biological monitoring program were a CV of 10% and a median RPD of 15.0. Sets of sample pairs having CV > 10% would be subjected to additional scrutiny to determine what might be the cause of increased variability. Similarly, individual RPD values for sample pairs would be more specifically examined.
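The 10% rule-of-thumb for choosing repeat-sample reaches can be sketched as a random draw without replacement. The function name, the reach identifiers, and the choice to round up and to always draw at least one reach are assumptions of this sketch, not requirements stated by any of the cited programs.

```python
import random


def select_repeat_reaches(reach_ids, percent=10, seed=None):
    """Randomly pick the reaches at which a repeat (QC) sample is taken.

    percent is the share of reaches to repeat; the count is rounded up with
    integer arithmetic and at least one reach is always selected (both are
    assumptions of this sketch).
    """
    reaches = list(reach_ids)
    n_repeat = max(1, (len(reaches) * percent + 99) // 100)
    return random.Random(seed).sample(reaches, n_repeat)


# Hypothetical sampling effort of 40 reaches.
all_reaches = [f"R{i:02d}" for i in range(1, 41)]
print(select_repeat_reaches(all_reaches, seed=7))
```

Fixing the seed makes the draw reproducible, which can be useful for documenting how the QC reaches were chosen.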
Percent completeness (formula 5) (Tables 3, 4) is calculated to communicate the number of valid samples collected as a proportion of those that were originally planned. This value serves as one summary of data quality over the dataset and it demonstrates an aspect of confidence in the overall dataset.

Table 4. Explanations and formulas for quantifying 10 different performance characteristics for different steps of the biological assessment process.

1. Root mean square error (RMSE) is calculated by the formula:

$$RMSE = \sqrt{\frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_j\right)^2}{df}}$$

where $y_{ij}$ is the $i$th individual observation in group $j$, $j = 1 \ldots k$ (Zar 1999), $\bar{y}_j$ is the mean of group $j$, $n_j$ is the number of observations in group $j$, and $df$ is the degrees of freedom. Lower values indicate better consistency, and are used in calculation of the coefficient of variability (CV).

2. Coefficient of variability (CV) is a unit-less measure calculated by the formula:

$$CV = \frac{RMSE}{\bar{Y}} \times 100$$

where $\bar{Y}$ is the mean of the dependent variable (e.g., metric, index across all sample pairs; Zar 1999). It is also known as relative standard deviation (RSD).

3. Confidence intervals (CI) (or detectable differences) are used to indicate the magnitude of separation of 2 values before the values can be considered different with statistical significance. A 90% significance level is used for the CI (i.e., the range around the observed value within which the true mean is likely to fall 90% of the time, or a 10% probability of type I error [α]). The 90% confidence interval (CI90) is calculated using RMSE by the formula:

$$CI_{90} = \pm\,(RMSE)(z_{\alpha})$$

where $z_{\alpha}$ is the z-value for 90% confidence (i.e., p = 0.10) with degrees of freedom set at infinity. In this analysis, $z_{\alpha}$ = 1.64 (appendix 17 in Zar 1999). For CI95, the z-value would be 1.96. As the number of sample repeats increases, CI becomes narrower; we provide CI that would be associated with 1, 2, and 3 samples per site.

4. Relative percent difference (RPD) is the proportional difference between 2 measures, and is calculated as:

$$RPD = \frac{|A - B|}{(A + B)/2} \times 100$$

where A is the metric or index value of the 1st sample and B is the metric or index value of the 2nd sample (Keith, 1991; APHA, 2005; Smith, 2000). Lower RPD values indicate improved precision (as repeatability) over higher values.

5. Percent completeness (%C) is a measure of the number of valid samples that were obtained as a proportion of what was planned, and is calculated as:

$$\%C = \frac{v}{T} \times 100$$

where v is the number of valid samples, and T is the total number of planned samples (Flotemersch et al., 2006).

6. Percent sorting efficiency (PSE) describes how well a sample sorter has done in finding and removing all specimens from isolated sample material, and is calculated as:

$$PSE = \frac{A}{A + B} \times 100$$

where A is the number of organisms found by the original sorter, and B is the number of missed organisms recovered (specimen recoveries) by the QC laboratory sort checker.

7. Percent taxonomic disagreement (PTD) quantifies the sample-based precision of taxonomic identifications by comparing target level taxonomic results from two independent taxonomists, using the formula:

$$PTD = \left(1 - \frac{a}{N}\right) \times 100$$

where a is the number of agreements, and N is the total number of organisms in the larger of the two counts (Stribling et al., 2003, 2008a).

8. Percent difference in enumeration (PDE) quantifies the consistency of specimen counts in samples, and is determined by calculating a comparison of results from two independent laboratories or taxonomists using the formula:

$$PDE = \frac{|n_1 - n_2|}{n_1 + n_2} \times 100$$

where $n_1$ is the number of organisms in a sample counted by the first laboratory, and $n_2$, the second (Stribling et al. 2003).

9. Percent taxonomic completeness (PTC) describes the proportion of specimens in a sample that meet the target identification level (Stribling et al. 2008) and is calculated as:

$$PTC = \frac{x}{N} \times 100$$

where x is the number of individuals in a sample for which the identification meets the target level, and N is the total number of individuals in the sample.

10. Discrimination efficiency (DE) is an estimate of the accuracy of multimetric indexes and individual metrics, characterized as their capacity to correctly identify stressor conditions (physical, chemical, hydrologic, and land use/land cover), and is calculated using the formula:

$$DE = \frac{a}{b} \times 100$$

where a is the number of a priori stressor sites identified as being below the quantified biological impairment threshold of the reference distribution (25th percentile, 10th, or other), and b is the total number of stressor sites (Flotemersch et al., 2006).
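The field sampling precision statistics (RMSE, CV, CI90, RPD) and percent completeness can be sketched in a few lines. This is an illustration under stated assumptions rather than the computation used by any cited program: RMSE is taken in its pooled within-group form, with degrees of freedom equal to the total number of observations minus the number of groups, CV is RMSE as a percentage of the overall mean, and the index values are invented.

```python
import math


def rmse(groups):
    """Pooled within-group root mean square error.

    groups: one tuple of repeat observations per site (e.g., sample pairs).
    Degrees of freedom are assumed to be sum(n_j - 1) over all groups.
    """
    ss, df = 0.0, 0
    for obs in groups:
        mean = sum(obs) / len(obs)
        ss += sum((y - mean) ** 2 for y in obs)
        df += len(obs) - 1
    return math.sqrt(ss / df)


def cv(groups):
    """Coefficient of variability: RMSE as a percentage of the overall mean."""
    n = sum(len(obs) for obs in groups)
    overall_mean = sum(sum(obs) for obs in groups) / n
    return 100.0 * rmse(groups) / overall_mean


def ci90(groups, z=1.64):
    """Half-width of the 90% confidence interval for a single sample."""
    return z * rmse(groups)


def rpd(a, b):
    """Relative percent difference between two values."""
    return 100.0 * abs(a - b) / ((a + b) / 2.0)


def percent_completeness(valid, planned):
    """Valid samples as a percentage of planned samples."""
    return 100.0 * valid / planned


# Hypothetical index scores from two pairs of adjacent-reach samples.
pairs = [(60.0, 70.0), (80.0, 80.0)]
print(rmse(pairs), round(cv(pairs), 2), round(ci90(pairs), 2))
print([round(rpd(a, b), 1) for a, b in pairs], percent_completeness(38, 40))
```

Values computed this way could then be compared against program MQO, for example by flagging a dataset whose CV exceeds 10% or sample pairs with unusually high RPD.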
Qualitative performance characteristics for field sampling are bias and representativeness (Table 3). Programs that use multihabitat sampling, either transect-based similar to that used by the US national surveys (USEPA 2004a), or distributing sampling effort among different habitat types (Barbour et al., 1999, 2006), are attempting to minimize the bias through two components of the field method. First, the approaches are not limited to one or a few habitat types; they are focused on sampling stable undercut banks, macrophyte beds, root wads, snags, gravel, sand, and/or cobble. Second, allocation of the sampling effort is distributed throughout the entire reach, thus preventing the entire sample from being taken in a shortened portion of the reach. Further, if the predominant habitat in a sample reach is poor or degraded, that habitat would be sampled as well. These field sampling methods are intended to depict the benthic macroinvertebrate assemblage that the physical habitat in the streams and rivers has the capacity to support. Another note about representativeness is to be cognizant that, while a method might effectively depict the property it is intended to depict (Flotemersch et al., 2006), it could be interpreted differently at different spatial scales (Figure 4).

Table 5 (Stribling et al., 2008b). Data shown are from the US state of Montana, and performance calculations are based on 40 sample pairs from the "mountain" site class (abbreviations: RMSE, root mean square error; CV, coefficient of variation; CI90, 90 percent confidence interval; EPT, Ephemeroptera, Plecoptera, Trichoptera).

Accuracy is considered "not applicable" to field sampling (Table 3), because efforts to define analytical truth would necessitate a sampling effort excessive beyond any practicality. That is, the analytical truth would be all benthic macroinvertebrates that exist in the river (shore zone to 1-m depth). There is no sampling approach that will collect all individual benthic macroinvertebrate organisms.

Sorting/subsampling
Bias, precision, and, in part, completeness, are quantitative characteristics of performance for laboratory sorting and subsampling (Table 3). Bias is the most critical performance characteristic of the sorting process, and is evaluated by checking for specimens that may have been overlooked or otherwise missed by the primary sorter (Flotemersch et al., 2006). Checking of the sort residue is performed by an independent sort checker in a separate laboratory using the same procedures as the primary sorter, specifically, the same magnification and lighting as called for in the SOP. The number of specimens found by the original sorter as a proportion of the total number found (those found originally plus those recovered by the checker) is the percent sorting efficiency (PSE; formula 6) (Table 4), and quantifies sorting bias. This exercise is performed on a randomly selected subset of sort residues (generally 10% of total sample lot), the selection of which is stratified by individual sorters, by projects, or by programs. As a rule-of-thumb, an MQO could be "less than 10% of all samples checked will have a PSE ≤90%".
Table 6 shows PSE results from sort rechecks for a project within the state of Georgia (US). One sample (no. 8) exhibited a substantial failure with a PSE of 77.8, which became an immediate flag for a potential problem. Further evaluation of the results showed that the sample was fully sorted (100%), and still only 21 specimens were found by the original sorter, prior to the 6 recoveries by the re-check. Values for PSE become skewed when overall numbers are low, thus failure of this one sample did not indicate systematic error (bias) in the sorting process. Three additional samples fell slightly below the 90% MQO, but were no more than 0.2 percentage points low and were judged as passing by the QC analyst.

Precision of laboratory sorting is calculated by use of RPD with metrics and indexes as the input variables (Table 4). If, for example, the targeted subsample size is 200 organisms, and that size subsample is drawn twice from a sorting tray without re-mixing or re-spreading, metrics can be calculated from the two separate subsamples. RPD would be an indication of how well the sample was mixed and spread in the tray; the "serial subsampling" and RPD calculations should be done on two timeframes. First, these calculations should be done, and the results documented and reported to demonstrate what the laboratory (or individual sorter) is capable of in application of the subsampling method. Second, they should be done periodically to demonstrate that the program routinely continues to meet that level of precision.

Representativeness of the sorting/subsampling process is addressed as part of the SOP that requires random selection of grid squares (Flotemersch et al., 2006) with complete sorting, until the target number is reached within the final grid. Percent completeness for subsampling is calculated as the proportion of samples with the target subsample size (±20%) in the rough sort. Considered as "not applicable", estimates of accuracy are not necessary for characterizing sorting performance.
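The sorting check can be worked through with the numbers given for sample no. 8: 21 specimens found by the original sorter and 6 recovered by the checker. The sketch below assumes PSE is the original sorter's count as a percentage of the combined count, which reproduces the reported 77.8, and evaluates the rule-of-thumb MQO against a list of PSE values. The function names and the list of twelve PSE values are hypothetical.

```python
def percent_sorting_efficiency(found_by_sorter, recovered_by_checker):
    """PSE: original sorter's count as a percentage of all specimens found."""
    total = found_by_sorter + recovered_by_checker
    return 100.0 * found_by_sorter / total


def sorting_mqo_met(pse_values, pse_threshold=90.0, max_fail_percent=10.0):
    """Check the MQO 'less than 10% of samples checked have PSE <= 90%'.

    Returns (MQO met?, percent of checked samples at or below threshold).
    """
    n_low = sum(1 for p in pse_values if p <= pse_threshold)
    fail_percent = 100.0 * n_low / len(pse_values)
    return fail_percent < max_fail_percent, fail_percent


# Sample no. 8 from the text: 21 found originally, 6 recovered on re-check.
print(round(percent_sorting_efficiency(21, 6), 1))

# Hypothetical PSE values for 12 rechecked samples, one of them low.
checked = [98.0, 95.5, 100.0, 77.8, 92.1, 99.0,
           96.4, 97.2, 93.3, 94.8, 99.5, 98.7]
print(sorting_mqo_met(checked))
```

As the text notes for low-count samples, a single low PSE is a flag for closer review rather than proof of systematic sorting bias.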

Taxonomic precision (sample-based)
Precision and completeness are quantitative performance characteristics that are used for taxonomy (Table 3). Precision of taxonomic identifications is calculated using percent taxonomic disagreement (PTD) and percent difference in enumeration (PDE), both of which rely on the raw data (list of taxa and number of individuals) from whole-sample re-identifications (Stribling et al., 2003, 2008a). These two values are evaluated individually, and are used to indicate the overall quality of the taxonomic data. They can also be used to help identify the source of a problem. Percent taxonomic completeness (PTC) is calculated to document how consistently the taxonomist is able to attain the targeted taxonomic levels as specified in the SOP. It is important to note that the purpose of this evaluation approach is not to say that one taxonomist is correct over the other, but rather to make an effort to understand what is causing differences where they exist. The primary taxonomy is completed by one or more project taxonomists (T1); the re-identifications are completed as blind samples by one or more secondary, or QC taxonomists (T2) in a separate independent laboratory. The number of samples for which this analysis is performed will vary, but 10% of the total sample lot (project, program, year, or other) is an acceptable rule-of-thumb. Exceptions are that large programs (>~500 samples) may not need to do >50 samples; small programs (<~30 samples) will likely still need to do at least 3 samples. In actuality, the number of re-identified samples will be program-specific and influenced by multiple factors, such as how many taxonomists are doing the primary identification (there may be an interest in having 10% of the samples from each taxonomist re-identified), and how confident the ultimate data user is with the results. Mean values across all re-identified samples are estimates of taxonomic precision (consistency) for a dataset or a program.
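The rule-of-thumb for how many samples to re-identify can be written as a small helper. The function name and the rounding-up behavior are assumptions of this sketch; the 10% share, the floor of 3, and the ceiling of 50 follow the guidance in the paragraph above, and a program would adjust them to its own needs.

```python
def n_reidentification_samples(total_samples, percent=10, minimum=3, maximum=50):
    """Suggested number of samples for taxonomic re-identification.

    Takes percent of the sample lot (rounded up), keeps the result between
    minimum and maximum, and never exceeds the number of samples available.
    """
    n = (total_samples * percent + 99) // 100  # integer ceiling of the share
    n = max(minimum, min(n, maximum))
    return min(n, total_samples)


for lot_size in (2, 20, 200, 1000):
    print(lot_size, n_reidentification_samples(lot_size))
```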

Percent taxonomic disagreement (PTD)
The sample-based error rate for taxonomic identifications is quantified by calculation of percent taxonomic disagreement (PTD) (Table 4, formula 7). The key exercise performed by the QC analyst is determining the number of matches, or shared identifications, between the two taxonomists (Table 7). Matches must be exact; that is, a negative comparison results whether the difference is only hierarchical (genus vs. family, or other), the specimens have been assigned different names, or specimens are missing from the overall results of either T1 or T2. Error typing of individual sample comparisons is the process of classifying differences as either: a) straight disagreements, b) hierarchical differences, or c) missing specimens. While tedious, this QC exercise provides information that is extremely valuable in formulating corrective actions. An MQO of 15% has been found to be attainable by most programs, and is used for the USEPA national surveys. As testing continues and laboratories and taxonomists become more accustomed to the procedure, it is becoming apparent that the national standard could eventually be set at 10%. A standard summary report for taxonomic identification QC (Table 8) can be effectively communicated to data users.
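As an illustration of the PTD calculation, the following Python sketch assumes the form of the formula commonly given in Stribling et al. (2003), PTD = (1 - matches / N) x 100, where N is the total number of individuals in the larger of the two counts; Table 4 should be consulted for the authoritative definition. Counting matches as the smaller of the two per-taxon counts under exact name matching is a simplifying assumption (in practice the QC analyst determines matches), and the taxon names and counts are hypothetical.

```python
def count_matches(t1: dict, t2: dict) -> int:
    """Number of specimens given exactly the same name by both taxonomists.

    Assumption: for each taxon name, the number of matches is the smaller of
    the two counts. Hierarchical differences (e.g., genus vs. family) do not
    match because the names differ.
    """
    return sum(min(count, t2.get(taxon, 0)) for taxon, count in t1.items())


def ptd(t1: dict, t2: dict) -> float:
    """Percent taxonomic disagreement, assuming PTD = (1 - matches / N) * 100."""
    n_larger = max(sum(t1.values()), sum(t2.values()))
    if n_larger == 0:
        raise ValueError("both taxa lists are empty")
    # Algebraically the same as (1 - matches / N) * 100
    return 100 * (n_larger - count_matches(t1, t2)) / n_larger


# Hypothetical results for one sample (taxon name -> number of individuals)
t1 = {"Baetis": 40, "Hydropsyche": 30, "Chironomidae": 30}
t2 = {"Baetis": 38, "Hydropsyche": 30, "Polypedilum": 28, "Chironomidae": 2}

print(count_matches(t1, t2), ptd(t1, t2))
```

In this hypothetical sample most of the disagreement is hierarchical (Chironomidae vs. Polypedilum), which is the kind of pattern that error typing is intended to expose.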

Percent difference in enumeration (PDE)
Another summary data quality indicator for performance in taxonomic identification is comparison of the total number of organisms counted and reported in the sample by the two taxonomists (not the sorters). There is some redundancy of this measure with PTD, but it has proven useful in helping highlight coarse differences immediately, and is calculated as percent difference in enumeration (PDE) (Table 4, formula 8). While sorters may be well-trained, experienced, and have substantial internal QC oversight, they may not always be able to determine identifiability, the final decision of which is the responsibility of the taxonomist. It is rare to find exact agreement on sample counts between two taxonomists, but the differences are usually minimal, hence the low recommended MQO of 5%. When PDE > 5%, reasons are usually fairly obvious, and the QC analyst can turn attention directly to the error source to determine if it may be systematic, and the nature and necessity of corrective action(s).
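A minimal Python sketch of the PDE calculation follows. It assumes the form of the formula commonly given in Stribling et al. (2003), PDE = |n1 - n2| / (n1 + n2) x 100, where n1 and n2 are the total counts reported by T1 and T2; Table 4 should be consulted for the authoritative definition. The counts shown are hypothetical.

```python
MQO_PDE = 5.0  # percent; recommended MQO stated in the text


def pde(n1: int, n2: int) -> float:
    """Percent difference in enumeration, assuming |n1 - n2| / (n1 + n2) * 100."""
    if n1 < 0 or n2 < 0:
        raise ValueError("counts must be non-negative")
    total = n1 + n2
    if total == 0:
        return 0.0  # two empty samples: no difference to report
    return 100 * abs(n1 - n2) / total


# Hypothetical total counts from the primary (T1) and QC (T2) taxonomists
value = pde(212, 198)
print(round(value, 1), "exceeds MQO" if value > MQO_PDE else "within MQO")
```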

Percent taxonomic completeness (PTC)
Percent taxonomic completeness (PTC) (Table 4, formula 9) quantifies the proportion of individuals in a sample that are identified to the specified target taxonomic level (Table 1). Low values can be interpreted in a number of ways: many individuals in the sample are early instars, many are damaged with diagnostic characters missing (such as gills, legs, or antennae), or the taxonomist is inexperienced or unfamiliar with the particular taxon. MQO have not been used for this characteristic, but barring an excessively damaged sample, it is not uncommon to see PTC in excess of 97 or 98%. For purposes of QC, it is more important that the absolute difference (abs diff) in PTC between T1 and T2 be a low number, as documentation of consistency of effort; those values are typically 5-6% or below.
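The PTC calculation and the absolute difference between taxonomists can be sketched as follows. The sketch takes PTC directly from its definition in the text (individuals identified to the target level as a percentage of all individuals in the sample); the function name and the counts are hypothetical.

```python
def ptc(n_at_target: int, n_total: int) -> float:
    """Percent taxonomic completeness: share of individuals identified to target level."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_at_target <= n_total:
        raise ValueError("n_at_target must lie between 0 and n_total")
    return 100 * n_at_target / n_total


# Hypothetical counts for one sample of 300 individuals
ptc_t1 = ptc(291, 300)  # primary taxonomist
ptc_t2 = ptc(279, 300)  # QC taxonomist
abs_diff = abs(ptc_t1 - ptc_t2)

print(ptc_t1, ptc_t2, abs_diff)
```

Here both taxonomists reach the target level for most specimens, and the absolute difference falls within the 5-6% range the text describes as typical.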

Taxonomic accuracy (taxon-based)
Accuracy and bias (the inverse of accuracy) are quantitative performance characteristics for taxonomy (Table 3). Accuracy requires specification of an analytical truth, and for taxonomy that is 1) the museum-based type specimen (holotype, or other form of type specimen), 2) specimen(s) verified by recognized expert(s) in that particular taxon, or 3) unique morphological characteristics specified in dichotomous identification keys. Determination of accuracy is considered "not applicable" for production taxonomy (most often used in routine monitoring programs) because that kind of taxonomy is focused on characterizing the sample; taxonomic accuracy, by definition, would be focused on individual specimens. Bias in taxonomy can result from use of obsolete nomenclature and keys, imperfect understanding of morphological characteristics, inadequate optical equipment, or poor training. Neither of these performance characteristics is considered necessary for production taxonomy, in that they are largely covered by the estimates of precision and completeness. For example, although it is possible that two taxonomists would put an incorrect name on an organism, it is considered low probability that they would put the same incorrect name on that organism.

Data entry accuracy
Recognition and correction of data entry errors (even the one mentioned in Section 4.3) could come from one of two methods for assuring accuracy in data entry; it is not necessary to do both. One is the double entry of all data by two separate individuals, followed by a direct match between databases. Where there are differences, it is determined which database is in error, and corrections are made. The second approach is to perform a 100% comparison of all data entered to handwritten data sheets. Comparisons should be performed by someone other than the primary data entry person. When errors are found, they are hand-edited for documentation, and corrections are made electronically. The rates of data entry errors are recorded and segregated by data type (e.g., fish, benthic macroinvertebrates, periphyton, header information, latitude and longitude, physical habitat, and water chemistry). Issues could potentially arise when entering data directly into field e-tablets or laboratory computers. Because there would be no paper backup, QC checks of data entry would not be possible.
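The double-entry approach lends itself to a simple automated comparison. The following Python sketch is one possible arrangement, not a prescribed procedure: each entry is assumed to be a dictionary keyed by a hypothetical (data type, sample ID, field) tuple, and all names and values are illustrative. It lists the mismatched fields and reports an error rate segregated by data type; deciding which database is in error still requires checking the original data sheet.

```python
from collections import Counter

_MISSING = object()  # marks a field present in one entry but absent from the other


def find_mismatches(entry_a: dict, entry_b: dict) -> list:
    """Keys whose values differ between two independently entered datasets."""
    keys = set(entry_a) | set(entry_b)
    return sorted(
        k for k in keys
        if entry_a.get(k, _MISSING) != entry_b.get(k, _MISSING)
    )


def error_rate_by_type(entry_a: dict, entry_b: dict) -> dict:
    """Percent of fields that disagree, segregated by data type (first key element)."""
    keys = set(entry_a) | set(entry_b)
    totals = Counter(k[0] for k in keys)
    errors = Counter(k[0] for k in find_mismatches(entry_a, entry_b))
    return {data_type: 100 * errors[data_type] / n for data_type, n in totals.items()}


# Hypothetical double entry of the same data sheet by two people
entry_a = {
    ("benthic", "S01", "Baetis"): 40,
    ("benthic", "S01", "Hydropsyche"): 30,
    ("header", "S01", "latitude"): 34.512,
    ("header", "S01", "longitude"): -84.123,
}
entry_b = dict(entry_a)
entry_b[("benthic", "S01", "Hydropsyche")] = 80  # keying error in the second entry

print(find_mismatches(entry_a, entry_b))
print(error_rate_by_type(entry_a, entry_b))
```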

Site assessment and interpretation
Quantitative performance characteristics for site assessment and interpretation are precision, accuracy, and completeness (Table 3). Site assessment precision is based on the narrative assessments from the associated index scores (good, fair, poor) from reach duplicates and quantifies the percentage of duplicate samples that receive the same narrative assessments. These comparisons are done for a randomly selected 10% of the total sample lot. Table 9 shows this direct comparison; for this dataset, 79% of the replicates returned assessments of the same category (23 out of 29), 17% were 1 category different (5 of 29), and 3% were 2 categories different (1 of 29). Assessment accuracy is expressed using discrimination efficiency (DE) (formula 10; Table 4), a value developed during the index calibration process, which relies upon, first, specifying magnitudes of physical, chemical, and/or hydrologic stressors that are unacceptable, and identifying those sites exhibiting those excessive stressor characteristics. The set of sites exhibiting unacceptable stressor levels constitutes the analytical truth. The proportion of samples for which the biological index correctly identifies sites as impaired is DE. This is a performance characteristic that is directly suitable for expressing how well an indicator does what it is designed to do, that is, detect stressor conditions, but it is not suitable for routine QC analyses. Percent completeness (%C) is the proportion of sites (of the total planned) for which valid final assessments were obtained.
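The tally behind the assessment precision percentages can be sketched in Python as follows. The list of duplicate pairs is hypothetical and is constructed only to reproduce the counts quoted from Table 9 (23, 5, and 1 of 29); the actual category pairings in Table 9 are not shown here. The `discrimination_efficiency` function reflects a plain reading of the description of DE in the text (stressed sites correctly identified as impaired, as a percentage of all stressed sites) and should be checked against formula 10 in Table 4.

```python
from collections import Counter

CATEGORY_RANK = {"good": 0, "fair": 1, "poor": 2}


def category_differences(pairs: list) -> dict:
    """Percent of duplicate pairs whose narrative ratings differ by 0, 1, or 2 categories."""
    if not pairs:
        raise ValueError("no duplicate pairs supplied")
    diffs = Counter(abs(CATEGORY_RANK[a] - CATEGORY_RANK[b]) for a, b in pairs)
    return {d: 100 * diffs[d] / len(pairs) for d in (0, 1, 2)}


def discrimination_efficiency(n_identified_impaired: int, n_stressed_sites: int) -> float:
    """Percent of known stressed sites that the index identifies as impaired."""
    if n_stressed_sites <= 0:
        raise ValueError("n_stressed_sites must be positive")
    return 100 * n_identified_impaired / n_stressed_sites


# Hypothetical pairs arranged to match the counts reported for Table 9
pairs = [("good", "good")] * 23 + [("good", "fair")] * 5 + [("good", "poor")] * 1
percentages = category_differences(pairs)
print({d: round(p) for d, p in percentages.items()})
```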

Table 10. Key measurement quality objectives (MQO) that could be used to track maintenance of data quality at acceptable levels. (Table body not recovered; the column headings are "Performance characteristic" and "MQO", and the first row is "Field sampling precision (multimetric index)".)
Key to maintaining data quality at known and acceptable levels is establishing performance standards based on MQO. Qualitative standards, such as some of the representativeness and accuracy factors (Table 3), can be evaluated by comparing SOP and SOP application to the goals and objectives of the monitoring program. However, a clear statement of data quality expectations, such as that shown in Table 10, will help to ensure consistency of success in implementing the procedures. As a program becomes more proficient and consistent in meeting the standards, efforts could be undertaken to "tighten up" the standards. With this come necessary budgetary considerations; better precision can always be attained, but often at elevated costs.
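Tracking results against MQO can be automated in a simple way. The Python sketch below uses only the two MQO values stated in the text (15% for PTD and 5% for PDE); the remaining entries of Table 10 are not assumed. The sample IDs, results, and function name are hypothetical.

```python
# MQO values quoted in the text, in percent; other Table 10 entries are not assumed
MQO = {"PTD": 15.0, "PDE": 5.0}


def mqo_exceedances(results: dict, mqo: dict = MQO) -> dict:
    """Map each sample ID to the list of indicators that exceed their MQO."""
    flagged = {}
    for sample_id, values in results.items():
        over = [name for name, limit in mqo.items()
                if name in values and values[name] > limit]
        if over:
            flagged[sample_id] = over
    return flagged


# Hypothetical QC results for three re-identified samples
results = {
    "S01": {"PTD": 8.2, "PDE": 1.0},
    "S02": {"PTD": 21.5, "PDE": 0.4},
    "S03": {"PTD": 12.0, "PDE": 6.3},
}

print(mqo_exceedances(results))
```

Flagged samples are candidates for the error typing and corrective action steps described earlier; tightening a standard amounts to lowering the corresponding value in `MQO`.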

Comparability analysis and acceptable data quality
All discussion to this point has been directed toward documenting data quality associated with monitoring programs, hopefully with sufficient emphasis that there are no data that are right or wrong, but just that they are acceptable or not. If data are acceptable for a decision (for example, in the context of biological assessment and monitoring), a defensible statement on the ecological condition of a site or an ecological system can be made. If they are not acceptable to support that decision, likewise, the decision not to use the data should also be defensible. Routine documentation and reporting of data quality within a monitoring program provides a statement of intra-programmatic consistency, that is, sample-to-sample comparability even if collected at different temporal or spatial scales. If there is an interest in or need to combine datasets from different programs (Figure 5), it is imperative that routinely documented performance characteristics be available for each. Lack of them will preclude any determination of acceptability for decision making by data users, whether scientists, policy-makers, or the public.
Fig. 5. Framework for analysis of comparability between or among monitoring datasets or protocols.

Conclusion
If data of unknown quality are used, whether by themselves or in combination with others, the assumption is implicit that they are acceptable, and hence, comparable. We must acknowledge the risk of incorrect decisions when using such data and be willing to communicate those risks to both data users and other decision makers. The primary message of this chapter is that appropriate and sufficient QC activities should be a routine component of any monitoring program, whether it is terrestrial or aquatic, focuses on physical, chemical, and/or biological indicators, and, if biological, whether it includes macroinvertebrates, algae/diatoms, fish, broad-leaf plants, or other organism groups.

References
Also called standard error of estimate, root mean square error (RMSE) is an estimate of the standard deviation of a population of observations and is calculated by:

Fig. 4. Defining representativeness of a sample or datum first requires specifying the spatial and/or temporal scale of the feature it is intended to depict.

Table 1. In this example list of hierarchical target levels, all taxa are targeted for identification to genus level, unless otherwise noted. Taxa with target levels in parentheses are left at that level.

Table 2. [...] benthic macroinvertebrates. Shown are those developed and calibrated for streams in the "mountains" site class of the state of Montana, USA (Montana DEQ 2006, Stribling et al. 2008b).
concept of process control.

Table 5. Precision estimates for sample-based benthic macroinvertebrate metrics, and composite multimetric index.

Table 6. Percent sorting efficiency (PSE) as laboratory sorting/subsample quality control check. Results from 2006-2008 sampling for a routine monitoring program in north Georgia, USA.

Table 7. Summary table for sample by sample taxonomic comparison results, from routine biological monitoring in US state of Mississippi. T1 and T2 are the primary and QC taxonomists, respectively. "No. matches" is the number of individual specimens counted and given the same identity by each taxonomist, and PDE, PTD, and PTC are explained in text. Target level is the number and percentage of specimens identified to the SOP-specified level of effort (see Table 3 as an example); "Abs diff" is the absolute difference between the PTC of T1 and T2.

Table 8. Taxonomic comparison results from a bioassessment project in the US state of Mississippi.
APHA. 2005. Standard Methods for the Examination of Water and Wastewater. 21st edition. American Public Health Association, American Water Works Association, and Water Environment Federation, Washington, DC.
Barbour, M.T., & J. Gerritsen. 1996. Subsampling of benthic samples: a defense of the fixed count method. Journal of the North American Benthological Society 15:386-391.