Open access peer-reviewed chapter

Managing Quantities and Units of Measurement in Code Bases

Written By

Steve McKeever

Reviewed: 12 September 2022 Published: 25 October 2022

DOI: 10.5772/intechopen.108014

From the Edited Volume

Updates on Software Usability

Edited by Laura M. Castro

Abstract

Quantities in engineering and the physical sciences are expressed in units of measurement (UoM). If a software system fails to maintain the algebraic attributes of a system’s UoM information correctly when evaluating expressions, then disastrous problems can arise. However, it is perhaps the more mundane unit mismatches and lack of interoperability that, over time, incur a greater cost. Global and existential challenges, from infectious diseases to environmental breakdown, require high-quality data. Ensuring software systems support quantities explicitly is becoming less of a luxury and more of a necessity. While there are technical solutions that allow units of measurement to be specified at both the model and code level, a detailed assessment of their strengths and weaknesses has only recently been undertaken. This chapter provides both a formal introduction to managing quantities and a practical comparison of existing techniques, so that software users can judge the robustness of their systems with regard to units of measurement.

Keywords

  • units of measurement
  • quantities
  • dimension checking
  • unit conversion
  • libraries
  • component based checking

1. Introduction

With ubiquitous digitalisation, and the removal of humans from the loop, the need to faithfully represent and manipulate quantities in physical systems is ever increasing [1]. Popular programming languages allow developers to describe how to evaluate numeric expressions, but not how to detect inappropriate operations on quantities. Consequently, there have been infamous examples, such as the Mars Climate Orbiter [2], where omitted units of measurement (UoM) conversions led to catastrophic outcomes.

Humans have used local units of measurement since the days of early trade, refining them over time to fulfil the accuracy and interoperability needs of science and technology. In the 19th century, James Clerk Maxwell [3] introduced the concept of a system of quantities with a corresponding system of units. This generalisation allowed scientists working with different measurement systems to communicate more easily, as unit names (such as inch or metre) are treated as numeric variables and can be interchanged through multiplication (for example, 1 inch = 0.0254 metre).

There are many ways in which manipulating physical quantities in a digital system can be made more robust, enabling the developer to depend on an automated validator, rather than on trust, to ensure UoM are handled correctly. The software engineering benefits of embracing quantity checking and automatic conversion support are beyond dispute. It is clear from the various known UoM failures that one size does not fit all. However, a general lack of awareness has meant that developers often reinvent the wheel or forgo any kind of checking. From a correctness perspective, the optimal solution would be to support quantities natively, as this allows for efficient unit conversion and static checking. However, none of the mainstream languages provide such support, nor is that level of rigour always required. A software library might seem to confer the desired functionality of adding a unit type to one’s preferred language, but libraries are inconvenient in practice, adding an extra layer to one’s code base and incurring a performance cost. All popular programming languages have a multitude of freely available quantity libraries, but we argue that these are best suited to applications in which UoM checking is required at run-time. Component based checking and black-box testing are two lightweight methods of providing a degree of robustness while sacrificing completeness. Component checking will only validate the interfaces between components, while testing will only ensure that known examples of methods dealing with quantities perform correctly. Neither is comprehensive.

Implicit in our presentation is that the various approaches involve compromises. For users who need a degree of robustness when using quantities in their code bases, this discussion is very important. An approach that initially might seem suitable for a given project could be too costly in terms of speed, too cumbersome in terms of use, or too risky by creating an unnecessary dependency on some library. In Section 2 we discuss early attempts to incorporate UoM checking into programming languages, and recent attempts to incorporate quantities into modelling languages. In Section 3 we introduce a very simple assignment language to show how the various aspects of quantities are defined and validated. In Section 4 we discuss the various approaches to providing quantity support in programming environments, highlighting their strengths and weaknesses. We summarise the results of our comparative study in Section 5, providing suggestions for developers as to which method to choose depending on their requirements, and enabling software users to assess the pertinent aspects of an implementation that claims UoM support.

2. Background

Dimensions are physical quantities that can be measured, while units are arbitrary labels that correspond to a given dimension to make it relative. For example, length is a dimension, whereas the metre is a unit that describes length. Units of measure can be defined in the most generic form as either base quantities or derived quantities. The base quantities are the basic building blocks, and the derived quantities are built from them. For instance, in the International System of Units (SI), also known as the metric system, the base unit for time is the second and that for length is the metre. The SI is based on seven base quantities: length, mass, time, electric current, thermodynamic temperature, amount of substance and luminous intensity [4]. Velocity (metre/second, or metre × second$^{-1}$) is a derived quantity made from two base quantities, length and time. Rather than representing UoM as a tree structure, a normal form exists which makes storage and comparison a lot easier. Any unit in a system can be derived from the base units as a product of powers of those base units, $\text{base}_1^{e_1} \times \text{base}_2^{e_2} \times \cdots \times \text{base}_n^{e_n}$, where the exponents $e_1, \ldots, e_n$ are rational numbers. Thus an SI unit can be represented as a 7-tuple $(e_1, \ldots, e_7)$, where $e_i$ denotes the exponent of the $i$-th base unit; in our case $e_1$ denotes length, $e_2$ mass, $e_3$ time and so on.
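The normal form can be sketched in a few lines of Python. This is a minimal illustration rather than any library's API: the names `dim`, `dim_mul`, `dim_div` and `dim_pow` are hypothetical, and the ordering of the seven base quantities in the tuple is an assumption that follows the text (length first, then mass, then time). `Fraction` keeps the exponents rational.

```python
from fractions import Fraction

# Assumed ordering of the seven SI base quantities in the exponent tuple.
BASE = ("length", "mass", "time", "current", "temperature",
        "amount", "luminous_intensity")


def dim(**exponents):
    """Normal form of a unit: a 7-tuple of rational exponents, one per base quantity."""
    unknown = set(exponents) - set(BASE)
    if unknown:
        raise ValueError(f"unknown base quantities: {sorted(unknown)}")
    return tuple(Fraction(exponents.get(b, 0)) for b in BASE)


def dim_mul(d1, d2):
    # A product of quantities adds the exponents base by base.
    return tuple(a + b for a, b in zip(d1, d2))


def dim_div(d1, d2):
    # A quotient subtracts them.
    return tuple(a - b for a, b in zip(d1, d2))


def dim_pow(d, k):
    # Raising to a rational power scales every exponent.
    return tuple(e * Fraction(k) for e in d)


LENGTH = dim(length=1)
TIME = dim(time=1)
VELOCITY = dim_div(LENGTH, TIME)   # metre x second^-1
AREA = dim_mul(LENGTH, LENGTH)     # metre^2
```

Because every unit reduces to a tuple of the same shape, comparing units is plain equality: `VELOCITY == dim(length=1, time=-1)` holds, and `dim_pow(AREA, Fraction(1, 2)) == LENGTH` shows why the exponents are allowed to be rational.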

Proposals to add units to conventional programming languages date back to the 1970s [5] and early 1980s, with extensions to Fortran [6] and then Pascal [7]. However, these efforts were heavily syntax based and required modifications to the underlying languages, reducing backwards compatibility and thus uptake. Ada’s abstraction facilities, namely operator overloading and type parameterisation, allowed for a more versatile approach [8] to labelling variables with UoM features. With the appearance of practical object oriented programming languages, such as C++ and Java, developers began to implement UoM either by means of a class hierarchy of units and their derived forms, or via the Quantity pattern [9]. This has led to a veritable explosion in the number of UoM libraries based on this pattern, available for all popular programming languages [10].

Software development typically begins at a more abstract level, through diagrams and rules that focus on the conceptual model that is to be implemented. Extensions to the Unified Modelling Language (UML) have been proposed to support quantities. SysML, for instance, is defined as an extension of a subset of UML to support systems engineering activities and has extensive support for quantities¹. Unit checking and conversion can be undertaken before code is generated, either through a compilation workflow that leverages Object Constraint Language (OCL) expressions [11] or through staged computation [12]. Unless such a workflow has been created specifically, declaring quantities in a system specification language offers no guarantee that the UoM information is supported in the eventual implementation. The aim of this chapter is to discuss the various ways in which quantity information can be transferred into software and errors detected either at compile-time or run-time. Motivated by a prior critique of UoM libraries and a survey of scientific coders [13], we assess various approaches based on their ease of use, execution speed, numeric accuracy, ease of integration and coverage of unit error detection capabilities.

We lack a definitive understanding of how frequently quantity errors occur in practice. Anecdotally, however, experiments described in the literature suggest that the frequency is not negligible. When applied to a repository of CellML models, a validation tool [14] found that 60% of the descriptions that were invalid had dimensionally inconsistent units. A spreadsheet checker [15] was applied to 22 published scientific spreadsheets and detected errors in 3 of them, nearly 14%. Ore [16] applied his lightweight C++ unit inconsistency tool to 213 open-source systems, finding inconsistencies in 11% of them. It must be noted that these figures are gleaned from post-development studies and are not representative of a software discipline that adheres to quantities. Thus, it seems important to ensure that UoM information existing in software models is supported in derived implementations.

3. Validating quantities

Performing calculations in relation to quantities, dimensions and units is subtle and can easily lead to mistakes. We shall begin by looking at a very simple language of declarations and assignments so that we can untangle the various aspects involved in managing quantities correctly. A program will consist of a sequence of UoM variable declarations, uv, followed by a sequence of assignment statements, ustmt. Unit arithmetic expressions, uexp, impose syntactic restrictions so that their soundness can be inferred using the algebra of quantities.

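$$
\begin{aligned}
\mathit{ustmt} &::= \mathit{uv} := \mathit{uexp} \\
\mathit{uexp} &::= \mathit{uv} \mid \mathit{uexp}_1 + \mathit{uexp}_2 \mid r * \mathit{uexp} \mid \mathit{uexp}_1 * \mathit{uexp}_2
\end{aligned}
$$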

By creating a separate syntax for unit expressions we can distinguish between scalar values, such as r, and unitless quantities in which all the dimensions are zero, such as moisture content. Consider a simple program to calculate Newton’s second law of motion:

begin
  f : float;
  m : float of 5.7;
  a : float of 3.2;
  …
  f := m * a
end

We can use the evaluate function, E of Figure 1, with an environment consisting of values for m and a to calculate f:

Figure 1.

Rules for evaluating expressions.

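$$E(m * a)\,\sigma = E(m)\,\sigma \times E(a)\,\sigma = \sigma(m) \times \sigma(a) = 5.7 \times 3.2 = 18.24, \quad \text{where } \sigma = \{m \mapsto 5.7,\ a \mapsto 3.2\}$$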

As is the case in nearly all programming languages, users have to assume that the mass (m) is given in kilograms and the acceleration (a) in metres per second per second for the assignment to be correct. The remainder of this section explores the various aspects involved in handling quantities correctly, and how these aspects can be automated.
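To make the evaluation function E concrete, here is a minimal Python sketch. The tuple encoding of expressions and the names `evaluate` and `run` are hypothetical; the point is that E computes with bare numbers, so nothing in it records what the units of `m` or `a` are.

```python
# Hypothetical encoding of the unit-expression syntax as nested tuples:
#   ("var", name) | ("+", e1, e2) | ("scale", r, e) | ("*", e1, e2)

def evaluate(e, env):
    """E: the numeric value of expression e in environment env (a dict)."""
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "+":
        return evaluate(e[1], env) + evaluate(e[2], env)
    if tag == "scale":
        return e[1] * evaluate(e[2], env)
    if tag == "*":
        return evaluate(e[1], env) * evaluate(e[2], env)
    raise ValueError(f"not a unit expression: {e!r}")


def run(initial, stmts):
    """Execute a sequence of (target, uexp) assignments from initial values."""
    env = dict(initial)
    for target, expr in stmts:
        env[target] = evaluate(expr, env)
    return env


# Newton's second law: f := m * a, with m = 5.7 and a = 3.2
result = run({"m": 5.7, "a": 3.2}, [("f", ("*", ("var", "m"), ("var", "a")))])
```

Here `result["f"]` is approximately 18.24 whatever units `m` and `a` were actually measured in.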

3.1 Dimensions

A dimensional analysis needs to ensure that (1) two physical quantities can only be equated if they have the same dimensions; (2) two physical quantities can only be added if they have the same dimensions (known as the Principle of Dimensional Homogeneity); and (3) the dimensions of the product of two quantities are given by the sum of the dimensions of the two quantities. If we only consider the three common dimensions of length, mass and time, writing each dimension as a tuple of exponents (length, mass, time), then we can capture the rules for addition and multiplication:

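$$
\begin{aligned}
(l_1, m_1, t_1) \mathbin{\hat{+}} (l_2, m_2, t_2) &= (l_1, m_1, t_1), \quad \text{if } l_1 = l_2 \wedge m_1 = m_2 \wedge t_1 = t_2 \\
(l_1, m_1, t_1) \mathbin{\hat{\times}} (l_2, m_2, t_2) &= (l_1 + l_2,\ m_1 + m_2,\ t_1 + t_2)
\end{aligned}
$$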

This allows us to rewrite the rules of Figure 1, replacing the addition and multiplication operators with $\hat{+}$ and $\hat{\times}$, to create a dimensional checker, shown in Figure 2. Scalar multiplication does not affect the dimensions of a quantity. The dimension of mass, m, is described as (0,1,0), while acceleration is length × time$^{-2}$, or (1,0,-2) as a tuple. Our dimensional checker will compute with dimensions and attempt to ensure all assignments are correct. Consider our example program:

Figure 2.

Dimensional analysis rules for expressions.

begin
  f : float of (1,1,-2);
  m : float of (0,1,0);
  a : float of (1,0,-2);
  …
  f := m * a
end

Checking would proceed as follows:

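$$
\begin{aligned}
\mathit{DE}(m * a)\,\sigma &= \mathit{DE}(m)\,\sigma \mathbin{\hat{\times}} \mathit{DE}(a)\,\sigma = \sigma(m) \mathbin{\hat{\times}} \sigma(a) \\
&= (0, 1, 0) \mathbin{\hat{\times}} (1, 0, -2) = (1, 1, -2)
\end{aligned}
\quad \text{where } \sigma = \{m \mapsto (0, 1, 0),\ a \mapsto (1, 0, -2)\}
$$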

and, as this dimension matches that of f, we know that the assignment is correct. Most UoM checkers adopt this approach, extending the checking into the statements and function calls of typical programming language constructs. For instance, all branches of conditionals and case statements must have the same dimensions, while comparison operators can only operate on quantities of the same dimension. If the dimensions of all variables are known at compile-time, then this process can be undertaken before the program runs.
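The dimensional checker DE can be sketched by keeping the shape of E and swapping numeric addition and multiplication for their dimension counterparts. This is an illustrative Python sketch under assumed names (`check`, `check_assign` and the tuple encoding of expressions are hypothetical), using the (length, mass, time) triples of this section.

```python
# Dimensions are (length, mass, time) exponent triples, as in this section.
# Expressions use a hypothetical tuple encoding:
#   ("var", name) | ("+", e1, e2) | ("scale", r, e) | ("*", e1, e2)

def dim_add(d1, d2):
    # Principle of dimensional homogeneity: only equal dimensions may be added.
    if d1 != d2:
        raise TypeError(f"cannot add dimensions {d1} and {d2}")
    return d1


def dim_mul(d1, d2):
    # Multiplication adds exponents component-wise.
    return tuple(x + y for x, y in zip(d1, d2))


def check(e, denv):
    """DE: the dimension of e, given the declared dimension of each variable."""
    tag = e[0]
    if tag == "var":
        return denv[e[1]]
    if tag == "+":
        return dim_add(check(e[1], denv), check(e[2], denv))
    if tag == "scale":
        return check(e[2], denv)  # a scalar r leaves the dimension unchanged
    if tag == "*":
        return dim_mul(check(e[1], denv), check(e[2], denv))
    raise ValueError(f"not a unit expression: {e!r}")


def check_assign(target, e, denv):
    # Rule (1): a quantity may only be equated with one of the same dimension.
    rhs = check(e, denv)
    if rhs != denv[target]:
        raise TypeError(f"{target} has dimension {denv[target]}, but the expression has {rhs}")
    return rhs


denv = {"f": (1, 1, -2), "m": (0, 1, 0), "a": (1, 0, -2)}
```

`check_assign("f", ("*", ("var", "m"), ("var", "a")), denv)` returns `(1, 1, -2)`, while changing `*` to `+` raises `TypeError`. Only declared dimensions are consulted, never values, which is why this kind of check can run before the program does.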

3.2 Kinds of quantities

Two values that share the same UoM might not represent the same kinds of quantities (KOQ) [17]. For example, torque is a rotational force which causes an object to rotate about an axis, while work is the result of a force acting over some distance; yet both are measured in newton metres (kg·m²·s⁻²). Similarly, surface tension can be described in newtons per metre or, equivalently, in kilograms per second squared, yet other kinds of quantity, such as the stiffness of a spring, share that unit.

We have recently developed a simple set of rules for arithmetic and function calls that allow quantities to be named and handled safely [18]. This is not as straightforward as preserving the names of quantities throughout the program text. Multiplication will generate a new quantity, so it is very likely that information is lost in intermediate stages of a calculation. Moreover, not all quantity variables in a program will have a name such as Torque or Work. Some might denote an entity such as a length that could be in metres or yards, while another might be a variable used to store some temporary value. Neither of these needs to be named. Using an algebraic data type, we define named quantities as:

type quantname = Named of string | Noname

We can now define the rules for adding and multiplying named quantities. In both cases we assume that the unit expression is dimensionally correct; our concern is to define how named quantities behave. The operator $\oplus$ takes two named quantities and states the conditions under which they can be summed: two named quantities can be added together only if they represent the same entity; if one quantity is named but the other is not, then the result must be named; and if both are unnamed, then the result will be too:

$$
\begin{aligned}
\textit{Named}\ n_1 \oplus \textit{Named}\ n_2 &= \textit{Named}\ n_1, \quad \text{if } n_1 = n_2 \\
\textit{Named}\ n \oplus \textit{Noname} &= \textit{Named}\ n \\
\textit{Noname} \oplus \textit{Named}\ n &= \textit{Named}\ n \\
\textit{Noname} \oplus \textit{Noname} &= \textit{Noname}
\end{aligned}
$$

Our comparison rules cast upwards from Noname to Named, so as to assume a named quantity whenever possible. This is required to ensure named quantities behave correctly. For multiplication the rules are simpler. The operator Δ takes two named quantities and defines how they behave under multiplication. As multiplication sums the dimensions of the two operands, the result will in general be a different quantity from either, and so the result is always $\_ \mathbin{\Delta} \_ = \textit{Noname}$. The rules for named quantity analysis of expressions are given in Figure 3.

Figure 3.

Named quantity rules for unit expression.

Because multiplication discards names, we propose that functions whose return KOQ is known are used to regain information when calculations use multiplication. In this manner, a discipline of programming with quantities is suggested. We can now illustrate how our approach behaves on a simple incorrect assignment, where we try to add a value of torque to one of work:

begin
  sumt : float of Named Torque;
  t : float of Named Torque;
  w : float of Named Work;
  …
  sumt := t + w
end

Checking would start as follows:

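$$
\begin{aligned}
\mathit{NE}(t + w)\,\sigma &= \mathit{NE}(t)\,\sigma \oplus \mathit{NE}(w)\,\sigma = \sigma(t) \oplus \sigma(w) \\
&= \textit{Named Torque} \oplus \textit{Named Work}
\end{aligned}
\quad \text{where } \sigma = \{t \mapsto \textit{Named Torque},\ w \mapsto \textit{Named Work}\}
$$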

and as we cannot proceed, we deduce that the assignment is unsafe. None of the current methodologies for managing quantities perform this kind of quantity checking.
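The named-quantity rules can be sketched in the same way. In this hypothetical Python encoding, `None` plays the role of Noname and a string such as `"Torque"` stands for Named Torque; `check_named` and the treatment of scalar multiplication (assumed here to keep the name) are illustrative assumptions rather than a transcription of Figure 3.

```python
# A named quantity is a string (Named n) or None (Noname); hypothetical encoding.
# Expressions reuse the tuple encoding:
#   ("var", name) | ("+", e1, e2) | ("scale", r, e) | ("*", e1, e2)

def named_add(q1, q2):
    """Addition rules: equal names are kept, Noname is cast up to Named."""
    if q1 is not None and q2 is not None:
        if q1 != q2:
            raise TypeError(f"cannot add Named {q1} to Named {q2}")
        return q1
    # Named n + Noname, Noname + Named n, and Noname + Noname
    return q1 if q1 is not None else q2


def named_mul(q1, q2):
    # A product is a new quantity, so its name is always lost.
    return None


def check_named(e, qenv):
    """NE: the named quantity of e, assuming e is already dimensionally correct."""
    tag = e[0]
    if tag == "var":
        return qenv[e[1]]
    if tag == "+":
        return named_add(check_named(e[1], qenv), check_named(e[2], qenv))
    if tag == "scale":
        # Assumption: scaling by a plain number keeps the name.
        return check_named(e[2], qenv)
    if tag == "*":
        return named_mul(check_named(e[1], qenv), check_named(e[2], qenv))
    raise ValueError(f"not a unit expression: {e!r}")


qenv = {"sumt": "Torque", "t": "Torque", "w": "Work", "tmp": None}
```

As in the derivation, `check_named(("+", ("var", "t"), ("var", "w")), qenv)` raises `TypeError`, while adding `t` to the unnamed temporary `tmp` yields `"Torque"` and any product yields `None`.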

3.3 Systems of Units

Dimension analysis would be sufficient if only one unit system, such as the SI system, were required. In such cases the base units of metre, kilogram and second could be implicit in implementations. There are, however, two further complications. First, a system of units provides a set of base units, but these can be extended with prefixes. For instance, the SI system allows lengths to be described in terms of the base unit metre and prefixes such as kilo- and centi-, while the Imperial system has the foot as a base unit, on which the inch and the yard are constructed. Second, several systems of units coexist. Consequently we need to consider normalising values within a system of units, and converting between systems, so that dimensionally faithful expressions evaluate correctly with different units of measurement, as shown in Figure 4.

Figure 4.

Conversion within and in-between systems of units.

3.3.1 Normalising within a system of units

We can extend our tuple for dimensions to include prefixes, which are used to avoid very large or very small numeric values. Speed is typically shown in kilometres per hour, which we could represent in our 3-tuple as ((kilometre,1),(gram,0),(hour,-1)). This approach simplifies normalisation: a kilometre can be converted into metres by multiplying the value by 1000, and an hour is 3600 seconds. Consider the dimensionally correct assignment below, where the mass, m, is given in grams:

begin
  f : float of ((metre,1),(kilogram,1),(second,-2));
  m : float of ((metre,0),(gram,1),(second,0));
  a : float of ((metre,1),(gram,0),(second,-2));
  …
  f := m * a
end

A UoM analysis would reveal that the force, f, is expected in newtons, so the value for m needs to be divided by 1000: f := (m/1000) * a. It is straightforward to normalise within a given system of units by aligning the prefixes. However, we also need to take the exponent into account. For example, the density of iron is 7.86 g/cm³, and can be converted into kg/m³ as follows:

$$7.86 \times C(\text{gram}, \text{kilogram}) \times C(\text{centimetre}, \text{metre})^{-3} = 7.86 \times 0.001 \times 0.01^{-3} = 7860\ \text{kg/m}^3$$

where we assume the existence of a function C that takes in two prefixes and returns the conversion factor.
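A conversion function C within one system can be sketched as a lookup of each unit's size in a reference unit. This Python sketch is an illustration under assumptions: the table `TO_REFERENCE` and the helper `normalise` are hypothetical, and each factor is raised to the exponent the unit carries, which is what makes the density example work.

```python
# Hypothetical table: the size of each unit in its dimension's reference unit.
TO_REFERENCE = {
    "metre": 1.0, "centimetre": 0.01, "kilometre": 1000.0,
    "kilogram": 1.0, "gram": 0.001,
    "second": 1.0, "hour": 3600.0,
}


def C(src, dst):
    """Conversion factor from unit src to unit dst (same dimension assumed)."""
    return TO_REFERENCE[src] / TO_REFERENCE[dst]


def normalise(value, changes):
    """Apply unit changes given as (src, dst, exponent) triples.

    Each factor is raised to the exponent the unit carries, so
    g/cm^3 -> kg/m^3 multiplies by C(gram, kilogram)^1 * C(centimetre, metre)^-3.
    """
    for src, dst, exponent in changes:
        value *= C(src, dst) ** exponent
    return value


iron = normalise(7.86, [("gram", "kilogram", 1), ("centimetre", "metre", -3)])
speed = normalise(36.0, [("kilometre", "metre", 1), ("hour", "second", -1)])
```

`iron` evaluates to approximately 7860 (kg/m³), `speed` converts 36 km/h to approximately 10 m/s, and `C("gram", "kilogram")` is the 0.001 factor applied to m in f := (m/1000) * a.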

3.3.2 Converting between two Systems of Units

A second issue is that of converting between two systems of units. This can be achieved by extending the function C to convert between known systems, such as from yards to metres and vice versa. Consider an example similar to the Mars Climate Orbiter error: here we want to calculate momentum, p, as mass times velocity, in newton seconds (N·s), or kg·m·s⁻¹. However, our mass is in pounds:

begin
  p : float of ((metre,1),(kilogram,1),(second,-1));
  m : float of ((metre,0),(pound,1),(second,0));
  v : float of ((metre,1),(gram,0),(second,-1));
  p := m * v
end

The conversion function C(pound, kilogram) yields approximately 0.45 (one pound is defined as exactly 0.45359237 kg), so we need to modify the assignment for the result to be correct, namely p := 0.45 * m * v.
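Converting between systems can reuse the same idea if every unit's factor is expressed against SI, so that C(pound, kilogram) is simply a ratio of two table entries. The table and function names below are hypothetical; the pound, yard and foot values are their exact international definitions.

```python
# Hypothetical table: each unit's size in the SI unit of its dimension.
# The international pound (0.45359237 kg), yard (0.9144 m) and
# foot (0.3048 m) are exact by definition.
TO_SI = {
    "kilogram": 1.0, "gram": 0.001, "pound": 0.45359237,
    "metre": 1.0, "yard": 0.9144, "foot": 0.3048,
}


def C(src, dst):
    """Conversion factor from src to dst, routed through SI so it works across systems."""
    return TO_SI[src] / TO_SI[dst]


def momentum(m_pounds, v_metres_per_second):
    # p := C(pound, kilogram) * m * v, giving kg·m·s^-1 (N·s)
    return C("pound", "kilogram") * m_pounds * v_metres_per_second


p = momentum(10.0, 2.0)
```

With m = 10 lb and v = 2 m/s, `momentum` returns approximately 9.07 N·s; omitting the factor would overstate the momentum by a factor of about 2.2.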

A naive unit conversion algorithm will convert the units of the right-hand side to those of the left-hand side for each sub-expression when performing addition or multiplication, but this is rarely the most efficient approach. The algorithm described by Cooper [14] is applied at compile-time. It chooses the least number of conversions by generating all possible valid UoM conversions for a given expression and selecting the one with the fewest conversions. Finally, many non-SI units continue to be used in the scientific, technical, and commercial literature. Some units are deeply rooted in history and culture; we use the term week rather than 168 hours. These also need to be supported through conversion functions.

3.4 Levels of measurement

A further aspect is that of the operations that might be applicable to a given quantity. To a physicist or applied mathematician, it is taken for granted that a quantity is used in the same manner as a unit-independent value, and that all arithmetic and comparison operators can be applied to it. However, this is often not the case. For instance, what operations apply to a person’s IQ? It does not make sense to multiply scales of intelligence or personality traits.

Stevens [19] identified four categories of scale that place limits on the operations that can be used to construct valid terms:

  1. Nominal Scales represent the most unrestricted assignment of numerals. The only operations that apply to values on a nominal scale are equality and inequality; essentially, numerals are only used as labels.

  2. Ordinal Scales permit rank ordering. The classic example of an ordinal scale is the scale of hardness of minerals. Most of the scales used by psychologists are ordinal scales. Operations such as greater-than and less-than are also applicable to ordinal scales. In the strictest sense, statistical operations involving means and standard deviations should not be used with these scales, as they imply a knowledge of something more than the rank order of data. This is because the successive intervals on the scale are not equal in size, but the ‘illegal’ application of statistical functions is often very useful, as in the average IQ of a certain population.

  3. Interval Scales are what we would consider quantitative and allow addition and subtraction, so all the usual statistical measures are applicable. The zero point on an interval scale is a matter of convention. Centigrade and Fahrenheit both represent volumes of expansion, with an arbitrary zero for each scale, such that a numerical value on one scale can be converted into a value on the other.

  4. Ratio Scales are most commonly encountered in physics and include the previous three relations, along with multiplication and division. An absolute zero is always implied.

Currently, no system employs level-of-measurement checking to ensure that only meaningful operations are applied to certain quantities. We suspect that this is due to even less awareness of scale levels, and an assumption that quantity checking is only necessary for ratio scales. Hall [20] has developed a UML class diagram that captures the inheritance relationship between the four levels, enabling operations to be restricted to the most general of a pair of operands. As discussed with the ordinal scale, there is a pragmatic context that needs to be considered when using levels of measurement.
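A level-of-measurement check could be layered on top of the other analyses. The following Python sketch is our own illustration, not an existing system: `Level`, `MIN_LEVEL` and `allowed` are hypothetical names, the operator table follows Stevens' list above, and, on our reading of Hall's proposal, an operation is checked against the more general (lower) level of its two operands.

```python
from enum import IntEnum


class Level(IntEnum):
    """Stevens' levels; a higher level admits every operation of the lower ones."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each operation is meaningful, following Stevens' list.
MIN_LEVEL = {
    "==": Level.NOMINAL, "!=": Level.NOMINAL,
    "<": Level.ORDINAL, ">": Level.ORDINAL,
    "+": Level.INTERVAL, "-": Level.INTERVAL,
    "*": Level.RATIO, "/": Level.RATIO,
}


def allowed(op, level1, level2):
    """Check op against the more general (lower) of the two operands' levels."""
    return min(level1, level2) >= MIN_LEVEL[op]
```

For example, `allowed("<", Level.ORDINAL, Level.RATIO)` is true, while `allowed("*", Level.INTERVAL, Level.RATIO)` is false. A pragmatic system would probably need an escape hatch for the ‘illegal’ but useful statistics mentioned for ordinal scales.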

3.5 Summary of quantity checking

Through the use of an illustrative expression language, we have shown how to ensure programs correctly manipulate the dimensions, the named quantities and the UoM of values. In doing so we have raised two important issues relating to the implementation of quantities: at what point checking and conversions can occur, and how extensive the coverage will be. If all UoM variables are annotated, or their annotations inferred, then both dimension checking and unit conversions can be undertaken by the compiler. This means that programs with UoM errors will be detected early, before the system is deployed, creating a language that is strongly typed with respect to UoM. Moreover, the code can be optimised to have the least number of conversions, reducing rounding errors and increasing the accuracy of its calculations. The technique can be extended to include assertions on allowable unit conversions. If UoM are only known at run-time, or their design is embedded within the host language, then dimension checking and unit conversions will be undertaken at run-time. Programs will still manipulate UoM correctly, but with a performance penalty, and errors will only be detected once running. One must also consider the annotation burden: one study [21] found that subjects chose a correct UoM annotation only 51% of the time and took an average of 136 seconds to make a single correct annotation.

4. Implementing quantities

Many constructs in software models can be directly translated into code. A UML class diagram, for example, can be used to build the class structure of an object oriented implementation. The situation for UoM annotations is more complex. UoM values are neither primitive nor reference types in modern object oriented terminology. As described in Section 3, they require an advanced checker to ensure variables and method calls are handled soundly by the compiler.

In this section we shall look at the four practical methods that support unit checking of code bases. All implementation options are affected by the following three concerns [13]: lack of awareness, cumbersome implementations and lack of support from the given software ecosystem. Software rarely lives in a vacuum, so even if it has been designed and developed with one of these methods, associated components, such as legacy libraries, databases and spreadsheets, are unlikely to support UoM. Nonetheless, as sources of data are migrated to quantity aware formats, the software must be able to follow suit.

4.1 Native language support

Adding unit checking to conventional imperative, object-oriented and even functional languages using syntactic sugaring is beyond the algorithmic scope of their underlying type checkers. The pioneering foundational work of Wand and O’Keefe [22] demonstrated how the simply-typed lambda calculus could be extended with dimensions. Milner’s polymorphic type inference algorithm [23] is symbolic, so in order to include UoM variable resolution one has to provide an equational solver based on the theory of Abelian groups [24]. A programming language with native support will include additional syntax for UoM and an enhanced static analyser that calculates or infers the validity of annotations. When UoM annotations are validated prior to compilation, errors can be detected early and generated arithmetic code can minimise the number of conversions, mitigating round-off errors.

The only language among the 20 most popular programming languages [25] that supports units of measurement is Apple’s Swift language [26]. Microsoft’s F# [27] is a general purpose functional language with a full implementation of UoM, including unit variables that the static checker will seek to determine at compile-time [24]. This property permits valid UoM programs to be written in which not all the quantity variables are annotated, thereby reducing the annotation burden and allowing greater reusability. However, large software projects rarely use either Swift or F#.

4.1.1 C++ boost library

C++ is still very popular [25] and has a de facto UoM extension, BoostUnits, that exploits its template meta-programming feature². Consequently BoostUnits is more than just a library, as it supports a staged computation model, similar to MixGen [12], which has the benefit of providing a language extension while still supporting backwards compatibility. C++ is unusual among popular programming languages in supporting this adaptable compilation strategy. Dimensional analysis is treated as a generic compile-time meta-programming problem, and delivers features and performance comparable to native language support. As no run-time penalty is incurred, BoostUnits supports UoM checking in performance-critical code. However, the survey [28] found both usability and accuracy issues with the use of this library.

4.1.2 Unit of measurement validators

Creating a new compiler feature for an existing language is contentious, non-trivial and likely to become outmoded. An alternative static approach is to define UoM through comments or attributes and to build a tool that attempts to perform as much scrutiny as possible. Such a validator checks at compile-time for unit violations without adding new syntax or changing the run-time behaviour of the code. Figure 5 shows an example of the Unit of Measurement Validator for C# [29]. The Osprey [30] system is a C front end that automatically checks for all potential quantity errors. UoM are annotated with a $ and are modelled as types, reducing dimensional analysis to type checking, with Gaussian elimination used to resolve unspecified UoM variable exponents. Similarly, [31] allows one to express relationships between the units of function parameters and return values, and the validity of unit conversions can also be specified. PUnits [32] is a Java front end, or pluggable type system, that has many additional features. It can be used in three modes: checking the correctness of a program, solving UoM type variables, and annotating a program with units. This last feature is the most novel: it allows human inspection, which is useful since having a valid typing does not guarantee that the inferred specification expresses design intent, and it allows KOQs to be expressed. These approaches are lightweight and scalable, but they need ongoing support for users to feel that they are credible. Alas, few of these tools have lifespans outside of their original research project.

Figure 5.

Example of unit of measurement validator for C#.

4.1.3 Domain-specific languages (DSLs) with quantities

DSLs are languages specialised to a particular application domain. They are often written in a mark-up language such as XML or JSON, which facilitates their use with general purpose languages as they can be easily parsed. They might have originally been designed for the purpose of curation, such as CellML and SBML, where the intention was to build biological repositories of computational models. A model will include the constants, variables and equations denoting a particular biological system. Thus, translating them into a programming language with the aid of a differential equation solver means that they can be readily simulated [33]. If the DSL contains UoM declarations, then the source files can be analysed separately before they are uploaded to a repository and translated into the run-time system [14]. Quantity checking coverage might be limited to the application domain, but one can easily make the argument that this is where the vulnerabilities would lie.

4.2 Static or dynamic library support

Adding a feature to an existing programming language is usually undertaken by developing a library which implements the desired behaviour. It is written in terms of the language, using advanced abstraction methods such as classes and generics, and delivers a well-defined interface. There are many UoM libraries for all modern programming languages [34]. The basic idea behind dimensional analysis, as shown in Section 3.1, is relatively straightforward and familiar to scientific coders. However, “making a physical quantity library is easy but making a good one is hard” [35]. The reasons behind this lie in the subtleties of implementation: providing non-standard UoM, offering many new operators, supporting conversions, creating helpful error messages, while being efficient. These aspects are far harder to code and maintain than a faithful implementation of the Quantity pattern [9, 36].

Experience has shown that quantity libraries add an extra layer to an existing application through boilerplate code that is rarely idiomatic to the host language. The Quantity pattern ensures that values are hard-wired to have a single floating point representation, requiring further conversions to other internal formats and introducing accuracy issues. Poor error messages are also a frequent complaint. Certain modern languages, such as Ruby, provide a special syntax for adding features to the language which exploits duck typing, enabling lightweight libraries to be built. The main conclusion is that for adoption to occur, UoM must be included in such a way that they are almost as easy to use as standard arithmetic types [13].

The standard technique for implementing the Quantity pattern exploits overriding, making UoM checking a run-time activity. However, UoM can also be implemented through static overloading, or Java generic instantiations, ensuring compile-time checking. An example of both styles is shown in [13]. Merging the two techniques would double the amount of notation required and significantly add to the burden of adoption. Languages such as Python or Ruby are dynamically typed by definition, so quantity checking will occur at run-time regardless of how UoM are defined. Run-time support is a key reason why there are so many libraries for these two languages, even though one would typically assume that quantities require high-performance executables. From a user survey of UoM libraries: “In our product line, our users may very well have one file whose units are ‘kgm3’, another whose units are ‘g_cc’ and a third whose units are ‘degrees Celsius’. We therefore need to be able to operate on units at run-time, not compile-time” [28].
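A minimal run-time Quantity pattern in Python shows where the checking happens. This is a sketch, not any particular library's API: `Quantity` is a hypothetical class storing a float and a (length, mass, time) exponent tuple, and the dimension check runs inside the overloaded operators each time they are called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A minimal run-time Quantity: a float plus (length, mass, time) exponents."""
    value: float
    dims: tuple

    def __add__(self, other):
        # The dimension check happens on every call, i.e. at run-time.
        if not isinstance(other, Quantity) or other.dims != self.dims:
            raise TypeError(f"cannot add {self.dims} and {getattr(other, 'dims', other)!r}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        if isinstance(other, Quantity):
            dims = tuple(a + b for a, b in zip(self.dims, other.dims))
            return Quantity(self.value * other.value, dims)
        return Quantity(self.value * other, self.dims)  # scalar multiplication

    __rmul__ = __mul__  # so that 2 * q also works


mass = Quantity(5.7, (0, 1, 0))
accel = Quantity(3.2, (1, 0, -2))
force = mass * accel
```

`mass * accel` yields dimensions `(1, 1, -2)`, but `mass + accel` raises `TypeError` only when that line executes. Declaring the class with `frozen=True` makes `force.dims = (0, 0, 0)` raise an error, a partial guard against reassigning a quantity's dimensions, although Python code can still bypass it deliberately.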

An additional limitation of libraries, compared with native language or pluggable type solutions, is that variables of a Quantity class can be reassigned at run-time. The 7-tuple that represents dimensions can be modified such that a kilometre could become a newton. A library implementation can catch dimensional homogeneity and conversion errors, but avoiding such programming-style errors requires discipline. A dimensionally aware static checker ensures that all instances of a quantity declaration have the same dimensions. In a study of 5.9 million lines of code, errors caused by this violation accounted for 75% of inconsistencies [37].
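The reassignment problem can be illustrated with a deliberately naive, hypothetical run-time class `NaiveQuantity`, whose exponent tuple is an ordinary mutable field. Both methods below compile without complaint. One rebinds a distance variable to a force. The other edits the tuple in place so that a kilometre becomes 1000 N. A checker that tracks dimensions in the declared type of `distance` would reject the first, and hiding the array behind a defensive copy would prevent the second.

```java
import java.util.Arrays;

// Deliberately naive run-time quantity (hypothetical, for illustration only):
// the dimension exponents live in an ordinary mutable field rather than in the type.
final class NaiveQuantity {
    double value;
    int[] dims; // exponents of (length, mass, time, current, temperature, amount, luminosity)

    NaiveQuantity(double value, int[] dims) {
        this.value = value;
        this.dims = dims;
    }
}

final class ReassignmentDemo {
    static final int[] METRE = {1, 0, 0, 0, 0, 0, 0};
    static final int[] NEWTON = {1, 1, -2, 0, 0, 0, 0}; // kg * m * s^-2

    // A variable meant to hold a distance is rebound to a force; javac accepts it.
    static NaiveQuantity rebind() {
        NaiveQuantity distance = new NaiveQuantity(1000.0, METRE.clone()); // 1 km, in metres
        distance = new NaiveQuantity(5.0, NEWTON.clone());
        return distance;
    }

    // The 7-tuple can also be edited in place, turning the kilometre into 1000 N.
    static NaiveQuantity mutate() {
        NaiveQuantity distance = new NaiveQuantity(1000.0, METRE.clone());
        distance.dims[1] = 1;  // add mass^1
        distance.dims[2] = -2; // add time^-2
        return distance;
    }

    static boolean isForce(NaiveQuantity q) {
        return Arrays.equals(q.dims, NEWTON);
    }
}
```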

Implementing UoM through a data type or a class requires values to be boxed at run-time, incurring both speed and memory penalties. Native language solutions can perform the checking at compile-time, so that generated executables contain no further UoM annotations and values are represented by primitive types. For scientific applications that perform many calculations, such as matrix multiplications, this unnecessary performance overhead is unacceptable. A UoM library might seem attractive and undemanding initially, but the subtle burden it inflicts will often increase the complexity of a project.
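The boxing overhead can be seen by contrasting two representations of the same data. This sketch (class names hypothetical) makes no timing claims. In the boxed version, every element is a separate heap object that may carry its own exponent array, and the dimension check is repeated inside the loop. The unboxed version is a plain `double[]`. Compile-time checking makes that representation possible once annotations have been erased.

```java
import java.util.Arrays;

// Boxed representation in the style of a Quantity-pattern library: each element
// is a separate heap object holding its magnitude plus an exponent tuple.
final class BoxedQuantity {
    final double value;
    final int[] dims;

    BoxedQuantity(double value, int[] dims) {
        this.value = value;
        this.dims = dims;
    }
}

final class Representations {
    // Every addition re-checks dimensions at run-time and the result is a new object.
    static BoxedQuantity sum(BoxedQuantity[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty array has no dimension");
        double acc = 0.0;
        for (BoxedQuantity x : xs) {
            if (!Arrays.equals(xs[0].dims, x.dims)) throw new ArithmeticException("dimension mismatch");
            acc += x.value;
        }
        return new BoxedQuantity(acc, xs[0].dims);
    }

    // After compile-time checking, the same data can be a primitive double[]:
    // no per-element objects, tags or checks remain in the loop.
    static double sum(double[] xs) {
        double acc = 0.0;
        for (double x : xs) acc += x;
        return acc;
    }
}
```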

4.3 Component or Interface description support

Interfaces encapsulate implementation details: they are the collection of externally visible entities and function signatures of a component. The compiler uses them to ensure that access is handled correctly. Libraries, native language support and pluggable type systems usually require all quantity variable and function declarations to have UoM annotations, while component or interface based approaches only require certain functions to be annotated. This drastically reduces the annotation burden and supports legacy code, but at the expense of robustness.

A component based approach adds UoM information to the interface in order to enforce unit consistency when composing components, thereby reducing dimensional mismatch errors. Some anecdotal evidence in the many quotes of [28] supports this approach. Damevski [38] hypothesises that UoM libraries are too restrictive because they require complete coverage, incurring an annotation or migration burden. His technique performs dimensional analysis on component interfaces at run-time. If the dimensions of the calling arguments are compatible with those of the declared parameters, unit conversions occur. Consider the C++ class Earth [38]:

class Earth {
  void setCircumference(in Metre circumference);
  Metre getCircumference();
};

It assigns and queries the Earth’s circumference using Metre internally. It can nevertheless be called with Kilometre, and the return value can be bound to a variable of, say, type Mile. Unlike with libraries, no further annotations are required within the class Earth, nor will internal declarations be checked. This is a dynamic component based approach: units are converted at run-time.
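Run-time conversion at the component boundary can be approximated in Java as follows. This is a hypothetical sketch, not the implementation described in [38]. The enum `LengthUnit`, the class `Tagged` and the extra `wanted` parameter are all assumptions. The `wanted` parameter is needed because Java cannot dispatch on the type of the variable that receives a return value. Inside `Earth` the circumference is a bare `double` in metres, and nothing there is checked.

```java
// Hypothetical unit tags for lengths, each with its factor to metres.
enum LengthUnit {
    METRE(1.0), KILOMETRE(1000.0), MILE(1609.344); // international mile

    final double toMetres;

    LengthUnit(double toMetres) { this.toMetres = toMetres; }
}

// A value crossing the component boundary carries its unit with it.
final class Tagged {
    final double value;
    final LengthUnit unit;

    Tagged(double value, LengthUnit unit) {
        this.value = value;
        this.unit = unit;
    }

    // Run-time conversion performed at the interface.
    Tagged to(LengthUnit target) {
        return new Tagged(value * unit.toMetres / target.toMetres, target);
    }
}

// Only the interface mentions units; the internal state is an unchecked double.
final class Earth {
    private double circumferenceMetres;

    void setCircumference(Tagged circumference) {
        circumferenceMetres = circumference.to(LengthUnit.METRE).value;
    }

    Tagged getCircumference(LengthUnit wanted) {
        return new Tagged(circumferenceMetres, LengthUnit.METRE).to(wanted);
    }
}
```

Setting the circumference as `new Tagged(40075.0, LengthUnit.KILOMETRE)` and reading it back with `LengthUnit.MILE` returns roughly 24901.45 miles. The conversion happens entirely at the boundary.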

Ore [16] presented another lightweight methodology, which performs an initial pass to build a table mapping attributes in C++ shared libraries to UoM. The shared libraries have been specifically enhanced to include UoM annotations. The table is used to disseminate UoM information into a source program and detect errors at compile-time. The algorithm successfully exploits dimensional analysis rules for arithmetic operators within components [39]. This is an example of a static component based approach. Through this process of static UoM propagation checking, it achieves greater coverage than Damevski’s run-time method.
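The table-driven propagation idea can be sketched as below. This is an illustrative assumption, not the published tool of [39]. The attribute names are invented, only three base dimensions are tracked, and each statement is reduced to `target = a op b`. Dimensions flow from the table into program variables through the usual rules: exponents add under multiplication, subtract under division, and must match under addition. Anything without table information is silently skipped, which is where the reduced coverage of component based checking comes from.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of table-driven unit propagation without program annotations.
final class Propagator {
    // Exponents of (length, mass, time); a real tool would track all seven.
    private final Map<String, int[]> table = new HashMap<>();    // from annotated libraries
    private final Map<String, int[]> inferred = new HashMap<>(); // for program variables

    void declare(String attribute, int l, int m, int t) {
        table.put(attribute, new int[] {l, m, t});
    }

    // Program variables shadow library attributes; null means "no unit information".
    private int[] lookup(String name) {
        return inferred.getOrDefault(name, table.get(name));
    }

    // Checks the statement target = a op b; returns an error message, or null if none found.
    String assign(String target, String a, char op, String b) {
        int[] da = lookup(a), db = lookup(b);
        if (da == null || db == null) {  // incomplete information: no check is possible
            inferred.remove(target);
            return null;
        }
        int[] result;
        switch (op) {
            case '+':
            case '-':
                if (!Arrays.equals(da, db)) return "inconsistent units: " + a + " " + op + " " + b;
                result = da.clone();
                break;
            case '*': result = combine(da, db, 1); break;
            case '/': result = combine(da, db, -1); break;
            default: throw new IllegalArgumentException("unsupported operator " + op);
        }
        inferred.put(target, result);
        return null;
    }

    private static int[] combine(int[] x, int[] y, int sign) {
        int[] r = new int[x.length];
        for (int i = 0; i < x.length; i++) r[i] = x[i] + sign * y[i];
        return r;
    }
}
```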

Under a component based discipline, local unit mistakes can go undetected. Damevski’s scheme does not analyse local assignment expressions at all, and Ore’s analyses them only partially. On the other hand, checking at the component level allows diverse teams to collaborate even if their domain specific environments or choice of quantity systems were, to some extent, dissimilar. More importantly, either a static or a dynamic component implementation would have been effective at preventing the Mars Climate Orbiter error.

4.4 Black-box testing

The last method that we will look at for checking that quantities are managed correctly is black-box automated testing. This method is usually applied to detect incorrect or missing methods, initialisation errors, interface errors and errors in data structures. Use cases are extrapolated from the requirements specification to create unit tests, which are then systematically applied to the program. In this way, errors can be discovered before the code is put into operation. In the case of UoM, the resulting testing will not be comprehensive: unit testing will focus on the correct initialisation of variables and on the UoM correctness of assignments and method calls. This approach is also considered lightweight because no annotations are required in the program. However, creating sufficient unit tests by hand is tedious. Efforts to generate unit tests automatically from UML descriptions [40, 41, 42, 43], either through behavioural diagrams or with rule based approaches, are seen to be costly and non-trivial in practice [44].

Modern agile software development practices rely heavily on manually developed tests, both to enable refactoring and to support test driven development (TDD). Unit testing for UoM errors does not require any extra tool support and does not alter any other part of the system. Figure 6 shows the testing required for a simple UoM based addition function. If t is true, both arguments are in kilometres; otherwise the second argument is in miles and is converted accordingly. The two test cases capture this intention. Many developers would argue that the time spent becoming familiar with a UoM library, and updating their programs accordingly, might be better spent writing unit tests: “I could use the same time to write tests and that would really find and prevent errors and at the same time not introduce a crazy complicated library every other developer in my team would have to deal with.” [28]. However, the UoM knowledge will be localised to each particular unit, so the low implementation cost comes at the price of checking that is potentially only average.

Figure 6.

Java code and JUnit test case for simple addition of two kilometres, or kilometre and mile distances [45].
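Figure 6 is not reproduced in this text. The following plain-Java sketch reconstructs the behaviour described above rather than the figure's exact code. The names `Distance`, `add`, `KM_PER_MILE` and the test method names are hypothetical. The conversion uses the international mile (1.609344 km). The checks are hand-written instead of JUnit assertions so that the sketch runs without dependencies.

```java
// Sketch of the addition function described for Figure 6 (names are hypothetical).
final class Distance {
    static final double KM_PER_MILE = 1.609344; // international mile

    // If t is true, both arguments are in kilometres; otherwise the second
    // argument is in miles and is converted to kilometres before adding.
    static double add(double a, double b, boolean t) {
        return t ? a + b : a + b * KM_PER_MILE;
    }
}

// Two black-box test cases, one per branch, written without JUnit.
final class DistanceTest {
    static void kilometresPlusKilometres() {
        check(Distance.add(3.0, 2.0, true), 5.0);
    }

    static void kilometresPlusMiles() {
        check(Distance.add(1.0, 1.0, false), 2.609344);
    }

    private static void check(double actual, double expected) {
        if (Math.abs(actual - expected) > 1e-9) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        kilometresPlusKilometres();
        kilometresPlusMiles();
        System.out.println("both tests passed");
    }
}
```

Running `DistanceTest.main` prints `both tests passed`. The unit knowledge exists only in the conversion constant and the two test cases, which is the localisation discussed above.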


5. Conclusions

Recent initiatives such as the FAIR data principles [46] emphasise the machine-actionability of scientific data. With greater interoperability, growing industrial use of computational simulations and the penetration of digitalisation through cyber-physical systems, there is an urgent need to faithfully represent key properties of physical systems in code bases [1, 47].

We have endeavoured to provide the necessary background for software users to choose the most appropriate method for enabling quantity checking in code bases. Alas, native language support is not available for popular programming languages. This situation is unlikely to change, as it would require new language definitions and expensive compiler rewrites, with the important criterion of ensuring backwards compatibility with existing code. Validators solve some of these issues but require assurances that tool support will be maintained. DSL checkers are very effective for their given domain but lack generality. Even the best libraries currently cause significant performance issues, while not being relevant to most developers. However, some of the dynamic libraries include a lightweight syntax that makes UoM annotations significantly easier. A strength of component based techniques is that they can be undertaken at either compile-time or run-time with little overhead. They trade UoM checking completeness for a low annotation burden and ease of adoption. Approaches based on manual black-box testing frameworks offer many of the benefits of static component based techniques without requiring extra syntax. The drawback is that the UoM information will be embedded within the unit tests rather than serving as a form of documentation within programs.

Annotating quantities in code bases is costly for developers [21] but relatively durable under software evolution. Because refactoring does not change the external behaviour of the software, it will rarely require quantity annotation modifications unless there are changes to the core data structures. Techniques that manage UoM at compile-time, such as native language support or static lightweight solutions, allow unit conversions to be undertaken before code is generated and values to be represented in unboxed form. The result is better accuracy and efficiency.

Table 1 lists the pros and cons of UoM techniques using features that are relevant to users. Native language or pluggable type based techniques offer many benefits, such as an equational UoM checker that is capable of resolving UoM type variables, and unit conversion optimisation to improve run-time behaviour. Static component and black-box testing provide some of the benefits of static UoM libraries with less coverage but greater versatility. This is in tune with contemporary software development methods that favour lightweight techniques which integrate into existing digital platforms.

| Technique | Programming ease of use | Execution speed | Numeric accuracy | Ease of integration | Unit error detection |
|---|---|---|---|---|---|
| Native Support | High | Very High | Excellent | Low | Very High |
| Validator | High | High | Very Good | Average | Average |
| DSL | Low | High | Very Good | Low | Average |
| Static Library | Low | Average | Good | Low | High |
| Dynamic Library | Average | Low | Good | Low | High |
| Static Component | High | High | Very Good | High | Average |
| Dynamic Component | High | Average | Good | High | Low |
| Black Box Testing | Average | High | Very Good | Very High | Average |

Table 1.

Contrasting alternative methods of implementing units of measurement in software projects, extended from [45].

Software users who require a degree of robustness have had little say over how much validation was undertaken on their code to ensure that UoM are handled correctly. This chapter aims to inform users and developers of the various options that exist to check that quantities are handled properly, along with the implications of these choices for their software ecosystem. The needs of a reactive and responsive online application, with assorted UoM input, are very different from those of a fully quantity-specified, stand-alone, safety critical application. Nonetheless, software-intensive systems are prevalent in our daily lives, with complex functionality and strong interconnection. The need to ensure that quantity values behave correctly is greater than ever before.

References

  1. Hanisch R et al. Stop squandering data: Make units of measurement machine-readable. Nature. 2022;605:222-224
  2. Stephenson A, LaPiana L, Mulville D, Bauer F, Rutledge P, Folta D, Dukeman G, Sackheim R, Norvig P. Mars Climate Orbiter Mishap Investigation Board Phase 1 Report. Washington D.C., USA: NASA Headquarters, NASA Press Release; 1999. [Last Accessed: May 19, 2022]
  3. Maxwell JC. A Treatise on Electricity and Magnetism. Oxford: Clarendon Press; 1873
  4. NIST. International System of Units (SI): Base and Derived. 2015. [Last Accessed: May 19, 2022]
  5. Karr M, Loveman DB. Incorporation of units into programming languages. Communications of the ACM. 1978;21(5):385-391
  6. Gehani N. Units of measure as a data attribute. Computer Languages. 1977;2(3):93-111
  7. Dreiheller A, Mohr B, Moerschbacher M. Programming Pascal with physical units. SIGPLAN Notices. 1986;21(12):114-123
  8. Hilfinger PN. An Ada package for dimensional analysis. ACM Transactions on Programming Languages and Systems. 1988;10(2):189-203
  9. Fowler M. Analysis Patterns: Reusable Object Models. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.; 1997
  10. McKeever S, Paçaci G, Bennich-Björkman O. Quantity checking through unit of measurement libraries, current status and future directions. In: Model-Driven Engineering and Software Development, MODELSWARD. Portugal: SciTePress; 2019
  11. Mayerhofer T, Wimmer M, Vallecillo A. Adding uncertainty and units to quantity types in software models. In: Software Language Engineering, SLE 2016. NY, USA: ACM; 2016. pp. 118-131
  12. Allen E, Chase D, Luchangco V, Maessen J-W, Steele GL Jr. Object-oriented units of measurement. In: Proceedings of Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA ‘04. NY, USA: ACM; 2004. pp. 384-403
  13. McKeever S, Bennich-Björkman O, Salah O-A. Unit of measurement libraries, their popularity and suitability. Software: Practice and Experience. 2020;51(4):711-734
  14. Cooper J, McKeever S. A model-driven approach to automatic conversion of physical units. Software: Practice and Experience. 2008;38(4):337-359
  15. Antoniu T, Steckler PA, Krishnamurthi S, Neuwirth E, Felleisen M. Validating the unit correctness of spreadsheet programs. In: Proceedings of Software Engineering, ICSE ‘04. Washington, DC, USA: IEEE Computer Society; 2004. pp. 439-448
  16. Ore J-P, Detweiler C, Elbaum S. Lightweight detection of physical unit inconsistencies without program annotations. In: Proceedings of International Symposium on Software Testing and Analysis, ISSTA 2017. NY, USA: ACM; 2017. pp. 341-351
  17. Foster M, Tregeagle S. Physical-type correctness in scientific Python. CoRR, arXiv; 2018
  18. McKeever S. Discerning quantities from units of measurement. In: Proceedings of the 10th International Conference on Model-Driven Engineering and Software Development - MODELSWARD. Portugal: INSTICC, SciTePress; 2022. pp. 105-115
  19. Stevens SS. On the theory of scales of measurement. Science. 1946;103(2684):677-680
  20. Hall B. The problem with ‘dimensionless quantities’. In: Proceedings of the 10th International Conference on Model-Driven Engineering and Software Development - MODELSWARD. Portugal: INSTICC, SciTePress; 2022. pp. 116-125
  21. Ore J-P, Elbaum S, Detweiler C, Karkazis L. Assessing the type annotation burden. In: Automated Software Engineering, ASE 2018. NY, USA: ACM; 2018. pp. 190-201
  22. Wand M, O’Keefe P. Automatic dimensional inference. In: Computational Logic - Essays in Honor of Alan Robinson. Cambridge, Massachusetts, USA: MIT Press; 1991. pp. 479-483
  23. Milner R. A theory of type polymorphism in programming. Journal of Computer and System Sciences. 1978;17:348-375
  24. Kennedy A. Dimension types. In: Sannella D, editor. Programming Languages and Systems, ESOP’94. Vol. 788. Edinburgh, U.K.: Springer; 1994. pp. 348-362
  25. TIOBE. The Importance of Being Earnest Index. 2022. Available from: https://www.tiobe.com/tiobe-index/ [Last Accessed: July 6]
  26. Apple. Swift open source. 2022. [Last Accessed: May 19, 2022]
  27. Microsoft. F# Software Foundation. 2020. [Last Accessed: May 19, 2022]
  28. Salah O-A, McKeever S. Lack of adoption of units of measurement libraries: Survey and anecdotes. In: Proceedings of Software Engineering in Practice, ICSE-SEIP ‘20. NY, USA: ACM; 2020
  29. Dieterichs H. Units of Measurement Validator for C#. [Last Accessed: May 19, 2022]
  30. Jiang L, Su Z. Osprey: A practical type system for validating dimensional unit correctness of C programs. In: Proceedings of the 28th International Conference on Software Engineering, ICSE ‘06. New York, NY, USA: ACM; 2006. pp. 262-271
  31. Hills M, Chen F, Roşu G. A rewriting logic approach to static checking of units of measurement in C. Electronic Notes in Theoretical Computer Science. 2012;290:51-67
  32. Xiang T, Luo JY, Dietl W. Precise inference of expressive units of measurement types. Proceedings of the ACM on Programming Languages. 2020;4(OOPSLA):1-28
  33. Garny A, Nickerson D, Cooper J, dos Santos RW, Miller A, McKeever S, et al. CellML and associated tools and techniques. Philosophical Transactions of the Royal Society, A: Mathematical, Physical and Engineering Sciences. 2008;366:3017-3043
  34. Bennich-Björkman O, McKeever S. The next 700 unit of measurement checkers. In: Proceedings of Software Language Engineering, SLE 2018. NY, USA: Association for Computing Machinery; 2018. pp. 121-132
  35. Bekolay T. A comprehensive look at representing physical quantities in python. Scientific Computing with Python. Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering. 2013. pp. 121-132
  36. Krisper M, Iber J, Rauter T, Kreiner C. Physical quantity: Towards a pattern language for quantities and units in physical calculations. In: Proceedings of Pattern Languages of Programs, EuroPLoP ‘17. NY, USA: ACM; 2017. pp. 9:1-9:20
  37. Ore J-P, Elbaum S, Detweiler C. Dimensional inconsistencies in code and ROS messages: A study of 5.9M lines of code. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Manhattan, NY, USA: IEEE; 2017. pp. 712-718
  38. Damevski K. Expressing measurement units in interfaces for scientific component software. In: Proceedings of Component-Based High Performance Computing, CBHPC ‘09. NY, USA: ACM; 2009. pp. 13:1-13:8
  39. Ore J-P, Detweiler C, Elbaum S. Phriky-units: A lightweight, annotation-free physical unit inconsistency detection tool. In: Software Testing and Analysis, ISSTA 2017. NY, USA: Association for Computing Machinery; 2017. pp. 352-355
  40. Cavarra A, Crichton C, Davies J, Hartman A, Jeron T, Mounier L. Using UML for automatic test generation. In: Proceedings of ISSTA, ACM SIGSOFT International Symposium on Software Testing and Analysis. NY, USA: ACM; 2002
  41. Hartmann J, Vieira M, Foster H, Ruder A. A UML-based approach to system testing. Innovations in Systems and Software Engineering. 2005;1:12-24
  42. Ali S, Hemmati H, Holt NE, Arisholm E, Briand LC. Model transformations as a strategy to automate model-based testing: A tool and industrial case studies. Simula Research Laboratory Technical Report (2010-01). Norway: Simula Research Laboratory and University of Oslo; 2010. pp. 1-28
  43. Mussa M, Ouchani S, Sammane W, Hamou-Lhadj A. A survey of model-driven testing techniques. In: Proceedings of the International Conference on Quality Software, QSIC 2009. Manhattan, NY, USA: IEEE; 2009. pp. 167-172
  44. Kasurinen J, Taipale O, Smolander K. Software test automation in practice: Empirical observations. Advances in Software Engineering. 2010;2010:01
  45. McKeever S. From quantities in software models to implementation. In: Proceedings of the 9th International Conference on Model-Driven Engineering and Software Development - MODELSWARD. Portugal: INSTICC, SciTePress; 2021. pp. 199-206
  46. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data. 2016;3:1-9
  47. Selic B. Beyond mere logic: A vision of modeling languages for the 21st century. In: Pervasive and Embedded Computing and Communication Systems (PECCS). Portugal: SciTePress; 2015. p. IS09

Notes

  • https://sysml.org
  • https://github.com/boostorg/units
