Calibration results for a single calibration using all of the points.
Ultrasound (US) imaging is one of the modalities commonly used for interventional guidance. It offers many advantages over other imaging systems such as MRI, CT, and fluoroscopy in that it is considerably cheaper, is portable, offers real-time imaging, and has no radiation exposure. Recent developments have led to the advancement of 3D US technology, making it a more feasible alternative to 2D US imaging, despite its higher cost. Among 3D US technology includes real-time 3D transesophageal echocardiography (TEE) enabling US imaging of the heart from within the body. Unfortunately, there are many disadvantages of US that must be addressed. Due to the presence of speckle, the image quality is very poor, especially away from the focal zone. The imaging field-of-view (FOV) is also quite limited, due to the size of the US transducer. These disadvantages significantly limit the usability of US for many interventional procedures.
To make US a more feasible interventional tool, it should be fused with information from another modality or a tracking system. The tracking system allows tools to be visualized in the US volume, especially helpful when imaging a needle or a catheter. Overlaying information from other modalities gives the clinician more information allowing them, for example, to see outside the limited FOV of the US. Often times, the fusion of multiple modalities is done using a tracking system as an intermediate link between them. Calibration of the tracking system to each individual modality is thus required. Calibration of a US probe is a well-studied problem with a variety of methods available to be used. Multiple types of systems can also be used for tracking, including optical and electromagnetic (EM) tracking. The methods of calibrating the US can generally be applied to any type of tracking system.
There are several papers that provide comprehensive reviews of the various methods for calibrating a tracking system to US imaging [8, 11]. These papers focus primarily on the calibration of 2D US systems, but some of these methods can be extended to 3D. Other reviews have been done specifically for the calibration of 3D US, which compare new methods made possible through the use of a 3D probe [5, 14].
Among methods for the calibration of a US probe, there are two commonly used categories of techniques: phantom-based and tracked stylus-based. There are numerous phantom-based calibration methods. Some of the more common types of phantom designs use “N-wire” fiducials [6, 13]. Other phantom designs consist of plane phantoms [15, 16], point phantoms , and cross-wire phantoms .
It is perhaps more beneficial to calibrate the US without using a phantom. First and foremost is the simplicity of the calibration using a readily available tracked stylus or needle. Also, manufacturing of a phantom may require additional cost and time to develop. Hsu et al.  provide an overview of stylus-based calibration methods. The simplest method of calibration using a tracked stylus or needle is to image the tip of the needle precisely in the scan plane of the US image. Given that the needle is tracked, the tip can be segmented from the US images and the collection of all positions can be registered using a point-based registration method [12, 19]. Zhang et al.  showed an accuracy of up to 3.5 mm using this method with only nine images for each calibration. In 2D, this method has a problem in that it is very difficult to precisely position the stylus tip in the US scan-plane. Even in 3D the image resolution and artifacts make it difficult to locate the needle tip position accurately. Poon and Rohling  performed a similar stylus calibration for a 3D US probe using five pairs of points. They showed an RMS accuracy of 2.36 mm. An optical tracking system was used which has a significantly better accuracy then EM tracking systems.
Another similar stylus-based method depends only on the intersection of the stylus with the imaging plane . The intersection of the US scan-plane with the needle will show a point that is easily segmented. The calibration transformation can be estimated using these measurements. Unfortunately, the calibration accuracy was not quantifiably analyzed here and was only performed for 2D. The biggest problem with this method is that the stylus must be imaged at a wide range of angles and positions.
In this paper we examine the problem of EM-to-TEE calibration for a 3D TEE probe by looking at two separate but similar solutions based off of prior 2D methods [10, 19]. We examine how these methods perform when extended to 3D using EM tracking. In addition, we further analyze the calibration errors using a statistical framework and also examine the effect of the speed of sound errors on the calibration.
The relationship between the TEE image volume and the attached EM sensor is shown in Figure 1. Two similar calibration methods are employed to calibrate the TEE probe and determine this relationship to the EMtracking. In both methods, the calibration is a multi-step process. The data acquisition step is first. Images of the needle from the TEE probe are acquired concurrently with the EM position information of the probe and needle sensors. Segmentation is then carried out to find the tip of the needle in the first method. In the second method, segmenting any point along the needle is sufficient. The calibration is finally performed either using a point-to-point or a point-to-line registration method.
2.1. Data acquisition
Ultrasound imaging was performed using an X7-2t TEE probe with a Philips iE33 echocardiography system. An EM sensor has been permanently mounted on the end of the probe (Figure 2). EM tracking was done using the Aurora Electromagnetic Measurement System (Northern Digital Inc, Waterloo, Canada). For each calibration experiment, a set of 30-50 volumes were captured, each containing the image of an EMtracked needle at a different orientation. It is important that these images are spread over a wide range of probe and needle positions to ensure the best accuracy of the calibration. Not doing so may introduce a bias to the calibration results. Additionally, all volumes are captured at a depth of 10 cm. TEE volumes are acquired in both the Live 3D (L3D) and Thick Slice (TS) imaging modes. L3D produces a volume size of 112 × 48 × 112 pixels or 91 × 47 × 100 mm in the width, height and depth directions. TS produces a volume size of 240 × 16 × 112 or 140 × 15 × 100 mm; almost twice the width as L3D with a very narrow thickness. Because of the narrow cross section, the TS volume can be thought of, in essence, as a 2D plane with some thickness. An overview of the acquisition system can be seen in Figure 3.
2.2. Needle segmentation
After the data is acquired, the needle must be segmented from the volumes in 3D. In both calibration methods the segmentation technique is the same. Needle segmentation is performed manually using three 2D maximum intensity projection images. Segmenting a needle point in two out of the three images is sufficient to identify the point in 3D. If the needle point is identified in all three views, the final 3D position will be the average position in all of the views.
The images were processed to enhance the needle’s appearance. Following the needle segmentation method described in Aboofazeli et al. , anisotropic diffusion  is first used for speckle reduction and smoothing with the subsequent application of an exponential contrast enhancement filter. The contrast adjustment filter has the form
In order to calibrate the TEE probe to the EM tracking system, four different measurements are required as inputs. The first input is the set of segmented needle points,
The purpose of calibration is to find the transformation between the TEE coordinate system and the TEE probe EM sensor coordinate system,
2.3.1. Calibration using segmented needle tips
If the needle tip position has been segmented from each TEE volume, the needle tip position can be determined relative to the EM coordinate system using
Since the points
2.3.2. Calibration using segmented needle points
The second calibration method is similar to that proposed by Khamene and Sauer  with an extension to 3D. In this method, any point on the needle may be segmented, not restricted to the needle tip as with the first method. However the calibration problem becomes more difficult to solve for. We must change our representation of the needle to that of a line, composed using two points. The first point,
We now have the problem of estimating the transformation that minimizes the distance between the TEE points and the line representing the needle. The general equation for the distance between a point,
where× is the cross product. The calibration transformation is estimated using a nonlinear least-squares optimization framework. The equation to be minimized is
Here we are minimizing the distance of each individual needle point segmented in the TEE volume and the line representing the needle relative to the probe EMsensor. The TEE point is transformed by the transformation
Posing the calibration problem in this way, using the point-to-line distance, unfortunately is very susceptible to the optimization being caught in local minimum. It requires a close initial estimate of the solution. An approximation to the solution can be estimated by solving for 3 as seen in the previous section. This solution works well as an initial guess for
2.4. Considerations for the speed of sound
For the best accuracy in calibration, careful consideration must be taken to account for the speed of sound (SOS) in water. The US machine assumes that the TEE probe is imaging human tissue, which has an approximate SOS of 1540 m/s. The SOS for water is different and depends on the water temperature. At room temperature (20°C), the SOS in pure water is approximately 1480 m/s . There is therefore a mismatch in the speed of sound when the EM tracked needle is imaged in water by the US. Although the water temperature can be adjusted such that the SOS matches approximately 1540 m/s, maintaining the water at room temperature is more practical, especially for the lengthy calibration experiments carried out here.
When performing a calibration experiment in water, the voxel scaling can simply be changed to account for the SOS. Given water at room temperature, the voxel size must be multiplied by 1480/1540 in each direction. A key point to consider regarding the SOS of water is that the calibration transformation for experiments performed in water is not valid for experiments performed on real tissues. The physical size of the water and tissue volumes are now different for the same depth setting. At an imaging depth of 10 cm there would be a physical size difference between tissue and water volumes of about 4 mm along the depth direction. The error incurred by using the incorrect calibration matrix would depend on the location of the volume origin (Figure 5).
To illustrate the effect that the speed of sound has on the calibration, lets look at how the volumes compare for different speed of sound values. We will look at a 2D case where the speed of sound of water is
In this figure, the calibration transformation in water that is calculated by scaling the voxels appropriately is seen in the interior square image of Figure 5(d). A volume acquired in real tissue will be larger requiring a different calibration transformation, as represented by the outer square image of Figure 5(d). Therefore, care must be taken to convert the calibration transformation when imaging in real tissues. This conversion can be done through a translation of the calibration transformation based on the origin of the volumes.
2.5. Statistical analysis
A statistical analysis can be performed on the data to remove outlying points and analyze the overall accuracy of the calibration. We wish to estimate the best points to include for a single calibration, as well as find the best calibration transformation with some indication of how well the calibration model fits. A single calibration using all available points can only provide a simple indication of the spread of the points. The analysis formed here includes outlier removal using RANSAC followed by a cross-validation analysis on the calibration using the remaining points. In this analysis, we will end up with a statistical view of the calibration accuracy using all available points.
A RANSAC framework is used for estimating outlying points . The RANSAC algorithm examines random sets of “minimal” combinations of points. Minimal is taken to mean the minimum number of points required to perform the calibration. Three points are technically required, however, we choose to use 5 points since the calibration method of Section 2.3.2 is not robust to using only 3 points. A variation of RANSAC was used for our calibration called MLESAC, where each random sample set is ranked based on a likelihood function rather than the number of points that fit the model .
Having eliminated outlying points, we attempt to find the best calibration model from the remaining set of points. A random sample validation framework is used as an analysis tool to choose the best calibration transformation. Within this framework, we also look at the convergence of the calibration error as more points are used. By examining the convergence of the error, we are able to have confidence that enough points have been acquired for a successful calibration.
For the measure of error, when a simple calibration is carried out, the fiducial registration error (FRE) is calculated for how well the EMand TEE points fit together. This error consists of either the RMS point-to-point distance or RMS point-to-line distance for all points used in the calibration. Additionally, the target registration error (TRE) measures how well the validation points fit the calibration model. These points were not used to calculate the calibration transformation, and therefore provide a better indication of the accuracy of the calibration.
To examine the convergence of the calibration error, 5 points are used initially, incrementing by one up to
Given multiple FRE, TRE, and calibration values at each incremental number of calibration points, the mean and standard deviations can be calculated and plotted. Example plots of this information is given in Figure 6. The final calibration transformation is taken to be the average over all of the trials above the convergence point. The convergence point is the point at which the TRE mean and standard deviation values become “constant”, when the differential values fall below a threshold (set to 0.01 mm for mean and 0.005 mm for standard deviation). A black vertical line represents the convergence point in Figure 6(a). If a convergence points cannot be found, the calibration cannot be reliably used. Also, provided enough points are acquired, we can estimate the minimum number of experimental points required for a successful calibration.
3. Results and discussion
The calibration methods were analyzed for two separate TEE probes using each imaging mode (L3D and TS). Both probes were analyzed using each calibration method. TS volumes were only calibrated using calibration method 2 since the volume is too narrow to image the tip of the needle precisely. The speed of sound during calibration was assumed to be 1480 m/s based on the approximate water temperature, and the TEE volumes were scaled appropriately. We take method 1 to be the method described in 2.3.1 and method 2 to be as described in 2.3.2.
3.1. Single calibration
We first examine the results of a single calibration. The calibration result is calculated after discarding 10% of the points having the highest FRE. Looking at the results, method 2 gives a lower FRE value than method 1. The reason for this is that in method 1, the error is based on the point-to-point error of the needle tip position. There is more variation in segmenting the specific needle tip in the TEE volume. As a result, the distance will be larger. Also, the point-to-line distance is orthogonal to the needle direction. As the needle tip distance may not be orthogonal to the needle itself, this error is expected to be larger.
It should also be noted that the calibration parameters are different between the two calibration methods. We cannot tell based on this table which one is better, but can attempt to understand why they are different. Although the translational terms are relatively similar, all within 1.5 mm of each other, the rotational terms have larger differences, up to about 4 degrees. Again, one main reason for these differences is the difficulty in segmenting the needle tip. Also, there may be a bias in the needle tip segmentation, since the low resolution of the TEE volume makes it nearly impossible to determine which point is the tip. Looking back to Figure 4, there is generally a significant volume at the needle tip. The tip is located somewhere within this area. It was taken to be the center of this area for this work, but it could in fact be closer to the end.
Looking at the thick slice mode, we see similar parameters to the Live 3D mode. In fact, only one parameter, the z-translation, is dissimilar between modes. Since the other 5 parameters are similar, we can guess that the TS volume is located at a similar orientation as the L3D volume, only translated along one axis. In this case, the differences in the 5 similar parameters would be due error in calibration. Higher errors are expected in the TS mode because it is essentially a calibration in 2 dimensions. Also, many of the needle positions are along a similar direction, orthogonal to the slice axis. Since many of the needle positions were in along this direction, there will be a high error localizing the plane along this direction.
3.2. Calibration statistical analysis
Next, we examine the statistical analysis of the calibrations. The analysis described in Section 2.5 was performed on the data acquired from each probe in each imaging mode (L3D and TS). The resulting FRE, TRE and parameters are listed in Table 2. Figure 7 shows the convergence of the FRE and TRE as more points are included in the calibrations. Note the X and Y-axis scaling in these images for comparisons. In this figure, the results are displayed from both calibration methods.
In Table 2, we see a lot of similarities to Table 1. There is a lower FRE in method 2 as compared to method 1. The listed parameters are also quite similar between the two tables, with translational differences all within 1.5 mm and rotational differences all within 4 degrees. The differences may be attributed to the fact that a different number of outliers were rejected by the RANSAC than the single calibration.
Much of the information for the statistical analysis is in the plots that are generated, as displayed in Figures 7. In Figure 7, the TRE plots are the most important. They show how well the error for points not included in the calibration converges. In L3D mode, each method performs similarly. With only a few points used, the standard deviation is much higher for method two, however, both methods converge to similar values. Also, both methods seem to converge in about 20-25 points. The convergence of the thick slice was much slower however, converging in about 35 points.
3.3. Needle position error
Assessing the accuracy of the TEE calibration must be done using points not included in the calibration. Since the resulting calibration from the statistical analysis includes all of the points, a new set of points must be acquired. A new set of 10 TEE images of an EM tracked needle at different positions were acquired with their corresponding EM measurements. The needle tip was then manually segmented and the average distance error between the needle tip in the TEE volume and the tip in EM after applying the calibration transformation was calculated. The average over all 10 points was 1.39±0.54 mm. This result was for Probe 2 using method 2. The error is reasonably close to the TRE seen in Table 2, aswould be expected.
3.4. Effect of speed of sound
To look at the effect of the speed of sound on the calibration, we perform a single calibration on all points using pixel scaling having an assumed speed of 1540 and 1480 m/s. The FRE values are given in Table 3. A more significant increase in error is seen for Probe 1 than Probe 2. One possible reason is that Probe 2 was calibrated with much less variation in needle position. All of the segmented points are in a small area, whereas in Probe 1, the points are spread out much more. Having the points more spread out means that the calibration will be more effected by scaling errors.
A framework for the calibration of a 3D TEE probe to an EM tracking system was examined. The calibration framework consists of three basic steps: data acquisition, needle segmentation and calibration. We were able to analyze the results of the calibration of two separate TEE probes. Finally, two different calibration techniques for calibration using a needle were explored. The first method requiring the tip of the needle being segmented from the TEE volume, while the second method allows any point along the needle to be segmented.
It is very difficult to analyze which calibration method is better. Method 1 shows a faster convergence, requiring fewer images to be captured. However, it is more difficult to segment the needle tip from the TEE volume. A large imaging artifact at the needle tip prevents it from being precisely localized in the TEE volume. The likelihood of introducing a bias in method 1 due to segmentation error is much greater than in method 2. Therefore, method 2 is preferred. It does however require that more images are acquired. Greater than 30 images are recommended for good convergence properties. The speed of sound clearly also has an effect on the calibration accuracy. FRE values were improved after applying a scaling adjustment indicating that the segmented points had a better fit.
One potential for future work is to automate the calibration process. The presented calibration methods are not well suited for acquisition of more then a few dozen images since the acquisition is performed one image at a time with the needle being manually segmented. Image acquisition can be made simpler by acquiring a continuous series of images while the user simply moves the needle around the needle field of view. A continuous data acquisition would lead to other problems however, such as a temporal delay between the US acquisition and the EM tracking data. The needle segmentation can be improved through the use of more sophisticated, automatic needle identification methods. There are several published needle segmentation algorithms . One way of segmenting the entire needle is to use a form of 3D Hough transform. If the needle can be automatically segmented, the calibration method described in 2.3.2 can be extended to a line-to-line registration method. More information extracted from these volumes can only improve the calibration consistency and accuracy.
The authors are sincerely grateful to Philips Research North America to provide the much needed support to conduct the research that has been presented here.