In this chapter, we review the applications of augmented reality (AR) and virtual reality (VR) in diagnostic radiology. We begin by discussing state-of-the-art medical imaging techniques, including scanner hardware, spatial resolution, and methods of presenting images to the radiologist or other medical professionals. We continue by discussing the methodology of a technique called depth-3-dimensional (D3D) imaging, which transforms cross-sectional medical imaging datasets into left and right eye images that can be viewed with state-of-the-art AR or VR headsets. We include results of the D3D processed AR/VR imaging and conclude with a discussion of the path forward.
- augmented reality
- virtual reality
- diagnostic radiology
- depth perception
The purpose of this chapter is to review the applications of augmented reality (AR) and virtual reality (VR) in diagnostic radiology. This introduction provides a brief general summary of AR technologies, VR technologies, AR/VR applications in medicine and AR/VR applications in diagnostic radiology. The remainder of the chapter discusses state-of-the-art medical imaging systems, current methods of viewing medical images, the image processing techniques for generating an AR/VR image, AR/VR imaging results and the path forward.
1.1. Introduction to AR/VR
AR technologies can be classified into AR systems or mixed reality (MR) systems. In both AR and MR systems, the user wears a head mounted display (HMD), which provides the simultaneous display of a virtual image and the scene of the real-world surroundings. The difference is that in AR the virtual image is transparent like a hologram, whereas in MR the virtual image appears solid. Examples of AR systems include the Meta and DAQRI systems. An example of an MR system is the Microsoft HoloLens. In AR systems, the user can view the virtual image and interact with the real-world scene.
VR technologies may be classified as fully immersive, semi-immersive or non-immersive. In fully immersive VR systems such as the Oculus Rift and the HTC Vive, the HMD displays a virtual image and the real-world surroundings are completely occluded from the user’s field of view. In semi-immersive VR systems such as the Samsung Gear VR, the HMD displays the virtual image, but the real-world scene is only partially occluded from the user’s field of view. In VR systems, the user can maneuver through the virtual world by head movements via HMD tracking or by walking via external camera tracking systems. Additional ways that a user can interact with the virtual environment include voice commands, gestures or handheld devices with haptic feedback. The HMDs for the AR and VR systems display a unique image to each eye; therefore, stereoscopic imaging and depth perception are achieved. Both AR and VR are rapidly growing fields. Worldwide revenues for AR/VR were $5 billion in 2016, but are expected to increase to $162 billion by 2020.
1.2. Applications of AR/VR in medicine
In 2015, the United States spent 17% of its gross domestic product on healthcare. Of this healthcare spending, 32% was incurred during hospital stays. Nearly half of hospital costs are for surgical care. Therefore, approximately 3% of the gross domestic product (17% × 32% × roughly 50% ≈ 2.7%) goes to costs related to surgery. There is a drive to improve efficiency in the operating room to both improve patient care and drive down costs. AR/VR holds promise in accomplishing these goals through improving pre-operative planning and enhancing intra-operative surgical procedures.
1.3. Applications of AR/VR in diagnostic radiology
In the United States, the total cost of diagnostic imaging was estimated at $100 billion in 2006 alone. Utilization rates of diagnostic imaging are on the rise [8, 9, 10]. Currently, there is no United States Food and Drug Administration (FDA) approved AR/VR system used in diagnostic radiology. AR/VR provides enhanced viewing, including depth perception and an improved human machine interface (HMI) [12, 13]. AR/VR HMDs provide unique images to each eye, yielding depth perception. AR/VR systems leverage advanced gaming controllers and joysticks to improve HMI. Because of these features of AR/VR and others discussed later in this chapter, we believe there will be increasing applications of AR/VR in diagnostic radiology in the future.
2. Current state-of-the-art diagnostic medical imaging
Diagnostic radiology plays a major role in medicine as it provides precise anatomic and physiologic information to physicians, enabling diagnosis of complex disease and monitoring of response to treatment. This section covers three topics: medical imaging equipment, conventional imaging techniques and advanced 3D rendering methods.
2.1. Medical imaging equipment
The field of diagnostic radiology includes a wide variety of imaging equipment including systems that generate inherently 2D images (e.g., chest radiograph) as well as systems that generate volumetric medical imaging datasets (e.g., computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET)). In this section, we will focus on the latter.
2.1.1. Computed tomography (CT)
In order to perform a CT scan (also known as a CAT scan), the patient is placed in the horizontal position on the CT scanner table. See Figure 1. The table is then translated through a donut shaped device containing both an X-ray tube and X-ray detector. Multiple projection images are acquired as the X-ray tube and detector assembly rotate around the patient.
Image reconstruction algorithms, such as filtered back projection, are performed in order to generate cross-sectional images in the axial plane (x-y plane). Since sequential, contiguous axial images can be obtained, coronal plane (x-z) and sagittal plane (y-z) images can be reconstructed. Data is stored in digital imaging and communication in medicine (DICOM) files; a typical matrix for a CT scan is 512 × 512 pixels. A pixel in the axial plane is a 2D object with a discrete length in the x-direction and a discrete length in the y-direction. A voxel is a 3D object created from a pixel by adding a third dimension to create a volume.
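The relationship between the stacked axial images and the reconstructed coronal and sagittal planes can be illustrated with a minimal sketch. It assumes the volume is held in memory as a nested list indexed `volume[z][y][x]`; the function names and the toy voxel values are hypothetical and are chosen only so that each voxel can be traced back to its coordinates.

```python
def make_volume(nz, ny, nx):
    """Toy volume indexed volume[z][y][x]; each voxel stores 100*z + 10*y + x."""
    return [[[100 * z + 10 * y + x for x in range(nx)]
             for y in range(ny)]
            for z in range(nz)]


def axial_slice(volume, z):
    """Axial (x-y) plane at table position z: rows follow y, columns follow x."""
    return [row[:] for row in volume[z]]


def coronal_slice(volume, y):
    """Coronal (x-z) plane at position y: rows follow z, columns follow x."""
    return [axial[y][:] for axial in volume]


def sagittal_slice(volume, x):
    """Sagittal (y-z) plane at position x: rows follow z, columns follow y."""
    return [[row[x] for row in axial] for axial in volume]


volume = make_volume(nz=3, ny=4, nx=5)
print(coronal_slice(volume, 2)[1])   # the y=2 row of axial image z=1
print(sagittal_slice(volume, 4)[2])  # the x=4 column of axial image z=2
```

No new data is acquired for the coronal and sagittal views; they are the same voxels read out along a different pair of axes, which is why contiguous axial slices are a prerequisite.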
Each pixel has an associated gray-scale value called a Hounsfield Unit (HU), which is a function of the density and composition of the tissue. As a reference, water has a HU of 0. Soft tissues (e.g., brain, kidney, muscle, etc.) are slightly denser than water and have a HU of approximately 30–40. Compact bone can have a HU of 400. Fat is slightly less dense than water and has a HU of approximately −100. Air is significantly less dense than water and has a HU of −1000.
A process called “windowing and leveling” is performed by the radiologist to set the window “level” and window “width.” The window “level” is the HU value that is displayed as mid-gray. The window “width” is the range of HU values spread across the shades of gray, such that any value beyond the range is displayed as white (if more dense) or black (if less dense). See Figure 2.
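As a minimal sketch of windowing and leveling, the function below maps a HU value to a display gray level. It assumes a simple linear ramp and an 8-bit display (0 to 255); the function name `apply_window` and the example window settings are illustrative assumptions, not the behavior of any particular viewing workstation.

```python
def apply_window(hu, level, width, max_gray=255):
    """Map a Hounsfield Unit to a gray level using a linear window.

    Values at or below (level - width/2) are black, values at or above
    (level + width/2) are white, and the level itself lands on mid-gray.
    """
    lower = level - width / 2.0
    upper = level + width / 2.0
    if hu <= lower:
        return 0
    if hu >= upper:
        return max_gray
    fraction = (hu - lower) / (upper - lower)
    return int(fraction * max_gray + 0.5)  # round to the nearest gray level


# Illustrative soft-tissue style window: level 40 HU, width 400 HU.
for hu in (-1000, -100, 40, 400):
    print(hu, apply_window(hu, level=40, width=400))
```

With these example settings, air (−1000 HU) is clipped to black and a value of 400 HU is clipped to white, so the available shades of gray are spent on the soft-tissue range around the level.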
Modern CT scanners can perform a total body head-to-toe scan in less than 15 seconds with spatial resolution of less than 1 mm. Thus, CT scans can be invaluable in the setting of trauma, since they enable the radiologist to diagnose critical injuries anywhere in the body, such as traumatic brain injury. This provides critical information to the neurosurgeon, who can then perform life-saving interventions.
2.1.2. Magnetic resonance imaging (MRI)
In order to perform an MRI scan, the patient is placed in the horizontal position on the MRI scanner table within a large cylindrical device. A large magnetic field is directed along the long axis of the cylinder. Transmit coils direct a radiofrequency (RF) pulse into the patient’s body and receive coils process the returning electromagnetic signal from the body to create an image. Similar to CT, contiguous planar images can be stacked and axial, sagittal and coronal reformats can be reconstructed. The imaging data for an MRI scan is similar to that of a CT scan in matrix size and in the fact that each pixel has a comparable gray scale.
Unlike CT scanners, MRI scanners do not employ ionizing radiation and are therefore ideal for young children or pregnant women, who are more vulnerable to radiation. Furthermore, MRI scans provide exceptional contrast resolution between tissues of similar density and can diagnose certain types of traumatic brain injury that cannot be seen on CT scans. Modern MRI scans require significantly more time to perform than CT scans: nearly one hour of image acquisition time for an MRI scan of the brain.
2.1.3. Positron emission tomography (PET) scanner
In order to perform a PET scan, a radiopharmaceutical (e.g., fludeoxyglucose F-18) is administered to the patient. Then, the patient is placed in the horizontal position on the PET scanner table within a donut shaped device. As the radiopharmaceutical decays, photons are emitted from within the patient and are received by the PET detector crystals. As with CT and MRI, axial images can be stacked and sagittal and coronal images can be reconstructed.
A typical matrix of a PET scan is 128 × 128, which is smaller than that of CT or MRI. However, it is similar in that each pixel has an associated numerical value indicating gray scale. In the case of PET, the gray scale corresponds to the amount of radioactivity emitted from that location. Since the radiopharmaceuticals can target certain structures in the body, PET can improve the diagnosis of conditions like Alzheimer’s disease compared with MRI.
2.2. Current methods of viewing medical imaging
In the previous section, we discussed three types of medical imaging scanners that generate volumetric data. In this section, we will review the current methods of viewing the volumetric data.
2.2.1. Conventional viewing of the volumetric data
The conventional viewing method for reviewing volumetric datasets is a slice-by-slice viewing method for axial, sagittal and coronal imaging planes or on occasion oblique reformats. It is estimated that most radiologists spend more than 95% of their total time on cross-sectional imaging datasets using this conventional slice-by-slice approach. Certain anatomical structures are easier to visualize on particular imaging planes. As an example, some radiologists have a preference for viewing the midline structures in the brain on the sagittal images, as shown in Figure 3.
Occasionally, the radiologist needs to view an abnormality in a plane other than the axial, sagittal or coronal imaging planes. In these instances, oblique plane reformats can be used, with the images still viewed in a conventional slice-by-slice approach. See Figure 4. Curved planar reformats can also be performed and have been shown to be beneficial.
A standard viewing method includes a flat screen, high-resolution diagnostic imaging monitor with keyboard and mouse, as shown in Figure 5. Typically, the radiologist will use the wheel on the back of the mouse to scroll through the stacks of images. Note that the radiologist also uses a microphone to dictate the radiology reports.
2.2.2. Challenges in conventional viewing of the volumetric data
First is the challenge of information overload. The dramatic improvements in the spatial resolution of CT and MRI (commonly finer than 1 mm), coupled with the large portions of the body imaged, generate immense datasets. As an example, a CT scan of the chest with an axial matrix of 512 pixels (x-direction) by 512 pixels (y-direction) has 262,144 pixels on a single slice. Thin-cut imaging of the chest provides 500 axial slices, each containing 262,144 pixels, or roughly 131 million pixels in the dataset.
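The arithmetic behind these figures can be checked with a few lines of code. The storage estimate is an added illustration that assumes 2 bytes (16 bits) per voxel; that value, like the function names, is an assumption rather than a figure from this chapter.

```python
def voxel_count(nx, ny, slices):
    """Total number of pixels (voxels) in a stack of axial slices."""
    return nx * ny * slices


def storage_mib(voxels, bytes_per_voxel=2):
    """Uncompressed size in MiB, assuming a fixed number of bytes per voxel."""
    return voxels * bytes_per_voxel / (1024 * 1024)


per_slice = voxel_count(512, 512, 1)   # 262,144 pixels on one slice
total = voxel_count(512, 512, 500)     # 131,072,000 pixels in the dataset
print(per_slice, total, storage_mib(total))
```

Under the 16-bit assumption, the single chest examination described above occupies about 250 MiB before compression, all of which the radiologist is expected to review.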
Second is the challenge of detecting small lesions. One example is the identification of a small pulmonary nodule, which is a topic of great concern for radiologists and a top cause of litigation [19, 20]. Identifying a tumor at a small size and correspondingly early stage is important in order to improve patient survival and reduce the cost of treatment. A very deliberate slice-by-slice method takes considerable time.
Third is the challenge of mentally building a 3D image from reviewing slices. Depending on the clinical scenario and body part imaged, the radiologist can be tasked with following twisting and turning structures through the body, such as a blood vessel or loops of intestine.
2.2.3. Advanced (non-AR/VR) viewing methods of the volumetric data
Most radiologists spend only a small fraction (<5%) of their total interpretation time using advanced viewing methods. Such non-AR/VR techniques include surface rendering and volume rendering.
2.2.3.1. Surface rendering
The first 3D rendering technique to display the human body’s anatomy was surface rendering (also known as shaded surface display). Through segmentation techniques such as thresholding to display only a prescribed set of pixels, apparent surfaces are displayed within the body. A virtual light source is used to provide surface shading. In surface rendering, only a single surface is used. An advantage of only displaying a single surface is the fact that surface rendering techniques are typically not limited by overlapping tissues within the human body. However, there are a few limitations.
One limitation of only displaying a single surface is that thresholding is used for one tissue type at a time, and it can be difficult to understand the anatomic relationship of multiple different organ systems when only a single organ system is displayed. Another limitation is that many organs are of similar density to their surroundings, and it can be difficult to segment these structures out. Finally, since surface rendering images are displayed on flat screens, true depth perception is not achieved. See Figure 6.
2.2.3.2. Volume rendering
The technique of volume rendering has been researched for many years by the computer graphics industry and has recently been applied to diagnostic radiology. In volume rendering, a transfer function is applied to assign a color and opacity to each intensity value. As an example, voxels that correspond to the density of a blood vessel are colored red and voxels that correspond to the density of bone are colored white. This has significantly helped radiologists visualize complex 3D structures. See Figure 7.
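A transfer function of this kind can be sketched as a lookup from intensity to color and opacity, followed by front-to-back alpha compositing along each viewing ray. The HU ranges, colors and opacities in the table are illustrative assumptions only (they are not clinical presets), and the function names are hypothetical.

```python
# Hypothetical transfer table:
# (lower HU inclusive, upper HU exclusive, RGB color, opacity).
TRANSFER_TABLE = [
    (-1100, -200, (0, 0, 0), 0.0),        # air: fully transparent
    (-200, 100, (200, 150, 120), 0.05),   # fat and soft tissue: faint
    (100, 300, (255, 0, 0), 0.6),         # contrast-enhanced vessel: red
    (300, 3100, (255, 255, 255), 0.9),    # bone: white, nearly opaque
]


def transfer(hu):
    """Return (color, opacity) for an intensity value."""
    for lower, upper, color, opacity in TRANSFER_TABLE:
        if lower <= hu < upper:
            return color, opacity
    return (0, 0, 0), 0.0  # outside the table: treat as transparent


def composite_ray(samples):
    """Front-to-back alpha compositing of HU samples along one viewing ray."""
    color_acc = [0.0, 0.0, 0.0]
    alpha_acc = 0.0
    for hu in samples:
        color, opacity = transfer(hu)
        weight = (1.0 - alpha_acc) * opacity  # light not yet blocked
        for i in range(3):
            color_acc[i] += weight * color[i]
        alpha_acc += weight
    return tuple(color_acc), alpha_acc


print(composite_ray([350, 200]))  # bone in front of a vessel
print(composite_ray([200, 350]))  # vessel in front of bone
```

In the first ray the nearly opaque bone sample leaves little weight for the vessel behind it, so the resulting pixel is close to white; reversing the order gives a predominantly red pixel.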
One of the key limitations of volume rendering is the overlapping structures [24, 25]. This limitation is significantly worse in settings such as viewing of the vasculature of the brain. See Figure 8.
3. Review of AR/VR in diagnostic radiology
In the first sections of this chapter, we have reviewed the medical imaging equipment, conventional slice-by-slice techniques and the advanced 3D rendering methods including surface rendering and volume rendering. We will now review AR/VR in diagnostic radiology.
The previously discussed limitation of overlapping structures can be minimized through a process called depth-3-dimensional (D3D) imaging, which provides a unique image to each eye. D3D transforms cross-sectional imaging datasets and displays them on AR/VR head display units (HDUs). In doing so, an overall immersive viewing experience is created. In AR/VR radiology, the radiologist wears a head display unit, which can be either a VR, AR or MR system. The basic concept is outlined in Figure 9.
This section of the paper will be organized into three sections. First, we will discuss the process for generating an AR/VR image from a cross-sectional medical imaging dataset. Second, we will include results. Third, we will discuss the path forward and future opportunities of AR/VR in diagnostic radiology.
3.1. Image processing for AR/VR
To optimize visualization, the D3D software suite must provide the capability for visualizing medical imagery rendered in a true 3-dimensional representation [26, 27]. In order to accomplish this, input imagery is converted from 2-dimensional images into a 3-dimensional voxel space, segmented into distinct tissue types, and then filtered. Finally, rendering is performed, wherein the rendering engine computes a left and a right view. This allows the operator to visualize the data using the same stereopsis that our eyes and brains have spent a lifetime interpreting and processing.
3.1.1. 3D segmentation
During the 3D segmentation process, each image pixel in the input imagery is treated as a voxel, a three-dimensional entity with length, width, and height. Each voxel is read into a large 3D array. Each voxel in the array is then analyzed for similarity to its neighbors, and for context clues that imply similarity to historical training data.
Mean and variance statistics largely drive the similarity analysis. Every voxel is compared to all the neighboring voxels in the 3x3 (or 5x5 or larger) nearest neighbor region. The most similar voxels are assigned the same class designation. The 3x3 variance is used to determine the width of the local intensity distribution. The width of the local intensity distribution in turn determines the threshold applied for declaring similarity between voxels. Simple local features on the neighborhood are computed around each voxel to describe the local average intensity or texture surrounding a voxel, which provides a classifier algorithm with measurements that are used to distinguish one class from another, where classes correspond to the different tissue types.
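The idea of a similarity threshold driven by the local variance can be sketched with a simple region-growing routine. This is not the D3D segmentation algorithm: it works on a single 2D slice for brevity, uses a 3x3 neighborhood, and the scale factor `k`, the minimum threshold `floor` and all function names are assumptions introduced for illustration.

```python
from collections import deque
from statistics import pvariance


def neighbourhood(img, r, c, radius=1):
    """Values in the (2*radius+1)^2 window around (r, c), clipped at the edges."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]


def local_threshold(img, r, c, k=1.0, floor=5.0):
    """Similarity threshold: k times the local standard deviation, never below floor."""
    spread = pvariance(neighbourhood(img, r, c)) ** 0.5
    return max(floor, k * spread)


def grow_region(img, seed, k=1.0, floor=5.0):
    """Collect pixels connected to seed whose intensity stays within the local threshold."""
    rows, cols = len(img), len(img[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        threshold = local_threshold(img, r, c, k, floor)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(img[nr][nc] - img[r][c]) <= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy slice: soft tissue (30 HU) on the left, bone-like values (400 HU) on the right.
slice_hu = [[30, 30, 30, 400, 400, 400] for _ in range(4)]
soft_tissue = grow_region(slice_hu, seed=(0, 0))
print(sorted(soft_tissue))
```

In this toy slice the region stops at the tissue boundary because the intensity step between the two classes exceeds the local standard deviation. A practical classifier would add the texture features and the trained model described in the text.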
Historical data from numerous similar imaging exams will be collected and “ground truthed” so that a neural network or deep learning classifier can be used to improve classification accuracy. Techniques will be used to first classify the input imagery into categories such as skeletal, fat, normal breast tissue, breast cancer, etc. [26, 27]. The user will need to input the tissue-type selection until an intelligent system can automatically determine the anatomy of the input imagery. User input of the tissue-type selection will be used as historical data to drive the classifier.
3.1.2. 3D filtering algorithm
The goal of the 3D filtering algorithm is to make the most understandable representation of the imagery to the radiologist [26, 27]. In order to accomplish this goal, it utilizes the image tissue-type from the 3D segmentation algorithm to determine which tissue types should be given highest priorities.
Current graphics hardware limits how many voxels can be displayed. Therefore, it is necessary to allocate percentages to the various tissue types based on priority. Tissue voxels that must be seen from a clinical importance perspective (e.g., tumor) are given the highest percentage. Other tissues that are not as clinically important (e.g., subcutaneous fat) are sparsely sampled. There are two benefits of this process. First, it allows the operator to understand the high priority tissue (e.g., tumor) in proper context (e.g., tumor is touching the spine). It can be difficult or impossible to understand the exact position of the tumor when all the surrounding tissue has been removed. Second, the lower priority tissue is made somewhat more transparent so that the higher priority tissue can be seen through it. However, the transparency allocated may not necessarily correspond to tissue density. Additionally, thinning the tissue between the viewing perspectives and the area of interest makes it much easier to see the area of interest accurately.
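The allocation of a limited voxel budget by tissue priority can be sketched as follows. The chapter does not specify the allocation rule, so the proportional-share scheme, the integer priority weights, the random sparse sampling and all names below are assumptions made for illustration.

```python
import random


def allocate_budget(counts, weights, budget):
    """Split a display budget across tissue types in proportion to priority weight.

    A tissue never receives more voxels than it contains. Unused share is not
    redistributed in this simple sketch.
    """
    total_weight = sum(weights.values())
    return {tissue: min(counts[tissue], budget * weights[tissue] // total_weight)
            for tissue in counts}


def sparse_sample(voxels, keep, seed=0):
    """Keep a reproducible random subset of a tissue's voxels."""
    return random.Random(seed).sample(voxels, keep)


# Hypothetical example: a small tumor inside a large volume of subcutaneous fat.
counts = {"tumor": 1000, "fat": 50000}
weights = {"tumor": 7, "fat": 3}           # higher weight = higher priority
kept = allocate_budget(counts, weights, budget=10000)
print(kept)

fat_voxels = list(range(counts["fat"]))    # stand-in for voxel indices
displayed_fat = sparse_sample(fat_voxels, kept["fat"])
print(len(displayed_fat))
```

In this example every tumor voxel is displayed, while only a small fraction of the fat voxels is retained to provide anatomical context.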
User selected filtering is employed to enable the operator to slice away tissue sections to facilitate viewing of areas of interest. Additional implementation of various geometric volumes and surfaces to temporarily remove sections of the voxel cloud may be beneficial. The operator can also enable or disable display of specific tissue types.
3.1.3. 3D rendering engine
The goal of the 3D rendering is to replicate, insofar as possible, what a radiologist would see if looking with his/her own two eyes through the skin and into the body were possible. A slightly different image would be seen from each eye from its particular vantage point. This difference allows one to see depth. The basic concept is illustrated in Figure 10. The geometry behind the approach is illustrated in Figures 11 and 12.
The Left Eye Viewing Perspective (LEVP) is placed at a point selected by the observer (radiologist) to inspect the volume. The Right Eye Viewing Perspective (REVP) is located one inter-ocular distance to the right of the LEVP, and similar math is applied to generate the slightly different image presented to the right eye display. The rendering engine generates the left and right eye views to provide a true 3D data visualization to be displayed in the AR/VR HDUs [26, 27]. This component relies heavily on the Graphics Processing Units (GPU) built into the graphics card. The rendering engine supports a convergence depth adjustment for fine-tuning the operator’s focal point, as shown in Figure 13.
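The geometry of the two viewing perspectives can be sketched with basic trigonometry. The sketch assumes a viewer-centered coordinate frame with x pointing to the viewer's right and z pointing straight ahead, and an interocular distance of 60 mm chosen only as an example; the function names are hypothetical and the code is not the D3D rendering engine.

```python
import math


def eye_positions(levp, interocular):
    """Return (LEVP, REVP); the REVP lies one interocular distance to the right (+x)."""
    x, y, z = levp
    return levp, (x + interocular, y, z)


def horizontal_angle(eye, point):
    """Angle in radians of a point to the right of the eye's straight-ahead (+z) direction."""
    return math.atan2(point[0] - eye[0], point[2] - eye[2])


def disparity(levp, interocular, point):
    """Difference between the angles at which the two eyes see the same point."""
    left, right = eye_positions(levp, interocular)
    return horizontal_angle(left, point) - horizontal_angle(right, point)


levp = (0.0, 0.0, 0.0)
near_voxel = (30.0, 0.0, 200.0)   # millimeters, midway between the eyes
far_voxel = (30.0, 0.0, 400.0)
print(disparity(levp, 60.0, near_voxel))
print(disparity(levp, 60.0, far_voxel))
```

The nearer voxel produces the larger angular difference between the two eyes, which is the cue that the left and right eye images exploit to convey depth. Increasing the interocular distance enlarges this difference for every voxel.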
The interocular distance can be altered to change stereopsis. The rendering engine allows the operator to comfortably move the viewing position, zoom, and rotate about the pitch, roll or yaw axes. The angular field of view can be changed. The volume of interest (VOI) can be rotated. The viewing perspectives can also be rotated. See Figure 14. To improve visibility, the operator can select any number of color palettes for different tissue classes. The rendering engine also incorporates a graphical overlay for image markup.
3.2. AR/VR control systems used in diagnostic radiology
The field of AR/VR is evolving, with new and changing HDUs and gaming control systems coming on the market. To illustrate what we have used in our prior research, we show example control systems and how their control buttons correspond to basic and advanced maneuvers. See Figures 15–17.
3.3. Initial results of AR/VR in radiology
As previously discussed, there have been numerous applications of AR/VR in surgery, including pre-operative planning, education, and intra-operative assistance. AR/VR is not yet FDA approved in diagnostic radiology, but it is being actively researched by DXC Technologies/D3D Technologies with an initial focus on breast cancer imaging [11, 12, 13].
Breast cancer is one of the leading causes of death in women [28, 29]. Breast calcifications are extremely common and are present in up to 86% of mammograms. The calcifications are classified by distribution, with a linear and branching pattern suspicious for ductal carcinoma in situ (DCIS) [31, 32]. Standard mammographic views may not reveal the true linear and branching pattern due to a suboptimal viewpoint and lack of depth perception. Therefore, the D3D AR imaging system was tested on a simulated set of microcalcifications. The radiologist who rated the AR system found that the microcalcifications appeared as a cluster when viewed from a single perspective, but appeared as a linear and branching pattern when viewed with rotation on the AR HDU. See Figure 18.
In addition to microcalcifications, breast cancer can also present as a mass on imaging. Characterization of the shape and margins is important in determining whether the breast mass is malignant or benign. Dedicated breast CT provides high spatial resolution of breast masses. Recent data has shown the importance of characterizing tumor morphology. Therefore, the D3D AR system was also tested in viewing of a known breast cancer. Malignant features of spiculations were noted to be more conspicuous on the D3D system than the native CT. See Figure 19.
3.4. Future work of AR/VR in radiology
Both AR/VR and diagnostic radiology are large and rapidly growing fields. One of the most common reasons for performing diagnostic imaging is cancer. Early determination of whether a particular therapy regimen is working would be extremely helpful to improve survival and would save costs. We outline a flow chart as a recommended process for introducing AR/VR in evaluating a cancer at multiple time points. See Figure 20.
It is foreseeable that AR/VR will one day play a major role in diagnostic radiology. Computer aided diagnosis (CAD) will help to identify the abnormalities. The role of the radiologist will include assessing an abnormality in great detail to appreciate subtle changes, diagnose accurately and assess treatment response. Features of AR/VR, including depth perception, head tracking and improved GUIs, create an overall immersive environment allowing for new opportunities in diagnostic radiology. Radiologists will interact with medical images in ways never before possible, including voice commands, gestures or handheld devices with haptic feedback.
Both diagnostic radiology and AR/VR are large and rapidly growing fields. In this chapter, we reviewed diagnostic medical imaging equipment, data storage, conventional slice-by-slice analysis and advanced 3D rendering techniques including surface rendering and volume rendering. We then introduced D3D processing of images, so that volumetric medical imaging can be displayed in AR/VR HDUs. We showed a variety of GUIs, including controllers and joysticks, with a variety of functions assigned to the various control buttons. Imaging cases were illustrated with a specific focus on breast cancer. We concluded with a discussion of future techniques for comparing how a tumor changes in appearance over multiple time points.
Conflict of interest
All authors either have direct financial interest or are employees of D3D Technologies or DXC Technologies.