Resolving the Difficulties Encountered by JPL Interplanetary Robotic Spacecraft in Flight

Although many precautionary measures are taken before launch to preclude failures and malfunctions in Jet Propulsion Laboratory (JPL) interplanetary robotic spacecraft, unexpected faults and off-nominal conditions do happen in flight. Also, as spacecraft age, electrical and mechanical parts are expected to degrade in performance. Unlike aircraft, robotic spacecraft cannot be returned to the hangar for repairs once they are launched. Maintaining the health and functionality of robotic spacecraft, probes, rovers, and their complement of science instruments is an ongoing challenge which must be met throughout the lifetime of every mission. When unexpected or anomalous events arise, the Spacecraft Operations ground-based Flight Support (SOFS) team of engineers for that particular spacecraft must troubleshoot the problem and implement a solution within the allowable time constraints.


Introduction
Degradation of spacecraft components can occur from several different sources. Material stresses caused by environmental effects such as solar heating, the cold of deep space, and solar radiation bombardment can contribute to malfunctions in subsystem components.
Additionally, autonomously running Flight Software (FSW) sequences and in-flight software upgrades periodically sent to the spacecraft can introduce human-induced faults. Further, as spacecraft design sophistication and complexity increase, failure modes increase in number, and fault diagnosis and resolution become more difficult and time-consuming for the SOFS team. To meet mission constraints, the team needs timely ways to collect large volumes of telemetered data from the spacecraft and compare them with archived historical data and spacecraft design information in order to determine failure causes and implement fault resolution actions. Interplanetary missions that operate at large Earth-spacecraft distances (such as exploration missions to the outer planets of our solar system) present an additional challenge, since the ever-increasing time delay between commands sent by the SOFS team and return telemetry received from the spacecraft limits the ability to respond to failures in a timely manner. This Round Trip Light (and radio) Time (RTLT) delay between ground commanding and spacecraft data delivery back to the SOFS team is of particular concern when critical "one-chance mission events" must take place at a specific time (such as deploying a probe while flying by a planet's moon), or when serious, potentially mission-catastrophic failures develop so quickly that they must be fixed immediately.
To protect robotic spacecraft from these types of hazards and limitations, mission robustness is enhanced by implementing several strategies that give the spacecraft system greater integrity and diagnostic capability. This system health management approach is applied through several means: implementing "flight rules" and mission design constraints, applying functional redundancy through FSW, adding redundant hardware, and applying Fault Protection (FP) techniques, which consist of automated response routines containing preprogrammed instructions to respond to failure conditions. This FP strategy involves autonomous monitoring of component operation to ensure device health, evaluation of internal and external conditions, and monitoring of power allocation to spacecraft devices. In general, most JPL robotic spacecraft require some unique, mission-specific FP, but the majority of spacecraft configurations contain FP algorithms which protect the command and data processing capabilities, maintain attitude control of the vehicle, protect against loss of Earth communication with the spacecraft, ensure that safe external and internal temperature levels are maintained, and recover from power overloads or power loss. To accommodate the majority of anticipated faults, most spacecraft are equipped with a general-purpose "Safe-Mode response routine" that configures the spacecraft to a reduced power state that is power-positive, thermally stable, and communicative, with a known, predictable spacecraft configuration, so that more complex faults can be diagnosed by the SOFS team. Post-fault recovery time is reduced further through the development of automated tools and pre-determined "recovery procedures" which contain pre-defined actions for the SOFS team to follow.
This chapter details the challenges and difficulties encountered by several JPL interplanetary spacecraft missions during their flight phases and describes the solutions and workarounds implemented by their supporting SOFS ground teams to protect their mission objectives throughout the lifetime of the mission. As spacecraft journey through the vastness of space, many factors challenge the maintenance of spacecraft health and functionality. All of these risk factors must be taken into account when designing JPL spacecraft, even those influences and events which may be unforeseen.
In order for spacecraft systems to function properly, both external and internal temperatures must be monitored, regulated, and controlled during the entire lifetime of the spacecraft's mission. Exposure to the sun's heat is one of the most detrimental external influences on spacecraft operation in the vacuum of space if the vehicle flies close to the sun. The spacecraft's sunlit surfaces can superheat, while shadowed surfaces can fall to extremely low temperatures. This uneven heating produces thermal expansion and contraction that stresses materials and can lead to warpage, breakage of components, or camera distortion. To help alleviate some of these problems, spacecraft are equipped with fault-preventative devices such as optical solar reflectors, mirror tiles, or multi-layer insulation thermal blankets, which reflect the sun's heat and radiation so that the spacecraft is somewhat protected against overheating while retaining its internal heat to prevent too much cooling. Adverse thermal conditions must be avoided since computers and spacecraft components will cease to work if spacecraft temperatures become too extreme (Qualitative Reasoning Group, 2005). Additional precautions must also be taken to ensure that instruments do not fall outside their operating limits, since many devices are designed to operate within a narrow range of temperatures. The spacecraft's interior environment must be properly managed as well, since heat build-up can occur from the spacecraft's own systems. One method employed to regulate internal temperatures is circulating the spacecraft's own gas or liquids (fuel) to cool its interior. Equally important is the thermal state of these substances, since they must be kept from freezing in the cold of deep space. Frozen propellant would be unusable, so the spacecraft would not be able to maneuver and would eventually become misaligned with Earth, so that no signals could be sent or received by the spacecraft.
Although precautionary measures are taken to preclude human-induced electro-static discharge events (static electricity discharge) within spacecraft components during the manufacturing process, "latent failures" can occur after launch, rendering a device useless or partially useless. Human error can also be introduced within the command sequences which are continuously generated and sent to the spacecraft. These sequences contain instructions for controlling the spacecraft's activities, such as tracking Earth, monitoring celestial references for attitude targeting, performing maneuvers to fine-tune the trajectory when required, and carrying out science calibration and operations. These command sequences are all subject to human error which can potentially cause serious faults. One example would be accidentally turning off a radio transmitter or receiver onboard the spacecraft, thus preventing communication with Earth. Another would be turning on too many spacecraft instruments and components at the same time, so that the spacecraft's power source (solar panels, Radioisotope Thermoelectric Generators (RTGs), fuel cells, etc.) is unable to provide the power required to support all operating systems. This condition is referred to as a "spacecraft-wide under-voltage power-outage," in which loss of power to critical devices can occur, such as the computers which must maintain their power levels to retain computer memory. Automated FP routines are implemented to resolve this type of condition, which is further detailed in an example covered later in this chapter.
Although radio waves travel at the speed of light, making spacecraft-Earth transactions almost instantaneous near Earth, the signal can take hours to arrive as the distance between Earth and the spacecraft increases. This lag time becomes a high-risk obstacle to fault recovery when spacecraft are sent out to great distances, as on the Galileo, Cassini, and Voyager missions. Under some anomalous conditions, it is impossible for ground commands to reach the spacecraft quickly enough to preclude a catastrophic failure. An example would be the failure of a latch valve to close properly in the propulsion maneuvering system after re-pressurization of the spacecraft's fuel tanks has commenced. This type of fault can cause the tank pressure to rise substantially in a very short amount of time. If this condition were to occur on the Cassini spacecraft (Mission-to-Saturn), where the RTLT is approximately 3 hours, the pressure level could potentially reach a catastrophic point before the pressure measurement data could even reach Earth to indicate that the fault condition has occurred, since Cassini's telemetry stream takes well over an hour to reach SOFS personnel from its Saturn-Titan orbit position. This "lag time" problem is a particular concern for spacecraft missions that contain one-time opportunities such as planet/moon encounters. For these events, the timing is crucial since only one opportunity exists to meet the objective and there may be no second chance. These unique events must proceed without the threat of fault interference in order for the spacecraft's mission to be successful (Morgan, 2011).
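The size of this lag can be checked with a short calculation. The sketch below divides distance by the speed of light to obtain the one-way light time and the RTLT. The Earth-Saturn distance of 9.5 AU is an assumed illustrative value (the true distance varies, roughly between 8 and 11 AU), and the function names are hypothetical.

```python
# Defined constants: speed of light and the astronomical unit.
C_KM_PER_S = 299_792.458
AU_KM = 149_597_870.7


def one_way_light_time_s(distance_au: float) -> float:
    """Seconds for a radio signal to cover the given distance."""
    return distance_au * AU_KM / C_KM_PER_S


def round_trip_light_time_s(distance_au: float) -> float:
    """RTLT: command uplink plus telemetry downlink."""
    return 2.0 * one_way_light_time_s(distance_au)


# Assumption for illustration only: Earth-Saturn distance of 9.5 AU.
SATURN_DISTANCE_AU = 9.5

owlt = one_way_light_time_s(SATURN_DISTANCE_AU)
rtlt = round_trip_light_time_s(SATURN_DISTANCE_AU)
print(f"one-way: {owlt / 60:.1f} min, RTLT: {rtlt / 3600:.2f} h")
```

Under this assumed distance the one-way time is a little over an hour and the RTLT is a little over two and a half hours, which is consistent with the figures quoted above for Cassini.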
Another concern for spacecraft systems is Electromagnetic Compatibility (EMC) between components. When designing spacecraft subsystems, Electromagnetic Interference (EMI) effects must be minimized so that the spacecraft's systems function properly within their intended operational environment, without adversely affecting or being adversely affected by other spacecraft components. Spacecraft subsystems can become ineffective or malfunction if neighboring devices are not designed to minimize their EMI effects when operating simultaneously. To ensure component compatibility, EMI assessment and testing are required pre-launch to avoid undesirable electromagnetic fields, conducted voltages, and currents. As an example, the Cassini mission implemented a study to preclude EMI effects from other subsystem devices on the Dual Technique Magnetometer (MAG) science instrument. The MAG device consists of two instruments which have been mounted along an 11-meter boom to minimize EMI effects from spacecraft components. During the early project phase, several engineering components and science instruments were identified as potential magnetic interference sources (e.g. Traveling Wave Tube Amplifiers (TWTA), Propulsion Module Subsystem (PMS) multiple latch valves, Power & Pyro Subsystem (PPS) latch relays, etc.). Pre-launch preliminary assessments indicated that the permanent magnets contained in these subsystem devices had the potential to impact the upcoming MAG science experiments. A Magnetics Control Review Board (MCRB) was established to address EMC issues and ensure that magnetic cleanliness was maintained between devices (Narvaez, 2002). Participants representing these subsystem devices discussed precautionary measures such as shielding methods, implementation of magnetic compensation, wiring layouts to minimize loop areas, and replacing magnetic materials with non-magnetic materials.
This effort led to the establishment of requirements and guidelines to assist hardware designers in developing EMC strategies which would produce minimal magnetic field output. The MCRB committee stressed implementing these fixes as early as possible in the design phase to allow for flexibility in the available solutions. Among the fixes implemented on behalf of the MAG instrument EMI reduction effort were: 1) both TWTAs were packed side-by-side within their housing so that their respective magnetic field polarities would be configured in opposing directions, 2) the PPS subsystem arranged all magnetic latch relays in pairs, with their respective magnetic poles opposite to each other (this provides self-cancellation; where the number of relays was odd, a small compensation magnet was installed to neutralize the field), 3) a theoretical model was produced for the four RTGs which provided optimum compensation for the selected arrangement of clocking angles. For those subsystems which could not reduce their EMI effects or replace highly magnetic materials with non-magnetic materials, magnetic compensation was implemented. The most significant magnetic compensation was installed in the PMS latch valve components. In this case, each latch valve was measured and magnetically compensated with magnets of the same dipole moment, oriented in opposition. Following these EMI reduction measures, each of Cassini's components was tested to verify its magnetic cleanliness for the overall system prior to its final installation on the spacecraft.
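The pairing and compensation ideas described above can be illustrated with a simple dipole model. The following is only a sketch under stated assumptions: each device is treated as a point dipole aligned along a single axis, the moment values are invented for illustration, and the far-field on-axis dipole formula B = (mu0 / 4 pi) * 2m / r^3 is used to show how quickly the residual field falls off with distance. None of the names or numbers come from the Cassini design.

```python
MU0_OVER_4PI = 1e-7  # T*m/A (approximate value of mu0 / (4*pi))


def net_moment(moments_a_m2):
    """Net dipole moment (A*m^2) of devices aligned along one axis."""
    return sum(moments_a_m2)


def compensation_moment(moments_a_m2):
    """Moment a compensation magnet would need to cancel the net moment."""
    return -net_moment(moments_a_m2)


def on_axis_field_t(moment_a_m2, distance_m):
    """Far-field flux density (tesla) on the dipole axis."""
    return MU0_OVER_4PI * 2.0 * moment_a_m2 / distance_m ** 3


# Hypothetical relay moments: one opposing pair plus one unpaired relay.
relays = [0.2, -0.2, 0.2]
residual = net_moment(relays)             # left over from the odd relay
needed = compensation_moment(relays)      # magnet that neutralizes it

# Field of a hypothetical 1 A*m^2 residual moment at an 11 m distance.
field_at_11_m = on_axis_field_t(1.0, 11.0)
print(residual, needed, field_at_11_m)
```

In this model an opposing pair contributes zero net moment, an odd relay leaves a residual that a single compensation magnet can cancel, and the inverse-cube falloff shows why placing a sensor far out on a boom reduces what remains.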
In addition to the above challenges, many spacecraft designs have become more complex over the last several years. As a result, fault diagnosis and resolution have become more difficult and time-consuming, since a fault in these very complicated systems can have a plethora of possible causes. This poses a substantial challenge for the SOFS team, whose task it is to collect the large volumes of telemetry data needed to diagnose faults and propose resolution actions. This can be an arduous, time-consuming manual process, sometimes requiring hundreds of data products from the spacecraft's telemetry stream to be compared to archived historical data and design information in order to evaluate the problem and propose a solution. To aid the fault diagnosis and solution process, automated FP routines are typically implemented in the spacecraft's FSW to deal with the majority of possible failure conditions; this FP is designed to protect against any Single Point Failure (SPF) conditions that might arise (unless proven extremely unlikely and a waiver issued), with the following priorities in mind:
1. Protect critical spacecraft functionality
2. Protect spacecraft performance and consumables
3. Minimize disruptions to normal sequence operations
4. Simplify SOFS recovery response
These FP groundrules are typically implemented with the following principle in mind: following any anomaly, ensure that the spacecraft remains commandable and that its systems are maintained, so that it remains in a stable, safe state for a pre-determined period of time (e.g. for the Cassini spacecraft, this period is two weeks, by which time the SOFS team should be able to recover the spacecraft and restart its onboard sequence).

FP groundrules for JPL spacecraft
Each JPL spacecraft is unique in its configuration and mission objectives and the task of implementing autonomous FP must be considered carefully according to its configuration, expected environment, component design, and its operational tasks, although some FP is approached in a generic manner. In general, autonomous fault protection should only be implemented on-board the spacecraft for those fault conditions where a ground response is not feasible or practical, or if fault resolution action is required within a pre-defined period of time of detecting the failure. Otherwise, the ground system should have adequate time to respond to the fault and should be responsible for the fault recovery. In both cases, the ground is responsible for failure diagnosis and re-configuration of the spacecraft to nominal operations after the fault. Some spacecraft designs may be quite simple (e.g. lack propulsion and attitude control subsystems entirely, such as an atmospheric probe), and some spacecraft are quite complex, but many spacecraft share common systems which require a similar approach in FP design (Morgan, 2005).

Fault protection typically implemented into JPL spacecraft
Some spacecraft have design configurations simple enough to warrant only minimal fault protection, which is meant to address any type of fault condition that might occur. Other spacecraft designs are so complex and sophisticated, with such long mission durations, that they must maintain a system which may present numerous error possibilities. Most spacecraft rely on a "general-purpose, Safe-Mode" fault response which typically configures the spacecraft to a lower power state by turning off all non-essential spacecraft loads, commanding a thermally safe attitude, providing a safe state for the hardware, establishing an uplink and a downlink, reconfiguring to a low-gain antenna, and terminating the command sequence currently executing on the spacecraft. This type of response is used to configure the spacecraft into a safe and predictable state so that the SOFS team has enough time to evaluate the fault causes and determine a solution.
FP typically implemented in JPL spacecraft designs also includes an automated response to address "loss of spacecraft signal" faults that affect the SOFS team's ability to communicate with the spacecraft. Failure of the spacecraft to receive the uplink signal can be caused by a number of problems, including ground antenna failures, environmental interference, spacecraft hardware failures, an erroneous spacecraft attitude (pointing error), radio frequency interference, or an error introduced in an uplinked sequence (e.g. radio transmitter device accidentally turned off). If the spacecraft has experienced one of these failures and is no longer able to receive commands from the ground, an FP response can be implemented to help re-establish the uplink. This type of FP is referred to as a "Command Loss Response" (named from the perspective of the spacecraft, which is no longer receiving ground commands) and is typically an "endless-loop" response (see Section 3.2.1).
Another FP algorithm typically installed on spacecraft provides recovery from a system-wide loss of power. This is referred to as "Under-Voltage" recovery, and the loss can be caused by a number of fault conditions depending on the spacecraft design (e.g. oversubscribing the available power, a short in the power system, or a communications bus overload). Should a system-wide power loss occur, not even the Safe-Mode response will execute, since the main computer will also lose power, thus causing loss of the mission. Therefore, FP must be implemented to detect the drop in power level so that the system may automatically shed its non-essential loads from the communications bus, isolate the defective device, and re-establish essential hardware. The quick actions of this response allow critical spacecraft memories to be maintained throughout the Under-Voltage event (see Section 3.2.2).
FP monitors detect anomalous conditions using predefined "trigger values," referred to as "thresholds" or "redlines," that represent the value at which an anomalous condition is present. The monitor design may also include logic which detects and ignores data from failed sensors. "Consecutive occurrence counters" are also used in some FP monitors; these are referred to as "persistence filters" and are implemented for a variety of reasons: to ensure that transient occurrences do not trigger a response, to satisfy hardware turn-on constraints, or to allow other FP monitors to detect faults first. SOFS personnel can also enable or disable the spacecraft's monitors and responses during the mission as appropriate. This is accomplished through an FSW flag which may be manipulated by the team. For the most part, the FP is designed assuming that these flags will be enabled throughout the mission; however, some exceptions to this strategy exist:


• The response is only appropriate when the associated device is powered on and operating
• The response is required only for specific mission events
• The response is not appropriate for a particular event
• The response is not compatible with the currently operating sequence
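The monitor behavior described above (a threshold, a persistence filter, rejection of failed-sensor data, and an enable flag) can be sketched as follows. This is a minimal illustration, not flight code: the class name, the use of None to mark a failed sensor reading, and all numeric values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultMonitor:
    threshold: float       # "redline" above which a reading is anomalous
    persistence: int       # consecutive anomalous readings required
    enabled: bool = True   # FSW flag that the ground team can set or clear
    count: int = 0         # consecutive occurrence counter

    def sample(self, value: Optional[float]) -> bool:
        """Process one reading; True means the response is requested."""
        if not self.enabled:
            self.count = 0
            return False
        if value is None:
            # Assumed convention: None marks data from a failed sensor,
            # which is ignored and leaves the counter unchanged.
            return False
        if value > self.threshold:
            self.count += 1
        else:
            self.count = 0     # a transient does not accumulate
        return self.count >= self.persistence


# Hypothetical readings: one transient, one bad sample, then a real fault.
monitor = FaultMonitor(threshold=50.0, persistence=3)
readings = [55.0, 40.0, 55.0, None, 55.0, 55.0]
results = [monitor.sample(r) for r in readings]
print(results)
```

The single high reading at the start is filtered out as a transient, and the response is requested only once three consecutive valid readings exceed the threshold.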

FP examples from the Cassini-Huygens mission-to-Saturn spacecraft
The Cassini-Huygens spacecraft is a joint NASA/ESA/ASI mission to the Saturnian system, sent to study the planet and its many natural satellites. The craft was launched from Cape Canaveral on October 15, 1997, following nearly two decades of development. It comprises a Saturn orbiter (shown in Figure 2) and an atmospheric probe/lander to investigate the moon Titan. The Cassini spacecraft has also returned data on a wide variety of other subjects, including the heliosphere and the planet Jupiter, and has conducted relativity tests. During the early part of its seven-year cruise phase, Cassini's trajectory was shaped by performing "gravity-assist flyby" maneuvers past planets of the solar system. Two of these gravity-assist flybys were performed around Venus (April 26, 1998 and June 24, 1999), one around Earth (August 18, 1999), and one around Jupiter (December 30, 2000), as shown in Figure 5. With the use of this VVEJGA (Venus-Venus-Earth-Jupiter Gravity Assist) trajectory, it took 6.7 years for the Cassini spacecraft to arrive at Saturn in July 2004.
During the 6.7-year cruise phase, several Trajectory Control Maneuvers (TCM) were performed using Cassini's Main Engine (ME) and Reaction Control System (RCS) jets to guide the spacecraft to its intended destination. Once near the Saturnian system, the "Saturn Orbit Insertion (SOI)" burn maneuver was performed to slow the craft down so that it could be captured into orbit around Saturn. This marked the beginning of its four-year Orbital Tour phase around Saturn's complex system of moons, which is shown in Figure 6. The probe was separated on Christmas Eve 2004 and landed on the moon Titan in January 2005.
The current end-of-mission plan is for a controlled 2017 Saturn impact (Smith & TPS, 2009).

Cassini's command loss algorithm
Figure 7 illustrates how the Cassini spacecraft's Command Loss FP algorithm addresses faults that can cause loss of ground-spacecraft communications; this condition is referred to as "loss of spacecraft commandability." A special "countdown timer" has been implemented in the onboard CDS FSW to keep track of the last time an uplink command was received from the ground operators. This timer decrements continuously (at one-second intervals) and is reset back to its "default value" (several days for Cassini) each time an uplink command is received by the spacecraft. An extended absence of uplink commands will eventually lead to the monitor's request for the response, since the timer will eventually decrement to "0". Under these conditions, the assumption is that the spacecraft has experienced a failure where it can no longer receive commands. Cassini contains redundant units for the Command & Data Computer (CDS) and the Radio Frequency (RFS) devices (Deep Space Transponders, TWTAs, Telemetry Control Units (TCU)), as well as three antennas (one High Gain Antenna (HGA) and two Low Gain Antennas (LGA)). The Command Loss Response is divided into "Command Groups," with "Command Pauses" installed after each group of commands has been executed. These pauses allow several hours (the equivalent of at least two RTLT periods) for the SOFS team to attempt re-acquisition of the spacecraft using the configuration newly commanded by the response. As shown in the figure, the first Command Group will select the auxiliary oscillator and execute the Safe-Mode response, which turns off non-essential loads, commands the spacecraft's High Gain Antenna to the Sun, and places the spacecraft in a known uplink and downlink state. A 15-hour wait period has been installed after this first Command Group to allow sufficient time for the SOFS team to re-establish the uplink, if possible, before hardware swaps begin.
If this attempt is unsuccessful, the response will proceed with the next course of action in Command Group #2, which is to start the series of RFS hardware unit swaps. Five- to seven-hour wait periods are installed between each subsequent Command Group to allow the SOFS team adequate time to send commands to the spacecraft to re-establish the uplink on the newly commanded configuration. At the end of the response chain (approx. 5 days 20 hrs), a swap to the redundant CDS is initiated and the response will activate on the other computer's FSW (the response runs endlessly until an uplink command is received from the ground). The goal of Command Loss FP is to perform hardware swaps and/or re-command the spacecraft attitude until ground acquisition is restored. Once the spacecraft successfully receives a command from the ground and the uplink has been re-established, the response will terminate and reset its countdown timer, thus leaving the spacecraft on the last successfully commanded configuration.
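The countdown timer and the endless loop of Command Groups and Command Pauses can be sketched in a few lines. This is a simplified model under stated assumptions, not Cassini's FSW: the five-day default timer value, the six-hour pauses, the number of groups, and the group names are all hypothetical, and the swap to the redundant CDS is approximated by simply restarting the cycle.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import List, Tuple

HOUR = 3600

# Hypothetical command groups: (action, pause afterwards in seconds).
GROUPS: List[Tuple[str, int]] = [
    ("select auxiliary oscillator and run Safe-Mode", 15 * HOUR),
    ("swap RFS unit A (hypothetical)", 6 * HOUR),
    ("swap RFS unit B (hypothetical)", 6 * HOUR),
    ("swap to redundant CDS", 6 * HOUR),
]


@dataclass
class CommandLossFP:
    default_timer_s: int
    groups: List[Tuple[str, int]]
    timer_s: int = field(init=False)
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.timer_s = self.default_timer_s

    def on_uplink_command(self) -> None:
        """Any received command resets the countdown timer."""
        self.timer_s = self.default_timer_s

    def tick(self, seconds: int = 1) -> bool:
        """Decrement the timer; True means the response is requested."""
        self.timer_s = max(0, self.timer_s - seconds)
        return self.timer_s == 0

    def run_response(self, uplink_at_s: int) -> str:
        """Cycle through the groups until an uplink arrives, uplink_at_s
        seconds after the response starts. Assumes positive pauses.
        Returns the last commanded configuration."""
        elapsed = 0
        for action, pause_s in cycle(self.groups):
            self.log.append(action)
            elapsed += pause_s          # wait for the ground to respond
            if uplink_at_s <= elapsed:
                self.on_uplink_command()
                return action


# Assumed default timer of 5 days; the text only says "several days".
fp = CommandLossFP(default_timer_s=5 * 24 * HOUR, groups=GROUPS)
expired = fp.tick(5 * 24 * HOUR)
config = fp.run_response(uplink_at_s=18 * HOUR)
print(expired, config)
```

In this run the uplink returns during the pause after the second group, so the response stops there, the timer is reset, and the spacecraft stays on that configuration.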

Cassini's under-voltage trip algorithm
Cassini's "Under-voltage Trip" monitor and response are shown in Figure 8 ("Cassini Spacecraft's Under Voltage FP Actions for Shorted RTG"), in which an RTG power unit (one of three on this spacecraft) has shorted. In this example, the Power Subsystem FP senses a power drop below the predefined threshold for the duration of the persistence filter. The first action taken by the Power Subsystem FP is to diode-isolate all three RTGs, turn off (load-shed) all non-essential spacecraft loads, restore voltage regulation to 30 volts, and then turn on all essential hardware. It also sets three "UV Status Flags" (one for each RTG) to notify System-level FP (SFP) that an Under-Voltage trip event has occurred. Once the CDS becomes operational, it will deliver these UV Status Flags to SFP. SFP's Under Voltage monitor will examine the state of each RTG and, if enabled, will request the Under Voltage response. The SFP response un-isolates any correctly operating RTG, unsets its corresponding UV Status Flag, and establishes a predictable, safe spacecraft state by executing the Safe-Mode response.
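The two-stage sequence described above (Power Subsystem FP first, then the SFP response) can be sketched as two functions acting on a small state record. This is an illustrative model only: the load names, the health flag used to stand in for SFP's examination of each RTG, and the function names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ESSENTIAL: Set[str] = {"CDS", "RFS receiver"}   # hypothetical load names


@dataclass
class Spacecraft:
    rtg_healthy: Dict[str, bool]    # stand-in for SFP's RTG state check
    loads_on: Set[str]
    rtg_isolated: Dict[str, bool] = field(default_factory=dict)
    uv_flags: Dict[str, bool] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)


def power_fp_trip(sc: Spacecraft) -> None:
    """Power Subsystem FP actions once the persistence filter is met."""
    for rtg in sc.rtg_healthy:
        sc.rtg_isolated[rtg] = True   # diode-isolate every RTG
        sc.uv_flags[rtg] = True       # one UV Status Flag per RTG
    # Shed all non-essential loads, then power the essential hardware.
    sc.loads_on = set(ESSENTIAL)
    sc.events.append("UV trip")


def sfp_uv_response(sc: Spacecraft, enabled: bool = True) -> None:
    """System-level FP response, run after the CDS is operational."""
    if not enabled or not any(sc.uv_flags.values()):
        return
    for rtg, healthy in sc.rtg_healthy.items():
        if healthy:
            sc.rtg_isolated[rtg] = False   # un-isolate working RTGs
            sc.uv_flags[rtg] = False       # unset the matching flag
    sc.events.append("Safe-Mode")


# Hypothetical scenario: RTG2 has shorted.
sc = Spacecraft(
    rtg_healthy={"RTG1": True, "RTG2": False, "RTG3": True},
    loads_on={"CDS", "RFS receiver", "instrument A", "instrument B"},
)
power_fp_trip(sc)
sfp_uv_response(sc)
print(sc.rtg_isolated, sc.events)
```

After both stages only the shorted unit remains isolated with its flag set, the non-essential loads are off, and Safe-Mode has been executed.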

Cassini safing response activations to date
On the Cassini project, it is the responsibility of the SOFS ground team to support spacecraft activities via the established Mission Plan, and to follow established constraints, flight rules, agreed-upon waivers, and requirements documentation in order to support the following activities:
Figure 9 shows the HAS Response following the Safing Response activation.

Examples of unforeseen, unplanned for spacecraft problems & SOFS team solutions
Not all spacecraft faults will activate the Safing Response and terminate the onboard command sequence. Some faults are benign enough to allow the sequence to remain in progress, since the FP can fix the conditions without intervention from the SOFS team. However, some fault conditions are unforeseen prelaunch, presenting themselves as a new challenge for the SOFS team to resolve during the actual flight phase. This section lists a few examples of unexpected faults that have occurred on several JPL spacecraft without the benefit of preventative FSW, FP, or redundancy to fix the problem. In spite of this fact, all SOFS teams realized that in any spacecraft mission there is always the possibility that new problems can arise due to unknown environmental effects, human errors, or component/science instrument aging.

Solid state power switch SEUs
The Cassini spacecraft contains 192 Solid State Power Switches (SSPS) which are susceptible to Single Event Upsets (SEUs) caused by galactic cosmic rays within the flight environment. One or more energetic-particle hits can occur on the voltage comparator, resulting in a false indication that the current load is anomalously high. When this condition occurs, the SSPS switch transitions from either an "on" or "off" state to "tripped." The result of this condition can range from benign to serious, depending on which switch is tripped and whether it is in use at the time. In May 2005, an SSPS trip event on the spacecraft's ultra stable oscillator caused the SOFS ground team to lose communication with the spacecraft for a short period of time. In September 2007, the TWTA device tripped, which activated an FP response, thus causing a Power-On-Reset of the RFS system and hardware swaps to the redundant Telemetry Control Unit and TWTA device; the Safing Response was also activated (see Section 3.3). Although nothing can be done to reduce or inhibit the occurrence of SEU-induced SSPS trips (which are unpredictable and occur sporadically), the SOFS team designed a new algorithm in CDS FSW to respond to these upset events. This new "SSPS Fault Protection" algorithm cycles through one SSPS per second (of 192 switches) and responds to the tripped condition if three consecutive passes through the monitor logic determine that a tripped switch condition is present. A series of predetermined actions has been coded into FSW to respond to the "tripped" condition for each switch, depending on the appropriate action for that load. One example is the Cassini Plasma Spectrometer (CAPS) instrument: if its electrical load current is tripped, the FP will log the event, command the switch "off", and then command its CAPS Replacement Heater Load Current (CAPS_RHtr_LC) "on" to protect the thermal integrity of the device. As of the date of this writing, there have been 33 SSPS trip events (25 during the prime mission).
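The round-robin monitoring scheme can be sketched as below. This is one possible reading of the description, not the flight algorithm: it assumes that the "three consecutive passes" are three successive visits to the same switch, the switch index used for CAPS is invented, and the response table holds only the CAPS example from the text.

```python
from typing import Dict, List

NUM_SWITCHES = 192   # from the text
PERSISTENCE = 3      # consecutive tripped passes required (from the text)
CAPS_SWITCH = 7      # hypothetical index of the CAPS load switch

# Response table: switch index -> ordered actions (CAPS example only).
RESPONSES: Dict[int, List[str]] = {
    CAPS_SWITCH: ["log event", "CAPS switch off", "CAPS_RHtr_LC on"],
}


class SSPSMonitor:
    def __init__(self, responses: Dict[int, List[str]]) -> None:
        self.responses = responses
        self.counts = [0] * NUM_SWITCHES   # per-switch persistence count
        self.next_switch = 0
        self.issued: List[str] = []

    def step(self, states: List[str]) -> None:
        """One-second cycle: examine exactly one switch."""
        i = self.next_switch
        self.next_switch = (i + 1) % NUM_SWITCHES
        if states[i] != "tripped":
            self.counts[i] = 0
            return
        self.counts[i] += 1
        if self.counts[i] >= PERSISTENCE:
            default = [f"log trip on switch {i}"]
            self.issued.extend(self.responses.get(i, default))
            self.counts[i] = 0


# Simulate a trip on the hypothetical CAPS switch.
states = ["on"] * NUM_SWITCHES
states[CAPS_SWITCH] = "tripped"
monitor = SSPSMonitor(RESPONSES)
for _ in range(391):
    monitor.step(states)
issued_before = list(monitor.issued)
monitor.step(states)   # step 392 is the third visit to this switch
print(issued_before, monitor.issued)
```

Under this reading a given switch is revisited every 192 seconds, so a trip is acted on only after it has persisted across three visits, several minutes after it first occurs.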

Main Engine Assembly (MEA) cover degradation
Cassini's ME assembly requires a cover which must be deployed (closed) when the engines require protection from impacts by micrometeoroids and the on-orbit dust which often surrounds Saturn and its moons. Shortly before the Deep Space Maneuver (DSM) burn, when the MEA cover was stowed (opened), the cover assembly did not open as far as was observed in ground tests. The cover opened 14 degrees less than expected, but the SOFS team demonstrated that this opening angle was adequate to allow successful main engine burns (on either nozzle). The degradation in performance of the MEA cover was attributed to increased stiffness in the cover material (kapton and beta cloth) due to exposure to the space environment during flight within the inner solar system, although a period of disuse also contributed to this increased stiffness. These environmental effects cannot be adequately modeled in ground tests. The SOFS team's ongoing response to this unexpected behavior of the cover actuation was to monitor its behavior closely (along with device experts), with results to date demonstrating that the opening angle has remained acceptable through several dozen cycles, with no further signs of degradation observed, as depicted above (Figure 10). As of the date of this writing, 66 in-flight cycles have been performed (Millard & Somawardhana, 2009).
[Table residue: 160, 162, 163, 164, 164A, 165, 166, 167, 168, 169, 170; OTMs 218, 221, 224, 227, 230, 236, 237, 239, 240; OTMs 251, 254, 269, 272, 275; OTMs 290, 294]

Probe transmission design error during Titan moon encounter
The European Space Agency's (ESA) Huygens Probe was piggybacked aboard the Cassini orbiter to capture data from Titan's atmosphere and to measure wind effects and surface features once deployed onto this moon. Return of the probe's data was a key element in the success of the joint Cassini-Huygens mission. Since the Huygens Probe had minimal onboard data storage capability, data was to be transmitted to the Cassini orbiter immediately during the Entry, Descent, and Landing (EDL) phase of the probe mission, with the orbiter providing the bulk data storage that was needed. To prepare for the probe deployment and the relay of its Titan data back to Earth, which was to commence in January 2005, end-to-end in-flight tests of the Probe Relay link were performed in February 2000. These tests were necessary to characterize the behavior of the combined Cassini-Huygens system; the probe signal was simulated in flight by transmission from the DSN to the spacecraft. During these tests, the signal and data detection thresholds of the receiver were of particular interest. Results confirmed that there was sufficient margin to maintain carrier and subcarrier lock for the duration of the probe mission, but the digital circuitry that decodes the data from the subcarrier did not have sufficient bandwidth to process the data properly once the subcarrier was Doppler shifted by the nominal 5.6 km/s velocity difference between the orbiter and the probe. Because the digital circuit design did not adequately account for the full Doppler shift of the probe data, the effect of this anomaly would have been an unacceptable loss of data during the probe's descent to Titan. This led to the formation in January 2001 of the Huygens Recovery Task Force (HRTF), a joint ESA/NASA group of experts, to troubleshoot the problem. Their efforts led to a three-part solution which allowed recovery of the Titan data.
First, the mission profile was redesigned to give the Huygens Probe a trajectory with a low Doppler shift in the probe-to-orbiter radio link. This affected the early part of the Saturn Tour phase, resulting in a higher Cassini orbiter flyby altitude at Titan of 60,000 km, which required redesigning the first two revolutions around Saturn into three revolutions and then resuming the originally planned tour (at a moderate ΔV cost). Second, the Probe Support Avionics assembly was commanded by the Cassini orbiter to the base frequency (called BITE Mode, a test mode that holds the lockup frequency at a level equivalent to -1 m/s relative velocity) instead of utilizing the signal at the expected Doppler frequency. This mode was commanded at 12 s intervals (issued by FP; a reserve FP algorithm slot in FSW was utilized to aid in this solution) to ensure that BITE Mode was maintained. Third, the probe's transmitters were pre-heated before the probe's descent into Titan's atmosphere to optimize the transmit frequency.
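To give a sense of scale for the Doppler problem, the fractional frequency shift can be estimated from the relative velocity alone. The sketch below uses the first-order approximation Δf/f ≈ v/c, which is an assumption that is adequate at these speeds. The function names are illustrative, and the carrier frequency is left as a parameter because the text does not state it.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum (exact SI value)


def fractional_doppler(relative_velocity_m_s: float) -> float:
    """First-order (non-relativistic) fractional frequency shift, v / c."""
    return relative_velocity_m_s / C_M_PER_S


def doppler_shift_hz(relative_velocity_m_s: float, carrier_hz: float) -> float:
    """Absolute shift for a given carrier; carrier_hz is supplied by the caller."""
    return carrier_hz * fractional_doppler(relative_velocity_m_s)


nominal = fractional_doppler(5.6e3)   # 5.6 km/s velocity difference, from the text
bite = fractional_doppler(1.0)        # magnitude of the 1 m/s BITE Mode equivalent

print(f"nominal: {nominal * 1e6:.2f} ppm, BITE Mode: {bite * 1e6:.4f} ppm")
```

The nominal shift is about 19 parts per million, 5,600 times larger than the BITE Mode equivalent. The shift scales every frequency component of the signal by the same fraction, which is consistent with the data stream timing, and not only the carrier, falling outside what the decoding circuitry could track.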
The Huygens Probe mission was very successful, with the exception that one of the two data transmission channels to the orbiter was not received and recorded (human error). Since all instrument data was duplicated between the two channel streams, most data was collected, with the exception of the Doppler Wind Experiment, which relied upon receipt of both channels (Allestad & Standley, 2006).

Imaging Science Subsystem (ISS) haze anomaly
The ISS instrument is a remote sensing device that captures most images in visible light, as well as some infrared and ultraviolet images. By radio telemetry the ISS has returned hundreds of thousands of images of Saturn, its rings, and its moons. The ISS consists of a Wide-Angle Camera (WAC) to photograph large areas and a Narrow-Angle Camera (NAC) for areas of fine detail. Each camera uses a sensitive Charge-Coupled Device (CCD) as its electromagnetic wave detector, with each CCD having a 1,024 × 1,024 array of pixels. Both the WAC and the NAC are configured with spectral filters that rotate on a wheel in order to view different bands within the electromagnetic spectrum, ranging from 0.2 to 1.1 μm.
In 2001 (five months after the Jupiter flyby), a distinct haze that had not been seen in previous images was observed in images of Saturn captured by the NAC. Further analysis indicated that this anomaly was caused by contamination from extremely small particles residing on either the filter assembly or the CCD window. The investigation pointed to a decontamination cycle performed on May 25, 2001, thirteen months after the previous decontamination cycle, which had occurred prior to the Jupiter flyby. This meant that contamination had had a longer than usual period in which to build up. Additionally, this decontamination cycle had started from a temperature of -90 deg C, whereas all previous cycles had started at 0 deg C. The Periodic Instrument Maintenance (PIM) therefore had a temperature swing of 120 degrees (from -90 deg C to 30 deg C) instead of 30 degrees. A series of decontamination cycles followed, ranging from seven to fifty-seven days in length. In July 2002, after the final cycle, the haze was no longer present in the images. A new flight rule was instated to prohibit the use of the Level 1 and Level 2 heaters at the same time, which prevents heating to 30 deg C and thus the kind of large temperature swing that caused the anomaly (Haemmerle & Gerhard, Undated).

Occurrence of waived failure in flight: Leaking prime PMS regulator
One month after Cassini launched (Nov. 1997), a waived, potentially mission catastrophic Single Point Failure (SPF) occurred in flight. FP design typically dictates that no credible SPF shall prevent attainment of mission objectives or result in a significantly degraded mission, with the exception of the class of faults exempted by waiver due to low probability of occurrence. In this case, a failure waived before launch did occur: the Prime Regulator within the PMS failed to close properly. In fact, the regulator exhibited a significant leak rate when the fuel and oxidizer tanks were pressurized for the first time during Trajectory Correction Maneuver #1 (TCM-1). The leak rate was determined to be 1700 cc/min, compared to the expected "worst case leak rate" of 1.70 cc/min observed in testing. It was determined that the first pyro valve firing prior to TCM-1 was the cause of this high leak rate, due to a particle stuck in the regulator (from pyro firing debris). The subsequent 90 min DSM burn (initiated at launch +14 months) exhibited an even higher leak rate, 6.6 times larger than that of TCM-1. This behavior suggested that an even larger particle had become trapped in the regulator. With this anomaly in place (which was not correctable), all non-critical ME burns for the remainder of the mission were affected, as well as the critical Saturn Orbit Insertion (SOI) burn maneuver coming up in July 2004.
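The size of the discrepancy is easy to work out from the figures quoted above. The short calculation below is only an illustration: it assumes that "6.6 times larger" means the DSM leak rate was 6.6 times the TCM-1 rate, and the variable names are ours.

```python
EXPECTED_WORST_CASE_CC_MIN = 1.70   # worst case observed in testing (from the text)
TCM1_LEAK_CC_MIN = 1700.0           # measured during TCM-1 (from the text)
DSM_FACTOR = 6.6                    # DSM leak relative to TCM-1 (from the text)

# How far the in-flight leak exceeded the expected worst case.
tcm1_ratio = TCM1_LEAK_CC_MIN / EXPECTED_WORST_CASE_CC_MIN

# Implied DSM leak rate, assuming the factor is multiplicative.
dsm_leak_cc_min = TCM1_LEAK_CC_MIN * DSM_FACTOR

print(f"TCM-1 leak was {tcm1_ratio:.0f}x the expected worst case")
print(f"Implied DSM leak rate: {dsm_leak_cc_min:.0f} cc/min")
```

In other words, the TCM-1 leak was three orders of magnitude above the tested worst case, and under the stated assumption the DSM leak rate would have been on the order of 11,000 cc/min.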

Fortunately, the Prime Regulator leak problem was discovered several years before the SOI burn, allowing sufficient time to evaluate the history behind the problem, discover the cause of the anomaly (an important 'lesson learned' for future spacecraft in their development phases), and determine a fix to the mission design. Cassini's pre-launch regulator design was based upon Galileo's Teflon "soft-seat" configuration, which had demonstrated very good performance in flight, exhibiting excellent leakage behavior. However, cold-flow tests indicated that this type of soft-seat design was likely to experience a blocked flow passage due to seat extrusion (potentially a mission catastrophic failure). Galileo's test data was unavailable to evaluate this problem, so Cassini's soft-seat was replaced with a "hard-seat" to avoid susceptibility to this failed-blocked condition, at a cost in performance: the specified leak rate is a factor of 10 higher with the hard-seat design. Enhancements were incorporated into Cassini's PMS design because of this increased leak risk. A redundant backup regulator was installed, along with two new 'Over Pressure' (OP) FP algorithms designed to detect over-pressurization of the fuel and oxidizer tanks. These algorithms were to be used for all non-critical ME burns (non-critical mission phases; i.e. not during the critical SOI Burn event). The pre-launch mission design called for the PMS system to be characterized 30 days prior to the SOI Burn maneuver, so that the OP FP could be disabled.
Leak mitigation measures were also added to the PMS plumbing: two high-pressure helium latch valves (LV-10 & LV-11), a pyro-isolation ladder upstream of the regulators (PV10-PV15), and several filters, as depicted in Figure 11 (Barber, 2002; Leeds et al., 1996). These design changes heightened confidence, which drove Cassini's mission design and led to the implementation of two waivers for the critical SOI Burn, so that the OP FP algorithms were NOT required during the SOI maneuver:
Waiver #1: Any "Under Pressure" condition is negligible
Waiver #2: Any "Over Pressure" condition is extremely improbable
Fixing the Problem: Cassini's original flight strategy was to lock up the prime regulator one month after launch, command the LV-10 helium latch valve open to feed helium into the tanks, and then leave LV-10 open for the remainder of the mission. With the leaky Prime Regulator in place, any pressurized ME burn had the potential to increase the leak rate. Since the Backup Regulator was also subject to a particulate-induced leak, swapping devices was deemed impractical unless the leak rate on the Prime Regulator increased substantially. Therefore, it was decided that LV-10 must be opened just before any ME pressurization activity and closed as soon as the desired pressure levels were reached. Hence all ME burns had to be initiated via uplinked autonomous command sequences to ensure that the proper timing was maintained. This solution was not applicable to the SOI Burn, which was critical to the Cassini mission in that the spacecraft had to be decelerated sufficiently to be captured into orbit around Saturn (Morgan, 2010).
Fig. 11. Cassini's Propulsion System Schematic (ME Only).
Since the PMS system could no longer be characterized and pressurized 30 days prior to the SOI Burn, the solution was to open LV-10 70 s before the SOI Burn commenced and to close LV-10 when the desired tank pressure levels were reached. It was also necessary to identify the new failure modes associated with these changes in SOI Burn strategy (e.g. helium LV-10 could become stuck closed, requiring an automated swap to the redundant helium LV-11 via FP; the Prime Regulator could fail wide open or completely closed, requiring a swap to the redundant Backup Regulator). These studies were conducted during the cruise phase of the mission, before reaching Saturn.
New and augmented FP was incorporated in FSW and uplinked prior to the SOI Burn event, and characterization studies of LV-10 leakage performance were performed to ensure proper behavior (leak rate within spec). The SOI Burn was executed in July 2004 and was very successful, with no faults present; regulator performance was also very good (no increase in leak rate or significant rise in tank pressure level).
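The pressurization strategy described in this section (open the helium latch valve shortly before a burn, close it once the tanks reach the desired pressure, and fall back to the redundant valve if the prime valve is stuck closed) can be illustrated with a small simulation. This is a hypothetical sketch, not Cassini's FP logic: the function names, the polling scheme, the pressure units, and the toy tank model are all assumptions.

```python
from typing import Callable, List


def pressurize_for_burn(open_valve: Callable[[str], bool],
                        close_valve: Callable[[str], None],
                        read_pressure: Callable[[], float],
                        target_pressure: float,
                        max_polls: int = 1000) -> List[str]:
    """Open a helium latch valve, wait for target pressure, then close it.

    LV-10 is tried first; LV-11 is used only if LV-10 fails to open.
    """
    events: List[str] = []
    selected = None
    for valve in ("LV-10", "LV-11"):
        if open_valve(valve):
            events.append(f"{valve} open")
            selected = valve
            break
        events.append(f"{valve} stuck closed")
    if selected is None:
        raise RuntimeError("no helium latch valve could be opened")

    for _ in range(max_polls):
        if read_pressure() >= target_pressure:
            break
    # Close the valve even if the poll limit was hit, so the leaking
    # regulator is not left exposed to helium supply pressure.
    close_valve(selected)
    events.append(f"{selected} closed")
    return events


def simulate(lv10_stuck: bool) -> List[str]:
    """Toy model: tank pressure rises by 1 unit per poll while a valve is open."""
    pressure = [0.0]
    open_valves = set()

    def open_valve(name: str) -> bool:
        if name == "LV-10" and lv10_stuck:
            return False
        open_valves.add(name)
        return True

    def close_valve(name: str) -> None:
        open_valves.discard(name)

    def read_pressure() -> float:
        if open_valves:
            pressure[0] += 1.0
        return pressure[0]

    return pressurize_for_burn(open_valve, close_valve, read_pressure,
                               target_pressure=5.0)
```

Calling `simulate(False)` exercises the nominal path through LV-10, while `simulate(True)` exercises the swap to LV-11. Keeping the valve open only for the duration of the pressurization is the point of the design: it limits how long the leaking regulator can raise tank pressure.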

Missed launch due to STS-51L Shuttle Challenger explosion
The Mission-to-Jupiter Galileo spacecraft was finally launched via Space Shuttle (STS-34) on October 18, 1989, after 11 years of development effort and 6 major mission redesigns. Galileo was originally scheduled to launch onboard Shuttle Atlantis (STS-61G) in 1986, with the Centaur-G liquid hydrogen-fueled booster stage providing a direct trajectory to Jupiter. However, the mission was delayed by the interruption in launches that followed the STS-51L Shuttle Challenger disaster. New safety protocols implemented as a result of the tragedy prohibited the use of the Centaur-G stage on Space Shuttle flights, forcing Galileo to use a lower-powered Inertial Upper Stage solid-fuel booster. During the down-time between 1986 and 1988, while the Space Shuttle investigation was underway, the Galileo team evaluated alternatives, since the lower-powered booster could not deliver the energy required for a direct trajectory to Jupiter. The mission was re-profiled to use several gravitational slingshot maneuvers around the solar system's inner planets: a Venus-Earth-Earth Gravity Assist (VEEGA) strategy was designed and implemented to provide the additional velocity required to reach the destination. Galileo flew by Venus on February 10, 1990, gaining 8,030 km/hr, and flew by Earth twice, first on December 8, 1990, and again on December 8, 1992, adding 3.7 km/sec to its cumulative speed. In 1994, Galileo was perfectly positioned to observe the fragments of Comet Shoemaker-Levy 9 crash into Jupiter. Galileo released its probe on July 13, 1995, and on December 8, 1995, became the first man-made satellite of Jupiter, entering a 198-day parking orbit.
Through the clever use of gravity assists from the inner planets Venus and Earth, a viable mission was possible, although it required a much longer flight time to Jupiter. This extended journey required several design modifications, including the addition of several sun shields to protect the vehicle when flying by Venus. To ensure that its systems would survive, the Galileo team also modified operations, including delaying the deployment of the High Gain Antenna (HGA) until the spacecraft was past the first Earth flyby.

High gain antenna deployment failure
Galileo's HGA consisted of a metalized mesh supported by a set of ribs (similar to an inverted umbrella), held to the support tower by a series of pins and retaining rods. These retaining rods were released shortly after launch, but the HGA was kept in a closed configuration, thermally protected from the sun, until the spacecraft was > 1 AU away (after the first Earth flyby). The SOFS team communicated with Galileo through its two Low Gain Antennas (LGA). When commanded to deploy on April 11, 1991, the HGA only partially deployed, placing the mission in jeopardy. An investigation team was organized to rectify the problem, and numerous attempts were made to fully deploy the antenna over the next two years while the team investigated the alternative of using the LGA to support the Jovian operations segment of the mission. All attempts to fully deploy the HGA were unsuccessful, leaving the HGA nearly useless.
To redesign the Galileo mission for LGA use only, the telecommunications link architecture had to be redesigned. The existing architecture supported only 10 bps at Jupiter, less than 1/10,000th of the 134 kilobits per second (kbps) required. Since modifying the spacecraft's hardware to boost the transmit power was not possible, the upgrades focused primarily on improving the receiving capability of Earth's ground stations and developing a more efficient data and telecommunications architecture. Arraying the DSN antennas increased the rate by a factor of 2.5, and modifications to the receivers and telecommunications link parameters, together with improved encoding and onboard data compression, further increased the downlink from 10 bps to 4.5 kbps. Since these improvements were insufficient to bring down all of the data needed to meet the science objectives, the SOFS team negotiated with the science team to prioritize science goals, develop new science plans, and periodically update spacecraft FSW to increase data efficiency. Also, as a backup to the downlinked data, the onboard Data Memory Subsystem (DMS) tape recorder was utilized during selected high activity periods (Nilsen & Jansma, 2011).
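The data-rate figures quoted above can be put side by side with a few lines of arithmetic. This is an illustrative calculation only: it assumes that the factor of 2.5 from arraying is included in the overall 10 bps to 4.5 kbps improvement and that the individual gains multiply, neither of which the text states explicitly.

```python
REQUIRED_BPS = 134_000.0   # 134 kbps required (from the text)
LGA_BPS = 10.0             # rate supported by the original LGA link
FINAL_BPS = 4_500.0        # rate after all ground and flight improvements
ARRAYING_GAIN = 2.5        # gain from arraying the DSN antennas

shortfall = REQUIRED_BPS / LGA_BPS           # how far short the LGA link fell
overall_gain = FINAL_BPS / LGA_BPS           # total improvement achieved
other_gain = overall_gain / ARRAYING_GAIN    # assumed share of non-arraying changes
final_fraction = FINAL_BPS / REQUIRED_BPS    # fraction of the required rate recovered

print(f"shortfall: {shortfall:.0f}x, overall gain: {overall_gain:.0f}x")
print(f"non-arraying gain: {other_gain:.0f}x, recovered: {final_fraction:.1%}")
```

The original shortfall was a factor of 13,400, and even a 450-fold improvement recovers only about 3.4% of the required rate, which is why science goals had to be reprioritized.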

Mars exploration rover wheel failures
The Mars Exploration Rover (MER) mission is an ongoing scientific undertaking involving two golf cart-sized robotic rovers. This mission is part of NASA's Mars Exploration Program, which includes two previous Viking program landers (1976) and the Mars Pathfinder probe (1997). The six-wheeled MER robotic vehicles, Spirit and Opportunity, landed in 2004 to explore the Martian surface and its geology. The mission's primary objective is to search for and characterize a wide range of rocks and soils that hold clues to past water activity on Mars. Originally a three-month mission, the MER mission was extended to the present day. To date, much evidence has been collected to indicate that Mars was once a wetter and warmer place than had previously been determined. During the mission, one of Spirit's six wheels stopped working. Its right-front wheel had become a concern once before, when it began drawing unusually high current five months after the January 2004 landing. The SOFS team decided to drive Spirit backwards, which redistributed the wheel's lubricant and returned the wheel to normal operation. However, during the 779th Martian day, the motor that rotates that same wheel ceased working. One possibility considered by the SOFS team was that the motor's brushes, the contacts that deliver power to the rotating part of the motor, had lost contact. As Spirit dragged its right-front wheel, the wheel cut a furrow in the Martian soil, revealing the layer beneath the surface and unearthing a material that significantly changed our thinking about Mars. Spirit found evidence of a hydrothermal system, not only proving the existence of liquid water on Mars but also showing that energy sources were coincident with that liquid water, revealing the potential for support of an ecosystem.

Rover Spirit became trapped in soft sand in 2009, and eventually ceased communicating with Earth in March 2010. Nearly seven years after launch, Rover Opportunity is still healthy, although the SOFS team has been driving this vehicle backward for the last two years in order to spread wear more evenly within its gear mechanisms (Callas, 2006).

Voyager interstellar mission RTLT
The Voyager Spacecraft Program consists of two scientific probes, Voyager 1 and Voyager 2. Both were launched in 1977 to take advantage of a favorable alignment of the outer planets. Although officially designated to study just Jupiter and Saturn, the probes were able to continue their mission into the outer solar system and, as of June 2011, have exited the solar system and currently reside within the Heliosheath (the region between the Termination Shock and the Heliopause). As of July 2011, Voyager 1 is the farthest human-made object from Earth.
Fig. 19. Voyager Spacecraft.
A total time allotment of 32 hr, 23 min, and 55 s, plus sequence execution time, FSW processing time, and command execution time, is required to verify each command sequence (i.e. more than one DSN antenna pass is required to uplink and verify every command or command sequence; see Figure 23). SOFS teams managing missions with long RTLTs, such as the Voyager spacecraft, must minimize real-time commanding. When commands are sent, the team typically must verify them the following day (or two days later). To date, both Voyager spacecraft have adequate electrical power and propellant margin to maintain systems and attitude control until around 2025, at which time science data return and spacecraft operations will cease.
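The round-trip light time quoted above translates directly into a distance. The sketch below assumes the 32 hr, 23 min, 55 s figure is purely the two-way signal travel time at the speed of light; the helper name `rtlt_seconds` is ours.

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum (exact SI value)
AU_KM = 149_597_870.7      # astronomical unit (IAU 2012 definition)


def rtlt_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert a round-trip light time given as h/m/s into seconds."""
    return hours * 3600 + minutes * 60 + seconds


rtlt = rtlt_seconds(32, 23, 55)        # figure quoted in the text
one_way_s = rtlt / 2                   # signal travel time in one direction
distance_km = one_way_s * C_KM_PER_S   # one-way distance to the spacecraft
distance_au = distance_km / AU_KM

print(f"RTLT = {rtlt} s, one-way distance = {distance_au:.1f} AU")
```

Under that assumption the spacecraft is roughly 117 AU from Earth, and a command sent now cannot be confirmed for well over a day, which is why real-time commanding has to be minimized.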

Lessons learned
Although numerous precautionary measures are implemented in JPL robotic spacecraft missions to preclude faults and prevent failures, many unforeseen problems can occur throughout a spacecraft's journey. "Lessons learned" documentation captured from previously flown spacecraft can be of great help when designing future missions. For the most part, autonomous FP algorithms are based upon past flight experience, but new mission destinations can present challenges never before encountered by spacecraft.
Overall, for spacecraft to function properly without significant risk or degradation to the mission and its objectives, autonomous FP must be implemented to ensure that faults are detected and resolved properly, so that the spacecraft preserves its overall health and provides adequate diagnostic capabilities. This effort requires that subsystems be characterized accurately. The approval of Cassini's pre-launch PMS regulator waivers is a good example of a mistake in ruling out the possibility of a malfunction based upon surmised flight experience, without supporting test data (from Galileo) for adequate evaluation. Unfortunately, the enhancements to FSW and hardware boosted confidence in the upgraded design and drove Cassini's FP design strategy as well as its mission profile, as its most critical maneuver relied solely upon the successful initiation of the 30-day Pre-SOI Burn Characterization Task. Yet even under these circumstances, FP modifications and additions were successfully designed and uplinked to the spacecraft to preserve the mission and its three tour phases and to safeguard its science data collection objectives, since the designers had provided extra FP slots in FSW.
In any case, the experience gained through Cassini's leaky regulator problem, and the lessons learned in many other JPL spacecraft missions, have demonstrated that these types of unexpected failures can be resolved through re-evaluation and the implementation of new FSW/FP in flight, an endeavor which is possible during spacecraft missions if enough time is available. Flight experience has also taught us that developing post-FP response recovery procedures, which contain pre-defined actions for the SOFS team to follow, greatly reduces post-fault recovery time and improves accuracy in diagnosing faults. New strategies such as the "planet-flyby gravitational slingshot" concept developed for the Galileo mission provide innovative ideas which may be utilized on upcoming spacecraft designs; in this case, for boosting the heavy, two-story sized Cassini vehicle into deep space, thereby reducing propellant requirements by a substantial margin.