As the world's
clocks rolled over from 1999 to 2000, almost everyone was
worrying, at the back of their minds, about the Y2K Bug. Would it
cause computers, microwave ovens, video recorders, ATMs, and any
other piece of machinery to fail? The predictions of doom were
widespread but failed to materialise on a large scale. So was
this down to good planning and a well-implemented Y2K strategy,
or was the Y2K Bug a hoax to begin with, as some believe? Let's
start at the beginning with an explanation of the Y2K Bug.
What was the Y2K Problem?
For whatever reason (to save
precious memory in an era when memory was incredibly expensive,
because systems were not expected to last this long, or simply
because the problem was not recognized), programmers long ago
adopted a two-digit convention to represent a year. This
convention has caused, and will continue to cause, failures as we
approach the turn of the century and beyond.
Briefly
defined, the Y2K problem involves any or all of the following:
Representation of the year as a two-digit
number that causes failures in arithmetic, comparisons,
sorting, and input and output to databases or files when
date data is manipulated.
The use of an incorrect algorithm to
recognize leap years for years divisible by 400.
Hard coding "19" into software
routines or the use of two-digit years with
"99" and "00" as reserved values
meaning "never delete this" or "this is a
demonstration account," respectively (sometimes
called "magic numbers").
System date data types that may roll over
and fail when the storage register fills up.
Incorrect
software will assume that the maximum value of a year field is
"99" and will roll systems over to
"00", which can be mistakenly interpreted as 1900
rather than 2000, the result being negative date
calculations. Incorrect leap-year calculations will assume that
the year 2000 has only 365 days instead of 366. Moreover,
although Jan. 1, 2000 is the primary witching hour, many
date-dependent algorithms and forward-referencing systems are
already beginning to fail.
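As a minimal sketch of the first two failure modes (written in Python purely for illustration; the function names are hypothetical and not taken from any real system), the following shows how two-digit subtraction goes negative across the rollover and how a leap-year rule that omits the divisible-by-400 case misjudges the year 2000.

```python
def years_between_two_digit(start_yy: int, end_yy: int) -> int:
    """Subtract two-digit years the way a legacy routine might (hypothetical)."""
    return end_yy - start_yy


def is_leap_short_rule(year: int) -> bool:
    """Incomplete rule: treats every century year as a common year."""
    return year % 4 == 0 and year % 100 != 0


def is_leap_gregorian(year: int) -> bool:
    """Full Gregorian rule, including the divisible-by-400 exception."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# An account opened in "65" is 34 years old in "99" ...
print(years_between_two_digit(65, 99))   # 34
# ... but appears to be -65 years old once the year field rolls to "00".
print(years_between_two_digit(65, 0))    # -65

# The incomplete rule gives 2000 only 365 days; the full rule gives 366.
print(is_leap_short_rule(2000), is_leap_gregorian(2000))  # False True
```

The negative result is what then propagates into interest, age, and expiry calculations downstream.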
How Vast Was the Problem?
Modern computer systems inherited their
conventions, and some of their problems, from the mainframe era
when it was common practice to encode the year as a two-digit
field. After all, workstations and personal computers were
initially built to augment mainframe systems and use their data.
The Y2K problem is exceptionally
widespread. It affects hardware (BIOS, real-time clocks),
embedded firmware, languages and compilers, operating systems,
random number generators and security services, database systems,
transaction-processing systems, electronic data interchange and
banking systems, spreadsheets, PBXs (private branch exchanges),
telephone systems, and more. The Y2K problem is not merely an
information systems (IS) problem. Although the majority of Y2K
problems are located in ISs, the sad truth is that systems
anywhere that use dates may be threatened.
It is naive to assume that new
applications and systems are immune to the Y2K problem. It was
only late last year that a new version of Quicken, a popular
personal finance package, was released that could handle dates
beyond 1999 [1]. At the January Federal Interagency Y2K meeting,
it was reported that the National Institutes of Health received a
shipment of brand-new personal computers containing three
versions of BIOS, two of which failed to correctly handle the
century transition.
What Makes It Unique?
Although similar to other software
problems, such as the four-digit ZIP code extension, the Y2K
problem is more than a standard maintenance problem.
It has a deadline that cannot move and is
common to everyone. And we will all be competing for
scarce resources, such as COBOL programmers and testing
personnel.
It affects every system that has an
external interface. Therefore, fixes for interrelated
systems must be deployed simultaneously; otherwise,
systems could be broken by the changes from systems with
which they exchange data.
Instead of the traditional reaction to a
maintenance problem ("Here's the problem, fix
it"), the reaction to a Y2K problem is
"Where is the problem and what are the fixes?"
Testing,
validation, and fielding will consume the main share of the costs
associated with fixing the Y2K problem. For some, the solutions
may be influenced, complicated, or dictated by legislative or
regulatory mandates. Other regulations will simplify how a fix is
fielded.
How Much Will It Cost to Fix?
It is difficult to estimate the cost of
the Y2K problem. The Gartner Group estimated that costs for the
U.S. Department of Defense (DoD) alone could start at upward of
$30 billion, and worldwide costs could approach $600 billion. A
detailed analysis of specific DHR systems, by Insight Consulting,
Inc., looked at 11 million lines of DHR code from nine agencies.
The study found that about 3.0 percent (about 321,000
lines of code) contained date manipulations. The study
estimated that corrective maintenance to these systems would cost
roughly $33 million, $28.6 million of which would cover the cost
of contract and co-op staff [2].
Other industry reports suggest that
between 10 percent and 15 percent of IT systems code is typically
affected [3].
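The study figures quoted above can be turned into rough cost factors. The sketch below only rearranges the numbers given in the text; the function `ballpark_cost` and its default factor are illustrative assumptions derived from this single study, not an industry-standard rate.

```python
# Figures quoted in the text for the DHR study.
TOTAL_LINES = 11_000_000       # lines of code examined
AFFECTED_LINES = 321_000       # lines containing date manipulations
TOTAL_COST = 33_000_000        # estimated corrective maintenance, in dollars
STAFF_COST = 28_600_000        # portion for contract and co-op staff

affected_share = AFFECTED_LINES / TOTAL_LINES
cost_per_line = TOTAL_COST / TOTAL_LINES
cost_per_affected_line = TOTAL_COST / AFFECTED_LINES
staff_share = STAFF_COST / TOTAL_COST


def ballpark_cost(total_lines: int, cost_factor: float = cost_per_line) -> float:
    """Quick estimate: source lines of code times a cost factor (assumed)."""
    return total_lines * cost_factor


print(f"{affected_share:.1%} of lines affected")            # 2.9% of lines affected
print(f"${cost_per_line:.2f} per line of code")             # $3.00 per line of code
print(f"${cost_per_affected_line:.2f} per affected line")   # $102.80 per affected line
print(f"{staff_share:.1%} of cost is staff")                # 86.7% of cost is staff
print(f"${ballpark_cost(2_000_000):,.0f} for a 2M-line portfolio")
```

A factor taken from one study will not transfer cleanly to another portfolio, so an estimate like this is only a starting point for the detailed assessment.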
Most cost estimates do not reflect the
costs to upgrade commercial off-the-shelf (COTS) products, work
around inaccessible documentation for software or firmware, or
produce and field new systems. We must answer "what-if"
questions concerning budget vs. technical trade-offs, justify
future Y2K funding, and understand the business consequences of
today's decisions.
Y2K Solutions
The leap year, magic number, and date
counter overflow problems can be repaired with straightforward
corrections wherever they are found.
The same is not true for the incorrect
handling of the two-digit year. Many approaches are available and
have been evaluated to identify the one that best suits our
organization's priorities, resources, and schedule. Bridging
mechanisms will be a major part of any solution approach. Bridges
convert between old and new data formats and programs or between
external and internal formats. Bridging will also allow updated
components to be deployed separately, while others may become
long-term structural pieces of the systems.
There are at least seven major, viable
courses of action that can be combined as appropriate to deal
with Y2K problems:
Expand the year field to four digits.
Encode century information in a six-digit
space.
Use a 100-year logic window (static or
sliding).
Employ a data bridge that uses a logic
window.
Reverse the system clock and use a 28-year
time bridge.
Replace the system (even retire it).
Do nothing.
The Y2K
problem is extremely sensitive to the solution approach. The
solutions also will heavily influence the scope of the changes,
testing, and the degree of interorganizational coordination that
is required. Careful review of alternative solutions can reduce
the magnitude and scope of the effort and hence the cost and
risk. ...
The first option, expanding the year field to four digits,
requires that all references to and uses of a two-digit year
format (YY) be converted to a four-digit year format (CCYY), and
that all programs be converted to use the new format. The major
risk in this approach is that a program may directly access the
internal structures of a date field; thus, changing the field
size without also changing the logic that accesses the internal
structures will corrupt the revised system.
This solution is complete, and it
ensures that year values remain unambiguous through the year
9999. However, it involves the modification of every
program and every database that references date data; the
adjustment of every positional reference to data fields; changes
to the format of every record that contains date data; and a
reformat and rewrite of every data file, including historic data
files.
This approach may also require a change
to the format of messages between systems. The changes will start
a chain of modifications that will require simultaneous testing
and rollout or bridging of the new systems across organizations.
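To illustrate why every positional reference has to move together with the field, here is a small sketch using a made-up fixed-width record. The layout, field names, and the assumption that every existing record belongs to the 1900s are all hypothetical.

```python
# Hypothetical fixed-width layouts: field name -> (start, end) offsets.
OLD_LAYOUT = {"account": (0, 6), "opened": (6, 12), "balance": (12, 20)}  # YYMMDD
NEW_LAYOUT = {"account": (0, 6), "opened": (6, 14), "balance": (14, 22)}  # CCYYMMDD


def read_field(record: str, layout: dict, name: str) -> str:
    """Read a field by its positional offsets."""
    start, end = layout[name]
    return record[start:end]


def expand_record(old_record: str, century: str = "19") -> str:
    """Insert the century in front of the two-digit year.

    Assumes the caller knows the century of the stored data.
    """
    if len(old_record) != OLD_LAYOUT["balance"][1]:
        raise ValueError("unexpected record length")
    date_start = OLD_LAYOUT["opened"][0]
    return old_record[:date_start] + century + old_record[date_start:]


old = "123456" + "991231" + "00012345"
new = expand_record(old)

print(read_field(new, NEW_LAYOUT, "opened"))    # 19991231
print(read_field(new, NEW_LAYOUT, "balance"))   # 00012345
# A program still using the old offsets now reads part of the date as money.
print(read_field(new, OLD_LAYOUT, "balance"))   # 31000123
```

The last line is the corruption described above: the data file was converted, but one program kept its old positional references.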
- Encoding Century Information in a Six-Digit Space
Century information can be encoded into
the same space currently occupied by the standard six-digit date
field (YYMMDD). This might be done to address disk performance
tuning and balancing issues or because of rigidly defined field
sizes in standard messages. There are at least five ways to do
this:
Encode full Julian dates (the number of
days elapsed since noon Jan. 1, 4713 B.C. [4]) in a
binary 48-bit field, allowing values for dates well
beyond anything we should care to record (YYMMDD
-> BBBBBB, B = 8-bit binary character).
Use the two 8-bit YY fields to encode the
full year in a binary 16-bit field, allowing values for
years up to 2^16 - 1 = 65,535 (YYMMDD
-> BBMMDD, B = 8-bit binary character).
Encode a century field (0 = 18, 1 = 19, 2
= 20, and so on) and replace the MMDD with a
day-of-the-year field DDD (YYMMDD -> CYYDDD).
Encode a century field as above and encode
the month as a character field (M*) with values 1-9, A,
B, C representing the 12 months (YYMMDD ->
CYYM*DD).
Encode the entire six-digit date field
into an offset from Jan. 1, 1900, which would also be
good for the next 2,600 years (YYMMDD -> DDDDDD).
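Two of these encodings are sketched below in Python: the CYYDDD form and the day offset from Jan. 1, 1900. The function names are hypothetical, and the offset is assumed to be stored as six zero-padded decimal digits.

```python
from datetime import date, timedelta

EPOCH = date(1900, 1, 1)


def encode_cyyddd(d: date) -> str:
    """Century digit (0 = 18, 1 = 19, 2 = 20, ...) + YY + day of year."""
    century_digit = d.year // 100 - 18
    if not 0 <= century_digit <= 9:
        raise ValueError("year outside the range one century digit can hold")
    return f"{century_digit}{d.year % 100:02d}{d.timetuple().tm_yday:03d}"


def decode_cyyddd(text: str) -> date:
    """Inverse of encode_cyyddd."""
    year = (18 + int(text[0])) * 100 + int(text[1:3])
    return date(year, 1, 1) + timedelta(days=int(text[3:6]) - 1)


def encode_day_offset(d: date) -> str:
    """Days elapsed since Jan. 1, 1900, as six decimal digits."""
    days = (d - EPOCH).days
    if not 0 <= days <= 999_999:
        raise ValueError("date outside the six-digit offset range")
    return f"{days:06d}"


print(encode_cyyddd(date(1999, 12, 31)))     # 199365
print(encode_cyyddd(date(2000, 2, 29)))      # 200060
print(decode_cyyddd("200060"))               # 2000-02-29
print(encode_day_offset(date(2000, 1, 1)))   # 036524
```

Both forms keep the six-character width and still sort correctly as plain text, which is why they suit rigidly defined message fields.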
- Windowing
With a logic-window approach, a system
determines the century or decade of a given year by comparing the
value in a two-digit year field against an application window.
One version of the window technique, called window sliding,
allows the span of years an application processes to be
indefinitely extended by changing the window boundaries as the
current date changes.
The
logic-window technique does require some extra overhead and code
logic: dates, sorts, collations, literal comparisons, and
computations must be correctly mapped to the two-digit date to
assure that computations are correctly performed. However, this
technique avoids most of the massive changes and
interorganizational coordination associated with the expansion
approach. As long as old data is dropped and the 100-year span of
dates is retained, applications that use a 100-year sliding
window will continue to unambiguously interpret date data and
calculations for a long time.
Applications that have some large date
spans may be corrected with a combination of the expansion and
sliding window approaches. Such a mixed approach requires that
the uses of dates be modeled more precisely to distinguish
between fields that should be expanded and those that should not.
If an application deals with dates that
span more than 100 years (like birth dates), this approach
cannot be used. However, many systems do not have to deal with
time spans of more than 100 years. For them, this solution is
extremely attractive because it requires no data
changes; only programs are changed.
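A static and a sliding window might look like the following sketch. The pivot value of 50 and the 20-year look-ahead are arbitrary assumptions chosen for the example; each application would pick boundaries that match its own data.

```python
def expand_year_static(yy: int, pivot: int = 50) -> int:
    """Static window: 00..pivot-1 -> 20yy, pivot..99 -> 19yy (pivot is assumed)."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy < pivot else 1900 + yy


def expand_year_sliding(yy: int, current_year: int, years_ahead: int = 20) -> int:
    """Sliding window: the 100 years ending years_ahead after current_year."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    upper = current_year + years_ahead
    lower = upper - 99
    candidate = lower - lower % 100 + yy   # yy placed in the century of `lower`
    if candidate < lower:
        candidate += 100                   # falls below the window, so next century
    return candidate


print(expand_year_static(5), expand_year_static(75))    # 2005 1975
# In 1999 the sliding window covers 1920..2019.
print(expand_year_sliding(19, 1999), expand_year_sliding(20, 1999))   # 2019 1920
# In 2030 the same code covers 1951..2050 without modification.
print(expand_year_sliding(50, 2030), expand_year_sliding(51, 2030))   # 2050 1951
```

Comparisons and sorts then operate on the expanded value, while the stored data keeps its two-digit form.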
- Data Bridge Using a Logic Window
This solution is popular because it
allows for a fairly straightforward conversion of the
applications to a four-digit year, yet allows external interfaces
to continue in two-digit formats. This is done with simple data
bridges that map the internal and external date data through a
logic-window approach for all inputs and outputs.
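A data bridge of this kind can be sketched as a pair of conversion functions at the system boundary. The names `inbound` and `outbound`, the YYMMDD and CCYYMMDD string formats, and the pivot of 50 are assumptions made for the example.

```python
PIVOT = 50  # assumed window: 1950 through 2049


def inbound(yymmdd: str, pivot: int = PIVOT) -> str:
    """External two-digit date -> internal four-digit date."""
    if len(yymmdd) != 6 or not yymmdd.isdigit():
        raise ValueError("expected YYMMDD")
    century = "20" if int(yymmdd[:2]) < pivot else "19"
    return century + yymmdd


def outbound(ccyymmdd: str, pivot: int = PIVOT) -> str:
    """Internal four-digit date -> external two-digit date.

    Refuses dates that the receiving side's window could not recover.
    """
    if len(ccyymmdd) != 8 or not ccyymmdd.isdigit():
        raise ValueError("expected CCYYMMDD")
    year = int(ccyymmdd[:4])
    if not 1900 + pivot <= year <= 1999 + pivot:
        raise ValueError("year falls outside the 100-year window")
    return ccyymmdd[2:]


print(inbound("991231"))     # 19991231
print(inbound("000101"))     # 20000101
print(outbound("20000101"))  # 000101
```

Rejecting out-of-window dates on the way out is a design choice: silently truncating them would reintroduce the ambiguity the bridge is meant to remove.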
- Reverse the System Clock and Use a 28-Year Time Bridge
For systems whose source code is no
longer available, that cannot be replaced, or whose purchased
components are not century ready, there is an option other than
replacement: the system clock can be reversed by 28 years and
bridging scripts can be written to add or subtract 28 years from
the date data. The number is 28 because this offset retains the
same days of the week and month. When 28 years are added to
Saturday, Jan. 1, 1972, it will generate Saturday, Jan. 1, 2000.
Note that this works only for periods that have a leap year every
four years (1901-2100), which means this approach is good only
until Feb. 28, 2100.
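The 28-year property is easy to check with a short script. The helper names below are hypothetical; the range checks reflect the 1901-2100 limit stated above.

```python
from datetime import date, timedelta

OFFSET_YEARS = 28


def to_backdated(real: date) -> date:
    """Real date -> date shown to the system with the reversed clock."""
    if not 1929 <= real.year <= 2099:
        raise ValueError("28-year bridge is only valid inside 1901-2099")
    return real.replace(year=real.year - OFFSET_YEARS)


def to_real(backdated: date) -> date:
    """Date produced by the backdated system -> real date."""
    if not 1901 <= backdated.year <= 2071:
        raise ValueError("28-year bridge is only valid inside 1901-2099")
    return backdated.replace(year=backdated.year + OFFSET_YEARS)


# Saturday, Jan. 1, 1972 lines up with Saturday, Jan. 1, 2000.
print(to_backdated(date(2000, 1, 1)).strftime("%A %Y-%m-%d"))  # Saturday 1972-01-01

# Every day of 2000, including Feb. 29, keeps its weekday.
day = date(2000, 1, 1)
while day.year == 2000:
    assert to_backdated(day).weekday() == day.weekday()
    day += timedelta(days=1)

# After Feb. 28, 2100 the calendars diverge: 2072 has a leap day, 2100 does not.
print(date(2100, 3, 1).weekday() == date(2072, 3, 1).weekday())  # False
```

Bridging scripts would call `to_backdated` on data entering the old system and `to_real` on data leaving it.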
- Replace or Retire the System
For some systems, the best option may be
to replace them with Y2K-safe commercial products that do most of
what you need. This is especially attractive when dealing with
smaller systems or small embedded systems where the cost of
fixing would exceed the cost of replacement.
- Do Nothing
In some cases, the best option may be to
do nothing. This is not the same as ignoring the problem.
Rather, it involves an analysis of the risks to determine that
the Y2K problem will either not cause a problem or cause such
minor problems that users can work around them and compensate.
For example, a system might sort something out of sequence for
one reporting period or produce only minor errors that have no
impact on functionality. In a world without resource constraints,
this kind of system would probably be fixed, but in the real
world it is advisable to reserve resources for systems that will
have more severe reactions to the date change.
Y2K Test and Evaluation
It is important to specify in some
detail the goals and role of the Y2K test and evaluation effort.
In many cases, it may be next to impossible to prove that the Y2K
problem has been fixed. One reason is that many systems interface
with components that cannot be rolled forward to test. Satellite
networks and telephone systems are two examples. These interfaces
can be simulated and analyzed, but it may not be possible to test
them to the same level of confidence as other systems. The need
to work with users should not be overlooked. They need help to
understand the levels and types of risk the system has so they
will remain confident in the system's performance.
Systems that undergo Y2K maintenance
should be evaluated to determine their degree of Y2K compliance
and whether this compliance is sufficient. Any testing activity
should be aimed at gathering information to determine the degree
of Y2K compliance.
Stage the testing and evaluation of
purchased components early so that infrastructure errors do not
mask the failure of the original application. Address the
validity of licenses and passwords for the duration of the full
testing period. Likewise, ensure that the testing environment and
testing tools operate through the entire testing period. They,
too, are susceptible to the Y2K problem.
Y2K testing requires a layered approach.
Test applications in isolation, then test systems, then test a
system of systems to the extent possible. These increasingly
complex entities will have the same compliance requirements, but
the approach to gathering data will be different for each.
System time windows and the existence of
various date-related rollovers of date counters complicate proper
date handling. Each system functions in its own time window,
which surrounds the present date. Planning and scheduling systems
work with dates that are weeks, months, and sometimes years in
the future. Trend analysis systems and billing systems regularly
reference dates in the past. You will need to verify the ability
of your system and its window of time to successfully process
dates before and after the transition to 2000.
A Silver Bullet?
To successfully manage the Y2K effort,
we may apply a wide variety of tool technologies. Many Y2K
discussions, on the Internet and elsewhere, focus solely on tools
that can find, analyze, and possibly fix the problems. But a
larger set of tools (including testing, configuration
management, project planning, and cost estimating tools) is
just as important and possibly indispensable.
One caveat to the pursuit of tool
technologies for Y2K is to do it appropriately rather than
overwhelm and distract staff from their jobs. It is important
that the introduction of new tools be compatible with your
organization and that it not cause a paradigm shift that will
lower productivity and reduce the quality of the work.
- Tools to Locate and Fix Code
First we need to find the code. Keep in
mind that the inventory includes systems that are homegrown, no
longer maintained, undocumented software, undocumented firmware,
and COTS systems. Tools and techniques are needed to find these
components. Once found, scanning, tagging, and conversion tools
can be used to place both the source code and documents on line.
Analysis tools can be used to search for
possibly erroneous code, such as operating system search tools
and commercial Y2K-aware search tools for application code. One
of the most significant problems with using analysis tools to
find the date problems in firmware and software is the false
negatives and the false positives. False negatives are date
problems that the analysis tools miss. False positives result
when the tools spew out a plethora of potential problem areas
that the analyst must then wade through to find the real
problems. Both types of problems can be addressed with tools that
more fully understand the context and usage of the dates, such as
some of the reverse-engineering and parsing technologies that the
Air Force's Rome Laboratory has helped develop and introduce into
the commercial marketplace. These intelligent tools reduce the
search to actual problems: they find date data by mining data
dictionaries, message sets, file exchanges, variables, and so
forth. These rule-based artificial intelligence tools can follow
assignments and look for behaviors within the code, but they need
specialized parsers, a great deal of computing power and memory,
and some must first be programmed with business rules to
understand what to look for.
Although intelligent tools are getting
better, they are not silver bullets, and they will not solve the
problem in its entirety. To make automation viable, you need to
partner the tools with knowledgeable and skilled analysts. Given
good parser technology, tools can detect most date-related lines,
but there are still ambiguities that require humans to decipher:
Strong-type violations - (char(5) =
num(7)).
Inconsistent naming conventions.
Aliasing and pointers.
Manual
effort is, in most cases, still required to select which affected
lines require repair. A staff that knows the systems well can be
as effective with simple tools as the staff using more
sophisticated approaches.
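The limits of simple search tools can be seen in a toy scanner like the one below. The two patterns and the COBOL-like sample lines are invented for illustration and are far cruder than any commercial Y2K tool; the point is only to show a false positive and a false negative side by side.

```python
import re

# Hypothetical heuristics: names that look date-related, and a literal "19".
DATE_HINT = re.compile(r"\b\w*(date|year|yy|yr)\w*\b", re.IGNORECASE)
HARD_CODED_19 = re.compile(r"""["']19["']""")


def scan(lines):
    """Return (line number, reasons) for every line the heuristics flag."""
    hits = []
    for number, text in enumerate(lines, start=1):
        reasons = []
        if DATE_HINT.search(text):
            reasons.append("date-like name")
        if HARD_CODED_19.search(text):
            reasons.append('hard-coded "19"')
        if reasons:
            hits.append((number, reasons))
    return hits


sample = [
    "MOVE '19' TO WS-CENTURY",           # real problem, found
    "COMPUTE AGE = CUR-YY - BIRTH-YY",   # real problem, found
    "UPDATE CUST-NAME",                  # false positive: "UPDATE" contains "date"
    "MOVE EXPIRY TO OUT-FIELD",          # false negative: a date with an unhelpful name
]

for number, reasons in scan(sample):
    print(number, sample[number - 1], reasons)
```

Lines 1 to 3 are flagged and line 4 is missed, so an analyst who knows the system still has to discard the third hit and find the fourth by other means.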
- Tools for Testing
Estimates suggest that testing will
account for 45 to 60 percent of the Y2K effort. One way to
minimize the test effort is to use testing tools to develop
automated and repeatable test methods. Test scripts and scenarios
provide a reasonable, repeatable way to conduct effective stress
tests of various problem dates.
Y2K testing is unique in that it
requires testing at numerous points in time, and each test
requires supporting test data and test scenarios. Table 1 shows
sample dates to consider testing; the specific dates to test
will depend on the composition of the systems.
Date        Event
1998-01-01  Flag Year 98
1999-01-01  Flag Year 99
1999-09-09  Magic Number 9/9/99
2000-01-01  Overflow 2-Digit Years
2000-1-10   First 9-Character Date
2000-10-10  First 10-Character Date
2000-02-29  Leap Year
2000-12-31  Day 366 of the Year
2001-01-01  21st Century
2100-01-01  Not a Leap Year
3000-01-01  Not a Leap Year
Table 1:
Some possible date-related failure dates.
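The properties behind several rows of the table can be confirmed with Python's standard library, which is one way to seed an automated test script. The helper names are hypothetical, and the unpadded year-month-day form is assumed to be the format the 9-character and 10-character rows refer to.

```python
import calendar
from datetime import date

TEST_DATES = [
    (date(1998, 1, 1), "Flag Year 98"),
    (date(1999, 1, 1), "Flag Year 99"),
    (date(1999, 9, 9), "Magic Number 9/9/99"),
    (date(2000, 1, 1), "Overflow 2-Digit Years"),
    (date(2000, 1, 10), "First 9-Character Date"),
    (date(2000, 10, 10), "First 10-Character Date"),
    (date(2000, 2, 29), "Leap Year"),
    (date(2000, 12, 31), "Day 366 of the Year"),
    (date(2001, 1, 1), "21st Century"),
    (date(2100, 1, 1), "Not a Leap Year"),
    (date(3000, 1, 1), "Not a Leap Year"),
]


def unpadded(d: date) -> str:
    """Year-month-day without zero padding (assumed display format)."""
    return f"{d.year}-{d.month}-{d.day}"


def day_of_year(d: date) -> int:
    return d.timetuple().tm_yday


for d, event in TEST_DATES:
    print(f"{unpadded(d):<11} yy={d.year % 100:02d} "
          f"leap={calendar.isleap(d.year)} day={day_of_year(d):>3}  {event}")
```

Each tuple can then drive the same scenario against the system under test, so the run is repeatable at every date of interest.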
It is
vital that systems developers be responsible for the correction
and fielding of a fixed system. Central testing and clearing
facilities encounter tremendous logistical and coordination
problems in dealing with multiple environments and languages. The
systems developers, who already possess development and test
capabilities, are in the most cost-effective position to validate
and verify the correct behavior of their own systems. What the
central organization should do is provide adequate and specific
criteria to assure that everyone is consistently finding and
fixing the potential problems.
- Tools for Project Management
Configuration management tools are
indispensable to handle the rapidly changing system components
and the performance of multiple regression tests. Given the
varied steps and numerous parties involved, along with the stress
of an immovable deadline, a good project planning and project
monitoring tool set will be extremely valuable. Likewise, as the
project moves forward from its initial rough cost estimate, cost
estimating tools and models will be tremendously important in
answering trade-off questions and keeping a handle on the actual
impact of the Y2K remediation effort.
Where to Start
- A Y2K Process
A typical Y2K effort has five major
phases. The process can be tracked with a scorecard or some
similar mechanism. Upper management should be apprised of the
process at each phase:
Awareness
Inventory, Assessment, and Planning
Renovation or remediation
Validation and integration testing
Implementation
For each
of the four types of systems, the steps within the phases will
differ, as will the relative amount of the effort taken by each
phase.
Awareness
Make personnel responsible for the
system aware of the Year 2000 problem and its significance. Each
organization must educate its employees about what the Y2K
problem is and how to deal with it. Many people have a hard time
believing that something as trivial as a lack of two digits in a
date can cause a serious problem.
Inventory, Assessment, and Planning
The assessment must determine if a year
2000 problem exists, estimate the cost of fixing it, and create a
plan for resolving it.
This phase involves eight steps:
Scope the problem and lay out a high-level
plan of activities. Centrally disseminate a common
definition of types of problems being addressed and the
goals of the Y2K effort that include exit criteria.
Identify the cost of Y2K failure, risks and mitigation
approaches, and involve upper management in triage
review. Project management tools and techniques are
indispensable.
Inventory all systems and system
components. Go beyond management information systems to
identify all systems at risk. Many organizations
initially focus on their production information systems
yet do not consider the support and maintenance
infrastructure and the other systems in the organization
that are sensitive to dates such as the telephone and
power systems, embedded systems, and heat and light
management systems. The inventory includes the technical
environment on which the system operates, the
communications devices that the system uses, and the
application software itself.
Make an estimate of cost. Use source lines
of code totals for developed systems multiplied by
industry standard cost factors to generate a quick,
ball-park estimate.
Refine the plan. Identify priorities and
use disaster recovery plans to identify critical systems.
Rewrite disaster recovery and contingency plans to
reflect the Y2K problem and include safety consequences
and secondary effects.
Conduct a detailed assessment of the
portfolio with a consistent definition of Y2K problems.
For developed systems, use available tools. For
purchased, interfacing, and other critical systems, use a
compliance questionnaire to interview the owners of the
systems. The questionnaire focuses on establishing
the types of issues that are or are not addressed by the
other party. Once this is understood, the risk this poses
to the systems can be gauged and fixes planned.
Identify potential solutions and their
costs in dollars, schedule, and ripple effects. Select
solution approaches for each type and mix of systems,
balance the solution against constraints, and generate a
real cost estimate on the basis of the technical findings
of the assessment and the proposed solutions. Some
questions to ask are, "How many systems are affected
and need to be synchronously fixed, tested, and fielded?
What is the cost of change in terms of effort,
storage, licenses, bridges, testing, and deferred
capabilities? How quickly can the solution be
implemented? What is the span of time with which the
systems work? What is the cost of not changing in terms
of liabilities, malfunctions, and errors?"
Refine the plan and get upper management
approval of the plan and its schedule and cost estimates.
Make a full system backup of everything.
Renovation / Remediation
When ready to start the implementation
of fixes, you must define new or revised procedures and the
accompanying training, brief upper management on changes and
revisions so that adjustments can be approved quickly, and use
modern techniques to identify needed design changes. Complete the
actual correction of the year 2000 problem in each system.
"Bridges" required to interface with systems and databases
are developed or implemented at this time.
Validation and Integration Testing
Here, you test fixes with an exit
criteria checklist. Testing fixes to Y2K problems will require
new and expanded procedures compared to traditional approaches.
On top of all this, as mentioned above, systems typically deal
with a window of time (a range of dates both before and after
the present date) that requires testing beyond Jan. 1, 2000.
Year 2000 testing must include regression testing, integration
testing, and simulated year 2000 testing.
Field
This phase includes the training
necessary for the system's users. Although most organizations
will be able to use their traditional methods to field new
systems and upgrades for the changes needed for Y2K, there will
probably be several additional issues that affect the traditional
approach. The need to ensure that all changes are implemented on
time against a firm, no-slip deadline will be a new concept for
many. And, for those systems requiring a change in their
interfaces, the coordination and simultaneous activation of the
new release with interfacing applications will be especially
challenging when more than one interface is changing.
Conclusion
The Y2K problem may cause some bizarre
and possibly unexplainable happenings. Peter de Jager tells of a
Scottish bank that was so confident of its systems that it
overruled its Y2K service provider's desire to continue testing.
The bank advanced the year and began to run reports. One looked
rather peculiar. After long study, one of the bank's elder
statesmen recognized it as a printout in the old fractional
pounds, shillings, and pence. Apparently, when Scotland adopted a
decimal system in 1971, the bank had used a date switch that
caused every report generated after 1970 to use the new decimal
format instead of the old fractional units. But when the time was
advanced to 2000, the part of the system that used the date
switch thought it was back in 1900, so it reverted to running the
old reports.