by George Stachnik
Operating systems such as MPE/iX are tools that people
use to manage computer systems. The word manage can mean a lot of
things. But in the context of commercial computing, its essence comes
down to just two things: you tell the computer which programs to run,
and you tell it where to find or store the data that those programs
will operate upon.
This month, we're going to explore some of the tools application programmers
can use to access data stored in files--beginning with a brief review of
formal file designators, and continuing with the MPE/iX Intrinsic interfaces.
An application, once running, typically reads data from one or more input
files. It also writes new data into one or more output files. In earlier
articles in this series, we saw that the linkages between programs and their
input and output files are typically defined using special names called
formal file designators. We'll begin this month's installment with
a brief review of how these things work.
Formal File Designators
When a programmer designs an application program, he or she must decide
how many files the program will access, and how. For each file, the program's
source code will contain a name called a formal file designator. For example,
suppose you were designing a very simple application program with one input
file and one output file. You might decide to identify them using the formal
file designators INFILE and OUTFILE. There's nothing special about these
names; you can use any names you like. The only restriction MPE makes is
that the formal file designators must follow MPE's file naming conventions.
That is, they can be no more than eight alphanumeric characters long, and
the first character must be alphabetic.
When you run an HP 3000 application program, it will try (by default
at least) to read and write files that have names that match the formal
file designators. For example, suppose you have a program that uses the
formal file designators INFILE and OUTFILE. When you run such a program,
MPE will attempt to read and write files in your logon group that bear those
file names.
Input files are handled slightly differently from output files. For example,
when an application program opens an input file named INFILE, MPE will try
to find an existing file (either temporary or permanent) named INFILE. If
such a file doesn't exist, then the open operation will fail with an error.
If the program doesn't trap this error and handle it, the program will fail
the first time it tries to read the file. Similarly, when an application
program opens an output file named OUTFILE, MPE will attempt to open an
existing file (either temporary or permanent) by that name. If no file named
OUTFILE exists, MPE will create a new file by that name and use it.
In an earlier article in this series, we saw that formal file designators
can be linked with file names other than the ones specified in the program
itself. This is done using a form of the :FILE command called a file equation.
For example, if you want your application program to read its input data
from a file called INPUT01 instead of from INFILE, you simply issue the
following command prior to running the program:
:FILE INFILE=INPUT01
File equations are very versatile tools. You can use them to redirect
your output to other groups and accounts. For example, suppose you wanted
to redirect the output to a file called DATA01 in the PUB group of the MYACCT
account. Once again, referencing the formal file designator (OUTFILE), you
might use a file equation like this one:
:FILE OUTFILE=DATA01.PUB.MYACCT
File equations also can be used to redirect the input or output of a
program to special devices. By default, HP 3000 applications will read and
write files on disk drives (;DEV=DISC). But suppose you wanted to redirect
the records being written to the file designated OUTFILE to a tape drive
instead. Once again, a file equation, issued prior to executing the program,
will do the trick. This time the syntax is:
:FILE OUTFILE;DEV=TAPE
As you can see, file equations can be used to make programs work with
files on any device, using any valid file name. All the application
programmer needs to be concerned with are the names INFILE and OUTFILE
(or whatever formal file designators he or she chooses). People who run
the program are free to associate those designators with whatever files
they please, on whatever devices they please, using file equations like
those shown above.
The application programmer does not need to be aware of them at all.
File names and device characteristics are hard-coded into each application
program. But they represent only the program's defaults. They can be overridden,
as we have shown, using file equations. This characteristic of MPE/iX is
called "device independence," and it gives MPE/iX system managers
a great deal of flexibility in how they manage their applications.
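The effect of a file equation can be modeled in a few lines. The sketch below is Python, not MPE code, and the names `FileEquations`, `set`, and `resolve` are hypothetical. It only illustrates the lookup: a formal designator falls back to its own name and to disc when no equation is in force.

```python
# Toy model of file equations: a table that maps formal file designators
# to actual file names and devices. Illustrative only; this is not how
# MPE itself stores or applies file equations.

class FileEquations:
    def __init__(self):
        self._table = {}

    def set(self, designator, actual=None, dev="DISC"):
        """Model ':FILE designator=actual;DEV=dev'."""
        name = designator.upper()
        self._table[name] = (actual or name, dev)

    def resolve(self, designator):
        """Return (file name, device) for a designator.

        With no equation in force, the designator itself is the file
        name and the device defaults to DISC.
        """
        name = designator.upper()
        return self._table.get(name, (name, "DISC"))


equations = FileEquations()
equations.set("INFILE", "INPUT01")      # :FILE INFILE=INPUT01
equations.set("OUTFILE", dev="TAPE")    # :FILE OUTFILE;DEV=TAPE

print(equations.resolve("INFILE"))      # ('INPUT01', 'DISC')
print(equations.resolve("OUTFILE"))     # ('OUTFILE', 'TAPE')
print(equations.resolve("LOGFILE"))     # ('LOGFILE', 'DISC')
```

The program only ever asks for INFILE or OUTFILE; everything else is decided by whoever set up the table before the run.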
In spite of this flexibility, there are some things that programmers
do need to be aware of and plan for. For example, application programs must
contain the program logic that defines exactly how they will access their
input and output files. One tool that is used to create this logic is a
set of special MPE routines called intrinsics.
Intrinsics
Suppose we want our application program to access an input file that
we'll designate as INFILE and an output file designated as OUTFILE. We must
make sure our application program does the following three things:
- The program must OPEN the files. For example, in COBOL, it might use
COBOL's OPEN verb. In COBOL, each file can be opened either for READ access
or for WRITE access. This is part of ANSII standard COBOL. HP COBOL is
an implementation of ANSII standard COBOL, with some extensions.
- After the files are OPENed, the application program will then READ
or WRITE the files. In general, each READ operation copies the contents
of one record from the input file into a buffer in the program itself.
Similarly, each WRITE operation copies the contents of a buffer into a
record in the output file. Programs written in HP COBOL typically achieve
this using the ANSI standard READ and WRITE COBOL verbs. There are similar
ANSI standard features for C, FORTRAN, and most other languages.
- When the program has finished executing, one of the last things it
should do before terminating is to CLOSE its files. Once again, this is
typically achieved using ANSI standard program statements such as COBOL's
CLOSE verb. If an HP 3000 application program terminates without closing
its files (as would happen in the case of a program abort), MPE will close
those files automatically.
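The three steps above look much the same in any language. Here is a minimal sketch in Python rather than COBOL; the function name `copy_records` is hypothetical, each text line stands in for a record, and the files named INFILE and OUTFILE are created in a temporary directory just for the demonstration.

```python
import os
import tempfile


def copy_records(in_path, out_path):
    """Open both files, copy one record (line) at a time, then close them."""
    infile = open(in_path, "r")       # step 1: OPEN for read access
    outfile = open(out_path, "w")     # step 1: OPEN for write access
    copied = 0
    try:
        for record in infile:         # step 2: READ one record into a buffer
            outfile.write(record)     # step 2: WRITE the buffer to the output
            copied += 1
    finally:
        infile.close()                # step 3: CLOSE, even if a step failed
        outfile.close()
    return copied


workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "INFILE")
out_path = os.path.join(workdir, "OUTFILE")
with open(in_path, "w") as f:
    f.write("record one\nrecord two\nrecord three\n")

count = copy_records(in_path, out_path)
print(count)   # 3
```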
Generally, the precise techniques that are used to OPEN, READ, WRITE,
and CLOSE files are language dependent. That is, the instructions that you'd
code in COBOL are different from those that you'd use in BASIC or FORTRAN
or Java. Even within a language, there may be different implementations,
depending on which ANSI standard the compiler complies with. On the HP
3000, there have been COBOL compilers that complied with the 1968 ANSI
standard, the 1974 ANSI standard, and the 1985 ANSI standard for COBOL.
In spite of this, there is a common denominator across all versions of all
HP 3000 languages. That common denominator is made up of the MPE intrinsics.
MPE intrinsics are specialized routines that were designed to handle
tasks such as opening, closing, and accessing files. The MPE concept of
an "intrinsic interface" (or "intrinsic" for short)
is quite similar to the UNIX concept of a "system call" or the
MS Windows concept of an "entry point." Each intrinsic is a piece
of operating system code that can be invoked (or "called") by
an application program. Each intrinsic is associated with a specific task.
There are MPE intrinsics for opening files, closing them, reading them,
and writing them. There are also intrinsics that create processes, terminate
them, and communicate between processes. There's an intrinsic to handle
just about any system task you can think of.
Let's begin our exploration of the MPE intrinsics by taking a look at
an intrinsic called FOPEN. FOPEN can be used to open a file (see the sidebar).
There are at least two ways that FOPEN can be invoked from an application
program:
- First of all, an application program can call FOPEN explicitly.
For example, a program written in COBOL can use COBOL's CALL verb and invoke
FOPEN in much the same way you'd invoke a subroutine. Unlike a user-written
subroutine, the intrinsics are not part of the application program. They
are part of the operating system. FOPEN shows up in your compiler listing
as an "unresolved external reference."
- Alternately, FOPEN might be invoked implicitly. For example,
suppose a COBOL program contains an ANSI standard COBOL OPEN verb. In
that case, a call to the intrinsic will be generated by the compiler. You
won't actually see it in the source listing, because you didn't code it.
In this case, the intrinsic is being called implicitly. But once
again, the reference to FOPEN will show up as an "unresolved external
reference."
The idea of unresolved external references was covered in part 12 of
this series, during our discussion of the linkage editor. Before an HP 3000
application program can be successfully executed, it must be processed by
the linkage editor, which will identify any unresolved references to MPE's
intrinsics that the program might contain. These references are not resolved
until run time, when the program is actually executed. At that time,
the references made by the application program will be resolved using the
system libraries, such as SL.PUB.SYS and NL.PUB.SYS.
Some Details About FOPEN
The FOPEN intrinsic was originally designed to open files on MPE/V systems.
It is supported today on MPE/iX systems as well, although it is little more
than a shell. On MPE/iX, FOPEN calls HPFOPEN, which actually does the work
of opening the file.
The precise syntax of FOPEN is defined in the HP 3000 intrinsics manual.
If you're new to intrinsics, learning about FOPEN is a good place to begin.
It is in some ways typical of MPE intrinsics.
The first thing to understand about intrinsics is that when you call
them, you typically pass them a list of parameters. Table
1 shows some of the parameters that can be passed to FOPEN. The
first parameter passed to FOPEN is the formal designator. This is a byte
array (MPE-speak for a character string) containing the formal file designator
of the file to be opened. The formal designator points FOPEN to the file
that you want to open.
The formal designator is generally followed by two 16-bit binary words.
These words are referred to as the FOPTIONS array and the AOPTIONS array.
The FOPTIONS array is a string of 16 bits in which each bit has a special
meaning. Table 2 describes some of the combinations of bits that are
typically used in the FOPTIONS word. The notation used in this table is
a little cryptic, but it's worth understanding because it is used throughout
the MPE documentation. The bit strings that appear in the first column of
Table 2 are described using two numbers. The
first number is the number of the starting bit. In a 16-bit word, the bits
are numbered starting with 0 (0,1,2,....15). The second number is the number
of bits in the string.
Let's look at a couple of examples. In a 16-bit word, the leftmost 4
bits would be designated using the expression 0:4. This expression literally
means: 4 bits starting with bit 0. The next 8 bits would be designated as
4:8--which is to say 8 bits, beginning with bit 4. Keep in mind that bit
4 is actually the 5th bit--counting left to right--because we start counting
at 0. So the expression 4:8 references bits 4 through 11, which are the 5th
through 12th bits counting from the left.
Here's one more example. The string 15:1 refers to the rightmost bit in
a 16-bit word.
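If you want to check your reading of the start:length notation, a few lines of code will do it. The following is a small Python sketch; the helper name `bit_field` and the sample word are made up for illustration. The one thing to watch is that bit 0 is the leftmost (most significant) bit, which is the reverse of how many programmers number bits today.

```python
WORD_BITS = 16


def bit_field(word, start, length):
    """Return the value of `length` bits beginning at bit `start`.

    Bits are numbered left to right, so bit 0 is the most significant
    bit of the 16-bit word and bit 15 is the least significant.
    """
    if start < 0 or length < 1 or start + length > WORD_BITS:
        raise ValueError("field does not fit in a 16-bit word")
    shift = WORD_BITS - (start + length)   # bits to the right of the field
    mask = (1 << length) - 1
    return (word >> shift) & mask


word = 0b1010_0000_1111_0001

print(bit_field(word, 0, 4))    # 0:4, the leftmost 4 bits (1010)   -> 10
print(bit_field(word, 4, 8))    # 4:8, bits 4 through 11 (00001111) -> 15
print(bit_field(word, 15, 1))   # 15:1, the rightmost bit           -> 1
```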
Now apply this notation to Table 2. The first row is labelled 2:3.
This refers to bits 2, 3, and 4 of the 16-bit word (counting left to right,
starting with 0). The table shows that if these 3 bits are all 0, then the
file to be opened is a standard file. But if these 3 bits are 001, the file
to be opened is a CM KSAM file.
Take a look at the row of Table 2 that's
labelled 14:2. The last 2 bits of the FOPTIONS word tell FOPEN whether
it's going to open an existing file (01, 10, or 11), or create a new file
(00). If you are opening an existing file, you don't need to set the bits
that tell FOPEN things like what kind of file it is or what its record size
is. For example, if you're opening an existing KSAM file, it will figure
that out and handle it appropriately. But if you are opening a new file
(bits 14:2=00), then FOPEN will be creating the file for you. In that case,
you'll have to pay attention to the other FOPTIONS bits, because they tell
FOPEN what kind of file to create.
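To make the two FOPTIONS fields concrete, here is a hedged Python sketch that pulls them out of a 16-bit word. The function and dictionary names are hypothetical, and only the codes quoted in this article are named; a real program would take the full list from the intrinsics manual.

```python
def bit_field(word, start, length):
    """Extract `length` bits starting at bit `start` (bit 0 is leftmost)."""
    return (word >> (16 - start - length)) & ((1 << length) - 1)


# Only the file-type codes named in the text are listed here.
FILE_TYPES = {0b000: "standard file", 0b001: "CM KSAM file"}


def describe_foptions(foptions):
    """Summarize the two FOPTIONS fields discussed above."""
    file_type = bit_field(foptions, 2, 3)   # bits 2:3
    domain = bit_field(foptions, 14, 2)     # bits 14:2
    return {
        "file_type": FILE_TYPES.get(file_type, "other (see the intrinsics manual)"),
        "creates_new_file": domain == 0b00,
    }


new_standard = 0b0000_0000_0000_0000    # 2:3 = 000, 14:2 = 00
existing_ksam = 0b0000_1000_0000_0001   # 2:3 = 001, 14:2 = 01

print(describe_foptions(new_standard))
print(describe_foptions(existing_ksam))
```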
Returning to Table 1, the third parameter
that is passed to FOPEN is another 16-bit word called the AOPTIONS word.
Once again, this is a binary array in which each bit specifies something
about how the file is to be accessed. Table 3
contains some of the values found in the AOPTIONS array.
Bits 12:4 determine whether the file is to be opened for READ access
(0000) or WRITE access (0001). There are other combinations that are used
to support direct access with FREADDIR and FWRITEDIR (0100) or FUPDATE (0101).
Bits 8:2 determine whether and how the file can be shared among other processes
on the system.
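Going the other way, a program has to assemble the AOPTIONS word before it calls FOPEN. The Python sketch below shows one way to set a start:length field; `set_field`, `build_aoptions`, and the constant names are hypothetical, and the access-type codes are limited to the ones quoted above. The article does not list the sharing codes for bits 8:2, so the sketch simply accepts a raw 2-bit value.

```python
def set_field(word, start, length, value):
    """Return `word` with the `start:length` field replaced by `value`."""
    if not 0 <= value < (1 << length):
        raise ValueError("value does not fit in the field")
    shift = 16 - start - length
    mask = ((1 << length) - 1) << shift
    return (word & ~mask & 0xFFFF) | (value << shift)


# Access-type codes for bits 12:4, limited to those quoted in the text.
ACCESS_READ = 0b0000
ACCESS_WRITE = 0b0001
ACCESS_DIRECT = 0b0100
ACCESS_UPDATE = 0b0101


def build_aoptions(access_type, sharing=0):
    """Compose an AOPTIONS word from an access type and a sharing code.

    The meaning of each sharing code is not given in the text, so the
    caller supplies the raw 2-bit value for bits 8:2.
    """
    word = 0
    word = set_field(word, 12, 4, access_type)
    word = set_field(word, 8, 2, sharing)
    return word


print(format(build_aoptions(ACCESS_WRITE), "016b"))                # 0000000000000001
print(format(build_aoptions(ACCESS_UPDATE, sharing=0b01), "016b")) # 0000000001000101
```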
FOPEN and HPFOPEN
It's worth noting that the intrinsics have evolved over time as the HP
3000 has evolved. For example, there are two different MPE/iX intrinsics
that open files: FOPEN and HPFOPEN.
The FOPEN intrinsic dates back to the original models of the HP 3000--the
16-bit so-called "classic" systems. When HP introduced the newer
32-bit PA-RISC systems, support for FOPEN continued as part of the strategy
to maintain compatibility with the older models. FOPEN is a 16-bit compatibility
mode routine. As such, it is typically used by 16-bit compatibility mode
application programs that were ported from the classic environment.
HPFOPEN is a part of the PA-RISC version of MPE. It does not appear on
the older classic systems. The PA-RISC version of MPE was originally called
MPE XL, and later renamed MPE/iX. MPE/iX includes both intrinsics: HPFOPEN
and FOPEN.
Both intrinsics fundamentally serve the same purpose: They open files.
But the functionality provided by HPFOPEN is a superset of the functionality
supported by the older FOPEN intrinsic. The compatibility mode FOPEN intrinsic
is basically the same functionality that was available on the classic 16-bit
models. To use many of the new features of the file system that have been
implemented on MPE/iX, you must use the native mode HPFOPEN intrinsic.
We've seen that when you compile an ANSI standard COBOL program, the
compiler will generate intrinsic calls for you. If you compile a program
on an old 16-bit classic system, the compiler will only generate calls to
16-bit compatibility mode intrinsics such as FOPEN. On newer PA-RISC models
of the HP 3000, the situation is more complex. For one thing, depending
on the language you are using, you may have your choice of at least two
different compilers.
MPE/iX supports compatibility mode compilers such as the COBOLII compiler.
These compilers generate 16-bit machine code suitable for execution either
on classic HP 3000s or on PA-RISC models. MPE/iX also supports native mode
compilers such as COB85XL. These compilers generate 32-bit machine code
suitable for execution only on the PA-RISC machines.
If you compile a program that opens files, the compatibility mode COBOL
compiler will generate calls to FOPEN, but the native mode compiler will
generate calls to HPFOPEN. Virtually all HP 3000 applications use intrinsics.
Even if a program doesn't call an intrinsic explicitly, it's a pretty good
bet that it will call a number of them implicitly. That makes it a lot
easier to troubleshoot applications if you have a working knowledge of
the MPE/iX intrinsics.
Table 4 contains a summary of the most
commonly used file system intrinsics on the HP 3000. We've seen how FOPEN
and HPFOPEN are used to open files. Next we're going to explore some of
the other intrinsics.
Files and Databases
When designing an application for the HP 3000, you must decide whether
to store the application's data in files or in databases. These days, most
commercial applications use databases to store critical user data. The advantages
of databases are well known, and we'll be discussing them in future articles
in this series, when we explore HP 3000 databases (particularly IMAGE/SQL)
in detail. But for the present, we're going to focus on what can be done
with ordinary files. In spite of the superior recoverability, security,
and versatility offered by databases, ordinary files still have their place
and are still used by many HP 3000 applications.
We've seen how FOPEN and HPFOPEN can be used to open files for access.
The FCLOSE intrinsic is used to close files when an application has finished
accessing them. There's only one FCLOSE intrinsic; it is used regardless
of whether the files were opened with FOPEN or with HPFOPEN. The actual
reading and writing of files is handled with intrinsics called FREAD, FWRITE,
FREADDIR, and FWRITEDIR. Next we will see when each of these is used.
The most common way to access files on the HP 3000 is sequentially. To
access a file sequentially, open the file for input access using either
FOPEN or HPFOPEN. Then READ the file, one record at a time, using repeated
calls to FREAD. The first read operation retrieves the first record from
the file. Subsequent read operations retrieve the second record, the third,
the fourth, and so on until the end of file is reached. At that point, another
call to FREAD will return an "end of file" condition. This is
a signal to the application program that all the records in the file have
been accessed and the file should now be closed.
Sequential access to an output file works in much the same way, but with
one important difference. Opening a file for sequential output access effectively
erases any data the file contains. After opening the file for sequential
output access, a program's first call to FWRITE creates the first (and at
that point, the only) record in the file. Subsequent calls to FWRITE will
append additional records after the first one. When the file is closed,
the file will contain the records that were placed there by the calls to
FWRITE, in the order in which they were written.
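Sequential input and output behave much the same way in most environments, so a short Python sketch can illustrate both halves of the description above. The function names are hypothetical, text lines stand in for records, and Python's "w" mode plays the part of opening for sequential output: it discards whatever the file held before.

```python
import os
import tempfile


def write_sequential(path, records):
    """Open for output (discarding old contents) and append records in order."""
    with open(path, "w") as f:          # "w" truncates an existing file
        for record in records:
            f.write(record + "\n")


def read_sequential(path):
    """Read records one at a time until end of file is signalled."""
    records = []
    with open(path, "r") as f:
        while True:
            line = f.readline()
            if line == "":              # empty string is Python's end-of-file signal
                break
            records.append(line.rstrip("\n"))
    return records


path = os.path.join(tempfile.mkdtemp(), "OUTFILE")
write_sequential(path, ["old 1", "old 2", "old 3"])
write_sequential(path, ["new 1", "new 2"])   # the earlier records are erased
result = read_sequential(path)
print(result)   # ['new 1', 'new 2']
```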
Sequentially accessed files are widely used on the HP 3000. They are
most often found in batch environments and in large sorts.
Flat Files: Direct Access
Ordinary MPE files also provide you with another useful capability: direct
access. The intrinsics FREADDIR and FWRITEDIR can be used to access the
records in a file directly, using a relative record number. The best way
to explain the power of direct access is with an example.
Imagine a large table of 10,000 rows. Suppose that the whole table is
stored in a file on the HP 3000 so that each row is represented by one record
of the file. The file could be accessed sequentially as we've seen earlier.
For batch applications, sequential access would be appropriate, because
batch applications typically act on all the rows of the table. But what
about online applications? Users of online applications usually need to
select one or more rows from the table and then act on them. Suppose a user
wants to access the 9,999th row of a table. Sequential access means that
in order to access the 9,999th record in the table, you'd have to read the
9,998 entries that precede it. From a performance perspective alone, this
is totally unacceptable.
But with direct access, the application program can simply specify the
number of the row (record) that it's interested in. For direct-read access,
the FREADDIR intrinsic will retrieve the contents of a specified record.
Similarly, for direct-write access, the FWRITEDIR intrinsic will update
the contents of the specified record (without affecting other records in
the file).
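The reason direct access is fast is that, with fixed-length records, the position of any record can be computed instead of searched for. Here is a minimal sketch of that idea in Python, using `seek` in place of FREADDIR and FWRITEDIR. The 16-byte record size, the helper names, and the choice to number records from 0 are all assumptions made for this example.

```python
import os
import tempfile

RECORD_SIZE = 16   # bytes per record; an arbitrary size chosen for this sketch


def pack(text):
    """Pad or truncate text to exactly RECORD_SIZE bytes."""
    return text.encode("ascii")[:RECORD_SIZE].ljust(RECORD_SIZE)


def read_direct(f, record_number):
    """Return one record by number without reading the records before it."""
    f.seek(record_number * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()


def write_direct(f, record_number, text):
    """Overwrite one record in place, leaving its neighbours untouched."""
    f.seek(record_number * RECORD_SIZE)
    f.write(pack(text))


path = os.path.join(tempfile.mkdtemp(), "TABLE")
with open(path, "wb") as f:
    for row in range(10_000):
        f.write(pack(f"row {row}"))

with open(path, "r+b") as f:
    before = read_direct(f, 9_998)      # the 9,999th record, counting from 0
    write_direct(f, 9_998, "updated")
    after = read_direct(f, 9_998)
    neighbour = read_direct(f, 9_997)

print(before, "|", after, "|", neighbour)   # row 9998 | updated | row 9997
```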
Direct access provides a very fast means of accessing data directly,
although there is one very important (and fairly obvious) limitation. Records
must be accessed by their record number. In other words, if you want to
access the 975th record in a file, you have to know that the one you want
is the 975th one in the file. You cannot tell FREADDIR to find the record
containing the name "John Smith." Direct access does not provide
you with any kind of key beyond the record number. Keyed access is provided
by using another kind of file called a KSAM file, or by using a database.
KSAM Files
KSAM is an acronym that stands for "Keyed Sequential Access Method."
The original HP 3000 implementation of KSAM is similar in many respects
to the keyed access methods found on UNIX systems and on older IBM mainframes
(ISAM and VSAM). KSAM files can be accessed sequentially, just like ordinary
files. But they can also be read or written using keyed access.
Keyed access allows an application to select a particular record from
a file and read it directly. Unlike direct access, which requires that the
application program select the desired record by a relative record number,
KSAM files allow you to use a key value. For example, instead of selecting
the 9,999th record, you'd be able to select the "John Smith" record
without having to know the number of the record that contains John's data.
KSAM was originally implemented on MPE/V systems. This implementation
of KSAM is also supported on MPE/iX systems--where it is known as compatibility
mode KSAM, or CM KSAM. A CM KSAM file is actually two files: a key file
and a data file. The data file contains the data. The key file contains
key values that can be used to access records in the data file. CM KSAM
files are created using a utility program called KSAMUTIL. This utility
program is also used to synchronize the key and data files, which can become
corrupted by system aborts.
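The split between a key file and a data file is easier to picture with a toy model. The Python sketch below is not KSAM and does not reflect its file formats; the class `KeyedFile`, its methods, and the 24-byte record size are invented for illustration. It shows only the basic idea: the key structure maps a key value to a record number, and the record is then fetched directly.

```python
RECORD_SIZE = 24   # arbitrary fixed record length for this sketch


class KeyedFile:
    """Toy model of a data file plus a separate key index."""

    def __init__(self):
        self.data = bytearray()   # stands in for the data file
        self.keys = {}            # stands in for the key file: key -> record number

    def write(self, key, text):
        """Append a record and note its record number under `key`."""
        record_number = len(self.data) // RECORD_SIZE
        self.data += text.encode("ascii")[:RECORD_SIZE].ljust(RECORD_SIZE)
        self.keys[key] = record_number

    def read_by_key(self, key):
        """Look the key up, then fetch that record directly."""
        record_number = self.keys[key]
        start = record_number * RECORD_SIZE
        return self.data[start:start + RECORD_SIZE].decode("ascii").rstrip()


people = KeyedFile()
people.write("Jane Doe", "Jane Doe, Accounting")
people.write("John Smith", "John Smith, Shipping")

found = people.read_by_key("John Smith")
print(found)   # John Smith, Shipping
```

Because the two structures are updated separately, an interruption between the two updates in `write` would leave them out of step, which is the kind of mismatch the paragraph above describes.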
In the early 1990s, HP brought a native mode version of KSAM (called
NM KSAM) to MPE/iX. The native mode version boasts better recoverability
than the older CM version. Currently, both versions of KSAM are supported
on MPE/iX.
We've seen three different ways to access files in this article: sequential,
direct, and keyed. Next month we're going to move beyond files, and begin
to explore HP 3000 databases.
George Stachnik works in technical training in HP's
Network Server Division.