The HP 3000--For Complete Novices, Part 16: Accessing Files

by George Stachnik

Operating systems such as MPE/iX are tools that people use to manage computer systems. The word manage can mean a lot of things. But in the context of commercial computing, it comes down to just two things: You need to tell the computer which programs to run, and you must tell it where to find or store the data that those programs will operate upon.
This month, we're going to explore some of the tools application programmers can use to access data stored in files--beginning with a brief review of formal file designators, and continuing with the MPE/iX Intrinsic interfaces.

An application, once running, typically reads data from one or more input files. It also writes new data into one or more output files. In earlier articles in this series, we saw that the linkages between programs and their input and output files are typically defined using special names called formal file designators. We'll begin this month's installment with a brief review of how these things work.

Formal File Designators

When a programmer designs an application program, he or she must decide how many files the program will access, and how. For each file, the program's source code will contain a name called a formal file designator. For example, suppose you were designing a very simple application program with one input file and one output file. You might decide to identify them using the formal file designators INFILE and OUTFILE. There's nothing special about these names; you can use any names you like. The only restriction MPE makes is that the formal file designators must follow MPE's file naming conventions. That is, they can be no more than eight alphanumeric characters long, and the first character must be alphabetic.

When you run an HP 3000 application program, it will try (by default at least) to read and write files that have names that match the formal file designators. For example, suppose you have a program that uses the formal file designators INFILE and OUTFILE. When you run such a program, MPE will attempt to read and write files in your logon group that bear those file names.

Input files are handled slightly differently from output files. For example, when an application program opens an input file named INFILE, MPE will try to find an existing file (either temporary or permanent) named INFILE. If such a file doesn't exist, then the open operation will fail with an error. If the program doesn't trap this error and handle it, the program will fail the first time it tries to read the file. Similarly, when an application program opens an output file named OUTFILE, MPE will attempt to open an existing file (either temporary or permanent) by that name. If no file named OUTFILE exists, MPE will create a new file by that name and use it.

In an earlier article in this series, we saw that formal file designators can be linked with file names other than the ones specified in the program itself. This is done using a form of the :FILE command called a file equation. For example, if you want your application program to read its input data from a file called INPUT01 instead of from INFILE, you simply issue the following command prior to running the program:

:FILE INFILE=INPUT01

File equations are very versatile tools. You can use them to redirect your output to other groups and accounts. For example, suppose you wanted to redirect the output to a file called DATA01 in the PUB group of the MYACCT account. Once again, referencing the formal file designator (OUTFILE), you might use a file equation like this one:

:FILE OUTFILE=DATA01.PUB.MYACCT

File equations also can be used to redirect the input or output of a program to special devices. By default, HP 3000 applications will read and write files on disk drives (;DEV=DISC). But suppose you wanted to redirect the records being written to the file designated OUTFILE to a tape drive instead. Once again, a file equation, issued prior to executing the program, will do the trick. This time the syntax is:

:FILE OUTFILE;DEV=TAPE

As you can see, file equations can be used to make programs work with files on any device, using any valid file name. As far as our application programmer is concerned, all that he or she needs to be concerned with are the names INFILE and OUTFILE (or whatever formal file designators he or she chooses). People who run the program are free to associate those designators with whatever files they please, on whatever devices they please using file equations like those shown above.

The application programmer does not need to be aware of these file equations at all. File names and device characteristics are hard-coded into each application program, but they represent only the program's defaults. They can be overridden, as we have shown, using file equations. This characteristic of MPE/iX is called "device independence," and it gives MPE/iX system managers a great deal of flexibility in how they manage their applications.
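
The effect of the file equations shown above can be modeled in a few lines of Python. This is only a sketch of the idea, not a description of how MPE/iX implements it. The names file_equations, file_command, and resolve are hypothetical and were invented for this illustration.

```python
# Hypothetical model of file equations: a table that maps a formal file
# designator to the actual file name and device to use in its place.
file_equations = {}

def file_command(designator, actual=None, dev=None):
    """Model of ':FILE designator=actual;DEV=dev' (illustration only)."""
    file_equations[designator.upper()] = {"file": actual, "dev": dev}

def resolve(designator):
    """Return the (file name, device) that opening 'designator' would use."""
    eq = file_equations.get(designator.upper(), {})
    # With no file equation, the designator itself is the file name and
    # the device defaults to disc.
    return (eq.get("file") or designator.upper(), eq.get("dev") or "DISC")

file_command("INFILE", actual="INPUT01")    # :FILE INFILE=INPUT01
file_command("OUTFILE", dev="TAPE")         # :FILE OUTFILE;DEV=TAPE

print(resolve("INFILE"))     # ('INPUT01', 'DISC')
print(resolve("OUTFILE"))    # ('OUTFILE', 'TAPE')
print(resolve("LOGFILE"))    # ('LOGFILE', 'DISC')
```

The program itself only ever asks for INFILE or OUTFILE. The lookup table, which the person running the program controls, decides what those names actually mean.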

In spite of this flexibility, there are some things that programmers do need to be aware of and plan for. For example, application programs must contain the program logic that defines exactly how they will access their input and output files. One tool that is used to create this logic is a set of special MPE routines called intrinsics.

Intrinsics

Suppose we want our application program to access an input file that we'll designate as INFILE and an output file designated as OUTFILE. We must make sure our application program does the following three things:

The program must OPEN the files. For example, in COBOL, it might use COBOL's OPEN verb. In COBOL, each file can be opened either for READ access or for WRITE access. This is part of ANSI standard COBOL. HP COBOL is an implementation of ANSI standard COBOL, with some extensions.
After the files are OPENed, the application program will then READ or WRITE the files. In general, each READ operation copies the contents of one record from the input file into a buffer in the program itself. Similarly, each WRITE operation copies the contents of a buffer into a record in the output file. Programs written in HP COBOL typically achieve this using the ANSI standard READ and WRITE COBOL verbs. There are similar ANSI standard features for C, FORTRAN, and most other languages.
When the program has finished executing, one of the last things it should do before terminating is to CLOSE its files. Once again, this is typically achieved using ANSI standard program statements such as COBOL's CLOSE verb. If an HP 3000 application program terminates without closing its files (as would happen in the case of a program abort), MPE will close those files automatically.
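
The three steps above look much the same in any language. Here is a minimal sketch in Python rather than COBOL, using Python's own open, read, write, and close operations in place of the MPE facilities. The scratch directory and the sample records are assumptions made only so that the example can run on its own.

```python
import os
import tempfile

# Work in a scratch directory so the example is self-contained.
workdir = tempfile.mkdtemp()
infile_path = os.path.join(workdir, "INFILE")
outfile_path = os.path.join(workdir, "OUTFILE")

# Create a small input file for the example to read.
with open(infile_path, "w") as f:
    f.write("record one\nrecord two\n")

# 1. OPEN both files: one for read access, one for write access.
infile = open(infile_path, "r")
outfile = open(outfile_path, "w")
try:
    # 2. READ each record into a buffer, then WRITE the buffer out.
    for record in infile:
        outfile.write(record.upper())
finally:
    # 3. CLOSE the files, even if something went wrong above.
    infile.close()
    outfile.close()

with open(outfile_path) as f:
    print(f.read())
```

Running it prints the two input records converted to upper case.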

Generally, the precise techniques that are used to OPEN, READ, WRITE, and CLOSE files are language dependent. That is, the instructions that you'd code in COBOL are different from those that you'd use in BASIC or FORTRAN or Java. Even within a language, there may be different implementations, depending on which ANSI standard the compiler complies with. On the HP 3000, there have been COBOL compilers that complied with the 1968 ANSI standard, the 1974 ANSI standard, and the 1985 ANSI standard for COBOL. In spite of this, there is a common denominator across all versions of all HP 3000 languages. That common denominator is made up of the MPE intrinsics.

MPE intrinsics are specialized routines that were designed to handle tasks such as opening, closing, and accessing files. The MPE concept of an "intrinsic interface" (or "intrinsic" for short) is quite similar to the UNIX concept of a "system call" or the MS Windows concept of an "entry point." Each intrinsic is a piece of operating system code that can be invoked (or "called") by an application program. Each intrinsic is associated with a specific task. There are MPE intrinsics for opening files, closing them, reading them, and writing them. There are also intrinsics that create processes, terminate them, and communicate between processes. There's an intrinsic to handle just about any system task you can think of.

Let's begin our exploration of the MPE intrinsics by taking a look at an intrinsic called FOPEN. FOPEN can be used to open a file (see the sidebar). There are at least two ways that FOPEN can be invoked from an application program:

First of all, an application program can call FOPEN explicitly. For example, a program written in COBOL can use COBOL's CALL verb and invoke FOPEN in much the same way you'd invoke a subroutine. Unlike a user-written subroutine, the intrinsics are not part of the application program. They are part of the operating system. FOPEN shows up in your compiler listing as an "unresolved external reference."
Alternatively, FOPEN might be invoked implicitly. For example, suppose a COBOL program contains an ANSI standard COBOL OPEN verb. In that case, a call to the intrinsic will be generated by the compiler. You won't actually see it in the source listing, because you didn't code it. In this case, the intrinsic is being called implicitly. But once again, the reference to FOPEN will show up as an "unresolved external reference."

The idea of unresolved external references was covered in part 12 of this series, during our discussion of the linkage editor. Before an HP 3000 application program can be successfully executed, it must be processed by the linkage editor, which will identify any unresolved references to MPE's intrinsics that the program might contain. These references are not resolved until run time, when the program is actually executed. At that time, the references made by the application program will be resolved using the system libraries, such as SL.PUB.SYS and NL.PUB.SYS.

Some Details About FOPEN

The FOPEN intrinsic was originally designed to open files on MPE/V systems. It is supported today on MPE/iX systems as well, although it is little more than a shell. On MPE/iX, FOPEN calls HPFOPEN, which actually does the work of opening the file.

The precise syntax of FOPEN is defined in the HP 3000 intrinsics manual. If you're new to intrinsics, learning about FOPEN is a good place to begin. It is in some ways typical of MPE intrinsics.

The first thing to understand about intrinsics is that when you call them, you typically pass them a list of parameters. Table 1 shows some of the parameters that can be passed to FOPEN. The first parameter passed to FOPEN is the formal designator. This is a byte array (MPE-speak for a character string) containing the formal file designator of the file to be opened. The formal designator points FOPEN to the file that you want to open.

The formal designator is generally followed by two 16-bit binary words. These words are referred to as the FOPTIONS array and the AOPTIONS array.

The FOPTIONS array is a string of 16 bits in which each bit has a special meaning. Table 2 describes some of the combinations of bits that are typically used in the FOPTIONS word. The notation used in this table is a little cryptic, but it's worth understanding because it is used throughout the MPE documentation. The bit strings that appear in the first column of Table 2 are described using two numbers. The first number is the number of the starting bit. In a 16-bit word, the bits are numbered starting with 0 (0, 1, 2, ... 15). The second number is the number of bits in the string.

Let's look at a couple of examples. In a 16-bit word, the leftmost 4 bits would be designated using the expression 0:4. This expression literally means: 4 bits starting with bit 0. The next 8 bits would be designated as 4:8--which is to say 8 bits, beginning with bit 4. Keep in mind that bit 4 is actually the 5th bit--counting left to right--because we start counting at 0. So the expression 4:8 references bits 4 through 11, which are the 5th through 12th bits of the word. Here's one more example. The string 15:1 refers to the rightmost bit in a 16-bit word.
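
If you find this notation easier to follow in code, here is a small Python sketch that extracts a start:count field from a 16-bit word. The function name bits and the sample word are made up for this illustration. The one assumption that matters is the numbering described above: bit 0 is the leftmost (most significant) bit.

```python
def bits(word, start, count):
    """Extract the field start:count from a 16-bit word.

    Bits are numbered left to right, 0 through 15, so bit 0 is the most
    significant bit and bit 15 is the least significant.
    """
    if start < 0 or count < 1 or start + count > 16:
        raise ValueError("field does not fit in a 16-bit word")
    shift = 16 - (start + count)    # number of bits to the right of the field
    mask = (1 << count) - 1         # 'count' one-bits
    return (word >> shift) & mask

word = 0b1010_0000_1111_0001        # an arbitrary sample word

print(bits(word, 0, 4))     # 10 (binary 1010, the leftmost 4 bits)
print(bits(word, 4, 8))     # 15 (binary 00001111, bits 4 through 11)
print(bits(word, 15, 1))    # 1  (the rightmost bit)
```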

For example, the first row of Table 2 is labelled 2:3. This refers to bits 2, 3, and 4 of the 16-bit word (counting left to right, starting with 0). The table shows that if these 3 bits are all 0, then the file to be opened is a standard file. But if these 3 bits are 001, the file to be opened is a CM KSAM file.

Take a look at the row of Table 2 that's labelled 14:2. The last 2 bits of the FOPTIONS word tell FOPEN whether it's going to open an existing file (01, 10, or 11), or create a new file (00). If you are opening an existing file, you don't need to set the bits that tell FOPEN things like what kind of file it is or what its record size is. For example, if you're opening an existing KSAM file, it will figure that out and handle it appropriately. But if you are opening a new file (bits 14:2=00), then FOPEN will be creating the file for you. In that case, you'll have to pay attention to the other FOPTIONS bits, because they tell FOPEN what kind of file to create.
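
To see how the two fields just described fit into one word, here is a Python sketch that builds and decodes an FOPTIONS value. It covers only the bit values mentioned in this article (2:3 and 14:2); the real word has more fields, which are documented in the intrinsics manual. The helper names get_field, set_field, and describe_foptions are hypothetical.

```python
def get_field(word, start, count):
    """Read field start:count of a 16-bit word (bit 0 is the leftmost bit)."""
    return (word >> (16 - start - count)) & ((1 << count) - 1)

def set_field(word, start, count, value):
    """Return 'word' with field start:count replaced by 'value'."""
    shift = 16 - start - count
    mask = ((1 << count) - 1) << shift
    return (word & ~mask & 0xFFFF) | ((value << shift) & mask)

# Only the two file types named in the article are listed here.
FILE_TYPES = {0b000: "standard file", 0b001: "CM KSAM file"}

def describe_foptions(foptions):
    domain = get_field(foptions, 14, 2)
    ftype = get_field(foptions, 2, 3)
    action = "create a new file" if domain == 0b00 else "open an existing file"
    return action, FILE_TYPES.get(ftype, "another file type")

fopt = 0
fopt = set_field(fopt, 2, 3, 0b001)     # bits 2:3 = 001, a CM KSAM file
fopt = set_field(fopt, 14, 2, 0b00)     # bits 14:2 = 00, a new file

print(format(fopt, "016b"))             # 0000100000000000
print(describe_foptions(fopt))          # ('create a new file', 'CM KSAM file')
print(describe_foptions(0b01))          # ('open an existing file', 'standard file')
```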

Returning to Table 1, the third parameter that is passed to FOPEN is another 16-bit word called the AOPTIONS word. Once again, this is a binary array in which each bit specifies something about how the file is to be accessed. Table 3 contains some of the values found in the AOPTIONS array.

Bits 12:4 determine whether the file is to be opened for READ access (0000) or WRITE access (0001). There are other combinations that are used to support direct access with FREADDIR and FWRITEDIR (0100) or FUPDATE (0101). Bits 8:2 determine whether and how the file can be shared among other processes on the system.
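
The AOPTIONS word can be picked apart the same way. The Python sketch below decodes only the access-type values listed above and returns the sharing bits (8:2) as a raw number, because this article does not spell out what each sharing value means. The names ACCESS_TYPES and describe_aoptions are invented for the example.

```python
# Access types from bits 12:4, limited to the values named in the article.
ACCESS_TYPES = {
    0b0000: "READ",
    0b0001: "WRITE",
    0b0100: "direct access (FREADDIR/FWRITEDIR)",
    0b0101: "FUPDATE",
}

def describe_aoptions(aoptions):
    """Return (access type, raw sharing bits) for a 16-bit AOPTIONS word."""
    access = aoptions & 0b1111          # bits 12:4 are the rightmost 4 bits
    sharing = (aoptions >> 6) & 0b11    # bits 8:2 have 6 bits to their right
    return ACCESS_TYPES.get(access, "other access type"), sharing

print(describe_aoptions(0b0000_0000_0000_0001))
# ('WRITE', 0)
print(describe_aoptions(0b0000_0000_0100_0100))
# ('direct access (FREADDIR/FWRITEDIR)', 1)
```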

FOPEN and HPFOPEN

It's worth noting that the intrinsics have evolved over time as the HP 3000 has evolved. For example, there are two different MPE/iX intrinsics that open files: FOPEN and HPFOPEN.

The FOPEN intrinsic dates back to the original models of the HP 3000--the 16-bit so-called "classic" systems. When HP introduced the newer 32-bit PA-RISC systems, support for FOPEN continued as part of the strategy to maintain compatibility with the older models. FOPEN is a 16-bit compatibility mode routine. As such, it is typically used by 16-bit compatibility mode application programs that were ported from the classic environment.

HPFOPEN is a part of the PA-RISC version of MPE. It does not appear on the older classic systems. The PA-RISC version of MPE was originally called MPE XL, and later renamed MPE/iX. MPE/iX includes both intrinsics: HPFOPEN and FOPEN.

Both intrinsics fundamentally serve the same purpose: They open files. But the functionality provided by HPFOPEN is a superset of the functionality supported by the older FOPEN intrinsic. The compatibility mode FOPEN intrinsic is basically the same functionality that was available on the classic 16-bit models. To use many of the new features of the file system that have been implemented on MPE/iX, you must use the native mode HPFOPEN intrinsic.

We've seen that when you compile an ANSI standard COBOL program, the compiler will generate intrinsic calls for you. If you compile a program on an old 16-bit classic system, the compiler will only generate calls to 16-bit compatibility mode intrinsics such as FOPEN. On newer PA-RISC models of the HP 3000, the situation is more complex. For one thing, depending on the language you are using, you may have your choice of at least two different compilers.

MPE/iX supports compatibility mode compilers such as the COBOLII compiler. These compilers generate 16-bit machine code suitable for execution either on classic HP 3000s or on PA-RISC models. MPE/iX also supports native mode compilers such as COB85XL. These compilers generate 32-bit machine code suitable for execution only on the PA-RISC machines.

If you compile a program that opens files, the compatibility mode COBOL compiler will generate calls to FOPEN, but the native mode compiler will generate calls to HPFOPEN. Virtually all HP 3000 applications use intrinsics. Even if a program doesn't call an intrinsic explicitly, it's a pretty good bet that it will call a number of them implicitly. For that reason, it's a lot easier to troubleshoot applications if you have a working knowledge of the MPE/iX intrinsics.

Table 4 contains a summary of the most commonly used file system intrinsics on the HP 3000. We've seen how FOPEN and HPFOPEN are used to open files. Next we're going to explore some of the other intrinsics.

Files and Databases

When designing an application for the HP 3000, you must decide whether to store the application's data in files or in databases. These days, most commercial applications use databases to store critical user data. The advantages of databases are well known, and we'll be discussing them in future articles in this series, when we explore HP 3000 databases (particularly IMAGE/SQL) in detail. But for the present, we're going to focus on what can be done with ordinary files. In spite of the superior recoverability, security, and versatility offered by databases, ordinary files still have their place and are still used by many HP 3000 applications.

We've seen how FOPEN and HPFOPEN can be used to open files for access. The FCLOSE intrinsic is used to close files when an application has finished accessing them. There's only one FCLOSE intrinsic; it is used regardless of whether the files were opened with FOPEN or with HPFOPEN. The actual reading and writing of files is handled with intrinsics called FREAD, FWRITE, FREADDIR, and FWRITEDIR. Next we will see when each of these is used.

The most common way to access files on the HP 3000 is sequentially. To access a file sequentially, open the file for input access using either FOPEN or HPFOPEN. Then READ the file, one record at a time, using repeated calls to FREAD. The first read operation retrieves the first record from the file. Subsequent read operations retrieve the second record, the third, the fourth, and so on until the end of file is reached. At that point, another call to FREAD will return an "end of file" condition. This is a signal to the application program that all the records in the file have been accessed and the file should now be closed.

Sequential access to an output file works in much the same way, but with one important difference. Opening a file for sequential output access effectively erases any data the file contains. After opening the file for sequential output access, a program's first call to FWRITE creates the first (and at that point, the only) record in the file. Subsequent calls to FWRITE will append additional records after the first one. When the file is closed, the file will contain the records that were placed there by the calls to FWRITE, in the order in which they were written.
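
The pattern of sequential access can be illustrated with ordinary Python file operations standing in for FOPEN, FREAD, FWRITE, and FCLOSE. This is an analogy, not a call to the MPE intrinsics. The file name DATA01, the scratch directory, and the helper names write_records and read_records are assumptions made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "DATA01")

def write_records(path, records):
    # Opening for output (mode "w") discards whatever the file held before.
    out = open(path, "w")
    for record in records:
        out.write(record + "\n")    # each write appends after the last one
    out.close()

def read_records(path):
    result = []
    inp = open(path, "r")
    while True:
        line = inp.readline()       # one record per read, in order
        if line == "":              # empty result signals end of file
            break
        result.append(line.rstrip("\n"))
    inp.close()
    return result

write_records(path, ["first", "second", "third"])
write_records(path, ["new first", "new second"])    # erases the old records

print(read_records(path))    # ['new first', 'new second']
```

The second call to write_records shows the important difference mentioned above: the three original records are gone, and only the records written after the file was reopened for output remain.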

Sequentially accessed files are widely used on the HP 3000. They are most often found in batch environments and in large sorts.

Flat Files: Direct Access

Ordinary MPE files also provide you with another useful capability: direct access. The intrinsics FREADDIR and FWRITEDIR can be used to access the records in a file directly, using a relative record number. The best way to explain the power of direct access is with an example.

Imagine a large table of 10,000 rows. Suppose that the whole table is stored in a file on the HP 3000 so that each row is represented by one record of the file. The file could be accessed sequentially as we've seen earlier. For batch applications, sequential access would be appropriate, because batch applications typically act on all the rows of the table. But what about online applications? Users of online applications usually need to select one or more rows from the table and then act on them. Suppose a user wants to access the 9,999th row of a table. Sequential access means that in order to access the 9,999th record in the table, you'd have to read the 9,998 entries that precede it. From a performance perspective alone, this is totally unacceptable.

But with direct access, the application program can simply specify the number of the row (record) that it's interested in. For direct-read access, the FREADDIR intrinsic will retrieve the contents of a specified record. Similarly, for direct-write access, the FWRITEDIR intrinsic will update the contents of the specified record (without affecting other records in the file).
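
Direct access is easy to picture when every record is the same length: the record number multiplied by the record size gives the record's position in the file. Here is a Python sketch of that idea. The 16-byte record size, the helper names read_direct and write_direct, and the decision to number records from 0 are all assumptions made for this example; the sketch does not call FREADDIR or FWRITEDIR.

```python
import os
import tempfile

RECORD_SIZE = 16    # bytes per record, an arbitrary size for this sketch

def pad(text):
    """Pad (or cut) text to exactly one fixed-length record."""
    return text.encode("ascii").ljust(RECORD_SIZE)[:RECORD_SIZE]

def read_direct(f, recnum):
    """Read one record by relative record number (first record is 0)."""
    f.seek(recnum * RECORD_SIZE)
    data = f.read(RECORD_SIZE)
    if len(data) < RECORD_SIZE:
        raise EOFError("no such record")
    return data.decode("ascii").rstrip()

def write_direct(f, recnum, text):
    """Overwrite one record in place without touching its neighbors."""
    f.seek(recnum * RECORD_SIZE)
    f.write(pad(text))

# Build a 10,000-row table, one row per record.
path = os.path.join(tempfile.mkdtemp(), "TABLE")
with open(path, "wb") as f:
    for i in range(10000):
        f.write(pad("row %d" % i))

with open(path, "r+b") as f:
    before = read_direct(f, 9998)       # the 9,999th row, read in one step
    write_direct(f, 9998, "updated")
    after = read_direct(f, 9998)
    neighbors = (read_direct(f, 9997), read_direct(f, 9999))

print(before, "|", after, "|", neighbors)
# row 9998 | updated | ('row 9997', 'row 9999')
```

No matter how large the table grows, reaching a given row takes a single positioning step instead of thousands of reads.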

Direct access provides a very fast means of accessing data directly, although there is one very important (and fairly obvious) limitation. Records must be accessed by their record number. In other words, if you want to access the 975th record in a file, you have to know that the one you want is the 975th one in the file. You cannot tell FREADDIR to find the record containing the name "John Smith." Direct access does not provide you with any kind of key beyond the record number. Keyed access is provided by using another kind of file called a KSAM file, or by using a database.

KSAM Files

KSAM is an acronym that stands for "Keyed Sequential Access Method." The original HP 3000 implementation of KSAM is similar in many respects to the keyed access methods found on UNIX systems and on older IBM mainframes (ISAM and VSAM). KSAM files can be accessed sequentially, just like ordinary files. But they can also be read or written using keyed access.

Keyed access allows an application to select a particular record from a file and read it directly. Unlike direct access, which required that the application program select the desired record by a relative record number, KSAM files allow you to use a key value. For example, instead of selecting the 9,999th record, you'd be able to select the "John Smith" record without having to know the number of the record that contains John's data.

KSAM was originally implemented on MPE/V systems. This implementation of KSAM is also supported on MPE/iX systems--where it is known as compatibility mode KSAM, or CM KSAM. A CM KSAM file is actually two files: a key file and a data file. The data file contains the data. The key file contains key values that can be used to access records in the data file. CM KSAM files are created using a utility program called KSAMUTIL. This utility program is also used to synchronize the key and data files, which can become corrupted by system aborts.
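
The relationship between the key file and the data file can be modeled in a few lines of Python. This is a deliberately simplified picture, not the real CM KSAM file layout: the class name KeyedFile is hypothetical, the "files" are kept in memory, and each key value is assumed to be unique.

```python
class KeyedFile:
    """Toy model of a keyed file: a data file plus a key file."""

    def __init__(self, key_of):
        self.data = []          # "data file": records in the order written
        self.keys = {}          # "key file": key value -> record number
        self.key_of = key_of    # function that picks the key out of a record

    def write(self, record):
        # Remember where the record lands, then add it to the data file.
        self.keys[self.key_of(record)] = len(self.data)
        self.data.append(record)

    def read_by_key(self, key):
        # Look up the record number in the key file, then fetch the record.
        return self.data[self.keys[key]]

employees = KeyedFile(key_of=lambda rec: rec["name"])
employees.write({"name": "Mary Jones", "dept": "Sales"})
employees.write({"name": "John Smith", "dept": "Support"})

print(employees.keys["John Smith"])          # 1
print(employees.read_by_key("John Smith"))
# {'name': 'John Smith', 'dept': 'Support'}
```

The model also hints at why the two files need to stay synchronized: if the key file points at the wrong record number, a keyed read returns the wrong data.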

In the early 1990s, HP brought a native mode version of KSAM (called NM KSAM) to MPE/iX. The native mode version boasts better recoverability than the older CM version. Currently, both versions of KSAM are supported on MPE/iX.

We've seen three different ways to access files in this article: sequential, direct, and keyed. Next month we're going to move beyond files, and begin to explore HP 3000 databases.

George Stachnik works in technical training in HP's Network Server Division.