Category: IBM Mainframe Assembler

Videos and word documents that cover how to program in IBM mainframe assembler language.

Small is Beautiful

The first computer I owned was an Apple II Plus.  It had a 6502 chip with an accumulator register and a couple of index registers.  From a programmer’s point of view, most of the work got done in the accumulator and half the machine cycles were spent doing loads and stores (load a value, add, store, …).  Essentially, it was a one register machine.  So when I started programming IBM mainframes, I was delighted to see that each machine came equipped with 16 general purpose registers.  Why so many?  Who needed so many registers?

As is often the case, I was wrong:

Register 0 – used occasionally to pass a parm

Register 1 – used for parameter passing

Register 2 – Available, but sometimes corrupted by TRT

Register 12 (And perhaps a few others) – Your typical base register

Register 13 – Points at a save area

Register 14 – Points to the return address

Register 15 – The target address for program calls and the return code register

So … what does that leave us?  Registers 2 through 11, or 3 through 11 if TRT is in the picture.

Dive into many production programs and you’ll find that all the registers were allocated years ago.  Good luck finding a free register.  Many programs are so code-bloated that simply adding another variable will cause thousands of addressability errors.  What’s a programmer to do?

There are two strategies:

1)  Store the values in one or more registers.  Do the work you need to do.  Restore the registers to their previous values.  Ugly, but effective.

2)  Start a new program, pass it the data it needs, return the values you have computed.  The nice thing about this approach is that you start from a clean slate.  All the registers are yours to use as you see fit.

So … do we have enough registers or not?  It would be nice to have more, but I think 16 general purpose registers are sufficient.  Sometimes, constraints are a good thing – think Haiku.  In writing assembler programs, we need to think smaller and more modular.  Many maintenance assembler programmers use strategy 1 because it’s easier and they can turn the work out faster.  But in many cases, particularly for programs that have maxed out the storage covered by existing base registers, the second strategy may be the best choice. 

If you are writing something new, one base register should be sufficient.  If you find you need a second base register, it’s probably time to refactor.  Small is beautiful.

Location, Location, Location

“Location, Location, Location”.  That old real estate mantra has a significant application to certain assembler programs.  If you’re writing programs that are I/O-intensive (programs that read hundreds of thousands of records, particularly those with large record sizes), then this old chestnut can help you cut your running times dramatically.  If you are fortunate, you may slash your running time in half, simply by making some small changes in your code.

   For years I never gave much thought to the difference between Move Mode I/O and Locate Mode I/O.  (If you’re not familiar with these two techniques you can check out an article that explains it here.)  Oh, I knew how to code both, but Move Mode was easier to teach, and delivered the goods right where you wanted them.  I knew that Move Mode records were being moved from a system buffer to my own storage, but I figured:  What’s an extra move or two among friends?  But I was wrong.  Working with companies that routinely process millions of records taught me otherwise.

   In Locate Mode, you can leave an input record in a system buffer and work with it directly.  On output, you locate an empty buffer and build your record there.  This simple technique works wonders if you are processing hundreds of thousands of large records.  The cool part is that it doesn’t take much refactoring of your code to switch over.  So if you are faced with processing hundreds of thousands of records, just remember:  Location, Location, Location.
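To get a feel for why the extra move matters, here is a rough analogy in Python rather than real QSAM code.  The “system buffer”, the record length, and the function names get_move and get_locate are all made up for illustration.  The only point is that one style copies every record, while the other hands back a window onto bytes that are already sitting in the buffer.

```python
RECLEN = 100  # assumed fixed record length

# Hypothetical "system buffer" holding three fixed-length records.
system_buffer = b"".join(bytes([n]) * RECLEN for n in (1, 2, 3))

def get_move(buffer: bytes, n: int, work_area: bytearray) -> None:
    """Move-mode style: copy record n out of the buffer into our own storage."""
    work_area[:] = buffer[n * RECLEN:(n + 1) * RECLEN]

def get_locate(buffer: bytes, n: int) -> memoryview:
    """Locate-mode style: return a view of record n; no record bytes are copied."""
    return memoryview(buffer)[n * RECLEN:(n + 1) * RECLEN]

work_area = bytearray(RECLEN)
get_move(system_buffer, 1, work_area)      # RECLEN bytes copied into work_area
record = get_locate(system_buffer, 1)      # just a position and a length

print(bytes(record) == bytes(work_area))   # both styles see the same data
print(record.obj is system_buffer)         # the view still points into the buffer
```

With three records the difference is invisible, but the copy in get_move is paid once per record, so its cost grows with both the record count and the record size.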

Small Consolation

You can write quite a bit of source code in assembler without generating much object code. I’m always surprised at how small the object modules are after all my hard work. While assembler programmers work at a very fine-grained level, with most of our thoughts broken down into many tiny pieces, the instructions we use are quite powerful. A good chunk of each program consists of data areas – no real code there. And then each hard-won instruction only generates two, four or six bytes. The small object modules we create are even more surprising when you realize that much of the code is produced by IBM macros like OPEN, CLOSE, GET and PUT.

Try this experiment: Add the directive PRINT ON,NODATA,GEN at the top of your CSECT. Re-assemble your module and look at the program listing. If your code is typical, you’ll see lots of new statements that you didn’t see before. These lines are prefixed with a “+” sign and make up the code of your macro statements. It’s likely that a healthy percentage of lines in your program were macro-generated. If you coded very many macros, your own work may seem to disappear into the background.

So … write away without hesitation, knowing there’s plenty of room for all your ideas.

Code Archeology

It’s easy to spot the thinking of long-forgotten legacy assembler programmers in the production code that runs on today’s systems.  Working under the constraints of limited memory, these programmers developed techniques for making their code as small and as efficient as possible.  While we’re not as memory constrained as these early programmers, you’ll still find their old techniques being practiced in today’s shops.  When I code these old tidbits, I like to think of it as “tipping my cap” to programmers who came before me.  Here are some examples:

Decrementing a register:

BCTR   R5,R0

This is a simple way to subtract 1 from register five.  Coding 0 for the second operand causes the flow of control to continue with the next instruction.  We could just as easily have coded this:

S      R5,=F'1'

So, why was the first technique originally preferred?  BCTR is a register-to-register (RR) instruction and will run faster than the register-to-indexed storage (RX) version.  Also the second version requires the assembler to build a fullword in memory.  Today, we might not give a second thought to defining a field, but in 1964, every byte was precious.
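The size difference is easy to see if you assemble both versions by hand.  The sketch below is Python, not assembler, and it assumes base register 12 and a made-up displacement of X’100’ for the literal; the helper names encode_rr and encode_rx are mine.  The opcodes (06 for BCTR, 5B for S) and the RR and RX layouts are the standard ones.

```python
def encode_rr(opcode: int, r1: int, r2: int) -> bytes:
    """RR format: 8-bit opcode followed by two 4-bit register numbers (2 bytes)."""
    return bytes([opcode, (r1 << 4) | r2])

def encode_rx(opcode: int, r1: int, x2: int, b2: int, d2: int) -> bytes:
    """RX format: opcode, R1, X2, B2 and a 12-bit displacement (4 bytes)."""
    if not 0 <= d2 <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits")
    return bytes([opcode, (r1 << 4) | x2, (b2 << 4) | (d2 >> 8), d2 & 0xFF])

BCTR, S = 0x06, 0x5B

bctr = encode_rr(BCTR, 5, 0)           # BCTR R5,R0
sub = encode_rx(S, 5, 0, 12, 0x100)    # S R5,=F'1' (literal assumed at X'100' off R12)
literal = (1).to_bytes(4, "big")       # the fullword the assembler has to build

print(bctr.hex(), len(bctr), "bytes")
print(sub.hex(), "+ literal", literal.hex(), len(sub) + len(literal), "bytes")
```

Two bytes against eight: a small saving today, but a real one when every byte was precious.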

Moving blanks to a field:

This technique has existed since the Ancient of Days.

Assume that field X is defined as below:

X     DS    CL80

The following code will move blanks to X :

MVI      X,C' '

MVC      X+1(L'X-1),X

The MVI fills the first byte with a blank, and the MVC propagates blanks, one byte at a time, from left to right in field X.  The length attribute L’X gets the assembler to plug in the correct length.  Today we might code this without thinking:

BLANKS   DC   CL80' '

MVC    X,BLANKS

This technique requires an 80 byte field that the first method does not.   But, what’s a few bytes among friends?
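If the overlapping MVC trick looks like magic, it may help to watch it in slow motion.  Here is a small Python simulation of the MVI/MVC pair, assuming the byte-at-a-time, left-to-right behavior described above.  The mvc function and the storage array are stand-ins of my own, not anything the machine provides; X’40’ is the EBCDIC blank.

```python
def mvc(storage: bytearray, dest: int, src: int, length: int) -> None:
    """Move `length` bytes one at a time, left to right, the way MVC is described above."""
    for i in range(length):
        storage[dest + i] = storage[src + i]

BLANK = 0x40                            # EBCDIC blank
FIELD_LEN = 80                          # X DS CL80

storage = bytearray(b"\xC1" * FIELD_LEN)    # field X, full of leftover data
x = 0                                       # assumed offset of X within storage

storage[x] = BLANK                      # MVI X,C' '
mvc(storage, x + 1, x, FIELD_LEN - 1)   # MVC X+1(L'X-1),X

print(storage.hex())
```

Each byte that gets moved was itself written on the previous step, which is why one blank ripples through the whole field.  A Python slice assignment such as storage[1:80] = storage[0:79] would not do this, because it copies the old contents first.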

Loading a “small” number into a register:

This code will put 100 in register five:

LA     R5,100

What’s going on here?  The assembler is looking for a base register, a displacement, and perhaps an index register from which to form the address that will be loaded into the register.  Explicit addresses look like this – D2(X2,B2).  Here everything is explicitly coded and there are no parentheses, so the only thing the 100 can represent is a displacement.  The base and index registers are taken to be zero, which means “no register”, and therefore they don’t contribute to the effective address.  If you use this technique, you are limited to a maximum value of 4095 – the largest displacement that will fit in three hex digits.  So why not just code this instead?

L        R5,=F'100'

Well … that works fine, but again, our “modern” technique requires the assembler to build a fullword field that isn’t needed with the first method.
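Here is the LA trick as a Python sketch.  The function names and the 16-element register list are my own stand-ins, and I am assuming 31-bit addressing mode.  Register 0 is deliberately loaded with junk to show that coding 0 for a base or index means “no register” rather than “the contents of register 0”.

```python
def effective_address(regs, d2, x2=0, b2=0):
    """D2(X2,B2): displacement plus index plus base; register 0 means 'no register'."""
    if not 0 <= d2 <= 0xFFF:
        raise ValueError("displacement must be 0..4095")
    index = regs[x2] if x2 else 0
    base = regs[b2] if b2 else 0
    return (base + index + d2) & 0x7FFFFFFF   # assuming 31-bit addressing mode

def la(regs, r1, d2, x2=0, b2=0):
    """LA R1,D2(X2,B2): put the computed address itself into R1."""
    regs[r1] = effective_address(regs, d2, x2, b2)

regs = [0xDEADBEEF] + [0] * 15   # junk in R0 to show that it is ignored
la(regs, 5, 100)                 # LA R5,100
print(regs[5])

try:
    la(regs, 5, 4096)            # one past the largest displacement
except ValueError as err:
    print("rejected:", err)
```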

Do you have a favorite legacy technique I’ve missed?  Let me hear from you.

Bit by Bit

Most programmers rarely give a second thought to the bit patterns their compilers generate.  After all, they don’t need to.  Compilers have freed them from the low-level details of ones and zeroes.  If you are a Java programmer, the JVM has converted your box to a virtual Java machine.  It’s a convenient and effective abstraction that allows you to develop wonderful things.

Still … there is a fundamental appeal to writing assembly language, clearing away the virtual cloud, and working with bits and bytes directly, and without a net.  At the assembly level, it’s crucial to pay attention to all the ones and zeroes.  In fact, paying attention to ones and zeroes is the way forward … the way to get a grip on the hundreds of instructions that are available to you as an assembler programmer.

Assembler instructions fall naturally into groups based on the binary pattern of each instruction.  This pattern is called an “Instruction format”.  For example the MVC (Move Character) instruction belongs to a group of instructions called “Storage to Storage”.  Specifically, MVC is Storage to Storage, type one (SS1).  Here’s the format, bit by bit:

Bits 0 – 7    | The operation code

Bits 8 – 15   | The length associated with operand 1

Bits 16 – 19  | The base register for operand 1

Bits 20 – 31  | The displacement for operand 1

Bits 32 – 35  | The base register for operand 2

Bits 36 – 47  | The displacement for operand 2

Suppose you had coded MVC   X,Y  and  you saw the assembled instruction in memory presented in a hex format:   D2 03 C0 04 C0 08 .  The D2 is the operation code for MVC.  The 03 represents the length (really the length – 1) associated with operand 1. C004 is the base/displacement address of X, and C008 is the base/displacement address of Y.  This is a fairly boring array of information, yes?  Perhaps so, until you realize that this is the only information presented to the CPU when it’s time to execute the instruction.

So what information does the CPU know about your instruction at execution time?

  • The operation – MVC
  • The number of bytes associated with X.  (This length may or may not match the actual length of X!)
  • The beginning address of X
  • The beginning address of Y

And what exactly doesn’t the CPU understand about your instruction?

  • The ending address of X
  • The ending address of Y
  • The type of information stored in X or Y
  • Whether X and Y will overlap if the operation is repeated for the specified length
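To make the SS1 layout concrete, here is a short Python sketch that pulls the fields out of the six bytes D2 03 C0 04 C0 08 from the MVC example.  The SS1 tuple and the decode_ss1 function are names I made up for illustration; the bit positions are the ones listed in the format above.

```python
from typing import NamedTuple

class SS1(NamedTuple):
    opcode: int
    length: int   # true length, i.e. the encoded length byte plus 1
    b1: int
    d1: int
    b2: int
    d2: int

def decode_ss1(code: bytes) -> SS1:
    """Split a 6-byte SS1 instruction into its fields."""
    if len(code) != 6:
        raise ValueError("an SS1 instruction is exactly 6 bytes")
    return SS1(
        opcode=code[0],                         # bits 0-7
        length=code[1] + 1,                     # bits 8-15 hold length - 1
        b1=code[2] >> 4,                        # bits 16-19
        d1=((code[2] & 0x0F) << 8) | code[3],   # bits 20-31
        b2=code[4] >> 4,                        # bits 32-35
        d2=((code[4] & 0x0F) << 8) | code[5],   # bits 36-47
    )

mvc = decode_ss1(bytes.fromhex("D203C004C008"))
print(mvc)
```

Notice what is not in the tuple: nothing about where X or Y ends, and nothing about what kind of data they hold.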

What’s really interesting and helpful is that lots of instructions fall into the SS1 instruction format.  The facts you learned above apply to all the instructions in this group.  Knowing an instruction is SS1 is half the battle to learning how the instruction works.

So here’s my plan.  Early on, learn the following instruction formats:  SS1, SS2, SI, RS, RX, and RR.  There are other types, but these six types will carry you a long way into this journey.

I’ve developed a software product called VisibleZ that will also help you on this journey.  VisibleZ was written in Java, and is an object code emulator for IBM mainframes that will help you visualize instructions as you single step through object code programs.  It will also help you learn each instruction type.  You can download the product from the product homepage:   http://csc.columbusstate.edu/woolbright/visiblez.xml .

You will also find a series of lessons that will help you get started.  Start with the lesson called “Reading Objectcode” and you’ll soon be an expert on the six instruction formats mentioned above.

The Way Forward

   You’ve taken the plunge.  You’re familiar with the organization of memory, the PSW contents, the purpose of Registers, and the fetch/decode/execute cycle.  You understand the idea behind base/displacement addresses as a way to find an item in memory.  Now What?  What is the best way forward?

   The best strategy is to tackle instructions in groups, organized by data type, and the simplest group of instructions contains those that work on character data.   It’s harder to cause pesky abends when you are just moving character data from here to there, or comparing a couple of fields and branching based on the condition code.  As instructions go, character data instructions are also pretty straightforward – a great place to start.  So here’s the plan:  

1)  Character Data – Learn to define character fields and data here.

2)  Character Instructions – Read about MVC (Move Character), CLC (Compare Logical Character), MVI (Move Immediate) and BC (Branch on Condition).

3)  Jumping in the Water – Listen to the beginning assembler videos that will show you how to build and submit a simple program.

4)  Packed Decimal Data and Instructions – When you are comfortable working with character data, tackle packed decimal data and the instructions that support it (PACK, UNPK, ED, ZAP, AP, SP, DP, MP, SRP, CP).

5)  Binary Data and Instructions – Move on to binary data and learn to program instructions that work in the registers.  (Start with L, LA, ST, A, S, M, D, C, AR, SR, MR, DR, LR, CR, SRDA.)

6)  Name and Conquer –   Dummy sections (DSECTs) provide an assembler programmer the ability to use symbolic names to address any storage area.  This turns out to be a powerful idea – name and conquer!  Read about DSECTs and how they are commonly used.

7)  Divide and Conquer – For large projects, programmers divide the problem at hand into multiple programs that are easier to code and understand.   In order to make our programs work together, it’s important to learn the established linkage conventions for calling other programs, and for passing data back and forth.

8)  Go Exotic –  There is a collection of instructions that are invaluable when needed, but aren’t as commonly coded as the instructions mentioned above.  These instructions include TR, TRT, TM, MVCL, CLCL, and EX.  Take them one at a time and find out how each one is valuable.

9)  Keep It Up – There’s no substitute for writing code.  In a future post, I’ll highlight a series of assignments that will make you code all the instructions described above.  By the time you’ve coded the last program, you’ll be able to call yourself an assembler programmer. 

Starting Assembler

Here’s a daunting thought for anyone starting to learn IBM assembly language:  The latest System/z machine executes over 500 machine instructions.  Where on earth should you start if your goal is to become proficient?   As in many subjects that are complicated (think mathematics, physics, chemistry), taking a historical approach to their study is often helpful, and while there is no royal road to assembly language, going back to the roots of any subject can be illuminating.  

The roots of IBM System/z assembly language go back to the System/360 in 1964.  This historic machine was equipped with an instruction set that was much smaller than that of current models.  Ignoring privileged instructions (those that require special authority) and floating point operations altogether (business arithmetic occurs in packed decimal), the instruction set that is left over numbers around 100.  Another happy fact of learning assembly language is that by mastering a single instruction, you will learn several others by association.  We are left with a manageable set of fifty or sixty instructions that make up a working subset – a subset that can make you marketable.

   So where do we begin?  Here’s a plan that will ease you into assembly language:

1)  The Lay of the Land – Start with a general orientation to the machine and the components you will interact with as a programmer here.

2)  Location, Location, Location  – Every variable you create in assembly language is converted to a Base/Displacement address.  In a high-level language, we don’t usually worry about the exact location of variables, but in assembly language it’s critical.  Read about base/displacement addressing here and give it your full attention.  Alternatively, you can listen to the beginning assembler video tutorials on my website.

3)  Ground Zero – Every displacement is measured from a specific point in memory called the “base address”.  This address is loaded into a register – the “base register”.  The combination of the base register and the displacement is used to identify where a variable lives in memory.  The instruction which establishes the base address and loads the base register is called BASR (BALR is an older similar instruction).  Look up IBM’s manual called Principles of Operation on the web and read about BASR.  While you are there, check out MVC, CLC, and BC.

4)  See It Happen – Download the VisibleZ software from this site.  VisibleZ is a Java-based object code interpreter that is designed to help you visualize assembly programs and the effects they have on system components.  After reading the article on instruction formats here, load the mvc.obj program that is distributed in the Codes directory.  Single step through each instruction.  Pay close attention to the base/displacement addresses in each instruction.   Do they make sense?  The color coding in the product should help you figure it all out.
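Before you single step through real object code, it may help to see the conversion from a symbol to a base/displacement address written out.  The Python sketch below is only an illustration: the resolve and as_hex functions, the choice of register 12, and the offsets for X and Y are all hypothetical.  It assumes the base address sits at offset 2 in the program and that one base register covers displacements 0 through 4095.

```python
MAX_DISPLACEMENT = 0xFFF   # 4095, the largest value that fits in 12 bits

def resolve(symbol_offset: int, base_offset: int, base_reg: int) -> tuple[int, int]:
    """Turn a symbol's location into (base register, displacement)."""
    disp = symbol_offset - base_offset
    if not 0 <= disp <= MAX_DISPLACEMENT:
        raise ValueError("addressability error: symbol is not covered by the base register")
    return base_reg, disp

def as_hex(base_reg: int, disp: int) -> str:
    """Show the address the way it appears in object code: one hex digit plus three."""
    return f"{base_reg:X}{disp:03X}"

BASE_OFFSET = 2    # assumed: the base address established at offset 2
X, Y = 6, 10       # hypothetical locations of two fields

print(as_hex(*resolve(X, BASE_OFFSET, 12)))
print(as_hex(*resolve(Y, BASE_OFFSET, 12)))

try:
    resolve(5000, BASE_OFFSET, 12)   # too far from the base address
except ValueError as err:
    print(err)
```

The last call fails for the same reason an oversized program produces addressability errors: the field lies more than 4095 bytes past the base address.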

  At this point, you’re on your way.  It’s a long journey, but an interesting one, particularly if you like to program, you would rather do things yourself, in your own way, and you like laying code down directly on top of the metal.    

 

The Difference Between Load and Load Address

Among other things, you need a sense of humor to be an assembler programmer, as the language presents you an unlimited number of opportunities for shooting yourself in the foot.  (Programmers think a lot about shooting themselves in the foot because it happens so often.)  Here are three ways to do just that in languages I’ve been using lately:

Assembly
You try to shoot yourself in the foot only to discover you must first reinvent the gun, the bullet, and your foot.  After that’s done, you pull the trigger, the gun beeps several times, then crashes.

370 JCL
You send your foot down to MIS with a 4000-page document explaining how you want it to be shot.  Three years later, your foot comes back deep-fried.

Java
After importing java.awt.right.foot.* and java.awt.gun.right.hand.*, and writing the classes and methods of those classes needed, you’ve forgotten what the hell you’re doing.

It seems that shooting our feet (it’s never just the one foot, is it?) is inevitable, and it’s easy enough to do with L or LA.  So, what exactly is the difference between Load (L) and Load Address (LA)?  Both instructions compute the address of operand 2.  LA simply puts that address in the operand 1 register.  L, on the other hand, retrieves the fullword at the computed address from memory and puts that fullword in the operand 1 register.  But you probably already knew that.
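For anyone who didn’t, here is the distinction as a toy Python model.  The 64-byte memory, the fullword planted at address 16, and the function names are all invented for the example; the only thing taken from the real machine is that both instructions compute the same operand 2 address and then do different things with it.

```python
memory = bytearray(64)                      # a tiny, hypothetical address space
memory[16:20] = (100).to_bytes(4, "big")    # the fullword at address 16 holds 100
regs = [0] * 16

def operand_address(d2, x2=0, b2=0):
    """Address of operand 2; a register number of 0 contributes nothing."""
    index = regs[x2] if x2 else 0
    base = regs[b2] if b2 else 0
    return base + index + d2

def load_address(r1, d2, x2=0, b2=0):
    """LA: the address itself goes into the register."""
    regs[r1] = operand_address(d2, x2, b2)

def load(r1, d2, x2=0, b2=0):
    """L: the fullword found at the address goes into the register."""
    addr = operand_address(d2, x2, b2)
    regs[r1] = int.from_bytes(memory[addr:addr + 4], "big")

load_address(5, 16)   # LA R5,16      R5 now holds an address
load(6, 16)           # L  R6,16      R6 now holds the data at that address
load(7, 0, b2=5)      # L  R7,0(,R5)  follow the address in R5 to the same data

print(regs[5], regs[6], regs[7])
```

Same operand, two very different register contents.  Code the wrong one and the program will happily carry on with an address where it expected data, or the other way around.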

There is another answer to this question that occurred to me in graduate school some thirty years ago.  I was struggling to fix the code for a multi-user mainframe operating system I was building.  It came to me in the form of a joke:

Question: What is the difference between Load and Load Address?
Answer: About a week’s worth of debugging.

So now, whenever I code L or LA, I pause and give it an extra thought.  Over the years it’s a habit that has saved me many weeks of debugging.  It can help you, too.  In the heat of the battle, it’s tempting to just let the code fly and test later… after all, it’s all good, eh Joe?  But take my advice. Whenever you code L or LA, take a sip of your favorite beverage, relax, and give it one extra thought.  Is it L or LA?  The answer is more important than you think.

Welcome

Welcome to my site! I’ve been coding IBM mainframe assembly language since the late 1970’s. Over the years I’ve taught thousands of programmers, both in a university setting and as a corporate trainer for Fortune 500 companies, how to write assembler programs. Perhaps I can help you, too.
What you will find here:

  • A wealth of written material covering System/z assembler concepts
  • A freeware object code interpreter called VisibleZ that I developed as a tool for teaching assembler language
  • A video course (under development) that will teach you how to code in assembler
  • A weekly blog about all things Assembler

Dr. David E. Woolbright