Chapter 9 - Strings and Arrays

By definition an array is simply a list of homogeneous data items. What does this mean?

A character string is a special kind of array, i.e. an array of characters.

In general, an array of size N looks like the following:

P#1: Duplicate the following in assembly language:

     int arry[25];

Indexing into arrays

A subscript is an offset that is added to the base address of an array. The EBX, EDI, ESI, and EBP registers are used to index into an array (BX, DI, SI, and BP in 16-bit code). Most of the time, however, the EBP register is reserved as an auxiliary stack pointer.
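
Because the offset is counted in bytes, a subscript has to be multiplied by the element size before it is added to the base address. The following C sketch illustrates that arithmetic only; element_offset and element_address are hypothetical helper names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` when each element is `elem_size` bytes.
   This scaled value, not the bare subscript, is what an index register holds. */
size_t element_offset(size_t index, size_t elem_size) {
    return index * elem_size;
}

/* Address of element `index`, formed as base address + byte offset. */
const uint8_t *element_address(const void *base, size_t index, size_t elem_size) {
    return (const uint8_t *)base + element_offset(index, elem_size);
}
```

For a word array, element_offset(2, 2) is 4, so the third word lives 4 bytes past the base address. For a byte array the subscript and the offset are the same number.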

P#2: Consider the following data segment definition. Write the assembly code to place the value 6 in the 3rd element of the table.

.data
  table SBYTE 5 dup(0)

P#3: Simulate the following C code in Assembly

int i,arry[5];

for(i = 0;i <= 4;i++)
  arry[i]=i;

P#4: Write an assembly language subprocedure Large that returns the largest value in the array nums (a signed array). The number of elements in the array can be found in the first element of the array. Assume the array was defined as:

nums SWORD 25 dup(?)

String Instructions

String instructions are really simplified array instructions that operate on bytes and words. We must familiarize ourselves with the following conventions:

(1) Source elements are DS:SI
(2) Destination elements are ES:DI
(3) At the end of the string instruction, the DI and SI are automatically updated by 1 (byte) or 2 (word).
(4) The DF flag is used to determine increment or decrement. If DF=0 then increment; otherwise, decrement.

(5) In Protected Mode, ESI is an offset from DS and EDI is an offset from ES, and ES and DS are set to point to the same segment. In Real Mode, ES and DS do not necessarily point to the same segment.

(i62) [<label>] CLD ; DF <- 0 (increment)

(i63) [<label>] STD ; DF <- 1 (decrement)

The MOVSB instruction replaces the byte at ES:DI with the byte at DS:SI and then updates DI and SI to point to the next byte of the destination and source strings. The general form is:

(i64) [<label>] MOVSB

A similar instruction exists for moving words as follows:

(i65) [<label>] MOVSW
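
The conventions above can be modeled in a few lines of C. This is only a sketch of the behaviour, assuming one flat segment so that DS and ES coincide; the Cpu struct and its field names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of the registers MOVSB and MOVSW touch. */
typedef struct {
    uint8_t *mem;     /* one flat segment, so DS and ES coincide */
    uint16_t si, di;  /* source and destination offsets */
    int df;           /* direction flag: 0 = increment, 1 = decrement */
} Cpu;

/* MOVSB: copy the byte at DS:SI to ES:DI, then step both offsets by 1. */
void movsb(Cpu *c) {
    c->mem[c->di] = c->mem[c->si];
    if (c->df) { c->si -= 1; c->di -= 1; } else { c->si += 1; c->di += 1; }
}

/* MOVSW: copy the word at DS:SI to ES:DI, then step both offsets by 2. */
void movsw(Cpu *c) {
    c->mem[c->di]     = c->mem[c->si];
    c->mem[c->di + 1] = c->mem[c->si + 1];
    if (c->df) { c->si -= 2; c->di -= 2; } else { c->si += 2; c->di += 2; }
}
```

Setting df to 0 plays the role of CLD and setting it to 1 plays the role of STD.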

The two load instructions load the AL-register or the AX-register with a copy of the byte or word addressed by DS:SI. Notice that these two instructions work with SI only, i.e. we are loading from a source. Also, the SI register is automatically updated as with all string instructions, so I will quit mentioning this from now on.

(i66) [<label>] LODSB ; al <- ds:si

(i67) [<label>] LODSW ; ax <- ds:si

The two store instructions store the AL-register or the AX-register in the byte or word addressed by ES:DI. Notice that these two instructions work with DI only, i.e. we are storing to a destination.

(i68) [<label>] STOSB ; al -> es:di

(i69) [<label>] STOSW ; ax -> es:di
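
A rough C model of the byte forms, assuming DF=0 and a flat segment, shows that a load moves only SI and a store moves only DI. The Cpu struct is a hypothetical stand-in for the real registers.

```c
#include <stdint.h>

/* Hypothetical model of the registers LODSB and STOSB touch. */
typedef struct {
    uint8_t *mem;     /* flat segment: DS and ES coincide */
    uint16_t si, di;
    uint8_t al;
} Cpu;

/* LODSB with DF = 0: AL <- [DS:SI], then SI advances; DI is untouched. */
void lodsb(Cpu *c) { c->al = c->mem[c->si++]; }

/* STOSB with DF = 0: [ES:DI] <- AL, then DI advances; SI is untouched. */
void stosb(Cpu *c) { c->mem[c->di++] = c->al; }
```

A LODSB followed by a STOSB copies one byte just as MOVSB does, but leaves the byte in AL where it can be examined or changed in between.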

Compare and scan string instructions are as follows:

(i70) [<label>] CMPSB ; ds:si - es:di

(i71) [<label>] CMPSW ; ds:si - es:di

(i72) [<label>] SCASB ; al - es:di

(i73) [<label>] SCASW ; ax - es:di
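
Neither the compare nor the scan instructions store a result; they perform the subtraction shown in the comment and keep only the flags. The C sketch below models just the zero flag for the byte forms, assuming a flat segment; the Cpu struct and the step helper are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical model of the registers and flags CMPSB and SCASB touch. */
typedef struct {
    const uint8_t *mem;  /* flat segment: DS and ES coincide */
    uint16_t si, di;
    uint8_t al;
    int df;              /* 0 = increment, 1 = decrement */
    int zf;              /* 1 when the last comparison found equal bytes */
} Cpu;

/* Move an offset one byte in the direction selected by DF. */
static void step(uint16_t *reg, int df) {
    if (df) (*reg)--; else (*reg)++;
}

/* CMPSB: compute [DS:SI] - [ES:DI], keep only the flags, step SI and DI. */
void cmpsb(Cpu *c) {
    c->zf = (c->mem[c->si] == c->mem[c->di]);
    step(&c->si, c->df);
    step(&c->di, c->df);
}

/* SCASB: compute AL - [ES:DI], keep only the flags, step DI only. */
void scasb(Cpu *c) {
    c->zf = (c->al == c->mem[c->di]);
    step(&c->di, c->df);
}
```

The real instructions also set the carry, sign, and overflow flags, which this sketch leaves out.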

If you are reading the book you will notice that there are generic forms of these string instructions and I don't even want to discuss this. If you would like to read up on these and use them at any point, feel free. You will not be tested on these instructions.

P#5: Assume two strings st1 and st2 are defined as follows:

     st1  BYTE 25 dup(?)
     st2  BYTE 25 dup(?)

Write a program segment that will place a copy of st1 into st2. The size of each string is stored in the first element of the array.

We can use repeat instructions to aid in the counting process. The repeat prefixes for string instructions are as follows:

(1) REP
(2) REPE/REPZ
(3) REPNE/REPNZ

(i74) [<label>] REP <string-op>

where <string-op> is the mnemonic of one of the string instructions just discussed. In all cases, the CX-register is used as a counter and is decremented after each execution of the string instruction. If CX is 0 on entry, the string instruction is not executed at all.

The pseudo-code for the REP is as follows:

while (CX != 0)
  Execute the string instruction & update di/si as before
  Decrement CX by 1
end while
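
As a concrete case, here is a minimal C model of CLD followed by REP MOVSB, assuming a flat segment. Note that the processor tests CX before each execution of the string instruction, so a count of zero copies nothing. The function name rep_movsb is hypothetical.

```c
#include <stdint.h>

/* Model of REP MOVSB with DF = 0 in a flat segment.
   si, di, and cx are the register values on entry. */
void rep_movsb(uint8_t *mem, uint16_t si, uint16_t di, uint16_t cx) {
    while (cx != 0) {            /* CX is tested before each iteration */
        mem[di++] = mem[si++];   /* MOVSB: copy one byte, advance SI and DI */
        cx--;                    /* REP decrements CX after each MOVSB */
    }
}
```

Calling rep_movsb(mem, 0, 4, 3) copies the three bytes at offsets 0 through 2 to offsets 4 through 6.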

P#6: Repeat P#5 using the REP instruction.

(i75) [<label>] REPE <string-op>
[<label>] REPZ <string-op>

Translates into the following pseudo-code:

while (CX != 0)
  Execute the string instruction & update di/si as before
  Decrement CX by 1
  if (ZF==0) exit the loop
end while

This can be thought of as repeat while equal.
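
A typical use is REPE CMPSB, which walks two strings until the first mismatch or until the count runs out. The C sketch below models that pairing with DF=0 in a flat segment; repe_cmpsb and its parameters are hypothetical names, and the zf argument stands for whatever the zero flag held on entry.

```c
#include <stdint.h>

/* Model of REPE CMPSB with DF = 0.
   *cx is the count on entry and the remaining count on return.
   zf is the zero flag on entry; it is returned unchanged when *cx is 0. */
int repe_cmpsb(const uint8_t *mem, uint16_t si, uint16_t di,
               uint16_t *cx, int zf) {
    while (*cx != 0) {
        zf = (mem[si++] == mem[di++]);  /* CMPSB sets ZF on equal bytes */
        (*cx)--;
        if (!zf) break;                 /* REPE stops at the first mismatch */
    }
    return zf;
}
```

A return value of 1 means every compared byte matched. A return value of 0 means a mismatch was found, and the count left in *cx tells how many bytes were never examined.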

(i76) [<label>] REPNE <string-op>
[<label>] REPNZ <string-op>

Translates into the following pseudo-code:

while (CX != 0)
  Execute the string instruction & update di/si as before
  Decrement CX by 1
  if (ZF==1) exit the loop
end while

Likewise, this can be thought of as repeat while not equal.
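
REPNE is most often paired with a scan. The following C sketch models REPNE SCASB with DF=0 in a flat segment and shows what DI and CX hold when the loop stops; repne_scasb and its parameters are hypothetical names, and the zf argument stands for the zero flag on entry.

```c
#include <stdint.h>

/* Model of REPNE SCASB with DF = 0.
   *di and *cx are the register values on entry and on return.
   zf is the zero flag on entry; it is returned unchanged when *cx is 0. */
int repne_scasb(const uint8_t *mem, uint8_t al,
                uint16_t *di, uint16_t *cx, int zf) {
    while (*cx != 0) {
        zf = (al == mem[*di]);  /* SCASB: AL - [ES:DI] sets ZF on a match */
        (*di)++;                /* DI advances even when the byte matches */
        (*cx)--;
        if (zf) break;          /* REPNE stops once ZF becomes 1 */
    }
    return zf;
}
```

Because DI is updated before the loop exits, a match leaves DI pointing one byte past the matching element.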

P#7: Declare a word array of size 100. Accept a search value from the keyboard and return its index value in the AX-register. If the value is not found, then return -1.

More on Addressing Modes

Until now we have used the following addressing modes:

(1) Register
(2) Direct
(3) Immediate

Let's introduce a few more:

(4) Register Indirect Addressing - the effective address is the 16-bit contents of the BX, DI, or SI register.

Consider the following:

name BYTE 8,'JOHN DOE'

Internally, this looks like the following:

Offset  MEM
0       08
1       4A
2       4F
3       48
4       4E
5       20
6       44 
7       4F
8       45

P#8: Show two different ways of moving the size of the array into the CL-register.

Solution1               Solution2
mov cl,name             lea di,name
                        mov cl,[di]

Q#1: What addressing modes are being used in Solutions 1 & 2?

Note: When the BX, DI, or SI registers are used as an indirect address, the DS contents are assumed to specify the segment portion of the address unless a segment override is given as follows:

mov cx,es:[di]

(5) Indexed Addressing - the effective address of the memory operand is the sum of the displacement and the 16-bit contents of an index register (DI or SI).

Example:

mov di,1
mov al,name[di]

Note1: The offset of name within the data segment is the displacement in the machine language representation.

Note2: Effective Address (EA) = displacement + DI
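
The effective address calculation can be sketched in C as follows. This is an illustrative model only: effective_address and load_byte are hypothetical names, the data segment is represented as a plain byte array, and the sum is wrapped to 16 bits as in 16-bit offset arithmetic.

```c
#include <stdint.h>

/* EA = displacement + register, wrapped to 16 bits. */
uint16_t effective_address(uint16_t displacement, uint16_t reg) {
    return (uint16_t)(displacement + reg);
}

/* Fetch a byte the way an indexed operand such as name[di] is resolved:
   `segment` stands for the data segment, `displacement` for the offset
   of the variable within it, and `di` for the index register. */
uint8_t load_byte(const uint8_t *segment, uint16_t displacement, uint16_t di) {
    return segment[effective_address(displacement, di)];
}
```

If the string above were stored at offset 4 of the segment, load_byte(segment, 4, 1) would return 4Ah, the letter J.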

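(6) Base Addressing - the effective address of the memory operand is the sum of the displacement and the 16-bit contents of a base register (BX or BP), i.e. EA = displacement + (BX or BP). When BP is the base register, the SS contents are assumed to specify the segment portion of the address.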

Consider the following: Arguments to a subprocedure are passed via the stack as follows:

push <arg1>
push <arg2>
call subproc

The stack is as follows:


Return Offset   <- SS:SP
Return Segment
<arg2>
<arg1>

If the first two instructions of the subprocedure were:

push bp
mov bp,sp

The stack would look like:

ss:bp ->  Saved bp        <- ss:sp
          Return Offset
          Return Segment
          <arg2>
          <arg1>

Note: The ss:bp register pair is not modified by further push and pop operations. The bp simply provides a reference point in the stack from which certain values can be accessed.

P#9: Move a copy of <arg2> into the DX-register.

Douglas J. Ryan/ryandj@pacificu.edu