Longest x86 Instruction

Intel x86 has been the world’s most popular CISC instruction set for generations.  The instruction set is huge!  What might take several instructions in the MIPS or PowerPC instruction sets can often be done in a single x86 instruction.  There are almost as many unique x86 instructions as there are apps for your iPad.  Okay, that may be exaggerating a bit, but you get the point.

The question is: what’s the longest possible instruction in the x86 instruction set?  Answer: as far as the encoding goes, you can form a valid x86 instruction out of an arbitrarily large number of bytes!  That’s right, you could fill up an entire 64K ROM image with a single valid instruction.  The original 8086 places no limit on instruction length.  Cool!  Unfortunately, later processors do: the 80286 raises a general protection fault (#GP) when it decodes an instruction longer than 10 bytes, and the 80386 and its successors do the same for instructions longer than 15 bytes.

So what does an arbitrarily long but valid x86 instruction look like?  Kinda boring, actually.  The only way to stretch an instruction indefinitely is to put redundant prefixes in front of the opcode.  Instruction prefixes are bytes prepended to an instruction that change its default operand size, address size, or segment register, or (like lock and rep) modify how it executes.
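To make “redundant prefixes” concrete, here is a minimal Python sketch of how a decoder might peel legacy prefix bytes off the front of an instruction before it reaches the opcode. It assumes 16- or 32-bit code only (no 64-bit REX prefixes), and the names `LEGACY_PREFIXES` and `split_prefixes` are hypothetical, not taken from any real disassembler.

```python
from __future__ import annotations

# Legacy x86 prefix bytes. 0x64-0x67 (fs, gs, operand-size, address-size)
# arrived with the 80386; the 8086 does not treat them as prefixes.
LEGACY_PREFIXES = {
    0xF0: "lock",
    0xF2: "repne",
    0xF3: "rep",
    0x26: "es",
    0x2E: "cs",
    0x36: "ss",
    0x3E: "ds",
    0x64: "fs",
    0x65: "gs",
    0x66: "operand-size",
    0x67: "address-size",
}


def split_prefixes(code: bytes) -> tuple[list[str], bytes]:
    """Peel legacy prefix bytes off the front of an instruction.

    Returns the prefix names in the order they appear (duplicates kept)
    and the remaining bytes, which begin with the opcode. REX prefixes
    and 64-bit mode are deliberately ignored.
    """
    names: list[str] = []
    i = 0
    # Keep consuming prefix bytes until something else shows up.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        names.append(LEGACY_PREFIXES[code[i]])
        i += 1
    return names, code[i:]


names, rest = split_prefixes(bytes([0x66, 0x66, 0x66, 0x89, 0xE5]))
print(names)       # ['operand-size', 'operand-size', 'operand-size']
print(rest.hex())  # 89e5
```

Nothing in that loop bounds the number of prefixes it will accept; any length cap has to be enforced separately, which is exactly what later CPUs do.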

For example, you can take the innocuous looking instruction:

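89 E5         mov %sp,%bp        (16-bit code)

And turn it into a really long instruction.  The first 0x66 (operand-size) prefix widens the operands to 32 bits; every copy after it is redundant:

66 66 66 66 … 66 66 89 E5       mov %esp,%ebp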

Now that’s just evil.
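On an 80386 or later, the 15-byte cap decides how far that padding can actually go. The sketch below computes how many redundant prefixes fit in front of the two-byte `89 E5` before the CPU would raise #GP; the helper names `pad_with_prefix`, `max_redundant_prefixes`, and `exceeds_limit` are hypothetical.

```python
# 80386 and later raise #GP for any instruction longer than 15 bytes.
MAX_INSN_LEN = 15


def pad_with_prefix(insn: bytes, prefix: int, count: int) -> bytes:
    """Return `insn` with `count` copies of a one-byte prefix in front."""
    return bytes([prefix]) * count + insn


def max_redundant_prefixes(insn: bytes, limit: int = MAX_INSN_LEN) -> int:
    """Number of one-byte prefixes that still fit under `limit`."""
    return max(0, limit - len(insn))


def exceeds_limit(insn: bytes, limit: int = MAX_INSN_LEN) -> bool:
    """True if a 386+ would refuse to decode `insn` and raise #GP."""
    return len(insn) > limit


mov = bytes([0x89, 0xE5])
n = max_redundant_prefixes(mov)
print(n)                                                 # 13
print(exceeds_limit(pad_with_prefix(mov, 0x66, n)))      # False
print(exceeds_limit(pad_with_prefix(mov, 0x66, n + 1)))  # True
```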

So what is the longest valid x86 instruction that isn’t boring and evil?  Here is one that is exactly 15 bytes long, using four different prefixes and no redundant ones:

lock add DWORD PTR ds:[esi+ecx*4+0x12345678],0xefcdab89

If you want to see it in ODA, try these bytes:

67 66 f0 3e 81 84 8e 78 56 34 12 89 ab cd ef
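To see where all 15 bytes go, the following Python sketch splits the example into prefixes, opcode, ModRM, SIB, displacement, and immediate. It is hard-wired to this one encoding rather than being a general decoder, and it assumes 32-bit addressing (which the 0x67 prefix selects in 16-bit code); `split_example` and `REG32` are hypothetical names for illustration.

```python
import struct

# 32-bit register names, indexed by the 3-bit register number in ModRM/SIB.
# Valid here only because the 0x67 prefix selects 32-bit addressing.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_example(code: bytes) -> dict:
    """Break the 15-byte example into its encoding fields.

    Hard-wired to this layout: four prefixes, opcode 0x81, a ModRM byte
    that calls for a SIB byte and a 32-bit displacement, then a 32-bit
    immediate. It is not a general x86 decoder.
    """
    modrm, sib = code[5], code[6]
    disp32, imm32 = struct.unpack_from("<II", code, 7)  # little-endian
    return {
        "prefixes": code[:4].hex(" "),
        "opcode": hex(code[4]),
        "mod": modrm >> 6,        # 2 -> 32-bit displacement follows
        "reg": (modrm >> 3) & 7,  # /0 selects ADD for opcode 0x81
        "rm": modrm & 7,          # 4 -> a SIB byte follows
        "scale": 1 << (sib >> 6),
        "index": REG32[(sib >> 3) & 7],
        "base": REG32[sib & 7],
        "disp32": hex(disp32),
        "imm32": hex(imm32),
    }


fields = split_example(bytes.fromhex("67 66 f0 3e 81 84 8e 78 56 34 12 89 ab cd ef"))
for name, value in fields.items():
    print(f"{name:>8}: {value}")
```

The little-endian layout is why the immediate appears in the byte stream as `89 ab cd ef` but shows up in the disassembly as `0xefcdab89`.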

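In ODA, you’ll have to manually set the machine mode to i8086.  You can do this by clicking the Options button next to the Platform dropdown.  This makes ODA decode the bytes as 16-bit code, where the 0x66 and 0x67 prefixes switch the instruction to 32-bit operands and 32-bit addressing; in 32-bit mode the same prefixes select 16-bit sizes and the bytes decode as something else entirely.  Try it out now!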
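The machine mode matters because 0x66 and 0x67 do not name a fixed size; each one swaps between 16 and 32 bits relative to the code segment’s default. Here is a minimal sketch of that rule, assuming only 16- and 32-bit code; `effective_sizes` is a hypothetical helper.

```python
from __future__ import annotations

OPERAND_SIZE = 0x66
ADDRESS_SIZE = 0x67


def effective_sizes(default_bits: int, prefixes: bytes) -> tuple[int, int]:
    """Return (operand size, address size) in bits for 16- or 32-bit code.

    Each override prefix swaps 16 <-> 32 relative to the code segment's
    default. Repeating a prefix changes nothing further; it does not
    toggle the size back.
    """
    if default_bits not in (16, 32):
        raise ValueError("this sketch only models 16- and 32-bit code")
    other = 48 - default_bits  # 16 -> 32, 32 -> 16
    operand = other if OPERAND_SIZE in prefixes else default_bits
    address = other if ADDRESS_SIZE in prefixes else default_bits
    return operand, address


example = bytes.fromhex("67 66 f0 3e")  # prefixes of the 15-byte example
print(effective_sizes(16, example))  # (32, 32): ODA's i8086 mode
print(effective_sizes(32, example))  # (16, 16): a 32-bit disassembler
```

So in 16-bit mode the example is a 32-bit add with 32-bit addressing, while a disassembler working in 32-bit mode sees a 16-bit add with 16-bit addressing, consumes fewer bytes, and decodes the leftover bytes as a separate instruction.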

Just remember, next time you’re manually assembling an x86 instruction, don’t go hog wild with those instruction prefixes.  “Everything in moderation, including moderation.”  -Oscar Wilde

Posted in Disassembly Tidbits, Uncategorized
11 comments on “Longest x86 Instruction”
  1. Ken says:

    I had no idea an instruction could be so long!

  2. Mark says:

    Hmm. So this means that all software x86 emulators, no matter how bloated, are smaller than the biggest possible instruction they need to be able to emulate to be fully compliant. Interesting. :)

  3. gallier2 says:

    Your longest instruction is not for the 8086 but for the 80386 and up. It’s a 32-bit instruction. You’d be better off calling it x86; then everybody would understand it as such. 8086 implies 16-bitness.

  4. admin says:

    Good catch! I had the title of the post right (“Longest x86 Instruction”), but I slipped up a few times in the post. Thanks! I’ll update the post to be more precise.

  5. Chaz says:

    Locks, segment and address size overrides. Technically, from the instruction decoder’s point of view, the lock and segment/address overrides are quite decoupled; it’s just unfortunate that disassemblers by convention must include them on the same line as the prefixed instruction.

  6. grepNstep1 says:

    I read the recent article “Longest x86 Instruction”

    http://blog.onlinedisassembler.com/blog/?p=23

    I attempted to reproduce the curious disassembly issue on a Win7 x86 development platform using masm and, as the article suggested, redundant prefixes.

    Talk is cheap, so here’s a toy program (masm32):

    .386
    .model flat, stdcall

    option casemap:none

    includelib \x\x\kernel32.lib
    includelib \x\x\user32.lib

    include \x\x\kernel32.inc
    include \x\x\user32.inc
    include \x\x\windows.inc

    .code

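    start:

    ; redundant REP (0F3h) prefixes in front of the opcode
    db 0F3h
    db 0F3h
    db 0F3h
    db 0F3h
    db 0F3h
    db 0F3h
    db 0F3h
    ;…6 more bytes later
    db 089h          ; 89 E5 = mov ebp, esp in 32-bit code
    db 0E5h

    ; exit before END; anything after the END directive is ignored
    invoke ExitProcess, NULL

    end start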

    After assembling and linking, I opened the resulting executable in windbg.

    To my disappointment, when I single step, unassemble the $exentry, etc. windbg simply sees the prefixes/bytes as individual instructions, says ‘to hell with it’ and executes only the valid instructions.

    Is there something I’m missing?

  7. admin says:

    Try using the address-size or operand-size override prefixes instead (repeating either 0x67 or 0x66). Those were the prefixes mentioned in the article. I’ll be curious to hear your results!

    -Anthony

  8. mrexodia says:

    Isn’t the disassembly incorrect? The 0x66 prefix overrides the operand size to 16 bits and the 0x67 prefix overrides the address size to 16 bits. Two disassemblers I used reported

    lock add word ptr ds:[si + 0x788e], 0x3456
    adc cl, byte ptr [ecx - 0x6f103255]

    as two separate instructions.

    Greetings

  9. simi says:

    I am trying to convert a hex file for the AT89C51 into an asm file, but I don’t know which platform to select. Can you help me?

  10. As a matter of interest – vanishingly small, perhaps – this sort of thing dates back at least as far as the Z80, which used prefix bytes in a similar way.

    For example 0xDD is a prefix byte that means use the IX register instead of the HL register in the next instruction. Similarly 0xFD means use the IY register instead of HL.

    It is pointless, but you can have as many of these as you like in front of an instruction (only the last one has any effect).

    Is this of any use? Well… rarely… for one thing, the Z80 will not acknowledge interrupts in the middle of such a sequence, so you can use them to introduce a very long interrupt latency for test purposes. You can also use them as signals to Z80 emulators, which the Z80 itself will essentially ignore at a cost of 4 cycles each.

    Ancient history now, of course.

  11. Ali Rizvi-Santiago says:

    No limit to the size of the instructions? Infinite number of bytes? Maybe if your 80286+ disassembler is broken. For Intel’s 32-bit architecture, the reference explicitly states that an instruction can have a maximum of 4 prefixes for an opcode. The i8086/8088 might not have a limit on the number of prefixes, but every other Intel-based architecture does.

    The 80286+ each have a protected-mode implementation that allows one to trap (via an NMI) when specified types of memory accesses occur or when instructions get executed. The 80286 has a 16-bit pmode, and the 80386+ have a 32-bit pmode. The 80286 required a soft-cycle to exit pmode, whereas the 80386+ didn’t.

    Each of these architectures will raise different exceptions or faults when errors occur. One of these faults, the general protection fault (#GP), is raised when an instruction that’s being decoded hits its maximum length. You can look both of these up in their architecture references. The 80286 has a maximum instruction size of 10 bytes, and the 80386+ (32-bit) have a maximum instruction size of 15 bytes.

    Now, if you look at the datasheet for Intel’s iAPX 86 (or 88) line of microprocessors, you’ll see that opcodes can vary from 1 to 6 bytes in length, excluding the optional prefix bytes. However, if an interrupt occurs during the execution of an instruction (like a string operation such as scas), the processor is able to restore only 1 preceding prefix byte. This is because in this series each prefix was actually treated as a single instruction. The instruction queue was 2 bytes long, so each component of an instruction had to fit within those 2 bytes.

    So, an infinite number of prefixes? Maybe… but I guess that depends on how you define a prefix. If you define a prefix as being internally tied to and augmenting an instruction, rather than simply setting a state, then no, not infinite. But if you define it as an individual operation that’s executed by the processor, then yes. That 64K of prefixes is actually 64K worth of separate instructions. Each of your 8086/8088 instructions can be up to maybe 6 + 1 bytes. The reference says that the processor is only able to remember up to one prefix byte when an instruction is interrupted.
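The Z80 behavior described in comment 10 (a run of DD/FD prefixes where only the last one takes effect, each costing 4 cycles) can be sketched in a few lines of Python. This assumes, as that comment says, that redundant prefixes are simply consumed; `z80_index_register` is a hypothetical helper, not part of any emulator.

```python
from __future__ import annotations

# Z80 index prefixes: 0xDD swaps HL for IX, 0xFD swaps HL for IY.
INDEX_PREFIX = {0xDD: "IX", 0xFD: "IY"}
CYCLES_PER_PREFIX = 4  # per prefix byte (assumption taken from comment 10)


def z80_index_register(code: bytes) -> tuple[str, int]:
    """Scan a run of DD/FD prefixes at the start of `code`.

    Only the last prefix in the run takes effect. Returns the register
    that replaces HL and the number of prefix bytes consumed.
    """
    reg = "HL"
    count = 0
    while count < len(code) and code[count] in INDEX_PREFIX:
        reg = INDEX_PREFIX[code[count]]  # later prefixes override earlier ones
        count += 1
    return reg, count


# 0x21 is LD HL,nn; after this prefix run it becomes LD IX,nn.
reg, count = z80_index_register(bytes([0xFD, 0xDD, 0xFD, 0xDD, 0x21, 0x00, 0x10]))
print(reg, count)                        # IX 4
print((count - 1) * CYCLES_PER_PREFIX)   # 12 cycles spent on redundant prefixes
```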
