x86-64 assembler

guest

x86-64 assembler

Post by guest »

Copied from the Announcements section, as it will be easier to find here.

Like the (32-bit) BB4W assembler, the x86-64 assembler in BBCSDL (v0.27a and later) accepts something close to 'NASM' syntax. It is less compatible with Microsoft's MASM; for example, indirect memory references must always be indicated explicitly by means of square brackets.

However, don't expect the code below to work, because most instructions of this kind take only a 32-bit address, which is of little use in a 64-bit flat memory space:

Code: Select all

      mov ebx,[address]  ; this doesn't work
The one exception is when you are loading (only) the al/ax/eax/rax register, which can take a full 64-bit address:

Code: Select all

      mov eax,[address]  ; this works, but only when moving to al/ax/eax/rax
However, all is not lost when loading (or adding to, etc.) other registers: you can use the x86-64's RIP-relative addressing to access a memory location that is not very far away (within reach of a signed 32-bit offset), typically one declared in your own program:

Code: Select all

      add ebx,[rel address]  ; this works, using RIP-relative addressing
...
      .address dd value
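
To make the 'not very far away' condition concrete: a RIP-relative operand is encoded as a signed 32-bit displacement from the address of the next instruction, so the target has to lie within roughly 2 GiB either side of the code. Here is a minimal Python sketch of that range check. The function name and the addresses are made up for illustration, and it models the encoding rule only, not what the BBCSDL assembler does internally:

```python
def rip_relative_disp(next_instr_addr, target_addr):
    """Return the signed 32-bit displacement for a RIP-relative operand.

    next_instr_addr is the address of the instruction *following* the one
    being encoded, which is what RIP holds when the operand is resolved.
    Raises ValueError if the target is out of range.
    """
    disp = target_addr - next_instr_addr
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is more than 2 GiB away; RIP-relative won't reach")
    return disp


# Hypothetical addresses: code high in a 64-bit address space, data nearby.
code_end = 0x00007F00491A1D79
print(hex(rip_relative_disp(code_end, code_end + 0x1000)))   # 0x1000
print(rip_relative_disp(code_end, code_end - 16))            # -16
```
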
But remember that, just as in BB4W, you must ensure that executable code and any memory locations that are written to don't share the same 4K 'page'; otherwise the Self Modifying Code provision will be activated and your program will run at a fraction of the speed it should (if you only read from the memory, it's fine):

Code: Select all

      add [rel address],ebx  ; memory write
...
      ] : ]^P% = (]^P% + 4095) AND -4096 : [
      .address dd value
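Note the somewhat inelegant way of accessing the 64-bit program counter: ]^P% (i.e. the most significant 32 bits are in Q% and the least significant in P%). The expression rounds it up to the next 4096-byte boundary, so that the data starts on a fresh page.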
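
For anyone who wants to check the arithmetic, here is a small Python sketch of the same rounding, together with how a 64-bit value splits into two 32-bit halves in the way P% and Q% hold it. The helper names and the sample program counter value are invented for the example:

```python
PAGE = 4096

def align_up(addr, page=PAGE):
    """Round addr up to the next multiple of page (page must be a power of two)."""
    return (addr + page - 1) & -page

def split64(value):
    """Split a 64-bit value into (low 32 bits, high 32 bits), like P% and Q%."""
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF

def join64(low, high):
    """Recombine the two 32-bit halves into one 64-bit value."""
    return (high << 32) | low

pc = 0x00007F00491A1D73            # made-up 64-bit program counter
aligned = align_up(pc)
print(hex(aligned))                # 0x7f00491a2000
low, high = split64(aligned)
print(hex(low), hex(high))         # 0x491a2000 0x7f00
```

An address that is already on a page boundary is left unchanged, which is why the code adds 4095 rather than 4096.
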
guest

Re: x86-64 assembler

Post by guest »

The x86-64 assembler in BBCSDL (64-bit Linux edition) supports all standard (non-privileged) instructions, plus floating point, MMX, SSE, SSE2 and SSE3 instructions. It does not support SSSE3, SSE4 or AVX instructions (so the YMM and ZMM registers, when available, are not directly accessible). It recognises more than 500 instruction mnemonics and more than 1000 distinct instruction types. Of course any instruction may be synthesised from DB directives.

For those who don't already know, the main difference between the x86-64 architecture and the IA-32 architecture used by BB4W (and its assembler) is that the main register bank is extended in both breadth and depth. The general purpose integer registers are extended from 32 bits to 64 bits, and there are 16 of them instead of 8; there are also 16 XMM registers rather than 8 (but the FPU and MMX registers remain exactly the same as in IA-32):

[Attached image: registers.png]

Most of the instructions that you are familiar with from the BB4W 32-bit assembler work exactly the same, except with this extended set of registers. There are just a couple of things to note: only the 64-bit registers can be used for addressing memory (so [rbx] is valid but [ebx] isn't), and whenever you write to a 32-bit register the top 32 bits of the corresponding 64-bit register are zeroed.
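
The zeroing rule catches people out because it applies only to 32-bit writes; writing to a 16-bit (or 8-bit) register leaves the rest of the 64-bit register alone. A rough Python model of the two cases, with made-up function names and register contents:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def write32(reg64, value):
    """Model a write to a 32-bit register (e.g. mov ebx, value).

    The old contents (reg64) are ignored: the result is zero-extended.
    """
    return value & 0xFFFFFFFF

def write16(reg64, value):
    """Model a write to a 16-bit register (e.g. mov bx, value).

    Only the bottom 16 bits change; the upper 48 bits are preserved.
    """
    return (reg64 & ~0xFFFF & MASK64) | (value & 0xFFFF)

rbx = 0x1122334455667788
print(hex(write32(rbx, 0xAABBCCDD)))   # 0xaabbccdd
print(hex(write16(rbx, 0xAABB)))       # 0x112233445566aabb
```
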
guest

Re: x86-64 assembler

Post by guest »

Should you have a need for them (e.g. for padding or alignment), the BBCSDL x86-64 assembler directly supports multi-byte NOPs, from one byte to ten bytes long:

Code: Select all

00000000491A1D73 90                              nop
00000000491A1D74 66 90                           nop word
00000000491A1D76 0F 1F 00                        nop [rax]
00000000491A1D79 0F 1F 40 01                     nop [rax + 1]
00000000491A1D7D 0F 1F 44 00 01                  nop [rax*2 + 1]
00000000491A1D82 66 0F 1F 44 00 01               nop word [rax*2 + 1]
00000000491A1D88 0F 1F 80 00 01 00 00            nop [rax + &100]
00000000491A1D8F 0F 1F 84 00 00 01 00            nop [rax*2 + &100]
00000000491A1D96 00
00000000491A1D97 66 0F 1F 84 00 00 01            nop word [rax*2 + &100]
00000000491A1D9E 00 00
0000000049181DFD 3E 66 0F 1F 84 00 00            nop word ds:[rax*2 + &100]
0000000049181E04 01 00 00
These instructions are guaranteed to 'do nothing' and be executed efficiently.
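
As an illustration of how these might be combined for padding, here is a Python sketch that fills a gap of any size using the longest NOPs first. The byte sequences are copied from the listing above; the function name and the greedy strategy are my own choices for the example, not something the assembler does for you:

```python
# Encodings copied from the listing above, indexed by length in bytes.
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0F1F00"),
    4: bytes.fromhex("0F1F4001"),
    5: bytes.fromhex("0F1F440001"),
    6: bytes.fromhex("660F1F440001"),
    7: bytes.fromhex("0F1F8000010000"),
    8: bytes.fromhex("0F1F840000010000"),
    9: bytes.fromhex("660F1F840000010000"),
    10: bytes.fromhex("3E660F1F840000010000"),
}

def nop_padding(n):
    """Return n bytes of padding built from as few NOP instructions as possible."""
    out = bytearray()
    while n > 0:
        step = min(n, 10)          # longest NOP that still fits
        out += NOPS[step]
        n -= step
    return bytes(out)

# Pad from a hypothetical address up to the next 16-byte boundary.
addr = 0x491A1D73
gap = -addr % 16
print(gap, nop_padding(gap).hex())
```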

The longest 'legitimate' instruction that the x86-64 assembler will encode is 14 bytes long; here's an example:

Code: Select all

0000000049141DDF F0 64 48 81 84 80 78            lock add qword fs:[rax * 5 + &12345678], &FEDCBA98
0000000049141DE6 56 34 12 98 BA DC FE
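
For the curious, the 14 bytes break down into two prefixes, a REX byte, the opcode, ModRM and SIB bytes, a 32-bit displacement and a 32-bit immediate. This Python sketch picks the listing apart; the variable names are mine, and it is a hand decode of this one instruction rather than a general disassembler:

```python
import struct

# The 14 bytes from the listing above.
code = bytes.fromhex("F0 64 48 81 84 80 78 56 34 12 98 BA DC FE")

# lock prefix, fs override, REX.W, opcode (81), ModRM, SIB
lock, seg, rex, opcode, modrm, sib = code[:6]
disp32, imm32 = struct.unpack_from("<ii", code, 6)   # little-endian, signed

mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7

print(len(code))                 # 14
print(mod, reg, rm)              # 2 0 4  -> disp32 follows, /0 = add, SIB follows
print(scale, index, base)        # 4 0 0  -> rax + rax*4, i.e. rax * 5
print(hex(disp32), imm32)        # 0x12345678 -19088744
```

Note that the immediate is only 32 bits wide and is sign-extended for a 64-bit operation, so with the top bit set, as here, it acts as a negative number.
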
guest

Re: x86-64 assembler

Post by guest »

One of the irritations of writing x86-64 assembler code is the lack of standardisation between the Windows and Linux 64-bit ABIs. On Windows the first four (integer or pointer) parameters of a function are passed in rcx, rdx, r8 and r9 respectively, whereas on Linux they are passed in rdi, rsi, rdx and rcx. Because both use rcx and rdx, but for different parameters, there's no easy way of writing code that will run on both platforms. (Fortunately, if there are no more than two parameters things are much more straightforward: you can simply load both the Windows and Linux registers.)

Naively what you end up having to do is something like this:

Code: Select all

      IF @platform% AND 7 THEN
        [OPT pass%
        mov rdi,parm1
        mov rsi,parm2
        mov rdx,parm3
        mov rcx,parm4
        ]
      ELSE
        [OPT pass%
        mov rcx,parm1
        mov rdx,parm2
        mov r8,parm3
        mov r9,parm4
        ]
      ENDIF
which is a pain, especially if your code includes many function calls. To make things a little easier I wrote this assembler 'macro' to do the work:

Code: Select all

      DEF FNparm(pass%, p1$, p2$, p3$, p4$)
      LOCAL G%, asm$
      asm$ = "[opt " + STR$(pass% AND &FE)
      IF @platform% AND 7 THEN
        IF p4$<>"" asm$ += " : mov rcx," + p4$
        IF p3$<>"" asm$ += " : mov rdx," + p3$
        IF p2$<>"" asm$ += " : mov rsi," + p2$
        IF p1$<>"" asm$ += " : mov rdi," + p1$
      ELSE
        IF p1$<>"" asm$ += " : mov rcx," + p1$
        IF p2$<>"" asm$ += " : mov rdx," + p2$
        IF p3$<>"" asm$ += " : mov r8,"  + p3$
        IF p4$<>"" asm$ += " : mov r9,"  + p4$
      ENDIF
      asm$ += " : ]"

      G% = OPENOUT (@tmp$ + "asmtmp.bba")
      BPUT #G%, LEN(asm$) + 4
      BPUT #G%, 0
      BPUT #G%, 0
      BPUT #G%, asm$;
      BPUT #G%, &D
      BPUT #G%, 0
      BPUT #G%, &FF
      BPUT #G%, &FF
      CLOSE #G%

      CALL @tmp$ + "asmtmp.bba"
      = pass%
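
In case the BPUT sequence looks mysterious: as far as I can tell from the code, it writes a one-line program file consisting of a length byte, a two-byte line number (zero), the text of the line, a carriage return, and then a three-byte end marker. The Python sketch below builds the same bytes so the layout can be inspected. The function name is invented, and the description of what each byte means is my reading of the code above rather than a statement of the file format:

```python
def one_line_program(text):
    """Build the bytes that FNparm writes to asmtmp.bba for a given asm$."""
    body = text.encode("ascii")
    length = len(body) + 4          # length byte + 2 line-number bytes + CR
    if length > 255:
        raise ValueError("line too long for a single length byte")
    line = bytes([length, 0, 0]) + body + b"\r"
    return line + bytes([0, 0xFF, 0xFF])   # the final three BPUTs

data = one_line_program("[opt 2 : mov rcx,parm1 : ]")
print(data[0], len(data))           # 30 33
print(data[-4:])                    # b'\r\x00\xff\xff'
```
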
Now you can load the registers like this:

Code: Select all

      OPT FNparm(pass%, "parm1", "parm2", "parm3", "parm4")
If one or more of the parameters happens itself to be a register (which is of course quite likely), you will need to be careful to ensure that it isn't modified by the macro before its contents are used. To avoid any risk of that, don't pass rcx, rdx, rsi or r8 as parameters, since on one platform or the other each of them is loaded before the macro has finished reading its arguments.
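
The list of registers to avoid can be checked mechanically. This Python sketch replays the order in which FNparm emits its moves on each platform and reports any parameter that names a register already overwritten by an earlier move. It is a simplified model: the names are my own, and it only compares plain register names, so it would not spot a parameter such as a memory operand that uses a register inside square brackets:

```python
# Order in which FNparm emits the moves: (destination, parameter index).
ORDER = {
    "linux":   [("rcx", 3), ("rdx", 2), ("rsi", 1), ("rdi", 0)],
    "windows": [("rcx", 0), ("rdx", 1), ("r8", 2), ("r9", 3)],
}

def clobbered(platform, params):
    """Return the parameters that name a register overwritten by an earlier move."""
    written, bad = set(), []
    for dest, i in ORDER[platform]:
        src = params[i]
        if src == "":
            continue                 # FNparm skips empty parameters
        if src in written:
            bad.append(src)
        written.add(dest)
    return bad

# Try each register in each parameter position on each platform.
REGS = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"]
unsafe = set()
for plat in ORDER:
    for reg in REGS:
        for pos in range(4):
            params = ["1"] * 4       # constants are always safe
            params[pos] = reg
            if clobbered(plat, params):
                unsafe.add(reg)
print(sorted(unsafe))                # ['r8', 'rcx', 'rdx', 'rsi']
```

So rdi and r9 are safe in this model because each is the last register to be loaded on the one platform that uses it.
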
David Williams

Re: x86-64 assembler

Post by David Williams »

I've been reading this thread with interest, and will continue to monitor it. I don't have much time for assembler programming (IA-64 or otherwise) at the moment, but perhaps in the New Year I'll get a chance to try out the new assembler.


David.
guest

Re: x86-64 assembler

Post by guest »

David Williams wrote: Fri 07 Dec 2018, 16:06 I've been reading this thread with interest, and will continue to monitor it. I don't have much time for assembler programming (IA-64 or otherwise) at the moment, but perhaps in the New Year I'll get a chance to try out the new assembler.
I don't want to give the impression that x86-64 assembler programming is difficult; indeed, if you've already got working IA-32 assembler code, the modifications required can often be made by little more than a couple of search-and-replace operations (that was the case for 'mandel.bbc' for example, and it worked first time).

In some respects things can actually be easier, not least because the x86-64 assembler includes support for conditional move and SSE/SSE2/SSE3 instructions 'out of the box' whereas the BB4W assembler requires the use of the ASMLIB libraries. But there are a few 'gotchas' and I'll post here as and when I encounter them.