Here the first set of results is from the 32-bit coded-in-assembler version of BBCSDL and the second from the 64-bit coded-in-C version, running on the same hardware (Intel Core i7 clocked at 4 GHz):
BBC BASIC for Win32 version 1.43b
Average: 79 ns
Colon: 1 ns                     A%=B%<<C%: 53 ns
Dispatch (ENDCASE): 5 ns        A%=B%>>C%: 55 ns
Dispatch (ENDIF): 6 ns          A%=B%>>>C%: 55 ns
NEXT (integer): 22 ns           A%%=B%%: 71 ns
NEXT (default real): 44 ns      A%%=B%%+C%%: 109 ns
NEXT (64-bit real): 64 ns       A%%=B%%-C%%: 110 ns
NEXT N%: 18 ns                  A%%=B%%*C%%: 107 ns
NEXT n%: 41 ns                  A%%=B%%DIVC%%: 107 ns
NEXT N: 65 ns                   A%%=B%%/C%%: 213 ns
NEXT N#: 88 ns                  A%%=B%%^3: 139 ns
REPEATUNTILTRUE: 28 ns          A%%=B%%<<C%%: 115 ns
WHILEFALSE:ENDWHILE: 24 ns      A%%=B%%>>C%%: 116 ns
A%=FALSE: 26 ns                 A%%=B%%>>>C%%: 115 ns
A%=0: 34 ns                     A=B+PI: 101 ns
A=PI: 46 ns                     A=B-PI: 100 ns
ANTIDISESTABLISHMENT=PI: 56 ns  A=B*PI: 101 ns
A=(PI): 64 ns                   A=B/PI: 105 ns
A%=1234567890: 50 ns            A=B^3: 206 ns
A%=&499602D2: 41 ns             A=B^PI: 230 ns
A=1.23456789E38: 102 ns         A=SINB: 110 ns
A%=B%: 30 ns                    A=TANB: 116 ns
A=B: 69 ns                      A=LOGB: 104 ns
ANTI=ANTI: 72 ns                A=EXPB: 113 ns
A%=B%+C%: 47 ns                 A=SQRB: 89 ns
A%=B%-C%: 47 ns                 A=ATNB: 119 ns
A%=B%*C%: 46 ns                 A=ABSB: 74 ns
A%=B%DIVC%: 47 ns               A=INTB: 88 ns
A%=B%/C%: 152 ns                PROC1: 39 ns
A%=C%^D%: 95 ns                 A%=FN1: 128 ns

BBC BASIC for Win64 version 1.43c
Average: 85 ns
Colon: 3 ns                     A%=B%<<C%: 94 ns
Dispatch (ENDCASE): 4 ns        A%=B%>>C%: 94 ns
Dispatch (ENDIF): 5 ns          A%=B%>>>C%: 95 ns
NEXT (integer): 15 ns           A%%=B%%: 62 ns
NEXT (default real): 22 ns      A%%=B%%+C%%: 109 ns
NEXT (64-bit real): 25 ns       A%%=B%%-C%%: 108 ns
NEXT N%: 20 ns                  A%%=B%%*C%%: 108 ns
NEXT n%: 25 ns                  A%%=B%%DIVC%%: 115 ns
NEXT N: 31 ns                   A%%=B%%/C%%: 132 ns
NEXT N#: 35 ns                  A%%=B%%^3: 206 ns
REPEATUNTILTRUE: 43 ns          A%%=B%%<<C%%: 118 ns
WHILEFALSE:ENDWHILE: 33 ns      A%%=B%%>>C%%: 117 ns
A%=FALSE: 40 ns                 A%%=B%%>>>C%%: 117 ns
A%=0: 45 ns                     A=B+PI: 108 ns
A=PI: 48 ns                     A=B-PI: 105 ns
ANTIDISESTABLISHMENT=PI: 56 ns  A=B*PI: 109 ns
A=(PI): 79 ns                   A=B/PI: 106 ns
A%=1234567890: 52 ns            A=B^3: 260 ns
A%=&499602D2: 48 ns             A=B^PI: 219 ns
A=1.23456789E38: 84 ns          A=SINB: 123 ns
A%=B%: 46 ns                    A=TANB: 130 ns
A=B: 58 ns                      A=LOGB: 126 ns
ANTI=ANTI: 61 ns                A=EXPB: 127 ns
A%=B%+C%: 80 ns                 A=SQRB: 101 ns
A%=B%-C%: 82 ns                 A=ATNB: 134 ns
A%=B%*C%: 81 ns                 A=ABSB: 67 ns
A%=B%DIVC%: 92 ns               A=INTB: 78 ns
A%=B%/C%: 104 ns                PROC1: 33 ns
A%=C%^D%: 228 ns                A%=FN1: 119 ns
In general, what is most notable is not how they differ but how similar they are (an average of 79 ns versus 85 ns); I probably wouldn't have expected that, and it shows how far compilers have come. But if one drills down into the detail there are some interesting contrasts. For example, on the 32-bit version dividing two integers (A%=B%/C%: 152 ns) is significantly slower than raising to an integer power (A%=C%^D%: 95 ns), whereas on the 64-bit version the opposite is the case (104 ns versus 228 ns).
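One way to put a number on "how similar" is to divide each 64-bit time by the corresponding 32-bit time. The following is a minimal C sketch, assuming the timings are transcribed by hand from the listings; the function names `speed_ratio` and `print_row` are invented for illustration and are not part of BBCSDL.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of the 64-bit (C) time to the 32-bit (assembler) time for one test.
   A result above 1.0 means the 64-bit build was slower on that test. */
double speed_ratio(int win32_ns, int win64_ns)
{
    assert(win32_ns > 0);  /* avoid dividing by zero */
    return (double)win64_ns / (double)win32_ns;
}

/* Print one comparison row: test name, both timings and their ratio. */
void print_row(const char *test, int win32_ns, int win64_ns)
{
    printf("%-10s %4d ns %4d ns  x%.2f\n",
           test, win32_ns, win64_ns, speed_ratio(win32_ns, win64_ns));
}
```

For example, `print_row("Average", 79, 85)` reports a ratio of about 1.08, while `print_row("A%=B%/C%", 152, 104)` and `print_row("A%=C%^D%", 95, 228)` report about 0.68 and 2.40, which is the reversal between division and integer power described above.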
Of course, this program doesn't exercise the numeric SUM() function, which is where the difference between the two versions really stood out in the challenge (until I made the experimental change of in-lining the addition routine).
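The post doesn't show how SUM() is implemented in either version, so what follows is only a generic, hypothetical C sketch of what "in-lining the addition routine" means for a summing loop: in the first function every element goes through a call to a separate addition routine, while in the second the addition is written directly in the loop body. The names `add_numbers`, `sum_via_routine` and `sum_inlined` are invented for illustration and do not come from BBCSDL.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer type for a two-operand addition routine. */
typedef double (*add_fn)(double, double);

/* Hypothetical general-purpose addition routine, standing in for the kind of
   shared arithmetic code an interpreter might call for every addition. */
double add_numbers(double a, double b)
{
    return a + b;
}

/* Sum an array by calling the addition routine once per element through a
   function pointer; unless the compiler can resolve and inline the target,
   each element costs a function call. */
double sum_via_routine(const double *v, size_t n, add_fn add)
{
    assert(v != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total = add(total, v[i]);
    return total;
}

/* The same sum with the addition written in-line: no call per element. */
double sum_inlined(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Whether a compiler inlines the call in `sum_via_routine` by itself depends on whether it can see which function the pointer refers to; writing the addition in-line, as in `sum_inlined`, removes that uncertainty.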