Author |
Performance: VUPS Procedure |
Bruce Claremont
Moderator
Posts: 623
Joined: 07.01.10 |
Posted on March 18 2010 10:39 |
|
|
We currently use the following DCL procedure on VMS systems to take a quick swag at performance. Using this VUPS procedure, a real AlphaServer 400/166 with 384MB of memory runs at 27.1 VUPs.
FreeAXP performance is highly dependent upon the host O/S (x86 vs. x64) and the speed of the underlying hardware. The beta releases of FreeAXP (1.n.n.n) will be slower than the production releases (2.0 and higher).
$! CALCULATE_VUPS
$! Use at your own risk.
$!
$ set noon
$ orig_privs = f$setprv("ALTPRI")
$ process_priority = f$getjpi(0,"PRIB")
$ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
$ cpu_round_add = 1 ! VAX = 1 - Alpha/AXP = 9
$ cpu_round_divide = cpu_round_add + 1
$ init_counter = cpu_multiplier * 525
$ init_loop_maximum = 205
$ start_cputime = f$getjpi(0,"CPUTIM")
$ loop_index = 0
$ 10$:
$ loop_index = loop_index + 1
$ if loop_index .ne. init_loop_maximum then goto 10$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ init_vups = ((init_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ loop_maximum = (init_vups * init_loop_maximum) / 10
$ base_counter = (init_counter * init_vups) / 10
$ vups = 0
$ times_through_loop = 0
$ 20$:
$ start_cputime = f$getjpi(0,"CPUTIM")
$ loop_index = 0
$ 30$:
$ loop_index = loop_index + 1
$ if loop_index .ne. loop_maximum then goto 30$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ new_vups = ((base_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ if new_vups .eq. vups then goto 40$
$ vups = new_vups
$ times_through_loop = times_through_loop + 1
$ if times_through_loop .le. 5 then goto 20$
$ 40$:
$ new_privs = f$setprv(orig_privs)
$ set message /nofacility/noidentification/noseverity/notext
$ ASSIGN/SYSTEM/EXEC 'vups' MACHINE_VUPS_RATING
$ set message /facility/identification/severity/text
$ write sys$output "Approximate System VUPs Rating : ", -
vups / 10,".", vups - ((vups / 10) * 10)
$ exit
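To make the integer arithmetic in the procedure above easier to follow, here is a rough Python sketch of the two formulas it relies on: rounding counter / elapsed ticks to a multiple of cpu_round_divide, and splitting the result (which is held in tenths of a VUP) into whole and fractional digits for display. It is an illustration only, not part of the procedure. The names round_vups and format_tenths and the sample tick count are made up for the example, and Python's // stands in for DCL integer division (the two agree for the positive values used here).

```python
def round_vups(counter, elapsed_ticks, round_add=1):
    """Mirror: ((counter / ticks + round_add) / round_divide) * round_divide."""
    round_divide = round_add + 1
    return ((counter // elapsed_ticks + round_add) // round_divide) * round_divide


def format_tenths(vups):
    """Mirror: vups / 10, ".", vups - ((vups / 10) * 10)."""
    return f"{vups // 10}.{vups - (vups // 10) * 10}"


init_counter = 10 * 525  # cpu_multiplier * 525, as in the procedure

# Hypothetical elapsed CPU time of 19 ticks for the timed loop.
rating = round_vups(init_counter, 19)
print(format_tenths(rating))
```

With the assumed 19 ticks this prints 27.6. Because the quotient is rounded to a multiple of cpu_round_divide, the last digit of the rating is always even when cpu_round_add is 1.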
Edited by Bruce Claremont on March 18 2010 10:41 |
|
Author |
RE: Performance: VUPS Procedure |
astrodanco
Member
Posts: 36
Joined: 04.03.10 |
Posted on March 30 2010 06:25 |
|
|
I just ran that on an Intel Core I7 920 and get:
Approximate System VUPs Rating : 29.4
But at least I could run several of these in parallel and get the same rating from each of them.
I'm not sure that using DCL for benchmarking makes for a very reliable benchmark...
|
|
Author |
RE: Performance: VUPS Procedure |
Bruce Claremont
Moderator
Posts: 623
Joined: 07.01.10 |
Posted on April 01 2010 09:01 |
|
|
Our target for FreeAXP performance is about 30% faster than the beta 254 release is currently running.
I'm open to a better benchmark if one is available. |
|
Author |
RE: Performance: VUPS Procedure |
astrodanco
Member
Posts: 36
Joined: 04.03.10 |
Posted on April 01 2010 11:46 |
|
|
I don't think there is one publicly available to measure VUPS, and there never was unless you were a DEC insider. But you could compile and run the old open source Dhrystone benchmark on the real machines and compare the results to the emulated ones. |
|
Author |
RE: DCL procedure is fine for easy measurement |
VolkerHalle
Member
Posts: 104
Location: Germany
Joined: 02.04.10 |
Posted on April 02 2010 03:35 |
|
|
astrodanco wrote:
I'm not sure that using DCL for benchmarking makes for a very reliable benchmark...
Performance is a complex thing, and so is real benchmarking. But for an overview and an easy measurement of relative CPU performance, this little DCL procedure seems to be fine. DCL, as an interpreter, needs lots of CPU cycles to run this procedure. It does very little I/O (maybe just one) and takes few page faults.
The important thing is that everybody has to run the SAME procedure. Don't start 'optimizing' it to get 'better' VUPS numbers.
Volker. |
|
Author |
RE: Performance: VUPS Procedure |
astrodanco
Member
Posts: 36
Joined: 04.03.10 |
Posted on April 16 2010 19:36 |
|
|
With the April 15th update I'm now getting 33 on the same hardware, so I guess I can confirm seeing the 10% performance improvement. |
|
Author |
RE: Migration Specialties Test Hardware |
Bruce Claremont
Moderator
Posts: 623
Joined: 07.01.10 |
Posted on April 21 2010 11:19 |
|
|
I've posted system information on the systems we use as primary test boxes for FreeAXP and Avanti. You'll find the links in the left column on the Development page:
http://www.migrationspecialties.com/VAXAlphaEmulator.html
Scroll down to the Development Systems box and you'll see the links.
FYI: FreeAXP 271 64-bit is now running on par with a real AlphaServer 400/166.
Edited by Bruce Claremont on April 21 2010 11:21 |
|
Author |
RE: Performance: VUPS Procedure |
astrodanco
Member
Posts: 36
Joined: 04.03.10 |
Posted on April 23 2010 23:30 |
|
|
Running 271 on my Core I7 920 @ 4.0 GHz I'm now getting 37.6.
Note that a VAXserver 3900 running under SIMH (set to have 512 MB of memory) on the same Core I7 920 @ 4.0 GHz gives me 29.8.
I've found that SYS$EXAMPLES:MACRO64$PI, which is installed by the Macro 64 kit from the OpenVMS Freeware V4 CD, is a good benchmark. On the same hardware running 271 (set to have 128 MB of memory) I'm able to calculate PI to 40,000 digits in 177.11 seconds. |
|
Author |
RE: Performance: VUPS Procedure |
Bruce Claremont
Moderator
Posts: 623
Joined: 07.01.10 |
Posted on April 24 2010 13:32 |
|
|
Good test. Thanks!
PI to 40,000 digits on real AlphaServer 400/166 with 384MB memory running VMS 7.3-2: 386 seconds
PI compiled with ALPHA_MACRO64 V1.0.
Test Procedure:
$! PI_TEST.COM
$! Test computing PI to 40,000 digits using Alpha Macro64
$! PI program.
$!
$! PI Program compiled using ALPHA_MACRO64 V1.0. PI program
$! and compiler obtained from Alpha Macro64 kit on Freeware
$! CD V4.
$!
$! macro/alpha_axp/object=pi sys$examples:macro64$pi
$! link pi
$!
$ write sys$output "Calculating PI to 40,000 digits..."
$ start = f$cvtime(f$time(),,"secondofyear")
$ run pi
40000
$ end = f$cvtime(f$time(),,"secondofyear")
$ delta = end - start
$ write sys$output delta
$ write sys$output "PI calculated to 40,000 digits in ''delta' seconds."
$ EXIT
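One thing to keep in mind with the timing in PI_TEST.COM: it subtracts two "secondofyear" values, so the result is only meaningful if the run starts and ends in the same year. A small Python sketch of that subtraction follows. It assumes SECONDOFYEAR counts seconds since the start of the year; the function names and the timestamps are invented for the illustration.

```python
from datetime import datetime


def second_of_year(t):
    """Seconds elapsed since 1 January of t's year (assumed SECONDOFYEAR semantics)."""
    return int((t - datetime(t.year, 1, 1)).total_seconds())


def delta_like_pi_test(start, end):
    """Same subtraction as PI_TEST.COM: end - start in seconds of the year."""
    return second_of_year(end) - second_of_year(start)


# Timestamps chosen only for illustration.
same_year = delta_like_pi_test(datetime(2010, 4, 24, 13, 0, 0),
                               datetime(2010, 4, 24, 13, 6, 26))
across_new_year = delta_like_pi_test(datetime(2010, 12, 31, 23, 59, 0),
                                     datetime(2011, 1, 1, 0, 1, 0))
print(same_year, across_new_year)
```

The first value is 386 seconds; the second comes out negative because the counter restarts at the year boundary. For a benchmark run of a few minutes this hardly ever matters, but it explains an odd result if a run happens to straddle midnight on December 31.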
|
|
Author |
RE: Performance: VUPS Procedure |
astrodanco
Member
Posts: 36
Joined: 04.03.10 |
Posted on April 24 2010 19:29 |
|
|
I forgot to mention that I compiled it with Macro-64 V1.2-108-367CC and /OPTIMIZE. Help says the default is /NOOPTIMIZE. That may make quite a difference. If your run on the real Alpha was done with unoptimized code, could you please re-run it with optimized code? Also note that the .DAT file produced contains the run time statistics in addition to the result.
Edited by astrodanco on April 24 2010 19:36 |
|
Author |
RE: Performance: VUPS Procedure |
Bruce Claremont
Moderator
Posts: 623
Joined: 07.01.10 |
Posted on April 27 2010 09:35 |
|
|
We have added Test System and Benchmark links to our Development page. The Test System links provide information on the hardware we use in testing. The Benchmark links are programs and procedures we use to measure performance on real VAX and Alpha hardware and our virtual hardware. This information allows you to generate meaningful comparison numbers for your host and emulated systems.
Test Systems Link: http://www.migrationspecialties.com/VAXAlphaEmulator.html#TestSystems
Benchmarks Link: http://www.migrationspecialties.com/VAXAlphaEmulator.html#Benchmarks
Development Page: http://www.migrationspecialties.com/VAXAlphaEmulator.html
Astrodanco: I used macro/alpha_axp/object=pi to compile the PI program. I don't know what the default behavior of /OPTIMIZE is under MACRO64 V1.0, and I will not have access to the real AlphaServer for a couple of days. Would you mind downloading the PI Test, running that image, and posting the results?
Edited by Bruce Claremont on April 27 2010 09:36 |
|
Author |
RE: /OPTIMIZE does not really matter |
VolkerHalle
Member
Posts: 104
Location: Germany
Joined: 02.04.10 |
Posted on April 27 2010 20:57 |
|
|
I've compiled MACRO64$PI with MACRO-64 V1.2-108-367CC with and without /OPTIMIZE and run it using PI_TEST.COM on a real AlphaServer 1000A 5/400 (EV56) a couple of times:
/NOOPT runs for 112, 111 and 111 seconds
/OPT runs for 110, 110 and 110 seconds
Volker. |
|
Author |
RE: Performance: VUPS Procedure |
astrodanco
Member
Posts: 36
Joined: 04.03.10 |
Posted on April 28 2010 09:43 |
|
|
Interestingly enough Bruce, your PI executable seems to run a little bit faster than mine. I'm getting 172 seconds using your PI executable. |
|
Author |
RE: new CALCULATE_VUPS procedure |
VolkerHalle
Member
Posts: 104
Location: Germany
Joined: 02.04.10 |
Posted on May 05 2010 22:36 |
|
|
Bruce,
there are a couple of possible endless-loop problems in this procedure, which only become visible when running on 1 GHz or 1.25 GHz Alphas. I've cleaned up the code and removed those problems. The procedure has been successfully tested on fast Alphas and also on slower Alphas. It provides the same VUPS results as the previous version. Please consider using this updated version.
The main problem solved was that the loops may finish within 10 ms of CPU time on fast Alphas, which causes a divide-by-zero and a negative result for the number of loops to be run.
Volker.
$! CALCULATE_VUPS
$! Use at your own risk.
$!
$ set noon
$ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
$ cpu_round_add = 1 ! VAX = 1 - Alpha/AXP = 9
$ cpu_round_divide = cpu_round_add + 1
$ init_counter = cpu_multiplier * 525
$ speed_factor = 1 ! to increase no. of loops on fast CPUs
$ 9$:
$ init_loop_maximum = 205 * speed_factor
$ start_cputime = f$getjpi(0,"CPUTIM")
$ loop_index = 0
$ 10$:
$ loop_index = loop_index + 1
$ if loop_index .ne. init_loop_maximum then goto 10$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ IF end_cputime .LE. start_cputime + 1 ! not enough clock-ticks = CPU too fast
$ THEN
$ speed_factor = speed_factor + 1 ! increase no. of loops
$ WRITE SYS$OUTPUT "INFO: Preventing endless loop (10$) on fast CPUs"
$ GOTO 9$
$ ENDIF
$ init_vups = ((init_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ IF init_vups .LE. 0
$ THEN
$ WRITE SYS$OUTPUT "Calibration error -> exiting (Please report this problem)"
$ SHOW SYMB speed_factor
$ SHOW SYMB init_vups
$ SHOW SYMB init_counter
$ SHOW SYMB end_cputime
$ SHOW SYMB start_cputime
$ SHOW SYMB cpu_multiplier
$ SHOW SYMB cpu_round_add
$ SHOW CPU
$ EXIT
$ ENDIF
$ write sys$output " "
$ loop_maximum = (init_vups * init_loop_maximum) / ( 10 * speed_factor )
$ base_counter = (init_counter * init_vups) / 10
$ vups = 0
$ min_vups = %X7FFFFFFF
$ max_vups = 0
$ avg_vups = 0
$ times_through_loop = 0
$ 20$:
$ start_cputime = f$getjpi(0,"CPUTIM")
$ times_through_loop = times_through_loop + 1
$ loop_index = 0
$ 30$:
$ loop_index = loop_index + 1
$ if loop_index .ne. loop_maximum then goto 30$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ IF end_cputime .LE. start_cputime
$ THEN
$ new_vups = 0 ! can not calculate VUPS (CPU too fast)
$ WRITE SYS$OUTPUT "INFO: Loop too fast (20$) - ignoring VUPS data"
$ ELSE
$ new_vups = ((base_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ ENDIF
$ IF new_vups .LT. min_vups THEN $ min_vups = new_vups
$ IF new_vups .GT. max_vups THEN $ max_vups = new_vups
$ avg_vups = avg_vups + new_vups
$ if new_vups .eq. vups then goto 40$
$ vups = new_vups
$ if times_through_loop .le. 5 then goto 20$
$!! WRITE SYS$OUTPUT "INFO: Preventing endless loop 20$"
$ 40$:
$ vups = avg_vups / times_through_loop
$ write sys$output " Approximate System VUPs Rating : ", -
vups / 10,".", vups - ((vups / 10) * 10), -
" ( min: ", min_vups/10,".", min_vups - ((min_vups / 10) * 10), -
" max: ", max_vups/10,".", max_vups - ((max_vups / 10) * 10), " )"
$ exit |
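For anyone trying to follow the fast-CPU guard at label 9$, here is a rough Python sketch of just that retry logic: the calibration loop is repeated with a larger speed_factor until it spans more than one clock tick, so the later division never sees a zero or near-zero elapsed time. This is a simplified model, not a translation of the whole procedure. The names calibrate and fake_fast_cpu, and the figure of 300 iterations per tick, are invented for the illustration.

```python
def calibrate(measure_ticks, base_loops=205):
    """Grow speed_factor until the timed loop spans more than one clock tick."""
    speed_factor = 1
    while True:
        loops = base_loops * speed_factor
        ticks = measure_ticks(loops)
        if ticks > 1:  # opposite of: end_cputime .LE. start_cputime + 1
            return speed_factor, ticks
        speed_factor += 1  # not enough clock ticks, so run more loops


def fake_fast_cpu(loops):
    # Hypothetical CPU that gets through 300 loop iterations per clock tick.
    return loops // 300


print(calibrate(fake_fast_cpu))
```

On the made-up CPU the first two attempts take 0 and 1 ticks and are rejected; the third attempt (615 loops) takes 2 ticks, so the sketch prints (3, 2). A slow CPU passes on the first attempt with speed_factor left at 1, which is consistent with the updated procedure giving the same results as the old one on slower machines.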
|
Author |
RE: Performance: VUPS Procedure |
Bruce Claremont
Moderator
Posts: 623
Joined: 07.01.10 |
Posted on May 06 2010 03:40 |
|
|
I've verified Volker's update. It still produces measurements consistent with the old version. We've switched to Volker's updated version. I've also replaced our VUP3 benchmark procedure posted on the Development page with Volker's VUPS procedure. Thanks, Volker. |
|
Author |
RE: Performance: VUPS Procedure |
pnkearns
Member
Posts: 8
Joined: 11.10.10 |
Posted on December 29 2010 14:30 |
|
|
I know that VUPS is reasonable given we are talking about FreeAXP and Digital architecture, but has anyone run Dhrystones on FreeAXP?
I know there are lies, damn lies and then benchmarks. However, I'm running into a need to give some rough number comparisons to other architectures. |
|
Author |
RE: Performance: VUPS Procedure |
pnkearns
Member
Posts: 8
Joined: 11.10.10 |
Posted on December 29 2010 15:00 |
|
|
As an FYI, my results running the VUPS.COM file Bruce posted.
System configuration:
MacBook Pro notebook with 2.33 GHz Intel Core 2 Duo OS/X 10.6 -> Parallels 6 -> Windows 7 -> FreeAXP 283/OpenVMS 8.3
Results:
Run 1: 24.4 VUPS (Min: 24 Max: 25)
Run 2: 24.8 VUPS (Min: 23 Max: 25)
Edited by pnkearns on December 29 2010 15:03 |
|
Author |
RE: Performance: VUPS Procedure |
rickw
Member
Posts: 3
Joined: 03.05.11 |
Posted on May 03 2011 08:01 |
|
|
FYI on performance ratings using Volker's procedure on 4 different systems.
HP Integrity Rx6600 - quad 1.6 GHz Itanium CPUs, 8 GB of RAM, VMS 8.3-1h1: 591.6
Compaq AlphaServer DS20 - dual 667 MHz CPUs, 1 GB of RAM, VMS 7.2-1: 204.9
FreeAXP AMD 1090T (3.2 GHz hexcore), 8 GB RAM, VMS 8.3 (hobbyist)*: 33.0
FreeAXP HP Dv6000 Laptop - Core 2 Duo 2 GHz, 4 GB, VMS 8.3 (hobbyist): 20.8
*Note: AMD PC is my 24/7 do all server and has several apps running while testing FreeAXP.
(PS3 Media Server, Thunderbird Email Client, Miranda Chat for "ICQ,IRC,AIM,Yahoo,Google Talk,MSN", TeamSpeak3 voice chat client, Liberkey Portable Apps, SQL-lite, Visual Studio 2005, Filezilla Server to name a few...)
regards, Rick
Edited by rickw on May 03 2011 08:02 |
|
Author |
RE: Performance: VUPS Procedure |
abrsvc
Member
Posts: 108
Joined: 12.03.10 |
Posted on May 04 2011 00:53 |
|
|
Performance ratings for
Alpha DS10 466 MHz 2 GB - 220.9 ( min: 220.6 max: 221.2 )
Alpha DS10 617 MHz 2 GB - 306.3 ( min: 303.6 max: 308.4 )
Dan |
|
Author |
RE: Performance: VUPS Procedure |
rickw
Member
Posts: 3
Joined: 03.05.11 |
Posted on May 04 2011 03:03 |
|
|
abrsvc wrote:
Performance ratings for
Alpha DS10 466 MHz 2 GB - 220.9 ( min: 220.6 max: 221.2 )
Alpha DS10 617 MHz 2 GB - 306.3 ( min: 303.6 max: 308.4 )
Dan
Interesting that a DS10 shows higher ratings than my DS20.
If I may ask, which command script did you use?
(Original or Volker's updated script?)
Thanks!
Rick |
|