CUDA Performance
I ran the commands below under different loads on my Gigabyte GTX 560 graphics card.
export LD_LIBRARY_PATH=$CUDA_PATH/lib64
time /usr/local/cuda/samples/sdk/0_Simple/matrixMul/matrixMul
time /usr/local/cuda/samples/sdk/0_Simple/matrixMulCUBLAS/matrixMulCUBLAS
I was interested in the GFlop/s value that each sample reports.
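To pull just that number out of the sample output, a small shell filter is enough. This is a minimal sketch: it assumes the samples print a line of the form `Performance= 127.03 GFlop/s, Time= ...`, which you should check against the output of your CUDA version. The function name `extract_gflops` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the number in front of "GFlop/s" for every
# matching line on stdin. The "Performance= <n> GFlop/s" line format is an
# assumption about the sample's output, not something verified here.
extract_gflops() {
    sed -n 's/.*Performance= *\([0-9.][0-9.]*\) GFlop\/s.*/\1/p'
}

# Demo on a made-up sample line (the Time value is invented for illustration).
printf 'Performance= 127.03 GFlop/s, Time= 1.000 msec\n' | extract_gflops
```

With the real binary the call would look like `/usr/local/cuda/samples/sdk/0_Simple/matrixMul/matrixMul | extract_gflops`.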
| Test case | Console (GFlop/s) | X11 (GFlop/s) |
|---|---|---|
| Nothing | 127.03 | n/a | 
| GPUGrid | 82.67 | 127.05 | 
| GPUGrid+Chrome | n/a | 83.52 | 
| CUBLAS: Nothing | 451.85 | 442.50 | 
| CUBLAS: GPUGrid | 186.41 | n/a | 
| CUBLAS: GPUGrid+Chrome | n/a | 212.73 | 
The table shows that matrix multiplication with CUBLAS is about 3.5 times faster than without it on an otherwise idle card (451.85 vs. 127.03 GFlop/s); under load the advantage shrinks to roughly 2.3 to 2.5 times.
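Furthermore, the more load you put on the graphics card, the slower the matrix multiplication runs: with GPUGrid running on the console, throughput drops from 127.03 to 82.67 GFlop/s without CUBLAS and from 451.85 to 186.41 GFlop/s with it.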
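The speedup factors can be recomputed from the table with a few lines of awk. This is only a sketch; the function name `speedup` is hypothetical, and the figures are the ones from the table above.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two GFlop/s figures, rounded to two decimals.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

# CUBLAS vs. plain matrixMul, values taken from the table.
speedup 451.85 127.03   # console, no other load       -> 3.56
speedup 186.41 82.67    # console, GPUGrid running     -> 2.25
speedup 212.73 83.52    # X11, GPUGrid + Chrome        -> 2.55
```

The same helper also gives the slowdown caused by load, for example `speedup 127.03 82.67` for plain matrixMul on the console with and without GPUGrid.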
While GPUGrid is running, nvidia-smi reports the following:
Wed Jun 19 23:26:13 2013       
+------------------------------------------------------+                       
| NVIDIA-SMI 4.310.44   Driver Version: 310.44         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 560          | 0000:01:00.0     N/A |                  N/A |
| 54%   52C  N/A     N/A /  N/A |  30%  303MB / 1023MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+
The temperatures reported by sensors are as follows:
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       39.34 W  (crit = 124.95 W)
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +18.5°C  (high = +70.0°C)
                       (crit = +90.0°C, hyst = +87.0°C)
it8721-isa-0290
Adapter: ISA adapter
in0:          +2.80 V  (min =  +2.41 V, max =  +2.32 V)  ALARM
in1:          +2.78 V  (min =  +0.19 V, max =  +2.09 V)  ALARM
in2:          +0.83 V  (min =  +0.01 V, max =  +1.03 V)
+3.3V:        +3.31 V  (min =  +2.88 V, max =  +4.63 V)
in4:          +0.34 V  (min =  +1.10 V, max =  +1.34 V)  ALARM
in5:          +2.52 V  (min =  +1.24 V, max =  +0.60 V)  ALARM
in6:          +2.35 V  (min =  +0.14 V, max =  +1.16 V)  ALARM
3VSB:         +4.82 V  (min =  +0.00 V, max =  +4.78 V)  ALARM
Vbat:         +3.34 V  
fan1:         437 RPM  (min =   39 RPM)
fan2:           0 RPM  (min =   22 RPM)  ALARM
fan3:        1421 RPM  (min =   17 RPM)
temp1:        +38.0°C  (low  = -47.0°C, high = +51.0°C)  sensor = thermistor
temp2:        +38.0°C  (low  = -101.0°C, high = -61.0°C)  ALARM  sensor = thermistor
temp3:       -128.0°C  (low  = -22.0°C, high = -11.0°C)  sensor = disabled
intrusion0:  OK
The outside temperature is 27°C.