
cuda : use CUBLAS_COMPUTE_32F instead of CUBLAS_COMPUTE_16F #1559


Draft: wants to merge 1 commit into master

@ggerganov (Member) commented Nov 27, 2023

On some video cards, this can be faster. Benchmark results on a GTX 1660:

```sh
./extra/bench-all.sh 1 1
```
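• master

| GPU | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTX 1660 | AVX2 BLAS CUDA | tiny | 1 | 105.95 | 5.90 | 0.63 | 0.20 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | tiny-q5_0 | 1 | 106.29 | 2.63 | 0.31 | 0.21 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | tiny-q5_1 | 1 | 106.35 | 2.62 | 0.30 | 0.20 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | base | 1 | 236.20 | 7.56 | 1.01 | 0.33 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | base-q5_0 | 1 | 237.01 | 4.18 | 0.44 | 0.33 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | base-q5_1 | 1 | 237.01 | 4.17 | 0.43 | 0.33 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | small | 1 | 890.34 | 21.70 | 2.33 | 1.00 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | small-q5_0 | 1 | 893.45 | 11.79 | 1.08 | 1.02 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | small-q5_1 | 1 | 893.44 | 11.77 | 1.04 | 1.02 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | medium | 1 | 2767.58 | 54.52 | 6.80 | 2.40 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | medium-q5_0 | 1 | 2779.65 | 26.92 | 2.57 | 2.46 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | medium-q5_1 | 1 | 2779.30 | 26.96 | 2.42 | 2.46 | f52e74d |
| GTX 1660 | AVX2 BLAS CUDA | medium-dis | 1 | 2766.26 | 7.75 | 1.27 | 0.40 | f52e74d |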
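• PR

| GPU | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTX 1660 | AVX2 BLAS CUDA | tiny | 1 | 105.09 | 1.19 | 0.72 | 0.20 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | tiny-q5_0 | 1 | 107.15 | 0.96 | 0.34 | 0.21 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | tiny-q5_1 | 1 | 107.12 | 0.96 | 0.33 | 0.21 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | base | 1 | 232.68 | 1.73 | 1.18 | 0.33 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | base-q5_0 | 1 | 238.91 | 1.42 | 0.49 | 0.34 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | base-q5_1 | 1 | 238.63 | 1.44 | 0.48 | 0.34 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | small | 1 | 921.25 | 6.14 | 4.09 | 1.02 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | small-q5_0 | 1 | 899.08 | 3.48 | 1.27 | 1.03 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | small-q5_1 | 1 | 899.34 | 3.44 | 1.22 | 1.03 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | medium | 1 | 2891.58 | 18.21 | 9.18 | 2.53 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | medium-q5_0 | 1 | 2792.70 | 7.89 | 2.94 | 2.47 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | medium-q5_1 | 1 | 2792.82 | 7.94 | 2.80 | 2.48 | c8b3bc6 |
| GTX 1660 | AVX2 BLAS CUDA | medium-dis | 1 | 2890.40 | 5.22 | 1.46 | 0.42 | c8b3bc6 |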
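One way to read the master and PR tables for the GTX 1660 side by side is to take the ratio master / PR for each model. The sketch below does that for the Enc. and Dec. columns, using only the numbers reported in the two tables; values above 1 mean the PR build is faster. The names `ENC`, `DEC` and `speedup` are hypothetical and not part of whisper.cpp.

```python
# Hypothetical helper (not part of whisper.cpp) comparing the two GTX 1660 tables.
# Each entry maps model -> (master f52e74d, PR c8b3bc6), copied from the tables.
ENC = {
    "tiny": (105.95, 105.09),
    "tiny-q5_0": (106.29, 107.15),
    "tiny-q5_1": (106.35, 107.12),
    "base": (236.20, 232.68),
    "base-q5_0": (237.01, 238.91),
    "base-q5_1": (237.01, 238.63),
    "small": (890.34, 921.25),
    "small-q5_0": (893.45, 899.08),
    "small-q5_1": (893.44, 899.34),
    "medium": (2767.58, 2891.58),
    "medium-q5_0": (2779.65, 2792.70),
    "medium-q5_1": (2779.30, 2792.82),
    "medium-dis": (2766.26, 2890.40),
}
DEC = {
    "tiny": (5.90, 1.19),
    "tiny-q5_0": (2.63, 0.96),
    "tiny-q5_1": (2.62, 0.96),
    "base": (7.56, 1.73),
    "base-q5_0": (4.18, 1.42),
    "base-q5_1": (4.17, 1.44),
    "small": (21.70, 6.14),
    "small-q5_0": (11.79, 3.48),
    "small-q5_1": (11.77, 3.44),
    "medium": (54.52, 18.21),
    "medium-q5_0": (26.92, 7.89),
    "medium-q5_1": (26.96, 7.94),
    "medium-dis": (7.75, 5.22),
}


def speedup(column: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return master_time / pr_time per model; > 1 means the PR build is faster."""
    return {model: master / pr for model, (master, pr) in column.items()}


if __name__ == "__main__":
    enc, dec = speedup(ENC), speedup(DEC)
    for model in DEC:
        print(f"{model:<12} Enc. x{enc[model]:.2f}  Dec. x{dec[model]:.2f}")
```

With these numbers the Dec. ratio ranges from about 1.5x (medium-dis) to about 5x (tiny), while Enc. stays within about 5% of master, with small, medium and medium-dis a few percent slower on the PR. The Bch5 column, left out of the sketch, moves the other way: it is slower on the PR in every row.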

@Alvarocda commented

Is there an estimate of when this PR will be merged?
It works correctly in the tests I ran on my Quadro T1000.

@RaitoBezarius commented Oct 9, 2024

I tested this PR on a Compute Capability 3.7 GPU and, FWIW, it doesn't seem to be supported there. I assume this is because of the mixing of FP16 and FP32.
