build log

Hello! I finally put the triple 3090 build in a case. It was a bit of a struggle, and I cut the fuck out of my hands, but I got it done! Quite happy with the result, and it's good peace of mind that a water spill won't fry $3,000 in hardware.


The lads

The lads pt2


CPU + Mobo in.

Mobo closeup

Empty right side.


GPUs in.

GPUs in, pt2.

Close-up on the vertically mounted GPU.


Finished build

Finished build pt2


info

Specs:
ASRock X670E PG Lightning
Ryzen 5 7600 (CPU inference is cringe)
128GB DDR5 6000MHz
1500W Corsair PSU
Anidees Raider XL
6TB storage
RTX 3090 MSI X Trio
RTX 3090 GIGABYTE OC
RTX 3090 EVGA FTW3


Inference information:
All tests done using TabbyAPI, with cached prompts. (Generate once, then regenerate)
Mixtral-8x7b @ 8bpw | 24 tokens per second, 13000 context
goliath-120b @ 4.5bpw | 5.4 tokens per second, 4096 context
Xwin-70b @ 7bpw | 13.1 tokens per second, 4096 context
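For context on why those quant sizes fit, the weights-only VRAM footprint is roughly parameters × bits-per-weight ÷ 8. A quick sketch (the function name is mine, and this ignores KV cache, activations, and framework overhead):

```python
def weight_vram_gb(n_params_billion, bpw):
    """Rough VRAM (GB) for the quantized weights alone,
    ignoring KV cache, activations, and framework overhead."""
    # billions of params * bits per weight / 8 bits per byte = GB
    return n_params_billion * bpw / 8

# goliath-120b @ 4.5bpw -> 67.5 GB of weights, which is why it
# needs nearly all of the 72 GB spread across three 3090s
print(weight_vram_gb(120, 4.5))  # 67.5
```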

As a control, I wanted to test the slowdown from splitting a model across GPUs.
Mistral-7b @ fp (split) | 24 tokens per second, 13000 context
Mistral-7b @ fp (one card) | 41 tokens per second, 13000 context
It would seem there is a significant drop in speed. The cards are not NVLinked, and not all of them are in x16 slots, which is probably why the drop-off is so massive. It's not a big deal though; almost every model still runs at a usable speed. Although if we end up getting good 175Bs, I might look into a better motherboard with more lanes.
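The two Mistral runs above make the drop easy to quantify (the helper name is mine):

```python
def pct_drop(single_card_tps, split_tps):
    """Percent slowdown going from one card to a multi-GPU split."""
    return (single_card_tps - split_tps) / single_card_tps * 100

# Mistral-7b: 41 tok/s on one card vs 24 tok/s split across GPUs
print(f"{pct_drop(41, 24):.0f}% slower when split")  # roughly 41%
```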


GPU Draw during inference (Goliath):

What's nice about GPU inference is that it's pretty hard to make your computer, and thus your space, hot with it. It only pushes power through the cards for a few seconds at a time, and even less so with faster, smaller models like Mixtral.
The speed is mostly determined by your "main" graphics card, which in my case is the MSI X Trio. I made sure this card was never used for mining and was refurbished; even though it cost me an extra $70, it was good peace of mind. The other two... well, they're just glorified RAM sticks; they aren't doing much compute.

Pub: 05 Jan 2024 09:15 UTC
Edit: 31 Aug 2024 09:27 UTC