CLIP-fix vs. No CLIP-fix

Below you will find two versions of this model that share similar version numbers. From v6.5 onward, I will include an optional version of the model that incorporates a patch for an issue that can arise when merging models.

The issue:

When merging models, a CLIP key called embeddings.position_ids can deviate from its initial position value(s). This can result in distortions or dropped chunks.
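To make "deviation" concrete, here is a minimal sketch of how the drift could be measured. It assumes the SD v1.x layout, where the position ids are the 77 integers 0 through 76, and it uses plain Python lists instead of real checkpoint tensors so it runs without any dependencies. The function names and the sample values are made up for illustration. The second helper reflects one common explanation of the damage (a drifted value gets truncated to the wrong integer position), which is my understanding rather than something stated above.

```python
# Expected CLIP position ids for an SD v1.x text encoder: 0, 1, ..., 76.
# (Assumption: 77 token positions, as in standard CLIP.)
EXPECTED_POSITION_IDS = list(range(77))


def max_deviation(position_ids):
    """Largest absolute gap between the stored ids and the expected 0..76."""
    if len(position_ids) != len(EXPECTED_POSITION_IDS):
        raise ValueError("expected exactly 77 position ids")
    return max(
        abs(float(value) - expected)
        for value, expected in zip(position_ids, EXPECTED_POSITION_IDS)
    )


def broken_positions(position_ids):
    """Indices whose value no longer truncates to the expected integer."""
    return [i for i, value in enumerate(position_ids) if int(value) != i]


# Hypothetical merged values: position 41 has drifted down to 40.75.
merged = [float(i) for i in range(77)]
merged[41] = 40.75

print(max_deviation(merged))     # size of the worst drift
print(broken_positions(merged))  # positions that would map to the wrong slot
```

A max deviation of zero means the key is untouched. A tiny non-zero value may be harmless, while a value large enough to land on the wrong integer is the kind of damage the fix is meant for.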

The solution:

Resetting embeddings.position_ids to its default position values, stored as dtype=int64.
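The reset itself can be sketched in a few lines. This is an illustration under stated assumptions, not the exact tool I used: the checkpoint is modelled as a plain dict with nested lists instead of tensors, the default is assumed to be the 77 integers 0 through 76 in a (1, 77) shape, and the full key name in the usage example is the one I believe SD v1.x checkpoints use. With real PyTorch tensors, the equivalent replacement value would be torch.arange(77, dtype=torch.int64).unsqueeze(0).

```python
POSITION_IDS_SUFFIX = "embeddings.position_ids"
NUM_POSITIONS = 77  # assumption: standard CLIP context length


def reset_position_ids(state_dict):
    """Return a shallow copy with every position_ids entry reset to 0..76."""
    fixed = dict(state_dict)
    for key in state_dict:
        if key.endswith(POSITION_IDS_SUFFIX):
            # Shape (1, 77) with integer values, mirroring the int64 default.
            fixed[key] = [list(range(NUM_POSITIONS))]
    return fixed


# Hypothetical merged checkpoint with drifted float position ids.
clip_key = "cond_stage_model.transformer.text_model.embeddings.position_ids"
merged_checkpoint = {
    clip_key: [[i + 0.4 for i in range(NUM_POSITIONS)]],
    "some.other.weight": [0.1, 0.2, 0.3],  # left untouched by the fix
}

fixed_checkpoint = reset_position_ids(merged_checkpoint)
print(fixed_checkpoint[clip_key][0][:5])
```

Matching on the key suffix instead of the full name keeps the sketch usable if the prefix differs between checkpoint formats. Every other weight passes through unchanged.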

The result:

This is where things get a bit complicated. I would love to say that fixing the CLIP key always results in a better output, but that's not the case. Sometimes fixing the CLIP key can actually change the model's output for the worse. The results vary on a case-by-case basis. If a model has a badly deviated CLIP tensor, the CLIP fix will improve the model's output. However, if the max deviation is low and the CLIP fix is applied, the model might lose some of its uniqueness. In that case, choosing between the CLIP-fixed and non-fixed versions really comes down to personal preference. I will always put my preferred version of the model in the top slot of the version section below. If a merge has an incredibly broken CLIP key, I will only upload the fixed version.

General Recommendations:

Given that I will never upload a model with a noticeably broken CLIP key, the decision really comes down to your personal preference. Here are some basic observations from my testing:

The CLIP-fixed version tends to resemble standard SD v1.5 more than the non-fixed version.

The CLIP-fixed version seems to react more to tokens without attention/emphasis tags () [].

The CLIP-fixed version is less likely to screw up hands/eyes.

The non-fixed version tends to generate more creative and unique outputs.

The non-fixed version reacts strongly to attention/emphasis tags () [].

The non-fixed version can occasionally generate outputs with messed up hands/eyes. This can be fixed with tokens such as (perfect hands:1.2) and (detailed eyes:1.2).

Note: v6.5 has a max deviation of 0.00001, as opposed to 0.00000 for the CLIP-fix version.
This is why I didn't feel it was necessary to exclude the non-fixed v6.5 from the upload.

This information is based on independent testing and this research paper: https://note.com/bbcmc/n/n12c05bf109cc

Pub: 17 Feb 2023 23:22 UTC