Focusing our “Attention” on Generative AI

Prompt and the output generated by Stable Diffusion 2, with selective attention given to only certain words [Source: Author]
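
The idea of giving some prompt tokens more weight than others can be sketched in a few lines. Below is a minimal illustration of weighted cross-attention, not Stable Diffusion 2’s actual implementation; the tensor shapes and the `token_weights` parameter are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def weighted_cross_attention(queries, keys, values, token_weights):
    """Scaled dot-product cross-attention with per-token emphasis.

    token_weights up- or down-weights individual prompt tokens
    (e.g. boosting "castle" while damping filler words), which is
    the intuition behind selective attention to certain words.
    """
    d_k = queries.size(-1)
    scores = queries @ keys.transpose(-2, -1) / d_k ** 0.5   # (q_len, k_len)
    # Bias the logits with log-weights so a weight of 1.0 is a no-op.
    scores = scores + torch.log(token_weights).unsqueeze(0)
    attn = F.softmax(scores, dim=-1)
    return attn @ values

# Toy usage: 4 image queries attending over a 6-token prompt embedding.
q, k, v = torch.randn(4, 64), torch.randn(6, 64), torch.randn(6, 64)
weights = torch.tensor([1.0, 1.0, 2.5, 1.0, 0.5, 1.0])  # emphasize token 2
print(weighted_cross_attention(q, k, v, weights).shape)  # torch.Size([4, 64])
```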

Giving a gist of VQGAN-CLIP

Traditional VQGAN-CLIP diagram + custom losses. NOTE: This isn’t our actual architecture, which is still in progress, but the image above gives a rough overview.
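
For readers who prefer code to diagrams, here is a rough sketch of the classic VQGAN-CLIP optimization loop with a hook for extra loss terms (the “+ custom losses” box above). The `vqgan`, `clip`, and `custom_losses` handles are hypothetical placeholders, not our architecture.

```python
import torch

def vqgan_clip_step(latents, text_embed, vqgan, clip, custom_losses, optimizer):
    """One optimization step of a VQGAN-CLIP-style loop.

    The text prompt stays fixed; we optimize the VQGAN latents so the
    decoded image moves toward the prompt under CLIP, plus any custom
    loss terms. `vqgan.decode` and `clip.encode_image` are placeholder
    names for the decoder and image encoder.
    """
    optimizer.zero_grad()
    image = vqgan.decode(latents)             # latents -> RGB image
    image_embed = clip.encode_image(image)    # CLIP image embedding
    # Cosine distance between the image and text embeddings.
    loss = 1 - torch.cosine_similarity(image_embed, text_embed, dim=-1).mean()
    for extra in custom_losses:               # e.g. total-variation loss
        loss = loss + extra(image)
    loss.backward()                           # gradients flow into latents
    optimizer.step()
    return loss.item()
```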

Something intriguing…

Our architecture, when cross-pollinated with inputs from different genres, produces nuanced outputs, but it’s still missing something.
Adding attention brought noticeably more granularity to the generated image.
Our custom attention training pipeline.
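
The pipeline itself stays diagrammatic here, but as a rough sketch of what such a loop might look like, assuming the base generator is frozen and only the inserted attention layers are trained (`generator`, `attn_layers`, and the data format are all hypothetical stand-ins):

```python
import torch

def train_attention(generator, attn_layers, dataloader, loss_fn, epochs=1):
    """Hypothetical loop: freeze the backbone, train only attention."""
    for p in generator.parameters():
        p.requires_grad_(False)               # freeze the base model
    optimizer = torch.optim.AdamW(attn_layers.parameters(), lr=1e-4)
    for _ in range(epochs):
        for prompt_embed, target in dataloader:
            features = generator(prompt_embed)   # frozen backbone
            refined = attn_layers(features)      # trainable attention
            loss = loss_fn(refined, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```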

Custom attention layers, when trained properly, can be chained to add successive layers of complexity to your output.
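
As one possible reading of “chaining”, the sketch below stacks several self-attention blocks so that each block re-attends over the previous one’s output; the dimensions and depth are illustrative, not our trained configuration.

```python
import torch
import torch.nn as nn

class ChainedAttention(nn.Module):
    """Stack of self-attention blocks, each adding a layer of refinement."""
    def __init__(self, dim=64, heads=4, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(depth)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))

    def forward(self, x):
        for attn, norm in zip(self.blocks, self.norms):
            out, _ = attn(x, x, x)   # self-attention over current features
            x = norm(x + out)        # residual + norm, standard recipe
        return x

model = ChainedAttention()
features = torch.randn(2, 16, 64)    # (batch, tokens, dim)
print(model(features).shape)         # torch.Size([2, 16, 64])
```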
