Ralph Losey. Published August 20, 2024.
The images shown here are to demonstrate some of the current abilities of DALL-E. They were all created by Ralph Losey using his custom GPT, Visual Muse: illustrating concepts with style, which is driven by OpenAI’s DALL-E software. Ralph has chosen one of his favorite types of images for this demonstration – “optical illusions” – since he does not often get a chance to use this image type in his blog. These images will be shown in a variety of different artistic styles, especially that of Salvador Dalí, who is known for his love of optical illusions.



Introduction
The first images shown above to demonstrate DALL-E’s capabilities are a type of “Op Art” in a classic black-and-white geometric style. All illustrations were created in a single day, in about four hours, with about half coming out right – the way Ralph wanted – on the first try. A 50% precision rate like that is unusually high for him. Many of the images were left out to save space. Ralph’s workflow also includes Photoshop for final tuning and resizing. The research and writing took about three hours of Ralph’s time, with roughly 50% help from ChatGPT-4o (“omni”) using a cyborg method. See From Centaurs To Cyborgs: Our evolving relationship with generative AI (e-Discovery Team, 4/24/24).
The development of DALL·E, OpenAI’s generative AI image model, from its first release in January 2021 to today represents a significant achievement in the field of AI-driven image generation. This article discusses the broad outlines of that development, but all images shown are from the latest version, of August 2024. For more examples of what DALL-E is capable of, see the hundreds of Ralph’s illustrations on the e-Discovery Team blog. A few were created using OpenAI’s main competitor in image generation, Midjourney.
Ralph’s blog images usually illustrate the topics discussed in the accompanying text. For Losey they represent a new form of expression where words, images, and hyperlinks form a multimodal whole, created by a hybrid combination of Man and Machine. In his blog the human – Ralph – does almost all the work on the text, including most of the research, and the AI does most of the work on the illustrations. Those familiar with Ralph’s work in e-Discovery know this mirrors his work with multimodal hybrid search techniques, i.e., Predictive Coding. They are described in detail in the free TAR Course linked at the top of the blog.


The History of OpenAI’s Generative AI Image Tool: DALL-E
1. Initial Release: DALL·E 1 (January 2021)
The initial release of DALL·E came in January 2021. The name is a portmanteau honoring the surrealist artist Salvador Dalí and Pixar’s WALL-E. OpenAI’s release of DALL-E was a breakthrough moment for generative models that create images from text descriptions. DALL·E 1 used a modified GPT-3 architecture to generate images from text prompts. Although remarkable compared to what others had achieved before, this first model had limitations in image coherence, resolution, and the ability to accurately represent complex scenes. The underlying transformer architecture was effective at generating diverse and creative outputs, but the model struggled to maintain consistency and realism across different elements of an image.
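For readers who want to experiment beyond the chat interface, the same text-to-image workflow DALL·E introduced is now exposed programmatically. Below is a minimal sketch assuming OpenAI’s current Python SDK and an API key in the environment; the `build_prompt` helper and the style wording are illustrative assumptions, not taken from the article.

```python
def build_prompt(subject: str, style: str) -> str:
    """Compose a simple text-to-image prompt from a subject and an art style.

    This helper is an illustrative assumption, not part of any OpenAI API.
    """
    return f"{subject}, rendered in the style of {style}, as an optical illusion"


def generate_image(prompt: str, size: str = "1024x1024") -> str:
    """Request one image from DALL-E 3 and return the hosted image URL.

    Requires the `openai` package and an OPENAI_API_KEY environment variable.
    """
    from openai import OpenAI  # imported here so the prompt helper stays dependency-free

    client = OpenAI()
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size=size,
        n=1,
    )
    return response.data[0].url


if __name__ == "__main__":
    prompt = build_prompt("a melting staircase", "Salvador Dali")
    print(prompt)
    # Uncomment to actually call the API (incurs cost, needs a valid key):
    # print(generate_image(prompt))
```

The heavy lifting happens server-side; the client simply sends a prompt string and receives a URL to the generated image, which is the same basic loop Visual Muse automates inside ChatGPT.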

2. Introduction of DALL·E 2 (April 2022)
In April 2022, OpenAI released DALL·E 2, which introduced several critical improvements:
• Enhanced Resolution and Image Quality: DALL·E 2 featured improvements in the model’s ability to generate higher-resolution images with finer details. This was achieved by refining the underlying generative process, likely involving improvements to the training dataset and the use of more sophisticated, higher-quality training images.
• Improved Compositional Understanding: The model demonstrated enhanced capabilities in handling complex prompts that required the accurate rendering of multiple objects and interactions. This improvement can be attributed to advancements in the model’s attention mechanisms, enabling better spatial awareness and coherence in generated images.
• Advanced Control Mechanisms: Users were provided with more granular control over image attributes such as style, color, and composition. This was likely facilitated by the integration of additional conditioning layers or modules within the model architecture, allowing for more targeted manipulation of the generated outputs.

3. August 2023 Update: Refinement and Realism
The August 2023 update was the last full training update to the model. It brought significant refinements to DALL·E’s functionality, focusing on realism, detail, and user customization:
• Increased Realism and Texture Fidelity: The model’s ability to generate photorealistic images was markedly improved, particularly in rendering textures, lighting, and shadows. These enhancements suggest advancements in the model’s ability to learn and apply high-fidelity visual patterns from training data, potentially through the use of more complex loss functions and training techniques that prioritize visual accuracy.
• Enhanced Text Integration: DALL·E’s capability to incorporate textual elements within images saw notable improvement. This likely involved the refinement of text-to-image embedding processes and a better alignment between text tokens and their corresponding visual representations within the model.
• User Customization: The update provided users with increased control over specific aspects of image generation, such as adjusting the perspective or selecting a particular art style. This was achieved through the introduction of more sophisticated user interfaces and the likely addition of new conditional input mechanisms within the model.
• Feedback-Driven Optimization: The update also integrated feedback from users, leading to iterative adjustments that enhanced the model’s overall performance and usability. This process likely involved fine-tuning the model on user-provided data or leveraging reinforcement learning techniques to align the model outputs more closely with user preferences.

4. Ongoing Enhancements (Post-August 2023)
After the August 2023 update, DALL·E has continued to evolve with ongoing technical enhancements:
• Refinement of Image Generation: Continuous improvements have been made to the model’s image generation capabilities, particularly in handling edge cases and complex scene compositions. These refinements suggest iterative updates to the model’s training regimen, potentially involving more diverse and higher-quality datasets.
• Increased Processing Efficiency: The model has seen improvements in processing speed, reducing latency in image generation. This is indicative of optimizations in the model’s computational efficiency, likely through algorithmic refinements or the adoption of more efficient neural network architectures.
• Advanced Control Features: The introduction of more nuanced control features has provided users with the ability to manipulate image attributes with greater precision. These features likely involve the integration of additional conditioning factors within the model, allowing for more detailed user input.

5. Interface and Usability Enhancements
In addition to technical improvements, there have been significant updates to the DALL·E user interface and overall usability:
• Improved User Interface: The interface has been refined to offer a more intuitive user experience, facilitating easier access to advanced features. This likely involved the integration of better design principles and user experience research into the interface development process.
• Accessibility Enhancements: Updates have been made to improve accessibility, ensuring that the platform is usable by a broader audience, including individuals with disabilities. This may involve the adoption of accessibility standards in interface design and the introduction of assistive technologies.
• Collaborative Functionality: The platform has introduced features that support collaborative use cases, enabling multiple users to contribute to the image generation process. This functionality suggests the integration of multi-user input mechanisms and enhanced session management capabilities.
6. Industry-Specific Tools and Content Moderation
Recent developments have also focused on the introduction of tools tailored to specific industries and the enhancement of content moderation mechanisms:
• Industry-Specific Tools: DALL·E has introduced features designed to meet the needs of particular industries, such as fashion, architecture, and graphic design. These tools likely involve the addition of domain-specific models or fine-tuning the base model on industry-specific datasets.
• Content Moderation Enhancements: There have been improvements in content moderation, ensuring that generated images adhere to ethical standards and legal requirements. This likely involves the integration of content-filtering algorithms and the use of human-in-the-loop processes to monitor and curate outputs. In Ralph’s opinion, the moderation sometimes goes overboard in policing potential copyright violations and other guardrails. Unlike Midjourney, DALL-E’s main competitor, OpenAI offers no instant appeals and, where warranted, reversals. That can be annoying.

Conclusion: Ongoing Development and Legal Implications
The evolution of DALL·E underscores the rapid pace of advancement in generative AI technology. This is a powerful, fun new tool for all creators to make their own images and play with the incredible abilities of generative AI. If you just stick to words and computer code, you will miss out.
Plus, it is getting better every month. The kinds of things you can do with it now are mind-bending. You may think it is all an optical illusion, but it is not. It is a great time to be alive. For me it is a relaxing hobby. That’s one reason I made, frequently update, and freely share the Visual Muse custom GPT. It is available at the OpenAI Custom GPT Store, along with thousands of other free GPTs to try out. If you like visual images and want to move from the beginner level to the intermediate and advanced levels of DALL-E use, this may be a good tool for you. Plus, it can help teach you about artists and styles of art.
Each OpenAI update to DALL-E not only improves the creative capabilities of the model but also raises important legal and ethical questions about AI-generated content. As DALL·E continues to develop, legal professionals must remain vigilant in understanding these advancements to effectively navigate the associated legal challenges, including intellectual property rights, content moderation, and the ethical use of AI. To do that, it helps to be able to use the tools yourself, at least somewhat. Generative AI has to be used to be understood. Otherwise, no matter how smart you are, your understanding will be superficial, maybe even illusory.

Ralph Losey Copyright 2024 — All Rights Reserved
