Goodbye, graphic designers? COLE combines multiple AIs to generate editable designs on demand

Graphic designers and those who rely upon them take note: a new tool is here that could seemingly disrupt the profession for good.

Called COLE, named in honor of Henry Cole, recognized as the creator of the first graphical Christmas card in 1843, the new tool allows users to type in a graphic design project idea — say, “a poster for an upcoming Winter Holiday concert with people playing instruments in warm clothes among falling snow” — and have an AI generate not only the image, but the text to support it baked in.

COLE is actually a combination of different AI models — including fine-tuned versions of Meta’s Llama2-13B, DeepFloyd IF, LLaVA1.5-13B (itself a variant of Llama), and GPT-4V — as well as the open-source graphics renderer Skia. It was developed by a team of 12 researchers at Microsoft Research Asia and Peking University.

The combination of different models was chosen because of the complexity of graphic design and the dearth of available training data on one of the field’s main formats: .SVG files. To work around that shortage, the researchers came up with a different approach: “consolidating all SVG elements and additional embellishments into one unified image layer,” then having AI extract the background layer and describe that in text.

The COLE team trained their background modeler AI on “100,000 high-quality raw graphic design images from the internet.”

A framework, not a product…yet

As such, COLE is more like a framework than a product for now. But the results the team got from training and combining these different AI products in the service of graphic design are pretty stunning: given simple text prompts of the kind used with other current text-to-image generators such as OpenAI’s DALL-E 3 or Midjourney, COLE was able to generate crisp, organized graphic designs that combined visuals with stylized text.

The latter is no easy feat: text baked into imagery has been challenging for most AI art generators, including leaders such as Midjourney and Stable Diffusion. DALL-E 3 can produce baked-in text, but it is not 100% accurate.

Auto-generated designs with editable text and visual elements

Even more impressively, COLE produces images with distinct editable blocks for texts and objects within the image.

This means the daisy-chained AI programs can produce an image from scratch, and if the human user doesn’t like the end result, they don’t have to go back and revise the entire design, nor do they have to export it to another program such as Adobe Photoshop or InDesign to erase certain elements and introduce new ones.

They can do it right within the COLE framework itself, clicking on the text box to change the text displayed or the font, as well as typing new prompts for different visual elements, turning a grocery bag from a photorealistic picture to a cartoon, for example.

Image from COLE paper showing editable elements in AI generated graphic designs. Credit: Microsoft Research Asia / Peking University
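To make the idea of "distinct editable blocks" concrete, here is a minimal Python sketch of how a layered design could be represented so that one element changes while the rest stays put. This is an illustration only, not COLE's actual data model: the class names, field names, and the `edit_text` and `edit_object` helpers are all hypothetical, and the sketch models a single typography block with one color per design.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical editable typography block (one per design)."""
    text: str
    font: str
    color: str


@dataclass(frozen=True)
class ObjectBlock:
    """Hypothetical editable visual element, described by a text prompt."""
    prompt: str


@dataclass(frozen=True)
class Design:
    """Hypothetical layered design: background, objects, typography."""
    background_prompt: str
    objects: Tuple[ObjectBlock, ...]
    typography: TextBlock


def edit_text(design: Design, text: Optional[str] = None,
              font: Optional[str] = None) -> Design:
    """Return a copy with only the typography block changed."""
    current = design.typography
    updated = replace(
        current,
        text=current.text if text is None else text,
        font=current.font if font is None else font,
    )
    return replace(design, typography=updated)


def edit_object(design: Design, index: int, new_prompt: str) -> Design:
    """Return a copy with one object block re-prompted; others untouched."""
    objects = list(design.objects)
    objects[index] = replace(objects[index], prompt=new_prompt)
    return replace(design, objects=tuple(objects))


poster = Design(
    background_prompt="falling snow at dusk",
    objects=(ObjectBlock("photorealistic grocery bag"),),
    typography=TextBlock("Winter Holiday Concert", "serif", "white"),
)

# Change the headline and restyle one object without regenerating the rest.
revised = edit_text(poster, text="Winter Concert 2023")
revised = edit_object(revised, 0, "cartoon grocery bag")
```

Because each edit returns a new `Design` and leaves the other fields alone, the background and any untouched blocks carry over unchanged, which is the property the article describes.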

As the researchers describe the system in a paper published this week on the open access site arXiv: “A scalable, high-quality graphic design generation system should ideally require minimal effort from users, produce accurate and high-quality typography information for a variety of purposes, and offer a flexible editing space.”

With COLE, they have achieved this.

Competitive and promising results

More than that, the researchers show that the results COLE spits out are of “very competitive quality… even compared to the latest DALL·E 3.”

The researchers tested COLE on 200 different graphic design projects, from advertisements to event promotions and marketing materials, posting all the prompts they used in a spreadsheet here.

In addition, COLE “achieves the best quality when generating covers & headers or posters,” and is of course more capable than DALL-E 3 and other rivals when it comes to editing specific elements within the image, such as text and distinct objects.

Yet COLE is no magic bullet for graphic design, at least not yet. The system does not allow users to change the “arrangement” or placement of its typography block, nor does it yet support placing multiple typography blocks, and it only allows for one color of typography per image. However, the researchers write that “addressing these issues is a direction we’d like to pursue in our future work.”

Good graphic design is something many people take for granted, but when done expertly, it can be an art unto itself.

That is why people collect film and concert posters and hang them in their homes and offices: not only to remember events they attended and to show off their taste or allegiances, but also because the posters are aesthetically pleasing. The same is true for even more functional graphic designs, such as those appearing on road signs or license plates.

Does COLE threaten to put graphic designers out of work? Yes and no. The researchers specifically designed it to produce imagery with editable fields so that it would “allow users to further refine the output, integrating human expertise when necessary,” suggesting that graphic design training would still be useful in getting the best results from the AI framework.

However, they also point to “a task in graphic design generation that typically requires a high degree of professional expertise to develop effective prompts.” In comparison to other text-to-image generators such as DALL-E 3, which the researchers cite by name, “our COLE system…is capable of generating superior quality graphic design images while only necessitating simple user intention.”

Put another way: the researchers seem to believe that COLE would allow those without graphic design training or expertise to be able to generate high-quality designs on par with trained professionals.

Of course, this “graphic design tool for the masses” approach has already been put forth by other companies, including Adobe, and more recently, Canva. Therefore, COLE would seem to be more of a threat, or perhaps one day a complement (such as a feature), to those companies and their offerings.

For now, COLE is not publicly available, but the researchers say a demo is coming soon to their GitHub project webpage.


Carl Franzen
