OpenAI Announces DALL-E 3, Bringing Image Generation to ChatGPT: Image-Ready Prompts Are Generated Automatically from the Conversation

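OpenAI announced its image-generation AI, DALL-E 3, on September 20 (US time). The model is currently in research preview and is scheduled to become available in early October 2023 to ChatGPT's paid subscribers, namely Plus and Enterprise users. According to the company, availability through the API and in Labs will follow later this fall.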

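DALL-E 3 is the latest version of DALL-E, which generates images from text. Its defining feature is its integration with ChatGPT, the company's conversational AI: users generate images by entering a text prompt in ChatGPT.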

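Based on the prompt the user enters, ChatGPT automatically generates a DALL-E 3 prompt suited to the requested image and adjusts the instructions accordingly. Complex images described in long prompts can also be generated, and fine adjustments to an image can be made through ChatGPT.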


▲Prompt: "A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form."

▲Prompt: "A vibrant yellow banana-shaped couch sits in a cozy living room, its curve cradling a pile of colorful cushions. On the wooden floor, a patterned rug adds a touch of eclectic charm, and a potted plant sits in the corner, reaching towards the sunlight filtering through the window."

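OpenAI says that earlier image-generation AI tended to "ignore words or descriptions," which forced users to learn prompt engineering. With DALL-E 3, a prompt suited to image generation is therefore produced automatically in line with the text the user enters, and the company says this has dramatically improved the model's ability to generate the intended image.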

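The images below are said to have been generated from the same prompt by the predecessor DALL-E 2 (left) and by DALL-E 3 (right). Differences are visible in the overall composition and in how the dunk is depicted.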

[Image: DALL-E 2 output (left) and DALL-E 3 output (right)]

▲Prompt: "An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula."

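This release also adds creative controls. DALL-E 3 is designed to decline requests to generate images in the style of a living artist. In addition, creators can now have their own images excluded from the training of image-generation models. Exclusion must be requested through a dedicated form.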

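On safety, as with DALL-E 2, the generation of harmful images, such as violent or hateful content, is restricted. OpenAI is also researching ways to make it possible to identify when an image was generated by AI. The company says it will share more information at a later date and has not disclosed details at this time.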
