Wie man ChatGPT beibringt, bessere Befehle für Midjourney zu erstellen

Dies ist kein echtes Foto. Das Bild wurde von einer Bild-KI namens Midjourney erstellt. Der dazupassende Prompt wurde von einer KI namens ChatGPT generiert und lautete: "Capture the excitement of the bustling Italian streets as a young man rides his Vespa through the winding roads. Use a wide-angle lens to showcase the beautiful Italian architecture in the background and capture the sense of movement by setting a shutter speed of 1/1000 sec. Experiment with the aperture to achieve a shallow depth of field and blur the background. Consider the photographic style of Henri Cartier-Bresson, known for his candid street photography, and aim to capture the decisive moment of the man weaving through the crowds. Don't forget to set your camera to shoot in RAW format to bring out the rich colors and details of the scene. --ar 16:9"

Mit dem Einzug von generativen künstlichen Intelligenzen wie der Text-KI ChatGPT sowie den Bildgeneratoren Midjourney, Dall-E und Stable Diffusion in den Arbeitsalltag wächst auch die Sorge, dass ebendiese KIs so manchen Kreativjob ersetzen könnten. Erste Beruhigungsversuche gingen hier in die Richtung, dass man dafür bald sogenannte "Prompt-Designer" brauchen werde – also Expertinnen, welche die KI mit den richtigen Befehlen ("Prompts") füttern, sodass diese auch das richtige Ergebnis ausspucken.

Ein Experiment des STANDARD wirft nun die Frage auf, ob dieser Job nicht schon wieder obsolet ist – weil er nämlich ausgerechnet durch eine KI ersetzt werden könnte. So ist es möglich, ChatGPT darauf zu trainieren, bessere und ausführlichere Prompts für Midjourney zu erstellen. Dabei führen unterschiedliche Wege zum Ziel, eine grobe Anleitung wird an dieser Stelle aber gerne geteilt.

Den richtigen Kontext festlegen

Begonnen wird das Projekt durch das Aufrufen von und Einloggen bei ChatGPT. Anschließend wird ein neuer Chat mit der KI gestartet. Das ist wichtig, da ChatGPT immer im Kontext des jeweiligen Chats lernt und wir die KI anschließend mit etlichen Informationen füttern werden.

So ist es wichtig, der KI zuerst den Kontext zu vermitteln, in dem sie agiert. Sie soll sich also etwa als Fotograf oder als Grafikdesignerin ausgeben. Eine Anweisung wäre etwa: "Imagine that you are a graphic designer writing prompts for the generation of realistic photos with midjourney."

Zudem muss ChatGPT zuerst lernen, was Midjourney überhaupt ist. Dazu können zum Beispiel auch Informationen von Wikipedia hinzugefügt werden. ChatGPT lernt dadurch, dass Midjourney eine generative Bild-KI ist und auf die entsprechenden Befehle hin Bilder erstellt. Anschließend kann ChatGPT mit Informationen über die Prompts und Parameter gefüttert werden, die es auf der offiziellen Website von Midjourney gibt.

All diese Informationen sollten ChatGPT in einer Art dargebracht werden, wie man sie einem etwas dümmlichen Kind erklären würde: Das Copy-Pasten der jeweiligen Informationen in das Chatfenster sollte mit dem Hinweis versehen werden, dass hier Informationen bereitgestellt werden, die zum Lernen dienen. Und dass erwartet wird, dass diese Informationen später integriert werden, wenn man nach einem Midjourney-Prompt fragt.

Weitere Inputs und erste Ergebnisse

Ab diesem Zeitpunkt kann das Gespräch mit ChatGPT immer weiter verfeinert werden, indem weitere Informationen hinzugefügt werden. So wurde in unserem Versuch eine Liste verschiedener Kameratypen in den Chat geladen – erneut garniert mit dem Hinweis, dass das Erlernte in die eingeforderten Prompts integriert werden soll.

Sind diese Schritte einmal gesetzt, so kann ChatGPT mit einer simplen Aufforderung gefüttert werden und spuckt anschließend einen deutlich ausführlicheren Prompt für Midjourney aus. So brachte der simple Befehl "Generate an image midjourney prompt of a viking warrior" den folgen detaillierten Prompt hervor:

"As you gaze upon the image of the fierce Viking warrior before you, you can't help but feel a sense of awe at their intimidating presence. The warrior's muscular arms are adorned with intricate tattoos, marking their strength and bravery in battle. Their long, flowing hair and beard give them a wild and untamed appearance, as if they were born to conquer and dominate.

As you continue to study the image, you notice the warrior's battle-worn armor and weapons. The chainmail clinks softly as the warrior shifts their weight, ready to take on any challenger. The sharp edges of their sword and axe glint menacingly in the light, evidence of the many battles they have fought and won.

But despite their fearsome reputation and battle-hardened demeanor, there is a quiet confidence in the Viking warrior's eyes. They know they are a force to be reckoned with, and that no foe can stand in their way. As you look upon this image of a true Viking warrior, you can't help but feel inspired by their strength, courage, and unyielding determination."

Was hier auffällt: Die Ansagen sind noch recht unklar, und auch die eingeforderten Details zur Kamera fehlen. Auf die Aufforderung hin, dass diese inkludiert werden sollen und die Ansage präzise sein soll, bringt ChatGPT den folgenden Prompt hervor:

"As you stand before the image of a viking warrior, captured through a Canon EOS R6 with a 24-105mm f/4 lens, you can feel the power and strength emanating from the fierce warrior depicted. The warrior stands tall, with broad shoulders and a muscular build, holding a sword in one hand and a shield in the other. His face is fierce and determined, with a thick beard and long hair, suggesting years of battle experience. The details of his clothing and armor are impressive, with intricate patterns and textures adorning his leather and metal gear. You can almost hear the clanking of the metal and feel the weight of the shield in your own hands. This image captures the essence of a viking warrior, proud and fierce, ready to face any challenge that comes his way."

Wird dieser Befehl anschließend in Midjourney eingegeben, so lassen sich bereits teils fotorealistische Bilder erstellen.

Lernen von den Besten

Ab diesem Punkt geht die Arbeit ins Feintuning und in das stetige Optimieren über. Denn das World Wide Web ist bereits prall gefüllt mit diversen Blogs, Websites und Youtube-Channels, welche diverse Prompt-Elemente zum Verwenden im Midjourney empfehlen. Diese können verwendet werden, um ChatGPT weiter zu trainieren.

So enthält ein Blog etwa Informationen zu den verschiedenen Kamerawinkeln, ein anderer bietet unter anderem eine Sammlung verschiedener Kunstrichtungen. Und wieder ein anderer liefert eine Auflistung verschiedener Künstler und Fotografinnen, die als Liste in den Chat mit der KI geladen werden können.

Das Ergebnis ist, das ChatGPT anschließend Midjourney anweisen kann, ein Streetstyle-Foto im Stil eines bestimmten Fotografen zu produzieren. So wurde im Experiment gebeten, auf Basis des Erlernten einen Prompt für ein Streetstyle-Foto zu erstellen, das eine Frau beim Einkaufen in Bangalore zeigt. ChatGPTs Prompt dazu:

"As you continue your journey through the bustling streets of Bangalore, you come across a vibrant market where locals gather to buy fresh produce. You notice a woman with a colorful sari, skillfully weaving her way through the crowds while carrying a basket full of fruits and vegetables. Inspired by the street style photography of Henri Cartier-Bresson, you quickly grab your trusty DSLR camera equipped with a 50mm lens and set the aperture to f/2.8 to achieve a shallow depth of field. You also attach a polarizing filter to enhance the colors and reduce glare. As the woman stops to inspect some ripe tomatoes, you seize the moment to capture a candid shot of her immersed in her task, surrounded by the colorful market stalls and bustling crowds. You adjust your camera settings to a fast shutter speed of 1/500s to freeze the motion and capture the energy of the scene. With the 16:9 aspect ratio in mind, you compose the shot to emphasize the woman's determination and the vibrant atmosphere of the market."

Und die von Midjourney generierten Bilder:

Außerdem ist es möglich, ChatGPT gewisse technische Anweisungen zu geben. Etwa dass am Ende eines jeden Prompts der Befehl "--ar 16:9" stehen soll. Dieser bewirkt, dass Midjourney das Bild im 16:9-Format ausgibt. Auf den Befehl "Now, generate a midjourney prompt for a street photo of a young man riding a vespa through the streets of Italy" erstellte ChatGPT wiederum den folgenden Prompt:

"Capture the excitement of the bustling Italian streets as a young man rides his Vespa through the winding roads. Use a wide-angle lens to showcase the beautiful Italian architecture in the background and capture the sense of movement by setting a shutter speed of 1/1000 sec. Experiment with the aperture to achieve a shallow depth of field and blur the background. Consider the photographic style of Henri Cartier-Bresson, known for his candid street photography, and aim to capture the decisive moment of the man weaving through the crowds. Don't forget to set your camera to shoot in RAW format to bring out the rich colors and details of the scene. --ar 16:9"

Die daraus resultierenden Bilder von Midjourney sehen folgendermaßen aus:

Und dann gibt es freilich noch die Möglichkeit, ChatGPT mit den Prompts aus dem offiziellen Midjourney-Showcase zu trainieren. Auch dadurch lernt ChatGPT von den Besten, wie es bessere Arbeit machen kann.

ChatGPT ist vergesslich

Was an dem zuletzt genannten Ergebnis wiederum auffällt: Hier wurde seitens ChatGPT zwar der Stil des entsprechenden Fotografen eingefordert und eine passende Belichtungszeit gewählt, dafür wurden die Wahl der Kamera und des Objektivs unterschlagen. Der Rückschluss daraus: Je mehr ChatGPT lernt, desto mehr selektiert es, welche Aspekte es in den finalen Prompt schaffen. Hier sind regelmäßiges Feintuning und erneutes Einpflegen von Informationen das A und O.

Ist der Job des Prompt-Designers also schon wieder tot, so wie anfangs prognostiziert? Nein, ist er nicht – er verändert sich nur stetig und wird den entsprechenden Fachkräften ein sehr hohes Maß an Flexibilität abverlangen. Viel wichtiger dürfte aber ohnehin ein anderer Job werden, der im Zuge des KI-Booms entsteht: jener des KI-Ethikers. Denn wenn es immer einfacher wird, fotorealistische Fake-Bilder zu erstellen, dann drängt sich auch die Frage auf, was davon ethisch noch vertretbar ist und was nicht. (Stefan Mey, 28.3.2023)

Anleitung