Artificial Intelligence (AI)
Core Concepts: A broad field of computer science focused on building systems capable of performing tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.
Attention Mechanism
Model Architecture: A technique that allows models to focus on relevant parts of the input when generating each part of the output. Enables models to capture long-range dependencies in data.
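For intuition, a minimal NumPy sketch of scaled dot-product attention, the form used in Transformers, is shown below; all array sizes are illustrative.

```python
# A minimal sketch of scaled dot-product attention (illustrative sizes).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scores rate how relevant each input position (key) is to each
    # output position (query); softmax turns them into mixing weights.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V  # weighted mix of value vectors

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (3, 8): 3 outputs built from 4 inputs
```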
Batch Processing
Inference: Generating multiple outputs in a single request or processing multiple inputs together. More efficient than making individual requests for each generation.
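As a rough illustration, the sketch below uses a stand-in model function (not any real API) to show that one batched call computes the same result as many individual calls:

```python
# A minimal sketch, not any real API: a stand-in "model" applied to eight
# inputs one at a time versus once as a single batch.
import numpy as np

def model(x):
    # Stand-in for an expensive forward pass; accepts any batch size.
    return x @ np.ones((16, 4))

inputs = np.random.default_rng(0).normal(size=(8, 16))

one_by_one = [model(inputs[i:i + 1]) for i in range(8)]  # 8 separate calls
batched = model(inputs)                                  # 1 call, same math
print(np.allclose(np.vstack(one_by_one), batched))       # True
```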
Bias
Safety: Systematic errors in AI outputs that stem from skewed or unrepresentative training data or from model design choices. Can result in unfair or inaccurate representations of certain groups or concepts.
CFG Scale (Classifier-Free Guidance)
Prompting: A parameter that controls how closely the generated image follows the prompt. Higher values produce outputs more aligned with the prompt but may reduce diversity and quality.
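The underlying arithmetic is a simple extrapolation between two noise predictions. A minimal sketch, with placeholder arrays standing in for real model outputs:

```python
# A minimal sketch of classifier-free guidance at a single denoising step.
# eps_uncond and eps_cond are placeholder arrays standing in for the
# model's unconditional and prompt-conditioned noise predictions.
import numpy as np

def apply_cfg(eps_uncond, eps_cond, cfg_scale):
    # Push the prediction away from "no prompt" and toward "the prompt".
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)

eps_uncond = np.zeros((4, 4))
eps_cond = np.ones((4, 4))
guided = apply_cfg(eps_uncond, eps_cond, cfg_scale=7.5)
print(guided[0, 0])  # 7.5: higher scale, stronger pull toward the prompt
```

Values around 7 to 8 are a commonly used default for Stable Diffusion-family models.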
Checkpoint
Training: A saved state of a model's weights at a point during or after training. Allows resuming training, comparing versions, or using the model at different stages of development.
CLIP (Contrastive Language-Image Pre-training)
Model Architecture: A model trained by OpenAI that learns to associate images with text descriptions. Used in image generation to guide the creation process based on text prompts.
Content Safety
Safety: Systems and practices to prevent AI from generating harmful, illegal, or inappropriate content. Includes filters, classifiers, and model training restrictions.
ControlNet
Image Generation: A neural network architecture that adds conditional control to diffusion models. Allows guiding image generation with additional inputs like edge maps, depth maps, poses, or sketches.
DALL-E
Models: OpenAI's image generation model series. DALL-E 3 is known for excellent prompt understanding and safe, high-quality outputs. Available through OpenAI's API.
Deep Learning
Core Concepts: A subset of machine learning that uses neural networks with many layers (hence 'deep') to learn complex patterns in data. Powers most modern AI image, video, and audio generation.
Diffusion Model
Generative AI: A type of generative model that learns to create images by reversing a gradual noising process. Starting from pure noise, the model iteratively removes noise to generate coherent images. Used by FLUX, Stable Diffusion, and DALL-E 3.
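A heavily simplified sketch of the sampling loop; the real denoiser is a trained network with a proper noise schedule, both stubbed out here:

```python
# A heavily simplified sketch of diffusion sampling. The real denoiser is
# a trained network with a proper noise schedule; both are stubbed here.
import numpy as np

def predict_noise(x, t):
    # Placeholder for the trained model's noise prediction at step t.
    return 0.1 * x

def sample(shape, steps=50):
    rng = np.random.default_rng(0)
    x = rng.normal(size=shape)            # start from pure noise
    for t in reversed(range(steps)):
        x = x - predict_noise(x, t)       # iteratively remove noise
    return x

image = sample((64, 64, 3))               # denoised "image" array
```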
DreamBooth
Training: A fine-tuning technique that teaches a model to generate specific subjects (people, objects, styles) from just a few example images. Associates a unique identifier token with the subject.
Fine-tuning
Training: Adapting a pre-trained model to a specific task or domain by training it further on a smaller, targeted dataset. More efficient than training from scratch.
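One common pattern, sketched here in PyTorch with illustrative sizes, is to freeze the pretrained backbone and train only a small new head:

```python
# A minimal PyTorch sketch: freeze a stand-in pretrained backbone and
# train only a small new head on task data (all sizes illustrative).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # "pretrained" part
head = nn.Linear(64, 2)                                  # new task head

for p in backbone.parameters():
    p.requires_grad = False  # keep pretrained features fixed

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(head(backbone(x)), y)
opt.zero_grad(); loss.backward(); opt.step()
```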
FLUX
Models: A family of high-quality image generation models by Black Forest Labs. Known for excellent prompt following, text rendering, and diverse outputs. Available in Schnell (fast), Dev (balanced), and Pro (quality) variants.
Frame Interpolation
Video Generation: Generating intermediate frames between existing video frames to increase frame rate or create smooth slow-motion effects. AI predicts the motion between frames.
GAN (Generative Adversarial Network)
Generative AI: A generative model architecture where two neural networks (generator and discriminator) compete against each other. The generator creates fake samples while the discriminator tries to distinguish them from real ones.
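A heavily condensed PyTorch sketch of one adversarial training step, with toy networks and stand-in data:

```python
# A toy GAN training step, assuming PyTorch; networks and data are stand-ins.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))  # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0          # stand-in "real" data
fake = G(torch.randn(64, 16))

# Discriminator step: label real samples 1, generated samples 0.
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 for fakes.
g_loss = bce(D(fake), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```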
Generative AI
Generative AI: AI systems that can create new content, including images, text, audio, and video. These models learn patterns from training data and generate novel outputs based on prompts or inputs.
Image-to-Image
Image Generation: AI generation that takes an existing image as input along with a prompt to create a modified or transformed version. Used for style transfer, editing, and variations.
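With the Hugging Face diffusers library, an image-to-image call might look like the sketch below; the exact API and model id vary by version, and the file names are placeholders.

```python
# A hedged sketch using Hugging Face diffusers; exact API and model id
# may vary by version, and the file names are placeholders.
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init = Image.open("photo.png").convert("RGB")
# strength controls how far the output may drift from the input image.
result = pipe(prompt="watercolor painting", image=init, strength=0.6).images[0]
result.save("watercolor.png")
```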
Image-to-Video
Video Generation: Converting a static image into a video by generating motion and animation. The AI predicts plausible movement based on the image content.
Inference
Inference: The process of using a trained model to generate outputs from new inputs. When you call an AI API to generate an image, you're running inference on the model.
Inpainting
Image Generation: The process of filling in or replacing specific regions of an image while maintaining coherence with the surrounding content. Used for removing objects, editing specific areas, or extending images.
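A hedged sketch of inpainting with the diffusers library; API and model id may vary by version, and the file names are placeholders.

```python
# A hedged sketch using diffusers' inpainting pipeline; API and model id
# may vary by version, and the file names are placeholders.
import torch
from diffusers import AutoPipelineForInpainting
from PIL import Image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.png").convert("RGB")
mask = Image.open("mask.png").convert("L")   # white marks the region to redo
result = pipe(prompt="a park bench", image=image, mask_image=mask).images[0]
result.save("edited.png")
```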
Latency
Inference: The time between sending a request and receiving a response. For AI generation, latency includes queue time, inference time, and network transfer.
Latent Space
Generative AI: A compressed representation of data learned by AI models. In image generation, manipulating points in latent space allows for smooth transitions between generated images and control over image attributes.
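For example, interpolating between two latent points yields outputs that morph smoothly between the endpoints. A minimal sketch of spherical interpolation (slerp), a common choice for Gaussian latents:

```python
# A minimal sketch of spherical interpolation (slerp) between two latent
# points; decoding the midpoint would blend the two endpoint images.
import numpy as np

def slerp(a, b, t):
    cos_omega = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=512), rng.normal(size=512)
midpoint = slerp(z1, z2, 0.5)  # decoding this would blend both endpoints
```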
LoRA (Low-Rank Adaptation)
Training: A technique for efficiently fine-tuning large models by training small adapter layers rather than all model weights. Produces small files that can be combined with base models.
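The core idea, sketched in NumPy with illustrative sizes: the frozen base weight W gains a trainable low-rank update B @ A, so far fewer parameters are trained.

```python
# A minimal NumPy sketch of a LoRA-augmented linear layer. The frozen
# base weight W gains a trainable low-rank update B @ A, so only
# 2 * d * r parameters are trained instead of d * d.
import numpy as np

d, r = 768, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))            # frozen base weights
A = rng.normal(size=(r, d)) * 0.01     # trainable, small random init
B = np.zeros((d, r))                   # trainable, zero init: no effect yet

def lora_forward(x):
    # Identical to the base layer until B is trained away from zero.
    return x @ W.T + x @ A.T @ B.T

x = np.ones((1, d))
print(lora_forward(x).shape)  # (1, 768)
```

Because only A and B are saved, the resulting adapter file is a tiny fraction of the base model's size.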
Midjourney
Models: A popular AI image generation service known for its artistic, stylized outputs. Accessed primarily through Discord; it does not offer an official public API.
Music Generation
Audio Generation: AI systems that create original music compositions from prompts, existing audio, or musical notation. Can generate melodies, harmonies, and full arrangements.
Neural Network
Core Concepts: A computing system inspired by biological neural networks. Consists of interconnected nodes (neurons) organized in layers that process information and learn patterns from data.
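A minimal sketch of one forward pass through a two-layer network, with illustrative sizes:

```python
# A minimal sketch of a forward pass through a tiny two-layer network.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # layer 1: 3 inputs -> 4 hidden
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # layer 2: 4 hidden -> 2 outputs

x = np.array([1.0, 0.5, -0.2])
h = np.maximum(0, W1 @ x + b1)  # ReLU "neurons" in the hidden layer
y = W2 @ h + b2
print(y)
```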
Outpainting
Image Generation: Extending an image beyond its original boundaries by generating new content that seamlessly continues the existing image. Also called image extension or uncropping.
Prompt
Prompting: The text input given to a generative AI model to guide what it creates. Effective prompting is key to getting desired results from AI generation.
Prompt Engineering
Prompting: The practice of crafting effective prompts to achieve desired outputs from AI models. Involves understanding model behavior, using specific keywords, and iterative refinement.
Seed
Inference: A number that initializes the random number generator, allowing reproducible generations. The same seed, prompt, and settings typically reproduce the same output on the same model and hardware.
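A minimal PyTorch sketch of seeded, reproducible sampling:

```python
# A minimal sketch of seeding for reproducibility, assuming PyTorch.
import torch

generator = torch.Generator().manual_seed(42)
noise_a = torch.randn(4, 4, generator=generator)

generator = torch.Generator().manual_seed(42)
noise_b = torch.randn(4, 4, generator=generator)

print(torch.equal(noise_a, noise_b))  # True: same seed, same noise
```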
Speech-to-Text (STT)
Audio Generation: AI systems that transcribe spoken audio into written text. Also called automatic speech recognition (ASR). Used for transcription, subtitles, and voice interfaces.
Stable Diffusion
Models: An open-source latent diffusion model for image generation developed by Stability AI. The most widely used open model, with many community fine-tunes and extensions.
Temporal Consistency
Video Generation: The coherence of generated video across time, ensuring objects, lighting, and style remain consistent from frame to frame without flickering or sudden changes.
Text-to-Image
Image Generation: AI systems that generate images from text descriptions (prompts). Models like FLUX, DALL-E, and Midjourney interpret natural language to create corresponding visual content.
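A hedged sketch of a text-to-image call with the diffusers library; the exact API and model id vary by version, and a GPU is assumed.

```python
# A hedged sketch using diffusers' text-to-image pipeline; exact API and
# model id may vary by version, and a GPU is assumed.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a lighthouse at dusk, oil painting",
    num_inference_steps=30,
    guidance_scale=7.0,  # the CFG scale described earlier in this glossary
).images[0]
image.save("lighthouse.png")
```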
Text-to-Speech (TTS)
Audio Generation: AI systems that convert written text into natural-sounding speech. Modern TTS can clone voices, control emotion, and produce highly realistic audio.
Text-to-Video
Video Generation: AI systems that generate video clips from text descriptions. These models create coherent motion and temporal consistency across frames while interpreting prompts.
Token
Prompting: The basic unit of text that AI models process. Words, parts of words, or punctuation marks are converted to tokens. Models have limits on the number of tokens they can process.
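To see tokenization in practice, a short sketch assuming OpenAI's tiktoken library (the encoding name is one of its built-ins):

```python
# A hedged sketch of tokenization using OpenAI's tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Generative AI is fun!")
print(len(tokens), tokens)                # token count and token ids
print([enc.decode([t]) for t in tokens])  # the pieces the text was split into
```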
Training
Training: The process of teaching a machine learning model by showing it examples and adjusting its parameters to minimize errors. Requires large datasets and significant computational resources.
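The core mechanism in miniature: a one-parameter model fitted by gradient descent on a squared error, with purely illustrative numbers:

```python
# A one-parameter "model" trained by gradient descent (illustrative only).
w = 0.0
for step in range(100):
    pred = w * 3.0                  # model output for input x = 3
    grad = 2 * (pred - 6.0) * 3.0   # d/dw of squared error against target 6
    w -= 0.05 * grad                # step against the gradient
print(round(w, 3))  # approaches 2.0, since 2 * 3 = 6
```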
Transformer
Model Architecture: A neural network architecture that uses self-attention mechanisms to process sequences. Powers both large language models and modern image/video generators. Key innovation: parallel processing of entire sequences.
U-Net
Model Architecture: A neural network architecture originally designed for image segmentation, widely used in diffusion models for the denoising process. Features skip connections between encoder and decoder layers.
Upscaling
Image Generation: Increasing the resolution of an image using AI to add detail and clarity. AI upscalers can generate plausible details that weren't in the original low-resolution image.
VAE (Variational Autoencoder)
Generative AI: A neural network architecture that learns to compress data into a latent space and reconstruct it. VAEs are used in diffusion models to work in compressed latent space rather than pixel space, improving efficiency.
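A minimal PyTorch sketch of the encode/decode round trip, including the reparameterization step, with toy sizes:

```python
# A minimal VAE round trip, assuming PyTorch; all sizes are toy values.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, dim=784, latent=16):
        super().__init__()
        self.enc = nn.Linear(dim, latent * 2)  # outputs mean and log-variance
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

x = torch.rand(8, 784)
recon, mu, logvar = TinyVAE()(x)
print(recon.shape)  # (8, 784): reconstructed from a 16-dim latent
```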
Voice Cloning
Audio Generation: Creating a synthetic voice that mimics a specific person's voice characteristics. Requires training on audio samples of the target voice.
Watermarking
Safety: Embedding invisible or visible identifiers in AI-generated content to indicate its synthetic origin. Helps with authenticity verification and content provenance.