Prompt Engineering & Evaluation Word Search
Difficulty: Hard
Find each term hidden in the grid. Selecting a word reveals its definition and a link to study it in depth.
Y J Y I E L N F K T Q T H U J
W L T C U A C I Q D W N W W A
B B E K Z V M N Q K Z P K G O
E P N B R E D J S H G X E V C
R G G L O N R E A S O N I N G
T Q I E V A H C I F T R B S Z
S M N U H M J T O H S W E F Z
C E E V C U I I V L O D J M D
O T E T H H N O J V P I R S A
R S R S P M Z N I P Z B U U T
E Y T O I T O H S O R E Z W S
M S D C L F S R O U G E N A T
J R D K A E R B L I A J K C L
D I M F Q L D U A H B J G G C
X B Y C L Y M U D F P M H F L
Terms in this set
- Prompt Engineering: The practice of designing prompts to get reliable, high-quality model output.
- Zero-shot: Prompting a model to perform a task with no examples provided.
- Few-shot: Prompting a model with a few examples to demonstrate the desired output.
- Chain-of-Thought: Prompting a model to show step-by-step reasoning before its final answer.
- System Prompt: A high-level instruction that sets a model's role, tone, and rules for a conversation.
- Prompt Injection: An attack that smuggles instructions into input to override the model's intended behavior.
- Jailbreaking: Crafting prompts that bypass a model's safety guardrails to elicit disallowed output.
- ROUGE: A recall-oriented metric that compares generated summaries to reference text by overlap.
- BLEU: A precision-oriented metric for evaluating machine translation quality.
- BERTScore: An evaluation metric that compares texts by embedding similarity rather than exact overlap.
- Human Evaluation: Having people rate model output for quality, relevance, or safety.
- Agent: An LLM-driven system that plans steps and calls tools or APIs to complete tasks.
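
The difference between a recall-oriented metric such as ROUGE and a precision-oriented metric such as BLEU can be seen in a small sketch. The code below is a simplified illustration only: it counts single-word (unigram) overlap on whitespace-split text, and the function names `unigram_overlap`, `rouge1_recall`, and `unigram_precision` are hypothetical helpers, not part of any library. Full BLEU also uses longer n-grams and a brevity penalty, which are omitted here.

```python
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> int:
    """Count words shared by candidate and reference, clipped per word."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    return sum((cand_counts & ref_counts).values())


def rouge1_recall(candidate: str, reference: str) -> float:
    """Share of reference words that the candidate recovers (ROUGE-1-style recall)."""
    ref_len = len(reference.split())
    return unigram_overlap(candidate, reference) / ref_len if ref_len else 0.0


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate words found in the reference (BLEU-style unigram precision)."""
    cand_len = len(candidate.split())
    return unigram_overlap(candidate, reference) / cand_len if cand_len else 0.0


reference = "the cat sat on the mat"
candidate = "the cat sat"

print(rouge1_recall(candidate, reference))      # 0.5
print(unigram_precision(candidate, reference))  # 1.0
```

The short candidate scores perfectly on precision because every word it contains appears in the reference, but only 0.5 on recall because it covers half of the reference's six words.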