Prompt Engineering & Evaluation Word Search

Hard

Find each term hidden in the grid.

Y J Y I E L N F K T Q T H U J
W L T C U A C I Q D W N W W A
B B E K Z V M N Q K Z P K G O
E P N B R E D J S H G X E V C
R G G L O N R E A S O N I N G
T Q I E V A H C I F T R B S Z
S M N U H M J T O H S W E F Z
C E E V C U I I V L O D J M D
O T E T H H N O J V P I R S A
R S R S P M Z N I P Z B U U T
E Y T O I T O H S O R E Z W S
M S D C L F S R O U G E N A T
J R D K A E R B L I A J K C L
D I M F Q L D U A H B J G G C
X B Y C L Y M U D F P M H F L


Terms in this set

Prompt Engineering: The practice of designing prompts to get reliable, high-quality model output.
Zero-shot: Prompting a model to perform a task with no examples provided.
Few-shot: Prompting a model with a few examples that demonstrate the desired output.
Chain-of-Thought: Prompting a model to produce step-by-step reasoning before its final answer.
System Prompt: A high-level instruction that sets a model's role, tone, and rules for a conversation.
Prompt Injection: An attack that smuggles instructions into a model's input to override its intended behavior.
Jailbreaking: Crafting prompts that bypass a model's safety guardrails to elicit disallowed output.
ROUGE: A recall-oriented metric that compares a generated summary to reference text by n-gram (or longest-common-subsequence) overlap.
BLEU: A precision-oriented metric that scores machine translation by n-gram overlap with reference translations.
BERTScore: An evaluation metric that compares texts by the similarity of their contextual token embeddings rather than exact word overlap.
Human Evaluation: Having people rate model output for quality, relevance, or safety.
Agent: An LLM-driven system that plans steps and calls tools or APIs to complete tasks.
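To make the "recall-oriented" vs. "precision-oriented" distinction between ROUGE and BLEU concrete, here is a minimal sketch that uses only unigrams and a naive lowercase/whitespace tokenizer. The function names are hypothetical, and this is not the official scoring code. Real ROUGE has several variants (ROUGE-2, ROUGE-L, and others). Real BLEU combines clipped precisions for n = 1..4 with a geometric mean and a brevity penalty. Both metrics share the same clipped overlap count; they differ in what that count is divided by.

```python
from collections import Counter


def unigram_counts(text: str) -> Counter:
    """Lowercase, whitespace-tokenize, and count tokens (deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def overlap(candidate: str, reference: str) -> int:
    """Candidate tokens found in the reference, clipped so a token can't match more often than it appears there."""
    cand, ref = unigram_counts(candidate), unigram_counts(reference)
    return sum(min(n, ref[tok]) for tok, n in cand.items())


def rouge1_recall(candidate: str, reference: str) -> float:
    """Recall-oriented: matched tokens divided by the number of reference tokens."""
    total = sum(unigram_counts(reference).values())
    return overlap(candidate, reference) / total if total else 0.0


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Precision-oriented: matched tokens divided by the number of candidate tokens (BLEU's unigram term)."""
    total = sum(unigram_counts(candidate).values())
    return overlap(candidate, reference) / total if total else 0.0


reference = "the cat sat on the mat"
candidate = "the the the cat"
print(rouge1_recall(candidate, reference))              # 0.5  (3 of 6 reference tokens covered)
print(clipped_unigram_precision(candidate, reference))  # 0.75 (3 of 4 candidate tokens matched)
```

Clipping is why repeating "the" three times earns only two matches. Without it, a degenerate candidate could inflate its precision.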
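The difference between zero-shot and few-shot prompting is just whether worked examples precede the query. The sketch below shows this with a hypothetical `build_prompt` helper. The "Input:/Output:" layout is an assumed convention, not a format any particular model requires.

```python
from typing import List, Optional, Tuple


def build_prompt(task: str, query: str,
                 examples: Optional[List[Tuple[str, str]]] = None) -> str:
    """Assemble a plain-text prompt: zero-shot when `examples` is empty or None, few-shot otherwise."""
    parts = [task.strip(), ""]
    # Each demonstration shows the model the expected input/output pattern.
    for inp, out in examples or []:
        parts += [f"Input: {inp}", f"Output: {out}", ""]
    # The real query comes last, with the output left open for the model to complete.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


task = "Classify the sentiment of the review as positive or negative."
zero = build_prompt(task, "Great battery life.")
few = build_prompt(task, "Great battery life.", [
    ("Screen cracked in a week.", "negative"),
    ("Fast shipping, works perfectly.", "positive"),
])
print(zero)
print("---")
print(few)
```

Both prompts end with the same query. The few-shot version spends extra tokens to pin down the label vocabulary and answer format.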