업데이트:

Motivation: Data Integrity

The CIA Triad

The CIA Triad is a foundational model in information security, outlining three core principles:

  1. Confidentiality: Is information accessible only to authorized individuals?
  2. Integrity: Is the data reliable, accurate, and has it not been improperly modified or destroyed?
  3. Availability: Do authorized users have reliable and timely access to data when they need it?

While all three are crucial, this post focuses on Integrity.


Why Do We Need Data Integrity?

Imagine sending an encrypted bank transfer instruction. If an adversary intercepts and modifies the ciphertext—even without knowing its content—the receiver might decrypt it into a completely different, yet valid-looking, instruction.

Example: An encrypted message for “Send Bob \$10” could be altered to decrypt as “Send Eve $1000”.

This shows that encryption alone doesn’t guarantee integrity. It ensures confidentiality (hiding the content), but not that the content remains unchanged. We need a separate mechanism to verify data integrity, and that’s where MAC comes in.


What is MAC?

A Message Authentication Code (MAC) is a cryptographic technique used to verify the integrity and authenticity of a message.

The process is simple: the sender and receiver share a symmetric key (a secret key). The sender calculates a short, fixed-size piece of data called a tag from the message and the secret key. The message is then sent along with this tag.

  • The receiver, having the same secret key, independently calculates the tag from the received message.
  • If the calculated tag matches the received tag, the message is considered authentic and its integrity is verified.
  • An adversary, lacking the secret key, cannot generate a valid tag for a modified or fabricated message.

Security Goal: UF-CMA

The simple definition of MAC security assumes the adversary has no prior information. However, a real-world adversary can observe many valid message-tag pairs (m, t).

This leads to a stronger security definition: UF-CMA, which stands for Unforgeability against Chosen-Message Attacks.

UF-CMA means that even if an adversary can obtain valid tags for any messages of their choosing, they still cannot forge a valid tag for any new message they haven’t requested before.

In cryptography, we always design systems assuming the most powerful adversary. If a system is secure against this worst-case scenario, it will naturally be secure against any weaker, more realistic attacks.

Properties of a MAC

A secure MAC must satisfy two fundamental properties:

  1. Correctness: If a tag $t$ is generated using key $k$ and message $m$, verifying it with the same $k$, $m$, and $t$ must always result in ‘Yes’.
  2. Unforgeability: An adversary without the key $k$ cannot produce a new message-tag pair $(m, t)$ for which Verify(k, m, t) returns ‘Yes’.

Note: This security model assumes the adversary cannot steal the key itself. Key management is a separate, but equally important, security concern.

The MAC Algorithms

A MAC scheme is composed of three algorithms:

  1. KeyGen(): Generates a random secret key $k$.
  2. Sign(k, m): Produces a tag $t$ from the key $k$ and message $m$.
  3. Verify(k, m, t): Takes the key $k$, message $m$, and tag $t$ as input and outputs ‘Yes’ or ‘No’.

Why Not Just Use a Checksum like CRC32?

Checksums and MACs have fundamentally different purposes and security levels.

Feature Checksum (e.g., CRC32) Message Authentication Code (MAC)
Purpose Detect accidental errors (e.g., network noise, data corruption). Detect malicious modifications by an adversary.
Key No secret key. The algorithm is public. Uses a secret symmetric key.
Security Level Insecure against attacks. An attacker can easily compute a new checksum for a modified message. Secure against forgery. An attacker without the key cannot compute a valid tag.

In short, a checksum protects against ‘accidents,’ while a MAC protects against ‘attacks.’


MAC in the Real World

1. Protecting System Files

Consider when you install an Operating System. The system can compute MAC tags for critical files like kernel32.dll and store them securely. The user’s password can be used as the secret key $k$.

  • Later, if a virus infects and modifies a system file from $F_1$ to $F_1’$, the change will be detected.
  • When the user logs in, the system re-calculates the tag for the current file $F_1’$ using the password $k$.
  • Since the calculated tag won’t match the original, stored tag, the system knows the file has been tampered with.

2. Securing Stateless Cookies

In a stateless architecture, a server avoids storing user session data (like login status) in its database. Instead, it embeds this information in a cookie and sends it to the user. The user sends the cookie back with each subsequent request.

  • Insecure Approach (using a hash): If the server sends a cookie like cookie: bob | hash(bob), an attacker can easily forge it. Since the hash function is public, they can create a valid-looking cookie for another user: cookie: eve | hash(eve).

  • Secure Approach (using a MAC): The server uses a secret key $k$ (known only to the server) to create a cookie: cookie: bob | mac(k, bob).

    • If an attacker changes the username to eve, they cannot compute the correct MAC value without the key $k$.
    • The server will detect the invalid tag and reject the request, showing a “Wrong authentication tag!” error.

But what if an attacker steals the entire valid cookie bob | mac(k, bob) and uses it later? This is known as a replay attack.

Problem: The Replay Attack

A MAC verifies that a message hasn’t been tampered with, but it doesn’t prevent an attacker from capturing a valid message and its tag and resending it later.

Countermeasure: To defeat replay attacks, we need to ensure each message is unique. This is typically done by including a non-repeating value in the message before generating the tag.

  • Sequence Numbers (Counters): Include a message number (1, 2, 3, …). The receiver rejects any message with a repeated or out-of-order number.
  • Timestamps: Include the current time. The receiver rejects messages that are too old.

Construction of a Secure MAC

How do we build a MAC that is provably secure?

Tag Size Matters

As with cryptographic hash functions, the output length is critical. If an adversary simply guesses a tag, their probability of success depends on the tag’s size.

  • If the tag is 1 bit long (0 or 1), the guess has a 50% chance of being correct.
  • If the tag is $n$ bits long, the probability of a random guess succeeding is $1/2^n$.

The minimum requirement is that if we want the attacker’s success probability to be less than $1/2^\lambda$, the tag length must be at least $\lambda$ bits.

Furthermore, the tag generation process must not have any discernible patterns. Our “most powerful adversary” model assumes the attacker can obtain tags for many chosen messages, allowing them to analyze and potentially learn such patterns. Therefore, a tag must appear to be generated completely randomly.

The Key Ingredient: PRF (Pseudo-Random Function)

A Pseudo-Random Function (PRF) is a function that is indistinguishable from a truly random function.

\[ F: K \times X \rightarrow Y \]

A PRF takes a secret key $k$ and an input $x$ to produce an output $y$. Its defining property is that without knowing the key $k$, an adversary who sees many input-output pairs $(x, y)$ cannot determine the relationship between them or predict the output for a new input.

In practice, cryptographic primitives like block ciphers (e.g., AES) and keyed hash functions behave like PRFs.

To build a secure MAC:

  1. Tag length must be sufficiently long.
  2. A PRF-like function must be used for tag generation.

Specific MAC Constructions

Block ciphers like AES, which we use as PRFs, can only process fixed-size messages (e.g., 16 bytes). So, how do we handle longer messages like files or emails?

This challenge led to standardized constructions like CBC-MAC and HMAC. Let’s first examine a few naive (and insecure) approaches to understand the difficulties.

Adversary’s Goal: Observe valid (Message, Tag) pairs and forge a new, valid pair (M', T').

Scenario 1: Simple Concatenation

  • Method: Split message $M$ into blocks $m_1, m_2, \dots$. Compute a tag $t_i$ for each block $m_i$ individually. The final tag is $T = t_1 || t_2 || \dots$.
  • Attack (Cut-and-Paste): An adversary can reorder the message blocks and correspondingly reorder the tag blocks to create a new valid message-tag pair.

Scenario 2: Simple XOR

  • Method: Split the message into blocks and XOR them all together. Compute a single tag on the final XORed result.
  • Attack: This is also vulnerable to reordering attacks. Since XOR is commutative ($A \oplus B = B \oplus A$), changing the block order produces the same XOR sum, making the original tag still valid for the reordered message.

Scenario 3: Numbered Concatenation

  • Method: Prepend a sequence number to each block (e.g., 1||m_1, 2||m_2, …), compute a tag for each, and concatenate the tags.
  • Attack (Mix-and-Match): An adversary who observes multiple valid messages can splice them. They can take the first block from message A (with its valid tag) and the second block from message B (with its valid tag) to forge a new, valid composite message.

CBC-MAC: The Chaining Solution

The lesson from the failed attempts is that blocks must be cryptographically chained together. This is the core idea of CBC-MAC (Cipher Block Chaining MAC).

The tag for each block depends on the result of the previous block.

  1. The first block $m_1$ is encrypted to get an intermediate tag $t_1$.
  2. The second block $m_2$ is XORed with $t_1$ before being encrypted to get $t_2$.
  3. This process continues, and the final intermediate tag becomes the MAC for the entire message.

\[ t_i = E_K(m_i \oplus t_{i-1}) \]

This chaining ensures that changing any block (e.g., $m_2$) will alter its corresponding tag ($t_2$) and all subsequent tags, preventing the simple attacks described above.

However, the basic CBC-MAC is only secure for fixed-length messages. For variable-length messages, it is vulnerable to a length extension attack.

The Attack on Variable-Length CBC-MAC

An attacker can observe a valid tag $T$ for a one-block message $M$ and use it to forge a valid tag for a new, two-block message $M || (M \oplus T)$. The final tag for this new message will cunningly be the same original tag $T$.

This led to secure variants:

  • Encrypted CBC-MAC (ECBC-MAC): The final CBC-MAC tag is encrypted one more time with a different key. An attacker who knows the intermediate tag $T$ cannot predict the final tag because they don’t know the second key.
  • CMAC: The current standard. It’s a more efficient variant that processes the last block differently (using a derived key $K_1$ or $K_2$ depending on whether padding was needed), achieving the same security as ECBC-MAC.

MAC Padding

When the last block of a message is not full, we must use padding.

  • Insecure Padding: Simply adding zeros (000...) can cause ambiguity. The messages 100 and 1000 could become indistinguishable after padding.
  • Secure Padding: The standard is to always append a 1 bit, followed by as many 0 bits as needed to fill the block. This is unambiguous.
  • Important Rule: Even if a message perfectly fits the block size, a full block of padding must be added to distinguish it from a message that was shorter and padded to that length.

PMAC (Parallel MAC)

A drawback of CBC-MAC is its sequential nature—you must process blocks one by one. This can be a bottleneck for long messages on modern multi-core processors.

PMAC was designed for parallel processing. It allows the tags for all blocks to be computed simultaneously, significantly increasing speed.

HMAC: The Industry Standard

So far, we’ve discussed MACs built from block ciphers like AES. However, we can also build a highly secure and widely used MAC from cryptographic hash functions like SHA-256. This is HMAC (Hash-based MAC).

Its structure may look complex, but it’s designed to thwart specific attacks like the length-extension attack, which is a vulnerability in naive constructions like $H(K || M)$.

The HMAC formula is: \[ HMAC(K, M) = H((K \oplus opad) || H((K \oplus ipad) || M)) \]

Where ipad and opad are fixed outer and inner padding constants.

HMAC is a de facto standard used in countless protocols, including:

  • TLS/SSL (the lock icon in your browser)
  • IPsec (VPNs)
  • SSH (secure remote shell)
  • LTE (mobile communications)

It has been standardized by bodies like NIST and IETF, and its security relies on the collision-resistance properties of the underlying hash function.

댓글남기기