August 6, 2021

Hiding data in the pixels - Least Significant Bit Steganography

encoded

Have a look at this image of trees and other miscellaneous bushes. It looks like any ordinary photo, but the entire text of 1984 by George Orwell is embedded within it. Steganography is the practice of embedding data within some medium, be it an image, an audio file, a video file or even another text file. After watching a Computerphile video (“Secrets Hidden in Images (Steganography)”), I attempted to implement one of the techniques, Least Significant Bit (LSB) steganography, on RGB values.

The image used is in PNG format, which uses lossless compression. This is important, since we need to preserve the exact RGB values of each pixel. I’ve implemented LSB steganography on images in Python with the help of Pillow, an image processing library. We will also need to manipulate bits, for which we use the bitarray library and the struct module:

from PIL import Image
from bitarray import bitarray
import struct

Let us first consider the Encoder, the program that embeds the data we give it in the image we provide. The plan is as follows: we want to embed our data (in this case text encoded as UTF-8) in the least significant bit of each RGB value. Each pixel contains 3 bytes (24 bits) of data, 1 byte each for the red, green and blue components (unsigned char with values ranging from 0 to 255). The first step is to read our image in:

class Encoder:
    def __init__(self, path):
        # Read image and get pixels
        self.img = Image.open(path)
        self.pix = self.img.load()

Calling the .load method gives us self.pix, which can essentially be seen as a 2D array of pixels indexed as self.pix[x, y], each holding a 3-tuple of RGB values (assuming the image is in RGB mode).

Next, we read the data that we want to hide in the form of bits. First, declare a bitarray to store the data in:

def __init__(self, path):
    ...
    # Data bitarray
    self.data = bitarray()
    self.size = 0

to which we can append messages:

def append(self, message):
    """Append a message to the data.
    message (str): the string to be added to the data to be embedded"""
    # From string to bits
    data_bits = bitarray()
    data_bits.frombytes(bytes(message, encoding="utf-8"))

    # One bit fits in each R, G and B value; 32 bits are reserved for the length header
    capacity = self.img.size[0] * self.img.size[1] * 3 - 32
    if len(data_bits) + self.size > capacity:
        error = f"Image not large enough: only {capacity}" \
            + f" bits can fit in image {self.img.size[0]} by {self.img.size[1]}." \
            + f" There are {len(data_bits) + self.size} bits in given data"
        raise Exception(error)

    self.data.extend(data_bits)
    self.size += len(data_bits)

Note the size check before the data is extended. Since the image we read in has a fixed size, it has a limited capacity for the data we can embed using only the LSB. In particular, the number of bits we can embed in the image is the number of pixels multiplied by 3, since each pixel has three colour values.
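
As a rough sketch of this capacity arithmetic: the helper names capacity_bits and bits_needed are made up for illustration and are not part of the Encoder, the image dimensions are hypothetical, and the 32 bits subtracted are the length header described further down.

```python
def capacity_bits(width, height, header_bits=32):
    """Bits available for the message: one per R, G and B value, minus the header."""
    return width * height * 3 - header_bits


def bits_needed(message):
    """Bits the message occupies once encoded as UTF-8."""
    return 8 * len(message.encode("utf-8"))


# A hypothetical 1920 x 1080 image (not the image used in this post)
print(capacity_bits(1920, 1080))       # 6220768 bits
print(capacity_bits(1920, 1080) // 8)  # 777596 bytes

# Non-ASCII characters take more than one byte in UTF-8
print(bits_needed("café"))             # 40, not 32
```

Counting bytes rather than characters matters here, because the length of the UTF-8 encoding is what has to fit in the image.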

Let us now choose to embed our hidden message row by row: from left to right within a row, from top to bottom across rows, and from R to G to B within each pixel. This means we start at the red value of the top-left pixel, move to its green value for the next bit, then its blue value, and then on to the next pixel. We will declare these counters as instance attributes:

def __init__(self, path):
    ...
    self.row = self.col = 0  # Row and column counter
    self.rgb_i = 0  # RGB counter

Here I have taken an iterator approach, by using next, get_val and set_val to traverse the image array and make modifications:

def next(self):
    # Increment position
    self.rgb_i += 1

    # Move to next pixel
    if self.rgb_i == 3:
        self.rgb_i = 0
        self.row += 1

        # End of a pixel row reached (self.row is the x coordinate,
        # so compare it with the image width): wrap to the next line
        if self.row == self.img.size[0]:
            self.row = 0
            self.col += 1

def get_val(self):
    """Get the unsigned char value"""
    return self.pix[self.row, self.col][self.rgb_i]

def set_val(self, val):
    """Set the RGB to be new value (0-255)"""
    # Get list of RGB value for the pixel
    rgb = list(self.pix[self.row, self.col])
    rgb[self.rgb_i] = val
    # Write new pixel back as tuple
    self.pix[self.row, self.col] = tuple(rgb)
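
To make the traversal order concrete, here is a small standalone sketch that lists the positions in the order the methods above visit them. The generator positions is a hypothetical helper, not part of the Encoder, and it uses x and y in place of self.row and self.col:

```python
def positions(width, height):
    """Yield (x, y, channel) in embedding order: R, G, B within a pixel,
    left to right within a row, then top to bottom."""
    for y in range(height):
        for x in range(width):
            for channel in range(3):
                yield x, y, channel


order = list(positions(2, 2))
print(order[:4])   # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (1, 0, 0)]
print(order[6])    # (0, 1, 0): first value of the second row
print(len(order))  # 12
```

A 2 by 2 image therefore offers 12 positions, one bit each.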

Now we can begin to encode data. One caveat is that when we try to decode our message from the image later, we need to know how long our message is. We convey this by first embedding a 4-byte header (an unsigned int) that holds the length of our data in bits:

def encode(self, save_to):
    # Use first 4 Bytes for length of the message
    data_len = struct.pack(">L", len(self.data))
    self.bits = bitarray()
    self.bits.frombytes(data_len)

    # Add data to header
    self.bits.extend(self.data)
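
As a quick illustration of what this header looks like, here is a sketch that uses only the standard library, for a hypothetical message of 24 bits. It formats the bits as a string instead of using bitarray:

```python
import struct

# ">L" packs an unsigned 32-bit integer in big-endian byte order
header = struct.pack(">L", 24)
print(header)  # b'\x00\x00\x00\x18'

# The 32 header bits, most significant bit first, grouped by byte
header_bits = " ".join(f"{byte:08b}" for byte in header)
print(header_bits)  # 00000000 00000000 00000000 00011000

# The decoder reverses the step to recover the length
print(struct.unpack(">L", header)[0])  # 24
```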

Then, all we have to do is go through the bits of our data, and encode them in the image we have:

def encode(self, save_to):
    ...
    i = 0
    while i < len(self.bits):
        # Next bit in the data
        bit = self.bits[i]

        # Get the RGB value in bits
        rgb_val = self.get_val()
        rgb_bit = bitarray()
        rgb_bit.frombytes(bytes([rgb_val]))

        # Set last bit to be the value in our data
        rgb_bit[-1] = bit

        # Interpret byte back to unsigned char and set new RGB value
        val = struct.unpack(">B", rgb_bit.tobytes())[0]
        self.set_val(val)
        self.next()
        i += 1

    # Save image with embedded data
    self.img.save(save_to)
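
The same least-significant-bit replacement can also be written with plain bitwise operators. This is an alternative sketch rather than the code used above, and set_lsb is a hypothetical helper name:

```python
def set_lsb(value, bit):
    """Return value (0-255) with its least significant bit replaced by bit."""
    return (value & 0b11111110) | (bit & 1)


print(set_lsb(200, 1))  # 201
print(set_lsb(201, 0))  # 200
print(set_lsb(255, 1))  # 255 (already ends in 1, so unchanged)

# Each colour value changes by at most 1, which is why the edit is hard to see
print(max(abs(set_lsb(v, b) - v) for v in range(256) for b in (0, 1)))  # 1
```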

And we are done! To encode our data, we can run:

# Read our image to embed
encoder = Encoder("sample.png")

# Read our hidden message in
with open("1984.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

# Add lines to data
for line in lines:
    encoder.append(line)

# Embed the data and save the result as a PNG
encoder.encode("encoded.png")

Of course, every Encoder comes with a Decoder. I encourage you to implement a Decoder that extracts the information encoded in the image. The Decoder reads the image in as pixels, reads the header to determine the data size, and then reads that many bits in the same order the Encoder wrote them. By then decoding the bytes as UTF-8, we can recover the original message.

Decoder:

from PIL import Image
from bitarray import bitarray
import struct


class Decoder:
    def __init__(self, path):
        # Read image and get pixels
        self.img = Image.open(path)
        self.pix = self.img.load()

        # Each bit is read from the least significant bit of the RGB values
        self.row = self.col = 0  # Row and column counter
        self.rgb_i = 0  # RGB counter


    def decode(self):
        header = bitarray()
        header_length = 32

        # Get data size
        i = 0
        while i < header_length:
            val = self.get_val()
            val_bits = bitarray()
            val_bits.frombytes(struct.pack(">B", val))
            last_bit = val_bits[-1]
            header.append(last_bit)

            self.next()
            i += 1

        data_size = struct.unpack(">L", header.tobytes())[0]
        
        # Read data
        data = bitarray()
        i = 0
        while i < data_size:
            val = self.get_val()
            val_bits = bitarray()
            val_bits.frombytes(struct.pack(">B", val))
            last_bit = val_bits[-1]
            data.append(last_bit)

            self.next()
            i += 1

        # Decode bytes using utf-8
        return data.tobytes().decode('utf-8')

    def get_val(self):
        rgb = self.pix[self.row, self.col]
        val = rgb[self.rgb_i]
        return val

    def next(self):
        # Increment position
        self.rgb_i += 1
        if self.rgb_i == 3:
            self.rgb_i = 0
            self.row += 1
            if self.row == self.img.size[0]:
                self.row = 0
                self.col += 1

            if self.col == self.img.size[1]:
                raise Exception("Image not large enough for given size")


decoder = Decoder("encoded.png")
print(decoder.decode())
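
To check the round trip without any image files, here is a self-contained sketch that applies the same scheme (a 32-bit big-endian length header followed by the UTF-8 bits) to a flat list of colour values. The functions embed and extract are hypothetical stand-ins for the Encoder and Decoder, and the flat list stands in for the pixel values in traversal order:

```python
import struct


def to_bits(data):
    """Bytes -> list of 0/1 ints, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits):
    """List of 0/1 ints (length a multiple of 8) -> bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed(values, message):
    """Return a copy of values with header + message hidden in the LSBs."""
    payload = to_bits(message.encode("utf-8"))
    bits = to_bits(struct.pack(">L", len(payload))) + payload
    if len(bits) > len(values):
        raise ValueError("not enough colour values for this message")
    stego = list(values)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract(values):
    """Read the 32-bit length header, then that many message bits."""
    lsbs = [v & 1 for v in values]
    size = struct.unpack(">L", from_bits(lsbs[:32]))[0]
    return from_bits(lsbs[32:32 + size]).decode("utf-8")


# Stand-in for 40 pixels' worth of R, G, B values (120 values)
cover = [(i * 37) % 256 for i in range(120)]
stego = embed(cover, "hidden")
print(extract(stego))  # hidden
```

Because only the lowest bit of each value is touched, no entry in stego differs from cover by more than 1.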

About this Post

This post is written by Yifan C, licensed under CC BY-NC 4.0.