InfoSecurity India's First Magazine on Comprehensive IT Security
InfoSecurity Sep 2009

Tech Focus

Understanding Digital Steganography

Digital steganography, the art of hiding information inside digital media, is a subject that has evolved rapidly. Its ability to conceal information inside ordinary photographs without any visible change is what makes it stealthy, and it has a wide range of applications. This article offers a brief introduction to steganography along with its constructive and destructive applications in the security domain.

Over the past 8 years, steganography has been the source of a lot of discussion, particularly as it was suspected that terrorists connected with the September 11 attacks might have used it for covert communications. This concern points out the effectiveness of steganography as a means of obscuring data. Indeed, along with encryption, steganography is one of the fundamental ways by which data can be kept confidential.

Steganography today is significantly more sophisticated than ancient physical steganography, allowing a user to hide large amounts of information within image or audio files. It has come a long way since the World War II period.

About Steganography

Steganography is the art and science of writing hidden messages in such a way that no one apart from the intended recipient knows of the existence of the message. Although it is commonly thought of as hiding messages in pictures, and that is one of its common uses, messages can be embedded in any number of digital media types, including sound files.

The practice of steganography is often called stego for short. Usually a steganographic message will appear to be something else: a picture, an article, a shopping list, or some other message. This is referred to as the covertext or, in the case of a digital file, the carrier.

Steganography and Cryptography

It is important to understand that steganography is very different from cryptography, although the two are often confused. In cryptography, encryption is the process of obscuring information to make it unreadable without some type of special knowledge. The message is not concealed, just scrambled; steganography, by contrast, hides the fact that a message exists at all.

The obvious advantage of steganography over cryptography is that messages do not attract any attention. A coded message that is unhidden, no matter how strong the encryption, will arouse suspicion and may in itself be problematic. For example, in some countries encryption is illegal. Stego may even be mixed with encryption so the carrier file actually carries a message that is encrypted. So even if intercepted, another barrier is presented in trying to break the encryption.

There are also additional forms of steganography which are often used in conjunction with cryptography so that the information is doubly protected; first it is encrypted and then hidden so that an adversary has to first find the information (an often difficult task in and of itself) and then decrypt it.

Advent of Digital Steganography

Modern steganography entered the world in 1985, when the personal computer was first applied to classical steganography problems. A problem existed where information needed to be sent safely and securely between parties across restrictive communications channels. Two engineers, Barrie Morgan and Mike Barney, working at Datotek, a small Dallas-based company, took the challenge. Sometimes called "M2B2", they created and fielded two steganographic systems, one of which was determined to be undetectable. Many (but not all!) of the challenges and problems they solved in the mid-1980s have been 're-discovered' in the commercial and Internet-available steganographic systems of today.

Development after that was slow but took off in the late nineties. Over 725 digital steganography applications have now been identified by the Steganography Analysis and Research Center. Digital steganography techniques include:

  • Concealing messages within the lowest bits of noisy images or sound files.

  • Concealing data within encrypted data or within random data. The data to be concealed is first encrypted before being used to overwrite part of a much larger block of encrypted data or a block of random data (an unbreakable cipher like the one-time pad generates ciphertexts that look perfectly random if you don't have the key).

  • Chaffing and winnowing.

  • Mimic functions convert one file to have the statistical profile of another. This can thwart statistical methods that help brute-force attacks identify the right solution in a ciphertext-only attack.

  • Concealed messages in tampered executable files, exploiting redundancy in the i386 instruction set.

  • Pictures embedded in video material (optionally played at slower or faster speed).

  • Injecting imperceptible delays to packets sent over the network from the keyboard. Delays in keypresses in some applications (telnet or remote desktop software) can mean a delay in packets, and the delays in the packets can be used to encode data.

  • Content-Aware Steganography hides information in the semantics a human user assigns to a datagram. These systems offer security against a non-human adversary/warden.

  • Blog-Steganography. Messages are fractionalized and the (encrypted) pieces are added as comments of orphaned web-logs (or pin boards on social network platforms). In this case the selection of blogs is the symmetric key that sender and recipient are using; the carrier of the hidden message is the whole blogosphere.
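The packet-delay technique in the list above can be illustrated with a small sketch. It only models the encoding: each bit of the message becomes either a normal gap or a slightly longer gap between packets, and the receiver recovers the bits by thresholding the observed gaps. The delay values and function names are assumptions chosen for illustration, and no real network traffic (or its jitter) is involved.

```python
BASE_DELAY_MS = 50   # hypothetical normal gap between packets
EXTRA_DELAY_MS = 20  # hypothetical extra delay that signals a 1 bit


def bits_from_bytes(data: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bytes_from_bits(bits: list[int]) -> bytes:
    """Pack bits (most significant first) into bytes, ignoring a trailing partial byte."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def encode_delays(message: bytes) -> list[int]:
    """Map each message bit to an inter-packet delay in milliseconds."""
    return [BASE_DELAY_MS + EXTRA_DELAY_MS * bit for bit in bits_from_bytes(message)]


def decode_delays(delays_ms: list[int]) -> bytes:
    """Recover the message by thresholding each observed delay."""
    threshold = BASE_DELAY_MS + EXTRA_DELAY_MS / 2
    return bytes_from_bits([1 if delay > threshold else 0 for delay in delays_ms])
```

For example, `decode_delays(encode_delays(b"hi"))` returns the original two bytes. On a real network the gaps would be noisy, so a practical scheme would need a wider margin or error correction.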

The Methodology

Steganography includes the concealment of information within computer files. In digital steganography, electronic communications may include steganographic coding inside of a transport layer, such as a document file, image file, program or protocol. Media files are ideal for steganographic transmission because of their large size. As a simple example, a sender might start with an innocuous image file and adjust the color of every 100th pixel to correspond to a letter in the alphabet, a change so subtle that someone not specifically looking for it is unlikely to notice it.
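The every-100th-pixel example above can be sketched in a few lines. This is one possible reading of it, not a standard scheme: the image is modeled as a flat list of 8-bit intensity values, and each chosen pixel is nudged to the nearest value whose remainder modulo 26 equals the letter's position in the alphabet. The names `embed_letters`, `extract_letters` and `STEP` are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # only lowercase a-z is supported in this sketch
STEP = 100                         # every 100th pixel carries one letter


def nearest_with_residue(value: int, residue: int, modulus: int = 26) -> int:
    """Return the value in 0..255 closest to `value` with the given remainder."""
    candidates = [v for v in range(256) if v % modulus == residue]
    return min(candidates, key=lambda v: abs(v - value))


def embed_letters(pixels: list[int], message: str) -> list[int]:
    """Hide one letter in pixels 0, STEP, 2*STEP, ... of a copy of `pixels`."""
    out = list(pixels)
    for i, letter in enumerate(message):
        position = i * STEP
        if position >= len(out):
            raise ValueError("cover is too small for the message")
        # ALPHABET.index raises ValueError for anything other than a-z.
        out[position] = nearest_with_residue(out[position], ALPHABET.index(letter))
    return out


def extract_letters(pixels: list[int], count: int) -> str:
    """Read `count` letters back from the same pixel positions."""
    return "".join(ALPHABET[pixels[i * STEP] % 26] for i in range(count))
```

The receiver has to know the step and the message length in advance; in this sketch both act as the shared secret. All other pixels are left untouched.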

A common form of steganography uses JPEG (Joint Photographic Experts Group) files to hide the message. JPEG is a commonly used standard method of lossy compression for photographic images, and the file format that employs this compression is also commonly called JPEG. The most common file extensions for this format are .jpeg, .jfif, .jpg, .JPG, and .JPE, with .jpg the most common on all platforms.

Electronic images, such as JPEG files, provide the perfect “cover” because you can find them everywhere on the Internet. Even your own machine probably contains hundreds if not thousands of JPEG images. These images are shared freely and can be posted on websites or e-mailed anywhere in the world. Steganographic techniques allow users to embed a secret file or data, the "payload", by slightly shifting the color values to account for the “bits” of data being hidden. The payload can be almost anything, including illegal financial transactions, off-shore account information, terrorist communications, stolen corporate data, criminal messages and other malicious information.

The Payload

The payload is the data to be covertly transported, that is, the user data to hide. The carrier is the signal, stream, or data file into which the payload is hidden; contrast this with the "channel", which typically refers to the type of input, such as "a JPEG image". The resulting signal, stream, or data file which has the payload encoded into it is sometimes referred to as the package, stego file, or covert message. The fraction of bytes, samples, or other signal elements which are modified to encode the payload is referred to as the encoding density and is typically expressed as a floating-point number between 0 and 1.
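As a rough illustration of encoding density, the sketch below compares a carrier with its package element by element and reports the fraction that differ. The function name is hypothetical, and the measure is an approximation: an element that was written with a value equal to its original one is not counted as modified.

```python
def encoding_density(carrier: bytes, package: bytes) -> float:
    """Fraction of elements (here, bytes) that differ between carrier and package."""
    if len(carrier) != len(package):
        raise ValueError("carrier and package must have the same length")
    if not carrier:
        return 0.0
    changed = sum(1 for a, b in zip(carrier, package) if a != b)
    return changed / len(carrier)
```

For example, if two of four bytes differ, the function returns 0.5.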

In a set of files, those files considered likely to contain a payload are called suspects. If the suspect was identified through some type of statistical analysis, it might be referred to as a candidate.

Countermeasures—Steganalysis

In computing, detection of steganographically encoded packages is called steganalysis. The simplest method to detect modified files, however, is to compare them to known originals. For example, to detect information being moved through the graphics on a website, an analyst can maintain known-clean copies of these materials and compare them against the current contents of the site. The differences, assuming the carrier is the same, make up the payload. In general, using an extremely high compression rate makes steganography difficult, but not impossible. While compression errors provide a hiding place for data, high compression reduces the amount of data available to hide the payload in, raising the encoding density and making detection easier (in the extreme case, even by casual observation).
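The known-originals comparison described above can be sketched with cryptographic hashes from Python's standard `hashlib` module. The file names and the dictionary layout (name mapped to file contents) are assumptions for illustration; a real check would read the files from disk or from the website.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_modified(known_clean: dict[str, bytes], current: dict[str, bytes]) -> list[str]:
    """List files whose current contents no longer match the known-clean copy."""
    baseline = {name: fingerprint(data) for name, data in known_clean.items()}
    return sorted(
        name
        for name, data in current.items()
        if name in baseline and fingerprint(data) != baseline[name]
    )
```

A changed hash only shows that a file differs from its original. It makes the file a suspect worth a closer look, not proof that it carries a payload.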

Figure - 1 : Image of a tree. By removing all but the last 2 bits of each color component, an almost completely black image results. Making the resulting image 85 times brighter results in the next image.

Figure - 2 : Image of a cat extracted from above image.
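The operation described in the Figure 1 caption can be sketched as follows: keep only the last 2 bits of each color component, then multiply by 85 so that the largest 2-bit value (3) maps to full brightness (255). The image is modeled here as a list of (red, green, blue) tuples, which is an assumption; the function name is hypothetical.

```python
def reveal_low_bits(pixels: list[tuple[int, int, int]], bits: int = 2) -> list[tuple[int, ...]]:
    """Keep the lowest `bits` bits of each component and stretch them to 0..255."""
    mask = (1 << bits) - 1   # 0b11 when bits == 2
    scale = 255 // mask      # 85 when bits == 2
    return [tuple((component & mask) * scale for component in pixel) for pixel in pixels]
```

With the default of 2 bits, a component can only take the values 0, 85, 170 or 255 after this step, which is why the hidden image appears with very few color levels.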

Applications

Digital Watermarking: There are a number of uses for steganography besides the mere novelty. One of the most widely used applications is for so-called digital watermarking. A watermark, historically, is the replication of an image, logo, or text on paper stock so that the source of the document can be at least partially authenticated. A digital watermark can accomplish the same function; a graphic artist, for example, might post sample images on her Web site complete with an embedded signature so that she can later prove her ownership in case others attempt to portray her work as their own.

Usage in modern printers: Steganography is used by some modern printers, including HP and Xerox brand color laser printers. Tiny yellow dots are added to each page. The dots are barely visible and contain encoded printer serial numbers, as well as date and time stamps.

Example from modern practice: The larger the cover message is (in data content terms, that is, number of bits) relative to the hidden message, the easier it is to hide the latter. For this reason, digital pictures (which contain large amounts of data) are used to hide messages on the Internet and on other communication media. It is not clear how commonly this is actually done. For example: a 24-bit bitmap will have 8 bits representing each of the three color values (red, green, and blue) at each pixel. If we consider just the blue, there will be 2^8 = 256 different values of blue. The difference between 11111111 and 11111110 in the value for blue intensity is likely to be undetectable by the human eye. Therefore, the least significant bit can be used (more or less undetectably) for something other than color information. If we do the same with the green and the red, three pixels give nine such bits, enough for one letter of ASCII (American Standard Code for Information Interchange) text.
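The least-significant-bit idea described above can be sketched as follows. The image is modeled as a flat list of 8-bit color values (red, green, blue, red, green, blue, and so on), which is an assumption, and the function names are hypothetical. For simplicity the sketch packs 8 bits per character back to back instead of aligning each letter to three pixels.

```python
def embed_lsb(channels: list[int], message: bytes) -> list[int]:
    """Write the message bits into the least significant bit of each color value."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(channels):
        raise ValueError("cover is too small for the message")
    out = list(channels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the lowest bit, then set it to the message bit
    return out


def extract_lsb(channels: list[int], length: int) -> bytes:
    """Read `length` bytes back from the least significant bits."""
    out = bytearray()
    for i in range(length):
        value = 0
        for channel in channels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (channel & 1)
        out.append(value)
    return bytes(out)
```

Each color value changes by at most 1, which is the 11111111 versus 11111110 difference from the example. The receiver needs to know the message length, or the sender has to embed it as a prefix.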

Stated somewhat more formally, the objective for making steganographic encoding difficult to detect is to ensure that the changes to the carrier (the original signal) due to the injection of the payload (the signal to covertly embed) are visually (and ideally, statistically) negligible; that is to say, the changes are indistinguishable from the noise floor of the carrier.

From an information theoretical point of view, this means that the channel must have more capacity than the 'surface' signal requires, that is, there must be redundancy. For a digital image, this may be noise from the imaging element; for digital audio, it may be noise from recording techniques or amplification equipment. In general, electronics that digitize an analog signal suffer from several noise sources such as thermal noise, flicker noise, and shot noise. This noise provides enough variation in the captured digital information that it can be exploited as a noise cover for hidden data. In addition, lossy compression schemes (such as JPEG) always introduce some error into the decompressed data; it is possible to exploit this for steganographic use as well.

Apart from picture files, many other file formats can also host hidden data, such as audio files, text files and web pages.

Terror Applications from Media Reports

As always, in addition to legitimate security users, any critical security technology will also be used by malicious users. Steganography is potent enough that it has reportedly been used by terrorists for destructive ends.
 
Rumors about terrorists using steganography first appeared in the daily newspaper USA Today on February 5, 2001, in two articles titled "Terrorist instructions hidden online" and "Terror groups hide behind Web encryption". In July of the same year, the information looked even more precise: "Militants wire Web with links to jihad". A citation from the USA Today article: "Lately, al-Qaeda operatives have been sending hundreds of encrypted messages that have been hidden in files on digital photographs on the auction site eBay.com". These rumors were cited many times by other media worldwide, without any actual proof ever being shown, especially after the terrorist attack of 9/11. The USA Today articles were written by veteran foreign correspondent Jack Kelley, who in 2004 was fired after allegations emerged that he had fabricated stories and invented sources.

The Italian newspaper Corriere della Sera reported that an Al Qaeda cell which had been captured at the Via Quaranta mosque in Milan had immoral images on their computers, and that these images had been used to hide secret messages (although no other Italian paper ever covered the story).

In October 2001, the New York Times published an article claiming that al-Qaeda had used steganographic techniques to encode messages into images, and then transported these via e-mail and possibly via USENET to prepare and execute the September 11, 2001 terrorist attack. The Federal Plan for Cyber Security and Information Assurance Research and Development, published in April 2006, makes the following statements:

  • "…immediate concerns also include the use of cyberspace for covert communications, particularly by terrorists but also by foreign intelligence services; espionage against sensitive but poorly defended data in government and industry systems; subversion by insiders, including vendors and contractors; criminal activity, primarily involving fraud and theft of financial or identity information, by hackers and organized crime groups…" (page 9, 10).

  • "International interest in R&D for steganography technologies and their commercialization and application has exploded in recent years. These technologies pose a potential threat to national security. Because steganography secretly embeds additional, and nearly undetectable, information content in digital products, the potential for covert dissemination of malicious software, mobile code, or information is great." (page 41, 42).

  • "The threat posed by steganography has been documented in numerous intelligence reports." (page 42).

Conclusion

Steganography is a really interesting subject, outside of the mainstream cryptography and system administration that most of us deal with day after day. But it is also quite real; this is not just something that's used in the lab or an arcane subject of study in academia. Stego may, in fact, be all too real: there have been several reports that the terrorist organization behind the September 11 attacks in New York City, Washington, D.C., and outside of Pittsburgh used steganography as one of their means of communication.

Beyond terrorism, steganography can unfortunately also be used for other illegitimate purposes. For instance, someone trying to steal data could conceal it in another file or files and send it out in an innocent-looking email or file transfer. Furthermore, a person who keeps malicious information, or worse, on their hard drive may choose to hide the evidence through the use of steganography.

Steganography is a fascinating and effective method of hiding data that has been used throughout history. Methods can be employed to uncover such devious tactics, but the first step is awareness that such methods even exist. There are also many good reasons to use this type of data hiding, including watermarking or more secure central storage for such things as passwords or key processes. Regardless, the technology is easy to use and difficult to detect. The more you know about its features and functionality, the further ahead you will be in the game.

By: R. Manoj. The author is an Assistant Editor at Fanatic Media, Bangalore. He is also an independent researcher specializing in software security, with an active interest in designing security algorithms for securing software. He can be reached at infosecurity@fanaticmedia.com



© 2006-07 'InfoSecurity' magazine. All rights reserved.
Website designed, developed and maintained by Fanatic Media