My favourite regexp

Do you have a favourite regular expression? That might be a tricky question for some, like the benighted masses who haven’t yet heard the gospel of regular expressions. Or maybe you have so many dear to your heart that choosing is a real Sophie’s Choice? For me, it is easy: the first non-trivial one I wrote, for a task management system called TOM. Take a look and see if you can tell what it does. To help you out (?) I have left it in the context of the line of Perl it came from.

$string =~ s/(?=.{79,})((.{0,77}[\-,;:\/!?.\ \t])|(.{78}))/$1\r\n/g;
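If you would rather poke at it than stare at it, here is a rough Python port to experiment with. It is only a sketch: the function name apply_regexp is made up for this example, and it assumes Python's re module reads the pattern the same way Perl does for ordinary single-line text. It applies the same substitution with re.sub, so feed it a long paragraph and see what comes out.

```python
import re

# The same pattern as the Perl line above, written as a Python raw string.
PATTERN = re.compile(r"(?=.{79,})((.{0,77}[\-,;:\/!?.\ \t])|(.{78}))")


def apply_regexp(text: str) -> str:
    # Equivalent of the Perl replacement: keep group 1, then append CR LF.
    return PATTERN.sub(lambda match: match.group(1) + "\r\n", text)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog, again and again. " * 5
    print(apply_regexp(sample))
```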

Metametadata for steganography

So.

You’d like to slip some data past some prying eyes. For whatever reason, overtly encrypting the data isn’t possible. It’s not just that the data can’t be seen, but you can’t have anyone be aware data is even moving anywhere.

It’s not unheard of to use metadata for smuggling data. That way, the ostensible file (a Word document, a JPEG file) looks innocent, and can even be opened up and read, but if you know where to look, there’s the secret data.

What’s data, in contrast to metadata? That’s a matter of intention. It’s the surface purpose of the file format.

 

But not all metadata is created equal.
The data itself can be crafted or written to convey a second meaning. For example, subtext in fiction, or taking a photo of 6 sunflowers.
You can use patterns: spaces or number of syllables in text, or colours of pixels in images.
You can encode the data in such a way that it includes extra meaning- using homoglyphs, or diacritic combining characters.

But we’re talking about metadata, I thought?
All metadata has a purpose, and is going to affect the way the file is read or displayed. Some applications expose that metadata more visibly than others. So we have to be careful not to store our payload in what turns out to be plain sight. An example is the document title, which is often displayed in the application title bar. It might look a little suspicious if the title of a document about cakes says “The X-94 Prototype uses Jumbillium alloy”.
One option can be to find formats where we can define our own metadata, or look for the equivalent of junk DNA.

What’s junk DNA?
Noncoding DNA, or junk DNA, is DNA that doesn’t do anything (sidenote: much of it probably does do things after all). It just bulks out the genome to no effect (again, probably not true). But it sits right in there with the other DNA. Where coding DNA shows the body how to make a protein, ncDNA does not.
Many file formats will have an end-of-file token, and ignore anything after that. They may have a payload length in the header, and again, everything after that is ignored.
Error correction codes have been misused in the past on audio CDs, and we could likewise use the ECC bits to store our data: a sizeable percentage of the bits on an audio CD is error correction.
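The end-of-file idea can be sketched in a few lines of Python. This is only an illustration, using PNG as the example format because its IEND chunk marks the end of the image; the function names (tiny_png, end_of_png, hide, reveal) are made up here, and whether any given viewer ignores the trailing bytes is an assumption you would need to check.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """A 1x1 greyscale PNG built by hand, so the example needs no image file."""
    # width, height, bit depth, colour type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # one filter byte plus one pixel
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


def end_of_png(blob: bytes) -> int:
    """Walk the chunk list and return the offset just past the IEND chunk."""
    if not blob.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(blob):
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        kind = blob[pos + 4:pos + 8]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if kind == b"IEND":
            return pos
    raise ValueError("no IEND chunk found")


def hide(blob: bytes, payload: bytes) -> bytes:
    """Append the payload after the end-of-file marker."""
    return blob + payload


def reveal(blob: bytes) -> bytes:
    """Return whatever follows the IEND chunk (empty if nothing does)."""
    return blob[end_of_png(blob):]


if __name__ == "__main__":
    stego = hide(tiny_png(), b"meet at dawn")
    print(reveal(stego))
```

Walking the chunks, rather than searching for the last IEND, means the payload itself can contain any bytes without confusing the decoder.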

But what about metametadata?
Metametadata is metadata about metadata. It’s casting metadata as the main role, then thinking about what we can say about it.

  • GPS co-ordinates (which would be the metadata) of a location which can be looked up on a map, with a name, say, London, encoding for the letter L, or number 6. Or 50, I suppose.
  • Or GPS co-ordinates where the least significant digits carry the hidden data.
  • In databases, using wide, non-sequential ID columns to encode our data. The GUID type in Microsoft Access and SQL Server is 16 bytes wide! Feed our data through a 128-bit block cipher, and we’ll have pretty random-looking data. Because a block cipher is a permutation, unique input plaintext gives unique ciphertext. Prepending a sequence number to our text might be needed if we know we’ll be wanting to send repeating messages, but that eats into our budget. We can expose the GUIDs in some interface, perhaps in what would look like debug messages/comments in some HTML front end.
  • TTL fields in IP headers. Now. These won’t survive end-to-end unaltered. But, over time, we can establish a baseline number of hops, interleaved with TTLs to carry our covert meaning. Sure, it’s not a lot of data, but with the right application, we can send a lot of packets in any session, or spread out over time- perhaps looking at other data leaving that network node or time of day to ensure it doesn’t look out of place. With the baseline TTL known (have to recheck periodically in case of path changes), we can monitor these transmissions from any node in the path.
  • If we’re playing around at the packet level, we can also bury stuff in the sequence number of TCP headers, although this is going to require more co-operation at the other end: a network stack that can ignore sequence numbers on a certain port, perhaps, and simply funnel them to a decoding utility.
  • Both of the above techniques could also be employed as a network “knock”, the idea of refusing connections to a port until a certain series of events happens, like a secret knock.
  • Many file formats have a lot of, or total, flexibility with regard to the ordering of the metadata, so we can use that ordering to encode data.
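To make the last idea in the list concrete, here is a minimal Python sketch of hiding a number in the order of metadata fields. It assumes the field names are unique and that both ends agree on sorted order as the baseline; the function names and the example field names are made up for illustration. With n fields there are n! orderings, so five fields carry a value from 0 to 119 (a little under 7 bits).

```python
from math import factorial


def encode_order(keys: list[str], number: int) -> list[str]:
    """Return the keys arranged in the permutation that represents `number`."""
    pool = sorted(keys)  # the baseline order both ends agree on
    if not 0 <= number < factorial(len(pool)):
        raise ValueError("number does not fit in this many keys")
    ordered = []
    for remaining in range(len(pool), 0, -1):
        # Each step picks one key; the choice is one digit of the number
        # written in the factorial number system.
        index, number = divmod(number, factorial(remaining - 1))
        ordered.append(pool.pop(index))
    return ordered


def decode_order(ordered: list[str]) -> int:
    """Recover the number from the order in which the keys appear."""
    pool = sorted(ordered)
    number = 0
    for key in ordered:
        index = pool.index(key)
        number += index * factorial(len(pool) - 1)
        pool.pop(index)
    return number


if __name__ == "__main__":
    fields = ["Author", "Created", "Keywords", "Subject", "Title"]
    shuffled = encode_order(fields, 76)
    print(shuffled, decode_order(shuffled))
```

The values attached to each field are untouched, so the file reads exactly as before; only the ordering carries the message.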

The problem with embedding data in metadata is that if it gets too big, the file size is going to look incongruous with the overt data. It would be worth looking at a sample of JPEGs, etc., to determine the average, and to think about what heuristics might be used to detect meta (or metameta) data payloads. A defender might scrub a JPEG’s EXIF clean, or at least flag the file as suspect, based not just on size but on the use of unusual or rarely used EXIF tags.
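As a starting point for that kind of heuristic, here is a rough Python sketch that measures how much of a JPEG is metadata. It assumes a simple file layout (marker segments, each with a two-byte length, up to the start-of-scan marker) and counts the APPn and comment segments; the 25% threshold and the function names are invented for the example, and a real threshold would have to come from sampling real files.

```python
import struct

METADATA_MARKERS = set(range(0xE0, 0xF0)) | {0xFE}  # APP0..APP15 and COM
START_OF_SCAN = 0xDA


def metadata_bytes(jpeg: bytes) -> int:
    """Sum the sizes of APPn and COM segments that precede the image scan."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos, total = 2, 0
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("lost sync with segment markers")
        marker = jpeg[pos + 1]
        if marker == START_OF_SCAN:  # compressed image data follows
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker in METADATA_MARKERS:
            total += 2 + length  # marker bytes plus the segment itself
        pos += 2 + length
    return total


def looks_suspicious(jpeg: bytes, max_ratio: float = 0.25) -> bool:
    """Flag files whose metadata is an unusually large share of the file.

    max_ratio is a made-up default, not a measured value.
    """
    return metadata_bytes(jpeg) / len(jpeg) > max_ratio
```

Usage would be along the lines of looks_suspicious(open("photo.jpg", "rb").read()), where photo.jpg is a placeholder name. Checking for rarely used EXIF tags would need a proper EXIF parser on top of this.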


Classified Classifieds (steganography for command & control)

Ars Technica just ran a story about a Russian hacking group making botnets, and how they control them covertly.

There are two parts to a successful botnet. You need the zombie (infected) hosts not to be detected too quickly, as they’ll then be cured. But you also need to be able to communicate with the bots in such a way that they don’t betray their own presence, and don’t leave a trail back to what is known as the C&C (Command & Control).

Not betraying their presence is still very tricky, since by definition, you want to have an effect: perhaps a DDoS on a target, spamming, or distribution of more malware. A DDoS is going to be very noticeable to all concerned. But going from that to alerting the owner of the specific computer (or perhaps router, or printer) is slow. ISPs aren’t known for rapid action.

Even when you do get a message to the zombie’s owner, that’s one machine.

To really knock out the botnet, you need to get at the C&C. So where is it?

Sometimes, the code of the malware will give it away. That is what allowed the recent WannaCry attack to be mitigated.

In this case, there was no URL, but there was a co-infection with a browser extension, which gathered the URL (via a bit.ly shortened URL) by scanning for comments matching certain features on an Instagram post. The article doesn’t make it clear whether visits to that particular post were forced, non-browser requests, but I have to assume so.

I liked how this reminded me of the old-fashioned (but still in use, no doubt) posting of a coded, surface-innocent message in the newspaper classifieds. No-one looks suspicious for purchasing a copy of a newspaper.

Using bit.ly made that initial URL less suspicious, and using a URL shortening service makes sense, to keep the hidden data requirement low. Downside was that it was possible to view the statistics on visits to, ultimately, the C&C.

If you aren’t too fussy about who you infect, you can simply use comment systems (or, riskier, adverts) with this kind of steganography, on very popular websites. It might not guarantee that you get a specific visitor, unless you know about their browsing habits, but it’ll get you plenty. That way, you needn’t force the browser to visit anywhere. Anytime you can let the target do all the work for you, you’re minimising the subterfuge you have to engage in as an attacker, and leaving fewer fingerprints/smoking guns on their system.

Could that data be, not a direct URL, but instead something like a BitTorrent magnet link, pointing at a distributed, decentralised C&C system using a DHT? How about a Tor .onion address? There are JavaScript Tor client implementations after all…

Randomness – C# mini tutorial

Note: This was some teaching material I used on the degree and professional courses to explain a little about using randomness, why and how, in your applications. The courses were based around C#, but the material is easily adapted to many other languages, including Java and JavaScript. Just ask for a translation! Because these were slides used as part of an in-class tutorial, some parts may raise questions as much as answer them…

Note 2: A Visual Studio project with some starting code, and some questions (in the form of comments) for you to try to answer, is available here: rand.zip

Randomness

  • What is randomness?
  • Where can we use it in our programs?
  • How can we acquire random values?
  • How can we make some values more likely than others (weighting/non-uniform distribution)?
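As a taste of the last question, here is a small sketch of weighted selection. It is written in Python rather than the C# used on the course, and the function name weighted_pick and the loot-table example are invented for illustration. The idea is to lay the weights end to end on a number line, roll a uniform random point along it, and see whose stretch it lands in.

```python
import random
from bisect import bisect
from collections import Counter
from itertools import accumulate


def weighted_pick(items, weights, rng=random):
    """Pick one item; weight 3 is three times as likely as weight 1."""
    cumulative = list(accumulate(weights))   # e.g. [70, 95, 100]
    roll = rng.random() * cumulative[-1]     # uniform point on the number line
    # Find the first stretch that extends past the roll; the upper bound
    # guards against a floating point roll landing exactly on the total.
    return items[bisect(cumulative, roll, 0, len(items) - 1)]


if __name__ == "__main__":
    loot = ["copper coin", "silver ring", "gold crown"]
    odds = [70, 25, 5]
    tally = Counter(weighted_pick(loot, odds) for _ in range(10_000))
    print(tally.most_common())
```

Python's standard library offers the same thing ready-made as random.choices(items, weights=...), but writing it out shows where the non-uniform distribution comes from.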


Clouds – animation example

Here’s a demonstrator I coded up for one of our web development classes, to illustrate some animation techniques.

Give it a try, put some clouds in your browser to while away your tea break.

Note how the clouds towards the top of the screen are larger, fewer in number, and faster moving than at the bottom of the screen, roughly simulating the wider field of view nearer the horizon. There’s a CSS gradient to try to sell the illusion too. This distribution is controlled by feeding output from the pseudorandom number generator into a function, giving a skewed distribution where larger numbers are more common than smaller numbers.
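One simple way to get that kind of skew, shown here as a Python sketch rather than the demo's actual JavaScript, is to take the square root of a uniform random number. This is an assumption for illustration, not necessarily the function the demo uses, and the name skewed is made up. The square root pushes values towards 1, so about three quarters of the results land in the upper half of the range.

```python
import random


def skewed(rng=random) -> float:
    """Return a value between 0 and 1 where larger values are more common."""
    return rng.random() ** 0.5


if __name__ == "__main__":
    buckets = [0] * 5
    for _ in range(10_000):
        # min() keeps a result of exactly 1.0 inside the last bucket.
        buckets[min(int(skewed() * 5), 4)] += 1
    for i, count in enumerate(buckets):
        print(f"{i / 5:.1f} to {(i + 1) / 5:.1f}: {'#' * (count // 100)}")
```

Multiplying the result by the page height would then give a vertical position, with more clouds placed at the high-numbered end.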

Controls are at the foot of the page. They are faded in and out using the CSS transition property, whilst the cloud animations are done using the velocity.js library. I hand-drew the clouds (PNG format, for the variable transparency needed for compositing). I apologise in advance for some of the iffier ones.

Boring Deadly Urgent Machine (Game) :: Part 1 :: The Spec

Some machines I like because of their sleek, minimalist exteriors, modernist or even brutalist megaliths of silicon, or brass.

But I love me some blinkenlights. If it’s got switches galore, laden with quadrant faders and vernier dials, festooned with VU meters, pulsing with neon lamps or LEDs, I’m going to pay that some serious attention. Throw in some industrial interconnects, be they ubiquitous BNCs, or some exotic mixed signal sockets, anything heavy duty, very very high frequency, lots of pins or fibre optic, fastened with clips, twist-locks, clamps or thumb screws….ahem.


Why isn’t SVG more widely used?

I was first introduced to Scalable Vector Graphics (SVG) through Inkscape, the excellent free vector graphics drawing application. It’s available for Windows, Linux and OS X, so please give it a try. You can get it as a portable app, runnable on a Windows machine from a USB memory stick, incredibly handy to have when working on other people’s machines (so if you’ve never visited PortableApps.com, maybe now’s the time!).



Atollic TrueStudio and STM32Cube

TL;DR version of getting Atollic TrueStudio and STM32Cube working

I have an ST Nucleo-F411RE development board that I’d like to use for a project. It’s using a microcontroller from the same STM32 family as I’ve tried out before, but packaged onto a board with Arduino (Uno)-pattern headers, as well as the full pin breakout. I’m not sure how useful that is for my project yet, but I wanted to give it a test drive.

Most platforms have multiple development options.

The Arduino biosphere has various hardware targets, but with a common IDE. The classic Arduino boards, though, are based on microcontrollers from Atmel (now part of Microchip, the company that also makes PIC microcontrollers). So you aren’t limited to the Arduino IDE, and if you’d like to, you can use all sorts of development toolchains and IDEs.


Wrapping your head around it- irregular images and text

Image elements, on the web, are rectangular. That is fine in many situations, and suits the image content. But often that content is irregular or at least non-rectilinear in shape. Take, for example, a red ball against a transparent background. To our eyes, the content, the important visual information, is actually circular. If we try to wrap text around it, the browser is unaware, and can only conform to the image element boundary.

There’s always more than one way to do it, but take a look at this method for wrapping text around non-rectangular image content for web pages. It’s the same image file of a circle, but now the text appears to follow the contours of the visually important object, not the <img> element:
