1) If you can store a buffer of at least one extra scanline, you could try the Paeth predictor + RLE. This will give a reasonable prediction of the next pixel's grayscale value, and if the prediction is OK, the result will often contain a string of zeroes, and the RLE will do a good job.

If you can "afford it" (in other words, the FPGA is fast enough), you could use arithmetic coding on the resulting prediction residuals, with a simple order-0 model, instead of RLE.

Paeth + RLE will do OK on computer-generated images, but not on natural images. Paeth + AC will do OK on both. Both will fit in 1 KB of code for sure.
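For reference, the Paeth predictor itself is tiny. A minimal C sketch along the lines of the PNG specification -- the RLE (or arithmetic-coding) stage is omitted, and the edge handling here is one reasonable choice, not the only one:

    #include <stdlib.h>

    /* Paeth predictor as defined by PNG: estimate p = left + above -
       upper_left, then return whichever neighbour is closest to p. */
    static unsigned char paeth(unsigned char left, unsigned char above,
                               unsigned char upper_left)
    {
        int p  = (int)left + (int)above - (int)upper_left;
        int pa = abs(p - (int)left);
        int pb = abs(p - (int)above);
        int pc = abs(p - (int)upper_left);
        if (pa <= pb && pa <= pc) return left;
        if (pb <= pc)             return above;
        return upper_left;
    }

    /* Turn one scan line into Paeth residuals, using the previous
       line (the "one extra scanline" buffer) as the row above.  A
       well-predicted region becomes a run of zeroes for the RLE. */
    static void paeth_residuals(const unsigned char *cur,
                                const unsigned char *prev,
                                unsigned char *out, int width)
    {
        int x;
        for (x = 0; x < width; x++) {
            unsigned char left  = (x > 0) ? cur[x - 1]        : 0;
            unsigned char above = prev   ? prev[x]            : 0;
            unsigned char ul    = (x > 0 && prev) ? prev[x - 1] : 0;
            out[x] = (unsigned char)(cur[x] - paeth(left, above, ul));
        }
    }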
2) In comp.arch.fpga Melanie Nasic <> wrote:
: I want the compression to be lossless and not based on perceptional
: irrelevancy reductions.

If it has to be lossless there's no way you can guarantee to get 2:1 compression (or indeed any compression at all!). You may do, with certain kinds of input, but it's all down to the statistics of the data. The smaller your storage, the less you can benefit from statistical variation across the image, and 1 Kbyte is very small!

Given that a lossless system is inevitably variable-bit-rate (VBR), the concept of "real-time capability" is somewhat vague; the latency is bound to be variable. In real-world applications the output bit rate is often constrained, so a guaranteed minimum degree of compression must be achieved; such systems cannot be (always) lossless.

From my experience I would say you will need at least a 4-line buffer to get near to 2:1 compression on a wide range of input material. For a constant-bit-rate (CBR) system based on a 4x4 integer transform, see:

http://www.bbc.co.uk/rd/pubs/whp/whp119.shtml

This is designed for ease of hardware implementation rather than ultimate performance, and is necessarily lossy.

3) JPEG supports a lossless encoding that can fit (at least roughly) within the constraints you've imposed. It uses linear prediction of the current pixel based on one or more previous pixels. The difference between the prediction and the actual value is what's then encoded. The difference is encoded in two parts: the number of bits needed for the difference, and the difference itself. The number of bits is Huffman encoded, but the remainder is not.

This has a number of advantages. First and foremost, it can be done based on only the current scan line or (depending on the predictor you choose) only one scan line plus one pixel. In the latter case, you need to (minutely) modify the model you've outlined, though -- instead of reading, compressing, and discarding an entire scan line, then starting the next, you always retain one scan line's worth of data. As you process pixel X of scan line Y, you're storing pixels 0 through X-1 of the current scan line plus pixels X-1 through N (= the line width) of the previous scan line.

Another nice point is that the math involved is always simple -- the most complex case is one addition, one subtraction, and a one-bit right shift (see the sketch below).
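To make the two-part code concrete, here is a hedged C sketch: predictors 5 and 7 from the lossless JPEG predictor list (predictor 5 being the "one addition, one subtraction and a one-bit right shift" case), plus the category computation. The Huffman tables and the bit packing are left out, and the right shift assumes the usual arithmetic-shift behaviour:

    /* a = left neighbour, b = neighbour above, c = upper-left. */
    static int predict5(int a, int b, int c) { return a + ((b - c) >> 1); }
    static int predict7(int a, int b)        { return (a + b) >> 1; }

    /* Category = number of bits needed for the difference.  JPEG
       Huffman-codes this category, then sends the low 'category'
       bits of the difference raw (uncoded). */
    static int category(int diff)
    {
        int mag  = (diff < 0) ? -diff : diff;
        int bits = 0;
        while (mag) { bits++; mag >>= 1; }
        return bits;
    }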
4) Though it's only rarely used, there's a lossless version of JPEG encoding. It's almost completely different from normal JPEG encoding. This can be done within your constraints, but it would be improved if you can relax them minutely. Instead of only ever using the current scan line, you can improve things if you're willing to place the limit at only ever storing one scan line. The difference is that when you're in the middle of a scan line (for example), you're storing the second half of the previous scan line and the first half of the current scan line, rather than having half of the buffer sitting empty. If you're storing the data in normal RAM, this makes little real difference -- the data from the previous scan line will remain in memory until you overwrite it, so it's only really a question of whether you use it or ignore it (see the buffer sketch below).

Yes. There's also JPEG LS, another lossless encoder -- a separate standard from roughly the same era as JPEG 2000, not part of it. A full-blown JPEG LS encoder needs to store roughly two full scan lines, if memory serves, which is outside your constraints. Nonetheless, if you're not worried about following the standard, you could create more or less a hybrid between lossless JPEG and JPEG LS that would incorporate some advantages of the latter without the increased storage requirements.

I suspect you could improve the prediction a bit as well. In essence, you're creating a (rather crude) low-pass filter by averaging a number of pixels together. That's equivalent to a FIR filter with all the coefficients set to one. I haven't put it to the test, but I'd guess that by turning it into a full-blown FIR with carefully selected coefficients (and possibly using more of the data you have in the buffer anyway) you could probably improve the predictions (a speculative sketch follows the buffer code below). Better predictions mean smaller errors, and tighter compression.
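Here is a minimal sketch of that rolling one-line buffer, assuming a fixed line width and the simple averaging predictor; WIDTH and encode_pixel are illustrative names, not anything from the standard:

    #define WIDTH 640             /* example line width -- an assumption */

    static unsigned char line[WIDTH];  /* the single scan line of storage */

    /* When pixel x of the current line arrives, slot x still holds
       pixel x of the PREVIOUS line (the neighbour above), and slots
       0..x-1 already hold the current line (the neighbour to the
       left), so no part of the buffer ever sits empty. */
    static int encode_pixel(int x, unsigned char pix)
    {
        int above = line[x];
        int left  = (x > 0) ? line[x - 1] : above;
        int diff  = (int)pix - ((left + above) >> 1);
        line[x]   = pix;   /* slot x now belongs to the current line */
        return diff;       /* hand the residual to the entropy coder */
    }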
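And a purely speculative illustration of the FIR idea. The taps and weights below are guesses chosen so the weights sum to a power of two, not tuned coefficients; note that with the rolling buffer above, line[x + 1] still holds the previous line's pixel x + 1, so an upper-right tap comes for free:

    /* Weighted predictor: weights sum to 8, so the divide is a 3-bit
       shift.  The negative upper-left tap mimics the gradient
       predictor a + b - c; clamp because negative taps can push the
       estimate outside the 8-bit range. */
    static int predict_fir(int left, int above,
                           int upper_left, int upper_right)
    {
        int p = (3 * left + 3 * above - 2 * upper_left
                 + 4 * upper_right) >> 3;
        if (p < 0)   p = 0;
        if (p > 255) p = 255;
        return p;
    }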