gmshadow 0.2 published

I’ve just published the 0.2.1 version of my little utility gmshadow, which I use to backup mail from gmail.

The 0.2 version is now a “beta” release and features both backup and restore capability. It’s been optimized for speed and seems – at least to me – to be blazingly fast in downloading email. And the mbox and zip formats have received an overhaul, too – unfortunately breaking backwards compatibility, but it’s for a greater good. :)

It’s available over at http://www.gefvert.org/blog/downloads/gmshadow.

Storing Person-specific Sensitive Data in Databases

How do you build a system that can ask people very sensitive questions, for instance about their work, etc; and ensure confidentiality of the answers on a database level, yet allow each person to answer only once (and if necessary edit his answers)?

This is my proposed method.

  • The system itself consists of a database, a set of PHP pages, and a configuration file.
  • The PHP files identify the user and give him a unique identifier ku, say, the person’s name, SSN, or similar identity. This is only stored in the current session. Necessary referential constraints can be added to ensure the validity of the identifier, so the user specifies a real name – e.g. through a lookup in a user table.
  • The system configuration file contains a system key, ks, which is a unique, random key, and which is kept secret.
  • The database has three tables.
    • Two tables make up the questions. One for each questionnaire, and one for each question connected to the questionnaire. Each questionnaire is given a unique, random key kq, and each question has an auto-incrementing ID.
    • The third table contains the answers, which every user has specified. The unique key to each answer is given by the question ID, and a cryptographically secure hash h -> H(ks + ku + kq).

By keeping ks a secret, it is cryptologically impossible to find out the user identity of any given answer in the answers table. And by varying kq for each questionnaire, it is also impossible to track users over several questionnaires. h will, however, be the same for each given questionnaire and user.

It is thereby possible for the system to post answers from a given user into the tables, and access previous answers; and it is also possible for a user to retrieve answers, filter them, and generate statistics, without ever having access to any identities for the given answers. Even if the system is compromised, it is impossible to learn the identities of any answers without using a brute-force (or table lookup) method.

Does this sound ok?

MySQL Function to Calculate Excel- (or Delphi)-style Dates

drop function if exists exceldate;

delimiter //

-- Function that returns an Excel-style or Delphi-style date value
-- from a MySQL date. A date value of 0 represents 1899-12-30.
create function exceldate(p_date date)
    returns int
    sql security invoker
begin
    return to_days(p_date) - 693959;
end //

delimiter ;

select '1899-12-31', exceldate('1899-12-31')     -- should be 1
union
select '2011-10-18', exceldate('2011-10-18');    -- should be 40834

Luhn (mod 10) Check Digit Algorithm in MySQL

drop function if exists luhn;
drop function if exists luhn_check;

delimiter //

-- Function that calculates a Luhn (mod 10) check digit from a numeric string.
-- The behavior is undefined if the string contains anything else than digits.
-- Assumes that the string does not have a check digit added yet, so it starts
-- with a weight of 2 at the last digit.
create function luhn(p_number varchar(31))
    returns char(1)
    sql security invoker
begin
    declare i, mysum, r, weight int;

    set weight = 2;
    set mysum = 0;
    set i = length(p_number);

    while i > 0 do
        set r = substring(p_number, i, 1) * weight;
        set mysum = mysum + if(r > 9, r - 9, r);
        set i = i - 1;
        set weight = 3 - weight;
    end while;

    return (10 - mysum % 10) % 10;
end //

-- Check if a numeric string has a valid check digit. Does this by cutting off
-- the last digit, recalculating the Luhn check digit, and comparing the strings.
create function luhn_check(p_number varchar(32))
    returns boolean
    sql security invoker
begin
    declare luhn_number varchar(32);

    set luhn_number = substring(p_number, 1, length(p_number) - 1);
    set luhn_number = concat(luhn_number, luhn(luhn_number));

    return luhn_number = p_number;
end //

delimiter ;

A Small Script to Compress Movies for Cellphones

I wrote this small script for Windows to compress movies (avi, mkv etc) down to cell phone format. The encoding isn’t great, but the average two-hour movie compresses down to just about ~100 MB, which is really nice. Works fine on Nokia N85. Edit to suit your needs. It relies on ffmpeg, which you can find Windows binary downloads for here and there. Google for it, kid, google for it!

Oh, it also requires cygwin, for the perl interpreter. (Or ActivePerl, I suppose.)

ff-m4v.cmd:

@echo off
set CYGWIN=nodosfilewarning
for %%F in (%*) do (
    echo.
    echo ######## %%F #########
    echo.
    perl %~dpn0.pl %%F
)

ff-m4v.pl:

$in = shift;
$data = `ffmpeg -i '$in' 2>&1`;

$out = $in;
$out =~ s/\.[A-Za-z0-9]+$/\.m4v/;

($x, $y) = $data =~ /Video:.*?(\d+)x(\d+)/i;
$dx = 320 / $x;
$dy = 240 / $y;
$d = $dx < $dy ? $dx : $dy;

$dim = int($d * $x) . 'x' . int($d * $y);

$opt = "ffmpeg -i '$in' -f mp4 -vcodec mpeg4 -b 80000 -r 12 -acodec aac -ar 16000 -ab 48000 -ac 1 -threads 2 -y -s $dim '$out'";
print "$opt\n";
system($opt);

The parameters evaluate to mpeg4 encoding, 80 kbit/s bitrate for video, 12 frames per second; AAC encoding with 48 kbit/s bitrate, mono sound. It compresses the movie down to a maximum of 320x240 pixels, but preserves aspect ratio (effectively becoming 320x196, 320x160 or whatever). If you want to use 2-pass encoding, it's easy; just repeat the system($opt) call with "-pass 1" and "-pass 2" parameters.

Put both in the same folder, make sure ffmpeg.exe is in the system path, and run "ff-m4v *.avi".

CMD File to Run the Same Programs at Work Every Day

I usually sign on to work by starting at least four different programs: Microsoft Lync, Miranda, a SIP telephone, and a little break clock. I used to have a Python script for it, but I thought that installing ActivePython on a Windows machine just to start four programs is a bit of an overkill. So I wrote a CMD file instead. It’s really amazing what you can beat the old CMD interpreter into doing.

startToday.cmd:

@echo off

tasklist /nh > __tasklist.txt

for /f "delims=" %%F in (startToday.files) do (
    findstr /b /i "%%~nxF" __tasklist.txt > nul
    if errorlevel 1 (
        echo starting %%F
        start "" "%%F"
    )
)

del /q /f __tasklist.txt

startToday.files:

C:\Program Files\CounterPath\X-Lite\x-lite.exe
C:\Program Files\Miranda IM\miranda32.exe
C:\Program Files\Microsoft Lync\communicator.exe
C:\Home\stuff\software projects\BreakTime\BreakTime.exe

Voila!

Making Beautiful Lists

This is a nice way to make beautiful lists. Take any old list of unordered elements, for instance:

  • Apples
  • Bananas
  • Blistering barnacles
  • Gargoyles
  • Oranges
  • Pears
  • Sunfruit

Looks good? Sure. But sometimes, you want it horizontally. Let’s add the following style sheet to it.

    ul                     { list-style: none; padding: 0; margin: 0; }
    ul li                  { list-style: none; padding: 0; margin: 0; display: inline; }
    ul li:after            { content: '\B7'; padding: 0 0.2em 0 0.5em; }
    ul li:last-child:after { content: '' }

And what do we get? Something as beautiful as this:

  • Apples
  • Bananas
  • Blistering barnacles
  • Gargoyles
  • Oranges
  • Pears
  • Sunfruit

A nice, horizontal list; and the best thing about it, is that there are no little dots in the HTML code, no handling for first or last elements (no ul class=”last”), nothing. Just pure css. And, of course, each element can be styled, you can make links, or any old thing. Seems to work fine in Firefox and Chrome; IE8 doesn’t seem to get the last “ul li:last-child:after” element, but all you get is a little dot after the last element. Worse things could happen.

Backup up MySQL Tables Using Only CMD.EXE and 7-zip

A small batch file I wrote today, to back up MySQL databases, table by table.

So many times have I needed to restore only a specific table, and yet it’s so difficult to dig out the correct table from the whole database backup file (or even worse, server backup file).

This little file will store the whole database/server in individual files, and if you run it on a daily basis, will keep history in “backup.n” directories.

@echo off

rem Enable delayed expansion - otherwise the backup.n folder renaming won't work
setlocal enabledelayedexpansion

rem Change to the script's home folder
%~d0
cd %~p0

rem Rotate previous backup folders
if exist backup.9 rd /s /q backup.9
for /l %%i in (8 -1 1) do (
    set /a j=%%i + 1
    if exist backup.%%i move backup.%%i backup.!j! > nul
)
if exist backup move backup backup.1 > nul

md backup

rem Change these to reflect your MySQL connection parameters
set OPT=-ubackup -pbackup -hlocalhost

rem List all the databases and process them
mysql %OPT% -e"show databases" --skip-column-names -B | findstr /v "information_schema" > _db.txt
for /f %%d in (_db.txt) do call :backupdb %%d

rem Clean up, clean up, everybody wants to clean up
del /f /q _dbt.txt
del /f /q _db.txt

dir backup

goto :eof

:backupdb
    rem List the tables in the current database
    echo %1
    mysql %OPT% -e"show tables" --skip-column-names -B %1 > _dbt.txt
    md backup\work

    rem Dump each table into an SQL file
    echo  - dumping data
    for /f %%t in (_dbt.txt) do mysqldump --opt %OPT% -rbackup\work\%%t.sql %1 %%t

    rem Compress them with 7-zip into a single zip file
    echo  - compressing
    cd backup\work
    ..\..\7z -y a ..\%1.zip * > nul

    rem Clean up
    cd ..
    rd /s /q work
    cd ..
    echo.

    goto :eof

Download 7-zip, extract 7z.exe and 7z.dll and put in the same folder as the “backup.cmd” file above, and you should be good to go. Change the -u and -p parameters in the “set OPT” statement to reflect the login settings for your database server.

Persistence

/**
 *  NF_Persistence
 *
 *  Agent Smith: "Why, Mr. Anderson, why? Why do you persist?"
 *  Neo:         "Because I choose to."
 *
 *  PHP Version 5
 *
 *  @category   NiftyFramework
 *  @package    NiftyFramework
 *  @subpackage Database
 *  @author     Mats Gefvert

 *  @license    http://www.sun.com/cddl/ Common Development and Distribution License
 */

class NF_PersistenceRelationMap
{
    ....

I make myself laugh sometimes. Is that a good sign? :)

Speeding Up Delphi’s TStringList.IndexOf

Delphi’s TStringList is one of the objects I love the most. If it’s sorted (StringList.Sorted := true) then you can use it to parse huge chunks of data quickly.

For instance, looping through an enormous amount of IP addresses and keeping count of how many times they appeared, is easily done using the following code (not compiled or checked for syntax errors):

ls := TStringList.Create;
ls.Sorted := true;
for ip in ipAddresses do begin
  n := ls.IndexOf(ip);
  if n = -1 then
    ls.AddObject(ip, TObject(1))
  else
    ls.Objects[n] := TObject(Integer(ls.Objects[n]) + 1);
end;

It’s very efficient. Since TStringList.IndexOf always does a binary search, it operates in log2(n) time, and using Objects as integers allows us to keep track of count without messing with the string data.

But there are things we can do to speed it up. For instance, TStringList.IndexOf relies on TStringList.Find, which itself uses AnsiCompareStr, which is a slow Windows call, taking locale and its mother into consideration. Overriding this with our own method should be worthwhile. (The code below is adapted straight from the Classes unit.)

type
  TStringListEx = class(TStringList)
  public
    function Find(const S: string; var Index: Integer): Boolean; override;
  end;

function TStringListEx.Find(const S: string; var Index: Integer): Boolean;
var
  L, H, I, C: Integer;
begin
  Result := False;
  L := 0;
  H := Count - 1;
  while L <= H do
  begin
    I := (L + H) shr 1;
    C := CompareStr(Get(I), S);
    if C < 0 then L := I + 1 else
    begin
      H := I - 1;
      if C = 0 then
      begin
        Result := True;
        if Duplicates <> dupAccept then L := I;
      end;
    end;
  end;
  Index := L;
end;

We’ve replaced AnsiCompareStr with Delphi’s own CompareStr, which is a highly optimized FastCode routine. There are some drawbacks – things will always be sorted in byte order and no case-sensitivity is done. But we don’t care about this – it can always be done afterwards; right now, speed is the main importance.

And it turns out that using the above code, in pure examples, can slash execution time with up to about 80%. Dramatic savings, indeed. In my own example, where I analyze ftp log data, I managed to cut execution time on 122 MB of data down from 7 seconds down to 3.1 seconds.

Best of all, since TStringList.Find is declared virtual, we don’t need to change any types anywhere, just do a TStringListEx.Create instead of a TStringList.Create and you’re good to go.