GeoIP and MySQL

For my own and possibly others’ reference, these are quick notes on how to use GeoIP data from MaxMind in their new split file formats. Older tutorials describe using the GeoIP data from a time when they were one file, it seems now MaxMind have split into two files.

The files are split into Location and Blocks, so we create two tables to accommodate this:

CREATE TABLE location (
 LocId INT(8) NOT NULL,
 PRIMARY KEY (LocId),
 country CHAR(2),
 region VARCHAR(255),
 city VARCHAR(255),
 postalCode VARCHAR(255),
 latitude DECIMAL(20,17),
 longitude DECIMAL(20,17),
 metroCode VARCHAR(50),
 areaCode VARCHAR(50)
) ENGINE=INNODB;
CREATE TABLE blocks (
 startIpNum INT(10) NOT NULL,
 EndIpNum INT(10) NOT NULL,
 LocId INT(8) NOT NULL,
 FOREIGN KEY (LocID) REFERENCES location(LocId) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=INNODB;

Now it’s time to import the data from the files. Since LocId is a foreign key to the blocks table, location must be inserted first.

mysql> LOAD DATA LOCAL INFILE '/root/GeoLiteCity_20090601/GeoLiteCity-Location.csv' INTO TABLE location FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (LocId, country, region, city, postalCode, latitude, longitude, metroCode, areaCode);
Query OK, 248276 rows affected, 13 warnings (9.65 sec)
Records: 248277 Deleted: 0 Skipped: 1 Warnings: 9
mysql> LOAD DATA LOCAL INFILE '/root/GeoLiteCity_20090601/GeoLiteCity-Blocks.csv' INTO TABLE blocks FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (startIpNum, EndIpNum, LocId);
Query OK, 4132041 rows affected, 65535 warnings (15 min 20.44 sec)
Records: 4132041 Deleted: 0 Skipped: 0 Warnings: 2739156

That done, we try a select.

mysql> select A.country from location A
  JOIN blocks B on A.LocId = B.LocId
  AND 404232216 >= startIpNum AND 404232216 <= EndIpNum;
+---------+
| country |
+---------+
| US      |
+---------+
1 row in set (2 min 3.58 sec)

As I had no indexes yet, the query took some time. So let’s add them:

mysql> CREATE INDEX startIp_ind ON blocks (startIpNum);
Query OK, 4132041 rows affected (9 min 22.23 sec)
Records: 4132041 Duplicates: 0 Warnings: 0
mysql> CREATE INDEX endIp_ind ON blocks (endIpNum);
Query OK, 4132041 rows affected (9 min 22.33 sec)
Records: 4132041 Duplicates: 0 Warnings: 0
mysql> CREATE INDEX LocId_ind ON location (LocId);
Query OK, 248276 rows affected (4.08 sec)
Records: 248276 Duplicates: 0 Warnings: 0

And voila:

mysql> select A.country from location A
  JOIN blocks B on A.LocId = B.LocId
  AND 404232216 >= startIpNum AND 404232216 <= EndIpNum;
+---------+
| country |
+---------+
| US      |
+---------+
1 row in set (0.55 sec)

Also, a better SQL query could’ve been more efficient, but hey – this is for illustrative purposes.