IBM system scans 10 billion files in 43 minutes

IBM researchers have broken their own record for file scanning of large-scale storage systems.

IBM has scanned 10 billion files in just 43 minutes, opening the door to storage systems that hold zettabytes of information.

This is a massive improvement on the previous record, a relatively sluggish one billion files scanned in three hours.
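To put the two records side by side, the figures can be turned into scan rates. The short Python sketch below uses only the numbers quoted in this article; the function and variable names are illustrative, and it assumes "three hours" means exactly 180 minutes.

```python
def files_per_second(files: int, minutes: float) -> float:
    """Average scan rate in files per second over the whole run."""
    return files / (minutes * 60)


# New record: 10 billion files in 43 minutes.
new_rate = files_per_second(10_000_000_000, 43)

# Previous record: 1 billion files in three hours (taken as 180 minutes).
old_rate = files_per_second(1_000_000_000, 180)

# How many times faster the new run is, per unit of time.
speedup = new_rate / old_rate

print(f"new record: {new_rate:,.0f} files/s")
print(f"old record: {old_rate:,.0f} files/s")
print(f"speedup:    {speedup:.1f}x")
```

On those figures the new run averages close to 3.9 million files a second, against roughly 93,000 a second for the old record, which makes it about 42 times faster.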

The gain was partly due to unifying the data environment on a single platform rather than spreading it across several separately managed platforms.

It also meant reducing and simplifying data management tasks, so that more information can be stored in one place without having to add more storage.

The advance was achieved using the General Parallel File System (GPFS), first developed back in 1998.

The new record was set using GPFS running on a cluster of 10 eight-core systems, IBM says, with an algorithm designed to make full use of all cores at all times.
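IBM has not published the algorithm in this article, so the following is only a rough illustration of the general idea of dividing a file scan among a pool of workers. It is a minimal Python sketch, not GPFS code: the function names `count_files` and `parallel_count` are hypothetical, it runs on a single machine, and it uses threads rather than separate cores or cluster nodes. It also splits the work once, by top-level directory, so unlike the approach IBM describes it does not guarantee that every worker stays busy.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_files(root):
    """Count regular files under root with an iterative walk (symlinks not followed)."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += 1
        except OSError:
            # Skip directories that cannot be read instead of aborting the scan.
            continue
    return total


def parallel_count(root, workers=None):
    """Count files under root, giving each top-level subdirectory to a worker."""
    workers = workers or os.cpu_count() or 1
    subdirs = []
    top_level_files = 0
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                top_level_files += 1
    # Assumption: a thread pool stands in for the cores and nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return top_level_files + sum(pool.map(count_files, subdirs))
```

Calling `parallel_count("/some/path")` returns the number of regular files found under that directory. A scanner built for billions of files would also need to rebalance work as it goes, since one very large directory would otherwise leave the other workers idle.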

According to Doug Balog, VP for storage at IBM, this shows potential for “much larger data environments to be unified on a single platform”, which will simplify data management tasks like “data placement, aging, backup and migration of individual files.”

The researchers beat IBM’s own previous best, which was set at the Supercomputing conference in 2007.

With ever more data being created and analysed, IBM hopes it will be able to do more than just break records.