When you’re using your computer, you usually type something into Microsoft Word and save it. The next time you log on, you can continue your work without retyping what you typed the previous day. This information is stored in a file, so it is preserved even when the computer is turned off. These files are all stored on your hard drive. But what is a file? Just a bunch of bytes stored on your hard drive.
Files are stored on a hard disk drive (HDD) or solid-state drive (SSD) (see Wikipedia), which allows data to be written to physical locations on the disk. But we (people) don’t access a file by its physical location, and the hard disk doesn’t know what a file or directory (folder) is. So how do we access our files on the hard disk?
So now we’re ready to understand what a filesystem is. It’s the glue between the user, who is used to seeing files and directories, and the hard disk, which only understands addresses and bytes. There is no single filesystem that everyone uses: different operating systems (OSes) use different filesystems by default. Every filesystem has some method for viewing the available files, creating a file, and deleting a file. Most filesystems organize the files on the disk with an index (list) of the files on the drive, which includes the name of each file, where that file is physically stored, and other useful information.
It is also possible to partition (split up) a disk into multiple pieces. One common method is the Master Boot Record (MBR), which is used by Windows and Linux. The MBR contains a table of partitions, which records, for each partition, a type code indicating the filesystem used and the address on the disk where the partition starts.
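To make the partition table a little more concrete, here is a minimal sketch in Python that reads the four primary partition entries out of an MBR sector. It assumes the classic MBR layout (table at byte 446, 16 bytes per entry, 0x55 0xAA signature in the last two bytes) and 512-byte sectors. The function name `parse_mbr` and the short `PARTITION_TYPES` list are made up for this illustration, and the list is far from complete.

```python
import struct

SECTOR_SIZE = 512      # assumption: 512-byte sectors
TABLE_OFFSET = 446     # partition table starts here in the classic MBR layout
ENTRY_SIZE = 16        # each of the 4 primary entries is 16 bytes

# A few well-known type codes (illustrative, not a complete list).
PARTITION_TYPES = {0x07: "NTFS/exFAT", 0x0B: "FAT32", 0x0C: "FAT32 (LBA)", 0x83: "Linux"}


def parse_mbr(sector: bytes) -> list:
    """Return the non-empty primary partition entries found in an MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status byte, 3 CHS bytes (skipped), type byte, 3 CHS bytes (skipped),
        # starting sector (LBA) and sector count as little-endian 32-bit integers
        status, ptype, lba_start, num_sectors = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:  # type 0 means the slot is unused
            continue
        partitions.append({
            "bootable": status == 0x80,
            "type": ptype,
            "type_name": PARTITION_TYPES.get(ptype, "unknown"),
            "start_byte": lba_start * SECTOR_SIZE,
            "size_bytes": num_sectors * SECTOR_SIZE,
        })
    return partitions
```

To try it, you could read the first 512 bytes of a disk image (for example a hypothetical `disk.img`) in binary mode and pass them to `parse_mbr`.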
NTFS is a proprietary filesystem introduced by Microsoft in 1993. It uses a Master File Table (MFT), which contains the index of files. Each entry in the MFT is 1 KB in practice, although in theory it could be a different size. A file’s data is stored in fixed-size segments called clusters, and if the file is too large to fit in a single cluster, it is stored across multiple clusters.
When a file is deleted, its entry in the index is marked as ‘deleted’, which indicates that the entry can be reused (http://support.microsoft.com/kb/174619/EN-US/). However, for performance reasons, the data itself is not removed from the hard drive.
Simple file recovery programs for NTFS read through the MFT, list the files that are marked as deleted, and attempt to copy those files back to a new location.
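As a rough sketch of the first step such a program takes, the Python below walks a raw copy of the MFT and reports which records are not marked as in use. It relies on the commonly documented record layout, which is an assumption here rather than something taken from this article: each record starts with the signature `FILE`, and a 16-bit flags field at byte offset 22 has bit 0x0001 set while the record is in use. The function name `find_deleted_records` is made up for this example, and a real tool would also need to parse the record's attributes to find the file name and data.

```python
import struct

MFT_RECORD_SIZE = 1024   # assumption: the usual 1 KB record size
FLAGS_OFFSET = 22        # assumption: flags field position in the record header
FLAG_IN_USE = 0x0001     # set while the record describes a live file or directory


def find_deleted_records(mft: bytes) -> list:
    """Return the record numbers of MFT entries that are not marked as in use."""
    deleted = []
    for offset in range(0, len(mft) - MFT_RECORD_SIZE + 1, MFT_RECORD_SIZE):
        record = mft[offset:offset + MFT_RECORD_SIZE]
        if record[0:4] != b"FILE":
            continue  # not a recognisable record, skip it
        (flags,) = struct.unpack_from("<H", record, FLAGS_OFFSET)
        if not flags & FLAG_IN_USE:
            deleted.append(offset // MFT_RECORD_SIZE)
    return deleted
```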
NTFS also keeps a transaction log of actions performed on the filesystem. It is enabled in Windows Vista and 7, but disabled in Windows XP. This log can be used to examine previous actions that were performed on the filesystem (see http://www.forensicfocus.com/index.php?name=Forums&file=viewtopic&t=5014) (interesting reference: http://whereismydata.wordpress.com/category/0-forensics/).
FAT, or File Allocation Table, is a family of filesystems that use a simple table to keep track of the files on the disk. This is done by dividing the disk into ‘clusters’ (groups of sectors), and putting a table with one entry for each cluster at the beginning of the filesystem.
A file is marked as deleted by replacing the first byte of its filename with 0xE5 (http://www.easeus.com/data-recovery-ebook/file-deletion-in-FAT32.htm). The space the file occupied is released by setting the table entries for its clusters to 0, which indicates that those clusters are empty (http://www.win.tue.nl/~aeb/linux/fs/fat/fat-1.html).
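The 0xE5 marker is easy to spot in code. The sketch below, in Python, decodes a single 32-byte FAT directory entry and reports whether it belongs to a deleted file. It assumes the standard short (8.3) directory entry layout; the function name `parse_dir_entry` is made up for this example, and long-filename entries are simply skipped. Because the first character of a deleted name has been overwritten, it is shown as `?`.

```python
import struct

DELETED_MARKER = 0xE5   # first byte of the name for a deleted entry
END_OF_DIRECTORY = 0x00 # first byte of the name when no more entries follow
ATTR_LONG_NAME = 0x0F   # attribute value used by long-filename entries


def parse_dir_entry(entry: bytes):
    """Decode one 32-byte short directory entry, or return None if it is not a file entry."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is 32 bytes")
    if entry[0] == END_OF_DIRECTORY or entry[11] == ATTR_LONG_NAME:
        return None
    deleted = entry[0] == DELETED_MARKER
    raw_name = (b"?" + entry[1:8]) if deleted else entry[0:8]
    base = raw_name.decode("ascii", errors="replace").rstrip()
    ext = entry[8:11].decode("ascii", errors="replace").rstrip()
    # First cluster: high 16 bits at offset 20 (FAT32), low 16 bits at offset 26.
    (cluster_hi,) = struct.unpack_from("<H", entry, 20)
    cluster_lo, size = struct.unpack_from("<HI", entry, 26)
    return {
        "name": f"{base}.{ext}" if ext else base,
        "deleted": deleted,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }
```

A simple undelete tool could loop over a directory's entries with a function like this, pick out the ones where `deleted` is true, and then try to read `size` bytes starting at `first_cluster`, which only works if the file was not fragmented and its clusters have not been reused.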
ext3 and ext4 are both filesystems commonly used by Linux. Depending on the configuration, they may use a journal to record changes before they are actually written to the disk (to prevent data corruption). Index nodes (inodes) are used to store information about each file on the disk.
When a file is deleted, the record of where its data is located is zeroed out, similar to FAT filesystems.
ext3grep and extundelete can recover files on ext3/4 systems by searching the file system's journal for old copies of the inode. (http://extundelete.sourceforge.net/)
(ext3grep: http://carlo17.home.xs4all.nl/howto/undelete_ext3.html and http://code.google.com/p/ext3grep/)
As you know by now, filesystems do not usually zero out the data when they delete a file. Instead, they simply remove the knowledge of where it is. File carving is the process of reconstructing files by scanning the raw bytes of the disk and reassembling them. This is usually done by looking for the header (the first few bytes) and the footer (the last few bytes) of a file.
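Here is a minimal sketch of header/footer carving in Python, using JPEG as the example format: JPEG files start with the bytes FF D8 FF and end with FF D9. The function name `carve_jpegs` is made up for this illustration. It assumes each file sits in one contiguous run of bytes, so it will not reassemble fragmented files, and it can cut a file short if the footer bytes happen to appear inside the image data (for instance in an embedded thumbnail).

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(data: bytes) -> list:
    """Return (offset, bytes) pairs for every header-to-footer span found in data."""
    carved = []
    pos = 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            break  # no more headers
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: nothing more to carve
        end += len(JPEG_FOOTER)
        carved.append((start, data[start:end]))
        pos = end  # continue scanning after this file
    return carved
```

To use it on a real image you would read the raw bytes of a disk image (in chunks for anything large) and write each carved span out to its own file.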