Birth time of a file on GNU Linux distributions

Posted on 17 February 2019

Tags: software-archeology, BSD, GNU, coreutils, Linux, Unix, translated, pinned

Table of Contents:

Unix File Timestamps
Birth time support in Linux filesystems
- How to read btime value with fs debug tools
- Reading btime from Linux ext4 volume on a FreeBSD
Birth time support in GNU Linux distributions
What does btime actually mean and what it can be used for?
References

Have you ever wondered why it is that on GNU/Linux distributions stat(1) command from GNU Coreutils reports 3 classic unix time stamps access, modify and change and on top of that, there is also something called birth, but without any value? Why is the empty birth field even there? I became interested in this less known timestamp few months ago while I was debugging a rare problem, trying to get as much additional evidence as possible. And even though it didn’t help me in the end, I gradually learned a bit about it’s history and it’s future. Hence this post lies somewhere between software archeology and Linux Weekly News, and here you will learn where the timestamp comes from, how to work with it in GNU/Linux distributions and what it’s future looks like.

Let’s start with an example so that we get a better idea what I’m writing about here. The following output shows stat executed on a file stored on ext4 filesystem. It was done Fedora 29 with GNU Coreutils 8.30 and Linux kernel 4.19.8:

$ stat public_html
  File: public_html
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: fd07h/64775d	Inode: 7341469     Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Context: unconfined_u:object_r:httpd_user_content_t:s0
Access: 2018-12-09 12:59:16.321622899 +0100
Modify: 2018-05-19 21:35:13.813112882 +0200
Change: 2018-12-09 02:51:38.961313721 +0100
 Birth: -
$

Unix File Timestamps

Unix file timestamps, as returned by stat syscall, are defined in POSIX standard in the following way:

Timestamp	Field in `struct stat`	Meaning
access	`st_atim`	last data access
modify	`st_mtim`	last data modification
change	`st_ctim`	last file status (inode) change

Originally, timestamp fields in in the stat structure stored time with second precision and ended with e (eg. st_atime for access time). Later these timestamps were extended to handle nanosecond resolution, and new naming scheme without last e was chosen for compatibility reasons. Linux implements nanosecond resolution for these timestamp fields since 2.5.48, see stat(2). From now on I’m just going to refer to access time as atime, to modify time as mtime and so on regardless of the difference in resolution between st_atim and st_atime.

Important (and for this blog post the most striking) detail is that there is no file creation timestamp in POSIX standard.

That said if people unfamiliar with Unix systems look briefly at sheer timestamp field names in stat structure (see the table above), they may conclude that ctime likely stands for creation time and thus consider file creation or birth time as supported. Further reading in related man pages will quickly show that such conclusion is incorrect though. Likewise if you read paper The UNIX Time-Sharing System from 1973 authored by Unix creators today, you may believe that ctime originally meant creation time and that it’s purpose changed over time, since creation timestamp is directly mentioned there:

The entry found thereby (the file’s i-node) contains the description of the file: … time of creation, last use, and last modification

Nevertheless as clarified on wikipedia, that would be another misunderstanding. Few early versions of Research Unix actually supported a creation timestamp, as we can see eg. in stat(2) man page of version 3 from February 1973:

NAME            stat  --  get file status

SYNOPSIS        sys stat; name; buf  / stat = 18.

DESCRIPTION     name  points to a null-terminated string naming a
file; buf is the address of a 34(10) byte buffer into  which  in-
formation  is  placed  concerning the file.  It is unnecessary to
have any permissions at all with respect to the file, but all di-
rectories leading to the file must be readable.

After stat, buf has the following format:

buf, +1         i-number
+2,+3           flags (see below)
+4              number of links
+5              user ID of owner
+6,+7           size in bytes
+8,+9           first indirect block or contents block
+22,+23         eighth indirect block or contents block
+24,+25,+26,+27 creation time
+28,+29,+30,+31 modification time
+32,+33         unused

Btw another interesting thing is that stat command from this version doesn’t include creation time in it’s output, at least based on stat(1) man page (I wasn’t able to find it’s source code):

NAME            stat  --  get file status

SYNOPSIS        stat name1 ...

DESCRIPTION     stat gives several kinds of information about one
or more files:

   i-number
   access mode
   number of links
   owner
   size in bytes
   date and time of last modification
   name (useful when several files are named)

But right in the following release of Research Unix, which was version 4 from November 1973, there is no creation timestamp anymore. It’s worth noting that V4 was first Unix version written in c. And thanks to that, fields in stat structure were given their names, which differs a bit compared to the names used today. See stat(2) manpage from V4:

stat  get file status (stat = 18.)
sys stat; name; buf stat(name, buf)
char *name;
struct  inode  *buf;  points to a null-terminated string naming a
file; is the address of a 36(10) byte buffer into which  informa-
tion  is  placed  concerning the file.  It is unnecessary to have
any permissions at all with respect to the file, but all directo-
ries leading to the file must be readable.  After has the follow-
ing structure (starting offset given in bytes):
struct {
   char  minor;         /* +0: minor device of i-node */
   char  major;         /* +1: major device */
   int   inumber        /* +2 */
   int   flags;         /* +4: see below */
   char  nlinks;        /* +6: number of links to file */
   char  uid;           /* +7: user ID of owner */
   char  gid;           /* +8: group ID of owner */
   char  size0;         /* +9: high byte of 24-bit size */
   int   size1;         /* +10: low word of 24-bit size */
   int   addr[8];       /* +12: block numbers or device number */
   int   actime[2];     /* +28: time of last access */
   int   modtime[2];    /* +32: time of last modification */
};

Timestamp known as ctime was first instroduced later in version 7 from January 1979. And right from the beginning it’s meaning was the same as we understand it today, which shows that ctime never used to mean creation time.

struct	stat
{
	dev_t	st_dev;
	ino_t	st_ino;
	unsigned short st_mode;
	short	st_nlink;
	short  	st_uid;
	short  	st_gid;
	dev_t	st_rdev;
	off_t	st_size;
	time_t	st_atime;
	time_t	st_mtime;
	time_t	st_ctime;
};

Timestamps of the stat structure stayed in this form through POSIX standardization to to the present day (with the already mentioned exception of nanosecond time precision extension).

<hint> For some time I used to confuse meaning of mtime (modify) and ctime (change) so that when I was searching for files by date, I had to look it up every now and then. But based on the information presented above, we can come up with a nice hint to help us remember the difference: ctime is the timestamp sometimes incorrectly considered to represent creation time and which was not present in almost first 10 years of Unix history. So ctime must be less important than mtime, and so ctime must describe change in inode file metadata (which is something I’m not interested in most cases anyway). </hint>

It’s hard to say when unix timestamp representing creation time aka birth time (btime) showed up for the first time, because it’s quite likely that it happened in some proprietary Unix system. And since such systems are now dead, without publicly available documentation or source code under reasonable license, I’m not willing to put additional resources and effort into looking up something like that. But for Unix like systems with free and open source code, the first place is clear: timestamp for creation time was introduced for the 1st time as late as in 2003 in FreeBSD 5.0 with introduction of UFS2 filesystem. And from there it spread to other BSD systems, such as NetBSD or OpenBSD.

It’s worth noticing how the FreeBSD developers added the birth time support. Since there was enough unused space in stat structure of FreeBSD, adding new timestamp under name st_birthtime into stat struct wasn’t a problem from the compatibility perspective. Right now the timestamp is called st_birthtim, so that it’s meaning and name match convention from the current POSIX standard (see the table above).

OpenSolaris provides support for creation time as well, but it’s developers have avoided any changes in stat structure. To get a btime value it’s necessary to use fgetattr(3C) call and from the list of returned attributes read A_CRTIME. This implementation seems to be already present in the very 1st public commit of OpenSolaris project from 2005, so it likely comes from Solaris.

I don’t know whether anyone is considering adding btime into POSIX standard, I failed to find any information about that. My guess is that that nobody cares nowadays.

Birth time support in Linux filesystems

Like other Unix kernels, Linux hasn’t supported btime for a long time. Ext3 or reiserfs filesystems don’t store btime for it’s files and stat(2) syscall returns structure of the same name without this timestamp as well. Birth time support in Linux has been talked about for quite some time, but compared to FreeBSD it wasn’t possible to simply add the new timestamp into free padding space of existing stat structure, because Linux doesn’t offer enough free space there. More viable option turned out to be adding btime into proposed new xstat() syscall, but merging of this syscall got unfortunately stuck for some time.

Nevertheless Linux developers started to introduce btime support into new filesystems well before the consensus about xstat() was reached. For example ext4 received btime support in 2007 as a part of a patch implementing nanosecond timestamp resolution (disk format of ext4 is stable since kernel 2.6.28 from December 2008). Btrfs got it in 2012, while it’s disk format is stable since about November 2013. XFS, which originally didn’t implement btime, received the support as part of a change adding metadata checksums in 2013 and so it’s available since kernel 3.10. This means that even on a quite old distro, a filesystem likely already stores btime of every file, even though this information is not directly available for users to see. Without a kernel support, one can only get to this information using filesystem debugging tools, which are unique for each filesystem, and which obviously require root privileges and direct access to a block device with the filesystem.

How to read btime value with fs debug tools

First of all, let’s create new partitions to play with:

# mkfs.ext4 /dev/vdc
# mount /dev/vdc /mnt/test_ext4
# echo "ext4" > /mnt/test_ext4/testfile

# mkfs.xfs /dev/vdd
# mount /dev/vdd /mnt/test_xfs
# echo "xfs" > /mnt/test_xfs/testfile

For the record: I used a machine running Debian Stretch (current stable) for this demonstration to be more realistic. That is because Stretch already contains kernel with btime support for both ext4 and XFS, but at the same time it can’t yet pass btime values into user space. The procedure shown below would work on more modern distros as well, but when you already have an option to get btime value via syscall directly, it doesn’t make sense to do it the hard way. I would have covered btrfs as well, but I was unable to figure out how to get btime value out of it in a similar way.

To deal with ext4 filesystem we are going to use debugfs and it’s stat command, which reports file birth time as crtime:

# TZ=CET debugfs -R 'stat testfile' /dev/vdc
debugfs 1.43.4 (31-Jan-2017)
Inode: 12   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 1318526178    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 5
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5c66c5ee:2060896c -- Fri Feb 15 15:00:14 2019
 atime: 0x5c66c600:ee5ed49c -- Fri Feb 15 15:00:32 2019
 mtime: 0x5c66c5ee:2060896c -- Fri Feb 15 15:00:14 2019
crtime: 0x5c66c5ee:2060896c -- Fri Feb 15 15:00:14 2019
Size of extra inode fields: 32
Inode checksum: 0x0721e8ea
EXTENTS:
(0):32897

For a file stored on XFS volume we will need xfs_db instead. But first of all we need to find out inode of the file we are interested in and then either umount the filesystem, or alternatively we can just remount it in a read only mode. In xfs db output, the creation timestamp is listed as v3.crtime.sec and v3.crtime.nsec:

# ls -i /mnt/test_xfs/testfile
99 /mnt/test_xfs/testfile
# umount /mnt/test_xfs
# TZ=CET xfs_db /dev/vdd
xfs_db> inode 99
xfs_db> print
core.magic = 0x494e
core.mode = 0100644
core.version = 3
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 0
core.gid = 0
core.flushiter = 0
core.atime.sec = Fri Feb 15 16:11:36 2019
core.atime.nsec = 155502016
core.mtime.sec = Fri Feb 15 16:11:36 2019
core.mtime.nsec = 155502016
core.ctime.sec = Fri Feb 15 16:11:36 2019
core.ctime.nsec = 155502016
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 559694043
next_unlinked = null
v3.crc = 0x40d2f493 (correct)
v3.change_count = 3
v3.lsn = 0x100000002
v3.flags2 = 0
v3.cowextsize = 0
v3.crtime.sec = Fri Feb 15 16:11:36 2019
v3.crtime.nsec = 155502016
v3.inumber = 99
v3.uuid = 425730b5-1254-45db-8e31-87f25c75f6cd
v3.reflink = 0
v3.cowextsz = 0
u3 = (empty)

Note that if you try use command stat -v from xfs_io instead, btime value won’t be shown:

# mount /dev/vdd /mnt/test_xfs
# TZ=CET xfs_io -r /mnt/test_xfs/testfile -c 'stat -v'
fd.path = "/mnt/test_xfs/testfile"
fd.flags = non-sync,non-direct,read-only
stat.ino = 99
stat.type = regular file
stat.size = 0
stat.blocks = 0
stat.atime = Fri Feb 15 16:11:36 2019
stat.mtime = Fri Feb 15 16:11:36 2019
stat.ctime = Fri Feb 15 16:11:36 2019
fsxattr.xflags = 0x0 []
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.cowextsize = 0
fsxattr.nextents = 0
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136

Another possible pitfall with XFS is that XFS btime support is not available in RHEL 7, since as we just noted, XFS can handle btime since kernel 3.10.

And obviously even if you mount XFS volume on a system running kernel with btime support, it would not matter if the filesystem itself was created using an older format which doesn’t support it. This applies to all filesystems which haven’t originally implemented btime.

Reading btime from Linux ext4 volume on a FreeBSD

Funny thing is that when you create ext4 filesystem volume on Linux, and then mount it on a FreeBSD machine, you will be able to read btime values of files stored there using native FreeBSD stat command. This works despite the fact that FreeBSD obviously provides only limited support for Linux filesystems, e.g. ext4 volume can be mounted via ext2fs in read only mode only (there is an option to use FUSE implementation instead which can handle read write mode, but I haven’t tried that).

This is what happens when we try to read btime on ext4 volume from the previous example on FreeBSD 12. Timestamps in the output are listed in the following order: atime, mtime, ctime and btime.

# mount -t ext2fs -o ro /dev/vtbd1 /mnt/test_ext4
# cat /mnt/test_ext4/testfile
ext4
# env TZ=CET stat /mnt/test_ext4/testfile
92 12 -rw-r--r-- 1 root wheel 127754 5 "Feb 15 15:00:32 2019" "Feb 15 15:00:14 2019" "Feb 15 15:00:14 2019" "Feb 15 15:00:14 2019" 4096 8 0 /mnt/test_ext4/testfile

Birth time support in GNU Linux distributions

So far we have covered support of btime timestamp in Linux filesystems. But for this to be useful, we need to be able to pass this information from kernel to user space. And as I have already noted above, the original plan was that btime value would be available via xstat() syscall, but then merging of this syscall got stuck, and then after few years it appeared again in a different form as statx(), which was eventually merged in Linux kernel 4.11 from April 2017. GNU C Library supports btime since glibc 2.28 from August 2018. Which means that we can try this out already eg. with Fedora 29.

The following code demonstrates how to use statx(2) to read a btime value for given file. The fact that we can specify which metadata we are interested in makes it possible for kernel to avoid fetching values which we won’t plan to use. This is by the way one of the selling points of statx(2) compared to stat(2).

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char**argv)
{
	struct statx stx = { 0, };
	if (argc != 2) {
		printf("Usage: %s FILE\n", argv[0]);
		printf("Display btime of a given file in UNIX time format.\n");
		return EXIT_SUCCESS;
	}
	int rc = statx(AT_FDCWD, argv[1], AT_SYMLINK_NOFOLLOW, STATX_BTIME, &stx);
	if (rc == 0) {
		if (stx.stx_btime.tv_sec != 0) {
			printf("@%u.%u\n", stx.stx_btime.tv_sec, stx.stx_btime.tv_nsec);
		} else {
			printf("-\n");
		}
	} else {
		perror("statx");
	}
	return rc;
}

If you use a distro with Linux kernel 4.11 or later, but still with glibc version older than 2.28, you will have to call statx(2) via syscall(2) instead.

The program reports sheer timestamp value in unix epoch format. The leading at sign is just a hack to make it straightforward for date tool to decode it’s value:

$ make btime
cc     btime.c   -o btime
$ ./btime btime
@1550254543.238843517
$ ./btime btime | date -f- --rfc-3339=ns
2019-02-15 19:15:43.238843517+01:00

When we mount the ext4 volume from previous experiments, we will get expected result:

$ ./btime /mnt/test_ext4/testfile | date -f- --rfc-3339=ns
2019-02-15 15:00:14.135799387+01:00

Stat from GNU Coreutils

As I noted in the beginning, stat(1) command from GNU Coreutils reports btime as “-”. We may wonder why the stat tool bothers with this at all, because until recently, Linux kernel didn’t provide any way to get this information. That said a closer look reveals that gnulib, which is used by GNU stat to read values from stat struct, was enhanced with btime support thanks to BSD systems back in 2007. And that btime related code was introduced into stat(1) in 2010. I guess that it was somehow related to the xstat(2) syscall effort, which failed to be merged into the kernel in the end. In any case, thanks to all this, since GNU Coreutils 8.6 from 2010, stat(1) reports btime as “-” in all cases no matter what filesystem is involved when executed on Linux, while eg. on BSD systems or Solaris it can actually report birth time value as long as the filesystem supports it.

Further look at the coreutils source code reveals that thanks to a hack implementing btime support for Solaris, it’s not that hard to add Linux btime support using statx(2) syscall in a similar way:

diff --git a/configure.ac b/configure.ac
index 669e9d1f2..081728c96 100644
--- a/configure.ac
+++ b/configure.ac
@@ -318,6 +318,8 @@ if test $ac_cv_func_getattrat = yes; then
   AC_SUBST([LIB_NVPAIR])
 fi

+AC_CHECK_FUNCS([statx])
+
 # SCO-ODT-3.0 is reported to need -los to link programs using initgroups
 AC_CHECK_FUNCS([initgroups])
 if test $ac_cv_func_initgroups = no; then
diff --git a/src/stat.c b/src/stat.c
index 0a5ef3cb4..189328cab 100644
--- a/src/stat.c
+++ b/src/stat.c
@@ -1007,6 +1007,24 @@ get_birthtime (int fd, char const *filename, struct stat const *st)
     }
 #endif

+#if HAVE_STATX
+  if (ts.tv_nsec < 0)
+    {
+      struct statx stx = { 0, };
+      if ((fd < 0
+           ? statx(AT_FDCWD, filename, AT_SYMLINK_NOFOLLOW, STATX_BTIME, &stx)
+           : statx(fd, "", AT_EMPTY_PATH, STATX_BTIME, &stx))
+          == 0)
+        {
+          if (stx.stx_btime.tv_sec != 0)
+            {
+              ts.tv_sec = stx.stx_btime.tv_sec;
+              ts.tv_nsec = stx.stx_btime.tv_nsec;
+            }
+        }
+    }
+#endif
+
   return ts;
 }

The idea behind this hack is that stat calls classic stat(2) syscall to get file metadata as before, but then it also uses statx(2) to get just btime. This is good enough for us to start playing with this feature:

$ touch ~/tmp/test
$ ./stat ~/tmp/test
  File: /home/martin/tmp/test
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd07h/64775d    Inode: 7377267     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2019-02-15 19:52:40.499658659 +0100
Modify: 2019-02-15 19:52:40.499658659 +0100
Change: 2019-02-15 19:52:40.499658659 +0100
 Birth: 2019-02-15 19:52:40.499658659 +0100
$ touch ~/tmp/test
$ ./stat ~/tmp/test
  File: /home/martin/tmp/test
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd07h/64775d    Inode: 7377267     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2019-02-15 19:52:46.598671520 +0100
Modify: 2019-02-15 19:52:46.598671520 +0100
Change: 2019-02-15 19:52:46.598671520 +0100
 Birth: 2019-02-15 19:52:40.499658659 +0100

That said this solution is obviously not optimal, as it uses 2 syscalls instead of just one. And needless to say it’s not utilizing possibilities of statx(2) syscall design. So far I haven’t tried to come up with a more polished patch, and based on the fact that nobody replied to my coreutils list message, my guess is that nobody is actively working on it right now.

Stat from GNU Bash

I got a bit worried when I was reading release notes of Bash 5.0 from January 2019, which introduces:

New loadable builtins: rm, stat, fdflags.

At first I was thinking that adding btime support into stat from GNU coreutils won’t be enough, as since bash 5.0, most users will be working with stat implemented as a shell builtin, in a similar way bash users mostly work with bash implementation of time instead of /bin/time command line tool:

$ type -a time
time is a shell keyword
time is /usr/bin/time
time is /bin/time

But it turned out that this is not the case, because it’s so called dynamic loadable builtin. The source code of bash implementation of stat is located in examples/loadables/ directory, and builtins placed there (such as cat.c or sleep.c) are not available in binary packages of bash (which you can check for yourself when you inspect output of enable -a command). The idea behind this is that when you need to optimize a shell script, which is eg. using sleep in a loop, you can compile corresponding builtin (or write your own) and load it into bash via enable -f. Personally I would rather use pyton in such cases, but if you can’t avoid bash (eg. because it’s some big legacy script), the option is there.

And as I indicated above, bash builtin implementation of stat doesn’t read btime values:

$ enable -f ~/projects/bash/examples/loadables/stat stat
$ help stat
stat: stat [-lL] [-A aname] file
    Load an associative array with file status information.

    Take a filename and load the status information returned by a
    stat(2) call on that file into the associative array specified
    by the -A option.  The default array name is STAT.  If the -L
    option is supplied, stat does not resolve symbolic links and
    reports information about the link itself.  The -l option results
    in longer-form listings for some of the fields. The exit status is 0
    unless the stat fails or assigning the array is unsuccessful.
$ stat ~/tmp/test
$ for i in "${!STAT[@]}"; do echo $i = ${STAT[$i]}; done
nlink = 1
link = /home/martin/tmp/test
perms = 0664
inode = 7377267
blksize = 4096
device = 64775
atime = 1550256766
type = -
blocks = 0
uid = 1000
size = 0
rdev = 0
name = /home/martin/tmp/test
mtime = 1550256766
ctime = 1550256766
gid = 1000

But in this case it seems to me that instead of adding btime support into this stat builtin function, it would be better to write another one, which would be able to fully take advantage of new statx(2) syscall.

Other tools

Unfortunately btime support in base components of GNU Linux distributions still doesn’t go beyond statx(2) wrapper in glibc. Like above mentioned stat implementations, no basic tool such as ls or find can work with btime on Linux right now. As in the case of stat tool, some of these tools already have some btime related functionality, e.g.find can already search files by btime, it only lacks a way to read btime value on Linux. On the other hand since btime value is available only via new statx(2) syscall, changes in these tools may not be always as straightforward as one could assume at first.

It will also depend on whether it will be possible to change btime via syscalls such as utimes(2) or utimensat(2). Right now, it’s not possible to change btime of a file. On one hand it makes sense, birth time doesn’t change later by definition but on the other hand it also means that we can’t archive file including it’s btime value via cp -a nor restore it from backup via rsync. Because of this, implementation of btime handling in GNU tar will likely take bit longer, as it’s not clear why would anyone implement btime handling there, when this information can’t be restored on Linux during archive extraction.

At this point it’s worth pointing out that FreeBSD provides a way to change btime of a file via utimes(2) from the beginning, as explained in original UFS2 design paper.

What does btime actually mean and what it can be used for?

What does a birth time of a file actually mean? It looks like the answer is clear, but it’s not that simple. It’s a low level information about a file. Kernel assigns it when a file is created, and doesn’t provide any way to change it later. That said you will easily lose this information eg. when you copy a file, no matter if you try to preserve file attributes, since from the kernel perspective, it’s a new file and btime can’t be changed. Another common use case when you lose btime is when an application renames a temporary file overwriting existing file to perform atomic write.

Btw it haven’t occurred to me before that such atomic rename operation is done by vim every time you write to a file (note the change of inode and birth time of the file):

$ rm ~/tmp/test
$ touch ~/tmp/test
$ stat.hacked ~/tmp/test
  File: /home/martin/tmp/test
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: fd07h/64775d	Inode: 7377286     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2019-02-17 09:51:45.483720811 +0100
Modify: 2019-02-17 09:51:45.483720811 +0100
Change: 2019-02-17 09:51:45.483720811 +0100
 Birth: 2019-02-17 09:51:45.483720811 +0100
$ vim ~/tmp/test
$ stat.hacked ~/tmp/test
  File: /home/martin/tmp/test
  Size: 5         	Blocks: 8          IO Block: 4096   regular file
Device: fd07h/64775d	Inode: 7377267     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2019-02-17 09:52:17.151767057 +0100
Modify: 2019-02-17 09:52:17.151767057 +0100
Change: 2019-02-17 09:52:17.156767065 +0100
 Birth: 2019-02-17 09:52:17.151767057 +0100

And this finally brings us to the question: What is btime actually good for? Judging from the time it took for complete btime support to reach Linux kernel, nobody seems to give it a high priority. This is also apparent from fact that btime related changes often appear in larger commits whose main purpose is dealing with something else. For example ext4 introduced btime in a patch implementing nanosecond timestamps. Similarly XFS added btime as part of adding metadata checksums and statx(2) syscall wasn’t designed just to read btime. And last but not least, during the whole 50 year long unix history, nobody suggested to have it included in POSIX standard.

When we look at the reasons for btime implementation in Linux, besides sheer “UFS2/ZFS has it as well” we often see references to Samba and Windows compatibility. Unfortunately Samba can’t directly take advantage of Linux btime, since Windows allows birth time of a file to be changed. And while NTFS-3G could use Linux btime to report birth time of NTFS files, nobody will work on this until there is statx(2) support available in FUSE and at least all coreutils tools are be able to work with btime. Moreover NTFS-3G can already report birth time on Linux via extended attributes, although using just ls would certainly be more convenient.

In any case the new timestamp can be used for debugging of a strange behaviour, when every pice of additional evidence is useful no matter if the problem is caused by an attacker, malware or a bug. Besides these “forensic” cases, btime could be used for the opposite purposes as well. For example one could try to hide a small amount of data in the timestamp. Paradoxically the current status quo (btime support implemented only in Linux kernel) is actually good for both cases. Forensic analysis benefits from the fact that a less known timestamp is less likely to be falsified. And the opposite use cases can take advantage of the fact that btime misuse is not directly visible.

References

File creation times from lwn.net (2010),
stat (system call) from Wikipedia,
Comparison of file systems from Wikipedia,
question How to find creation date of file? from unix.stackexchange.com,
task_diag and statx() from lwn.net (2016),
paper Anti-forensics in ext4: On secrecy and usability of timestamp-baseddata hiding,
paper and talk Forensic Timestamp Analysis of ZFS from BSDCan 2014.

Historical sources:

Unix History Repository
Enhancements to the Fast Filesystem To Support Multi-Terabyte Storage Systems: description of UFS2 filesystem design, including it’s birth time implementation
OpenSolaris project repository