Have you ever wondered why it is that on GNU/Linux distributions
stat(1)
command from GNU
Coreutils reports 3 classic unix time
stamps access, modify and change and on top of that, there is also
something called birth, but without any value? Why is the empty birth field
even there?
I became interested in this less known timestamp few months ago while I was
debugging a rare problem, trying to get as much additional evidence as
possible. And even though it didn’t help me in the end, I gradually learned a
bit about it’s history and it’s future. Hence this post lies somewhere between
software archeology and Linux Weekly News, and here you will
learn where the timestamp comes from, how to work with it in GNU/Linux
distributions and what it’s future looks like.
Let’s start with an example so that we get a better idea what I’m writing about here. The following output shows stat executed on a file stored on ext4 filesystem. It was done Fedora 29 with GNU Coreutils 8.30 and Linux kernel 4.19.8:
$ stat public_html
File: public_html
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fd07h/64775d Inode: 7341469 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 1000/ martin) Gid: ( 1000/ martin)
Context: unconfined_u:object_r:httpd_user_content_t:s0
Access: 2018-12-09 12:59:16.321622899 +0100
Modify: 2018-05-19 21:35:13.813112882 +0200
Change: 2018-12-09 02:51:38.961313721 +0100
Birth: -
$
Unix File Timestamps
Unix file timestamps, as returned by stat syscall, are defined in POSIX standard in the following way:
Timestamp | Field in struct stat |
Meaning |
---|---|---|
access | st_atim |
last data access |
modify | st_mtim |
last data modification |
change | st_ctim |
last file status (inode) change |
Originally, timestamp fields in in the stat
structure stored time with second
precision and ended with e
(eg. st_atime
for access time). Later
these timestamps were extended to handle nanosecond resolution, and new naming
scheme without last e
was chosen for compatibility reasons.
Linux implements nanosecond resolution for these timestamp fields since 2.5.48,
see stat(2)
.
From now on I’m just going to refer to access time as atime, to modify
time as mtime and so on regardless of the difference in resolution between
st_atim
and st_atime
.
Important (and for this blog post the most striking) detail is that there is no file creation timestamp in POSIX standard.
That said if people unfamiliar with Unix systems look briefly at sheer
timestamp field names in stat
structure (see the table above), they may
conclude that ctime likely stands for creation time and thus consider file
creation or birth time as supported.
Further reading in related man pages will quickly show that such conclusion is
incorrect though.
Likewise if you read paper The UNIX Time-Sharing
System from 1973 authored by
Unix creators today, you may believe that ctime originally meant creation
time and that it’s purpose changed over time, since creation timestamp is
directly mentioned there:
The entry found thereby (the file’s i-node) contains the description of the file: … time of creation, last use, and last modification
Nevertheless as clarified on
wikipedia, that would be another misunderstanding.
Few early versions of Research
Unix actually supported a
creation timestamp, as we can see eg. in
stat(2)
man page of version 3
from February 1973:
NAME stat -- get file status
SYNOPSIS sys stat; name; buf / stat = 18.
DESCRIPTION name points to a null-terminated string naming a
file; buf is the address of a 34(10) byte buffer into which in-
formation is placed concerning the file. It is unnecessary to
have any permissions at all with respect to the file, but all di-
rectories leading to the file must be readable.
After stat, buf has the following format:
buf, +1 i-number
+2,+3 flags (see below)
+4 number of links
+5 user ID of owner
+6,+7 size in bytes
+8,+9 first indirect block or contents block
+22,+23 eighth indirect block or contents block
+24,+25,+26,+27 creation time
+28,+29,+30,+31 modification time
+32,+33 unused
Btw another interesting thing is that stat
command from this version doesn’t
include creation time in it’s output, at least based on
stat(1)
man page
(I wasn’t able to find it’s source code):
NAME stat -- get file status
SYNOPSIS stat name1 ...
DESCRIPTION stat gives several kinds of information about one
or more files:
i-number
access mode
number of links
owner
size in bytes
date and time of last modification
name (useful when several files are named)
But right in the following release of Research Unix, which was version 4 from
November 1973, there is no creation timestamp anymore.
It’s worth noting that V4 was first Unix version written in c.
And thanks to that, fields in stat structure were given their names, which
differs a bit compared to the names used today.
See stat(2)
manpage from
V4:
stat get file status (stat = 18.)
sys stat; name; buf stat(name, buf)
char *name;
struct inode *buf; points to a null-terminated string naming a
file; is the address of a 36(10) byte buffer into which informa-
tion is placed concerning the file. It is unnecessary to have
any permissions at all with respect to the file, but all directo-
ries leading to the file must be readable. After has the follow-
ing structure (starting offset given in bytes):
struct {
char minor; /* +0: minor device of i-node */
char major; /* +1: major device */
int inumber /* +2 */
int flags; /* +4: see below */
char nlinks; /* +6: number of links to file */
char uid; /* +7: user ID of owner */
char gid; /* +8: group ID of owner */
char size0; /* +9: high byte of 24-bit size */
int size1; /* +10: low word of 24-bit size */
int addr[8]; /* +12: block numbers or device number */
int actime[2]; /* +28: time of last access */
int modtime[2]; /* +32: time of last modification */
};
Timestamp known as ctime was first instroduced later in version 7 from January 1979. And right from the beginning it’s meaning was the same as we understand it today, which shows that ctime never used to mean creation time.
struct stat
{
dev_t st_dev;
ino_t st_ino;
unsigned short st_mode;
short st_nlink;
short st_uid;
short st_gid;
dev_t st_rdev;
off_t st_size;
time_t st_atime;
time_t st_mtime;
time_t st_ctime;
};
Timestamps of the stat structure stayed in this form through POSIX standardization to to the present day (with the already mentioned exception of nanosecond time precision extension).
<hint> For some time I used to confuse meaning of mtime (modify) and ctime (change) so that when I was searching for files by date, I had to look it up every now and then. But based on the information presented above, we can come up with a nice hint to help us remember the difference: ctime is the timestamp sometimes incorrectly considered to represent creation time and which was not present in almost first 10 years of Unix history. So ctime must be less important than mtime, and so ctime must describe change in inode file metadata (which is something I’m not interested in most cases anyway). </hint>
It’s hard to say when unix timestamp representing creation time aka birth time (btime) showed up for the first time, because it’s quite likely that it happened in some proprietary Unix system. And since such systems are now dead, without publicly available documentation or source code under reasonable license, I’m not willing to put additional resources and effort into looking up something like that. But for Unix like systems with free and open source code, the first place is clear: timestamp for creation time was introduced for the 1st time as late as in 2003 in FreeBSD 5.0 with introduction of UFS2 filesystem. And from there it spread to other BSD systems, such as NetBSD or OpenBSD.
It’s worth noticing how the FreeBSD developers added the birth time support.
Since there was enough unused space in stat structure of FreeBSD,
adding new timestamp under name st_birthtime
into stat struct
wasn’t a problem from the compatibility perspective. Right now the timestamp
is called
st_birthtim
,
so that it’s meaning and name match convention from the current POSIX standard
(see the table above).
OpenSolaris provides support
for creation time as well, but it’s developers have avoided any changes in stat
structure. To get a btime value it’s necessary to use
fgetattr(3C) call
and from the list of returned attributes read A_CRTIME
.
This implementation seems to be already present in the very 1st public commit
of OpenSolaris project from 2005, so it likely comes from Solaris.
I don’t know whether anyone is considering adding btime into POSIX standard, I failed to find any information about that. My guess is that that nobody cares nowadays.
Birth time support in Linux filesystems
Like other Unix kernels, Linux hasn’t supported btime for a long time.
Ext3 or reiserfs filesystems don’t store btime for it’s files and
stat(2)
syscall
returns structure of the same name without this timestamp as well.
Birth time support in Linux has been talked about for quite some
time,
but compared to FreeBSD it wasn’t possible to simply add the new timestamp
into free padding space of existing stat
structure, because Linux doesn’t
offer enough free space there.
More viable option turned out to be adding btime into proposed new xstat()
syscall, but merging of this syscall got unfortunately stuck for some
time.
Nevertheless Linux developers started to introduce btime support into new
filesystems well before the consensus about xstat()
was reached.
For example ext4 received btime support in 2007 as a part of a patch
implementing nanosecond timestamp
resolution
(disk format of ext4 is stable since kernel
2.6.28 from December 2008).
Btrfs got it in 2012,
while it’s disk format is stable since about November 2013.
XFS, which originally didn’t implement btime, received the support as part of
a change adding metadata checksums in
2013
and so it’s available since kernel 3.10.
This means that even on a quite old distro, a filesystem likely already stores
btime of every file, even though this information is not directly available
for users to see. Without a kernel support, one can only get to this
information using filesystem debugging tools, which are unique for each
filesystem, and which obviously require root privileges and direct access to a
block device with the filesystem.
How to read btime value with fs debug tools
First of all, let’s create new partitions to play with:
# mkfs.ext4 /dev/vdc
# mount /dev/vdc /mnt/test_ext4
# echo "ext4" > /mnt/test_ext4/testfile
# mkfs.xfs /dev/vdd
# mount /dev/vdd /mnt/test_xfs
# echo "xfs" > /mnt/test_xfs/testfile
For the record: I used a machine running Debian Stretch (current stable) for this demonstration to be more realistic. That is because Stretch already contains kernel with btime support for both ext4 and XFS, but at the same time it can’t yet pass btime values into user space. The procedure shown below would work on more modern distros as well, but when you already have an option to get btime value via syscall directly, it doesn’t make sense to do it the hard way. I would have covered btrfs as well, but I was unable to figure out how to get btime value out of it in a similar way.
To deal with ext4 filesystem we are going to use
debugfs
and it’s
stat
command, which reports file birth time as crtime
:
# TZ=CET debugfs -R 'stat testfile' /dev/vdc
debugfs 1.43.4 (31-Jan-2017)
Inode: 12 Type: regular Mode: 0644 Flags: 0x80000
Generation: 1318526178 Version: 0x00000000:00000001
User: 0 Group: 0 Project: 0 Size: 5
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5c66c5ee:2060896c -- Fri Feb 15 15:00:14 2019
atime: 0x5c66c600:ee5ed49c -- Fri Feb 15 15:00:32 2019
mtime: 0x5c66c5ee:2060896c -- Fri Feb 15 15:00:14 2019
crtime: 0x5c66c5ee:2060896c -- Fri Feb 15 15:00:14 2019
Size of extra inode fields: 32
Inode checksum: 0x0721e8ea
EXTENTS:
(0):32897
For a file stored on XFS volume we will need xfs_db
instead.
But first of all we
need to find out inode of the file we are interested in and then either umount
the filesystem, or alternatively we can just remount it in a read only mode.
In xfs db output, the creation timestamp is listed as v3.crtime.sec
and
v3.crtime.nsec
:
# ls -i /mnt/test_xfs/testfile
99 /mnt/test_xfs/testfile
# umount /mnt/test_xfs
# TZ=CET xfs_db /dev/vdd
xfs_db> inode 99
xfs_db> print
core.magic = 0x494e
core.mode = 0100644
core.version = 3
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 0
core.gid = 0
core.flushiter = 0
core.atime.sec = Fri Feb 15 16:11:36 2019
core.atime.nsec = 155502016
core.mtime.sec = Fri Feb 15 16:11:36 2019
core.mtime.nsec = 155502016
core.ctime.sec = Fri Feb 15 16:11:36 2019
core.ctime.nsec = 155502016
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 559694043
next_unlinked = null
v3.crc = 0x40d2f493 (correct)
v3.change_count = 3
v3.lsn = 0x100000002
v3.flags2 = 0
v3.cowextsize = 0
v3.crtime.sec = Fri Feb 15 16:11:36 2019
v3.crtime.nsec = 155502016
v3.inumber = 99
v3.uuid = 425730b5-1254-45db-8e31-87f25c75f6cd
v3.reflink = 0
v3.cowextsz = 0
u3 = (empty)
Note that if you try use command stat -v
from xfs_io
instead, btime value
won’t be shown:
# mount /dev/vdd /mnt/test_xfs
# TZ=CET xfs_io -r /mnt/test_xfs/testfile -c 'stat -v'
fd.path = "/mnt/test_xfs/testfile"
fd.flags = non-sync,non-direct,read-only
stat.ino = 99
stat.type = regular file
stat.size = 0
stat.blocks = 0
stat.atime = Fri Feb 15 16:11:36 2019
stat.mtime = Fri Feb 15 16:11:36 2019
stat.ctime = Fri Feb 15 16:11:36 2019
fsxattr.xflags = 0x0 []
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.cowextsize = 0
fsxattr.nextents = 0
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
Another possible pitfall with XFS is that XFS btime support is not available in RHEL 7, since as we just noted, XFS can handle btime since kernel 3.10.
And obviously even if you mount XFS volume on a system running kernel with btime support, it would not matter if the filesystem itself was created using an older format which doesn’t support it. This applies to all filesystems which haven’t originally implemented btime.
Reading btime from Linux ext4 volume on a FreeBSD
Funny thing is that when you create ext4 filesystem volume on Linux, and then
mount it on a FreeBSD machine,
you will be able to read btime values of files stored there
using native FreeBSD stat
command. This works despite the fact that FreeBSD
obviously provides only limited support for Linux filesystems, e.g. ext4 volume
can be mounted via
ext2fs
in read only mode only (there is an option to use FUSE implementation instead
which can handle read write mode, but I haven’t tried that).
This is what happens when we try to read btime on ext4 volume from the previous example on FreeBSD 12. Timestamps in the output are listed in the following order: atime, mtime, ctime and btime.
# mount -t ext2fs -o ro /dev/vtbd1 /mnt/test_ext4
# cat /mnt/test_ext4/testfile
ext4
# env TZ=CET stat /mnt/test_ext4/testfile
92 12 -rw-r--r-- 1 root wheel 127754 5 "Feb 15 15:00:32 2019" "Feb 15 15:00:14 2019" "Feb 15 15:00:14 2019" "Feb 15 15:00:14 2019" 4096 8 0 /mnt/test_ext4/testfile
Birth time support in GNU Linux distributions
So far we have covered support of btime timestamp in Linux filesystems.
But for this to be useful, we need to be able to pass this information from
kernel to user space. And as I have already noted above, the original plan was
that btime value would be available via xstat()
syscall, but then merging
of this syscall got stuck, and then
after few years it appeared again in a different form as
statx()
, which was eventually merged in
Linux kernel
4.11
from April 2017.
GNU C Library supports
btime since
glibc 2.28
from August 2018. Which means that we can try this out already eg. with
Fedora 29.
The following code demonstrates how to use
statx(2)
to
read a btime value for given file.
The fact that we can specify which metadata we are interested in makes it
possible for kernel to avoid fetching values which we won’t plan to use. This
is by the way one of the selling points of statx(2)
compared to stat(2)
.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char**argv)
{
struct statx stx = { 0, };
if (argc != 2) {
("Usage: %s FILE\n", argv[0]);
printf("Display btime of a given file in UNIX time format.\n");
printfreturn EXIT_SUCCESS;
}
int rc = statx(AT_FDCWD, argv[1], AT_SYMLINK_NOFOLLOW, STATX_BTIME, &stx);
if (rc == 0) {
if (stx.stx_btime.tv_sec != 0) {
("@%u.%u\n", stx.stx_btime.tv_sec, stx.stx_btime.tv_nsec);
printf} else {
("-\n");
printf}
} else {
("statx");
perror}
return rc;
}
If you use a distro with Linux kernel 4.11 or later, but still with glibc
version older than 2.28, you will have to call statx(2)
via syscall(2)
instead.
The program reports sheer timestamp value in unix epoch format. The leading at
sign is just a hack to make it straightforward for date
tool to decode it’s
value:
$ make btime
cc btime.c -o btime
$ ./btime btime
@1550254543.238843517
$ ./btime btime | date -f- --rfc-3339=ns
2019-02-15 19:15:43.238843517+01:00
When we mount the ext4 volume from previous experiments, we will get expected result:
$ ./btime /mnt/test_ext4/testfile | date -f- --rfc-3339=ns
2019-02-15 15:00:14.135799387+01:00
Stat from GNU Coreutils
As I noted in the beginning, stat(1)
command from GNU Coreutils reports
btime as “-”. We may wonder why the stat tool bothers with this at all, because
until recently, Linux kernel didn’t provide any way to get this information.
That said a closer look reveals that
gnulib, which is used by GNU stat to
read values from stat struct, was enhanced with btime support thanks to BSD
systems back in
2007.
And that btime related code was introduced into stat(1)
in
2010.
I guess that it was somehow related to the xstat(2)
syscall effort, which
failed to be merged into the kernel in the end.
In any case, thanks to all this, since GNU Coreutils 8.6 from 2010,
stat(1)
reports btime as “-” in all cases no matter what filesystem is
involved when executed on Linux, while eg. on BSD systems or Solaris it can
actually report birth time value as long as the filesystem supports it.
Further look at the coreutils source code reveals that thanks to a hack
implementing btime support for Solaris, it’s not that hard to add
Linux btime support using statx(2)
syscall in a similar way:
diff --git a/configure.ac b/configure.ac
index 669e9d1f2..081728c96 100644--- a/configure.ac
+++ b/configure.ac
@@ -318,6 +318,8 @@ if test $ac_cv_func_getattrat = yes; then
AC_SUBST([LIB_NVPAIR])
fi
+AC_CHECK_FUNCS([statx])
+
# SCO-ODT-3.0 is reported to need -los to link programs using initgroups
AC_CHECK_FUNCS([initgroups])
if test $ac_cv_func_initgroups = no; thendiff --git a/src/stat.c b/src/stat.c
index 0a5ef3cb4..189328cab 100644--- a/src/stat.c
+++ b/src/stat.c
@@ -1007,6 +1007,24 @@ get_birthtime (int fd, char const *filename, struct stat const *st)
}
#endif
+#if HAVE_STATX
+ if (ts.tv_nsec < 0)
+ {
+ struct statx stx = { 0, };
+ if ((fd < 0
+ ? statx(AT_FDCWD, filename, AT_SYMLINK_NOFOLLOW, STATX_BTIME, &stx)
+ : statx(fd, "", AT_EMPTY_PATH, STATX_BTIME, &stx))
+ == 0)
+ {
+ if (stx.stx_btime.tv_sec != 0)
+ {
+ ts.tv_sec = stx.stx_btime.tv_sec;
+ ts.tv_nsec = stx.stx_btime.tv_nsec;
+ }
+ }
+ }
+#endif
+
return ts; }
The idea behind this hack is that stat calls classic stat(2)
syscall to get
file metadata as before, but then it also uses statx(2)
to get just
btime. This is good enough for us to start playing with this feature:
$ touch ~/tmp/test
$ ./stat ~/tmp/test
File: /home/martin/tmp/test
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: fd07h/64775d Inode: 7377267 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ martin) Gid: ( 1000/ martin)
Access: 2019-02-15 19:52:40.499658659 +0100
Modify: 2019-02-15 19:52:40.499658659 +0100
Change: 2019-02-15 19:52:40.499658659 +0100
Birth: 2019-02-15 19:52:40.499658659 +0100
$ touch ~/tmp/test
$ ./stat ~/tmp/test
File: /home/martin/tmp/test
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: fd07h/64775d Inode: 7377267 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ martin) Gid: ( 1000/ martin)
Access: 2019-02-15 19:52:46.598671520 +0100
Modify: 2019-02-15 19:52:46.598671520 +0100
Change: 2019-02-15 19:52:46.598671520 +0100
Birth: 2019-02-15 19:52:40.499658659 +0100
That said this solution is obviously not optimal, as it uses 2 syscalls
instead of just one. And needless to say it’s not utilizing possibilities of
statx(2)
syscall design. So far I haven’t tried to come up with
a more polished patch, and based on the fact that nobody replied to my
coreutils list
message,
my guess is that nobody is actively working on it right now.
Stat from GNU Bash
I got a bit worried when I was reading release notes of Bash 5.0 from January 2019, which introduces:
- New loadable builtins: rm, stat, fdflags.
At first I was thinking that adding btime support into stat from GNU coreutils
won’t be enough, as since bash 5.0, most users will be working with stat
implemented as a shell builtin,
in a similar way bash users mostly work with bash implementation of time
instead of /bin/time
command line tool:
$ type -a time
time is a shell keyword
time is /usr/bin/time
time is /bin/time
But it turned out that this is not the case, because it’s so called dynamic
loadable
builtin.
The source code of bash implementation of
stat
is located in
examples/loadables/
directory, and builtins placed there (such as cat.c
or sleep.c
) are not
available in binary packages of bash (which you can check for
yourself when you inspect output of enable -a
command). The idea behind this
is that when you need to optimize a shell script, which is eg. using sleep
in a loop, you can compile corresponding builtin (or write your own) and load
it into bash via enable -f
. Personally I would rather use pyton in such
cases, but if you can’t avoid bash (eg. because it’s some big legacy script),
the option is there.
And as I indicated above, bash builtin implementation of stat
doesn’t read
btime values:
$ enable -f ~/projects/bash/examples/loadables/stat stat
$ help stat
stat: stat [-lL] [-A aname] file
Load an associative array with file status information.
Take a filename and load the status information returned by a
stat(2) call on that file into the associative array specified
by the -A option. The default array name is STAT. If the -L
option is supplied, stat does not resolve symbolic links and
reports information about the link itself. The -l option results
in longer-form listings for some of the fields. The exit status is 0
unless the stat fails or assigning the array is unsuccessful.
$ stat ~/tmp/test
$ for i in "${!STAT[@]}"; do echo $i = ${STAT[$i]}; done
nlink = 1
link = /home/martin/tmp/test
perms = 0664
inode = 7377267
blksize = 4096
device = 64775
atime = 1550256766
type = -
blocks = 0
uid = 1000
size = 0
rdev = 0
name = /home/martin/tmp/test
mtime = 1550256766
ctime = 1550256766
gid = 1000
But in this case it seems to me that instead of adding btime support into this
stat
builtin function, it would be better to write another one, which would
be able to fully take advantage of new statx(2)
syscall.
Other tools
Unfortunately btime support in base components of GNU Linux distributions
still doesn’t go beyond statx(2)
wrapper in glibc.
Like above mentioned stat
implementations, no basic tool such as ls
or
find
can work with btime on Linux right now.
As in the case of stat
tool, some of these tools already have some btime
related functionality, e.g.find
can already search files by btime,
it only lacks a way to read btime value on Linux.
On the other hand since btime value is available only via new statx(2)
syscall, changes in these tools may not be always as straightforward as one
could assume at first.
It will also depend on whether it will be possible to change btime via
syscalls such as
utimes(2)
or
utimensat(2)
.
Right now, it’s not possible to change btime of a file. On one hand it makes
sense, birth time doesn’t change later by definition but on the other hand it
also means that we can’t archive file including it’s btime value via cp -a
nor restore it from backup via rsync.
Because of this, implementation of btime handling in GNU tar will likely take
bit longer, as it’s not clear why would anyone implement btime handling there,
when this information can’t be restored on Linux during archive extraction.
At this point it’s worth pointing out that FreeBSD provides a way to change
btime of a file via
utimes(2)
from the beginning, as explained in original
UFS2 design paper.
What does btime actually mean and what it can be used for?
What does a birth time of a file actually mean? It looks like the answer is clear, but it’s not that simple. It’s a low level information about a file. Kernel assigns it when a file is created, and doesn’t provide any way to change it later. That said you will easily lose this information eg. when you copy a file, no matter if you try to preserve file attributes, since from the kernel perspective, it’s a new file and btime can’t be changed. Another common use case when you lose btime is when an application renames a temporary file overwriting existing file to perform atomic write.
Btw it haven’t occurred to me before that such atomic rename operation is done by vim every time you write to a file (note the change of inode and birth time of the file):
$ rm ~/tmp/test
$ touch ~/tmp/test
$ stat.hacked ~/tmp/test
File: /home/martin/tmp/test
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: fd07h/64775d Inode: 7377286 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ martin) Gid: ( 1000/ martin)
Access: 2019-02-17 09:51:45.483720811 +0100
Modify: 2019-02-17 09:51:45.483720811 +0100
Change: 2019-02-17 09:51:45.483720811 +0100
Birth: 2019-02-17 09:51:45.483720811 +0100
$ vim ~/tmp/test
$ stat.hacked ~/tmp/test
File: /home/martin/tmp/test
Size: 5 Blocks: 8 IO Block: 4096 regular file
Device: fd07h/64775d Inode: 7377267 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ martin) Gid: ( 1000/ martin)
Access: 2019-02-17 09:52:17.151767057 +0100
Modify: 2019-02-17 09:52:17.151767057 +0100
Change: 2019-02-17 09:52:17.156767065 +0100
Birth: 2019-02-17 09:52:17.151767057 +0100
And this finally brings us to the question: What is btime actually good for?
Judging from the time it took for complete btime support to reach Linux kernel,
nobody seems to give it a high priority.
This is also apparent from fact that btime related changes often appear in
larger commits whose main purpose is dealing with something else.
For example ext4 introduced btime in a patch implementing nanosecond
timestamps. Similarly XFS added btime as part of adding metadata checksums
and statx(2)
syscall wasn’t designed just to read btime.
And last but not least, during the whole 50 year long unix history, nobody
suggested to have it included in POSIX standard.
When we look at the reasons for btime implementation in Linux, besides sheer
“UFS2/ZFS has it as well” we often see references to Samba and Windows
compatibility.
Unfortunately Samba can’t directly take advantage of Linux btime, since Windows
allows birth time of a file to be changed.
And while NTFS-3G could use Linux
btime to report birth time of NTFS files, nobody will work on this until there
is statx(2)
support available in FUSE and at least all coreutils tools are
be able to work with btime.
Moreover NTFS-3G can already report birth time on Linux via extended
attributes, although using just ls
would certainly be more convenient.
In any case the new timestamp can be used for debugging of a strange behaviour, when every pice of additional evidence is useful no matter if the problem is caused by an attacker, malware or a bug. Besides these “forensic” cases, btime could be used for the opposite purposes as well. For example one could try to hide a small amount of data in the timestamp. Paradoxically the current status quo (btime support implemented only in Linux kernel) is actually good for both cases. Forensic analysis benefits from the fact that a less known timestamp is less likely to be falsified. And the opposite use cases can take advantage of the fact that btime misuse is not directly visible.
References
Related articles:
- File creation times from lwn.net (2010),
- stat (system call) from Wikipedia,
- Comparison of file systems from Wikipedia,
- question How to find creation date of file? from unix.stackexchange.com,
- task_diag and statx() from lwn.net (2016),
- paper Anti-forensics in ext4: On secrecy and usability of timestamp-baseddata hiding,
- paper and talk Forensic Timestamp Analysis of ZFS from BSDCan 2014.
Historical sources:
- Unix History Repository
- Enhancements to the Fast Filesystem To Support Multi-Terabyte Storage Systems: description of UFS2 filesystem design, including it’s birth time implementation
- OpenSolaris project repository