summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2019-09-21 13:37:39 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2019-09-21 13:37:39 -0700
commit70cb0d02b58128db07fc39b5e87a2873e2c16bde (patch)
tree43c0a4eb00f192ceb306b9c52503b2d54bc59660 /Documentation/filesystems
parent104c0d6bc43e10ba84931c45b67e2c76c9c67f68 (diff)
parent040823b5372b445d1d9483811e85a24d71314d33 (diff)
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: "Added new ext4 debugging ioctls to allow userspace to get information about the state of the extent status cache. Dropped workaround for pre-1970 dates which were encoded incorrectly in pre-4.4 kernels. Since both the kernel correctly generates, and e2fsck detects and fixes this issue for the past four years, it'e time to drop the workaround. (Also, it's not like files with dates in the distant past were all that common in the first place.) A lot of miscellaneous bug fixes and cleanups, including some ext4 Documentation fixes. Also included are two minor bug fixes in fs/unicode" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits) unicode: make array 'token' static const, makes object smaller unicode: Move static keyword to the front of declarations ext4: add missing bigalloc documentation. ext4: fix kernel oops caused by spurious casefold flag ext4: fix integer overflow when calculating commit interval ext4: use percpu_counters for extent_status cache hits/misses ext4: fix potential use after free after remounting with noblock_validity jbd2: add missing tracepoint for reserved handle ext4: fix punch hole for inline_data file systems ext4: rework reserved cluster accounting when invalidating pages ext4: documentation fixes ext4: treat buffers with write errors as containing valid data ext4: fix warning inside ext4_convert_unwritten_extents_endio ext4: set error return correctly when ext4_htree_store_dirent fails ext4: drop legacy pre-1970 encoding workaround ext4: add new ioctl EXT4_IOC_GET_ES_CACHE ext4: add a new ioctl EXT4_IOC_GETSTATE ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE jbd2: flush_descriptor(): Do not decrease buffer head's ref count ext4: remove unnecessary error check ...
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/ext4/bigalloc.rst32
-rw-r--r--Documentation/filesystems/ext4/blockgroup.rst10
-rw-r--r--Documentation/filesystems/ext4/blocks.rst4
-rw-r--r--Documentation/filesystems/ext4/directory.rst2
-rw-r--r--Documentation/filesystems/ext4/group_descr.rst9
-rw-r--r--Documentation/filesystems/ext4/inodes.rst4
-rw-r--r--Documentation/filesystems/ext4/super.rst20
7 files changed, 53 insertions, 28 deletions
diff --git a/Documentation/filesystems/ext4/bigalloc.rst b/Documentation/filesystems/ext4/bigalloc.rst
index c6d88557553c..72075aa608e4 100644
--- a/Documentation/filesystems/ext4/bigalloc.rst
+++ b/Documentation/filesystems/ext4/bigalloc.rst
@@ -9,14 +9,26 @@ ext4 code is not prepared to handle the case where the block size
exceeds the page size. However, for a filesystem of mostly huge files,
it is desirable to be able to allocate disk blocks in units of multiple
blocks to reduce both fragmentation and metadata overhead. The
-`bigalloc <Bigalloc>`__ feature provides exactly this ability. The
-administrator can set a block cluster size at mkfs time (which is stored
-in the s\_log\_cluster\_size field in the superblock); from then on, the
-block bitmaps track clusters, not individual blocks. This means that
-block groups can be several gigabytes in size (instead of just 128MiB);
-however, the minimum allocation unit becomes a cluster, not a block,
-even for directories. TaoBao had a patchset to extend the “use units of
-clusters instead of blocks” to the extent tree, though it is not clear
-where those patches went-- they eventually morphed into “extent tree v2”
-but that code has not landed as of May 2015.
+bigalloc feature provides exactly this ability.
+
+The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to
+use clustered allocation, so that each bit in the ext4 block allocation
+bitmap addresses a power of two number of blocks. For example, if the
+file system is mainly going to be storing large files in the 4-32
+megabyte range, it might make sense to set a cluster size of 1 megabyte.
+This means that each bit in the block allocation bitmap now addresses
+256 4k blocks. This shrinks the total size of the block allocation
+bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also
+means that a block group addresses 32 gigabytes instead of 128 megabytes,
+also shrinking the amount of file system overhead for metadata.
+
+The administrator can set a block cluster size at mkfs time (which is
+stored in the s\_log\_cluster\_size field in the superblock); from then
+on, the block bitmaps track clusters, not individual blocks. This means
+that block groups can be several gigabytes in size (instead of just
+128MiB); however, the minimum allocation unit becomes a cluster, not a
+block, even for directories. TaoBao had a patchset to extend the “use
+units of clusters instead of blocks” to the extent tree, though it is
+not clear where those patches went-- they eventually morphed into
+“extent tree v2” but that code has not landed as of May 2015.
diff --git a/Documentation/filesystems/ext4/blockgroup.rst b/Documentation/filesystems/ext4/blockgroup.rst
index baf888e4c06a..3da156633339 100644
--- a/Documentation/filesystems/ext4/blockgroup.rst
+++ b/Documentation/filesystems/ext4/blockgroup.rst
@@ -71,11 +71,11 @@ if the flex\_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
-block metadata close together for faster loading, and to enable large
-files to be continuous on disk. Backup copies of the superblock and
-group descriptors are always at the beginning of block groups, even if
-flex\_bg is enabled. The number of block groups that make up a flex\_bg
-is given by 2 ^ ``sb.s_log_groups_per_flex``.
+block group metadata close together for faster loading, and to enable
+large files to be continuous on disk. Backup copies of the superblock
+and group descriptors are always at the beginning of block groups, even
+if flex\_bg is enabled. The number of block groups that make up a
+flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
Meta Block Groups
-----------------
diff --git a/Documentation/filesystems/ext4/blocks.rst b/Documentation/filesystems/ext4/blocks.rst
index 73d4dc0f7bda..bd722ecd92d6 100644
--- a/Documentation/filesystems/ext4/blocks.rst
+++ b/Documentation/filesystems/ext4/blocks.rst
@@ -10,7 +10,9 @@ block groups. Block size is specified at mkfs time and typically is
4KiB. You may experience mounting problems if block size is greater than
page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory
pages). By default a filesystem can contain 2^32 blocks; if the '64bit'
-feature is enabled, then a filesystem can have 2^64 blocks.
+feature is enabled, then a filesystem can have 2^64 blocks. The location
+of structures is stored in terms of the block number the structure lives
+in and not the absolute offset on disk.
For 32-bit filesystems, limits are as follows:
diff --git a/Documentation/filesystems/ext4/directory.rst b/Documentation/filesystems/ext4/directory.rst
index 614034e24669..073940cc64ed 100644
--- a/Documentation/filesystems/ext4/directory.rst
+++ b/Documentation/filesystems/ext4/directory.rst
@@ -59,7 +59,7 @@ is at most 263 bytes long, though on disk you'll need to reference
- File name.
Since file names cannot be longer than 255 bytes, the new directory
-entry format shortens the rec\_len field and uses the space for a file
+entry format shortens the name\_len field and uses the space for a file
type flag, probably to avoid having to load every inode during directory
tree traversal. This format is ``ext4_dir_entry_2``, which is at most
263 bytes long, though on disk you'll need to reference
diff --git a/Documentation/filesystems/ext4/group_descr.rst b/Documentation/filesystems/ext4/group_descr.rst
index 0f783ed88592..7ba6114e7f5c 100644
--- a/Documentation/filesystems/ext4/group_descr.rst
+++ b/Documentation/filesystems/ext4/group_descr.rst
@@ -99,9 +99,12 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
* - 0x1E
- \_\_le16
- bg\_checksum
- - Group descriptor checksum; crc16(sb\_uuid+group+desc) if the
- RO\_COMPAT\_GDT\_CSUM feature is set, or crc32c(sb\_uuid+group\_desc) &
- 0xFFFF if the RO\_COMPAT\_METADATA\_CSUM feature is set.
+ - Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the
+ RO\_COMPAT\_GDT\_CSUM feature is set, or
+ crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the
+ RO\_COMPAT\_METADATA\_CSUM feature is set. The bg\_checksum
+ field in bg\_desc is skipped when calculating crc16 checksum,
+ and set to zero if crc32c checksum is used.
* -
-
-
diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst
index e851e6ca31fa..a65baffb4ebf 100644
--- a/Documentation/filesystems/ext4/inodes.rst
+++ b/Documentation/filesystems/ext4/inodes.rst
@@ -472,8 +472,8 @@ inode, which allows struct ext4\_inode to grow for a new kernel without
having to upgrade all of the on-disk inodes. Access to fields beyond
EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within
``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as
-of October 2013) the inode structure is 156 bytes
-(``i_extra_isize = 28``). The extra space between the end of the inode
+of August 2019) the inode structure is 160 bytes
+(``i_extra_isize = 32``). The extra space between the end of the inode
structure and the end of the inode record can be used to store extended
attributes. Each inode record can be as large as the filesystem block
size, though this is not terribly efficient.
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 6eae92054827..93e55d7c1d40 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -58,7 +58,7 @@ The ext4 superblock is laid out as follows in
* - 0x1C
- \_\_le32
- s\_log\_cluster\_size
- - Cluster size is (2 ^ s\_log\_cluster\_size) blocks if bigalloc is
+ - Cluster size is 2 ^ (10 + s\_log\_cluster\_size) blocks if bigalloc is
enabled. Otherwise s\_log\_cluster\_size must equal s\_log\_block\_size.
* - 0x20
- \_\_le32
@@ -447,7 +447,7 @@ The ext4 superblock is laid out as follows in
- Upper 8 bits of the s_wtime field.
* - 0x275
- \_\_u8
- - s\_wtime_hi
+ - s\_mtime_hi
- Upper 8 bits of the s_mtime field.
* - 0x276
- \_\_u8
@@ -466,12 +466,20 @@ The ext4 superblock is laid out as follows in
- s\_last_error_time_hi
- Upper 8 bits of the s_last_error_time_hi field.
* - 0x27A
- - \_\_u8[2]
- - s\_pad
+ - \_\_u8
+ - s\_pad[2]
- Zero padding.
* - 0x27C
+ - \_\_le16
+ - s\_encoding
+ - Filename charset encoding.
+ * - 0x27E
+ - \_\_le16
+ - s\_encoding_flags
+ - Filename charset encoding flags.
+ * - 0x280
- \_\_le32
- - s\_reserved[96]
+ - s\_reserved[95]
- Padding to the end of the block.
* - 0x3FC
- \_\_le32
@@ -617,7 +625,7 @@ following:
* - 0x80
- Enable a filesystem size of 2^64 blocks (INCOMPAT\_64BIT).
* - 0x100
- - Multiple mount protection. Not implemented (INCOMPAT\_MMP).
+ - Multiple mount protection (INCOMPAT\_MMP).
* - 0x200
- Flexible block groups. See the earlier discussion of this feature
(INCOMPAT\_FLEX\_BG).