Any block buffer that has been used to read data from a block device or to write data to it goes into the buffer cache. Over time it may be removed from the cache to make way for a more deserving buffer, or it may remain in the cache because it is frequently accessed.

Block buffers within the cache are uniquely identified by the owning device identifier and the block number of the buffer. The buffer cache is composed of two functional parts. The first part is the lists of free block buffers. There is one list per supported buffer size, and the system's free block buffers are queued onto these lists when they are first created or when they have been discarded. The currently supported buffer sizes are 512, 1024, 2048, 4096 and 8192 bytes. The second functional part is the cache itself. This is a hash table, which is a vector of pointers to chains of buffers that have the same hash index. The hash index is generated from the owning device identifier and the block number of the data block. Figure 9.7 shows the hash table together with a few entries. Block buffers are either in one of the free lists or they are in the buffer cache. When they are in the buffer cache they are also queued onto Least Recently Used (LRU) lists. There is an LRU list for each buffer type, and these are used by the system to perform work on buffers of a type, for example, writing buffers with new data in them out to disk. The buffer's type reflects its state, and Linux currently supports the following types:

clean     Unused, new buffers,
locked    Buffers that are locked, waiting to be written,
dirty     Dirty buffers. These contain new, valid data, and will be written but so far have not been scheduled to write,
shared    Shared buffers,
unshared  Buffers that were once shared but which are now not shared.

Whenever a file system needs to read a buffer from its underlying physical device, it tries to get a block from the buffer cache. If it cannot get a buffer from the buffer cache, then it will get a clean one from the appropriate sized free list and this new buffer will go into the buffer cache. If the buffer that it needed is in the buffer cache, then it may or may not be up to date. If it is not up to date or if it is a new block buffer, the file system must request that the device driver read the appropriate block of data from the disk.

Like all caches, the buffer cache must be maintained so that it runs efficiently and fairly allocates cache entries between the block devices using the buffer cache. Linux uses the bdflush kernel daemon to perform a lot of housekeeping duties on the cache, but some happen automatically as a result of the cache being used.

9.3.1 The bdflush Kernel Daemon

(See bdflush() in fs/buffer.c.)

The bdflush kernel daemon is a simple kernel daemon that provides a dynamic response to the system having too many dirty buffers, that is, buffers that contain data that must be written out to disk at some time. It is started as a kernel thread at system startup time and, rather confusingly, it calls itself "kflushd", and that is the name that you will see if you use the ps command to show the processes in the system. Mostly this daemon sleeps, waiting for the number of dirty buffers in the system to grow too large. As buffers are allocated and discarded, the number of dirty buffers in the system is checked. If there are too many as a percentage of the total number of buffers in the system, then bdflush is woken up. The default threshold is 60% but, if the system is desperate for buffers, bdflush will be woken up anyway. This value can be seen and changed using the update command:

    update -d

    bdflush version 1.4
    0:    60 Max fraction of LRU list to examine for dirty blocks
    1:   500 Max number of dirty blocks to write each time bdflush activated
    2:    64 Num of clean buffers to be loaded onto free list by refill_freelist
    3:   256 Dirty block threshold for activating bdflush in refill_freelist
    4:    15 Percentage of cache to scan for free clusters
    5:  3000 Time for data buffers to age before flushing
    6:   500 Time for non-data (dir, bitmap, etc) buffers to age before flushing
    7:  1884 Time buffer cache load average constant
    8:     2 LAV ratio (used to determine threshold for buffer fratricide)

All of the dirty buffers are linked into the BUF_DIRTY LRU list whenever they are made dirty by having data written to them, and bdflush tries to write a reasonable number of them out to their owning disks. Again, this number can be seen and controlled by the update command, and the default is 500 (see above).

9.3.2 The update Process

(See sys_bdflush() in fs/buffer.c.)

The update command is more than just a command; it is also a daemon. When run as superuser (during system initialisation) it will periodically flush all of the older dirty buffers out to disk. It does this by calling a system service routine that does more or less the same thing as bdflush. Whenever a dirty buffer is finished with, it is tagged with the system time that it should be written out to its owning disk. Every time that update runs it looks at all of the dirty buffers in the system, looking for ones with an expired flush time. Every expired buffer is written out to disk.

9.4 The /proc File System

The /proc file system really shows the power of the Linux Virtual File System. It does not really exist (yet another of Linux's conjuring tricks); neither the /proc directory nor its subdirectories and its files actually exist. So how can you cat /proc/devices?
The /proc file system, like a real file system, registers itself with the Virtual File System. However, when the VFS makes calls to it requesting inodes as its files and directories are opened, the /proc file system creates those files and directories from information within the kernel. For example, the kernel's /proc/devices file is generated from the kernel's data structures describing its devices.

The /proc file system presents a user readable window into the kernel's inner workings. Several Linux subsystems, such as Linux kernel modules described in chapter 12, create entries in the /proc file system.

9.5 Device Special Files

Linux, like all versions of Unix™, presents its hardware devices as special files. So, for example, /dev/null is the null device. A device file does not use any data space in the file system; it is only an access point to the device driver. The EXT2 file system and the Linux VFS both implement device files as special types of inode. There are two types of device file: character and block special files. Within the kernel itself, the device drivers implement file semantics: you can open them, close them and so on. Character devices allow I/O operations in character mode, and block devices require that all I/O is via the buffer cache. When an I/O request is made to a device file, it is forwarded to the appropriate device driver within the system. Often this is not a real device driver but a pseudo-device driver for some subsystem such as the SCSI device driver layer. Device files are referenced by a major number, which identifies the device type, and a minor number, which identifies the unit, or instance of that major type. For example, the IDE disks on the first IDE controller in the system have a major number of 3, and the first partition of an IDE disk would have a minor number of 1. (See include/linux/major.h for all of Linux's major device numbers.) So, ls -l of /dev/hda1 gives:

    brw-rw----   1 root    disk       3,    1  Nov 24  15:09 /dev/hda1

(See include/linux/kdev_t.h.)

Within the kernel, every device is uniquely described by a kdev_t data type. This is two bytes long, the first byte containing the minor device number and the second byte holding the major device number. The IDE device above is held within the kernel as 0x0301. An EXT2 inode that represents a block or character device keeps the device's major and minor numbers in its first direct block pointer.
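
The two-byte encoding described above can be sketched in a few lines of C. This is an illustration only, not the kernel's own code: the sketch_ names are hypothetical, and the only thing taken from the text is the layout (minor number in the low byte, major number in the high byte, so that major 3 and minor 1 give 0x0301).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two-byte device type described in the
 * text: low byte = minor number, high byte = major number. */
typedef uint16_t sketch_kdev_t;

#define SKETCH_MINORBITS 8
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1u)

/* Pack a major and a minor number into one 16-bit value. */
static inline sketch_kdev_t sketch_mkdev(unsigned major, unsigned minor)
{
    return (sketch_kdev_t)(((major & 0xffu) << SKETCH_MINORBITS) |
                           (minor & SKETCH_MINORMASK));
}

/* Extract the major number (the device type) from the high byte. */
static inline unsigned sketch_major(sketch_kdev_t dev)
{
    return (unsigned)dev >> SKETCH_MINORBITS;
}

/* Extract the minor number (the unit or instance) from the low byte. */
static inline unsigned sketch_minor(sketch_kdev_t dev)
{
    return (unsigned)dev & SKETCH_MINORMASK;
}

/* Checks the example from the text: /dev/hda1 is major 3, minor 1. */
static inline void sketch_selfcheck(void)
{
    sketch_kdev_t hda1 = sketch_mkdev(3, 1);

    assert(hda1 == 0x0301);
    assert(sketch_major(hda1) == 3);
    assert(sketch_minor(hda1) == 1);
}
```

Calling sketch_selfcheck() from a test program confirms that the first partition of the first IDE disk packs to 0x0301 and unpacks back to the 3 and 1 that ls -l displays. Keeping both numbers in a single small integer makes a device identifier cheap to compare and to use as a hash input, which is how the buffer cache described earlier uses it.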