Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!dimacs.rutgers.edu!seismo!uunet!spool.mu.edu!think.com!mintaka!bloom-picayune.mit.edu!athena.mit.edu!captkidd From: captkidd@athena.mit.edu (Ivan Cavero Belaunde) Newsgroups: comp.sys.mac.programmer Subject: HFS, extent, B*-trees, oh joy... Message-ID: <1991Feb13.210702.9407@athena.mit.edu> Date: 13 Feb 91 21:07:02 GMT Sender: news@athena.mit.edu (News system) Distribution: usa Organization: Massachusetts Institute of Technology Lines: 96 I'm trying to write a routine that takes a file identifier (filename and WDRefNum or filename, dirID and vRefNum) and returns a pointer to a block containing a list of the block numbers on that volume occupied by a file. Although I understand the basic procedure pretty clearly, there are a couple of stumbling blocks that are driving me crazy. Basically a file consists of one or more extents (chunks of contiguous blocks in the volume). Information about the extents is stored in a B*-tree, on which each node has the following structure: struct { NodePtr ndFLink; // Ptr to next leaf node NodePtr ndBLink; // Ptr to prev leaf node char ndType; // Node type char ndLevel; // Node level short ndNRecs; // # of records in node < record 0 > < record 1 > .... < record n > < free space > < offset to free space > < offset to record n > .... < offset to record 1 > < offset to record 0 > } NodeRec, *NodeRec; This structure doesn't seem to be very well defined, however. IM says that all nodes (no matter what kind) are a fixed size, and thus one can access the fields at the end which point to the records (which can be of varying size). I haven't been able to find this fixed size documented anywhere, however (anyone know?). (It's not absolutely necessary, since I know the size records I'm dealing with, but it'd be better). The structure of the records is not clearly documented either, but it would seem to be: struct { unsigned char xkrKeyLen; // Key length char xkrFkType; // Fork type long xkrFNum; // File number short xkrFABN; // Allocation block # within file union { ExtentDesc extents[3]; // Extent descriptors (3 of them) // (if leaf node) long dnNode; // Ptr to node in lowever level // (if index node) } } ExtentRecord; struct { short xtFirstBlk; // # of extent's 1st allocation block short xtNumBlks; // Number of blocks in extent } ExtentDesc; The extents B*-tree is not in RAM, however. It's in a file (second FCB in the FCB buffer). So my question is: what's the format of this file (so I can actually navigate the B*-tree)? Is there a header before the actual data comes? And do the node pointers effectively translate to marks within the extent file? Now it would seem that what I need to do is: 1) Open the file and get a refNum for it. 2) Get the info from the FCB (file number) 3) Get info from the corresponding VCB (vcbAlBlSt for calculating actual block #s, vcbXTRef [the extent file might not always be the second FCB in the FCB buffer]) 4) Navigate the extent tree, searching for the first extent record, then using the data to calculate the search key for the next extent record, and so on, building the list of block #s as we go. Now, has anyone out there done this before? Are there any hairy issues as far as moving the mark for the extent file FCB behind the file manager's back? What about the extent cache, and is there a way to flush it (does it need to be)? And finally, are my data structure definitions correct? I can't seem to find formal definitions *anywhere*, and I just deduced those from the descriptions in IM4. Or maybe there's a really obscure call somewhere that will get me the info I need :-) Doubt it, tho'. I'd really appreciate any insights anyone has on this topic. Thanx y'all. -Ivan Cavero Belaunde Internet: captkidd@ATHENA.MIT.EDU "I suppose you think that was funny." "I don't know. I'll have to consult our humor officer. Mr. Spock, was that funny?" "I shall have to analyze it, sir. It may take time." -From Peter David's "Star Trek" comic