This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: NTFS fragmentation redux


On Sunday 19 November 2006 11:49 pm, Linda Walsh wrote:
> Some time back (~Aug), there was a discussion about NTFS's file
> fragmentation problem.
>
> Some notes at the time:
>
> From:  Vladimir Dergachev
>
> >        I have encountered a rather puzzling fragmentation
> > that occurs when writing files using Cygwin.
>
> ...
>
> >        a small Tcl script that, when run, creates
> > files fragmented into about 300 pieces on my system)
>
>         &&
>
> On 03 August 2006 18:50, Vladimir Dergachev wrote:
> > I guess this means that sequential writes are officially broken on NTFS.
> > Anyone has any idea for a workaround ? It would be nice if a simple
> > tar zcvf a.tgz * does not result in a completely fragmented file.
>
> 	&&
>
> On Aug  3 14:54, Vladimir Dergachev wrote:
> > What I am thinking about is modifying cygwin's open and write calls so
> > that they preallocate files in chunks of 10MB (configurable by an
> > environment variable).
>
> ------------
>
> The "fault" is the behavior of the file system.
> I compared NTFS with ext3 & xfs on linux (jfs & reiser hide how many
> fragments a file is divided into).
>
> NTFS is in the middle as far as fragmentation performance.  My disk
> is usually defragmented, but the built-in Windows defragmenter doesn't
> defragment free space.
>
> I used a file size of 64M and proceeded copying that file to
> a destination file using various utils.
>
> With Xfs (linux), I wasn't able to fragment the target file.  Even
> writing 1K chunks in append mode, the target file always ended up
> in 1 64M fragment.
>
> With Ext3 (also linux), it didn't seem to matter the copy method,
> cp, dd(blocksize 64M), and rsync all produced a target file with
> 2473 fragments.

This is curious - how do you find out the fragmentation of an ext3 file?
I do not know of a utility that reports it.

From indirect observation, ext3 does not fragment nearly that badly until
the filesystem is close to full; otherwise I would not be able to reach
sequential read speeds (the all-seeks speed is about 6 MB/sec for me, and I
was getting 40-50 MB/sec). This was on much larger files, though.

Which journal option was the filesystem mounted with?

>
> NTFS using cygwin, varies the fragment size based on the tool
> writing the output.
> "cp" produced the most fragments at 515 fragments.
> "rsync" came next with 19 fragments.
> "dd" (using a bs=32M or bs=64M) did best at 1 fragment.
> using "dd" and using a block size of 8k produced the same
> results as "cp".
>
> It appears cygwin does exactly the right thing as far as file
> writes are concerned -- it writes the output using the block size
> specified by the client program you are running.  If you use a
> small block size, NTFS allocates space for each write that you do.
> If you use a big block size, NTFS appears to look for the first
> place that the entire write will fit.  Back in DOS days, the
> built-in COPY command buffered as much data as would fit in
> memory then wrote it out -- meaning it would be likely to create
> the output with a minimal number of fragments.
>
> If you want your files to be unfragmented, you need to use a
> file copy (or file write) util that uses a large buffer size --
> one that (if possible), writes the entire file in 1 write.
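
A minimal sketch of the large-buffer copy suggested above, in Python. The
function name and the 64 MB default are assumptions chosen to match the test
file size mentioned earlier; whether this actually yields fewer fragments
depends on the file system and on how much contiguous free space it has.

```python
def copy_with_large_buffer(src, dst, bufsize=64 * 1024 * 1024):
    """Copy src to dst, issuing writes of up to bufsize bytes each."""
    copied = 0
    # buffering=0 gives raw file objects, so each write() below is passed
    # to the OS at the size we chose instead of Python's default chunking.
    with open(src, "rb", buffering=0) as fin, \
         open(dst, "wb", buffering=0) as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:
                break
            view = memoryview(chunk)
            # A raw write may accept fewer bytes than offered; loop until
            # the whole chunk has been handed over.
            while len(view) > 0:
                n = fout.write(view)
                view = view[n:]
            copied += len(chunk)
    return copied
```

With bufsize at least as large as the file, the data goes out in a single
write request, which is the case that produced 1 fragment with "dd" above.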

I actually implemented a workaround that calls "fsutil file createnew"
with the final file size to preallocate space and then writes the data in
append mode (after doing seek 0).
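
For illustration, the same preallocate-then-fill idea can be sketched in
Python. This is not the workaround itself: it uses truncate() in place of
fsutil, and the function name and parameters are hypothetical. Whether
extending a file this way reserves contiguous space is up to the file system
(on many Linux file systems it only creates a sparse file).

```python
def write_preallocated(path, chunks, total_size):
    """Set the file to total_size bytes up front, then fill it from offset 0."""
    with open(path, "wb") as f:
        # Extend the file to its expected final length before writing any
        # data, so the file system is told the full size in one request.
        f.truncate(total_size)
        f.seek(0)
        written = 0
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
        # Trim the tail if less data arrived than was reserved.
        f.truncate(written)
    return written
```

The caller has to know (or estimate) the final size in advance, which is
the main limitation of this approach for streamed output.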

                thank you !

                        Vladimir Dergachev

>
> In the "tar zcvf a.tgz *" case, I'd suggest piping the output of
> tar into "dd" and use a large blocksize.
>
> Linda
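
The re-blocking that "dd" would do at the end of such a pipe can be sketched
as follows. The function name, the block size, and the 1 MB read size are
assumptions made for the example; reads from a pipe usually return small
pieces, so they are gathered until a full block is ready to write.

```python
def reblock(reader, writer, block_size=64 * 1024 * 1024):
    """Gather short reads from reader into writes of block_size bytes."""
    buf = bytearray()
    writes = 0
    while True:
        # Never read past the end of the current block.
        data = reader.read(min(block_size - len(buf), 1024 * 1024))
        if not data:
            break
        buf += data
        if len(buf) >= block_size:
            writer.write(buf)
            writes += 1
            buf = bytearray()
    if buf:
        writer.write(buf)  # final partial block
        writes += 1
    return writes
```

In a script this could be called with sys.stdin.buffer as the reader and a
file opened in "wb" mode as the writer, so that tar's output is collected
before it reaches the disk.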



