fwrite slow for big files?


Josh

fwrite can correctly write very large files. I wrote a single 126 GB file with it. However, after the first few seconds the calls to fseek start "hanging". Writing 1000 integers went from taking 40 milliseconds to 16 seconds. I'm just writing one integer at a time.

I can't find any mention of this behavior online anywhere. Is this a known thing?
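
Roughly the pattern being timed looks like this (a stripped-down sketch, not the actual tool; the path and the 6 GB cutoff are just placeholders):

#include <cstdio>
#include <chrono>

int main()
{
    FILE* f = fopen("test.tmp", "wb");      // placeholder path
    if (!f) return 1;

    auto start = std::chrono::steady_clock::now();
    long long written = 0;
    while (written < 6LL * 1024 * 1024 * 1024)               // stop after ~6 GB
    {
        int value = 0;
        fwrite(&value, sizeof(value), 1, f);                 // one integer per call
        written += sizeof(value);

        if (written % (1000 * (long long)sizeof(value)) == 0) // report every 1000 writes
        {
            auto now = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(now - start).count();
            printf("%lld ms per 1000 writes\n", (long long)ms);
            start = now;
        }
    }
    fclose(f);
    return 0;
}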


When dealing with big files (anything beyond what a 32-bit long offset can address, roughly 2 GB), you must use fseeko, ftello, etc.
https://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/f/fseek.html
 

#define _FILE_OFFSET_BITS 64    // ask for 64-bit file offsets; must come before any headers
#include <cstdio>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;                     // expects a file path as the first argument
    FILE *fi = fopen(argv[1], "rb");
    if (!fi) return 1;
    fseeko(fi, 0, SEEK_END);                    // plain fseek takes a long offset, which can overflow on huge files, so use fseeko
    unsigned long long filesize = ftello(fi);   // ftello returns a 64-bit offset; holds up to 18,446,744,073,709,551,615
    fclose(fi);
    printf("FileSize=%llu\n", filesize);
    return 0;
}

 


Okay, this simple test does not produce any strange results. Maybe this is happening because I am reading from one big file and writing to another big file, on a mechanical drive:

#include "UltraEngine.h"

using namespace UltraEngine;

#define STEP 5

int main(int argc, const char* argv[])
{
    String path = "D:/test.tmp";
    shared_ptr<Stream> stream;
    uint64_t tm;
    int count = 0;

    Print("----------\nWrite test\n----------");
    stream = WriteFile(path);
    tm = Millisecs();
    count = 0;
    while (true)
    {
        stream->WriteDouble(0);
        count++;
        if (count == 100000)
        {
            stream->Flush();
            count = 0;
            auto now = Millisecs();
            Print("W: " + String(now - tm));
            tm = now;
        }
        if (stream->GetSize() > 1024ull * 1024ull * 1024ull * 6ull) break;// stop after 6.0 GB
    }
    stream->Close();

    Print("----------\nRead test\n----------");
    stream = ReadFile(path);
    tm = Millisecs();
    count = 0;
    while (not stream->Eof())
    {
        stream->ReadDouble();
        count++;
        if (count == 100000)
        {
            stream->Flush();
            count = 0;
            auto now = Millisecs();
            Print("R: " + String(now - tm));
            tm = now;
        }
    }
    stream->Close();

    return 0;
}

 


Yeah, if I randomize the read position the delays go from 40 milliseconds to 4 seconds. The lesson here is that big data processing like this can only be done on an SSD, not necessarily because of the raw I/O speed but because of the seek speed.
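
The randomized read test is basically this pattern (a rough sketch using plain stdio instead of the engine's Stream class, with fseeko as in the earlier post; on MSVC the equivalents are _fseeki64 and _ftelli64):

#include <cstdio>
#include <random>
#include <chrono>

int main()
{
    // Assumes the big test file from the write test above already exists
    FILE* f = fopen("D:/test.tmp", "rb");
    if (!f) return 1;

    fseeko(f, 0, SEEK_END);
    long long filesize = ftello(f);

    std::mt19937_64 rng(12345);
    std::uniform_int_distribution<long long> pick(0, filesize / 8 - 1);

    double value = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 1; i <= 1000000; ++i)
    {
        fseeko(f, pick(rng) * 8, SEEK_SET);     // jump to a random 8-byte slot
        fread(&value, sizeof(value), 1, f);

        if (i % 100000 == 0)                    // report every 100,000 reads
        {
            auto now = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(now - start).count();
            printf("R: %lld\n", (long long)ms);
            start = now;
        }
    }
    fclose(f);
    return 0;
}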



In case anyone is curious, I am trying to process this data for lunar terrain mapping:
https://imbrium.mit.edu/DATA/

Here's image data for the moon:
https://imbrium.mit.edu/DATA/LOLA_GDR/CYLINDRICAL/

And here's the raw laser altimeter data:
https://imbrium.mit.edu/DATA/LOLA_RDR/



I am working with a 128 GB USB drive now. These sure have come down in price!

I get strange results when I try resizing a 109440 x 36482 image using StreamBuffers.

I am printing the time elapsed for every 1000 pixels that get processed. The routine is doing reads from one file and writes to another. CPU usage dropped to zero and it just hung; the longest pause was one full minute! After that it started going fast again, and now it keeps buzzing along happily at 1000 pixels every 30 milliseconds:

Resizing albedo...
30
38
30
30
30
31
161
65629
1355
1363
1400
1367
1409
1385
9208
16202
1355
634
5705
11285
8836
3867
562
5278
8825
8839
7043
5396
8838
11279
8636
1921
489
63
63
62
64
72
82
2673
293
340
327
322
2774
360
328
357
356
2758
314
306
332
348
2748
309
322
297
333
2785
316
313
308
304
2734
204
29
28
30
29
28
29
29
28
29
31
28
29
29
29
28
29
31
29
29
...

 


I left this running for several hours and it only got through about 5,000 out of 98,000 columns, which means it would take about three days to process! Not good.

I tried enabling write caching on my HDD, but it actually runs at the same speed as the USB drive (which Windows says does not allow write caching), after some long initial pauses.

I think reading and writing to disk at the same time is probably just a really bad idea, even if they are two different disks. The reason I am doing this is that the image size is irregular and I want to resize the whole thing to power-of-two dimensions, which it is very close to. I can't really split the image up into tiles at this resolution because it doesn't divide evenly. Well, maybe this one could be split into 12 x 4 tiles, but other images might not work so well, and it adds another layer of confusion.

In order to maintain accuracy I think I will need to implement a Pixmap::Blit method that can optionally accept floating point coordinates that act like texture coordinates. We already have Pixmap::Sample and that will help a lot.

That way I can create small tile images in system memory, one at a time, and blit the big pixmap stored in virtual memory onto the tiles, without creating any distortion in the image data. When you finish each tile you save it to a file and then move on to the next area, but you aren't constantly switching between read and write to process each pixel.
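
Very roughly, the loop I have in mind looks like this. It's just a sketch: the bare-bones grayscale Pixmap here is a stand-in rather than the actual engine class, Sample() is assumed to take normalized 0..1 coordinates and do the filtering, and in the real thing the big source pixmap would live in virtual memory instead of a plain vector:

#include <cstdio>
#include <vector>
#include <string>
#include <algorithm>

struct Pixmap
{
    int width = 0, height = 0;
    std::vector<float> pixels;      // one float per pixel, grayscale for simplicity

    Pixmap(int w, int h) : width(w), height(h), pixels(size_t(w) * h, 0.0f) {}

    // Bilinear sample at normalized coordinates, like a texture lookup
    float Sample(float u, float v) const
    {
        float fx = u * (width - 1), fy = v * (height - 1);
        int x0 = std::clamp(int(fx), 0, width - 2);
        int y0 = std::clamp(int(fy), 0, height - 2);
        float tx = fx - x0, ty = fy - y0;
        auto at = [&](int x, int y) { return pixels[size_t(y) * width + x]; };
        float top    = at(x0, y0)     * (1 - tx) + at(x0 + 1, y0)     * tx;
        float bottom = at(x0, y0 + 1) * (1 - tx) + at(x0 + 1, y0 + 1) * tx;
        return top * (1 - ty) + bottom * ty;
    }
};

// Resample 'source' into tilesx * tilesy tiles of tilesize x tilesize pixels each.
// Each tile is built entirely in system memory, then written out in one sequential
// pass, so reads and writes are never interleaved per pixel.
void ResizeToTiles(const Pixmap& source, int tilesize, int tilesx, int tilesy)
{
    for (int ty = 0; ty < tilesy; ++ty)
    {
        for (int tx = 0; tx < tilesx; ++tx)
        {
            Pixmap tile(tilesize, tilesize);
            for (int y = 0; y < tilesize; ++y)
            {
                for (int x = 0; x < tilesize; ++x)
                {
                    // Map the tile pixel to a normalized position in the whole image
                    float u = float(tx * tilesize + x) / float(tilesx * tilesize - 1);
                    float v = float(ty * tilesize + y) / float(tilesy * tilesize - 1);
                    tile.pixels[size_t(y) * tilesize + x] = source.Sample(u, v);
                }
            }
            // Placeholder output format: raw floats, one file per tile
            std::string path = "tile_" + std::to_string(tx) + "_" + std::to_string(ty) + ".raw";
            FILE* f = fopen(path.c_str(), "wb");
            if (f) { fwrite(tile.pixels.data(), sizeof(float), tile.pixels.size(), f); fclose(f); }
        }
    }
}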

I'm going to write a blog about this but I want to keep my notes here so I can go back and view it later.


