How to read an entire file into memory in C++

Over ten years ago I first wrote an article about reading an entire file into memory in C++. It has become my most copied work (always without attribution), and every time I’ve set up a new site or blog, I’ve included an updated version – which again usually ranks among the most popular pages or posts there. So, it’s time again for the 2014 version of How to read an entire file into memory in C++.

One of the reasons I’ve been repeating this topic, and presumably why it’s been so popular, is that pretty much every bit of code you see out there showing how to do it is wrong. Either it’s wrong because it is horribly and unnecessarily inefficient, or simply because it doesn’t work. By that I mean it provokes undefined behaviour, though obviously it seems to work on the platforms people try it on. So before I get into the “right” answers, let me show some of the wrong answers, and explain why they’re wrong.

Bad idea #1: stream iterators

I can almost guarantee that whenever you ask for advice on how to read an entire file into memory, you will see the following code as a suggested solution (assuming in is a file stream opened in input mode):

// Bad code: slow
auto s = std::string{};
std::copy(std::istreambuf_iterator<char>{in},
    std::istreambuf_iterator<char>{},
    std::back_inserter(s));

Occasionally, though sadly all too rarely, you might see this slightly better version:

// Bad code: slow
auto s = std::string{
    std::istreambuf_iterator<char>{in},
    std::istreambuf_iterator<char>{}};

Or equivalently:

// Bad code: slow
auto s = std::string{};
s.assign(std::istreambuf_iterator<char>{in},
    std::istreambuf_iterator<char>{});

But even then, this is a terrible idea. Oh, it’s elegant, I admit – even beautiful – and the pattern is useful (as I’ll explain in a moment). But this is the code equivalent of transferring a bag of rice to a pot one grain at a time. It is excruciatingly slow. This is just about the slowest way you possibly can read a file without actively going out of your way to sabotage the effort. (Actually, it turns out that there might be less efficient options for extremely large files. More on that shortly.)

There is a place for this kind of pattern, and that is when you actually do want to do some kind of processing on each element. For example, if you want to read in only the punctuation:

auto const loc = in.getloc(); // or whatever locale is appropriate
auto s = std::string{};
std::copy_if(std::istreambuf_iterator<char>{in},
    std::istreambuf_iterator<char>{},
    std::back_inserter(s),
    [&loc](auto c) { return std::ispunct(c, loc); });

Or, an even more useful paradigm is when the “processing” you want to do is formatting – in which case you’d use std::istream_iterator:

auto s = std::vector<double>{
    std::istream_iterator<double>{in},
    std::istream_iterator<double>{}};

Bad idea #2: seeking to the end to get the size

So reading in a character at a time is a dumb idea. The obvious ideal, then, is to read the whole file in, in one big wallop of a read. And that’s the logic behind the other suggestion you’re almost guaranteed to see if you ask for advice on how to read an entire file into memory:

// Bad code; undefined behaviour
in.seekg(0, std::ios_base::end);
s.resize(in.tellg());
in.seekg(0);
in.read(&s[0], s.size());

This idea is actually even worse than the previous one; while the previous idea was slow, at least it didn’t introduce undefined behaviour.

I’ll explain why this is undefined behaviour, but you’ll have to pardon me while I take a brief excursion into C.

Suppose you were writing C, and you had a FILE* (that you know points to a file stream, or at least a seekable stream), and you wanted to determine how many characters to allocate in a buffer to store the entire contents of the stream. Your first instinct would probably be to write code like this:

// Bad code; undefined behaviour
fseek(p_file, 0, SEEK_END);
long file_size = ftell(p_file);

Seems legit. But then you start getting weirdness. Sometimes the reported size is bigger than the actual file size on disk. Sometimes it’s the same as the actual file size, but the number of characters you read in is different. What the hell is going on?

There are two answers, because it depends on whether the file has been opened in text mode or binary mode.

Just in case you don’t know the difference: in the default mode – text mode – on certain platforms, certain characters get translated in various ways during reading. The most well-known is that on Windows, newlines (“\n”) get translated to “\r\n” when written to a file, and translated the other way when read. In other words, if the file contains “Hello\r\nWorld”, it will be read as “Hello\nWorld”; the file size is 12 characters, the string size is 11. Less well-known is that 0x1A (or Ctrl-Z) is interpreted as the end of the file, so if the file contains “Hello\x1AWorld”, it will be read as “Hello”. Also, if the string in memory is “Hello\x1AWorld” and you write it to a file in text mode, the file will be “Hello”. In binary mode, no translations are done – whatever is in the file gets read in to your program, and vice versa.
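For example, a minimal sketch (the file name is illustrative) of opening a stream in text mode versus binary mode:

#include <fstream>

int main()
{
    // text mode (the default): platform-specific translations may apply
    std::ifstream text_in{"example.txt"};

    // binary mode: bytes pass through untranslated
    std::ifstream bin_in{"example.txt", std::ios_base::binary};
}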

Immediately you can guess that text mode is going to be a headache – on Windows, at least. More generally, according to the C standard:

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

In other words, when you’re dealing with a file opened in text mode, the value that ftell returns is useless… except in calls to fseek. In particular, it doesn’t necessarily tell you how many characters are in the stream up to the current point.

So you can’t use the return value from ftell to tell you the size of the file, the number of characters in the file, or anything else (except in a later call to fseek). So you can’t get the file size that way.

Okay, so to hell with text mode. What say we work in binary mode only? As the C standard says: “For a binary stream, the value is the number of characters from the beginning of the file.” That sounds promising.

And, indeed, it is. If you are at the end of the file, and you call ftell, you will find the number of bytes in the file. Huzzah! Success! All we need to do now is get to the end of the file. And to do that, all you need to do is fseek with SEEK_END, right?

Wrong.

Once again, from the C standard:

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.

To understand why this is the case: Some platforms store files as fixed-size records. If the file is shorter than the record size, the rest of the block is padded. When you seek to the “end”, for efficiency’s sake it just jumps you right to the end of the last block… possibly long after the actual end of the data, after a bunch of padding.

So, here’s the situation in C:

  • You can’t get the number of characters with ftell in text mode.
  • You can get the number of characters with ftell in binary mode… but you can’t seek to the end of the file with fseek.

Which basically means you’re totally boned in C… but what about C++? Do the C++ file streams – std::ifstream and its relatives – suffer the same limitations?

Well, they all use std::basic_filebuf, and according to the C++ standard:

The restrictions on reading and writing a sequence controlled by an object of class basic_filebuf<charT, traits> are the same as for reading and writing with the Standard C library FILEs.

Presumably the intention is that the C++ file I/O could be built on top of the C file I/O, though of course that isn’t required. That certainly implies that any seeking and telling on a C++ file stream is expected to have the same restrictions as seeking and telling on a C file stream. I may be wrong, but if so, I’d like to see the language in the standard that says otherwise.

So all that means that:

// Bad code; undefined behaviour
in.seekg(0, std::ios_base::end);
s.resize(in.tellg());
in.seekg(0);
in.read(&s[0], s.size());

does not work for either text or binary mode file streams.

The solution (general)

This might shock you, but if you want to read a file into a string, by far the best method… is also arguably the simplest. It shouldn’t shock you. It’s just C++ logic that the simplest method should be the best. So, here it is:

auto ss = std::ostringstream{};
ss << in.rdbuf();
auto s = ss.str();

You can do it in a single line, if you want:

auto s = static_cast<std::ostringstream&>(
    std::ostringstream{} << in.rdbuf()).str();

(The cast is unfortunately necessary because the insertion operator returns a std::ostream&, not a std::ostringstream&. static_cast is okay because we obviously know the cast is sound.)

You could also use the lambda enclosing technique I mentioned in a previous post:

auto s = [&in]{
    std::ostringstream ss{};
    ss << in.rdbuf();
    return ss.str();
}();

The reason this code is the “right” solution is because stream-to-stream I/O like that is almost guaranteed to be highly optimized by your standard library. (Partly because it’s so easy to do, yet it gives dramatic performance payoffs.) Naturally, the str() method of std::ostringstream is also going to be highly optimized. You’re working along with your standard library developers here, and it pays.

This is the answer to the problem of reading an entire file into memory. Even if you don’t want the results in a string – let’s say you want them in a std::vector<char> – this is still the best solution: just do this, then move the string wholesale right into your vector (which should be pretty darn fast, since you’re going from one contiguous array of trivially copyable items to another). It does waste memory temporarily, but it’s so freaking fast that it’s still worth it.
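For example, a minimal sketch (assuming in is an open input file stream):

// read the whole stream into a string...
auto ss = std::ostringstream{};
ss << in.rdbuf();
auto s = ss.str();

// ...then copy the character data into a vector
auto v = std::vector<char>(s.begin(), s.end());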

Is this a perfect solution? Well, no. In certain edge cases, there may be better options. The one wrinkle in this method is that there is no way to move the data in the std::ostringstream into a std::string… you have to copy it, which means that if you’ve just loaded a 10 MiB file into memory, you’re going to have (for a short time) two copies of that file in memory. That’s not really a problem for files smaller than a couple hundred kilobytes. But if you want to read huge files into memory – for whatever reason – read on.

The solution (really large files)

Let’s go back to the idea of getting the number of characters in the file, then allocating the whole thing and reading it all in one big wallop. The reason it didn’t work was because the idea of using the seek/tell functions to… well… seek and tell… was flawed. Happily, there is another way to do it portably.

The key is the ignore() function. Ignore the “ignoring” part for a moment, and focus on the fact that it has to read the characters to know how many to ignore. Assuming your stream is buffered (and I seriously hope it would be, if you’re doing this kind of thing), that means “ignoring” everything until the end of the file is basically simply a matter of repeatedly incrementing counters and pointers and testing for EOF and the end of the buffer, with the occasional buffer refill (which implies that if you use a huge buffer, it will be even faster). All that incrementing and comparing is not ideal, but it’s still pretty damn fast, and it’s about as close as we can practically hope to get to ideal using file streams.

So ignoring is a fast way to skip to the end of a file, but now how do we get the number of characters that were ignored? Well, that’s easy. There’s a function for that: gcount(). It tells you the number of characters read in the last read operation… and ignore() is a read operation.

Now we have almost all the pieces of the puzzle. What we want to do is:

  1. Save the current position with tellg(). The value is meaningless as a character count, remember, but it can be used in seekg() later.
  2. Use ignore() with a character count of std::numeric_limits<std::streamsize>::max(). The default second argument of ignore() – the delimiter – is already what we want.
  3. Get the number of characters with gcount().
  4. Restore the stream to the start position with seekg(). This will also clear the EOF flag.
  5. Create our string (or vector!) with the right number of characters allocated (using the earlier result of gcount()).
  6. Do that one, big, monster read with read().

In code, that translates to:

auto const start_pos = in.tellg();
in.ignore(std::numeric_limits<std::streamsize>::max());
auto const char_count = in.gcount();
in.seekg(start_pos);
auto s = std::string(char_count, char{});
// or std::vector<char>(char_count);
// or you can use unsigned char or signed char
in.read(&s[0], s.size());

This is something that should really be encapsulated in its own function, possibly with a little bit of error checking. In addition, it would be nice if it could be templated to support not only strings but also vectors of the character type, signed or unsigned. That might look something like this:

template <typename Container = std::string,
    typename CharT = char,
    typename Traits = std::char_traits<char>>
auto read_file_into_memory(
    std::basic_ifstream<CharT, Traits>& in,
    typename Container::allocator_type alloc = {})
{
    // With an is_contiguous traits type, this could be
    // generalized to *any* container, and much more easily.
    //
    // You could do this with enable-if, too, to completely
    // remove this function from the overload set if the
    // container type is wrong... but I think a static assert
    // is more appropriate in this context, and it will give
    // more readable errors.
    static_assert(
        // Allow only strings...
        std::is_same<Container, std::basic_string<CharT,
            Traits,
            typename Container::allocator_type>>::value ||
        // ... and vectors of the plain, signed, and
        // unsigned flavours of CharT.
        std::is_same<Container, std::vector<CharT,
            typename Container::allocator_type>>::value ||
        std::is_same<Container, std::vector<
            std::make_unsigned_t<CharT>,
            typename Container::allocator_type>>::value ||
        std::is_same<Container, std::vector<
            std::make_signed_t<CharT>,
            typename Container::allocator_type>>::value,
        "only strings and vectors of ((un)signed) CharT allowed");
    // You could also add other static assertions, like
    // confirming that the char type is trivially copyable.

    auto const start_pos = in.tellg();
    in.ignore(std::numeric_limits<std::streamsize>::max());
    auto const char_count = in.gcount();
    in.seekg(start_pos);

    auto container = Container(std::move(alloc));
    container.resize(char_count);

    if (0 != container.size())
    {
        // reinterpret_cast is necessary if we want to allow
        // vector<char>, vector<unsigned char> (and
        // vector<signed char>, I guess). It's safe because
        // the static assert guarantees that we're just
        // dealing with signed/unsigned variants.
        // Though if you're paranoid, you can put some
        // static asserts in to confirm this.
        in.read(reinterpret_cast<CharT*>(&container[0]),
            container.size());
    }

    return container;
}

This solution is slower than just using the stream buffer insertion operator for the kinds of files you normally want to completely read into memory – less than a couple hundred kiB or so – and a hell of a lot more work, which is why I recommend the stream buffer insertion operator by default. However, it has the benefit of making sure that huge files are only loaded to one place in memory – not two – and it is a lot faster for huge files (a megabyte or more).

On the downside, this method requires reading through the entire file twice, and being able to seek back to the beginning. If you are dealing with a system with very slow I/O, or cases where you can’t seek around in the stream (like network streams), this solution won’t work for you.

The solution (worst case scenario)

The first solution you should reach for is the stream buffer insertion solution. If you are dealing with really large files – at least a megabyte – you might consider using the “get character count then do one big read” method. If you are dealing with both really large files (or your memory constraints are so tight that you cannot afford to have the file contents duplicated in memory even for a moment) and you cannot seek in the stream… well, then you have to do things the hard way.

“The hard way” means you have to write a loop that reads in chunks until EOF. Choosing the right chunk size to get a good balance of performance with as little wasted memory as possible is tricky – it depends on your platform and situation. Ideally you want the chunk size to be the same as the stream’s buffer size; I’m just going to use BUFSIZ as an estimate.

In previous incarnations of this article, I used a vector of vectors for this. Each loop iteration I would try to fill up an entire chunk-sized vector with a single read, and then add it to the vector of vectors (usually the last vector would likely not be a full chunk). If you’re wondering why I didn’t just use a vector and let it grow… well, that requires a brief sidetrack into how vectors grow.

When a vector needs to grow, it allocates a new chunk of memory big enough to fit the new size (and then some, usually), then copies all of the data from the old memory to the new, then deletes the old memory. That’s good, normally, because it ensures the data is contiguous in memory, which makes for super fast processing. But when you’re dealing with huge chunks of memory… well… consider this: Normally when a vector has to grow, it doubles its size. So imagine you have a mebibyte (2^20, or 1,048,576 bytes) of data already in the vector and you want to add another 10 kiB chunk. The vector will allocate a new contiguous memory block of 2 MiB – 2,097,152 bytes, so the vector is momentarily holding on to 3,145,728 bytes – then it has to copy 1 MiB of data from the old memory to the new memory. You’re fine now for several more chunks, but if the file is bigger than 2 MiB, the vector will have to reallocate again. That means it will have to allocate 4,194,304 contiguous bytes – while already holding on to 2,097,152 contiguous bytes, so it owns 6,291,456 bytes total in two huge blocks – then copy 2,097,152 bytes from the old block to the new one.

You can see how this will quickly become a problem. When the vector already holds 268,435,456 bytes (256 MiB, in a contiguous block of memory) and it has to allocate another (10 kiB) chunk, it has to allocate an additional 536,870,912 byte (512 MiB) contiguous block of memory then copy 268,435,456 bytes. Allocation is slow in general, but when you have to find a huge amount of memory, and it all has to be a contiguous block, it can take a long time, probably triggering paging or shuffling around of virtual memory. Put another way, it is probably easier and faster to allocate 100 × 100 kiB blocks than it is to allocate a single 10 MiB block.

So that is why I didn’t use a single vector, but rather a vector of smaller blocks. That way there were many more frequent allocations, but those allocations were small and easily accommodated – and no copying at all was necessary (the data in existing blocks was just left as is, no need to copy it into a newly allocated block).

Some time later I realized I’d had a bit of a brain fart. I’d forgotten about that rarely used container in the standard library: the deque.

Few C++ programmers seem to know about the deque. Fewer still understand it. They may get that it enables fast insert and delete at both ends, but… what does that really mean? How is it implemented? The most natural implementation of a deque is a sequence of separately allocated, fixed-size blocks of memory, with new blocks added at either end as the deque grows. Look familiar?

That’s right, in previous versions of this article, I completely forgot about deque and reimplemented it.

Well, let’s not make that mistake again. Without the necessity of having to implement my own container, this turns out to be a breeze. What we want to do is set up a loop that reads chunks, then stores them in a deque. That’s as simple as something like this:

template <typename CharT, typename Traits,
    typename Allocator = std::allocator<CharT>>
auto read_file_into_memory(
    std::basic_ifstream<CharT, Traits>& in,
    Allocator alloc = {})
{
    using std::begin;
    using std::end;

    auto const chunk_size = std::size_t{BUFSIZ};

    auto container = std::deque<CharT, Allocator>(
        std::move(alloc));

    auto chunk = std::array<CharT, chunk_size>{};

    while (in.read(chunk.data(), chunk.size()) ||
            in.gcount())
        container.insert(end(container),
            begin(chunk), begin(chunk) + in.gcount());

    return container;
}

After you’ve read all your file data in, you’ve got it in a deque, which may bother some people – a vector would be better, or a string, maybe. Well I did some checking. The additional overhead of growing contiguous memory chunks (as in vectors and strings) is so freaking high, that it is faster to read the whole file into a deque then copy it all into a string or vector than it is to read it into a string or vector directly. A lot faster, and the point where it really starts to matter is incredibly small – smaller than 100 kiB.

In other words, if you really want it in a vector or string, it’s faster to read it into a deque then copy it. In case you’re balking at the extra space cost of having two copies in memory… that was basically going to happen anyway as the vector/string resized. So even if you want your data in a vector or string, read it into a deque first, then copy it over.
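In sketch form (using the deque-returning read_file_into_memory function above, and assuming in is an open file stream):

// read into a deque first...
auto d = read_file_into_memory(in);

// ...then copy into a contiguous container if you really need one
auto v = std::vector<char>(d.begin(), d.end());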

But, really, why not just leave it in the deque? It’s random-access, its performance is close to vector/string, and the fact that it’s made up of several chunks means it’s easier on memory than one huge contiguous chunk (the operating system’s memory manager can more easily shuffle the chunks around if it needs to make more space).

Summary

Reading data from a stream – usually a file stream, but the same logic applies to any stream – completely into a single container is not that hard, but there is a lot of misinformation out there about how to do it.

If you’re just reading the character data as is, with no transformation (such as formatting), don’t use stream iterators (or streambuf iterators), and don’t copy character by character. And don’t use the stream’s position to count characters – it doesn’t work.

The most basic and general way to fully read a stream into a string is to use the stream buffer insertion operator overload:

template <typename CharT, typename Traits,
    typename Allocator = std::allocator<CharT>>
auto read_stream_into_string(
    std::basic_istream<CharT, Traits>& in,
    Allocator alloc = {})
{
    std::basic_ostringstream<CharT, Traits, Allocator> ss(
        std::basic_string<CharT, Traits, Allocator>(
            std::move(alloc)));
    if (!(ss << in.rdbuf()))
        throw std::ios_base::failure{"error"};
    return ss.str();
}

This is stupendously fast, even for large files (even up to 100 MiB), and it works for all streams. It doesn’t do any seeking on the stream, which is important for some stream types, like network streams. Unless you are dealing with enormous files, this should be your solution of choice.

If you are dealing with enormous files, it can be faster to count all the characters first, then do one big allocation and one big whopper of a read:

template <typename Container = std::string,
    typename CharT = char,
    typename Traits = std::char_traits<char>>
auto read_stream_into_container(
    std::basic_istream<CharT, Traits>& in,
    typename Container::allocator_type alloc = {})
{
    static_assert(
        // Allow only strings...
        std::is_same<Container, std::basic_string<CharT,
            Traits,
            typename Container::allocator_type>>::value ||
        // ... and vectors of the plain, signed, and
        // unsigned flavours of CharT.
        std::is_same<Container, std::vector<CharT,
            typename Container::allocator_type>>::value ||
        std::is_same<Container, std::vector<
            std::make_unsigned_t<CharT>,
            typename Container::allocator_type>>::value ||
        std::is_same<Container, std::vector<
            std::make_signed_t<CharT>,
            typename Container::allocator_type>>::value,
        "only strings and vectors of ((un)signed) CharT allowed");

    auto const start_pos = in.tellg();
    if (decltype(start_pos)(-1) == start_pos)
        throw std::ios_base::failure{"error"};

    if (!in.ignore(
            std::numeric_limits<std::streamsize>::max()))
        throw std::ios_base::failure{"error"};
    auto const char_count = in.gcount();

    if (!in.seekg(start_pos))
        throw std::ios_base::failure{"error"};

    auto container = Container(std::move(alloc));
    container.resize(char_count);

    if (0 != container.size())
    {
        if (!in.read(reinterpret_cast<CharT*>(&container[0]),
                container.size()))
            throw std::ios_base::failure{"error"};
    }

    return container;
}

Finally, if you are dealing with enormous files, and want to be able to support streams that cannot seek, read in chunks at a time into a deque:

template <typename CharT, typename Traits,
    typename CharO = CharT,
    typename Allocator = std::allocator<CharO>>
auto read_file_into_deque(
    std::basic_istream<CharT, Traits>& in,
    Allocator alloc = {})
{
    static_assert(
        std::is_same<CharT, CharO>::value ||
        std::is_same<std::make_unsigned_t<CharT>,
            CharO>::value ||
        std::is_same<std::make_signed_t<CharT>,
            CharO>::value,
        "char type of deque must be same "
        "as stream char type "
        "(possibly signed or unsigned)");

    using std::begin;
    using std::end;

    auto const chunk_size = std::size_t{BUFSIZ};

    auto container = std::deque<CharO, Allocator>(
        std::move(alloc));

    auto chunk = std::array<CharO, chunk_size>{};

    while (in.read(
                reinterpret_cast<CharT*>(chunk.data()),
                chunk.size()) ||
            in.gcount())
        container.insert(end(container),
            begin(chunk), begin(chunk) + in.gcount());

    return container;
}

That should just about cover all cases of reading a file into memory.

In summary:

  1. By default, use the stream buffer insertion operator overload method.
  2. If you’re expecting large files (at least a megabyte minimum on average) and you can seek on the stream, use the “get character count then allocate and read” method.
  3. If you’re expecting enormous files (at least several hundreds of megabytes, on average) and you don’t want to seek on stream, read the file in chunks into a deque.

If you have any other suggestions, particularly for good UI for any of these 3 functions, let me know.


How to read an entire file into memory in C++ by Explicit C++ is licensed under a Creative Commons Attribution 4.0 International License.

Input/Output with files

C++ provides the following classes to perform output and input of characters to/from files:

  • ofstream: Stream class to write on files
  • ifstream: Stream class to read from files
  • fstream: Stream class to both read and write from/to files.

These classes are derived directly or indirectly from the classes istream and ostream. We have already used objects whose types were these classes: cin is an object of class istream and cout is an object of class ostream. Therefore, we have already been using classes that are related to our file streams. And in fact, we can use our file streams the same way we are already used to using cin and cout, with the only difference that we have to associate these streams with physical files. Let's see an example:
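A minimal sketch of such an example (reconstructed; the file name example.txt is taken from the description below):

// basic file operations
#include <fstream>
using namespace std;

int main () {
  ofstream myfile;
  myfile.open ("example.txt");
  myfile << "Writing this to a file.\n";
  myfile.close();
  return 0;
}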



This code creates a file called example.txt and inserts a sentence into it in the same way we are used to do with cout, but using the file stream myfile instead.

But let's go step by step:

Open a file


The first operation generally performed on an object of one of these classes is to associate it to a real file. This procedure is known as to open a file. An open file is represented within a program by a stream object (an instantiation of one of these classes; in the previous example, this was myfile) and any input or output operation performed on this stream object will be applied to the physical file associated to it.

In order to open a file with a stream object we use its member function :


Where filename is a null-terminated character sequence of type const char * (the same type that string literals have) representing the name of the file to be opened, and mode is an optional parameter with a combination of the following flags:

ios::in      Open for input operations.
ios::out     Open for output operations.
ios::binary  Open in binary mode.
ios::ate     Set the initial position at the end of the file. If this flag is not set, the initial position is the beginning of the file.
ios::app     All output operations are performed at the end of the file, appending the content to the current content of the file. This flag can only be used in streams open for output-only operations.
ios::trunc   If the file opened for output operations already existed before, its previous content is deleted and replaced by the new one.

All these flags can be combined using the bitwise OR operator (|). For example, if we want to open the file example.bin in binary mode to add data we could do it by the following call to member function open:
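For example (a sketch; the stream object name is illustrative):

ofstream myfile;
myfile.open ("example.bin", ios::out | ios::app | ios::binary);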



Each one of the open member functions of the classes ofstream, ifstream and fstream has a default mode that is used if the file is opened without a second argument:

class     default mode parameter
ofstream  ios::out
ifstream  ios::in
fstream   ios::in | ios::out

For ifstream and ofstream classes, ios::in and ios::out are automatically and respectively assumed, even if a mode that does not include them is passed as second argument to the open member function.

The default value is only applied if the function is called without specifying any value for the mode parameter. If the function is called with any value in that parameter the default mode is overridden, not combined.

File streams opened in binary mode perform input and output operations independently of any format considerations. Non-binary files are known as text files, and some translations may occur due to formatting of some special characters (like newline and carriage return characters).

Since the first task that is performed on a file stream object is generally to open a file, these three classes include a constructor that automatically calls the open member function and has the exact same parameters as this member. Therefore, we could also have declared the previous object and conducted the same opening operation in our previous example by writing:
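For example (a sketch, reusing the earlier example.txt example):

ofstream myfile ("example.txt");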



Combining object construction and stream opening in a single statement. Both forms to open a file are valid and equivalent.

To check if a file stream was successful in opening a file, you can do it by calling the member function is_open with no arguments. This member function returns a bool value of true in the case that indeed the stream object is associated with an open file, or false otherwise:
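For example (a sketch):

if (myfile.is_open()) { /* ok, proceed with output */ }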



Closing a file

When we are finished with our input and output operations on a file we shall close it so that its resources become available again. In order to do that we have to call the stream's member function close. This member function takes no parameters, and what it does is to flush the associated buffers and close the file:
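For example (a sketch):

myfile.close();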



Once this member function is called, the stream object can be used to open another file, and the file is available again to be opened by other processes.

In case that an object is destructed while still associated with an open file, the destructor automatically calls the member function close.

Text files

Text file streams are those where we do not include the ios::binary flag in their opening mode. These files are designed to store text and thus all values that we input or output from/to them can suffer some formatting transformations, which do not necessarily correspond to their literal binary value.

Data output operations on text files are performed in the same way we operated with cout:
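A minimal sketch of such output (the file name and line contents are illustrative):

// writing on a text file
#include <iostream>
#include <fstream>
using namespace std;

int main () {
  ofstream myfile ("example.txt");
  if (myfile.is_open()) {
    myfile << "This is a line.\n";
    myfile << "This is another line.\n";
    myfile.close();
  }
  else cout << "Unable to open file";
  return 0;
}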



Data input from a file can also be performed in the same way that we did with cin:
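A minimal sketch of such input, matching the description in the next paragraph (reading line by line with getline):

// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
  string line;
  ifstream myfile ("example.txt");
  if (myfile.is_open()) {
    while (getline (myfile, line)) {
      cout << line << '\n';
    }
    myfile.close();
  }
  else cout << "Unable to open file";
  return 0;
}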



This last example reads a text file and prints out its content on the screen. We have created a while loop that reads the file line by line, using getline. The value returned by getline is a reference to the stream object itself, which when evaluated as a boolean expression (as in this while-loop) is true if the stream is ready for more operations, and false if either the end of the file has been reached or if some other error occurred.

Checking state flags

In addition to evaluating the stream itself as a boolean expression, which checks whether the stream is ready for input/output operations, other member functions exist to check for specific states of a stream (all of them return a bool value):

bad()
Returns true if a reading or writing operation fails. For example in the case that we try to write to a file that is not open for writing or if the device where we try to write has no space left.
fail()
Returns true in the same cases as bad(), but also in the case that a format error happens, like when an alphabetical character is extracted when we are trying to read an integer number.
eof()
Returns true if a file open for reading has reached the end.
good()
It is the most generic state flag: it returns false in the same cases in which calling any of the previous functions would return true.

In order to reset the state flags checked by any of these member functions we have just seen, we can use the member function clear(), which takes no parameters.

get and put stream pointers

All i/o streams objects have, at least, one internal stream pointer:

ifstream, like istream, has a pointer known as the get pointer that points to the element to be read in the next input operation.

ofstream, like ostream, has a pointer known as the put pointer that points to the location where the next element has to be written.

Finally, fstream inherits both, the get and the put pointers, from iostream (which is itself derived from both istream and ostream).

These internal stream pointers that point to the reading or writing locations within a stream can be manipulated using the following member functions:

tellg() and tellp()

These two member functions have no parameters and return a value of the member type streampos, which is an integer data type representing the current position of the get stream pointer (in the case of tellg) or the put stream pointer (in the case of tellp).

seekg() and seekp()

These functions allow us to change the position of the get and put stream pointers. Both functions are overloaded with two different prototypes. The first prototype is:
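A sketch of that prototype:

seekg ( position );
seekp ( position );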


Using this prototype the stream pointer is changed to the absolute position position (counting from the beginning of the file). The type for this parameter is the same as the one returned by functions tellg and tellp: the member type streampos, which is an integer value.

The other prototype for these functions is:
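A sketch of that prototype:

seekg ( offset, direction );
seekp ( offset, direction );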


Using this prototype, the position of the get or put pointer is set to an offset value relative to some specific point determined by the parameter direction. offset is of the member type streamoff, which is also an integer type. And direction is of type seekdir, which is an enumerated type that determines the point from where offset is counted from, and that can take any of the following values:

ios::beg  offset counted from the beginning of the stream
ios::cur  offset counted from the current position of the stream pointer
ios::end  offset counted from the end of the stream

The following example uses the member functions we have just seen to obtain the size of a file:
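A sketch of such an example (note the first article's caveats about relying on tellg for byte counts):

// obtaining file size
#include <iostream>
#include <fstream>
using namespace std;

int main () {
  long begin, end;
  ifstream myfile ("example.txt");
  begin = myfile.tellg();
  myfile.seekg (0, ios::end);
  end = myfile.tellg();
  myfile.close();
  cout << "size is: " << (end - begin) << " bytes.\n";
  return 0;
}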



Binary files

In binary files, to input and output data with the extraction and insertion operators (<< and >>) and functions like getline is not efficient, since we do not need to format any data, and data may not use the separation codes used by text files to separate elements (like space, newline, etc...).

File streams include two member functions specifically designed to input and output binary data sequentially: write and read. The first one (write) is a member function of ostream inherited by ofstream. And read is a member function of istream that is inherited by ifstream. Objects of class fstream have both members. Their prototypes are:
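A sketch of those prototypes:

write ( memory_block, size );
read ( memory_block, size );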


Where memory_block is of type "pointer to char" (char *), and represents the address of an array of bytes where the read data elements are stored or from where the data elements to be written are taken. The parameter size is an integer value that specifies the number of characters to be read or written from/to the memory block.
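A sketch of the example that the following paragraphs walk through (the file name example.bin is illustrative):

// reading a complete binary file
#include <iostream>
#include <fstream>
using namespace std;

streampos size;
char * memblock;

int main () {
  ifstream file ("example.bin", ios::in | ios::binary | ios::ate);
  if (file.is_open()) {
    size = file.tellg();
    memblock = new char [size];
    file.seekg (0, ios::beg);
    file.read (memblock, size);
    file.close();
    cout << "the complete file content is in memory";
    delete[] memblock;
  }
  else cout << "Unable to open file";
  return 0;
}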



In this example the entire file is read and stored in a memory block. Let's examine how this is done:

First, the file is opened with the ios::ate flag, which means that the get pointer will be positioned at the end of the file. This way, when we call the member tellg(), we will directly obtain the size of the file. Notice the type we have used to declare variable size:
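In the sketch above, that declaration is:

streampos size;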



streampos is a specific type used for buffer and file positioning and is the type returned by file.tellg(). This type is defined as an integer type, therefore we can conduct on it the same operations we conduct on any other integer value, and it can safely be converted to another integer type large enough to contain the size of the file. For a file with a size under 2GB we could use int:
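For example (a sketch):

int size;
size = (int) file.tellg();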



Once we have obtained the size of the file, we request the allocation of a memory block large enough to hold the entire file:
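In the sketch above:

memblock = new char [size];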



Right after that, we proceed to set the get pointer at the beginning of the file (remember that we opened the file with this pointer at the end), then read the entire file, and finally close it:
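In the sketch above:

file.seekg (0, ios::beg);
file.read (memblock, size);
file.close();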



At this point we could operate with the data obtained from the file. Our program simply announces that the content of the file is in memory and then terminates.

Buffers and Synchronization


When we operate with file streams, these are associated to an internal buffer of type streambuf. This buffer is a memory block that acts as an intermediary between the stream and the physical file. For example, with an ofstream, each time the member function put (which writes a single character) is called, the character is not written directly to the physical file with which the stream is associated. Instead of that, the character is inserted in that stream's intermediate buffer.

When the buffer is flushed, all the data contained in it is written to the physical medium (if it is an output stream) or simply freed (if it is an input stream). This process is called synchronization and takes place under any of the following circumstances:

  • When the file is closed: before closing a file all buffers that have not yet been flushed are synchronized and all pending data is written or read to the physical medium.
  • When the buffer is full: Buffers have a certain size. When the buffer is full it is automatically synchronized.
  • Explicitly, with manipulators: When certain manipulators are used on streams, an explicit synchronization takes place. These manipulators are: flush and endl.
  • Explicitly, with member function sync(): Calling the stream's member function sync(), which takes no parameters, causes an immediate synchronization. This function returns an int value equal to -1 if the stream has no associated buffer or in case of failure. Otherwise (if the stream buffer was successfully synchronized) it returns 0.
