Sooner or later, every developer needs a String of some sort. Reading and writing text is essential for almost any software. Most languages offer a String type for this task, but how about C?
There is no String class in the C programming language. Here strings are represented as a collection of individual characters in an array, the so-called char array. These character strings are then terminated by a specific symbol, the termination character ‘\0’.
C does not have classes per se. And it also does not have a dedicated type for a String. So how can you use strings (or their replacement) in C? The rest of this article reveals the details.
A String Class in C?
There is no String class or String datatype in C, and the language has no classes at all. The only way to represent a String in C is through an array of characters, where each character is stored in the datatype char.
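The char datatype is 1 Byte in size and is essentially a small number. Whether it covers 0 to 255 or -128 to 127 depends on the compiler, but the ASCII range of 0 to 127 fits either way. The compiler translates a character literal such as ‘a’ into its number from the ASCII Table, and output functions turn the number back into a character.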
Char and the ASCII-Table
The ASCII Table was created in the 1960s to help standardize computer communication.
ASCII (/ˈæskiː/ ASS-kee), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices.
https://en.wikipedia.org/wiki/ASCII (visited on 2023/09/14)
Each numerical value of a char corresponds to an entry in the ASCII table. The letters of the Latin alphabet sit in two ranges: ‘A’ to ‘Z’ are 65 to 90 and ‘a’ to ‘z’ are 97 to 122. The values 91 to 96 in between are symbols such as ‘[’ and ‘_’.
Because a character is just a number, it can be interchanged with the number it represents in the ASCII Table. The following example demonstrates this.
#include <stdio.h>
int main(void)
{
    char example = 'a';      /* 97 in ASCII */
    printf("%c\n", example); /* Prints 'a' */
    printf("%d\n", example); /* Prints 97 */
    example = 98;
    printf("%c\n", example); /* Prints 'b' */
    return 0;
}
String Representation in Plain C
Now we know how characters are represented and stored in C, but how do we actually show and store a whole String? The answer is character arrays.
The biggest disadvantage of arrays is that their size is fixed in C, regardless of the datatype, so a String cannot simply grow or shrink. You are basically left with two options: either you make the array as big as it will ever need to be up front, or you reallocate the memory every time the size of the text changes.
The first option may waste memory: if you reserve 255 Bytes for your String but on average only use 20 Bytes, 235 Bytes stay unused. The second option may cost performance: if you change the string very often, you also have to reallocate memory very often. The following example shows both cases.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    /* Option 1: fixed array, reserved up front */
    char text[255] = "Hello";
    printf("%s, size: %zu\n", text, sizeof(text));
    text[1] = 'a';
    printf("%s, size: %zu\n", text, sizeof(text));
    strcpy(text, "Test");
    printf("%s, size: %zu\n", text, sizeof(text));
    /* Option 2: heap memory, resized on demand */
    char* smalltext = malloc(12 * sizeof(*smalltext));
    if (smalltext == NULL)
        return 1;
    strcpy(smalltext, "Hello World");
    printf("%s, size: %zu (real size: 12)\n", smalltext, sizeof(smalltext));
    smalltext[1] = 'a';
    printf("%s, size: %zu (real size: 12)\n", smalltext, sizeof(smalltext));
    char* resized = realloc(smalltext, 5 * sizeof(*smalltext));
    if (resized == NULL) {
        free(smalltext); /* the old block is still valid on failure */
        return 1;
    }
    smalltext = resized;
    strcpy(smalltext, "Test");
    printf("%s, size: %zu (real size: 5)\n", smalltext, sizeof(smalltext));
    free(smalltext); /* Never forget to free allocated memory! */
    return 0;
}
This program leads to the following output:
Hello, size: 255
Hallo, size: 255
Test, size: 255
Hello World, size: 8 (real size: 12)
Hallo World, size: 8 (real size: 12)
Test, size: 8 (real size: 5)
Wait: Why is the size of the second array always 8 Bytes? Because smalltext is a pointer, and sizeof returns the size of the pointer itself, which is 8 Bytes in the environment where I run this code. Unfortunately there is no standard way to find out the real size of the allocated memory in C, so you have to keep track of it yourself or use something platform-specific like _msize (on Windows).
Writing behind the End of the Array
What happens when you write more characters into an array than you reserved initially? The extra characters land in memory behind the array that does not belong to it, which leads to undefined behaviour in your program and thus is a very bad thing. This is called a Buffer Overflow and should be avoided at all times.
The simplest version of this is shown in the following example. Although only 4 Bytes are reserved, the text is way bigger. If this happens at compile time the compiler may warn you, but if it happens dynamically at runtime (for example by reading a text file that is too big), nothing warns you about this possible bug.
#include <stdio.h>
int main()
{
    char small[4] = "This is way too big";
    printf("%s\n", small);
}
The compiler warns us that we did something wrong:
main.c: In function ‘main’:
main.c:5:21: warning: initializer-string for array of ‘char’ is too long
5 | char small[4] = "This is way too big";
| ^~~~~~~~~~~~~~~~~~~~~
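In my run only the word ‘This’ is printed to the console. The array holds just these four characters without a terminating ‘\0’, so printf keeps reading behind the array until it happens to hit a zero byte. That is undefined behaviour, and on another system the output may continue with garbage.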
This
User Defined String Class in C
Another option is to write a String “Class” of your own, so that you don’t have to reallocate memory manually every time and stay flexible in choosing whether you want fixed memory or not.
I showed how to emulate classes in C in another article. By following this example we could come up with a string class for ourselves.
If you want to take a deeper dive into writing your own class-like modules in C, you may take a look at the container classes I implemented a C version of.
In the header file we declare the interface of our String Class and how we are going to use it.
#pragma once
typedef struct string_t string_t;
string_t* string_new();
void string_ctr(string_t* obj, char* text);
void string_dtr(string_t* obj);
void string_set_text(string_t* obj, char* text);
char* string_get_text(string_t* obj);
int string_get_size(string_t* obj);
The implementations of these functions are very straightforward. We store a pointer to the text that the user passes (not a copy of it), determine the length of the text and store that as well. The other functions return the values of the “Object”.
#include "string_t.h"
#include <stdlib.h>
struct string_t
{
    int size;   /* length of the text without the terminating '\0' */
    char* text; /* points to the caller's text, it is not copied */
};
string_t* string_new()
{
    string_t* obj = (string_t*)malloc(sizeof(string_t));
    if (obj != NULL) {
        /* start in a defined state until string_ctr is called */
        obj->size = 0;
        obj->text = NULL;
    }
    return obj;
}
static int string_determine_size(char* text)
{
    int i = 0;
    /* check before advancing so that an empty string yields 0 */
    while (text[i] != '\0') {
        ++i;
    }
    return i;
}
void string_ctr(string_t* obj, char* text)
{
    string_set_text(obj, text);
}
void string_dtr(string_t* obj)
{
    free(obj);
}
void string_set_text(string_t* obj, char* text)
{
    obj->size = string_determine_size(text);
    obj->text = text;
}
char* string_get_text(string_t* obj)
{
    return obj->text;
}
int string_get_size(string_t* obj)
{
    return obj->size;
}
This example program shows how our new class could be used in the real world. Keep in mind that it is very simple and not safe yet, but feel free to extend it for your needs.
#include <stdio.h>
#include "string_t.h"
int main()
{
    string_t* my_string = string_new();
    if (my_string == NULL)
        return 1; /* allocation failed */
    string_ctr(my_string, "Hello");
    printf("%s, size: %d\n", string_get_text(my_string), string_get_size(my_string));
    string_set_text(my_string, "Modern C Programming");
    printf("%s, size: %d\n", string_get_text(my_string), string_get_size(my_string));
    string_dtr(my_string);
    return 0;
}
This program outputs the following strings:
Hello, size: 5
Modern C Programming, size: 20
Summary
As you see, working with Strings in C is a little bit more complicated than in other languages. We also haven’t touched on Unicode or other encodings outside of the ASCII range, which would open up a very wide field.