String Basics in C
Table of Contents
- Introduction
- What is a String in C?
- String Declaration and Initialization
- String Memory Layout
- String Input and Output
- Accessing Individual Characters
- String Length
- Common String Operations
- String Literals
- Best Practices
- Summary
Introduction
Strings are one of the most commonly used data types in programming. In C, strings have a unique implementation that differs from many other programming languages. Understanding how C handles strings is essential for:
- Text processing and manipulation
- User input/output handling
- File operations
- Data parsing
- Network programming
What is a String in C?
Definition
In C, a string is an array of characters terminated by a null character ('\0').
char str[] = "Hello";
This creates:
str[0] = 'H'
str[1] = 'e'
str[2] = 'l'
str[3] = 'l'
str[4] = 'o'
str[5] = '\0' ← Null terminator (marks end of string)
The Null Terminator
The null terminator ('\0') is crucial because:
- Marks the end - Functions know where the string ends
- ASCII value 0 - Not the character '0' (which is ASCII 48)
- Automatic - Added automatically for string literals
- Takes 1 byte - Must be accounted for in memory allocation
// The null character
'\0' == 0 // True! ASCII value is 0
'\0' == '0' // False! '0' is ASCII 48
'\0' == ""[0] // True! Empty string has only null terminator
Why Null Termination?
C strings are null-terminated because:
- No length stored - Unlike other languages, C doesn't store string length
- Efficiency - No need to pass length to every function
- Simplicity - Easy to iterate until '\0' is found
- Flexibility - String length can be determined at runtime
String Declaration and Initialization
Method 1: Character Array with Size
char str[10] = "Hello";
// Memory layout:
// [H][e][l][l][o][\0][ ][ ][ ][ ]
// 0 1 2 3 4 5 6 7 8 9
// Remaining positions are initialized to '\0'
Method 2: Character Array without Size
char str[] = "Hello";
// Compiler automatically allocates 6 bytes (5 chars + 1 null)
// [H][e][l][l][o][\0]
// 0 1 2 3 4 5
Method 3: Pointer to String Literal
char *str = "Hello";
// str points to a string literal (read-only memory)
// WARNING: Modifying this string is undefined behavior!
Method 4: Character by Character
char str[6];
str[0] = 'H';
str[1] = 'e';
str[2] = 'l';
str[3] = 'l';
str[4] = 'o';
str[5] = '\0'; // Don't forget the null terminator!
Method 5: Using Braces (Less Common)
char str[] = {'H', 'e', 'l', 'l', 'o', '\0'};
// Must manually include null terminator!
// Without '\0', it's just a char array, not a string
Comparison of Methods
| Method | Syntax | Modifiable | Size |
|---|---|---|---|
| Array with size | char s[10] = "Hi"; | Yes | Fixed (10) |
| Array auto-sized | char s[] = "Hi"; | Yes | Auto (3) |
| Pointer to literal | char *s = "Hi"; | No | Pointer size |
| Character init | char s[] = {'H','i','\0'}; | Yes | Auto (3) |
String Memory Layout
Visual Representation
String: "Hello"
Memory Address: 1000 1001 1002 1003 1004 1005
┌────┬────┬────┬────┬────┬────┐
│ H  │ e  │ l  │ l  │ o  │ \0 │
└────┴────┴────┴────┴────┴────┘
ASCII Values: 72 101 108 108 111 0
char str[] = "Hello";
str begins at address 1000 (the addresses are illustrative)
str[0] = 'H' (at 1000)
str[5] = '\0' (at 1005)
Array vs Pointer
char arr[] = "Hello"; // Array - copies string to stack
char *ptr = "Hello"; // Pointer - points to read-only data
// Key differences:
sizeof(arr) // 6 (includes null terminator)
sizeof(ptr) // 4 or 8 (pointer size)
arr[0] = 'J'; // OK - modifying stack memory
ptr[0] = 'J'; // UNDEFINED BEHAVIOR - read-only memory!
Stack vs Data Segment
┌────────────────────────────────────┐
│ STACK                              │
│ char arr[] = "Hello";              │
│ [H][e][l][l][o][\0] ← Modifiable   │
├────────────────────────────────────┤
│ HEAP                               │
│ (dynamically allocated strings)    │
├────────────────────────────────────┤
│ DATA SEGMENT (Read-Only)           │
│ "Hello" ← String literals          │
│ char *ptr = "Hello"; points here   │
└────────────────────────────────────┘
String Input and Output
Output Functions
printf() with %s
char str[] = "Hello, World!";
printf("%s\n", str); // Output: Hello, World!
printf("%.5s\n", str); // Output: Hello (first 5 chars)
printf("%10s\n", str); // Minimum width 10, right-aligned (no padding here: str has 13 chars)
printf("%-10s\n", str); // Minimum width 10, left-aligned (no padding here: str has 13 chars)
puts()
char str[] = "Hello";
puts(str); // Outputs: Hello (automatically adds newline)
putchar() (single character)
putchar('H');
putchar('\n');
Input Functions
scanf() with %s
char name[50];
printf("Enter name: ");
scanf("%s", name); // No & needed - the array name decays to a pointer to its first element
// WARNING: scanf stops at whitespace!
// Input: "John Doe" → name = "John"
// WARNING: without a width, long input overflows the buffer
Safe scanf with width specifier
char name[50];
scanf("%49s", name); // Read at most 49 chars (leave room for \0)
gets() - DANGEROUS (Removed in C11)
// NEVER USE gets() - no buffer overflow protection!
gets(str); // DANGEROUS!
fgets() - Safe Input
char str[100];
if (fgets(str, sizeof(str), stdin) != NULL) {
    // Reads up to sizeof(str)-1 characters
    // Includes newline if present
    // Always null-terminates on success
}
// fgets returns NULL on end of input or error; str is not usable then
Removing Newline from fgets()
char str[100];
fgets(str, sizeof(str), stdin);
// Method 1: Replace newline
str[strcspn(str, "\n")] = '\0';
// Method 2: Manual
size_t len = strlen(str);
if (len > 0 && str[len-1] == '\n') {
str[len-1] = '\0';
}
Comparison of Input Methods
| Function | Buffer Safe | Reads Spaces | Includes \n | Use Case |
|---|---|---|---|---|
| scanf %s | No* | No | No | Single words |
| scanf %49s | Yes | No | No | Single words |
| gets | No | Yes | No | NEVER USE |
| fgets | Yes | Yes | Yes | Full lines |
*Can be made safe with width specifier
Accessing Individual Characters
Using Array Indexing
char str[] = "Hello";
char first = str[0]; // 'H'
char last = str[4]; // 'o'
str[0] = 'J'; // str is now "Jello"
Using Pointer Arithmetic
char str[] = "Hello";
char *ptr = str;
char first = *ptr; // 'H'
char second = *(ptr + 1); // 'e'
ptr++; // ptr now points to 'e'
Iterating Through a String
// Method 1: Index-based
char str[] = "Hello";
for (int i = 0; str[i] != '\0'; i++) {
printf("%c ", str[i]);
}
// Method 2: Pointer-based
char *ptr = str;
while (*ptr != '\0') {
printf("%c ", *ptr);
ptr++;
}
// Method 3: Compact pointer style
for (char *p = str; *p; p++) {
printf("%c ", *p);
}
String Length
Using strlen()
#include <string.h>
char str[] = "Hello";
size_t len = strlen(str); // Returns 5 (doesn't count \0)
Manual Length Calculation
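#include <stddef.h> // size_t
// Returns size_t, like strlen(), so long strings cannot overflow an int
size_t my_strlen(const char *str) {
    size_t len = 0;
    while (str[len] != '\0') {
        len++;
    }
    return len;
}
// Or with pointers:
size_t my_strlen_ptr(const char *str) {
    const char *s = str;
    while (*s) s++;
    return (size_t)(s - str); // Pointer difference = number of characters
}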
Length vs Size
char str[100] = "Hello";
strlen(str); // 5 (actual string length)
sizeof(str); // 100 (total array size)
Common String Operations
Copying Strings
#include <string.h>
char src[] = "Hello";
char dest[20];
// Using strcpy (unsafe - no bounds checking)
strcpy(dest, src);
// Using strncpy (safer - limits bytes copied)
strncpy(dest, src, sizeof(dest) - 1);
dest[sizeof(dest) - 1] = '\0'; // Ensure null termination
Concatenating Strings
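char str1[20] = "Hello";
char str2[] = " World";
// Option 1: strcat (unsafe - no bounds checking)
strcat(str1, str2); // str1 = "Hello World"
// Option 2: strncat (safer - the third argument caps the characters appended)
// Use it in place of the strcat call above, not in addition to it
strncat(str1, str2, sizeof(str1) - strlen(str1) - 1);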
Comparing Strings
char str1[] = "Hello";
char str2[] = "Hello";
// WRONG: This compares addresses, not content!
if (str1 == str2) { /* never true: two different arrays */ }
// CORRECT: Use strcmp
if (strcmp(str1, str2) == 0) {
printf("Strings are equal\n");
}
strcmp() Return Values
| Result | Meaning |
|---|---|
| 0 | Strings are equal |
| < 0 | str1 sorts before str2 (compared by character value, so 'Z' comes before 'a' in ASCII) |
| > 0 | str1 sorts after str2 |
String Literals
Characteristics
"Hello" // String literal - stored in read-only memory
- Immutable - Must not be modified (attempting to do so is undefined behavior)
- Static storage - Exist for program duration
- Shared - Compiler may merge identical literals
String Literal Concatenation
// Adjacent string literals are concatenated at compile time
char *msg = "Hello "
"World "
"!";
// Equivalent to: "Hello World !"
Escape Sequences in Strings
| Sequence | Meaning |
|---|---|
| \n | Newline |
| \t | Tab |
| \\ | Backslash |
| \" | Double quote |
| \' | Single quote |
| \0 | Null character |
| \r | Carriage return |
| \xNN | Hex value NN |
printf("Line 1\nLine 2\n"); // Two lines
printf("Tab\there\n"); // Tab character
printf("Quote: \"Hello\"\n"); // Embedded quotes
printf("Path: C:\\Users\\\n"); // Backslashes
Best Practices
1. Always Ensure Null Termination
char str[5];
strncpy(str, "Hello", 5);
// DANGER: No room for '\0'!
// CORRECT:
char str[6];
strncpy(str, "Hello", 5);
str[5] = '\0';
// OR:
strncpy(str, "Hello", sizeof(str) - 1);
str[sizeof(str) - 1] = '\0';
2. Use Safe Functions
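// Prefer these:
strncpy(dest, src, n); // Instead of strcpy (terminate dest yourself: no '\0' is written if src has n or more chars)
strncat(dest, src, n); // Instead of strcat (n = space left in dest, minus 1 for '\0')
fgets(str, n, stdin); // Instead of gets
snprintf(str, n, ...); // Instead of sprintf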
3. Check Buffer Sizes
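char buffer[100];
char input[200] = "some text"; // Must hold a valid string before strlen is called
// Check before copying
if (strlen(input) < sizeof(buffer)) {
    strcpy(buffer, input); // Safe here: length plus '\0' fits in buffer
}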
4. Initialize Strings
char str[100] = ""; // Initialize to empty string
char str2[100] = {0}; // Initialize all to null
5. Use const for Read-Only Strings
void print_message(const char *msg) {
printf("%s\n", msg);
// msg[0] = 'X'; // Compiler error - protected
}
6. Prefer String Literals with const
const char *msg = "Hello"; // Correct - clearly read-only
char *msg = "Hello"; // Works but misleading
Summary
Key Points
- C strings are null-terminated character arrays
- Null terminator ('\0') marks the end of a string
- String literals are stored in read-only memory
- Array strings can be modified; pointer-to-literal strings cannot
- strlen() returns length without null terminator
- sizeof() returns total allocated size
- Use safe functions (strncpy, strncat, fgets) to prevent buffer overflow
String Declaration Quick Reference
// Modifiable strings:
char str1[] = "Hello"; // Auto-sized array
char str2[20] = "Hello"; // Fixed-size array
char str3[20] = {'H','i','\0'}; // Character init
// Read-only (pointer to literal):
const char *str4 = "Hello"; // Pointer to literal
Common Mistakes to Avoid
- Forgetting null terminator when manually building strings
- Using == to compare strings instead of strcmp()
- Modifying string literals (undefined behavior)
- Buffer overflow with strcpy/strcat on small buffers
- Off-by-one errors when allocating string memory
- Using gets() - always use fgets() instead
Memory Size Calculation
char str[] = "Hello"; // Total bytes = 6 (5 chars + 1 null)
// For dynamic allocation:
char *copy = malloc(strlen(original) + 1); // +1 for null!