Tuesday, July 19, 2016

Storage of Strings in C

How read-only and dynamically allocated strings are stored in C
In C, a string can be referred to either through a character pointer or as a character array.
Strings as character arrays
char str[4] = "GfG"; /*One extra for string terminator*/
/*    OR    */
char str[4] = {'G', 'f', 'G', '\0'}; /* '\0' is the string terminator */
When strings are declared as character arrays, they are stored like any other array in C. For example, if str[] is an auto (local) variable, the string is stored in the stack segment; if it is a global or static variable, it is stored in the data segment, and so on.
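To make the storage classes above concrete, here is a minimal sketch (the function names are hypothetical, chosen only for illustration). Exactly which segment each array lands in is up to the compiler and platform; what the C standard guarantees is the lifetime: the global and static arrays keep their contents between calls, while the automatic array is re-created from its initializer on every call. All three are writable.

```c
#include <string.h>

/* Global array: static storage duration, typically in the data segment. */
char global_str[4] = "GfG";

/* Returns a static local array that keeps its contents between calls;
   each call increments its middle character ('f' -> 'g' -> 'h' ...). */
const char *bump_static(void)
{
    static char static_str[] = "GfG";   /* static storage, like a global */
    static_str[1]++;
    return static_str;                  /* still valid after return */
}

/* Copies an automatic (stack) array into dst. The array is rebuilt from
   "GfG" on every call, so earlier changes never carry over. */
void copy_auto(char dst[4])
{
    char auto_str[] = "GfG";            /* auto storage: typically the stack */
    auto_str[1]++;                      /* always 'f' -> 'g' */
    memcpy(dst, auto_str, sizeof auto_str);
}
```

Calling bump_static() twice yields "GgG" and then "GhG", while copy_auto() produces "GgG" every time.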
Strings using character pointers
Using character pointers, strings can be stored in two ways:
1) Read-only string in a shared segment.
When a string literal is assigned directly to a pointer, the literal is stored in a read-only block (typically a read-only data section such as .rodata) that is shared among functions.
char *str  =  "GfG";
In the above line, "GfG" is stored in a shared read-only location, but the pointer str itself is stored in read-write memory. You can change str to point to something else, but you cannot change the characters it currently points to (attempting to do so is undefined behavior). So this kind of string should only be used when we don't need to modify the string later in the program.
It is also better to declare such a pointer as
const char *str = "GfG";
so that the compiler reports an error if you try to modify the string through it.
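A small sketch of the const-qualified pointer, using a hypothetical helper pick_greeting: re-pointing str at a different literal is allowed, while writing through it is rejected at compile time. The offending line is left commented out so the block still compiles.

```c
/* Returns one of two string literals. The pointer itself is writable
   (it can be re-pointed); the characters it points to are not. */
const char *pick_greeting(int changed)
{
    const char *str = "GfG";   /* str points at a read-only literal */
    if (changed)
        str = "GnG";           /* OK: re-pointing the pointer */
    /* str[1] = 'n';              would not compile: str points to const char */
    return str;                /* literals live for the whole program */
}
```

To get "GnG" here, the code points at a different literal instead of editing the existing one.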
2) Dynamically allocated in the heap segment.
Such strings are stored like any other dynamically allocated data in C and can be shared among functions.
char *str;
int size = 4; /* one extra for '\0' */
str = (char *)malloc(sizeof(char)*size);
*(str+0) = 'G';
*(str+1) = 'f';
*(str+2) = 'G';
*(str+3) = '\0';
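The fragment above can be turned into a reusable sketch. The helper names new_gfg and set_middle are hypothetical; the sketch adds a NULL check after malloc and shows a second function modifying the same heap string through the pointer, which is what "shared among functions" means here. The caller owns the result and must free() it.

```c
#include <stdlib.h>

/* Allocates "GfG" on the heap. Returns NULL if malloc fails;
   otherwise the caller must free() the result. */
char *new_gfg(void)
{
    size_t size = 4;              /* 3 characters + '\0' */
    char *str = malloc(size);     /* sizeof(char) is 1 by definition */
    if (str == NULL)
        return NULL;              /* always check malloc's result */
    str[0] = 'G';                 /* str[i] is the same as *(str+i) */
    str[1] = 'f';
    str[2] = 'G';
    str[3] = '\0';
    return str;
}

/* Any function holding the pointer can read or modify the heap string. */
void set_middle(char *str, char ch)
{
    str[1] = ch;
}
```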
Example 1 (Try to modify a string)
The program below may crash (typically with a segmentation fault) because the line *(str+1) = 'n' tries to write to read-only memory. Formally, modifying a string literal is undefined behavior.
#include <stdio.h>

int main()
{
 char *str;
 str = "GfG";     /* Stored in read-only part of data segment */
 *(str+1) = 'n'; /* Problem: trying to modify read-only memory */
 getchar();
 return 0;
}
str is a pointer stored on the stack. It is initialized to point to the string literal "GfG". That literal is stored in a read-only data section of the compiled executable and is loaded into memory along with the program. Because that memory is read-only, trying to modify the data pointed to by str causes an access violation.
char* str = malloc(sizeof(char) * 4);
strcpy(str, "abc");
Here, str is the same kind of stack pointer. This time, however, it is initialized to point to a 4-character block of heap memory that you can both read and write. At first that block is uninitialized and can contain anything. strcpy reads the read-only memory where "abc" is stored and copies it into the read-write block that str points to. Note that setting str[3] = '\0' afterwards would be redundant, since strcpy copies the terminating '\0' as well.
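Putting the malloc and strcpy lines above into a self-contained helper (the name heap_copy is hypothetical) shows the same idea with the buffer sized from the source string and the malloc result checked. This is essentially what POSIX strdup does; strdup was also added to ISO C in C23, but older toolchains may not provide it in strict ISO mode.

```c
#include <stdlib.h>
#include <string.h>

/* Copies src into a newly allocated, writable heap buffer.
   Returns NULL on allocation failure; the caller must free() the result. */
char *heap_copy(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *str = malloc(size);
    if (str == NULL)
        return NULL;
    strcpy(str, src);                /* copies the characters AND the '\0' */
    return str;
}
```

For example, heap_copy("abc") returns a 4-byte buffer whose characters can then be changed freely, unlike the literal it was copied from.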
As an aside, if you are working in Visual Studio, you can use strcpy_s instead, which takes the size of the destination buffer and refuses to overflow it if the source string is longer than expected.
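strcpy_s belongs to the optional Annex K of C11 and is not available on every platform (Microsoft's version also differs in some details). A portable alternative, sketched here under the hypothetical name copy_bounded, is snprintf: it never writes more than the given size and always null-terminates when the size is nonzero. This is a swapped-in technique, not the one the original text describes.

```c
#include <stdio.h>

/* Copies src into dst (capacity dst_size) without overflowing it.
   Returns 1 if the whole string fit, 0 if it was truncated or on error. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length it WOULD have written (excluding '\0'),
       so a result >= dst_size means the output was truncated. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Copying "abcdef" into a 4-byte buffer with this helper leaves "abc" plus the terminator and reports truncation instead of overwriting memory.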
If str is instead declared as an array, as in char str[] = "abc";, it is an array allocated on the stack. The compiler sizes it to fit the string literal used to initialize it exactly (including the '\0' terminator). Stack memory is read-write, so you can modify the elements of the array however you want.
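A short sketch of that sizing rule, with hypothetical helper names: for an array initialized from "abc", sizeof reports 4 bytes (three characters plus the terminator), while strlen would report 3. Because each array holds its own copy of the literal, modifying one leaves the literal and any other array initialized from it unchanged.

```c
#include <string.h>

/* Returns the number of bytes the compiler reserved for an array
   initialized from "abc": 3 characters + the '\0' terminator. */
size_t array_size_abc(void)
{
    char str[] = "abc";   /* size deduced from the initializer */
    return sizeof str;    /* sizeof an array: total bytes, here 4 */
}

/* Each array is an independent, writable copy of the literal. */
int arrays_are_independent(void)
{
    char a[] = "abc";
    char b[] = "abc";
    a[0] = 'x';           /* changes only a's own stack copy */
    return strcmp(a, "xbc") == 0 && strcmp(b, "abc") == 0;
}
```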
The program below works fine because str[] is stored in the writable stack segment.
#include <stdio.h>

int main()
{
 char str[] = "GfG"; /* Stored in stack segment like other auto variables */
 *(str+1) = 'n';     /* No problem: string is now GnG */
 getchar();
 return 0;
}
The program below also works fine because the data at str is stored in the writable heap segment.
#include <stdio.h>
#include <stdlib.h>

int main()
{
  int size = 4;
  /* Stored in heap segment like other dynamically allocated things */
  char *str = (char *)malloc(sizeof(char)*size);
  if (str == NULL)
    return 1;        /* allocation failed */
  *(str+0) = 'G';
  *(str+1) = 'f';
  *(str+2) = 'G';
  *(str+3) = '\0';
  *(str+1) = 'n';    /* No problem: string is now GnG */
  getchar();
  free(str);         /* release the heap memory */
  return 0;
}
Example 2 (Try to return a string from a function)
The program below works fine because the string is stored in a shared segment, and the data stored there remains even after getString() returns.
#include <stdio.h>

char *getString()
{
  char *str = "GfG"; /* Stored in read only part of shared segment */
  /* No problem: remains at address str after getString() returns*/
  return str;
}    
int main()
{
  printf("%s", getString());
  getchar();
  return 0;
}
The program below also works fine because the string is stored in the heap segment, and data in the heap persists after getString() returns (until it is freed).
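#include <stdio.h>
#include <stdlib.h>

char *getString()
{
  int size = 4;
  char *str = (char *)malloc(sizeof(char)*size); /* Stored in heap segment */
  if (str == NULL)
    return NULL;   /* allocation failed */
  *(str+0) = 'G';
  *(str+1) = 'f';
  *(str+2) = 'G';
  *(str+3) = '\0';
  /* No problem: string remains at str after getString() returns */
  return str;
}
int main()
{
  char *s = getString();
  if (s != NULL)
    printf("%s", s);
  free(s);         /* the caller is responsible for freeing heap memory */
  getchar();
  return 0;
}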
But the program below has undefined behavior and may print garbage, because str is stored in the stack frame of getString(), and that storage is no longer valid once getString() returns. Many compilers warn about returning the address of a local variable.
#include <stdio.h>

char *getString()
{
  char str[] = "GfG"; /* Stored in stack segment */
  /* Problem: string may not be present after getString() returns */
  return str;
}
int main()
{
  printf("%s", getString());
  getchar();
  return 0;
}
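Besides the heap version above, two other common ways to return a string safely are sketched below (getStringStatic and getStringInto are hypothetical names). A static local array outlives the call, but every call shares the same buffer, so it is neither reentrant nor thread-safe. Having the caller pass in the buffer avoids both the lifetime and the sharing problem, at the cost of the caller choosing a size.

```c
#include <string.h>

/* Fix 1: a static local array lives for the whole program,
   so returning its address is safe (but the buffer is shared). */
char *getStringStatic(void)
{
    static char str[] = "GfG";
    return str;
}

/* Fix 2: the caller supplies the storage, so nothing local escapes.
   Returns buf, or NULL if buf is too small to hold "GfG". */
char *getStringInto(char *buf, size_t size)
{
    const char *src = "GfG";
    if (size < strlen(src) + 1)   /* +1 for the '\0' terminator */
        return NULL;
    strcpy(buf, src);
    return buf;
}
```

With the second form, main() would declare something like char buf[8]; and pass buf and sizeof buf, so the string lives in main()'s own stack frame.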