String lists once again.

Started by EugeneP, November 17, 2011, 06:16:50 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

EugeneP

Dynamic a list of strings is the thing which can be implemented in Bennu but in an uneasy way. I guess every developer tried wrote something on this field. ( I am not exception :) )

People asks about dynamic strings / list of strings:
http://forum.bennugd.org/index.php?topic=2625.msg46432#msg46432
http://forum.bennugd.org/index.php?topic=2208.msg39555
http://forum.bennugd.org/index.php?topic=2122.msg43091
http://forum.bennugd.org/index.php?topic=94.msg805
(Not a popular question, but I remember few related topics, just can't find it now)

So I decided to raise this topic once again, to clear things to myself ( and to future generations :) ). I hope it wont be considered useless talks.

How to create dynamically allocated list of strings?
Most common solutions:
1) string *array = alloc( arraysize * sizeof(string) );
2) Create a sort of linked list ( like Drumpi did in some topic above)

Both (as I can see) would lead to memory corruption and wierd bugs due to nature of Bennu's strings.

This is how I understand current situation. And I want to ask:
- Is there a reliable way to implement dynamic list of strings?
- Maybe it could be useful if managing lists of strings was added to Bennu?

P.S.
My implementation was reliable, but slow and limited in usage. I stored list of string in a file, which is allowed me to avoid curly memory management.
Another way I tried was using linked lists of such kind:

process list_node()
public string chunk[NODE_SIZE]
public int prev_node, next_node;
begin loop frame 1000; end; end;

This worked too, but having over9000 useless process is uncool.

handsource-dyko

Using processes is one way I tried once. It works well (like a linked list thanks to bennu's father/son process relation ships and process id's), but the downside is that you'll end up with a lot of processes. Also, you can't index them like you can in delphi. Also, I don't know if bennu has some limitation in the max. number of processes (DIV had a limitation as far I can recall). In delphi a string list behaves like a dynamic array with an index, wich makes it easy to use.

EugeneP

After some thinking I wrote such test:

process test( string prefix, int n)
local
    string pointer arr;
    int i;
    int pointer ip;
begin
    arr = alloc( n * sizeof(string) );
   
    for(i=0;i<n;i++)
       sinit( arr + i );  // arr[i] stores garbage - 100%, fixing it.
    end
   
    for( i=0; i<n; i++ )
        arr[i] = prefix+" #"+i;
        frame;
    end
   
    for(i=0;i<n;i++)
        say(" s["+i+"] = " + arr[i] );
        frame;
    end
   
    for(i=0;i<n;i++)
       sdiscard( arr + i ); // == string_discard
    end
   
    free(arr);
end


Now I'll try to implement sinit and sdiscard as small module. It shouldn't be hard.

handsource-dyko

I've dug up an old pre-dykodialog filedialog prototype from my archive that I made a couple of months ago. It uses process for creating a linked list (a list of filenames in this case). It's a bit rough but the source is split into modules. I made this orginally for the malvado editor, but I realised I wanted a proper gui so I made dykodialogs in free pascal. But in my prototype I wanted to see if I could make a linked list without any c like pointer based memory managment because I don't fully understand pointers. It turned out that it is possible with processes. Also, this program shows where son/father/smallbro and bigbro are used for (they form a tree!) :)


Here's a snippit from the code in the attached archive:



/*****************************************************************************************

process file_entry: Node in a (linked) list of files. The links are created by the
                     file_entry id's bigbro and smallbro links that bennu process always
                     have. (These are a kind of pointer packed in a variable that holds
                     the process id code of the two adjecent file_entry processes).
                     Bennu automatically allocates memory for each process instance, so
                     hench we get a linked list without having to mess with pointers and
                     memory managment manually.
                     
                     This process is the core part of the "DYKO FILE SYSTEM OBJECT".

arguments: - int count       : the number of the file in the list
            - string path     : the absolute path to the file or directory
            - string name     : the file or directory name
            - byte dir        : TRUE or FALSE, indicates if the entry is a file or a directory
            - byte hidden     : TRUE or FALSE, indicates if the file is a hidden file
            - byte readonly   : TRUE or FALSE, indicated if the file is readonly
            - int size        : the size of the file
            - string created  : the date the file or directory was created
            - string modified : the date the file or directory was last modified
           
returns: - NOTHING       

******************************************************************************************/
PROCESS file_entry(int count, string path, string name, byte dir, byte hidden, byte readonly, int filesize, string created, string modified);

PRIVATE

BEGIN
     
   // check if we found the .. dir in the current directory, if not, the "dir up button" becomes
   // pointless, since we've reached the root directory of the filesystem.
   IF (dir==TRUE)
      IF (strcasecmp(name,"..")==0)
         IF (debugging==TRUE)
            say("There is a level above");           
         END         
         parentdir=TRUE;   // this entry is the ".." entry         
      ELSE
         IF (debugging==TRUE)
            say("This is a standard dir");
         END
      END
   END
   
     
   // create a child process that displays the file graphically
   display_file_entry(count,dir);
   
   
   
   
   IF (debugging==TRUE)
      IF (dir==TRUE)
         say("ID="+id+" file_entry: "+count+"; "+name+" <DIR>"+" parent? > "+parentdir);         
      ELSE
         say("ID="+id+" file_entry: "+count+"; "+name);
      END
   END
   
   LOOP
      // set the pointers
      prev_id=bigbro;
      next_id=smallbro;
      FRAME;
   END // end of loop
   
END // end of begin



/*****************************************************************************************

process display_file_entry: Child process of file entry, this is the "visible" part of
                             a file system object. This process puts the filename on the
                             screen.

arguments: - int count : number of file
            - byte dir  : TRUE or FALSE
           
returns: - NOTHING.     

******************************************************************************************/
PROCESS display_file_entry(int count, byte dir);

PRIVATE

int label_id;

byte parent_detected=FALSE;

BEGIN

   // reset the parent detector
   parent_detected=FALSE;
 
   // put it in the scroll
   ctype=c_scroll;

   x=20;
   y=(count*scroll_interval);
   
   IF (dir==TRUE)
      set_text_color(COLOR_GREY5);
      // create the text label
      label_id=write_in_map(0,father.name+" <DIR>",ALIGN_CENTER_LEFT);
     
      // when a directory has the parent property, it is the ".." directory.
      // we store the it's id code for use by the "dir up" button.
      IF (father.parentdir==TRUE)
     
         parentdir_id=father.id; // save the id code of the ".." dir     
         parent_detected=TRUE;
         
         IF (debugging==TRUE)
            say("parent detected!!!!!!!!!!!!!!!!! parentdir_ID="+parentdir_id);
         END
         
         set_text_color(COLOR_RED);
         // create the text label
         label_id=write_in_map(0,father.name+" <DIR>",ALIGN_CENTER_LEFT);
         
      // the dir is not a parent dir
      ELSE
         
         // check if the instance still exists, if not, we're in the root dir.
         IF (exists(parentdir_id))
            IF (debugging==TRUE)
               say("parentdir_id:"+parentdir_id);
            END
         ELSE
            parentdir_id=0; // we have no parent, so we're in the root of the filesystem.
           
            IF (debugging==TRUE)
               say("parentdir_id:"+parentdir_id);
            END
         END
         
         set_text_color(COLOR_GREY5);
         // create the text label
         label_id=write_in_map(0,father.name+" <DIR>",ALIGN_CENTER_LEFT);
      END
   
   // the entry is not a dir but a file.   
   ELSE
      set_text_color(COLOR_WHITE);
      // create the text label
      label_id=write_in_map(0,father.name,ALIGN_CENTER_LEFT);
   END
   
   graph=label_id;
   
   LOOP
      FRAME;
   END // end of loop
     
ONEXIT
   // free the resources
   unload_map(file,label_id);
   
END // end of begin


/*****************************************************************************************

process main: discription of what it more or less does

arguments: - string glob_mask : The path to start (e.g. c:\windows\*.*)
           
returns: NOTHING.     

******************************************************************************************/

PROCESS dir_read(string glob_mask);


PRIVATE

   
BEGIN

   // kill the file selector, when an instance of it already exists, because we
   // create a new instance at the end of the routine.
   IF (exists(TYPE fileselector))
      IF (debugging==TRUE)
         say("killing cursor");
      END
      signal(TYPE fileselector,s_kill);
   END
   
 
   
   glob("");    // reset search

   filename=""; // clear the filename
   filecount=0; // reset the counter

   IF (debugging==TRUE)
      say("reading directory: "+glob_mask);
   END
   
   LOOP
      filename=glob(glob_mask);
                                   
      // when glob returns a name,  it is a file or directory, so we increase the
      // filecounter, add it to the list of files, and print it.     
      IF (filename!="")       
         
         
         // add node to the list, depending on the dialog setting, show only dirs, or dirs and files
         IF (dialog_io.dialogtype==DIR_SELECT)         
            IF (fileinfo.directory==TRUE)
               filecount+=1;
               current_entry=file_entry(filecount, fileinfo.path, fileinfo.name, fileinfo.directory, fileinfo.hidden, fileinfo.readonly, fileinfo.size, fileinfo.created, fileinfo.modified);                                     
            ELSE
               // do nothing           
            END           
         ELSE // we're in normal mode
           filecount+=1;
           current_entry=file_entry(filecount, fileinfo.path, fileinfo.name, fileinfo.directory, fileinfo.hidden, fileinfo.readonly, fileinfo.size, fileinfo.created, fileinfo.modified);                 
         END
         
         IF (filecount==1)
            first_entry=current_entry;
         END
               
      ELSE
         last_entry=current_entry; // we have reached the last file/directory in the directory     

         IF (debugging==TRUE)         
            say("reading directory completed.");         
         END
         BREAK;
      END

   END // end of loop
   
   
   
   // update the current path.
   current_path=fileinfo.path;
   
   
   IF (debugging==TRUE)
      say("current_path (3):"+current_path);
   END
   
   // display the scroll with the files, and the selection cursor
   IF (debugging==TRUE)
      say("display file list");   
   END
   
   display_filelist();
   
   IF (debugging==TRUE)
      say("create cursor");   
   END
   
   selector=fileselector();
   
ONEXIT

   
   
END // end of begin




/*****************************************************************************************

process display_filelist: List the files on the screen created by dir_read().
                           The filenames are put inside a scroll.

arguments: -

           
returns: - NOTHING.     

******************************************************************************************/
PROCESS display_filelist();

PRIVATE

int scrollfile;
int fileslist;




BEGIN


   // create map for the scroll
   fileslist=new_map(300,100,16);
   
   // reset the scroll's internal upper coordinate, this keeps the scroll always on the
   // same position when this value was modified by the fileselector, since that process
   // is the scroll's camera process, so it modifies the y-value to keep the fileselector
   // centered on the screen.
   scroll.y0=0;
   
   // define the region and create the scroll
   scroll.region1=define_region(1,10,10,300,415);   
   start_scroll(0,scrollfile,fileslist,0,1,2);
   
   IF (exists(TYPE display_label))
      IF (debugging==TRUE)
         say("killing file list");
      END
      signal(TYPE display_label,s_kill);
   END
   
   // display the path
   pathlabel_id=display_label(30,460,TRUE,18,"PATH: "+current_path);
           
   LOOP 
     
      IF (exists(filelabel_id))
         signal(filelabel_id,s_kill);
      END
     
      // display the filename
      filelabel_id=display_label(30,450,FALSE,2,"FILENAME: "+current_selected_previewfilename);
     
      FRAME;
   END // end of loop

   
ONEXIT

   // free the resources
   stop_scroll(0);
   unload_map(scrollfile,fileslist);
   
END // end of begin



These are some of routines that read the current directory, create the file node and the list display.
The full source is in the archive.

Maybe some people find it usefull or get technical inspiration from it's implimentation.

handsource-dyko

When looking at the documentation of alloc I added an article for sizeof. What I noticed when I made the example for it, is that a bennu string is always 4 bytes. This is of course not the size of the character data itself, but the size of an int. My theory about memory corruption with string lists is that the string type is just a pointer to the character data. Bennu seems to manage the memory for the string automatically. So, what's interesting is that you can implement strings in a lot of diffent ways. Bennu's strings seem to be ansi (C string) compatible (with a NULL termination).

SplinterGU

#5
nop...

strings in bennu are an identifier (integer number), not a pointer.

this identifier pointer is a positional index in a string database/array, this database/array is an array of pointers to real string, this real string is a string with a null char delimiter ('\0')

all local/private/public strings are automatically released (free) when process/function die or exit.

you don't must use alloc* functions for perform operations with string vars... string vars are a special data type for bennugd and it's manage internally by the core.

you don't must:

- alloc/free string vars
- copy with memmove or memcopy operations string vars (or structs with string vars)

the core need know what strings are used, if you use memcopy/memmove or alloc/free on string var, you are disturb the core string management.
Download Lastest BennuGD Release: http://www.bennugd.org/node/2

handsource-dyko

I see..... So the only right way to create a string list is to use processes (wich works fine for me) or to create an array of character array's?  (alloc for each character array, and them alloc for the array of character arrays) That should be safe.

Just a side note: is bennu's string database a perhaps a hash table? 

SplinterGU

don't need hash table... have a bitmap index table...

if you need use a string array... maybe the best secure option is use an array of char[]... if you know the max sizeof of string.

a char[] is same that a string, but is more faster and more secure that use string.
Download Lastest BennuGD Release: http://www.bennugd.org/node/2

handsource-dyko

When I stepped out of bed this morning I had an "a-ha! erblebnis" (a philosphical moment). It sould be possible to create an array of structs thats is created with alloc/realloc that contain only integers, but that represent character postions of words in one big string. So, At compile time only one string is created, and during runtime individual strings are concatianted into that string. With subst() it is possible to get parts of that string. So what if I store nummerical positions in an dymanically allocated array to use with substr? I think it should be possible.

So the struct would look like this:


struct stringpart

     int index;       // number of the substring
     int first_char;
     int last_char;
     int length;
end


and the string for storing the individual parts:


string stringlist="";


In the beginning, this string is empty. When a function wants to add a string to the list, it allocates memory for the struct member, and updates the fields, and concats the string to stringlist.

To get right index, we can use sizeof(stringpart), wich would be size of 4 integers (16 bytes) this way we now know the byte offset, and every time we add a string to the list, stringpart is reallocated with an additional 16 bytes, and the string contents is simply concatinated to existing string.

So if there are 6 strings in the list, (dog, cat, bird, horse, camel, monkey): the string "stringlist" looks like this:"dogcatbirdhorsecamelmonkey". The array would contain 6 elements of each 16 bytes. The contents are:


stringpart[0].index=0;
stringpart[0].first_char=0;
stringpart[0].last_char=2;
stringpart[0].length=3;

stringpart[1].index=1;
stringpart[1].first_char=3;
stringpart[1].last_char=5;
stringpart[1].length=3;

stringpart[2].index=2;
stringpart[2].first_char=6;
stringpart[2].last_char=9;
stringpart[2].length=5;

etc etc


The rest is pointer math, and the memory for each struct element is allocated with alloc/realloc and memset. I still have to work this out, but I just thought of this when I stepped out of bed! :)




SplinterGU

#9
?????????!!!!!!!!!

what is your point????

use char str[max_size] for fast and secure strings.
Download Lastest BennuGD Release: http://www.bennugd.org/node/2

handsource-dyko

#10
 ;D ;D I'm already working on something. It's not ready yet, but what I have so far is working fine. Maybe the code will clarify my design intend:

(I still have add the "array enlarge" function) :)


IMPORT "mod_say";
IMPORT "mod_debug";
IMPORT "mod_mem";
IMPORT "mod_string";


/* Containter data type for string array. Holds information of the substrings.  */
/* The size of this type in bytes is 16 bytes (4 intergers), as bennu int's are */
/* 4 bytes in size (32 bits).                                                   */
TYPE stringlist;

   int index;
   int first_char;
   int last_char;
   int length;
END



GLOBAL

   /* Data used for the stringlist functions */
   
   stringlist pointer p_string_array; // pointer for acessing the stringlist array
   string list_buffer;                // this is the bennu string in wich each substring is stored
   
   int num_strings;                   // the size of the array (indicates the boudaries)
   int current_string;                // the position of the index (the last added string of the list)
   
   
   
   
   
   /* Other data */
   
   int str;    // indentifier of a stringlist.
   int count;  // general purpose loop counter.

   int add_status;
   
   

/* Declares for functions that return strings */
DECLARE

   FUNCTION string return_string(int array_index);

END // end of declare.




PROCESS main();

BEGIN

   // create a new string array with 10 elements (0-9).
   str=new_stringlist(10);
   say("str: "+str);
   
   // add some strings to the list.
   add_status=add_string("dog");
   say("add_status: "+add_status);
   add_status=add_string("cat");
   say("add_status: "+add_status);
   add_status=add_string("horse");
   say("add_status: "+add_status);
   add_status=add_string("rhino");
   say("add_status: "+add_status);
   add_status=add_string("bird");
   say("add_status: "+add_status);
   
   add_status=add_string("monkey");
   say("add_status: "+add_status);
   add_status=add_string("donkey");
   say("add_status: "+add_status);
   add_status=add_string("parrot");
   say("add_status: "+add_status);
   add_status=add_string("bunny");
   say("add_status: "+add_status);
   add_status=add_string("wombat");
   say("add_status: "+add_status);
   
   // add one beyond array boundary, should cause error
   add_status=add_string("giraffe");
   say("add_status: "+add_status);
   
   
   // print some debug info.
   FOR (count=0; count<10; count++)
     
      say("string "+count);
      say("stringlist["+count+"].index: "+p_string_array[count].index);
      say("stringlist["+count+"].first_char: "+p_string_array[count].first_char);
      say("stringlist["+count+"].last_char: "+p_string_array[count].last_char);
      say("stringlist["+count+"].length: "+p_string_array[count].length);   
      say("stringlist["+count+"]: "+chr(34)+return_string(count)+chr(34));     
   END
   
   
   // print strings from the list.
   say(return_string(0));
   say(return_string(1));
   say(return_string(2));
   say(return_string(3));
   say(return_string(4));
 
/* 
   // these are invalid
   say(return_string(5));
   say(return_string(10));
*/
     
END // end of main.











/* FUNCTION new_stringlist                                                           */
/*                                                                                   */
/* what it does:                                                                     */
/* - Creates a new array with references to substrings of a potentially very large   */
/*   string, that holds all the substrings (the strings in the list).                */
/*   It provides an index mechanism for strings, where each string in the list is a  */
/*   substring of a large "container" string.                                        */
/*   Therefore, it needs to know the length and the position of each substring.      */
/*   This, and the index are stored in a custom data-type, called "stringlist".      */
/*                                                                                   */
/* arguments:                                                                        */
/* - int num_strings: the number of elements the array should contain.               */
/*                                                                                   */
/* returns:                                                                          */
/* -int status: -1: error, >1: the length of the array in elements.                  */

FUNCTION int new_stringlist(int num_strings);

PRIVATE

int c;

BEGIN
     
   p_string_array=calloc(num_strings,sizeof(stringlist));   // num_strings * 16 bytes (the size of an stringlist)
   
   IF (p_string_array==NULL)
      // we have an error
      say("error");
      RETURN -1;
   ELSE
      // not null, it's ok! set things in up the array's data container.     
      FOR (c=0; c<num_strings; c++)
     
         p_string_array[c].index=c;
         p_string_array[c].first_char=0;
         p_string_array[c].last_char=0;
         p_string_array[c].length=0;           
      END
     
      // return how many elements the array has created, so that the indexing and adding functions
      // can check if a string outside the array boundaries are accessed. If that's the case such
      // functions can throw an error.
      RETURN num_strings;
   END   
END



/* FUNCTION add_string                                                               */
/*                                                                                   */
/* what it does:                                                                     */
/* - Adds a new string to the list, provided that the list has enough room for it.   */
/*   If the array is big enough, it will concatinate the string to bufferstring      */
/*   (global string listbuffer), and return the index of it (stored in the global    */
/*   int variable "current_string").                                                 */
/*   It will return an error (-1) when a string is added beyond the array bounadries.*/
/*   The array boundary is stored in the global int variable int "num_strings".      */
/*                                                                                   */
/* arguments:                                                                        */
/* - string string_element: the string to add to the list.                           */
/*                                                                                   */
/* returns:                                                                          */
/* -int status: -1: error, >1: the index number of the added string.                 */

FUNCTION int add_string(string string_element);

PRIVATE

BEGIN
   
   // concatinate the string, i.e. add the "substring" to the "buffer string".
   list_buffer=list_buffer+string_element;
   
   // update the fields of the array data container, but only if the array is large enough.
   IF (current_string < num_strings)   
      p_string_array[current_string].index=current_string;
      p_string_array[current_string].first_char=len(list_buffer)-len(string_element);
      p_string_array[current_string].last_char=len(list_buffer);
      p_string_array[current_string].length=len(string_element);     
   ELSE
      // it doesn't fit, it can't be added unless more memory is allocated, so return an error status.
      RETURN -1;
   END
   
   // set the "pointer" for the next index to be used when this function is called
   // the next time.
   current_string+=1; 
   
   // return the index of the string that has just been added to the list.
   RETURN current_string-1;
END




/* FUNCTION return_string                                                            */
/*                                                                                   */
/* what it does:                                                                     */
/* - Returns the substring of the "buffer string" listbuffer, by using an array      */
/*   index number. When successfull, it returns the string, but when an empty index  */
/*   or an index beyond the array boundary is used, it retruns a string with the     */
/*   contents "-1", indicating an error.                                             */
/*                                                                                   */
/* arguments:                                                                        */
/* - int array_index: the index of the string to return.                             */
/*                                                                                   */
/* returns:                                                                          */
/* -string status/contents: -1: error, otherwise: the contents of the string.        */

FUNCTION string return_string(int array_index);

PRIVATE

BEGIN

   // Return the substring if the string index is within the boundaries of the array.
   // When an empty (or non-existing) element is accessed, we've got an error.
   IF (array_index < num_strings AND p_string_array[array_index].length > 0)
               
      RETURN substr(list_buffer, p_string_array[array_index].first_char, p_string_array[array_index].length); 
           
   ELSE
      // we have an error, the index is beyond the array boudaries.
      RETURN -1;
   END
END





FUNCTION int free_stringlist(pointer list);

PRIVATE

BEGIN

   free(list);
END



In a nutshell: I use one string to store everything, and create a dynamic array with some information about the substrings. Maybe my story sounded it a bit more complicated then it should have been.

SplinterGU

I understood from the beginning, but I think it's a good idea.

your idea isn't performant... and you don't have delete_string... add delete_string do need reorder all position table... isn't performant.
Download Lastest BennuGD Release: http://www.bennugd.org/node/2

handsource-dyko

I've been working further on this idea, and I've added a function enlarge the array, save it to a textfile and load a file to the stringlist. So far the limitations at the moment are that the "stringlist" array is global (there can only be one instance of at a time) and some the internal variables are still exposed to the application progam.

What I would like to do is to "hide" these variables from the application program, but I can't make them private because these variables are shared amongst the stringlist functions. Languages like delphi/free pascal have a unit system where a select group of functions can share variables but functions outside this unit cannot have access to them unless these variables are "exported".

With process I may be able to do this with father/son constructions but I think functions are bit more appropriate inj this case. The speed/performance is not really much of a concern at the moment and I only append strings to the list so inserting strings in the middle of a list is not a really an intended feature. It's really just more of an exersise for me with memory allocation and pointers. :) When I get more experienced/comfortable with it I may invent something better.

handsource-dyko

Updated / completed version with some extra functions.

Note I'm considering to make some further improvements, but this can be used if performance is not a big concern and when you only append strings to a list. It supports loading from and saving to a textfile. See the source for further details.

The new version is a bit too long to put in this post, so I attached a file:

handsource-dyko

I have been tinkering with my idea a bit further, I present you the new and improved  version with support for multiple lists, splitting lists, concatinating lists, inserting strings, deleting strings and swapping strings plus a few other functions.

This new version uses a process for each list instance, but besides that change in the design the mechanism is still the same. I've included both the old and the new version in the library and a some example/test programs with a description of the algorithm. The library can be downloaded here:

http://code.google.com/p/dykodialogs-for-bennugd/downloads/detail?name=dyko_stringlist_library.zip&can=2&q=

The new version contains about 18 functions in total, and each function has it's own source file, with a detailed description of it's usage, it's paramters, options, return values and error conditions. Also each function is fully documented. However, some functions have dependancies on others so I recommend to always include all the functions in your application. The old library is put in a seperate folder, called "lib_global".

Wich version you use, depends on your usage scenario. The global version is a basic library with about 7 functions (it is the old version). There isn't really a fixed limitation to length of the list, only a practical limitation imposed by your system memory and operating system. Be aware that performance is propertional to the length of the list, I don't recommend it for lists with thousands of strings (inserting and deleting are expensive operations because the array has to be split).