A.I. programming in Prolog and Assembler

September 21, 2007

Reading EXCEL CSV-files as Prolog Clauses (SWI-Prolog source-code)

stylized depiction of a csv text file
Image via Wikipedia

If you need to convert into Prolog terms “raw data” supplied in EXCEL csv-files, read on! The source code in this posting will read any CSV file, converting each semicolon-delimited line (or record) of the CSV file into a Prolog clause, asserted in RAM. It is also possible to use the same code to read data deliberately provided (e.g. by another application) as a CSV-file, but which is specifically intended for use as a set of Prolog clauses.

This code also uses a couple of specification predicates: time_field_type/1, field1_as_functor/1, and conv_csvhead/2. These predicates control the behaviour of the conversion process, as follows:

time_field_type/1 :

  • time_field_type(0). In this case, time-fields in the CSV file (of the form “HH:MM” or “HH:MM:SS…”) are translated into minutes, ignoring seconds or hundredths of a second.
  • time_field_type(1). In this case, time-fields in the CSV file (of the form “HH:MM” or “HH:MM:SS…”) are translated into seconds, ignoring hundredths of a second.
  • time_field_type(2). In this case, time-fields in the CSV file are kept as they are, as atoms (e.g. ’03:35′, ’12:45:20′, etc).

field1_as_functor/1:

  • field1_as_functor(0): Each line in the CSV-file is interpreted as a prolog clause, where the functor of the clause is the first field of the record, and the other fields are arguments.
  • field1_as_functor(foo) (where ‘foo’ can be any atom): Each line in the CSV file is interpreted as a prolog clause, where the functor of the clause is foo (or any atom supplied as 1st argument to field1_as_functor/1) and all the fields are arguments.

conv_csvhead/2:

  • This predicate is used to convert the contents of the first field (of the CSV-file) into a (user-defined) internal Prolog representation. It is used only if “time_field_type(0)” exists. For example, to convert records where the first field is a Prolog functor ‘job’ but the actual contents of this field are ‘j’ (for brevvity), using a definition “conv_csvhead(j,job)” will convert each ‘j’ into a functor ‘job’. (Use of conv_csvhead/2 is optional; in the default case, it does nothing!)

Finally, some notes:

  • The main predicate to call is “loaddb(CSVfile)“, where CSVfile can be e.g. “test.csv”.
  • Provision has been taken for special fields which contain Lists of items, comma-delimited. In EXCEL these fields will appear as longish strings, but this code was written to parse them as Prolog atom-lists. (Comment-out this section if you don’t need it).
  • The only type of field that is currently not converted into any meaningful internal representation is DATE. Dates are converted to atoms, just as they appear, without parsing their actual contents. (As an exercise, you can re-use parts of the same code to parse date-fields!) The honest reason for this omission is that… I didn’t need dates (in an application I am developing, for which this code was also written).

The source-code follows. There are useful comments inside this code. You can just copy and paste what follows from this point onwards, into a text file saved for compilation by SWI-Prolog, ending in “.pl”:

% ReadingG CSV-files (EXCEL); converting to Prolog clauses
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Open-source SWI-Prolog code by OMADEON (c)2007/2009.
%% Blogs:
%%%%%%%% https://prologsource.wordpress.com
%%%%%%%% http://omadeon.wordpress.com
% time_field_type/1 defines the type of the TIME-fields:
% Cases:
% (0): minute-accuracy,
% (1): seconds-accuracy,
% (N): time preserved as is (atom, e.g. '03:30'); N > 1 
time_field_type(0).
%time_field_type(1).
%time_field_type(2).

% Reading CSV-file, converting to prolog-clauses:
loaddb(File):-
     open(File, read, Stream),
     read_line_to_codes(Stream, T0),
     parse_line(T0,Stream), 
     close(Stream).

% user-specific conversion of 1st field to functor:
%conv_csvhead('j','job'):- !. % activate if really needed!
% default-case, when the 1st field is an internal functor
conv_csvhead(X,X):- !.

% specifies if first field of each record is Prolog functor
field1_is_functor(1). % it is a functor
%field1_is_functor(foo). % NOT a functor; arg1 IS the functor!

% end of file
parse_line(end_of_file,_):- !.
% end of input-stream
parse_line([],_):- !.

% current line is parsed; next line follows
parse_line(CODES,Stream):-
     field1_is_functor(1), !,
     csvparse(CODES,XL),
     XL = [First|XXL],
     conv_csvhead(First,Head),
     TERM =.. [Head|XXL],
     assert(TERM),  % assert prolog-clause
     read_line_to_codes(Stream,T2), 
     parse_line(T2,Stream).

% current line is parsed; next line follows
parse_line(CODES,Stream):-
     field1_is_functor(Head), !,
     csvparse(CODES,XL),
     TERM =.. [Head|XL],
     assert(TERM), %%% assert prolog-clause
     read_line_to_codes(Stream,T2),
     parse_line(T2,Stream).

% error message (rarely invoked; included "just in case")
parse_line(Term,Stream):- 
     format(user_error, 'Bad term: ~p~n', [Term]),
     read_line_to_codes(Stream,T2),
     parse_line(T2,Stream).
csvparse2([],_,[[]]):-!.
csvparse2(Inp,Sep,X):-
     csvparse(Inp,Sep,X), !.

csvparse2a([],[[]]):-!.
csvparse2a(Inp,X):- csvparse(Inp,X), !.

% parsing a line delimited by a semicolon(;)
csvparse([],[]):-!.
csvparse(CODES,[FirstOut|More]):-
     append(First,[59|RestCODES],CODES),
     !,
     conv_codes(First,FirstOut),
     csvparse2a(RestCODES,More).
csvparse(Last,[OutX]):- conv_codes(Last,OutX), !.

% parsing line delimited by given separator-char:
csvparse([],_,[]):-!.
csvparse(CODES,SEPc,[FirstOut|More]):-
     append(First,[SEPc|RestCODES],CODES), !,
     conv_codes(First,FirstOut),
     csvparse2(RestCODES,SEPc,More).
csvparse(Last,_,[OutX]):-
     conv_codes(Last,OutX), !.

% check that input-list is only digits or dot(.)
are_digits([C]):- C >= 46, C =< 57, C =\= 47, !.
are_digits([C|CL]):-
      C >= 46, C =< 57, C =\= 47,
      !, are_digits(CL).

% read string enclosed in double_quotes, e.g. "the"
conv_codes([34|Rest],AtomX):-
     append(MAINx,[34],Rest),
     +\memberchk(44,MAINx),
     atom_codes(AtomX,MAINx), !.

% reading an integer or decimal number
conv_codes(CL,Nx):-
     are_digits(CL),
     atom_codes(Ax,CL),
     atom_number(Ax,Nx), !.

% read comma-delimited list in dup-quotes, e.g. "a,b,cd"
conv_codes([34|Rest],LISTx):-
     append(MAINx,[34],Rest),
     memberchk(44,MAINx),
     csvparse(MAINx,44,LISTx), !.

% time_field of the form "HH:MM"
conv_codes([C1,C2,58,C3,C4],OUTx):-
    time_field_type(TYP),
    atom_codes(A1,[C1,C2]),
    atom_codes(A2,[C3,C4]),
    atom_number(A1,N1x),
    atom_number(A2,N2x),
    (TYP = 0 ->       
      OUTx is N2x + 60*N1x % only minutes-accuracy
      ;
      ( TYP = 1 ->
        ( OUTx is 60*(N2x + 60*N1x) ) % seconds-accuracy
        ;
        atom_codes(OUTx,[C1,C2,58,C3,C4]) % time=string
      )
    ), !.

% time_field of the form "HH:MM:SS..."
conv_codes([C1,C2,58,C3,C4,58,C5,C6|CL],OUTx):-
    time_field_type(TYP),
    atom_codes(A1,[C1,C2]),
    atom_codes(A2,[C3,C4]),
    atom_number(A1,N1x),
    atom_number(A2,N2x),
    MINSx is N2x + 60*N1x,
    ( TYP = 0
      -> OUTx = MINSx % if only minutes-accuracy required
        ;
        ( TYP = 1
          -> % use this if seconds-accuracy is required
          ( atom_codes(A3,[C5,C6]),
            atom_number(A3,SECS),
            OUTx is SECS + 60*MINSx
          )
          ;
          atom_codes(OUTx,[C1,C2,58,C3,C4,58,C5,C6|CL])
          % time=string
        )
    ), !.

% a comma-delimited list
conv_codes(MAIN,LISTx):-
     memberchk(44,MAIN),
     csvparse(MAIN,44,LISTx), !.

% remaining case: just an atom 
% (including case of time=string):
conv_codes(MAIN,ATOMx):-
     atom_codes(ATOMx,MAIN), !.
Reblog this post [with Zemanta]
Advertisements

16 Comments »

  1. […] Το νέο ποστ είναι κώδικας για ανάγνωση αρχείων EXCEL από SWI-Prolog. […]

    Pingback by Ενα νέο BLOG για Προγραμματισμό σε γλώσσα PROLOG ( https://prologsource.wordpress.com ) « OMADEON — September 21, 2007 @ 10:16 am

  2. ADO.NET free tutorial at Source Code

    Comment by Ramesh — April 16, 2008 @ 4:41 am

  3. Largest take place to find cheap and noble sell to notice winsome assess pro pills in USA – online peach on http://zyprexa-bestbuy.freehostia.com – HERE

    Comment by totaldrugs — March 30, 2010 @ 6:44 am

  4. Hello,

    I’m new to Prolog, but I really need the functionality you provide here. Could you tell me how to use the Excel data, which are stored in RAW after transformation? How to refer to them? How to check that this code works properly?

    Comment by May — February 3, 2011 @ 12:00 am

    • Well, the code here converts the Excel *.csv file (which is comma-delimited,or semi-colon-delimited) into Prolog lists (or alternatively, n-argument predicates).

      It’s easy to write a piece of code to verify the result. It’s essentially a printing routine. You can find plenty of code in textbooks or other sources doing this.

      It is, however, a very long time since I wrote the code and I don’t remember now, off-hand, what result exactly is produced, i.e. if it is just lists or predicates with n-arguments (for n cells in each Excel row), but they’re equivalent (just use functor / =.. ).

      Comment by OMADEON — February 3, 2011 @ 1:17 am

      • P.S. I looked again into this issue.

        Actually, prolog facts are produced, i.e. predicates with a functor that can either be the same as the leftmost EXCEL cell, or else an arbitrary user chosen functor.
        This is already explained in the code:

        % specifies if first field of each record is Prolog functor

        field1_is_functor(1). % it is a functor

        %field1_is_functor(foo). % NOT a functor; arg1 IS the functor!

        i.e. the default case (the code as it is) converts the 1st cell of each EXCEL row to a functor.

        The alternative case (by removing the “%” symbol on the left of the second statement and inserting a “%” symbol on the left of the first)… would make “foo” the functor.

        Example:
        An EXCEL comma-delimited row, such as

        this,number,"string",2,3,4,a,

        is converted to prolog term:

        this(number,"string",2,3,4,a)

        (in the current, default case)
        Alternatively, it can be converted to prolog term:

        foo(this,number,"string",2,3,4,a)

        …but to activate _this_ alternative, we need to use the following changed code:

        %field1_is_functor(1). % it is a functor

        field1_is_functor(foo). % NOT a functor; arg1 IS the functor!

        Comment by OMADEON — February 3, 2011 @ 10:50 am

  5. @May
    As regards verification, it’s easy to write a printing predicate that converts prolog terms to *.CSV-file comma-delimited EXCEL-lines. (and then compare the resulting file with the original CSV file).

    E.g. for prolog terms with 6 arguments,

    func(1,2,3,a,b,"str").
    func(4,5,6,c,d,"test").
    func(6,7,8,e,f,"end").

    …we can use a “fail-driven” loop:

    print_as_csv:-
    func(X1,X2,X3,X4,X5,X6),
    write(X1), write(','),
    write(X2), write(','),
    write(X3), write(','),
    write(X4), write(','),
    write(X5), write(','),
    writeq(X6),
    write(','), nl, fail.

    print_as_csv.

    (using “writeq/1” instead of “write/1” to preserve double quotes).

    Now, a good exercise for a beginner, is writing a better piece of prolog code, which “prints as a CSV-file” _any_ set of prolog predicates (all of them with the same functor and arity)… But I won’t do it _for you_ (hehe). 🙂

    Comment by OMADEON — February 3, 2011 @ 11:06 am

  6. It looks prommissing, but I still haven’t figured out how you got this output. It’s probably something super obvious, but I simply don’t know how to do it. Could you show me how to access the data after parsing it?

    To be honest, I’d have a .csv file in this form:

    schedule,,,,
    12:00,13:00,Alice,Room 5,task1
    13:00,14:00,Tom,Room 6,task2

    So, regarding to the intro description, I’d also have problem with data format. Did you, in a mean time, extended the code to parse data format as well?

    P.S. This is the first time, when I ask for help on any blog or forum. I’m surprised a lot with your fast answer and personal involvement. Thank you 🙂

    Comment by May — February 3, 2011 @ 11:16 am

    • Date not data format. Sorry for my mistake

      Comment by May — February 3, 2011 @ 11:57 am

  7. P.S.2 Sorry for stupid questions, but this is what happens, when a mathematician is asked to do some coding in a language she hasn’t used before…

    Comment by May — February 3, 2011 @ 11:18 am

  8. Gosh, you read in my mind. I haven’t even finished writing a post and your answer is here. 🙂 What a shame I haven’t contacted you some time ago. I’d have my work done till then and would think about less painful things right now 😛

    Comment by May — February 3, 2011 @ 11:23 am

  9. Your last note was REALLY helpful. Everything runs smoothly now. No more questions anymore 🙂

    Thank you for commitment!

    Cheers,
    A.

    Comment by May — February 3, 2011 @ 12:27 pm

  10. I have one more question. While compiling the code for the first time I got this kind of error:

    ERROR: /excConv.pl:150:7: Syntax error: Operator expected

    It relates to this piece of code:

    (146) % read string enclosed in double_quotes, e.g. “the”
    (147)
    (148) conv_codes([34|Rest],AtomX):-
    (149) append(MAINx,[34],Rest),
    (150) +\memberchk(44,MAINx),
    (151) atom_codes(AtomX,MAINx), !.

    To check whether parser works I simply put space after ‘+’ sign, but now the code doesn’t work for expressions in quotation. What may be the problem? How to solve it?

    Comment by May — February 3, 2011 @ 3:38 pm

    • You are absolutely right. I looked into my original file, wondering what caused the error message at line 150…
      …and I was surprised to see that the operator was “\+” instead of “+\” !!!

      It was a typing mistake I made, when trying to make the blog-post code more readable, a long time ago. And the wrong operator looked deceptively correct!

      There is also another small bug I discovered, after recompiling: Negative numbers are converted to atoms on the output. This will also be corrected, in a few minutes. Corrections will be made in the post, as well. And there will be a new post on the same topic, later on today. The new code will have improved control over output-format, as well as a GUI written in XPCE ! 🙂

      Comment by Omadeon — March 28, 2011 @ 9:19 am

  11. I want education videos Visual Prolog version 7.03 of the first level

    Comment by mohamed — December 21, 2012 @ 1:01 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: