Perl - Fetch tabular data from MS-Word file


I need to fetch tabular data from an MS-Word file. The code I referred to fetches only the first and last rows, but I need to fetch the entire table.

Later, the fetched data has to be cross-checked to see whether a file with the same name exists in a folder.

I am not able to understand the flow of the code, as I am new to the Win32::OLE module.

I have referred to a similar question about fetching data on this site, but couldn't get it to work.

Please let me know how to proceed.

#!/usr/bin/perl

use strict;
use warnings;

use File::Spec::Functions qw( catfile );
use Win32::OLE qw(in);
use Win32::OLE::Const 'Microsoft Word';

$Win32::OLE::Warn = 3;

my $word = get_word();
$word->{DisplayAlerts} = wdAlertsNone;
$word->{Visible}       = 1;

my $doc    = $word->{Documents}->Open('d:\a.doc');
my $tables = $word->ActiveDocument->{'Tables'};

for my $table (in $tables) {
  my $tableText = $table->ConvertToText({ Separator => wdSeparateByTabs });
  print "Table: " . $tableText->Text() . "\n";
}

$doc->Close(0);

# Attach to a running Word instance, or start a new one.
sub get_word {
  my $word;
  eval { $word = Win32::OLE->GetActiveObject('Word.Application'); };
  die "$@\n" if $@;
  unless (defined $word) {
    $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
        or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";
  }
  return $word;
}

Update: a.doc

Article   No.      Count No      Committee
A0029     A0029    16            E01.07
B0028     B0028    34            E04.09
C0036     C0036    17            E09.00
D0033     D0033    15            E08.07

Output in cmd:

d:\word>a.pl
D0033   D0033   15      E08.07No                     Committee

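The problem is caused by the fact that table rows are terminated by CR characters in the text returned by the ConvertToText method. When that text is printed to a console, each CR moves the cursor back to the start of the line, so every row overwrites the previous one and only fragments of the first and last rows remain visible: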

c:\...\temp> perl word-table.pl a.doc | xxd
00000000: 4172 7469 636c 6509 4e6f 2e09 436f 756e  Article.No..Coun
00000010: 7420 4e6f 0943 6f6d 6d69 7474 6565 0d41  t No.Committee.A
00000020: 3030 3239 0941 3030 3239 0931 3609 4530  0029.A0029.16.E0
00000030: 312e 3037 0d42 3030 3238 0942 3030 3238  1.07.B0028.B0028
00000040: 0933 3409 4530 342e 3039 0d43 3030 3336  .34.E04.09.C0036
00000050: 0943 3030 3336 0931 3709 4530 392e 3030  .C0036.17.E09.00
00000060: 0d44 3030 3333 0944 3030 3333 0931 3509  .D0033.D0033.15.
00000070: 4530 382e 3037 0d0d 0a                   E08.07...
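If xxd is not at hand, the same check can be done from Perl itself. The following is only a minimal sketch and not part of the original script: the sample string is hard-coded from the first bytes of the dump above, and show_controls is a hypothetical helper name chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: make CR, LF and TAB visible as escape sequences.
sub show_controls {
    my ($text) = @_;
    my %names = ( "\r" => '\r', "\n" => '\n', "\t" => '\t' );
    $text =~ s/([\r\n\t])/$names{$1}/g;
    return $text;
}

# Sample copied from the start of the hex dump (header row plus first data row).
my $sample = "Article\tNo.\tCount No\tCommittee\rA0029\tA0029\t16\tE01.07\r";

print show_controls($sample), "\n";
```

With the sample above, every row boundary shows up as a literal \r rather than \n, which matches the 0d bytes in the dump.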

To solve this, replace the carriage returns with newlines:

#!/usr/bin/env perl

use strict;
use warnings;

use Carp qw( croak );
use Cwd qw( abs_path );
use Path::Class;
use Win32::OLE qw(in);
use Win32::OLE::Const 'Microsoft Word';

$Win32::OLE::Warn = 3;

run(\@ARGV);

# Print the tables of every Word file named on the command line.
sub run {
    my $argv = shift;
    my $word = get_word();

    $word->{DisplayAlerts} = wdAlertsNone;
    $word->{Visible}       = 1;

    for my $word_file ( @$argv ) {
        print_tables($word, $word_file);
    }

    return;
}

sub print_tables {
    my $word = shift;
    my $word_file = file(abs_path(shift));

    my $doc = $word->{Documents}->Open("$word_file");
    my $tables = $word->ActiveDocument->{Tables};

    for my $table (in $tables) {
        my $text = $table->ConvertToText(wdSeparateByTabs)->Text;
        # Word terminates each table row with CR; turn those into newlines.
        $text =~ s/\r/\n/g;
        print $text, "\n";
    }

    $doc->Close(0);
    return;
}

# Attach to a running Word instance, or start a new one.
sub get_word {
    my $word;
    eval { $word = Win32::OLE->GetActiveObject('Word.Application'); 1 }
        or die "$@\n";
    $word and return $word;
    $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
        or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";
    return $word;
}
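Once the row terminators are understood, the text can also be split into rows and cells, which helps with the second part of the question (checking whether a file named after a table entry exists in a folder). This is a rough sketch under stated assumptions, not a tested part of the script above: parse_table and missing_files are hypothetical helper names, the hard-coded string stands in for the ConvertToText result, and treating the first column as the filename and the current directory as the folder is a guess, since the question does not say which column or folder is meant.

```perl
use strict;
use warnings;

use File::Spec::Functions qw( catfile );

# Hypothetical helper: split ConvertToText output into rows (CR-terminated)
# and cells (tab-separated). Empty rows are skipped.
sub parse_table {
    my ($text) = @_;
    return map  { [ split /\t/, $_, -1 ] }
           grep { length }
           split /\r\n?|\n/, $text;
}

# Hypothetical helper: return the names for which no file exists in $dir.
sub missing_files {
    my ($dir, @names) = @_;
    return grep { ! -e catfile($dir, $_) } @names;
}

# Stand-in for $table->ConvertToText(wdSeparateByTabs)->Text
my $text = "Article\tNo.\tCount No\tCommittee\r"
         . "A0029\tA0029\t16\tE01.07\r"
         . "B0028\tB0028\t34\tE04.09\r";

my ($header, @rows) = parse_table($text);
print join(' | ', @$header), "\n";
print join(' | ', @$_), "\n" for @rows;

# Assumption: the first column holds the filename to look for in the folder.
my @missing = missing_files('.', map { $_->[0] } @rows);
print "No file named $_ in folder\n" for @missing;
```

Keeping the rows as array references rather than printing them straight away makes it possible to pick any column for the filename check; if the files carry an extension, it would have to be appended before calling missing_files.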
