Perl - Fetch tabular data from an MS Word file
I need to fetch tabular data from an MS Word file. The code I referred to fetches only the first and last rows, but I need to fetch the entire table.
Later, the fetched data has to be cross-checked to see whether a file with the same name exists in a folder.
I am not able to understand the flow of the code, as I am new to the Win32::OLE module.
I have referred to a similar question about fetching data on this site, but I couldn't get it working.
Please let me know how to proceed.
    #!/usr/bin/perl
    use strict;
    use warnings;

    use File::Spec::Functions qw( catfile );

    use Win32::OLE qw(in);
    use Win32::OLE::Const 'Microsoft Word';
    $Win32::OLE::Warn = 3;

    my $word = get_word();
    $word->{DisplayAlerts} = wdAlertsNone;
    $word->{Visible}       = 1;

    my $doc    = $word->{Documents}->Open('d:\a.doc');
    my $tables = $word->ActiveDocument->{'Tables'};

    for my $table (in $tables) {
        my $tableText = $table->ConvertToText({ Separator => wdSeparateByTabs });
        print "Table: " . $tableText->Text() . "\n";
    }

    $doc->Close(0);

    # Attach to a running Word instance if there is one, otherwise start Word.
    sub get_word {
        my $word;
        eval {
            $word = Win32::OLE->GetActiveObject('Word.Application');
        };
        die "$@\n" if $@;
        unless (defined $word) {
            $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
                or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";
        }
        return $word;
    }
Update: a.doc

    Article   No.     Count No   Committee
    A0029     A0029   16         E01.07
    B0028     B0028   34         E04.09
    C0036     C0036   17         E09.00
    D0033     D0033   15         E08.07
Output in cmd:

    d:\word>a.pl
    D0033 D0033 15 E08.07No Committee
The problem is caused by the fact that the table rows are terminated by CR characters in the text returned by the ConvertToText method:
    c:\...\temp> perl word-table.pl a.doc | xxd
    00000000: 4172 7469 636c 6509 4e6f 2e09 436f 756e  Article.No..Coun
    00000010: 7420 4e6f 0943 6f6d 6d69 7474 6565 0d41  t No.Committee.A
    00000020: 3030 3239 0941 3030 3239 0931 3609 4530  0029.A0029.16.E0
    00000030: 312e 3037 0d42 3030 3238 0942 3030 3238  1.07.B0028.B0028
    00000040: 0933 3409 4530 342e 3039 0d43 3030 3336  .34.E04.09.C0036
    00000050: 0943 3030 3336 0931 3709 4530 392e 3030  .C0036.17.E09.00
    00000060: 0d44 3030 3333 0944 3030 3333 0931 3509  .D0033.D0033.15.
    00000070: 4530 382e 3037 0d0d 0a                   E08.07...
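The same symptom can be reproduced without Word. The following is only a minimal sketch, assuming a hard-coded sample string shaped like the dump (tabs between cells, a CR after each row); the variable names are illustrative and not part of the original script.

```perl
use strict;
use warnings;

# Sample text shaped like the dump: tabs between cells, CR after each row.
my $text = "Article\tNo.\tCount No\tCommittee\r"
         . "A0029\tA0029\t16\tE01.07\r"
         . "D0033\tD0033\t15\tE08.07\r";

# tr/// with an empty replacement list only counts the matching characters.
my $cr_count = ( $text =~ tr/\r// );
my $lf_count = ( $text =~ tr/\n// );

print "CR: $cr_count, LF: $lf_count\n";    # prints "CR: 3, LF: 0"
```

Because a CR only moves the cursor back to the start of the line, the console shows each row overwriting the previous one, which is why it looks as if only the last row was printed.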
To solve it, replace the carriage returns with newlines:
    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Carp qw( croak );
    use Cwd qw( abs_path );
    use Path::Class;
    use Win32::OLE qw(in);
    use Win32::OLE::Const 'Microsoft Word';
    $Win32::OLE::Warn = 3;

    run(\@ARGV);

    # Process every Word file named on the command line.
    sub run {
        my $argv = shift;
        my $word = get_word();

        $word->{DisplayAlerts} = wdAlertsNone;
        $word->{Visible}       = 1;

        for my $word_file ( @$argv ) {
            print_tables($word, $word_file);
        }

        return;
    }

    # Print each table in the document as tab-separated text, one row per line.
    sub print_tables {
        my $word      = shift;
        my $word_file = file(abs_path(shift));

        my $doc    = $word->{Documents}->Open("$word_file");
        my $tables = $word->ActiveDocument->{Tables};

        for my $table (in $tables) {
            my $text = $table->ConvertToText(wdSeparateByTabs)->Text;
            $text =~ s/\r/\n/g;    # rows end in CR; turn them into newlines
            print $text, "\n";
        }

        $doc->Close(0);
        return;
    }

    # Attach to a running Word instance if there is one, otherwise start Word.
    sub get_word {
        my $word;

        eval { $word = Win32::OLE->GetActiveObject('Word.Application'); 1 }
            or die "$@\n";

        $word
            and return $word;

        $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
            or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";

        return $word;
    }
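The question also mentions cross-checking the fetched data against file names in a folder. One possible way to do that is sketched here, with several assumptions that the question does not confirm: the first cell of each row is taken as the file's base name, the folder (`.`) and the extension (`.doc`) are placeholders, and `parse_table_text` and `check_files` are hypothetical helpers rather than part of Win32::OLE. The sample string stands in for the text returned by ConvertToText, so the sketch runs without Word.

```perl
use strict;
use warnings;
use File::Spec::Functions qw( catfile );

# Split CR-terminated, tab-separated table text into a list of rows,
# each row being an array reference of cell strings.
sub parse_table_text {
    my ($text) = @_;
    my @rows;
    for my $line ( split /\r/, $text ) {
        next unless length $line;                  # skip empty pieces
        push @rows, [ split /\t/, $line, -1 ];     # -1 keeps empty trailing cells
    }
    return @rows;
}

# Hypothetical helper: for each row, record whether a file named after the
# first cell plus the assumed extension exists in $dir.
sub check_files {
    my ( $dir, $ext, @rows ) = @_;
    my %found;
    for my $row (@rows) {
        my $name = $row->[0];
        $found{$name} = ( -e catfile( $dir, "$name$ext" ) ) ? 1 : 0;
    }
    return %found;
}

# Stand-in for the text returned by ConvertToText (assumed sample data).
my $text = "Article\tNo.\tCount No\tCommittee\r"
         . "A0029\tA0029\t16\tE01.07\r"
         . "B0028\tB0028\t34\tE04.09\r";

my ( $header, @data ) = parse_table_text($text);    # first row is the header
my %found = check_files( '.', '.doc', @data );      # folder and extension are placeholders

for my $name ( sort keys %found ) {
    print "$name: ", ( $found{$name} ? 'found' : 'missing' ), "\n";
}
```

In the real script, the string passed to `parse_table_text` would be the unmodified text of each table, before the carriage returns are replaced.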