#!/usr/bin/perl

=pod

=head1	NAME

B<Pipeline_Source>

Inserts pathname(s) into a Conductor pipeline Sources table.

=head1	SYNOPSIS

Pipeline_Source [E<lt>B<catalog>E<gt>.]E<lt>B<pipeline>E<gt>[_Source]
E<lt>B<pathname>E<gt> | @E<lt>B<pathname list file>E<gt> [...]
[-Pipeline [E<lt>B<catalog>E<gt>.]E<lt>B<pipeline>E<gt>]
[-Id E<lt>B<source ID>E<gt> | @E<lt>B<source ID list file>E<gt>]
[-Log E<lt>B<log directory>E<gt> | @E<lt>B<log directory list file>E<gt>]
[-COnfiguration E<lt>B<filename>E<gt>]
[-Server E<lt>B<server>E<gt>]
[-Catalog E<lt>B<catalog>E<gt>[.E<lt>B<pipeline>E<gt>]]
[-Always | -Unique[_or_Processed]]
[-Verbose | -Quiet]
[-Help]

=head1	DESCRIPTION

For each data source pathname specified on the command line a
pipeline source record is inserted, with the pathname in its
Source_Pathname field, into the E<lt>B<pipeline>E<gt>_Sources
table in the E<lt>B<catalog>E<gt> of the database E<lt>B<server>E<gt>.
It is an error to specify a pathname for a file that does not exist.
Optionally, a pathname will be dropped if a source record in the
pipeline table already has the same pathname, or has the same pathname
and has not yet been acquired for processing. A Source_ID and/or
Log_Pathname field value may also be specified for each source record,
or one value for all of them. A configuration file is required to
provide database access information.

=head1	OPTIONS AND ARGUMENTS

A list of one or more pathnames to be inserted into the Conductor
pipeline Sources table Source_Pathname field must be specified. A file
containing a list of pathnames may be specified, following an '@'
character, instead of, or in addition to, listing individual pathnames on
the command line. Relative pathnames are made absolute before insertion
by prepending the current working directory. B<WARNING>: The current
working directory is the real (canonical) directory pathname; this will
not be the same as a logical pathname accessed via soft links. Each
pathname is confirmed as referring to an existing regular file. If any
pathname refers to a nonexistent file no pipeline source records will
be inserted.
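
For example, the following commands assume a hypothetical Example
pipeline in a hypothetical Test_Catalog catalog, and a hypothetical
files.list file containing pathnames separated by any whitespace. Both
would insert records into the Test_Catalog.Example_Sources table; the
first lists the pathnames individually, the second reads them from the
list file:

	Pipeline_Source -Pipeline Test_Catalog.Example image_1.img image_2.img
	Pipeline_Source -Pipeline Test_Catalog.Example @files.list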

=over

=item	-Pipeline [E<lt>B<catalog>E<gt>.]E<lt>B<pipeline>E<gt>

The name of the Conductor pipeline to receive Source_Pathname entries. The
pipeline name may be qualified by the E<lt>B<catalog>E<gt> name,
preceding a '.' delimiter. Default: The command name, or that part of
the command name preceding a "_Source" suffix. B<Note>: If the command
name provides the pipeline name, it may also use the
E<lt>B<catalog>E<gt>.E<lt>B<pipeline>E<gt> notation.

=item	-Id E<lt>B<source ID>E<gt> | @E<lt>B<source ID list file>E<gt>

If this option is specified the Source_ID field of the source record
will be set to the E<lt>source IDE<gt> value. Typically, when this
option is used, it is specified once for each E<lt>pathnameE<gt>; each
E<lt>source IDE<gt> will be paired with a E<lt>pathnameE<gt> according
to the order in which they occur on the command line. If only one
E<lt>source IDE<gt> is specified for multiple E<lt>pathnamesE<gt>, then
it is applied to all; any other mismatch will cause a syntax error. A
file containing a list of source IDs may be specified, following an '@'
character, instead of listing individual source IDs on the command line.
If no E<lt>source IDE<gt> is specified none will be applied.
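
For example, assuming a hypothetical Example pipeline in a hypothetical
Test_Catalog catalog, the first of these commands pairs the source ID
A_1 with a.img and B_1 with b.img, while the second applies the single
source ID Batch_7 to both pathnames:

	Pipeline_Source -Pipeline Test_Catalog.Example a.img b.img -Id A_1 -Id B_1
	Pipeline_Source -Pipeline Test_Catalog.Example a.img b.img -Id Batch_7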

=item	-Log E<lt>B<log directory>E<gt> | @E<lt>B<log directory list file>E<gt>

If this option is specified the Log_Pathname field of the source record
will be set to the E<lt>log directoryE<gt> value. Typically, when this
option is used, it is specified once for each E<lt>pathnameE<gt>;
each E<lt>log directoryE<gt> will be paired with a E<lt>pathnameE<gt>
according to the order in which they occur on the command line. If only
one E<lt>log directoryE<gt> is specified for multiple
E<lt>pathnamesE<gt>, then it is applied to all; any other mismatch will
cause a syntax error. An attempt is made to create any log directory
that does not exist. A file containing a list of log directories may be
specified, following an '@' character, instead of listing individual log
directories on the command line. If no E<lt>log directoryE<gt> is
specified none will be applied.

=item	-COnfiguration E<lt>B<filename>E<gt>

If the configuration filename is not an absolute pathname, it is
sought in the directory named by the E<lt>B<pipeline>E<gt>_CONFIG_DIR
environment variable, if that variable is set; otherwise it is sought
in the current working directory. Default:
E<lt>B<pipeline>E<gt>.conf.
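
For example, assuming a hypothetical Example pipeline in a hypothetical
Test_Catalog catalog, and a hypothetical /opt/example/config directory,
this Bourne shell command line would use the
/opt/example/config/Example.conf configuration file:

	Example_CONFIG_DIR=/opt/example/config Pipeline_Source -Pipeline Test_Catalog.Example a.img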

=item	-Server E<lt>B<server>E<gt>

The configuration file may contain connection information for more than
one database. The information for each database is organized by Server
name. A Server name corresponds to a parameter group in the
configuration file, with the same name, that contains access information
for a database server. Default: The first name in the Server list
specified in the configuration.

=item	-Catalog E<lt>B<catalog>E<gt>[.E<lt>B<pipeline>E<gt>]

The name of the database catalog that contains the Conductor pipeline
tables. The catalog name may be qualified by the E<lt>B<pipeline>E<gt>
name, following a '.' delimiter. Default: The E<lt>B<catalog>E<gt> name
in the command name, if present.

=item	-Always | -Unique[_or_Processed]

This option determines the conditions under which a source record is
inserted for a pathname. When -Always is specified all pathnames are
acceptable. When -Unique is specified the canonical pathname must be
unique for all source records in the table; i.e. the canonical pathname
must not be present in the Source_Pathname field of any existing source
record. This precludes inserting a duplicate source pathname in the
table. When -Unique_or_Processed is specified the canonical pathname may
already be present, but only if every source record with that pathname
has been acquired for processing (its Conductor_ID field is non-NULL).
Unacceptable sources are simply dropped along with any corresponding
source ID and log directory pathname. Default: Always.

Note: While it is acceptable to register the same pathname more than once
in a pipeline sources table, some pipelines may fail if the same source
is processed more than once. In various circumstances it is acceptable to
register a duplicate pathname only if all source records with the same
pathname have already been processed. This option specifies the
conditions under which it is acceptable to add a pathname to a pipeline.
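
For example, assuming a hypothetical Example pipeline in a hypothetical
Test_Catalog catalog, this command drops any pathname, listed in the
hypothetical files.list file, that already appears in a
Test_Catalog.Example_Sources record whose Conductor_ID is NULL, and
inserts the rest:

	Pipeline_Source -Pipeline Test_Catalog.Example -Unique_or_Processed @files.list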

=item	-Verbose | -Quiet

Verbose operation provides details about what is being done.
Quiet operation minimizes messages. Default: Quiet.

=item	-Help

The man page for this procedure is listed.

=back

=head1	Exit Status

=over

=item	E<48> - Success

The database table insert(s) completed successfully.

=item	E<49> - Bad command line syntax

A command line syntax usage message will be provided.

=item	E<50> - No such file

A pathname to a nonexistent file, or a file that is not a regular file
(e.g. a directory pathname may not be used), was specified. This applies
to both source and configuration file pathnames.

=item	E<51> - No log directory

A specified log directory does not exist and could not be created.

=item	E<52> - Database insert(s) failed

The database table insert(s) did not complete successfully. A report
from the Update_DB procedure will be provided.

=item	E<53> - Missing arguments file

An arguments list file, indicated with a leading '@' character on the
command line, could not be found.

=item	E<54> - Database query failed

A database query checking for an existing source pathname failed. A report
from the Query_DB procedure will be provided.

=item	E<55> - No unique source(s)

All the source pathnames were found to be unacceptable because they
failed the -Unique, or -Unique_or_Processed, condition for the pipeline
Sources table.

=back

=head1	Author

Bradford Castalia, UA/PIRL

=head1	Copyright

Copyright (C) 2004-2012  Arizona Board of Regents on behalf of the
Planetary Image Research Laboratory, Lunar and Planetary Laboratory at
the University of Arizona.

This file is part of the PIRL Java Packages.

The PIRL Java Packages are free software; you can redistribute them
and/or modify them under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation, either version 3 of
the License, or (at your option) any later version.

The PIRL Java Packages are distributed in the hope that they will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

=head1	Version

1.34 2012/04/16 06:04:10

=cut

#	CVS ID: Pipeline_Source,v 1.34 2012/04/16 06:04:10 castalia Exp
#===============================================================================
$CVS_ID = 'Pipeline_Source (1.34 2012/04/16 06:04:10)';

($Command_Name = $0) =~ s|.*/(\S+)$|$1|;
print "$Command_Name - $CVS_ID\n";

#	Defaults:

#	Assume that the command name is the pipeline name.
$DEFAULT_PIPELINE	= $Command_Name;
$DEFAULT_PIPELINE	=~ s/(.+)_Source$/$1/
	if ($DEFAULT_PIPELINE =~ /.+_Source$/);

$DEFAULT_CATALOG	= "Proc_Test";

$CONFIG_DIR_ENV		= "_CONFIG_DIR";

@UPDATE_DB			= ("Update_DB");
@QUERY_DB			= ("Query_DB");
$PATHNAME_FIELD     = "Source_Pathname";
$SOURCE_ID_FIELD	= "Source_ID";
$LOG_PATHNAME_FIELD	= "Log_Pathname";
$CONDUCTOR_ID_FIELD	= "Conductor_ID";

#	Exit status values:
$SUCCESS			= 0;
$BAD_SYNTAX			= 1;
$NO_SUCH_FILE		= 2;
$NO_LOG_DIRECTORY	= 3;
$UPDATE_FAILED		= 4;
$NO_ARGUMENTS_FILE	= 5;
$QUERY_FAILED		= 6;
$NO_UNIQUE_SOURCES	= 7;

#-------------------------------------------------------------------------------
#	Command line arguments:

use Pod::Usage;

pod2usage
	(
	-verbose => 0,
	-exitval => $BAD_SYNTAX
	)
	unless @ARGV;

while (defined ($option = shift @ARGV))
	{
	#	N.B.: Must come before -Catalog option.
	if ($option =~ /^-[Cc][Oo]/)
		{
		#	Configuration file.
		pod2usage
			(
			-message => "$Command_Name: Missing configuration filename.\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			unless (@ARGV && $ARGV[0] !~ /^-/);
		$Config_Pathname = shift @ARGV;
		next;
		}
	if ($option =~ /^-[Ss]/)
		{
		# Server
		pod2usage
			(
			-message => "$Command_Name: Missing database server name.\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			unless (@ARGV && $ARGV[0] !~ /^-/);
		$server = shift @ARGV;
		pod2usage
			(
			-message => "Database server already specified -\n".
				"$Database_Server\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			if $Database_Server && ($Database_Server ne $server);
		$Database_Server = $server;
		next;
		}
	if ($option =~ /^-[Cc]/ ||
		$option =~ /^-[Dd]/)
		{
		#	Catalog (or Database).
		pod2usage
			(
			-message => "$Command_Name: Missing database catalog name.\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			unless (@ARGV && $ARGV[0] !~ /^-/);
		$catalog = catalog_name ($option = shift @ARGV);
		$catalog = $option
			unless $catalog;
		Multiple_Catalogs:
		pod2usage
			(
			-message => "$Command_Name: Multiple catalogs specified - ".
				"$Catalog and $catalog\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			if $Catalog && ($Catalog ne $catalog);
		$Catalog = $catalog;

		#	Check for pipeline name.
		$pipeline = pipeline_name ($option);
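		pod2usage
			(
			-message => "$Command_Name: Multiple pipelines specified - ".
				"$Pipeline and $pipeline\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			if $pipeline && $Pipeline && ($pipeline ne $Pipeline);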
		$Pipeline = $pipeline
			if $pipeline;
		next;
		}
	if ($option =~ /^-[Pp]/)
		{
		#	Pipeline name.
		pod2usage
			(
			-message => "$Command_Name: Missing pipeline name.\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			unless (@ARGV && $ARGV[0] !~ /^-/);
		$pipeline = pipeline_name ($option = shift @ARGV);
		$pipeline = $option
			unless $pipeline;
		Multiple_Pipelines:
		pod2usage
			(
			-message => "$Command_Name: Multiple pipelines specified - ".
				"$Pipeline and $pipeline\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			if $Pipeline && ($Pipeline ne $pipeline);
		$Pipeline = $pipeline;

		#	Check for catalog name.
		$catalog = catalog_name ($option);
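		pod2usage
			(
			-message => "$Command_Name: Multiple catalogs specified - ".
				"$Catalog and $catalog\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			if $catalog && $Catalog && ($catalog ne $Catalog);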
		$Catalog = $catalog
			if $catalog;
		next;
		}
	if ($option =~ /^-[Ii]/)
		{
		#	Source ID.
		pod2usage
			(
			-message => "$Command_Name: Missing source ID.\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			unless (@ARGV && $ARGV[0] !~ /^-/);
		$arguments = shift @ARGV;
		if (substr ($arguments, 0, 1) eq '@')
			{
			if (! open (ARGUMENTS, '<', substr ($arguments, 1)))
				{
				print STDERR
					"$Command_Name: Unable to open the source ID list file - ",
						substr ($arguments, 1), "\n";
				exit ($NO_ARGUMENTS_FILE);
				}
			while (<ARGUMENTS>)
				{
				push @IDs, split;
				}
			close ARGUMENTS;
			}
		else
			{
			push @IDs, $arguments;
			}
		next;
		}
	if ($option =~ /^-[Ll]/)
		{
		#	Log Pathname.
		pod2usage
			(
			-message => "$Command_Name: Missing log file directory.\n",
			-verbose => 0,
			-exitval => $BAD_SYNTAX
			)
			unless (@ARGV && $ARGV[0] !~ /^-/);
		$arguments = shift @ARGV;
		if (substr ($arguments, 0, 1) eq '@')
			{
			if (! open (ARGUMENTS, '<', substr ($arguments, 1)))
				{
				print STDERR
					"$Command_Name: Unable to open the log directories list file - ",
						substr ($arguments, 1), "\n";
				exit ($NO_ARGUMENTS_FILE);
				}
			while (<ARGUMENTS>)
				{
				push @Log_pathnames, split;
				}
			close ARGUMENTS;
			}
		else
			{
			push @Log_pathnames, $arguments;
			}
		next;
		}
	if ($option =~ /^-[Aa]/)
		{
		#	Always.
		$Unique = 0;
		$Processed = 0;
		next;
		}
	if ($option =~ /^-[Uu]/)
		{
		#	Unique.
		$Unique = 1;
		$Processed = ($option =~ /[Pp]/) ? 1 : 0;
		next;
		}

	if ($option =~ /^-[Vv]/)
		{
		#	Verbose.
		$Verbose = 1;
		next;
		}
	if ($option =~ /^-[Qq]/)
		{
		#	Quiet.
		$Verbose = 0;
		next;
		}

	#	Help.
	pod2usage
		(
		-verbose => 2,
		-exitval => $SUCCESS
		)
		if ($option =~ /^-[Hh]/);

	pod2usage
		(
		-message => "$Command_Name: Unknown option \"$option\"\n",
		-verbose => 0,
		-exitval => $BAD_SYNTAX
		)
		if ($option =~ /^-/);

	#	Source pathname.
	if (substr ($option, 0, 1) eq '@')
		{
		if (! open (ARGUMENTS, '<', substr ($option, 1)))
			{
			print STDERR
				"$Command_Name: Unable to open the source pathnames list file - ",
					substr ($option, 1), "\n";
			exit ($NO_ARGUMENTS_FILE);
			}
		while (<ARGUMENTS>)
			{
			push @Filenames, split;
			}
		close ARGUMENTS;
		}
	else
		{
		push @Filenames, $option;
		}
	}

#-------------------------------------------------------------------------------
#	Post-command initialization.

if ($Verbose)
	{
	push @QUERY_DB,  '-verbose';
	push @UPDATE_DB, '-verbose';
	}

#	Pipeline name.
if (! $Pipeline)
	{
	$pipeline = pipeline_name ($DEFAULT_PIPELINE);
	$pipeline = $DEFAULT_PIPELINE
		unless $pipeline;
	$Pipeline = $pipeline;

	#	Check for catalog name.
	$catalog = catalog_name ($DEFAULT_PIPELINE);
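	pod2usage
		(
		-message => "$Command_Name: Multiple catalogs specified - ".
			"$Catalog and $catalog\n",
		-verbose => 0,
		-exitval => $BAD_SYNTAX
		)
		if $catalog && $Catalog && ($catalog ne $Catalog);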
	$Catalog = $catalog
		if $catalog;
	}

#	Database catalog name.
if (! $Catalog)
	{
	#	Try to find a catalog name in the default pipeline name.
	$Catalog = catalog_name ($DEFAULT_PIPELINE);
	pod2usage
		(
		-message => "$Command_Name: A database catalog name must be specified.\n",
		-verbose => 0,
		-exitval => $BAD_SYNTAX
		)
		unless $Catalog;
	}

#	Configuration file.
$Config_Pathname = $Pipeline.".conf"
	unless $Config_Pathname;

use File::Spec;
if (! File::Spec->file_name_is_absolute ($Config_Pathname))
	{
	#	Relative pathname.
	#	Check for a configuration directory environment variable.
	$Configuration_Dir = $ENV{$Pipeline . $CONFIG_DIR_ENV};
	$Config_Pathname = "$Configuration_Dir/$Config_Pathname"
		if $Configuration_Dir;
	#	Ensure absolute pathname.
	$Config_Pathname = File::Spec->rel2abs ($Config_Pathname);
	}
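if (! -f $Config_Pathname)
	{
	print STDERR
		"$Command_Name: No such configuration file - $Config_Pathname\n";
	exit ($NO_SUCH_FILE);
	}
push @QUERY_DB,  '-config', $Config_Pathname;
push @UPDATE_DB, '-config', $Config_Pathname;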


#	Server selection.
if ($Database_Server)
	{
	push @QUERY_DB,  '-server', $Database_Server;
	push @UPDATE_DB, '-server', $Database_Server;
	}


#	Check the filenames.
pod2usage
	(
	-message => "$Command_Name: At least one source pathname must be specified.\n",
	-verbose => 0,
	-exitval => $BAD_SYNTAX
	)
	unless @Filenames;

pod2usage
	(
	-message => "$Command_Name: The number of source pathnames -\n" .
		scalar @Filenames . "\n" .
		"- and source IDs -\n" .
		scalar @IDs . "\n" .
		"- do not match.\n",
	-verbose => 0,
	-exitval => $BAD_SYNTAX
	)
	if (@IDs &&
		@IDs != 1 &&
		@IDs != @Filenames);
$ID = shift @IDs
	if @IDs == 1;

pod2usage
	(
	-message => "$Command_Name: The number of source pathnames -\n" .
		scalar @Filenames . "\n" .
		"- and log pathnames -\n" .
		scalar @Log_pathnames . "\n" .
		"- do not match.\n",
	-verbose => 0,
	-exitval => $BAD_SYNTAX
	)
	if (@Log_pathnames &&
		@Log_pathnames != 1 &&
		@Log_pathnames != @Filenames);
$Log_pathname = shift @Log_pathnames
	if @Log_pathnames == 1;


foreach $filename (@Filenames)
	{
	#	Ensure absolute (canonical) pathname.
	$filename = File::Spec->rel2abs ($filename);
	if (! -f $filename)
		{
		print STDERR
			"$Command_Name: No such file - $filename\n";
		exit ($NO_SUCH_FILE);
		}

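	#	Take the source ID and log directory paired with this filename
	#	now, so the pairing stays aligned even if this source is dropped.
	$source_id    = defined ($ID)           ? $ID           : shift @IDs;
	$log_pathname = defined ($Log_pathname) ? $Log_pathname : shift @Log_pathnames;

	#	Check for acceptable source pathname.
	if (Accept_Source ($filename))
		{
		push @Pathnames, $filename;
		push @Source_IDs, $source_id
			if defined $source_id;
		push @Logs, $log_pathname
			if defined $log_pathname;
		}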
	else
		{
		print "Not unique";
		print " or processed"
			if $Processed;
		print ": $filename\n";
		}
	}

if (! @Pathnames)
	{
	print "No unique";
	print " or processed"
		if $Processed;
	print " sources.\n";
	exit ($NO_UNIQUE_SOURCES);
	}

foreach $pathname (@Logs)
	{
	#	Ensure absolute pathname.
	$pathname = File::Spec->rel2abs ($pathname);
	exit ($NO_LOG_DIRECTORY)
		if (! Make_Path ($pathname));
	push @Log_directories, $pathname;
	}

#-------------------------------------------------------------------------------
#	Database update operation.

push @UPDATE_DB, '@-';

print "\n@UPDATE_DB\n"
	if $Verbose;

if (! open UPDATE, "| @UPDATE_DB")
	{
	print STDERR
		"$Command_Name: Unable to open a pipe to Update_DB\n";
	print STDERR
		"@UPDATE_DB\n"
		unless $Verbose;
	exit ($UPDATE_FAILED);
	}

foreach $Pathname (@Pathnames)
	{
	#	Send the update set arguments.
	print UPDATE
		'-Table ', $Catalog.'.'.$Pipeline."_Sources ",
		"$PATHNAME_FIELD=$Pathname\n";
	print UPDATE "$SOURCE_ID_FIELD=".shift (@Source_IDs)."\n"
		if @Source_IDs;
	print UPDATE
		"$LOG_PATHNAME_FIELD=".shift (@Log_directories)."\n"
		if @Log_directories;
	}

close UPDATE;
$exit_status = $?;

$total_records = scalar @Pathnames;
$total_records = 250	#	Maximum number of updates that can be reported.
	if ($total_records > 250);
if ($exit_status == -1)
	{
	print STDERR
		"$Command_Name: The command could not be executed!\n";
	print STDERR "\n@UPDATE_DB\n"
		unless $Verbose;
	exit ($UPDATE_FAILED);
	}
elsif (($exit_status >>= 8) != $total_records)
	{
	print STDERR
		"$Command_Name: Update failed - exit status $exit_status.\n";
	print STDERR "\n@UPDATE_DB\n"
		unless $Verbose;
	exit ($UPDATE_FAILED);
	}

exit ($SUCCESS);

#-------------------------------------------------------------------------------

sub Accept_Source
{
my ($source) = @_;

return 1
	if (! $Unique);

#	Double any single quotes so the pathname is a valid SQL string literal.
(my $quoted_source = $source) =~ s/'/''/g;

$Query_Sources =
	"\"" .
	"SELECT $PATHNAME_FIELD " .
	"FROM $Catalog.$Pipeline"."_Sources " .
	"WHERE $PATHNAME_FIELD='$quoted_source'";
$Query_Sources .= " AND $CONDUCTOR_ID_FIELD is NULL"
	if $Processed;
$Query_Sources .= "\"";

my $Query = "@QUERY_DB -Query $Query_Sources";
@result = qx/$Query/;
#	The query command status is provided in $?.
my $query_status = $?;

if ($query_status == -1)
	{
	print STDERR
		"$Command_Name: The command could not be executed!\n";
	print STDERR "\n$Query\n"
		unless $Verbose;
	exit ($QUERY_FAILED);
	}
elsif (($query_status >>= 8) != 0)
	{
	print STDERR
		"$Command_Name: Query failed - exit status $query_status.\n";
	print STDERR "\n$Query\n"
		unless $Verbose;
	exit ($QUERY_FAILED);
	}

foreach $line (@result)
	{
	#	Look for a data record.
	return 0
		if ($line =~ /^\d+ - /);
	}
#	No data record found.
return 1;
}

#-------------------------------------------------------------------------------

sub catalog_name
{
my ($name) = @_;
return (($index = index ($name, '.')) > 0) ?
	substr ($name, 0, $index) : undef;
}


sub pipeline_name
{
my ($name) = @_;
return (($index = index ($name, '.')) >= 0 && $index < (length ($name) - 1)) ?
	substr ($name, $index + 1) : undef;
}

#-------------------------------------------------------------------------------
#	Ensure that all directories of a path exist or are created.
#
sub Make_Path {

my ($path) = @_;
return 1
	if (-d $path);
my @segments = split (/\//, $path);
my ($path_to, $delimiter) = ('', '');
$delimiter = '/'
	if (index ($path, '/') == 0);
for my $segment (@segments)
	{
	next
		unless $segment;
	$path_to .= $delimiter . $segment;
	#	Only the first segment may be without a delimiter.
	$delimiter = '/';

	next
		if (-d $path_to);	#	The directory pathname segment already exists.
	if (-f $path_to)
		{
		print STDERR
			"A directory is needed where the $path_to file exists.\n";
		return 0;
		}

	#	Try to make the directory pathname segment.
	if (! mkdir $path_to)
		{
		#	Making the directory failed; another process may be creating
		#	it concurrently, so retry a few times.
		my $error = "$!";
		my $retries = 5;
		while ($retries--)
			{
			#	Wait from 1 to 4 seconds and try again.
			sleep (int (rand (4)) + 1);
			last
				if (-d $path_to);
			last
				if (mkdir $path_to);
			$error = "$!";
			}
		if (! -d $path_to)
			{
			print STDERR
				"Unable to make the $path_to directory.\n",
				"$error\n";
			return 0;
			}
		}
	}
return 1;
}
