Passing command line arguments to Spark-shell
I have a Spark job written in Scala. I use
spark-shell -i <file-name> to run the job. I need to pass a command-line argument to the job. Right now, I invoke the script through a Linux task, where I do
export input_date=2015/04/27 and use the environment variable to access the value using:
System.getenv("input_date") Is there a better way to handle command line arguments in spark-shell?
Short answer:
spark-shell -i <(echo "val thedate = \"$input_date\"" ; cat <file-name>)
Long answer:
This solution causes the following line to be added at the beginning of the file before it is passed to spark-shell:
val thedate = ...,
thereby defining a new variable. The way this is done (the <( ... ) syntax) is called process substitution. It is available in Bash. See this question for more on this, and for alternatives (e.g. mkfifo) for non-Bash environments.
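For shells without process substitution, the mkfifo alternative mentioned above could look roughly like the sketch below. The helper names prepend_definition and run_with_fifo are made up for illustration, and the sketch assumes the value contains no double quotes or backslashes, and that spark-shell reads a named pipe the same way it reads the /dev/fd path that Bash produces.

```sh
# Hypothetical helper: print a Scala definition, then the script body.
# $1 = Scala variable name, $2 = value, $3 = path to the Scala file
prepend_definition() {
  printf 'val %s = "%s"\n' "$1" "$2"
  cat "$3"
}

# Hypothetical helper: run a command with the combined input
# delivered through a named pipe instead of <( ... ).
# $1 = variable name, $2 = value, $3 = Scala file, rest = command
run_with_fifo() {
  name=$1; value=$2; file=$3; shift 3
  dir=$(mktemp -d) || return 1
  fifo=$dir/input.scala
  mkfifo "$fifo" || { rm -rf "$dir"; return 1; }
  # The writer blocks until the command opens the pipe for reading.
  prepend_definition "$name" "$value" "$file" > "$fifo" &
  writer=$!
  rc=0
  "$@" "$fifo" || rc=$?
  # Clean up the writer in case the command never read the pipe.
  kill "$writer" 2>/dev/null || true
  wait "$writer" 2>/dev/null || true
  rm -rf "$dir"
  return "$rc"
}

# Intended use (assumes spark-shell is on PATH):
#   run_with_fifo thedate "$input_date" job.scala spark-shell -i
# Preview what spark-shell would read, using cat as the command:
#   run_with_fifo thedate 2015/04/27 job.scala cat
```

Running the preview form with cat is a cheap way to check the generated first line before starting a Spark session.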
Making it more systematic:
Put the code below in a script (e.g. spark-script.sh), and then you can simply use:
./spark-script.sh your_file.scala first_arg second_arg third_arg, and have an Array[String] called args with your arguments.
The file spark-script.sh:
scala_file=$1

shift 1

arguments=$@

#set +o posix  # to enable process substitution when not running on bash

spark-shell --master yarn --deploy-mode client \
    --queue default \
    --driver-memory 2g --executor-memory 4g \
    --num-executors 10 \
    -i <(echo 'val args = "'$arguments'".split("\\s+")' ; cat $scala_file)
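One limitation of joining the arguments into a single string and splitting on whitespace is that an argument containing spaces is broken apart, and one containing a double quote produces invalid Scala. A possible variation, not part of the original script, is to generate an Array literal with each argument quoted separately. The helper name scala_args_line is hypothetical, and the sketch assumes the arguments contain no newlines.

```sh
# Hypothetical helper: print a Scala line that defines args from the
# given arguments, quoting each one as a separate string literal.
scala_args_line() {
  list=""
  for arg in "$@"; do
    # Escape backslashes first, then double quotes, so the value is
    # safe inside a Scala string literal. Newlines are not handled.
    escaped=$(printf '%s' "$arg" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    list="${list:+$list, }\"$escaped\""
  done
  printf 'val args = Array[String](%s)\n' "$list"
}

# Possible use inside spark-script.sh, replacing the echo:
#   -i <(scala_args_line "$@" ; cat "$scala_file")
```

With this variation, ./spark-script.sh your_file.scala first "second arg" would give args two elements rather than three.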