Passing command line arguments to Spark-shell


I have a Spark job written in Scala. I use

spark-shell -i <file-name> 

to run the job. I need to pass a command-line argument to the job. Right now, I invoke the script through a Linux task, where I do

export input_date=2015/04/27  

and then read the value from the environment variable using:

System.getenv("input_date")

Is there a better way to handle command-line arguments in spark-shell?

Short answer:

spark-shell -i <(echo "val thedate = \"$input_date\"" ; cat <file-name>)

Long answer:

This solution causes the following line to be added at the beginning of the file before it is passed to spark-shell:

val thedate = ...

thereby defining a new variable. The way this is done (the <( ... ) syntax) is called process substitution. It is available in Bash. See this question for more on this, and for alternatives (e.g. mkfifo) for non-Bash environments.
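For shells without process substitution, one simple option is a temporary file instead of mkfifo. The sketch below is only an illustration under a few assumptions: mktemp is available, the job file and its single println line are hypothetical stand-ins, and the date is wrapped in double quotes so that Scala reads it as a String. The combined file is printed rather than executed, so the sketch runs without Spark installed.

```shell
# Sketch: temp-file alternative to <( ... ), not the mkfifo approach itself.
input_date=2015/04/27

# Hypothetical job file, created here only so the sketch runs on its own.
job_file=$(mktemp)
printf '%s\n' 'println(thedate)' > "$job_file"

# Write the variable definition first, then append the job file.
combined_file=$(mktemp)
printf 'val thedate = "%s"\n' "$input_date" > "$combined_file"
cat "$job_file" >> "$combined_file"

# Show what spark-shell would load.
combined=$(cat "$combined_file")
printf '%s\n' "$combined"

# Where Spark is installed, the combined file would be passed like this:
# spark-shell -i "$combined_file"

rm -f "$job_file" "$combined_file"
```

The printed text starts with the val thedate line followed by the contents of the job file, which is the same input that the process substitution version hands to spark-shell.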

Making this more systematic:

Put the code below in a script (e.g. spark-script.sh), and then you can use:

./spark-script.sh your_file.scala first_arg second_arg third_arg

and have an Array[String] called args with your arguments.

The file spark-script.sh:

scala_file=$1

shift 1

arguments=$@

#set +o posix  # enable process substitution when not running on bash

spark-shell  --master yarn --deploy-mode client \
        --queue default \
        --driver-memory 2g --executor-memory 4g \
        --num-executors 10 \
        -i <(echo 'val args = "'$arguments'".split("\\s+")' ; cat $scala_file)
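To see what the wrapper actually prepends, the generated Scala line can be previewed without starting Spark. This is a rough sketch, not part of the script above: the set -- line fakes a hypothetical invocation, and printf replaces echo because some shells' echo rewrites backslashes.

```shell
# Sketch: preview the Scala line that the wrapper script prepends.
# Hypothetical invocation: ./spark-script.sh your_file.scala first_arg second_arg third_arg
set -- your_file.scala first_arg second_arg third_arg

scala_file=$1
shift 1
arguments=$*    # remaining arguments joined with single spaces

# Same quoting pattern as the script: shell single quotes around the Scala code,
# with the argument string spliced in between them.
args_line=$(printf '%s\n' 'val args = "'"$arguments"'".split("\\s+")')
printf '%s\n' "$args_line"
```

The preview makes one limitation visible: the arguments are joined into a single Scala string and split again on whitespace, so an argument that contains spaces or a double quote will not arrive intact.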
