IT Job Pro


Software Engineer I

Valsatech Corp – Windsor Mill, Maryland

Duties:
Develop and validate solution architecture to support business
requirements. Develop Spark applications that read transaction data and
apply business rules to report errors and transaction summaries. Load
and transform large sets of structured and semi-structured data. Read
data from Kafka and process it using Spark. Adopt innovative
architectural approaches to leverage in-house data integration
capabilities. Analyze existing processes and prepare functional and
requirements documents. Configure Spark Streaming to process weekly
data received via SFTP/Portal into MapR-FS and store the streamed data
in a Kafka topic. Develop multiple Spark Streaming and core Spark jobs
with Kafka as the data pipeline. Load DStream data into Spark RDDs and
perform in-memory computation to generate the output response. Work
with NoSQL databases such as HBase and MapR-DB, creating tables to
load large sets of JSON data via the Spark-HBase connector. Load HBase
data into a Redshift cluster using Spark Structured Streaming. Create
Hive external tables on HBase using INSERT OVERWRITE with S3 as the
data store. Troubleshoot developed Spark jobs. Manage and review YARN
application logs, Spark event logs, and metrics-sink CSV files.
Improve the performance and optimization of existing Spark jobs.
Develop solutions using commercial and open-source software, including
MiNiFi and NiFi, to interface big data and relational systems. Load
data from a SQL Server database on Azure into a Redshift data
warehouse using Spark Structured Streaming. Manage S3 lifecycle rules
and Redshift inline policies to load, unload, or copy data from S3.
Optimize Hive with S3 as the data store. Design and implement
solutions for metadata, data quality, and privacy management.
Collaborate with subject-matter experts to design and enable ad hoc
data analysis and a robust data consumption platform. Support the
analytics team on data presentation and reporting, importing queried
or direct data into Power BI via the Redshift connector or an ODBC
driver over SSL. Design, develop, and deploy repeatable processes to
enable end-to-end automation with an emphasis on Continuous
Integration and Deployment (CI/CD). Collaborate with architects and
engineers to understand the technology solution roadmap and technical
requirements, with a focus on business outcomes.
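For illustration only (this sketch is not part of the posting): the duty of applying business rules to transaction data to "report errors and transaction summary" might look roughly like the following. In the role this logic would run inside a Spark job; plain Python is used here so the example is self-contained, and every rule, field name, and function (`validate`, `summarize`, `account_id`, `amount`) is a hypothetical stand-in.

```python
# Hypothetical sketch of validating transactions against business rules
# and producing an error report plus a summary. In practice this would
# run inside a Spark application; plain Python keeps it self-contained.

def validate(txn):
    """Apply simple, made-up business rules; return a list of errors."""
    errors = []
    if txn.get("amount", 0) <= 0:
        errors.append("non-positive amount")
    if not txn.get("account_id"):
        errors.append("missing account_id")
    return errors

def summarize(transactions):
    """Split transactions into valid/invalid and total valid amounts."""
    report = {"valid": 0, "invalid": 0, "total_amount": 0.0, "errors": []}
    for txn in transactions:
        errs = validate(txn)
        if errs:
            report["invalid"] += 1
            report["errors"].append({"id": txn.get("id"), "errors": errs})
        else:
            report["valid"] += 1
            report["total_amount"] += txn["amount"]
    return report

txns = [
    {"id": 1, "account_id": "A1", "amount": 250.0},
    {"id": 2, "account_id": "",   "amount": 100.0},
    {"id": 3, "account_id": "A2", "amount": -5.0},
]
print(summarize(txns))
```

In a Spark version of this duty, `validate` would typically become a set of DataFrame filter expressions and `summarize` an aggregation, so the same rules run distributed over the full transaction set.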
Work Location: Various unanticipated work locations throughout the
United States; relocation may be required. Must be willing to
relocate.
Minimum Requirements:
Education: Bachelor's degree in Computer Science or Computer
Engineering (will accept foreign education equivalent)
Experience: Five (5) years
Please share profiles at ********************

To apply for this job please visit itjobpro.com.