Remote apache-spark Jobs

This Month

Senior Cloud Data Architect/Engineer
azure bigdata kubernetes postgresql apache-spark cloud Apr 04

Introduction to Shield AI

Shield AI’s mission is to protect service members and civilians with artificially intelligent systems. For our world-class team, no idea is too ambitious, and we never stop working to make possible what looks out of reach today. We are backed by Silicon Valley venture capital firms including Andreessen Horowitz, have been shipping product since 2018, and are growing rapidly.

Job Description

Are you a passionate and innovative Senior Cloud Data Architect/Engineer with real world experience architecting big data pipelines? Are you eager to make a positive difference in the world? Do you want to work alongside mission-driven and values-focused teammates? Shield AI is just the place for you!

As a Senior Cloud Data Architect/Engineer on the Fleet team in the Nova Systems Business Unit, you’ll have the opportunity to work on data infrastructure at Shield AI and play a critical role in the success of our company!

What you'll do:

  • You will be responsible for driving the architecture and creation of a scalable cloud data pipeline platform
  • You will design and build scalable infrastructure platforms to collect and process large amounts of structured and unstructured data that will be consumed in real-time
  • You will work on automating data pipelines, creating data models, and monitoring and ensuring performance
  • You will conduct root cause analyses of system instability with respect to accuracy and performance
  • You will be responsible for making decisions regarding data storage, technology selection, organization, and solution design in conjunction with software engineering and product management teams
  • You will be tasked with identifying and executing on best practices

Projects that you might work on: 

  • Collection and management of data for training and evaluation models and scaled data analysis for Hivemind

People we're looking for have the following required education and experience:

  • 5+ years of demonstrated technical expertise with the following technologies:
      • Cloud platforms (Azure, GCP, or AWS)
      • Data processing frameworks (Spark/MapReduce, Kafka, etc.)
      • Distributed data stores (Hadoop, BigQuery/BigTable, Redshift, S3, etc.)
      • Expert programming skills (Python, Go, Kotlin, etc.)
      • Containerization technologies (Docker, Kubernetes)
      • Relational and NoSQL databases
  • You have real-world experience architecting big data pipelines.
  • You have demonstrated knowledge of cloud computing technologies and current computing trends.
  • You have hands-on, professional experience designing and implementing large scale data pipelines.


  • You have a demonstrated record of working hard, being a trustworthy teammate, holding yourself and others to high standards, and being kind to others

If you're interested in being part of an engineering team that works hard, loves to have fun, and is doing truly meaningful, challenging work, apply now and we can chat further!

Shield AI is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender identity, or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a disability or special need that requires accommodation, please let us know.

To conform to U.S. Government regulations, the applicant must be a U.S. citizen, lawful permanent resident of the U.S., protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.


This Year

Senior Software Engineer
python apache-spark postgresql gcp senior cloud Mar 06

We are currently seeking a Senior Software Engineer to join our Data Pipeline Team. Reporting to the Data Engineering Manager, you will work on evolving our data models in several styles of datastores, improve internal tooling to allow data self-service, and operationalize production-grade data pipelines.

You Will:

    • Scale data pipelines to allow data to go from research to platform as fast as possible 
    • Develop data access mechanisms for consumption by downstream applications
    • Manage sources which contain both semi-structured as well as unstructured data
    • Develop and apply suitable frameworks to detect data drift, and then calibrate and redeploy them to production seamlessly
    • Collaborate closely with other engineers to solve interesting and challenging data problems

You Have:

    • 5+ years' experience working as a professional developer
    • Expertise in Python
    • Expertise with SQL
    • Expertise in Spark 2.x, Dataset/DataFrame API and performance tuning
    • Experience with cloud reference architectures and developing specialized stacks on cloud services
    • Experience with Pandas

Nice-to-have, but not mandatory, qualifications:

    • Background in Life Science
    • Experience with Airflow or other workflow management systems in a distributed setup
    • Experience with graph data modelling and scaling graph databases
    • Experience with Kubernetes in production
    • Experience with technical design and applying architectural patterns

Our benefits and perks:

    • A compensation package that includes equity options in the company
    • An annual Executive Health Assessment at Medcan: All employees get the “executive treatment”
    • Effectiveness coaching for managers: Onsite, personalized coaching from an executive coach with a doctorate in clinical psychology
    • Mental health tools and support: Optional mindfulness sessions and a free Headspace account
    • Complimentary genome sequencing from 23andMe: Find out what your DNA says about your health, traits, and ancestry
    • Three weeks of vacation, plus another week: Get 15 days to use anytime, and we’re closed Dec 25-Jan 1
    • Additional days off: Company summer day, your birthday, and earn +1 vacation day annually
    • Work from anywhere flexibility: Every day right now, and up to 4 days per week once we return to the office
    • An onsite gym: Keep fit, conveniently, with a Peloton and other great equipment
    • A great benefits package: Including health and dental

Here at BenchSci, these are our core values:

    • Focused: We focus on what will drive the greatest impact at all times.
    • Advancement: We believe in continuous growth, and discovering new ways to do things better. This applies to our product and business, but also to ourselves.
    • Speed: We recognize that without a sense of urgency, our team, our product and our mission lose their value.
    • Tenacity: What we’re trying to do isn’t easy, but we hire the best people, and give them the autonomy, tools, and resources to succeed. The hard work is up to them.
    • Transparency: We believe that sharing diverse ideas and information creates strong teams. Our success stems from research, collaboration, feedback, and trust.

Diversity, Equity and Inclusion: BenchSci is committed to creating an inclusive environment where people from all backgrounds can thrive. The work and commitment to diversity, equity and inclusion is our collective responsibility. That fundamental belief will guide us along our diversity, equity, and inclusion journey. We are just at the beginning, we will experience moments of discomfort and we may stumble along the way but we are committed to continuously improving and creating equitable and systemic change.

Accessibility Accommodations: BenchSci provides accessibility accommodations during the recruitment process. Should you require any accommodation, we will work with you to meet your needs.

Backend Engineer w/ Machine Learning
apache-kafka apache-spark cassandra java machine-learning machine learning Dec 14 2020

Numbrs is reshaping the future of the workplace. We are a fully remote company, at which every employee is free to live and work wherever they want.

Join our dedicated technology team that builds massively scalable systems, designs low latency architecture solutions and leverages machine learning technology to turn financial data into action. Want to push the limit of personal finance management? Join Numbrs.


You will be a part of a small agile team that is responsible for the design and development of our machine learning systems. You'll work on learning-based solutions and develop machine learning applications according to requirements. You enjoy learning new things and are passionate about developing new features, using cutting-edge technology and contributing to overall system design and architecture. You are a great teammate who thrives in a dynamic environment with rapidly changing priorities.

Key Qualifications

  • a Bachelor's or higher degree in technical field of study or equivalent practical experience
  • a minimum of 5 years of professional experience in software development and microservice-based architecture
  • previous experience with, or at least exposure to, machine learning
  • experience with Big Data technologies such as Kafka, Spark, and Cassandra
  • strong hands-on experience and fluency with Java or Scala
  • experience with software engineering best practices, coding standards, code reviews, testing and operations
  • excellent written and oral communication in English and interpersonal skills

Ideally, candidates will also have

  • experience with CI/CD toolchain products like Jira, Stash, Git, and Jenkins
  • fluency with functional, imperative, and object-oriented languages
  • experience with C++ or Golang is a plus

Location: Home office from your domicile

Spark Developer at DCTech startup using location data for Impact
apache-spark aws scala amazon-emr amazon-ec2 machine learning Nov 26 2020

X-Mode Social, Inc. is looking for a full-time Back-End Engineer (Spark Developer) to work on X-Mode's data platform and join our rapidly growing team. We are looking for someone who is excited to solve complex location challenges and ready to contribute significantly to new feature development on our AWS multi-petabyte data pipeline.

This position is full-time remote (anywhere in the U.S.). Please note that at this time, X-Mode is not sponsoring visas for any positions.


  • Use big data technologies, processing frameworks, and platforms to solve complex problems related to location
  • Build, improve, and maintain data pipelines that ingest billions of data points on a daily basis
  • Efficiently query data and provide data sets to help the Sales and Client Success teams with any data evaluation requests
  • Ensure high data quality through analysis, testing, and usage of machine learning algorithms


  • 1+ years of Spark and Scala experience
  • Experience working with very large databases and batch processing datasets with hundreds of millions of records
  • Experience with Hadoop ecosystem, e.g. Spark, Hive, or Presto/Athena
  • Experience with SQL based data architectures
  • 1+ years Linux experience
  • 1 year working with cloud services, ideally in AWS (EMR, CloudWatch, EC2, Athena, Hive, etc.)
  • Self-motivated learner who is willing to self-teach and can maintain a team-centered outlook
  • Self-directed and comfortable working on a remote team 
  • Strong curiosity about new technologies and a desire to always use the best tools for the job
  • Experience at a startup or comfortable working in a highly dynamic, fast-paced environment
  • BONUS: GIS/Geospatial tools/analysis and any past experience with geolocation data


  • Cool people, solving cool problems.
  • Competitive Salary
  • Medical, Dental and Vision 
  • Generous PTO policy & paid holidays
  • We value your input. This is a chance to get in on the "ground floor" of a growing company

At X-Mode, we’re excited about building a diverse team and creating an inclusive environment where everyone can thrive, and we encourage all applicants of any educational background, gender identity and expression, sexual orientation, religion, ethnicity, age, citizenship, socioeconomic status, disability, and veteran status to apply.

Senior Full-Stack Software Engineer
python apache-spark sql mongo vue-js senior Oct 14 2020

The perks:

  • Remote working
  • An integrated tech community with diverse backgrounds and talents in medicine, biomedical informatics, quantitative investment, machine learning, and language processing.
  • We are a communal and integrated tech team, so we can learn from each other and seamlessly deliver technology that solves multifaceted problems.
  • Freedom to use your resourcefulness and ingenuity to solve meaningful problems without red tape.
  • Blessed with deeply passionate clients and seed investors who are partnering with us in making healthcare data-driven.
  • Top tier medical, dental, and vision insurance
  • Gym discount, commute perk, 401k

The role:

  • We are adding multiple healthcare clients, and therefore looking for additional talent who are sensitive to client needs and can bring alive a suite of systems to improve care efficiency of CKD patients.
  • Help with data processing, analytics, and quality user experience to support kidney care.
  • Design and build multithreaded, distributed, performant applications and platforms that process complex data in real-time to provide better care to patients.
  • Utilize exciting open source technologies to design and develop secure and scalable products.
  • Build pipelines for data ingestion, processing, storage, and access.
  • Maintain the required infrastructure (VMs, containers, cluster management tools, networking) to automate the deployment pipeline.

Looking for:

  • Excitement to have real ML systems deployed that really make a difference in people's lives.
  • Commitment to and support for the seamless delivery of scalable applications from design to patient care.
  • 4+ years experience as an engineer, developer, programmer in a production software environment.
  • Dependable, cooperative and responsible person who can collaborate well in a team environment.
  • Independent, proactive, and can enjoy the process of creating order from uncertainty.
  • Experience implementing web service technologies with HTTP, JSON, REST.
  • Experience with databases: Postgres, Mongo, SQL, NoSQL.
  • Experience building responsive, user facing applications using JavaScript, React, Vue and GraphQL.
  • Experience in cloud environments, e.g. AWS, Azure, GCE, Kubernetes.
  • Other value-add skills: Database architecture, Apache Spark.
  • Excitement to be a part of our growth story going from Pre-Series A to Series A.


  • While pulseData is based in New York, the location of this position is flexible. If the candidate is based remotely, they will be expected to travel to New York at least once a month once conditions are safe for business travel to resume.
Paid Research Study for Data Professionals
apache-spark data-warehouse google-bigquery amazon-redshift Jul 27 2020

User Research International is a research company based out of Redmond, Washington. Working with some of the biggest companies in the industry, we aim to improve your experience via paid research studies. Whether it be the latest video game or productivity tools, we value your feedback and experience.

We are currently conducting a research study called Cloud-Based Data Study. We are looking for currently employed Data Professionals who have experience with cloud-based data warehouses. This study is a one-time Remote Study via an online meeting. We’re offering $150 for participation in this study. Session lengths are 90 minutes. These studies provide a platform for our researchers to receive feedback on existing or upcoming products or software.

We have included the survey link for the study below. Taking the survey will help determine if you fit the profile requirements. Completing this survey does not guarantee you will be selected to participate. If it's a match, we'll reach out with a formal confirmation and any additional details you may need.

I have summarized the study details below. In order to be considered, you must take the survey below. Thank you!

Study: Cloud-Based Data Study

Gratuity: $150

Session Length: 90 mins

Location: Remote

Dates: Available dates are located within the survey

Survey: Cloud-Based Data Study Sign-Up
