Data Engineer
About Us
At AUI, we're excited to introduce you to Apollo. Apollo is a breakthrough language model built on a neuro-symbolic architecture for conversational agents. It enables the native tool use and controllability that transformer-based agents lack, and it unlocks fine-tuning for agents, allowing continuous evolution from human feedback and ever-improving performance for conversational agents of any kind.
The Role: Data Engineer
We're seeking a seasoned Data Engineer to build the data infrastructure that fuels our groundbreaking intelligent agent. You'll play a crucial role in developing large-scale data-intensive systems that power Apollo's capabilities.
What You'll Do:
- Design and implement massively parallel processing solutions for both real-time and batch scenarios
- Develop real-time stream processing solutions using technologies like Apache Kafka or Amazon Kinesis
- Build the infrastructure that brings machine learning capabilities to production
- Orchestrate containerized applications in cloud environments (AWS and GCP)
- Write production-grade Python code and work with various database systems
- Design and administer cloud-based data warehousing solutions
- Work with unstructured data, complex data sets, and perform data modeling
- Collaborate with cross-functional teams to integrate data solutions into our AI systems
Who We're Looking For:
- A seasoned Data Engineer with a deep understanding of data modeling and massively parallel processing
- Someone experienced in bringing machine learning capabilities into large-scale production systems
- An individual with experience at a cutting-edge startup
- A passionate builder of data infrastructure for advanced AI systems
- A team player with excellent collaboration and communication skills
- Someone with a "can-do" approach to problem-solving
Requirements:
- 3+ years of experience building massively parallel processing solutions (e.g., Spark, Presto)
- 2+ years of experience developing real-time stream processing solutions (e.g., Apache Kafka, Amazon Kinesis)
- 2+ years of experience developing ML infrastructure for production (e.g., Kubeflow, SageMaker, Vertex AI)
- Experience orchestrating containerized applications in AWS and GCP using EKS and GKE, respectively
- 3+ years of experience writing production-grade Python code
- Experience working with both relational and non-relational databases
- 2+ years of experience designing and administering cloud-based data warehousing solutions (e.g., Snowflake, Amazon Redshift)
- 2+ years of experience working with unstructured data, complex data sets, and data modeling
If you're excited about building the data backbone for the next generation of AI and want to be at the forefront of intelligent agent technology, we want to hear from you!