Data Engineer (Pipelines, Quality, Orchestration) in Dallas
Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ monthly advertised global energy and engineering jobs, and work with the leading energy companies worldwide.
We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.
Job Description
About the role
You’ll build the data backbone that powers our keyword→auto-script machine. Your work
ensures reliable Semrush/Search Console ingestion, clean schemas, fast feature access, and
robust scheduling/monitoring—so models and scripts run on time, every time.
What you’ll do
● Build/own connectors: Semrush API, Google Search Console, internal logs; schedule
with Airflow/Prefect.
● Design schemas and tables for raw, curated, and feature layers (warehouse +
Postgres).
● Implement data quality checks (freshness, completeness, duplicates, ontology
mappings) with alerts.
● Stand up and tune vector infrastructure (pgvector/Pinecone) with indexing and
retention policies.
● Expose clean datasets and features to ML services (privacy-aware, audit-ready).
● Optimize cost/perf (partitions, clustering, caching, job concurrency) and SLAs for
daily/weekly runs.
● Build simple observability dashboards (job health, latency, data drift signals).
● Partner with ML/NLP on retraining pipelines and with Compliance on audit
logs/versioning.
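To give a flavor of the data quality work above, here is a minimal sketch of freshness, completeness, and duplicate checks with a simple alert hook. The check names, thresholds, and column names (`keyword`, `volume`) are illustrative assumptions, not our actual schema; in production these would live in a framework like Great Expectations or dbt tests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str

def check_freshness(latest_load: datetime, max_age: timedelta) -> CheckResult:
    """Fail if the newest loaded row is older than the allowed window."""
    age = datetime.now(timezone.utc) - latest_load
    return CheckResult("freshness", age <= max_age, f"age={age}")

def check_completeness(rows: list[dict], required: list[str]) -> CheckResult:
    """Fail if any required column is null or missing in any row."""
    missing = [c for c in required for r in rows if r.get(c) is None]
    return CheckResult("completeness", not missing, f"missing={sorted(set(missing))}")

def check_duplicates(rows: list[dict], key: str) -> CheckResult:
    """Fail if the business key is not unique across the batch."""
    keys = [r[key] for r in rows]
    dupes = len(keys) - len(set(keys))
    return CheckResult("duplicates", dupes == 0, f"duplicate_keys={dupes}")

def run_checks(rows: list[dict], latest_load: datetime) -> bool:
    results = [
        check_freshness(latest_load, timedelta(hours=26)),  # daily load + slack
        check_completeness(rows, ["keyword", "volume"]),    # hypothetical columns
        check_duplicates(rows, "keyword"),
    ]
    failures = [r for r in results if not r.passed]
    for r in failures:
        print(f"ALERT [{r.name}] {r.detail}")  # stand-in for a real alerting hook
    return not failures
```

The pattern generalizes: each check returns a structured result, and any failure triggers an alert before downstream models consume the batch.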
What you’ve done
● 3+ years as a Data Engineer (ETL/ELT in production).
● Strong Python and SQL; experience with Airflow/Prefect; dbt is a nice-to-have.
● Worked with cloud warehouses (BigQuery/Snowflake/Redshift) and Postgres.
● Built resilient API ingestions with pagination, rate limits, retries, and backfills.
● Experience with data testing/validation (Great Expectations, dbt tests, or similar).
● Bonus: vector DB ops, GCP/AWS, event streaming (Kafka/PubSub), healthcare data
hygiene.
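The resilient-ingestion bullet above can be sketched as a generic pagination loop with retries and exponential backoff. The page shape (`rows` / `next_offset`) and the injected `fetch_page` callable are assumptions for illustration; a real Semrush or Search Console connector would adapt this to the API's actual cursor and rate-limit semantics.

```python
import random
import time
from typing import Callable, Iterator

def paginate(
    fetch_page: Callable[[int], dict],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Iterator[dict]:
    """Yield rows page by page, retrying each page with exponential backoff.

    fetch_page(offset) is assumed to return
    {"rows": [...], "next_offset": int | None}.
    """
    offset = 0
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(offset)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # exhausted retries: surface the error to the scheduler
                # exponential backoff with jitter to respect rate limits
                time.sleep(base_delay * (2 ** attempt + random.random()))
        yield from page["rows"]
        if page["next_offset"] is None:
            return
        offset = page["next_offset"]
```

Backfills fall out of the same loop by replaying it over historical date ranges; injecting `fetch_page` keeps the retry/pagination logic testable without hitting the live API.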
How we’ll measure success (first 90 days)
● Reliable daily Semrush/GSC loads with 99% on-time SLA and data quality checks.
● Curated tables powering clustering/intent models with documented lineage.
● Feature/embedding store online with 200ms p95 reads for model services.
Tech you’ll touch
Python, SQL, Airflow/Prefect, Postgres, Warehouse (BigQuery/Snowflake/Redshift), dbt
(optional), Great Expectations, Docker, Terraform (nice-to-have), pgvector/Pinecone.
If you are interested in applying for this job please press the Apply Button and follow the application process. Energy Jobline wishes you the very best of luck in your next career move.