Data Engineer (Pipelines, Quality, Orchestration) in Dallas
Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ monthly advertised global energy and engineering jobs, and work with the leading energy companies worldwide.
We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.
Job Description
About the role
You’ll build the data backbone that powers our keyword→auto-script machine. Your work
ensures reliable Semrush/Search Console ingestion, clean schemas, fast feature access, and
robust scheduling/monitoring—so models and scripts run on time, every time.
What you’ll do
● Build/own connectors: Semrush API, Google Search Console, internal logs; schedule
with Airflow/Prefect.
● Design schemas and tables for raw, curated, and feature layers (warehouse +
Postgres).
● Implement data quality checks (freshness, completeness, duplicates, ontology
mappings) with alerts.
● Stand up and tune vector infrastructure (pgvector/Pinecone) with indexing and
retention policies.
● Expose clean datasets and features to ML services (privacy-aware, audit-ready).
● Optimize cost/perf (partitions, clustering, caching, job concurrency) and SLAs for
daily/weekly runs.
● Build simple observability dashboards (job health, latency, data drift signals).
● Partner with ML/NLP on retraining pipelines and with Compliance on audit
logs/versioning.
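To give a flavor of the data quality work above, here is a minimal sketch of freshness, completeness, and duplicate checks with a simple alert hook. The check names, thresholds, and column names (`keyword`, `volume`) are illustrative assumptions, not our actual schema; in production these would live in a framework like Great Expectations or dbt tests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str

def check_freshness(latest_load: datetime, max_age: timedelta) -> CheckResult:
    """Fail if the newest loaded row is older than the allowed window."""
    age = datetime.now(timezone.utc) - latest_load
    return CheckResult("freshness", age <= max_age, f"age={age}")

def check_completeness(rows: list[dict], required: list[str]) -> CheckResult:
    """Fail if any required column is null or missing in any row."""
    missing = [c for c in required for r in rows if r.get(c) is None]
    return CheckResult("completeness", not missing, f"missing={sorted(set(missing))}")

def check_duplicates(rows: list[dict], key: str) -> CheckResult:
    """Fail if the business key is not unique across the batch."""
    keys = [r[key] for r in rows]
    dupes = len(keys) - len(set(keys))
    return CheckResult("duplicates", dupes == 0, f"duplicate_keys={dupes}")

def run_checks(rows: list[dict], latest_load: datetime) -> bool:
    results = [
        check_freshness(latest_load, timedelta(hours=26)),  # daily load + slack
        check_completeness(rows, ["keyword", "volume"]),    # hypothetical columns
        check_duplicates(rows, "keyword"),
    ]
    failures = [r for r in results if not r.passed]
    for r in failures:
        print(f"ALERT [{r.name}] {r.detail}")  # stand-in for a real alerting hook
    return not failures
```

The pattern generalizes: each check returns a structured result, and any failure triggers an alert before downstream models consume the batch.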
What you’ve done
● 3+ years as a Data Engineer (ETL/ELT in production).
● Strong Python and SQL; experience with Airflow/Prefect; dbt is a nice-to-have.
● Worked with cloud warehouses (BigQuery/Snowflake/Redshift) and Postgres.
● Built resilient API ingestions with pagination, rate limits, retries, and backfills.
● Experience with data testing/validation (Great Expectations, dbt tests, or similar).
● Bonus: vector DB ops, GCP/AWS, event streaming (Kafka/PubSub), healthcare data
hygiene.
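The resilient-ingestion bullet above can be sketched as a generic pagination loop with retries and exponential backoff. The page shape (`rows` / `next_offset`) and the injected `fetch_page` callable are assumptions for illustration; a real Semrush or Search Console connector would adapt this to the API's actual cursor and rate-limit semantics.

```python
import random
import time
from typing import Callable, Iterator

def paginate(
    fetch_page: Callable[[int], dict],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Iterator[dict]:
    """Yield rows page by page, retrying each page with exponential backoff.

    fetch_page(offset) is assumed to return
    {"rows": [...], "next_offset": int | None}.
    """
    offset = 0
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(offset)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # exhausted retries: surface the error to the scheduler
                # exponential backoff with jitter to respect rate limits
                time.sleep(base_delay * (2 ** attempt + random.random()))
        yield from page["rows"]
        if page["next_offset"] is None:
            return
        offset = page["next_offset"]
```

Backfills fall out of the same loop by replaying it over historical date ranges; injecting `fetch_page` keeps the retry/pagination logic testable without hitting the live API.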
How we’ll measure success (first 90 days)
● Reliable daily Semrush/GSC loads with 99% on-time SLA and data quality checks.
● Curated tables powering clustering/intent models with documented lineage.
● Feature/embedding store online with 200ms p95 reads for model services.
Tech you’ll touch
Python, SQL, Airflow/Prefect, Postgres, Warehouse (BigQuery/Snowflake/Redshift), dbt
(optional), Great Expectations, Docker, Terraform (nice-to-have), pgvector/Pinecone.
If you are interested in applying for this job please press the Apply Button and follow the application process. Energy Jobline wishes you the very best of luck in your next career move.