Web Scraping for Indeed.com & Predicting Salaries

Problem Statement:

Can salaries for Data Scientist job listings be predicted as high or low, in comparison to the median salary posted, using various features of the posting? What will best predict whether it's higher or lower than the median?

Step I. Check out the data

In [1]:
import warnings
warnings.filterwarnings('ignore')

%reload_ext autotime
In [2]:
import requests
from bs4 import BeautifulSoup
import datetime
import time
import re
import numpy as np

# I manully ran the search for data scienctist salaries and found the URL format
# The search is for title of data scientist and salary above $20,000

url = "http://www.indeed.com/jobs?q=data+scientist+%2420%2C000"

## x will indicate the number of the first listing on a particular page 
## each page lists 10 posts (and 5 sponsored posts)
page = requests.get(url).content
soup = BeautifulSoup(page,'lxml')
time: 1.13 s

Notes:

  • The HTML can be narrowed down to the contents within <td id="resultsCol"> ... </td>. This makes it easier to find what we're looking for by eliminating the script section and everything on the side panes.
In [4]:
print soup.find('td',{'id':'resultsCol'})
<td id="resultsCol">
<div class="messageContainer">
<script type="text/javascript">
function setJaPromoCookie() {
var expires = new Date();
expires.setTime(expires.getTime() + (5 * 365 * 24 * 60 * 60 * 1000));
setCookie("showJaPromo", "1", expires);
}
function setRefineByCookie(refineByTypes) {
var expires = new Date();
expires.setTime(expires.getTime() + (10 * 1000));
for (var i = 0; i < refineByTypes.length; i++) {
setCookie(refineByTypes[i], "1", expires);
}
}
</script>
</div>
<style type="text/css">
#increased_radius_result {
font-size: 16px;
font-style: italic;
}
#original_radius_result{
font-size: 13px;
font-style: italic;
color: #666666;
}
</style>
<div class="resultsTop"><div id="searchCount">Jobs 1 to 10 of 21,491</div>
<div data-tn-section="primePromo" id="primePromo">
<span class="new">New!</span> <a href="/promo/prime" onclick="this.href = appendParamsOnce( this.href, '?from=serptop&amp;subfrom=primeprmtop&amp;trk.origin=jobsearch&amp;trk.variant=primeprmtop&amp;trk.tk=1bhdgud8k18jh5sa&amp;vertical=TECH&amp;x_isid=serptop&amp;x_ikw=data+scientist+%2420%2C000&amp;x_sid=serptop&amp;x_kw=data+scientist+%2420%2C000')">Join Indeed Prime</a> - Get offers from great tech companies</div></div>
<script type="text/javascript">
window['sjl'] = "WubEucQIlF";
</script>
<style type="text/css">
.WubEucQIlF { margin: 0 0 6px 0; padding: 0; _zoom:100%; border: 0; background-color: #fff; }
.WubEucQIlF .jobtitle { white-space: nowrap; float:@LINE_START@; _float: none; }
.WubEucQIlF .sdn { color: #CD29C0; }
.GJUbY6D4wuk .brdr { margin-top: 12px; }
.hkvp2VD .brdr { margin-bottom: 12px; }
@media only screen and (min-height:780px) {
.GJUbY6D4wuk { margin-bottom: 9px; }
.hkvp2VD .brdr,
.hZS0Wc2H,
.GJUbY6D4wuk .brdr { margin-bottom: 9px; margin-top: 9px; }
}
</style>
<style type="text/css">
.result-tab:empty {margin-top: 0;}
.GJUbY6D4wuk {
margin-bottom: 0;
}
@media only screen and (min-height:780px) {
.GJUbY6D4wuk {
margin-bottom: 0;
}
}
</style>
<div></div>
<a id="jobPostingsAnchor" tabindex="-1"></a>
<div class="WubEucQIlF GJUbY6D4wuk">
<div class="row result" data-jk="bfc578d23604c2fc" id="pj_bfc578d23604c2fc">
<!-- Previously this variable was used to indicate job board jobs, we have replaced that with a more accurate source type check -->
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0D_L61JJZVH4SBayrvFEFSIDhxtpSFhtUBfRgL_yS-y4KQnwxgyzWhCsPoBnwyjC7i1224RkXyNjCKyMmAnielysSbfAScZoI_OgdN6cH2LUiHe9CeKg9jliNU9_-djYbNyJ2RAQGYO0xiVwMIc48Gv2SU4pEDHN7m15eB7x73iTEBPM8wde0m5wE7qft5c1dgVbTtYuirrJ2MXi9rtgLqz_3z08os3klbEk0GHijoK6eLDwntLPeFfr_xOkMIszVbY7ND_um_nHwqIVA9lBSsA7fBd3NYR17wN-15QAauVNP4tWBdZdpy033Nv_vgEHgH6O9Hz3KRwdtj2ZeLqnT1meQmHWf-Gnbw41IN0pJwRzm0hokMz9Q-UuyORdDuPt3-5X7v9N3IHUaeM7qKkWprW1kxiCaoU9Pefy2_xxl35xabC-xqCbXAyXSf2xvRndf9KyuYie6AYtnBm6V22pbM65uMS7kdZM0yFQhHpIG2GnKKG4vBrBxpn3yUYO7M8kX8kivKGnkynA9NIY8D1q_-hoTWIEn7l245CVw6AmDCRFdFTaqg8vqFCxXlxKvyk3UbduU1EpGT66HM8X5TebNvOoZUzvecR4zbrctZR8KUXa_3_xuhhS692adF_k0TENlH3kKF9ciTIzZh5WXG3cVszyFJf-wDh3E5ZxqeivjbXE-DXY-1Re1Wz_kWb1BfRrMdfigQuItMUd3ZOj9v0yLbtWhJDA_uE1nBn-A3z4mF6pNkFRIqLFE31v8QYj50yhtF8NSR1ny-YxXwMfJYzHNmbWKTtRloQzpHG6kyCOyGEW8FxlNBvJk49yVGY0YcbGZ36RRm45Ek6QLsxt2D0tS8ZtkK1mZyhyq_vAnRLe41m367xi--p8wKwsk_ntnHPnVUW8AEUTJ91pycaiqK5-nLPm2blu-1HbbqL8lMtnEghTNUOD5kosJwFtoyQFF4SLf9Oh1xwepQhqEEFffPjbG007k1VjztVdhNlOEn3Y-nQX4OEdHQfagG7RnTDoi_JyjUQGlR7sBLb-exQSCcJrExd&amp;p=1&amp;sk=&amp;fvj=0" id="sja1" onclick="setRefineByCookie(['salest']); sjoc('sja1',0); convCtr('SJ', pingUrlsForGA)" onmousedown="sjomd('sja1'); clk('sja1');" rel="nofollow" target="_blank" title="Statistical Modeling Analyst">Statistical Modeling Analyst</a>
<br/>
<div class="sjcl">
<span class="company">
<a class="turnstileLink" data-tn-element="companyName" href="/cmp/Nestle-USA" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491')" target="_blank">
        Nestle USA</a></span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Nestle-USA/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Statistical+Modeling+Analyst&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491');" target="_blank" title="Nestle USA reviews">
<span class="ratings"><span class="rating" style="width:43.8px"><!-- --></span></span>
<span class="slNoUnderline">724 reviews</span></a>
 - <span class="location">Arlington, VA</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tr><td class="snip">
<span class="summary">Demand signal <b>data</b> will be incorporated into forecast reporting and utilized for statistical modeling. Participate in demand signal <b>data</b> reporting and analysis...</span>
</td></tr></table>
<div class="sjCapt">
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" jasx_serpsjlabel_poststGray ">Sponsored</span> - <span class="tt_set" id="tt_set_10"><a class="sl resultLink save-job-link " href="#" id="sj_bfc578d23604c2fc" onclick="changeJobState('bfc578d23604c2fc', 'save', 'linkbar', true); return false;" title="Save this job to my.indeed">save job</a></span><div class="edit_note_content" id="editsaved2_bfc578d23604c2fc" style="display:none;"></div><script>window['sj_result_bfc578d23604c2fc'] = {"showSource": false, "source": "Nestle USA", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","jobKey": "bfc578d23604c2fc", "myIndeedAvailable": true, "showMoreActionsLink": false, "resultNumber": 10, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": false, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : true,"showSponsor" : true,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": true, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
<div class="row result" data-jk="815fb5476b35fef5" id="pj_815fb5476b35fef5">
<!-- Previously this variable was used to indicate job board jobs, we have replaced that with a more accurate source type check -->
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0CiRNM7CVr8YueLFKlzwbFWI0o7IjV438l4sVrvKZ0fliTfoNv_ONF5FvJKNaJe41O5vUOqHByFV-Kbj5Y3ThgQQObxPdR8ypSHPTD0Fer3x1OFY1fvzVKlLPcdMTrwb68jqlrGSZRldfv-jF--rtcj53F9KwZuE5gL0AJv2QRBgo3T3Yd0T9l1CD_Jy0y9DoYdRirUKZB7WJ4h1NeAIexcibP7J4SulWkb66J0nPi8nkAAuDYuMDOVcwvea9zIrnuLChykTbRIW2aKIjcsdi5WJAhK5vxXtkmhdejc8asP198HTS-UIHaqgJaOf8QdsbJvTmJeeHtK4vFTfOc5bfW0idDSgzTRgo0IYIzEhzZbsz1fsE48PDn1-HtkKTZThxMuiZl035Vy1e0obI0HVXl2ou274-WssyAfEAJ8T1MOt40y4ou62gIdM4G0GSU33ImyIFStxHiBmBqPU2maqe47q9SRylxHfkfN5bi3vVs3Ero67qmNlST1&amp;p=2&amp;sk=&amp;fvj=0" id="sja2" onclick="setRefineByCookie(['salest']); sjoc('sja2',0); convCtr('SJ', pingUrlsForGA)" onmousedown="sjomd('sja2'); clk('sja2');" rel="nofollow" target="_blank" title="Data Scientist"><b>Data</b> <b>Scientist</b></a>
<br/>
<div class="sjcl">
<span class="company">
<a class="turnstileLink" data-tn-element="companyName" href="/cmp/Indeed" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=815fb5476b35fef5&amp;jcid=d6ef41e202aa2c0b')" target="_blank">
        Indeed</a></span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Indeed/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Data+Scientist&amp;fromjk=815fb5476b35fef5&amp;jcid=d6ef41e202aa2c0b');" target="_blank" title="Indeed reviews">
<span class="ratings"><span class="rating" style="width:54.0px"><!-- --></span></span>
<span class="slNoUnderline">169 reviews</span></a>
 - <span class="location">Austin, TX 78731</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tr><td class="snip">
<span class="summary">As a <b>Data</b> <b>Scientist</b> at Indeed your role is to follow the <b>data</b>. Can fish for <b>data</b>:. Have full stack experience in <b>data</b> collection, aggregation, analysis,...</span>
</td></tr></table>
<div class="sjCapt">
<div class="iaP">
<span class="iaLabel"> Easily apply</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" jasx_serpsjlabel_poststGray ">Sponsored</span> - <span class="tt_set" id="tt_set_11"><a class="sl resultLink save-job-link " href="#" id="sj_815fb5476b35fef5" onclick="changeJobState('815fb5476b35fef5', 'save', 'linkbar', true); return false;" title="Save this job to my.indeed">save job</a></span><div class="edit_note_content" id="editsaved2_815fb5476b35fef5" style="display:none;"></div><script>window['sj_result_815fb5476b35fef5'] = {"showSource": false, "source": "Indeed", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","jobKey": "815fb5476b35fef5", "myIndeedAvailable": true, "showMoreActionsLink": false, "resultNumber": 11, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": false, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : true,"showSponsor" : true,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": true, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
<div class="row sjlast result" data-jk="a48ba5991b55a989" id="pj_a48ba5991b55a989">
<!-- Previously this variable was used to indicate job board jobs, we have replaced that with a more accurate source type check -->
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0AWCJ_wsjP2XpiEVpKxbmnSD7nn2vhw0wOf7OXX__gongTOnUpOjp_kYMJFeI3fpxPx8K5FMyAMRAhTlY7C6XJj2qZ4-GP-dm5Y6-0UjlcN1J4IZXrgeqFUB_Q7GKrqk5mkdCXfLpC_cWJzdgZigG8BV-W4CVPAcOHlhPPXJ7zL8X8esdGR_c3cT0tA7AQqiMT3LOv5onZseJUu7Y0RgNGwY0CsGesLuEkADbqLyv7nC9kjjeMquKTuIbGwvpSlYgBxvgqI5uE_L2Ob-oR-HuVS8w3nVurUwl-zdyfukK1pdIa89NfpVn9mD4UN5IVMa8R1sXJIziTEY8DxGGNhF0aZmju58gGA9lYvR_jLCL40YBd1Qx-PGaZ6ggbMD2PteUSBaoPH-vv_3aGLwSNfrjApJCa4S4J2aVwtvzfqHMCz5rvcvOxoiZ9K30-vkHLkaq2QchIvWbnEB4iNz4mwIOBQPgCzBEGzb9sVrDI623v01UzOGSFzliXf1qhlUFtEQl-nDtdl8TI_nkD9PgUfPKJN87wYY9L6R6JZlp-xHsOvNSdyEtgbnpfPY_mHnkrxzDL_Hr2zgXqjf2VNBNii5Y7L&amp;p=3&amp;sk=&amp;fvj=0" id="sja3" onclick="setRefineByCookie(['salest']); sjoc('sja3',0); convCtr('SJ', pingUrlsForGA)" onmousedown="sjomd('sja3'); clk('sja3');" rel="nofollow" target="_blank" title="Data Scientist - Big Data"><b>Data</b> <b>Scientist</b> - Big <b>Data</b></a>
<br/>
<div class="sjcl">
<span class="company">
<a class="turnstileLink" data-tn-element="companyName" href="/cmp/The-Washington-Post" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=a48ba5991b55a989&amp;jcid=9502fdf46127ff92')" target="_blank">
        The Washington Post</a></span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/The-Washington-Post/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Data+Scientist+-+Big+Data&amp;fromjk=a48ba5991b55a989&amp;jcid=9502fdf46127ff92');" target="_blank" title="The Washington Post reviews">
<span class="ratings"><span class="rating" style="width:51.6px"><!-- --></span></span>
<span class="slNoUnderline">111 reviews</span></a>
 - <span class="location">Washington, DC 20005</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tr><td class="snip">
<span class="summary">Washington Post is looking for passionate <b>Data</b> <b>Scientists</b> to join our Big <b>Data</b> Analytics team. <b>Data</b> <b>scientist</b> will utilize the <b>data</b> from the platform and design...</span>
</td></tr></table>
<div class="sjCapt">
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" jasx_serpsjlabel_poststGray ">Sponsored</span> - <span class="tt_set" id="tt_set_12"><a class="sl resultLink save-job-link " href="#" id="sj_a48ba5991b55a989" onclick="changeJobState('a48ba5991b55a989', 'save', 'linkbar', true); return false;" title="Save this job to my.indeed">save job</a></span><div class="edit_note_content" id="editsaved2_a48ba5991b55a989" style="display:none;"></div><script>window['sj_result_a48ba5991b55a989'] = {"showSource": false, "source": "The Washington Post", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","jobKey": "a48ba5991b55a989", "myIndeedAvailable": true, "showMoreActionsLink": false, "resultNumber": 12, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": false, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : true,"showSponsor" : true,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": true, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
</div>
<div class=" row result" data-jk="c4a8e4fae990ff66" data-tn-component="organicJob" id="p_c4a8e4fae990ff66" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_c4a8e4fae990ff66">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=c4a8e4fae990ff66&amp;fccid=94a6b16940c75d79" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[0],true,1);" onmousedown="return rclk(this,jobmap[0],1);" rel="nofollow" target="_blank" title="STAFF DATA SCIENTIST / DEEP LEARNING">STAFF <b>DATA</b> <b>SCIENTIST</b> / DEEP LEARNING</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
    TERADEEP INC.</span>
</span>

 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Campbell, CA</span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<span class="no-wrap">$120,000 - $160,000 a year</span>
<div>
<span class="summary" itemprop="description">
Large Scale <b>Data</b> Extraction and Preparation. Develops Deep Learning HW &amp; SW Acceleration solutions for datacenter applications....</span>
</div>
<div class="iaP">
<span class="iaLabel"> Easily apply</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">11 hours ago</span> <span class="tt_set" id="tt_set_0">  -  <a class="sl resultLink save-job-link " href="#" id="sj_c4a8e4fae990ff66" onclick="changeJobState('c4a8e4fae990ff66', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_0" onclick="toggleMoreLinks('c4a8e4fae990ff66'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_c4a8e4fae990ff66" style="display:none;"></div><script>window['result_c4a8e4fae990ff66'] = {"showSource": false, "source": "TERADEEP INC.", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "11 hours ago","jobKey": "c4a8e4fae990ff66", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 0, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_0" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('c4a8e4fae990ff66'); return false;" title="Close"></a><div class="more_actions" id="more_0"><ul><li><span class="mat">View all <a href="/jobs?q=Teradeep+Inc&amp;l=Campbell,+CA&amp;nc=jasx" rel="nofollow">TERADEEP INC. jobs in Campbell, CA</a> - <a href="/l-Campbell,-CA-jobs.html">Campbell jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-Campbell-CA" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=c4a8e4fae990ff66&amp;from=serp-more');">Data Scientist salaries in Campbell, CA</a></span></li><li><span class="mat">Related forums: <a href="/forum/loc/Campbell-California.html">Campbell, California</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="6d655f1d63c5aac9" data-tn-component="organicJob" id="p_6d655f1d63c5aac9" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_6d655f1d63c5aac9">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=6d655f1d63c5aac9&amp;fccid=2c62e4de04b8f952" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[1],true,0);" onmousedown="return rclk(this,jobmap[1],0);" rel="nofollow" target="_blank" title="Data Scientist, North America Supply Chain Advanced Analytics"><b>Data</b> <b>Scientist</b>, North America Supply Chain Advanced Analytic...</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Nike" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=6d655f1d63c5aac9&amp;jcid=e4a0ebc7ef5e730e')" target="_blank">
        NIKE INC</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Nike/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Data+Scientist%2C+North+America+Supply+Chain+Advanced+Analytics&amp;fromjk=6d655f1d63c5aac9&amp;jcid=e4a0ebc7ef5e730e');" target="_blank" title="Nike reviews">
<span class="ratings"><span class="rating" style="width:52.2px"><!-- --></span></span>
<span class="slNoUnderline">3,309 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Portland, OR</span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
North America Supply Chain Advanced Analytics <b>Data</b> <b>Scientists</b> work to solve challenging strategic supply chain questions....</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="result-link-source">Nike </span>- <span class="date">1 day ago</span> <span class="tt_set" id="tt_set_1">  -  <a class="sl resultLink save-job-link " href="#" id="sj_6d655f1d63c5aac9" onclick="changeJobState('6d655f1d63c5aac9', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_1" onclick="toggleMoreLinks('6d655f1d63c5aac9'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_6d655f1d63c5aac9" style="display:none;"></div><script>window['result_6d655f1d63c5aac9'] = {"showSource": true, "source": "Nike", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "1 day ago","jobKey": "6d655f1d63c5aac9", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 1, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_1" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('6d655f1d63c5aac9'); return false;" title="Close"></a><div class="more_actions" id="more_1"><ul><li><span class="mat">View all <a href="/jobs?q=Nike+Inc&amp;l=Portland,+OR&amp;nc=jasx" rel="nofollow">NIKE INC jobs in Portland, OR</a> - <a href="/l-Portland,-OR-jobs.html">Portland jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-Portland-OR" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=6d655f1d63c5aac9&amp;from=serp-more');">Data Scientist salaries in Portland, OR</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Nike" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=6d655f1d63c5aac9&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=e4a0ebc7ef5e730e');">Nike Inc</a></span></li><li><span class="mat"><a href="/cmp/Nike/faq" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=6d655f1d63c5aac9&amp;jcid=e4a0ebc7ef5e730e');">Nike Inc questions about work, benefits, interviews and hiring process:</a><ul><li><a href="/cmp/Nike/faq/do-nike-work-on-holidays?quid=1b3jomfihas3afv3" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=6d655f1d63c5aac9&amp;jcid=e4a0ebc7ef5e730e');">Do nike work on holidays?</a></li><li><a href="/cmp/Nike/faq/how-did-you-get-your-first-interview-at-nike?quid=1al9mehunak5jb01" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=6d655f1d63c5aac9&amp;jcid=e4a0ebc7ef5e730e');">How did you get your first interview at NIKE?</a></li></ul></span></li><li><span class="mat">Related forums: <a href="/forum/cmp/Nike.html">NIKE</a> - <a href="/forum/loc/Portland-Oregon.html">Portland, Oregon</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="bfc578d23604c2fc" data-tn-component="organicJob" id="p_bfc578d23604c2fc" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_bfc578d23604c2fc">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=bfc578d23604c2fc&amp;fccid=bb384ca0a6d3d491" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[2],true,0);" onmousedown="return rclk(this,jobmap[2],0);" rel="nofollow" target="_blank" title="Statistical Modeling Analyst">Statistical Modeling Analyst</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Nestle-USA" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491')" target="_blank">
        Nestle USA</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Nestle-USA/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Statistical+Modeling+Analyst&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491');" target="_blank" title="Nestle USA reviews">
<span class="ratings"><span class="rating" style="width:43.8px"><!-- --></span></span>
<span class="slNoUnderline">724 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Arlington, VA</span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
Demand signal <b>data</b> will be incorporated into forecast reporting and utilized for statistical modeling. Participate in demand signal <b>data</b> reporting and analysis...</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">6 days ago</span> <span class="tt_set" id="tt_set_2">  -  <a class="sl resultLink save-job-link " href="#" id="sj_bfc578d23604c2fc" onclick="changeJobState('bfc578d23604c2fc', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_2" onclick="toggleMoreLinks('bfc578d23604c2fc'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_bfc578d23604c2fc" style="display:none;"></div><script>window['result_bfc578d23604c2fc'] = {"showSource": false, "source": "Nestle USA", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "6 days ago","jobKey": "bfc578d23604c2fc", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 2, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_2" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('bfc578d23604c2fc'); return false;" title="Close"></a><div class="more_actions" id="more_2"><ul><li><span class="mat">View all <a href="/q-Nestle-USA-l-Arlington,-VA-jobs.html" rel="nofollow">Nestle USA jobs in Arlington, VA</a> - <a href="/l-Arlington,-VA-jobs.html">Arlington jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Modeling-Analyst-Salaries,-Arlington-VA" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=bfc578d23604c2fc&amp;from=serp-more-nofollow');" rel='"nofollow"'>Modeling Analyst salaries in Arlington, VA</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Nestle-USA" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=bfc578d23604c2fc&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=bb384ca0a6d3d491');">Nestle USA</a></span></li><li><span class="mat"><a href="/cmp/Nestle-USA/faq" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491');">Nestle USA questions about work, benefits, interviews and hiring process:</a><ul><li><a href="/cmp/Nestle-USA/faq/what-is-the-work-environment-and-culture-like-at-nestle-usa?quid=1algs5a9sb80rfrv" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491');">What is the work environment and culture like at Nestle USA?</a></li><li><a href="/cmp/Nestle-USA/faq/working-on-the-weekend-is-that-something-u-can-work-ur-way-up-off-it-or-its-set-to-always-work-weekends?quid=1arbhbrgpaqisao6" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491');">Working on the weekend, is that something u can work ur way up off it or...</a></li></ul></span></li><li><span class="mat">Related forums: <a href="/forum/cmp/Nestle-USA.html">Nestle USA</a> - <a href="/forum/loc/Arlington-Virginia.html">Arlington, Virginia</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="07cb461c7beddbe1" data-tn-component="organicJob" id="p_07cb461c7beddbe1" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_07cb461c7beddbe1">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=07cb461c7beddbe1&amp;fccid=4e041af1d0af1bc8" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[3],true,0);" onmousedown="return rclk(this,jobmap[3],0);" rel="nofollow" target="_blank" title="Research Data Scientist">Research <b>Data</b> <b>Scientist</b></a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=07cb461c7beddbe1&amp;jcid=4e041af1d0af1bc8')" target="_blank">
        Booz Allen Hamilton</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Booz-Allen-Hamilton/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Research+Data+Scientist&amp;fromjk=07cb461c7beddbe1&amp;jcid=4e041af1d0af1bc8');" target="_blank" title="Booz Allen Hamilton reviews">
<span class="ratings"><span class="rating" style="width:44.4px"><!-- --></span></span>
<span class="slNoUnderline">1,096 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Washington, DC</span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
Research <b>Data</b> <b>Scientist</b>. Experience with building complex <b>data</b> extraction, transformation, and loading, including ETL into structured databases, <b>data</b> warehouses...</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">1 hour ago</span> <span class="tt_set" id="tt_set_3">  -  <a class="sl resultLink save-job-link " href="#" id="sj_07cb461c7beddbe1" onclick="changeJobState('07cb461c7beddbe1', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_3" onclick="toggleMoreLinks('07cb461c7beddbe1'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_07cb461c7beddbe1" style="display:none;"></div><script>window['result_07cb461c7beddbe1'] = {"showSource": false, "source": "Booz Allen Hamilton", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "1 hour ago","jobKey": "07cb461c7beddbe1", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 3, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_3" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('07cb461c7beddbe1'); return false;" title="Close"></a><div class="more_actions" id="more_3"><ul><li><span class="mat">View all <a href="/q-Booz-Allen-Hamilton-l-Washington,-DC-jobs.html" rel="nofollow">Booz Allen Hamilton jobs in Washington, DC</a> - <a href="/l-Washington,-DC-jobs.html">Washington jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-Washington-DC" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=07cb461c7beddbe1&amp;from=serp-more');">Data Scientist salaries in Washington, DC</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=07cb461c7beddbe1&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton</a></span></li><li><span class="mat"><a href="/cmp/Booz-Allen-Hamilton/faq" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=07cb461c7beddbe1&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton questions about work, benefits, interviews and hiring process:</a><ul><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-the-working-hours?quid=1an7bo61vb822fp6" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=07cb461c7beddbe1&amp;jcid=4e041af1d0af1bc8');">How are the working hours?</a></li><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-they-about-telecommuting?quid=1b62s7nfubvu9af0" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=07cb461c7beddbe1&amp;jcid=4e041af1d0af1bc8');">How are they about telecommuting?</a></li></ul></span></li><li><span class="mat">Related forums: <a href="/forum/cmp/Booz-Allen-Hamilton.html">Booz Allen Hamilton</a> - <a href="/forum/loc/Washington-District-of-Columbia.html">Washington, District of Columbia</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="e142c03f3ece6f32" data-tn-component="organicJob" id="p_e142c03f3ece6f32" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_e142c03f3ece6f32">
<a class="turnstileLink" data-tn-element="jobTitle" href="/company/Morelity/jobs/Junior-Data-Scientist-e142c03f3ece6f32?fccid=2d78d00d6a6a3869" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[4],true,0);" onmousedown="return rclk(this,jobmap[4],0);" rel="nofollow" target="_blank" title="Junior Data Scientist">Junior <b>Data</b> <b>Scientist</b></a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Morelity" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=e142c03f3ece6f32&amp;jcid=458f59c273df6bfc')" target="_blank">
        Morelity</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Morelity/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Junior+Data+Scientist&amp;fromjk=e142c03f3ece6f32&amp;jcid=458f59c273df6bfc');" target="_blank" title="Morelity reviews">
<span class="ratings"><span class="rating" style="width:60.0px"><!-- --></span></span>
<span class="slNoUnderline">4 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Des Moines, IA</span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
Experience with transforming <b>data</b>. We are looking for junior (just out of school to a few years out in the real world) <b>data</b> <b>scientists</b> to work on ways to...</span>
</div>
<div class="iaP">
<span class="iaLabel"> Easily apply</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">1 day ago</span> <span class="tt_set" id="tt_set_4">  -  <a class="sl resultLink save-job-link " href="#" id="sj_e142c03f3ece6f32" onclick="changeJobState('e142c03f3ece6f32', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_4" onclick="toggleMoreLinks('e142c03f3ece6f32'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_e142c03f3ece6f32" style="display:none;"></div><script>window['result_e142c03f3ece6f32'] = {"showSource": false, "source": "Indeed", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "1 day ago","jobKey": "e142c03f3ece6f32", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 4, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_4" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('e142c03f3ece6f32'); return false;" title="Close"></a><div class="more_actions" id="more_4"><ul><li><span class="mat">View all <a href="/q-Morelity-l-Des-Moines,-IA-jobs.html" rel="nofollow">Morelity jobs in Des Moines, IA</a> - <a href="/l-Des-Moines,-IA-jobs.html">Des Moines jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-Des-Moines-IA" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=e142c03f3ece6f32&amp;from=serp-more');">Data Scientist salaries in Des Moines, IA</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Morelity" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=e142c03f3ece6f32&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=458f59c273df6bfc');">Morelity</a></span></li><li><span class="mat">Related forums: <a href="/forum/loc/Des-Moines-Iowa.html">Des Moines, Iowa</a> - <a href="/forum/cmp/Morelity.html">Morelity</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="4648e360810d9511" data-tn-component="organicJob" id="p_4648e360810d9511" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_4648e360810d9511">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=4648e360810d9511&amp;fccid=4e041af1d0af1bc8" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[5],true,0);" onmousedown="return rclk(this,jobmap[5],0);" rel="nofollow" target="_blank" title="Data Scientist"><b>Data</b> <b>Scientist</b></a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=4648e360810d9511&amp;jcid=4e041af1d0af1bc8')" target="_blank">
        Booz Allen Hamilton</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Booz-Allen-Hamilton/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Data+Scientist&amp;fromjk=4648e360810d9511&amp;jcid=4e041af1d0af1bc8');" target="_blank" title="Booz Allen Hamilton reviews">
<span class="ratings"><span class="rating" style="width:44.4px"><!-- --></span></span>
<span class="slNoUnderline">1,096 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Arlington, VA</span></span></span>
 <a class="more_loc" href="/jobs?q=data+scientist+%2420%2C000&amp;rbt=Data+Scientist&amp;rbc=Booz+Allen+Hamilton&amp;jtid=1872a3288ede048a&amp;jcid=4e041af1d0af1bc8&amp;grp=tcl" onmousedown="ptk('addlloc');" rel="nofollow">+13 locations</a><table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
Ability to both manage and manipulate large <b>data</b> sets, develop <b>data</b> science approaches, and manage <b>data</b> science tasks....</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">7 days ago</span> <span class="tt_set" id="tt_set_5">  -  <a class="sl resultLink save-job-link " href="#" id="sj_4648e360810d9511" onclick="changeJobState('4648e360810d9511', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_5" onclick="toggleMoreLinks('4648e360810d9511'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_4648e360810d9511" style="display:none;"></div><script>window['result_4648e360810d9511'] = {"showSource": false, "source": "Booz Allen Hamilton", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "7 days ago","jobKey": "4648e360810d9511", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 5, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_5" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('4648e360810d9511'); return false;" title="Close"></a><div class="more_actions" id="more_5"><ul><li><span class="mat">View all <a href="/q-Booz-Allen-Hamilton-l-Arlington,-VA-jobs.html" rel="nofollow">Booz Allen Hamilton jobs in Arlington, VA</a> - <a href="/l-Arlington,-VA-jobs.html">Arlington jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-Arlington-VA" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=4648e360810d9511&amp;from=serp-more');">Data Scientist salaries in Arlington, VA</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=4648e360810d9511&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton</a></span></li><li><span class="mat"><a href="/cmp/Booz-Allen-Hamilton/faq" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=4648e360810d9511&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton questions about work, benefits, interviews and hiring process:</a><ul><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-the-working-hours?quid=1an7bo61vb822fp6" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=4648e360810d9511&amp;jcid=4e041af1d0af1bc8');">How are the working hours?</a></li><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-they-about-telecommuting?quid=1b62s7nfubvu9af0" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=4648e360810d9511&amp;jcid=4e041af1d0af1bc8');">How are they about telecommuting?</a></li></ul></span></li><li><span class="mat">Related forums: <a href="/forum/cmp/Booz-Allen-Hamilton.html">Booz Allen Hamilton</a> - <a href="/forum/loc/Arlington-Virginia.html">Arlington, Virginia</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="c27766f9f51e9e73" data-tn-component="organicJob" id="p_c27766f9f51e9e73" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_c27766f9f51e9e73">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=c27766f9f51e9e73&amp;fccid=dc5dd98c1ef26287" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[6],true,0);" onmousedown="return rclk(this,jobmap[6],0);" rel="nofollow" target="_blank" title="Statistical Modeling/Data Scientist, First Level Officer, Dallas TX">Statistical Modeling/<b>Data</b> <b>Scientist</b>, First Level Officer, Da...</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Comerica-Bank" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=c27766f9f51e9e73&amp;jcid=77b67a7c2e08c53f')" target="_blank">
        Comerica</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Comerica-Bank/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Statistical+Modeling%5C%2FData+Scientist%2C+First+Level+Officer%2C+Dallas+TX&amp;fromjk=c27766f9f51e9e73&amp;jcid=77b67a7c2e08c53f');" target="_blank" title="Comerica reviews">
<span class="ratings"><span class="rating" style="width:43.8px"><!-- --></span></span>
<span class="slNoUnderline">468 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Dallas, TX 75201 <span style="font-size: smaller">(City Center District area)</span></span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
Participates in rigorous assessment of <b>data</b> quality and assumptions relevance, and appropriate documentation especially where management judgment is used....</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="result-link-source">Comerica Bank </span>- <span class="date">3 days ago</span> <span class="tt_set" id="tt_set_6">  -  <a class="sl resultLink save-job-link " href="#" id="sj_c27766f9f51e9e73" onclick="changeJobState('c27766f9f51e9e73', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_6" onclick="toggleMoreLinks('c27766f9f51e9e73'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_c27766f9f51e9e73" style="display:none;"></div><script>window['result_c27766f9f51e9e73'] = {"showSource": true, "source": "Comerica Bank", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "3 days ago","jobKey": "c27766f9f51e9e73", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 6, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_6" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('c27766f9f51e9e73'); return false;" title="Close"></a><div class="more_actions" id="more_6"><ul><li><span class="mat">View all <a href="/q-Comerica-l-Dallas,-TX-jobs.html" rel="nofollow">Comerica jobs in Dallas, TX</a> - <a href="/l-Dallas,-TX-jobs.html">Dallas jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-Dallas-TX" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=c27766f9f51e9e73&amp;from=serp-more');">Data Scientist salaries in Dallas, TX</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Comerica-Bank" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=c27766f9f51e9e73&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=77b67a7c2e08c53f');">Comerica</a></span></li><li><span class="mat"><a href="/cmp/Comerica-Bank/faq" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=c27766f9f51e9e73&amp;jcid=77b67a7c2e08c53f');">Comerica questions about work, benefits, interviews and hiring process:</a><ul><li><a href="/cmp/Comerica-Bank/faq/what-is-the-work-environment-and-culture-like-at-comerica-bank?quid=1aos8lcm7b8blfvv" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=c27766f9f51e9e73&amp;jcid=77b67a7c2e08c53f');">What is the work environment and culture like at Comerica Bank?</a></li><li><a href="/cmp/Comerica-Bank/faq/how-would-you-describe-the-pace-of-work-at-comerica-bank?quid=1avrm4t1iaqi4bqi" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=c27766f9f51e9e73&amp;jcid=77b67a7c2e08c53f');">How would you describe the pace of work at Comerica Bank?</a></li></ul></span></li><li><span class="mat">Related forums: <a href="/forum/loc/Dallas-Texas.html">Dallas, Texas</a> - <a href="/forum/cmp/Comerica-Bank.html">Comerica Bank</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="76d7e0f41e68e80c" data-tn-component="organicJob" id="p_76d7e0f41e68e80c" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_76d7e0f41e68e80c">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=76d7e0f41e68e80c&amp;fccid=f827d06dec9ae88b" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[7],true,0);" onmousedown="return rclk(this,jobmap[7],0);" rel="nofollow" target="_blank" title="Data Scientist"><b>Data</b> <b>Scientist</b></a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
    Coverent</span>
</span>

 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">McLean, VA</span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
Coverent is seeking a <b>Data</b> <b>Scientist</b> to lead quantitative research and program evaluation activities for a client in the US Intelligence Community....</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">10 days ago</span> <span class="tt_set" id="tt_set_7">  -  <a class="sl resultLink save-job-link " href="#" id="sj_76d7e0f41e68e80c" onclick="changeJobState('76d7e0f41e68e80c', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_7" onclick="toggleMoreLinks('76d7e0f41e68e80c'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_76d7e0f41e68e80c" style="display:none;"></div><script>window['result_76d7e0f41e68e80c'] = {"showSource": false, "source": "Coverent", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "10 days ago","jobKey": "76d7e0f41e68e80c", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 7, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_7" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('76d7e0f41e68e80c'); return false;" title="Close"></a><div class="more_actions" id="more_7"><ul><li><span class="mat">View all <a href="/q-Coverent-l-McLean,-VA-jobs.html" rel="nofollow">Coverent jobs in McLean, VA</a> - <a href="/l-McLean,-VA-jobs.html">McLean jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-McLean-VA" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=76d7e0f41e68e80c&amp;from=serp-more');">Data Scientist salaries in McLean, VA</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Coverent" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=76d7e0f41e68e80c&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=f827d06dec9ae88b');">Coverent</a></span></li><li><span class="mat">Related forums: <a href="/forum/loc/Mclean-Virginia.html">Mclean, Virginia</a> - <a href="/forum/cmp/Coverent.html">Coverent</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class=" row result" data-jk="30ca570905ea778a" data-tn-component="organicJob" id="p_30ca570905ea778a" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_30ca570905ea778a">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=30ca570905ea778a&amp;fccid=4e041af1d0af1bc8" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[8],true,0);" onmousedown="return rclk(this,jobmap[8],0);" rel="nofollow" target="_blank" title="Data Scientist, Junior"><b>Data</b> <b>Scientist</b>, Junior</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=30ca570905ea778a&amp;jcid=4e041af1d0af1bc8')" target="_blank">
        Booz Allen Hamilton</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Booz-Allen-Hamilton/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Data+Scientist%2C+Junior&amp;fromjk=30ca570905ea778a&amp;jcid=4e041af1d0af1bc8');" target="_blank" title="Booz Allen Hamilton reviews">
<span class="ratings"><span class="rating" style="width:44.4px"><!-- --></span></span>
<span class="slNoUnderline">1,096 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">Arlington, VA</span></span></span>
 <a class="more_loc" href="/jobs?q=data+scientist+%2420%2C000&amp;rbt=Data+Scientist%2C+Junior&amp;rbc=Booz+Allen+Hamilton&amp;jtid=59131814e02ee563&amp;jcid=4e041af1d0af1bc8&amp;grp=tcl" onmousedown="ptk('addlloc');" rel="nofollow">+5 locations</a><table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
<b>Data</b> <b>Scientist</b>, Junior. Apply expertise in <b>data</b> mining and analysis to explore <b>data</b> from disparate sources and discover patterns and previously hidden insights...</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">8 days ago</span> <span class="tt_set" id="tt_set_8">  -  <a class="sl resultLink save-job-link " href="#" id="sj_30ca570905ea778a" onclick="changeJobState('30ca570905ea778a', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_8" onclick="toggleMoreLinks('30ca570905ea778a'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_30ca570905ea778a" style="display:none;"></div><script>window['result_30ca570905ea778a'] = {"showSource": false, "source": "Booz Allen Hamilton", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "8 days ago","jobKey": "30ca570905ea778a", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 8, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_8" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('30ca570905ea778a'); return false;" title="Close"></a><div class="more_actions" id="more_8"><ul><li><span class="mat">View all <a href="/q-Booz-Allen-Hamilton-l-Arlington,-VA-jobs.html" rel="nofollow">Booz Allen Hamilton jobs in Arlington, VA</a> - <a href="/l-Arlington,-VA-jobs.html">Arlington jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-Arlington-VA" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=30ca570905ea778a&amp;from=serp-more');">Data Scientist salaries in Arlington, VA</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=30ca570905ea778a&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton</a></span></li><li><span class="mat"><a href="/cmp/Booz-Allen-Hamilton/faq" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=30ca570905ea778a&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton questions about work, benefits, interviews and hiring process:</a><ul><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-the-working-hours?quid=1an7bo61vb822fp6" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=30ca570905ea778a&amp;jcid=4e041af1d0af1bc8');">How are the working hours?</a></li><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-they-about-telecommuting?quid=1b62s7nfubvu9af0" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=30ca570905ea778a&amp;jcid=4e041af1d0af1bc8');">How are they about telecommuting?</a></li></ul></span></li><li><span class="mat">Related forums: <a href="/forum/cmp/Booz-Allen-Hamilton.html">Booz Allen Hamilton</a> - <a href="/forum/loc/Arlington-Virginia.html">Arlington, Virginia</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div class="lastRow row result" data-jk="eef5e4f2c23117b2" data-tn-component="organicJob" id="p_eef5e4f2c23117b2" itemscope="" itemtype="http://schema.org/JobPosting">
<h2 class="jobtitle" id="jl_eef5e4f2c23117b2">
<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=eef5e4f2c23117b2&amp;fccid=4e041af1d0af1bc8" itemprop="title" onclick="setRefineByCookie(['salest']); return rclk(this,jobmap[9],true,0);" onmousedown="return rclk(this,jobmap[9],0);" rel="nofollow" target="_blank" title="Data Scientist / Intelligence Analyst"><b>Data</b> <b>Scientist</b> / Intelligence Analyst</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=eef5e4f2c23117b2&amp;jcid=4e041af1d0af1bc8')" target="_blank">
        Booz Allen Hamilton</a></span>
</span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Booz-Allen-Hamilton/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Data+Scientist+%5C%2F+Intelligence+Analyst&amp;fromjk=eef5e4f2c23117b2&amp;jcid=4e041af1d0af1bc8');" target="_blank" title="Booz Allen Hamilton reviews">
<span class="ratings"><span class="rating" style="width:44.4px"><!-- --></span></span>
<span class="slNoUnderline">1,096 reviews</span></a>
 - <span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"><span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">McLean, VA</span></span></span>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
<b>Data</b> <b>Scientist</b>/Intelligence Analyst. Experience with SPSS, SAS, Python, and other <b>data</b> science tools. Comprehend <b>data</b> science tools, including SAS, SPSS, and...</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class="date">6 days ago</span> <span class="tt_set" id="tt_set_9">  -  <a class="sl resultLink save-job-link " href="#" id="sj_eef5e4f2c23117b2" onclick="changeJobState('eef5e4f2c23117b2', 'save', 'linkbar', false); return false;" title="Save this job to my.indeed">save job</a> - <a class="sl resultLink more-link " href="#" id="tog_9" onclick="toggleMoreLinks('eef5e4f2c23117b2'); return false;">more...</a></span><div class="edit_note_content" id="editsaved2_eef5e4f2c23117b2" style="display:none;"></div><script>window['result_eef5e4f2c23117b2'] = {"showSource": false, "source": "Booz Allen Hamilton", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","relativeJobAge": "6 days ago","jobKey": "eef5e4f2c23117b2", "myIndeedAvailable": true, "showMoreActionsLink": true, "resultNumber": 9, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": true, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : false,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": false, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="more-links-container result-tab" id="tt_display_9" style="display:none;"><a class="close-link closeLink" href="#" onclick="toggleMoreLinks('eef5e4f2c23117b2'); return false;" title="Close"></a><div class="more_actions" id="more_9"><ul><li><span class="mat">View all <a href="/q-Booz-Allen-Hamilton-l-McLean,-VA-jobs.html" rel="nofollow">Booz Allen Hamilton jobs in McLean, VA</a> - <a href="/l-McLean,-VA-jobs.html">McLean jobs</a></span></li><li><span class="mat">Salary Search: <a href="/salaries/Data-Scientist-Salaries,-McLean-VA" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=serp-more&amp;fromjk=eef5e4f2c23117b2&amp;from=serp-more');">Data Scientist salaries in McLean, VA</a></span></li><li><span class="mat">Learn more about working at <a href="/cmp/Booz-Allen-Hamilton" onmousedown="this.href = appendParamsOnce(this.href, '?fromjk=eef5e4f2c23117b2&amp;from=serp-more&amp;campaignid=serp-more&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton</a></span></li><li><span class="mat"><a href="/cmp/Booz-Allen-Hamilton/faq" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=eef5e4f2c23117b2&amp;jcid=4e041af1d0af1bc8');">Booz Allen Hamilton questions about work, benefits, interviews and hiring process:</a><ul><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-the-working-hours?quid=1an7bo61vb822fp6" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=eef5e4f2c23117b2&amp;jcid=4e041af1d0af1bc8');">How are the working hours?</a></li><li><a href="/cmp/Booz-Allen-Hamilton/faq/how-are-they-about-telecommuting?quid=1b62s7nfubvu9af0" onmousedown="this.href = appendParamsOnce(this.href, '?from=serp-more&amp;campaignid=serp-more&amp;fromjk=eef5e4f2c23117b2&amp;jcid=4e041af1d0af1bc8');">How are they about telecommuting?</a></li></ul></span></li><li><span class="mat">Related forums: <a href="/forum/loc/Mclean-Virginia.html">Mclean, Virginia</a> - <a href="/forum/cmp/Booz-Allen-Hamilton.html">Booz Allen Hamilton</a></span></li></ul></div></div><div class="dya-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
<div class="sign-in-container result-tab"></div>
<div class="notes-container result-tab"></div>
</div>
</td>
</tr>
</table>
</div>
<div></div>
<div class="WubEucQIlF hkvp2VD">
<div class="row result" data-jk="4e5024fca7b26984" id="pj_4e5024fca7b26984">
<!-- Previously this variable was used to indicate job board jobs, we have replaced that with a more accurate source type check -->
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0DZbghDog6eAFViHltrtimeH7IE6OveQ-GQud_7SkddKUVOjea9zU8iES2CM_birFXBaZvdQZKAe2UoT-TSyJU8X3SPTcTBwER6QFmkbdOVYRDj978fPAq4iZozdRVxrHoLzQCM3u38aVfoZ5XgeuSo8RuFIq5cem2fBru-RZVpa3E2mlqEhY2By7BGs9ArasccibIR6H11jmhYXCzU5u2Ie5sYAr864e7ICxnmEscB-xYmeaSE97VuxzOhuU_8RgUqiZJWpAoUdBwtr4-Crie7fIFulNwTfyx5yHEFu-ntoiFayP_tDCD_mFfld9O_LGwh6yb1ObSK_SS_kZlfP9OLI9nH66gCpEQDGjrsKz5GJBe22t2VQSZtqDqAVD-z-kZxycn_CGaGQuAb67wX-FjaYVMs_eBWfC5cRLuvyadc2OE3hVqAB-4cOvN45oSFfvYgj6USjRMMRAB0glDsLtTWrfGQRpkso4u6rch8uhSaEemiLIiF9warUnNGf7nZP6HVg5wXaj6Q3jxzYAiNC3LfCCjRdlFBMnjIkOgeX8VTYpKPsRbo9_DWWXc2Sj2SPx6iL8O9Uu0RRRrapaQR5IbTY2duzZD9LIg-TB1uMPYNIctUk1SJ-BiFa9Cr0GCgQHc1rpunQh2_5tCfOvWrHEyv3uusDaXPmAkUjBLxmSHIOuhryDOpCYSJ1aDFRRXci6v0lYa1WRCtFGDZMNOVDLubKRoSqmE2zNg_wstCLW987iKglVf5RzMQVu9dP2dM2VWuvr6ddW42yTpR1YZf4XqgqTluYh2kLcO-MNdQS8wm64ZvO2ReGaf58KTSDPHk7RZb5kmUZmkEpzoO8-8cJmjVEpOUTEsHhsJDqE0-faUr1w==&amp;p=4&amp;sk=&amp;fvj=0" id="sja4" onclick="setRefineByCookie(['salest']); sjoc('sja4',0); convCtr('SJ', pingUrlsForGA)" onmousedown="sjomd('sja4'); clk('sja4');" rel="nofollow" target="_blank" title="Data Scientist"><b>Data</b> <b>Scientist</b></a>
<br/>
<div class="sjcl">
<span class="company">
<a class="turnstileLink" data-tn-element="companyName" href="/cmp/Novetta" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=4e5024fca7b26984&amp;jcid=571a41e861ff0130')" target="_blank">
        Novetta</a></span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Novetta/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Data+Scientist&amp;fromjk=4e5024fca7b26984&amp;jcid=571a41e861ff0130');" target="_blank" title="Novetta reviews">
<span class="ratings"><span class="rating" style="width:39.0px"><!-- --></span></span>
<span class="slNoUnderline">11 reviews</span></a>
 - <span class="location">Crystal City, VA</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tr><td class="snip">
<span class="summary">Our customer uses their <b>data</b> warehouse as the foundation for a variety of applications and analytical workflows supporting intelligence analysis, threat tiering...</span>
</td></tr></table>
<div class="sjCapt">
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" jasx_serpsjlabel_poststGray ">Sponsored</span> - <span class="tt_set" id="tt_set_13"><a class="sl resultLink save-job-link " href="#" id="sj_4e5024fca7b26984" onclick="changeJobState('4e5024fca7b26984', 'save', 'linkbar', true); return false;" title="Save this job to my.indeed">save job</a></span><div class="edit_note_content" id="editsaved2_4e5024fca7b26984" style="display:none;"></div><script>window['sj_result_4e5024fca7b26984'] = {"showSource": false, "source": "Novetta", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","jobKey": "4e5024fca7b26984", "myIndeedAvailable": true, "showMoreActionsLink": false, "resultNumber": 13, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": false, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : true,"showSponsor" : true,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": true, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
<div class="row sjlast result" data-jk="0bb9716efd7444ed" id="pj_0bb9716efd7444ed">
<!-- Previously this variable was used to indicate job board jobs, we have replaced that with a more accurate source type check -->
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0Aq0rKIZcruiUNua2xSU7SkhpNUzYKIg2AxYEClhfFPH7k5YmpKoBcdllH6Z-vCI38aoVnJ8FfHh3U-fnW0gDNFo2Be7V7yt0zLsCS3nIS_xksJNV_2pAhWp0_smygQUBHJIsYirq_1u885mYfkb4oUmlDpdh4WXoanGHY2FY95sIy5qDphOEJZscA6ikfKdH-mBPqyMtqSfPW-e3hV1UKgrNoh955Bqwo6j4BQaipITBup1oqYV15CC38YwjWJEtHIiwV-F-oXdsSUY3NHzZrG7YROsZ1UDwIhwMP-j1FubYJRI56Mnoafdsl9fM1J57hrvLR_TqimKdX5NdAYHLGYKhiRmMacuOUAhkpRQW-wCPqm28UkqqVJ2c8jKUL76hWqmXZ8olsz4q5y7ijk__xiF3tULBeLNm_11cJY0q9U38Lhvs7TicB0k3oYncWLQvwqRgP6XnNEtpc0XFGzoQ_stmgUNBSrU2nJNp7sO3BLr_NPC71TsnfRinKk6KcIScKFmx2RbwcUrLg7hqPWCXyYvY2gahvEN1XtcTflHTrcHaLdGJE9js8J0WDXL8rFJ4239E3uT1msAQoj1tYczLrC7y7b4_O3kU4yD9N3m43u84iEDtMVYABuuZire2ICfzE=&amp;p=5&amp;sk=&amp;fvj=0" id="sja5" onclick="setRefineByCookie(['salest']); sjoc('sja5',0); convCtr('SJ', pingUrlsForGA)" onmousedown="sjomd('sja5'); clk('sja5');" rel="nofollow" target="_blank" title="Director of Analytics">Director of Analytics</a>
<br/>
<div class="sjcl">
<span class="company">
<a class="turnstileLink" data-tn-element="companyName" href="/cmp/Dentaquest" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=0bb9716efd7444ed&amp;jcid=e51726f3f862bbf5')" target="_blank">
        DentaQuest</a></span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Dentaquest/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Director+of+Analytics&amp;fromjk=0bb9716efd7444ed&amp;jcid=e51726f3f862bbf5');" target="_blank" title="Dentaquest reviews">
<span class="ratings"><span class="rating" style="width:41.4px"><!-- --></span></span>
<span class="slNoUnderline">87 reviews</span></a>
 - <span class="location">Boston, MA</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tr><td class="snip">
<span class="summary">Mentor analysts and <b>data</b> <b>scientists</b> in <b>data</b> discovery, modeling efforts, presentations, and communications with external sponsors....</span>
</td></tr></table>
<div class="sjCapt">
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" jasx_serpsjlabel_poststGray ">Sponsored</span> - <span class="tt_set" id="tt_set_14"><a class="sl resultLink save-job-link " href="#" id="sj_0bb9716efd7444ed" onclick="changeJobState('0bb9716efd7444ed', 'save', 'linkbar', true); return false;" title="Save this job to my.indeed">save job</a></span><div class="edit_note_content" id="editsaved2_0bb9716efd7444ed" style="display:none;"></div><script>window['sj_result_0bb9716efd7444ed'] = {"showSource": false, "source": "DentaQuest", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","jobKey": "0bb9716efd7444ed", "myIndeedAvailable": true, "showMoreActionsLink": false, "resultNumber": 14, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": false, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : true,"showSponsor" : true,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": true, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
</div><p class="footerJaPromoBubble">
<span>
Never miss a job. Get new jobs emailed to you daily.</span>
<script type="text/javascript">setJaPromoCookie()</script>
</p>
<style>
.fixed_ja_link_x { color:#00C; float: right; font-size: 22px; margin-top: -8px; font-weight: bold;}
.fixed_ja_link_x:hover { cursor:pointer;}
.fixed_ja_link { color:#00C; cursor: pointer;}
.fixed_ja_link:hover { text-decoration: underline;}
.fixed_ja_background { background-color: #ebebeb;}
</style>
<div id="bjobalertswrapper">
<div class="open jaPromoUi jaui aAfrMEJYnZ" id="bjobalerts">
<div class="jobalertlabel">
<span class="jobalerts_title" id="bjobalertlabel"><span aria-label="alert icon" class="ico" role="img"></span>Be the first to see new <b>data scientist $20,000 jobs</b></span>
</div>
<div class="jaform" id="bjobalertform">
<span id="bjobalerttext"></span><span id="bjobalertsending"></span>
<div id="bjobalertmessage">
<form action="/alert" method="POST" onsubmit="return addalertdelegate('data+scientist+%2420%2C000','','b','',this.email.value,'1bhdgud8k18jh5sa', this.verified.value, true, '661982', 'US', 'e6fefc8bfdb8e9fca4f3fc974a0f1415', this.recjobalert.checked, false, false);">
<input name="a" type="hidden" value="add"/>
<input name="q" type="hidden" value="data scientist $20,000"/>
<input name="l" type="hidden" value=""/>
<input name="radius" type="hidden" value="25"/>
<input name="noscript" type="hidden" value="1"/>
<input name="fr" type="hidden" value="b"/>
<input name="tk" type="hidden" value="1bhdgud8k18jh5sa"/>
<input id="balertverified" name="verified" type="hidden" value="0"/>
<input name="alertparams" type="hidden" value=""/>
<label for="balertemail">My email:</label> <input id="balertemail" maxlength="100" name="email" size="25" type="text" value=""/>
<span class="indeed-apply-button"><span class="indeed-apply-button-inner"><input class="indeed-apply-button-label" id="balertsubmit" type="submit" value="Activate"/></span></span>
<style type="text/css">
    .indeed-apply-button { cursor : pointer !important; display : inline-block !important; padding : 1px !important; height : 31px !important; -moz-border-radius : 7px !important; border-radius : 7px !important; position : relative !important; text-decoration : none !important;background-color:#79788B; filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#BCBBCD', endColorstr='#79788B', GradientType=0);background-image: -webkit-gradient(linear, center top, center bottom, from(#BCBBCD), to(#79788B)) !important;background-image: -webkit-linear-gradient(top, #BCBBCD, #79788B) !important;background-image: -moz-linear-gradient(top, #BCBBCD, #79788B) !important;background-image: -o-linear-gradient(top, #BCBBCD, #79788B) !important;background-image: -ms-linear-gradient(top, #BCBBCD, #79788B) !important;background-image: linear-gradient(top, #BCBBCD, #79788B) !important;-webkit-box-shadow: 0 1px 2px rgba(0,0,0,0.2) !important;-moz-box-shadow: 0 1px 2px rgba(0,0,0,0.2) !important;box-shadow: 0 1px 2px rgba(0,0,0,0.2) !important; } #indeed-ia-1329175190441-0:link, #indeed-ia-1329175190441-0:visited, #indeed-ia-1329175190441-0:hover, #indeed-ia-1329175190441-0:active { border : 0 !important; text-decoration : none !important; }

    .indeed-apply-button:hover { filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#6D99F6', endColorstr='#1B45A3', GradientType=0);background-image: -webkit-gradient(linear, center top, center bottom, from(#6D99F6), to(#1B45A3)) !important;background-image: -webkit-linear-gradient(top, #6D99F6, #1B45A3) !important;background-image: -moz-linear-gradient(top, #6D99F6, #1B45A3) !important;background-image: -o-linear-gradient(top, #6D99F6, #1B45A3) !important;background-image: -ms-linear-gradient(top, #6D99F6, #1B45A3) !important;background-image: linear-gradient(top, #6D99F6, #1B45A3) !important; }

    .indeed-apply-state-clicked .indeed-apply-button,
    .indeed-apply-button:active { filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#B3BACA', endColorstr='#7C8493', GradientType=0);background-image: -webkit-gradient(linear, center top, center bottom, from(#B3BACA), to(#7C8493)) !important;background-image: -webkit-linear-gradient(top, #B3BACA, #7C8493) !important;background-image: -moz-linear-gradient(top, #B3BACA, #7C8493) !important;background-image: -o-linear-gradient(top, #B3BACA, #7C8493) !important;background-image: -ms-linear-gradient(top, #B3BACA, #7C8493) !important;background-image: linear-gradient(top, #B3BACA, #7C8493) !important;-webkit-box-shadow: none !important;-moz-box-shadow: none !important;box-shadow: none !important; }

    .indeed-apply-button-inner { display : inline-block !important; height : 31px !important; -moz-border-radius : 6px !important; border-radius : 6px !important; font : 18px 'Helvetica Neue','Helvetica',Arial !important; font-weight : 200 !important; text-decoration : none !important; text-shadow : 0px 1px #F1F1F4 !important;background-color:#D9D9E2;  color: #FF6703;filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#FAFAFB', endColorstr='#D9D9E2', GradientType=0);background-image: -webkit-gradient(linear, center top, center bottom, from(#FAFAFB), to(#D9D9E2)) !important;background-image: -webkit-linear-gradient(top, #FAFAFB, #D9D9E2) !important;background-image: -moz-linear-gradient(top, #FAFAFB, #D9D9E2) !important;background-image: -o-linear-gradient(top, #FAFAFB, #D9D9E2) !important;background-image: -ms-linear-gradient(top, #FAFAFB, #D9D9E2) !important;background-image: linear-gradient(top, #FAFAFB, #D9D9E2) !important; }

    .indeed-apply-button:active .indeed-apply-button-inner { filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#E8E8E9', endColorstr='#CBCBD3', GradientType=0);background-image: -webkit-gradient(linear, center top, center bottom, from(#E8E8E9), to(#CBCBD3)) !important;background-image: -webkit-linear-gradient(top, #E8E8E9, #CBCBD3) !important;background-image: -moz-linear-gradient(top, #E8E8E9, #CBCBD3) !important;background-image: -o-linear-gradient(top, #E8E8E9, #CBCBD3) !important;background-image: -ms-linear-gradient(top, #E8E8E9, #CBCBD3) !important;background-image: linear-gradient(top, #E8E8E9, #CBCBD3) !important; }

    .indeed-apply-button-label {cursor: pointer; text-align : center !important; border:0; background: transparent;font-size: 12px; font-family: Arial, sans-serif; padding:3px 14px 2px 12px; margin:0; line-height: 26px; }

    .indeed-apply-button:active .indeed-apply-button-label,
    .indeed-apply-state-clicked .indeed-apply-button-label { -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=0.75)" !important;filter: alpha(opacity=75) !important;-moz-opacity: 0.75 !important;-khtml-opacity: 0.75 !important;opacity: 0.75 !important; }

    #talertemail, #balertemail {  height: 27px; line-height: 24px; padding-left: 6px; padding-right: 6px; font-size: 14px; font-family: Arial, sans-serif; }

    .jobalertform-terms-outer-wrapper label {  position: fixed;  font-size: 1px;  transform: scale(0.3);  }

    .jobalertform-terms-inner-wrapper {  position: relative;  z-index: 20;  width: 50%;  background: #ebebeb;  height: 12px;  }
</style>
<label for="brecjobalert" id="brecjobalertlabel">
<input checked="" id="brecjobalert" name="recjobalert" type="checkbox"/>
<span>Also get an email with jobs recommended just for me</span>
</label>
</form>
<span class="caption">
You can cancel email alerts at any time.</span>
</div>
</div>
</div>
</div>
<script type="text/javascript">
function ptk(st,p) {
document.cookie = 'PTK="tk=&type=jobsearch&subtype=' + st + (p ? '&' + p : '')
 + (st == 'pagination' ? '&fp=1' : '')
+'"; path=/';
}
</script>
<script type="text/javascript">
function pclk(event) {
var evt = event || window.event;
var target = evt.target || evt.srcElement;
var el = target.nodeType == 1 ? target : target.parentNode;
var tag = el.tagName.toLowerCase();
if (tag == 'span' || tag == 'a') {
ptk('pagination');
}
return true;
}
</script>
<div class="pagination" onmousedown="pclk(event);">Results Page:  <b>1</b>  <a href="/jobs?q=data+scientist+%2420%2C000&amp;start=10&amp;pp=AAoAAAAAAAAAAAAAAAEO1QtuAQAcV0On5GRNuVNfX3DvLB5DwDlHAQmhXt-__3zV9DuO"><span class="pn">2</span></a>  <a href="/jobs?q=data+scientist+%2420%2C000&amp;start=20&amp;pp=ABQAAAAAAAAAAAAAAAEO1QtuAQEBCmPGKhMQz85u5ISpEyEClAmlkC1p_zD1bViFQqk6f2dAEcrutDw3qNOlUeONuRrSv-w"><span class="pn">3</span></a>  <a href="/jobs?q=data+scientist+%2420%2C000&amp;start=30&amp;pp=AB4AAAAAAAAAAAAAAAEO1QtuAQEBCwEqNNi2AUithQkYTYojy-HvMvwqEDUkvfiqQVsmithQDsF9DWfzE3wpw3jyElTgX_l8Ur_XotxivPhaeIf6Uj2xG-0"><span class="pn">4</span></a>  <a href="/jobs?q=data+scientist+%2420%2C000&amp;start=40&amp;pp=ACgAAAAAAAAAAAAAAAEO1QtuAQEBDRH4pEot9_jiNLlFyah46PnzjpM5CVwAr20s0jCBXEFC6W8m-LcYUfIh2wdhsWVW1oSf-nPV7wZ8ufw520G_C6EQ3nL6dg1GML-491chs_orYNZX"><span class="pn">5</span></a>   <a href="/jobs?q=data+scientist+%2420%2C000&amp;start=10&amp;pp=AAoAAAAAAAAAAAAAAAEO1QtuAQAcV0On5GRNuVNfX3DvLB5DwDlHAQmhXt-__3zV9DuO"><span class="pn"><span class="np">Next »</span></span></a></div>
</td>
time: 46.4 ms
  • We should grab our data listing by listing as opposed to scraping all jobs, then all companies, etc.. for the entire page. This will ensure that missing data won't throw off which values should belong to which job.

  • To do this, we can make a list of each job posting because they're all in div tags with a class that ends in 'result.'

  • There are 5 sponsored jobs per page, and these all have funky class name variations. The 10 regular listings also have different class names. The following list contains all possible class names:

    "row result", "row sjlast result", " row result", "lastRow row result"

Using regex will eliminate the need to explicitly search for each of these tags. The following code will account for everything above.

In [5]:
print soup.find('td',{'id':'resultsCol'}).find_all('div', {'class': re.compile("result$")})[0]
<div class="row result" data-jk="bfc578d23604c2fc" id="pj_bfc578d23604c2fc">
<!-- Previously this variable was used to indicate job board jobs, we have replaced that with a more accurate source type check -->
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0D_L61JJZVH4SBayrvFEFSIDhxtpSFhtUBfRgL_yS-y4KQnwxgyzWhCsPoBnwyjC7i1224RkXyNjCKyMmAnielysSbfAScZoI_OgdN6cH2LUiHe9CeKg9jliNU9_-djYbNyJ2RAQGYO0xiVwMIc48Gv2SU4pEDHN7m15eB7x73iTEBPM8wde0m5wE7qft5c1dgVbTtYuirrJ2MXi9rtgLqz_3z08os3klbEk0GHijoK6eLDwntLPeFfr_xOkMIszVbY7ND_um_nHwqIVA9lBSsA7fBd3NYR17wN-15QAauVNP4tWBdZdpy033Nv_vgEHgH6O9Hz3KRwdtj2ZeLqnT1meQmHWf-Gnbw41IN0pJwRzm0hokMz9Q-UuyORdDuPt3-5X7v9N3IHUaeM7qKkWprW1kxiCaoU9Pefy2_xxl35xabC-xqCbXAyXSf2xvRndf9KyuYie6AYtnBm6V22pbM65uMS7kdZM0yFQhHpIG2GnKKG4vBrBxpn3yUYO7M8kX8kivKGnkynA9NIY8D1q_-hoTWIEn7l245CVw6AmDCRFdFTaqg8vqFCxXlxKvyk3UbduU1EpGT66HM8X5TebNvOoZUzvecR4zbrctZR8KUXa_3_xuhhS692adF_k0TENlH3kKF9ciTIzZh5WXG3cVszyFJf-wDh3E5ZxqeivjbXE-DXY-1Re1Wz_kWb1BfRrMdfigQuItMUd3ZOj9v0yLbtWhJDA_uE1nBn-A3z4mF6pNkFRIqLFE31v8QYj50yhtF8NSR1ny-YxXwMfJYzHNmbWKTtRloQzpHG6kyCOyGEW8FxlNBvJk49yVGY0YcbGZ36RRm45Ek6QLsxt2D0tS8ZtkK1mZyhyq_vAnRLe41m367xi--p8wKwsk_ntnHPnVUW8AEUTJ91pycaiqK5-nLPm2blu-1HbbqL8lMtnEghTNUOD5kosJwFtoyQFF4SLf9Oh1xwepQhqEEFffPjbG007k1VjztVdhNlOEn3Y-nQX4OEdHQfagG7RnTDoi_JyjUQGlR7sBLb-exQSCcJrExd&amp;p=1&amp;sk=&amp;fvj=0" id="sja1" onclick="setRefineByCookie(['salest']); sjoc('sja1',0); convCtr('SJ', pingUrlsForGA)" onmousedown="sjomd('sja1'); clk('sja1');" rel="nofollow" target="_blank" title="Statistical Modeling Analyst">Statistical Modeling Analyst</a>
<br/>
<div class="sjcl">
<span class="company">
<a class="turnstileLink" data-tn-element="companyName" href="/cmp/Nestle-USA" onmousedown="this.href = appendParamsOnce(this.href, 'from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491')" target="_blank">
        Nestle USA</a></span>

 - <a class="turnstileLink slNoUnderline " data-tn-element="reviewStars" data-tn-variant="cmplinktst2" href="/cmp/Nestle-USA/reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Statistical+Modeling+Analyst&amp;fromjk=bfc578d23604c2fc&amp;jcid=bb384ca0a6d3d491');" target="_blank" title="Nestle USA reviews">
<span class="ratings"><span class="rating" style="width:43.8px"><!-- --></span></span>
<span class="slNoUnderline">724 reviews</span></a>
 - <span class="location">Arlington, VA</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tr><td class="snip">
<span class="summary">Demand signal <b>data</b> will be incorporated into forecast reporting and utilized for statistical modeling. Participate in demand signal <b>data</b> reporting and analysis...</span>
</td></tr></table>
<div class="sjCapt">
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" jasx_serpsjlabel_poststGray ">Sponsored</span> - <span class="tt_set" id="tt_set_10"><a class="sl resultLink save-job-link " href="#" id="sj_bfc578d23604c2fc" onclick="changeJobState('bfc578d23604c2fc', 'save', 'linkbar', true); return false;" title="Save this job to my.indeed">save job</a></span><div class="edit_note_content" id="editsaved2_bfc578d23604c2fc" style="display:none;"></div><script>window['sj_result_bfc578d23604c2fc'] = {"showSource": false, "source": "Nestle USA", "loggedIn": false, "showMyJobsLinks": false,"undoAction": "unsave","jobKey": "bfc578d23604c2fc", "myIndeedAvailable": true, "showMoreActionsLink": false, "resultNumber": 10, "jobStateChangedToSaved": false, "searchState": "q=data scientist $20,000&amp;", "basicPermaLink": "http://www.indeed.com", "saveJobFailed": false, "removeJobFailed": false, "requestPending": false, "notesEnabled": false, "currentPage" : "serp", "mjwebtransgroupactive" : false, "sponsored" : true,"showSponsor" : true,"reportJobButtonEnabled": false, "showMyJobsHired": false, "showSaveForSponsored": true, "showJobAge": true};</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
time: 70.3 ms
  • In order to make predictions about salaries, I want to collect the job title, company, location, description, and most importantly, SALARY.
  • Create functions for finding each attribute for each job in the list, so that the overall webscraping script is easy to read
In [6]:
'''
Each function finds the specific HTML tag and/or attribute/ID 
that leads directly to the information needed.

In order to prevent errors, each has a 'try, except' statement.
Whenever a listing doesn't provide anything for any of the attributes,
we will simply 'pass' and essentially skip the attribute.
'''


def get_job(webpage):
    tag = webpage.find('a', title=True, attrs={'data-tn-element':'jobTitle'})
    try:
        return tag['title']
    except:
        pass

def get_company(webpage):
    tag = webpage.find('span', attrs={'class':'company'})
    try:
        return tag.text.strip('\n') # many company names are messy and have newline code that we can strip
    except:
        pass
    
def get_location(webpage):
    tag = webpage.find('span', attrs={'class':'location'})
    try:
        return tag.text
    except:
        pass    

def get_salary(webpage):
    try:
        return webpage.find('table').tr.td.nobr.renderContents() ## for regular listings
    except:
        try:
            return webpage.find('div').div.text ## for sponsored listings
        except:
            pass

def get_description(webpage):
    description = webpage.find('span', attrs={'itemprop':"description"})
    try:
        return description.text.strip('\n') # many descriptions have the newline code that can be stripped
    except:
        pass
time: 31.8 ms
  • I found that some numbers are in a string and we can't just convert to a number with pandas. Below is a function to handle this:
In [5]:
## given a string of a number with commas, convert to float
def str_to_number(string):
    import locale 
    string = string.strip('$')
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') #for american comma notation
    num = locale.atof(string)
    return float(num)
time: 8.39 ms

Step II. Scrape job postings from Indeed.com & Clean

The general outline of my scraper function is as follows:

  • (for all runs post initial run) compile all previously collected data into a pandas dataframe
  • Check the main page for the total number of listings, so the scraper knows how long to run
  • For each page and for each job listing, use the attribute functions
  • #### Spoiler!! The following function scrapes the job postings, puts them into a Pandas dataframe, then exports them to a csv file.
  • After running the scraper a few times, I decided that I wanted to begin the function by loading each previously exported list of jobs into a dataframe and try to only add new jobs that haven't already been exported.
In [6]:
## import the results that have been previously exported
def compile_files():
    import glob
    import pandas as pd
    import numpy as np
    indeed_csvs =  '../../DC-DSI4/projects/03-Project/indeed/'
    files = glob.glob(indeed_csvs + '*.csv') # get a list of the csv files
    indeed_final = pd.DataFrame(columns=['job','company','location','salary','description'])
    for f in files: # read each csv file in
        f = pd.read_csv(f, names=['job','company','location','salary','description'],low_memory=False)
        indeed_final = indeed_final.append(f)
    # drop duplicates, get rid of jobs without salaries
    indeed_final.drop_duplicates(inplace=True)
    indeed_final.dropna(subset=['salary'],how='any',inplace=True)

    print 'Size = ',len(indeed_final)
    print 'Salaries = ', len(indeed_final[indeed_final.salary.notnull()])

    return indeed_final
time: 20.5 ms

Without further ado, the scraping function! This includes everything noted in Step 1.

In [9]:
def scrape_indeed():
    
    '''
    /// 1/6. Compile previously scraped results to see if there are new jobs to add \\\
    '''
    indeed = compile_files()  
    indeed.reset_index(drop=True,inplace=True)
    base = len(indeed)
    
    
    
    '''
    /// 2/6. Set up function with necessary libraries, print status notifications, set URL format \\\
    '''
    import requests
    from bs4 import BeautifulSoup
    import datetime
    import time
    import re
    import numpy as np
    
    start = datetime.datetime.now()
    
    print 'Start time: ',start.strftime("%Y-%m-%d %H:%M:%S")
    print 'Base file has ', base, ' records'
    print 'Salaries = ', len(indeed[indeed.salary.notnull()])
    
    ## specify structure of URL. stop at 'start=' so that we can 
    ## dynamically change the URL and flip through pages of results
    url_base = "http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&fromage=last&start="
    

    
    '''
    /// 3/6. Find the total number of job listings for the search from the first page of results \\\
    '''
    x=0 # if we set the start variable in the URL to 0 to begin with, it will pull up results 1-10
    url = url_base+str(x) 

    page = requests.get(url).content
    soup = BeautifulSoup(page,'lxml')
    print 'Page scraped & souped'
    
    for results in soup.find('div', attrs={'id':'searchCount'}):
        count = str(results).split()         # take the full line that says 'Jobs x to y of z' and turn into a list
        total = count[len(count)-1]          # set total to z, the total number of results
        total = str_to_int(total) # since there are commas in numbers > 999, this function will deal with that and convert to int
        ## I've found that after a few hundred pages, no new jobs are found
        ## If we reduce the total by a bit, we can make sure the function isn't running
        ## for an unnecessary amount of time and making an unnecessary number of requests to Indeed.com
        total = total/5
                
        
    '''
    /// 4/6. Scrape!!! \\\
    '''
    while x <= total: 
        url_new_page = url + str(x)
        page = requests.get(url_new_page).content
        soup = BeautifulSoup(page,"lxml")
        
        # the top of the page says 'Showing results X through Y of Z.'
        # use this for status notification
        for num_listings in soup.find('div', attrs={'id':'searchCount'}) :
            num_listings = num_listings.split()[3]
        
        main = soup.find('td',{'id':'resultsCol'})   # limit our searching to solely the results portion of the page
        results = main.find_all('div', {'class': re.compile("result$")}) # create a list consisting only of the 15 results

        for i in range(len(results)):
            job = get_job(results[i])
            company = get_company(results[i])         # put all companies for each posting on curent results page into companies list
            location = get_location(results[i])       # put all locations for each posting on current results page into locations list
            salary = get_salary(results[i])           # put all salaries for each posting on current results page into salaries list
            description = get_description(results[i]) # put all descriptions for each posting on current results page into descriptions list

            add_job = pd.DataFrame([[job, company, location, salary, description]], columns = ['job','company','location','salary','description'])
            a = np.array(add_job)
            # If we don't already have this job, add it
            if (indeed == a).all(1).any() == False:
                indeed = indeed.append(add_job)                


        '''
        /// 5/6. Print status notifications and prep for next page of results
        '''       
        x+=10
        new = len(indeed) - base
        elapsed = datetime.datetime.now() - start
        remaining = total - x
        est_pages = remaining/10
        
        print 'Added ', new, ' jobs-- scraped ',num_listings,' of ', total, ' listings in ', elapsed, '; ', est_pages, ' pages remaining'
    
        time.sleep(0.5)
    
    
    '''
    /// 6/6. CONCLUSION. Print elapsed time and export results to CSV
    '''
    ## set a variable to use in the filename so that
    ## they're always unique and never overwritten
    finish = datetime.datetime.now()
    now = finish.strftime("%Y-%m-%d %H:%M:%S")
    elapsed = finish-start
    print 'Finish time: ',now
    print 'Elapsed: ',elapsed

    # export
    indeed = pd.DataFrame(indeed)
    indeed.drop_duplicates(inplace=True)
    indeed.to_csv('/Users/jennydoyle/Desktop/dsi/DC-DSI4/projects/03-Project/indeed/'+now+'.csv',sep=',', encoding='utf-8')
    return indeed

    print 'Base file has ', len(indeed), ' records'
    print 'Salaries = ', len(indeed[indeed.salary.notnull()])    
time: 212 ms

Load in the the data of scraped salaries

In [7]:
indeed = compile_files()
indeed.reset_index(drop=True,inplace=True)
Size =  864
Salaries =  864
time: 601 ms

Clean, clean, clean

Clean up salaries

In [8]:
import numpy as np

## create a sub-df consisting only of jobs with annual salaries
df=indeed[indeed.salary.notnull()&indeed.salary.str.contains('year')]
df.salary = df.salary.astype(str)
print "Number of annual salaries = ", len(df)

## Some salaries are listed as a range
## turn the salary into a list so we can grab the high and low ends, then average
df['salary_list'] = df.salary.str.split()

mask = df.salary.str.contains('-')
df['low_end'], df['high_end'], df['salary_clean'] = np.NaN, np.NaN, np.NaN
df['low_end'][mask] = map(lambda x: x[0],df.salary_list.loc[mask])
df['high_end'][mask] = map(lambda x: x[2],df.salary_list.loc[mask])
Number of annual salaries =  653
time: 559 ms
In [9]:
# Set the Salary_clean field = first element in salary list (intended to grab salaries that don't list a range)
df.salary_clean[df.salary.notnull()]= [x[0] for x in df.salary_list]
df.salary_clean[df.low_end.notnull()&df.high_end.notnull()] = np.NaN

# convert to numeric so that we can average the ranges
for col in ['salary_clean','low_end','high_end']:
    df[col][df[col].notnull()] = [str_to_number(x) for x in df[col][df[col].notnull()]]

# average out ranges
df.salary_clean[df.salary_clean.isnull()] = (df.low_end + df.high_end) / 2
time: 786 ms

Clean up locations

In [10]:
## remove areas in parentheses
df.location = df.location.str.upper()
df.location = df.location.str.replace('(\((.*?)\))','')
df.location = df.location.str.strip()

## remove zip codes
df.location = df.location.str.replace(r'(\d{5}(\-\d{4})?)$','')
df.location = df.location.str.strip()
time: 239 ms
In [11]:
## create feature with states
df['state'] = df.location.str.findall('\,\s(\D{2})$')
## hmm why is it in a list? take out
df.state = [i[0] if len(i)>0 else None for i in df.state]

## remove state from location
df.location = df.location.str.replace('(\,\s\D{2})$','')
time: 185 ms

Clean up companies

In [12]:
df.company = df.company.str.strip()
df.company = df.company.str.upper()
time: 104 ms

We want to predict a binary variable - whether the salary was low or high. Compute the median salary and create a new binary variable that is true when the salary is high (above the median)

In [13]:
###
### BINARY TARGET FEATURE -- above (1) median or below (0) 
###

df = df[df.salary_clean.notnull()]
median_salary = np.median(df.salary_clean)

# set our binary variable high_salary = 1, then wherever the clean salary is below the median, change to 0
df['high_salary'] = 1
df['high_salary'][df.salary_clean <= median_salary] = 0 
time: 70.1 ms
In [14]:
median_salary, len(df)
Out[14]:
(87048.0, 653)
time: 5.07 ms

Feature Engineering

Now it's time to think about what I can do with my existing features: Job, Company, Location, State, Description. I want to avoid features that would get too specific for the training set and wouldn't apply to the test set. With that said, I will eliminate the Job and Company features because they're not clean and too specific.

My plan is, however, to use job titles to create more general features that I can flag for each job. I also will create a count vectorizer to pull in the descriptions.

Job Title Keywords

In [15]:
df.job = df.job.str.upper()
df['analyst'] = 0
df['analyst'][df.job.str.contains('ANALY')] = 1

df['statistician'] = 0
df['statistician'][df.job.str.contains('STATISTIC')] = 1

df['machine_learning'] = 0
df['machine_learning'][df.job.str.contains('MACHINE')] = 1

df['research'] = 0
df['research'][df.job.str.contains('RESEARCH')] = 1

df['science'] = 0
df['science'][df.job.str.contains('SCIEN')] = 1

df['engineer'] = 0
df['engineer'][df.job.str.contains('ENGIN')] = 1

df['entry_level'] = 0
df['entry_level'][df.job.str.contains('\WI\W')] = 1
df['entry_level'][df.job.str.contains('\WI$')] = 1
df['entry_level'][df.job.str.contains('ENTRY_LEVEL')] = 1
df['entry_level'][df.job.str.contains('1')] = 1

df['mid_level'] = 0
df['mid_level'][df.job.str.contains('MANAGER')] = 1
df['mid_level'][df.job.str.contains('MID_LEVEL')] = 1
df['mid_level'][df.job.str.contains('\WII\W')] = 1
df['mid_level'][df.job.str.contains('\WII$')] = 1
df['mid_level'][df.job.str.contains('2')] = 1
df['mid_level'][df.job.str.contains('ASSISTANT')] = 1

df['senior_level'] = 0
df['senior_level'][df.job.str.contains('\WIII\W')] = 1
df['senior_level'][df.job.str.contains('\WIII$')] = 1
df['senior_level'][df.job.str.contains('3')] = 1
df['senior_level'][df.job.str.contains('SR\W')] = 1
df['senior_level'][df.job.str.contains('SENIOR')] = 1
df['senior_level'][df.job.str.contains('LEAD')] = 1
df['senior_level'][df.job.str.contains('PRINCIPAL')] = 1
df['senior_level'][df.job.str.contains('DIRECTOR')] = 1
time: 1.38 s

Job Description - Count Vectorizer

In [16]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

df.reset_index(inplace=True)

df.fillna('None',inplace=True)

cvec = CountVectorizer(stop_words='english')
cvec.fit(df['description'])
Out[16]:
CountVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
        dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words='english',
        strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
        tokenizer=None, vocabulary=None)
time: 1.34 s
In [18]:
cvec_table  = pd.DataFrame(cvec.transform(df['description']).todense(),
             columns=cvec.get_feature_names())
time: 103 ms
In [19]:
from sklearn.preprocessing import StandardScaler 

scale_me = []
dont_scale_me = []

for col in cvec_table.columns:
    if cvec_table[col].max() > 1:
        scale_me.append(col)
    else:
        dont_scale_me.append(col)
        

ss = pd.DataFrame(StandardScaler().fit_transform(cvec_table[scale_me]),columns=scale_me)
cvec = pd.merge(cvec_table[dont_scale_me],ss,right_index=True, left_index=True)
time: 654 ms
In [20]:
include = df.columns.drop(['location','state','index','job','company','salary','description','salary_list','low_end','high_end','salary_clean','high_salary'])
time: 3.21 ms
In [21]:
new_df = pd.merge(df[include],cvec,right_index=True, left_index=True)
time: 14.2 ms

Step IV. Feature Selection

In [22]:
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split

# set y
y = df.high_salary
time: 358 ms
In [23]:
X = new_df
y = df.high_salary
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
time: 104 ms

Now I'm going to look at RandomForest, ExtraTrees and check out the feature importance. I'll experiment with the number of features to use, and with bagging and boosting. I'll also use these features back on Logistic Regression, since it doesn't have the feature_importances attribute.

In [24]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier, AdaBoostClassifier
time: 713 ms
In [25]:
def test_features(model,X_train,y_train):
    model.fit(X_train,y_train)
    features = pd.DataFrame(sorted(zip(model.feature_importances_,X.columns), key=lambda pair: pair[0], reverse=True),columns=['Importance','Feature'])
    
    for nums in [len(features),500,300,200,100,60,30,20]:   
        X_train_new = X_train[features.Feature[0:nums]]
        s = cross_val_score(model, X_train_new, y_train, cv=5, n_jobs=-1)

        print '0:',nums,'--', s.mean().round(3), s.std().round(3)

    print
    print 'try with Bagging:'
    for nums in [len(features),500,300,200,100,60,30,20]:   
        X_train_new = X_train[features.Feature[0:nums]]
        s = cross_val_score(BaggingClassifier(model), X_train_new, y_train, cv=5, n_jobs=-1)

        print '0:',nums,'--', s.mean().round(3), s.std().round(3)

#     print
#     print 'try with AdaBoost:'
#     for nums in [len(features),500,300,200,100,60,30,20]:   
#         X_train_new = X_train[features.Feature[0:nums]]
#         s = cross_val_score(AdaBoostClassifier(model), X_train_new, y_train, cv=5, n_jobs=-1)

#         print '0:',nums,'--', s.mean().round(3), s.std().round(3)
            
    print
    print 'try with Logistic Regression:'
    for nums in [len(features),500,300,200,100,60,30,20]:   
        X_train_new = X_train[features.Feature[0:nums]]
        s = cross_val_score(LogisticRegression(), X_train_new, y_train, cv=5, n_jobs=-1)

        print '0:',nums,'--', s.mean().round(3), s.std().round(3)
      
    
    print
    print 'try with Logistic Regression & BAGGING:'
    for nums in [len(features),500,300,200,100,60,30,20]:   
        X_train_new = X_train[features.Feature[0:nums]]
        s = cross_val_score(BaggingClassifier(LogisticRegression()), X_train_new, y_train, cv=5, n_jobs=-1)

        print '0:',nums,'--', s.mean().round(3), s.std().round(3)        
time: 57.4 ms
In [26]:
test_features(RandomForestClassifier(),X_train,y_train)   
0: 2120 -- 0.76 0.025
0: 500 -- 0.776 0.038
0: 300 -- 0.771 0.051
0: 200 -- 0.771 0.037
0: 100 -- 0.785 0.049
0: 60 -- 0.78 0.026
0: 30 -- 0.748 0.026
0: 20 -- 0.767 0.034

try with Bagging:
0: 2120 -- 0.778 0.023
0: 500 -- 0.808 0.022
0: 300 -- 0.794 0.03
0: 200 -- 0.799 0.008
0: 100 -- 0.813 0.043
0: 60 -- 0.796 0.036
0: 30 -- 0.771 0.018
0: 20 -- 0.785 0.03

try with Logistic Regression:
0: 2120 -- 0.757 0.028
0: 500 -- 0.785 0.014
0: 300 -- 0.787 0.022
0: 200 -- 0.792 0.028
0: 100 -- 0.78 0.042
0: 60 -- 0.767 0.02
0: 30 -- 0.771 0.022
0: 20 -- 0.741 0.027

try with Logistic Regression & BAGGING:
0: 2120 -- 0.757 0.02
0: 500 -- 0.764 0.021
0: 300 -- 0.778 0.016
0: 200 -- 0.774 0.04
0: 100 -- 0.783 0.034
0: 60 -- 0.771 0.039
0: 30 -- 0.762 0.033
0: 20 -- 0.746 0.019
time: 47.3 s
In [28]:
model = RandomForestClassifier(bootstrap= True, min_samples_leaf= 1, n_estimators= 300, min_samples_split= 2,criterion='entropy',max_features= 3, max_depth= 3)
test_features(model,X_train,y_train)    
0: 2120 -- 0.629 0.028
0: 500 -- 0.762 0.021
0: 300 -- 0.767 0.024
0: 200 -- 0.735 0.037
0: 100 -- 0.725 0.023
0: 60 -- 0.732 0.044
0: 30 -- 0.707 0.043
0: 20 -- 0.703 0.047

try with Bagging:
0: 2120 -- 0.558 0.052
0: 500 -- 0.684 0.066
0: 300 -- 0.712 0.05
0: 200 -- 0.728 0.045
0: 100 -- 0.746 0.034
0: 60 -- 0.728 0.026
0: 30 -- 0.725 0.025
0: 20 -- 0.716 0.038

try with Logistic Regression:
0: 2120 -- 0.757 0.028
0: 500 -- 0.769 0.027
0: 300 -- 0.748 0.031
0: 200 -- 0.746 0.043
0: 100 -- 0.767 0.029
0: 60 -- 0.753 0.026
0: 30 -- 0.737 0.033
0: 20 -- 0.737 0.036

try with Logistic Regression & BAGGING:
0: 2120 -- 0.755 0.026
0: 500 -- 0.755 0.025
0: 300 -- 0.739 0.022
0: 200 -- 0.744 0.016
0: 100 -- 0.757 0.013
0: 60 -- 0.753 0.017
0: 30 -- 0.723 0.033
0: 20 -- 0.728 0.023
time: 12min 50s
In [29]:
test_features(ExtraTreesClassifier(),X_train,y_train)   
0: 2120 -- 0.753 0.016
0: 500 -- 0.789 0.015
0: 300 -- 0.803 0.034
0: 200 -- 0.81 0.04
0: 100 -- 0.785 0.038
0: 60 -- 0.746 0.062
0: 30 -- 0.771 0.04
0: 20 -- 0.744 0.046

try with Bagging:
0: 2120 -- 0.783 0.015
0: 500 -- 0.801 0.052
0: 300 -- 0.808 0.049
0: 200 -- 0.801 0.043
0: 100 -- 0.806 0.043
0: 60 -- 0.794 0.028
0: 30 -- 0.764 0.046
0: 20 -- 0.753 0.036

try with Logistic Regression:
0: 2120 -- 0.757 0.028
0: 500 -- 0.785 0.03
0: 300 -- 0.787 0.017
0: 200 -- 0.792 0.028
0: 100 -- 0.794 0.049
0: 60 -- 0.794 0.037
0: 30 -- 0.787 0.029
0: 20 -- 0.769 0.029

try with Logistic Regression & BAGGING:
0: 2120 -- 0.746 0.024
0: 500 -- 0.764 0.032
0: 300 -- 0.753 0.031
0: 200 -- 0.76 0.047
0: 100 -- 0.774 0.056
0: 60 -- 0.785 0.042
0: 30 -- 0.783 0.038
0: 20 -- 0.76 0.039
time: 59.3 s

Let's check out feature importances for other classifiers

In [30]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB


names = ["Nearest Neighbors", 
         "Linear SVM",
         "RBF SVM",
         "Decision Tree",
         "Naive Bayes"]
classifiers = [
    KNeighborsClassifier(5),
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    GaussianNB()]

# a few of these classifiers don't have the feature_importances attribute, so we will use these features for them
model = RandomForestClassifier().fit(X_train,y_train)
features = pd.DataFrame(sorted(zip(model.feature_importances_,X.columns), key=lambda pair: pair[0], reverse=True),columns=['Importance','Feature'])
features = features[features.Importance>0]

for clf,name in zip(classifiers,names):
    print name
    print '=================='
    try:
        test_features(clf,X_train,y_train)
    except:
        for nums in [len(features),500,300,200,100,60,30,20]:   
            X_train_new = X_train[features.Feature[0:nums]]
            s = cross_val_score(clf, X_train_new, y_train, cv=5, n_jobs=-1)

            print '0:',nums,'--', s.mean().round(3), s.std().round(3)

    print
    print
Nearest Neighbors
==================
0: 695 -- 0.666 0.051
0: 500 -- 0.664 0.047
0: 300 -- 0.643 0.022
0: 200 -- 0.657 0.031
0: 100 -- 0.673 0.032
0: 60 -- 0.73 0.025
0: 30 -- 0.73 0.048
0: 20 -- 0.712 0.038


Linear SVM
==================
0: 695 -- 0.734 0.043
0: 500 -- 0.744 0.025
0: 300 -- 0.744 0.03
0: 200 -- 0.732 0.046
0: 100 -- 0.739 0.02
0: 60 -- 0.739 0.007
0: 30 -- 0.742 0.025
0: 20 -- 0.739 0.029


RBF SVM
==================
0: 695 -- 0.579 0.016
0: 500 -- 0.579 0.016
0: 300 -- 0.583 0.021
0: 200 -- 0.581 0.021
0: 100 -- 0.577 0.029
0: 60 -- 0.62 0.035
0: 30 -- 0.652 0.03
0: 20 -- 0.641 0.056


Decision Tree
==================
0: 2120 -- 0.705 0.031
0: 500 -- 0.721 0.029
0: 300 -- 0.723 0.034
0: 200 -- 0.723 0.038
0: 100 -- 0.728 0.041
0: 60 -- 0.728 0.041
0: 30 -- 0.73 0.041
0: 20 -- 0.737 0.041

try with Bagging:
0: 2120 -- 0.721 0.05
0: 500 -- 0.739 0.033
0: 300 -- 0.739 0.03
0: 200 -- 0.748 0.027
0: 100 -- 0.748 0.032
0: 60 -- 0.746 0.04
0: 30 -- 0.744 0.043
0: 20 -- 0.719 0.031

try with Logistic Regression:
0: 2120 -- 0.757 0.028
0: 500 -- 0.776 0.031
0: 300 -- 0.749 0.046
0: 200 -- 0.753 0.049
0: 100 -- 0.749 0.048
0: 60 -- 0.751 0.038
0: 30 -- 0.742 0.048
0: 20 -- 0.726 0.049

try with Logistic Regression & BAGGING:
0: 2120 -- 0.739 0.02
0: 500 -- 0.778 0.031
0: 300 -- 0.749 0.046
0: 200 -- 0.753 0.032
0: 100 -- 0.758 0.031
0: 60 -- 0.753 0.035
0: 30 -- 0.751 0.036
0: 20 -- 0.735 0.031


Naive Bayes
==================
0: 2120 -- 0.572 0.021
0: 500 -- 0.609 0.041
0: 300 -- 0.572 0.034
0: 200 -- 0.698 0.009
0: 100 -- 0.705 0.054
0: 60 -- 0.694 0.068
0: 30 -- 0.693 0.089
0: 20 -- 0.718 0.04

try with Bagging:
0: 2120 -- 0.632 0.035
0: 500 -- 0.677 0.039
0: 300 -- 0.702 0.053
0: 200 -- 0.748 0.03
0: 100 -- 0.753 0.024
0: 60 -- 0.737 0.031
0: 30 -- 0.73 0.038
0: 20 -- 0.718 0.037

try with Logistic Regression:
0: 2120 -- 0.757 0.028
0: 500 -- 0.776 0.026
0: 300 -- 0.748 0.018
0: 200 -- 0.753 0.018
0: 100 -- 0.755 0.01
0: 60 -- 0.696 0.029
0: 30 -- 0.702 0.036
0: 20 -- 0.702 0.046

try with Logistic Regression & BAGGING:
0: 2120 -- 0.751 0.026
0: 500 -- 0.764 0.037
0: 300 -- 0.748 0.022
0: 200 -- 0.744 0.008
0: 100 -- 0.746 0.015
0: 60 -- 0.696 0.02
0: 30 -- 0.7 0.036
0: 20 -- 0.698 0.044


time: 1min 11s

Generally, it seems like 100 features is a safe number to go with. I'll use a RF model to create a standard list of features to use.

In [47]:
model = RandomForestClassifier()
model.fit(X_train,y_train)

features = pd.DataFrame(sorted(zip(model.feature_importances_,X.columns), key=lambda pair: pair[0], reverse=True),columns=['Importance','Feature'])
X_new = X[features.Feature[0:100]]

X_train, X_test, y_train, y_test = train_test_split(X_new, y,stratify=y, test_size=0.33, random_state=42)
time: 1.56 s

Step V. Modeling

With the list of 100 features, test out more models.

In [48]:
# Compare Algorithms
import pandas
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
# prepare configuration for cross validation test harness
seed = 42
# prepare models
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('RF', RandomForestClassifier()))
models.append(('RF_B', BaggingClassifier(RandomForestClassifier())))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
# evaluate each model in turn
results = []
names = []
scoring = 'accuracy'
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
# boxplot algorithm comparison
fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()
LR: 0.770303 (0.044861)
LDA: 0.742774 (0.047512)
KNN: 0.641399 (0.068718)
CART: 0.707436 (0.055631)
RF: 0.777925 (0.040900)
RF_B: 0.784079 (0.044139)
NB: 0.739510 (0.068383)
SVM: 0.727669 (0.061613)
time: 34.5 s

After trying out several models, I'm going to settle on a RandomForest with Bagging, using the top 100 features (by importance)

In [49]:
model = RandomForestClassifier(bootstrap=True,n_estimators= 300, min_samples_split= 2, criterion= 'entropy',max_features= 3, max_depth= 3)
model = BaggingClassifier(model)
model.fit(X_train,y_train)

print s.mean().round(3), s.std().round(3)
0.787 0.049
time: 12.9 s

Step VI. Evaluate Model

In [52]:
from __future__ import division
def do_cm_cr(model):
    from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, classification_report

    expected = y_test
    predicted = model.predict(X_test)
    c_report = classification_report(expected, predicted)
#     c_report = pd.DataFrame(c_report)
    c_matrix = pd.DataFrame(confusion_matrix(y_test, predicted),columns=['Predicted No','Predicted Yes'],index=['Actual No','Actual Yes'])
#     return c_report, c_matrix
    from IPython.display import display

    print c_report
    TP = c_matrix['Predicted Yes'][1]
    FP = c_matrix['Predicted Yes'][0]
    TN = c_matrix['Predicted No'][1]
    FN = c_matrix['Predicted No'][0]
    N = TP + FP + TN + FN
    display(c_matrix)
#     print 'TP:', TP
#     print 'FP:', FP
#     print 'TN:', TN
#     print 'FN:', FN
 
do_cm_cr(model)
             precision    recall  f1-score   support

          0       0.78      0.78      0.78       108
          1       0.78      0.78      0.78       108

avg / total       0.78      0.78      0.78       216

Predicted No Predicted Yes
Actual No 84 24
Actual Yes 24 84
time: 2.42 s
In [51]:
model.score(X_test,y_test)
Out[51]:
0.77777777777777779
time: 2.28 s

Results:

My model is able to predict whether or not a job posting will have a salary that's above or below the median by using the job title and description with about 78% precision.

I have worked and reworked this project multiple times and have resulted with scores far lower than 78%, so I'm pretty happy with this. I think it's safe to say that more training data is needed to improve this model.