I have always had a gut feeling about the actual ‘work’ a data scientist needs to do in real life. Many aspiring data scientists want to know what those on the other side keep themselves busy with all day, so I thought that having a few connections provide their insight might be a useful endeavor.
What follows is a second round of some of the great feedback I received via email and LinkedIn messages from those who were interested in providing a few paragraphs on their daily professional tasks. The short daily summaries are presented in full and without edits, allowing the quotes to speak for themselves.
Romita Agarwal is a data analyst for the City of Dallas, located… well, in Dallas 🙂
First, I would like to thank Matthew for giving me this opportunity to share my experience. I recently started my career as a data analyst after completing my Master’s in Analytics. I have an undergraduate degree in Computer Science and more than 5 years of experience in software development, working closely with the business, understanding their requirements and managing small teams.
I typically start my day by reading articles and blogs on machine learning and AI and listening to podcasts (the best way to utilize my commute time). For me, one of the most important roles of a data scientist is to figure out the business requirement in order to deliver the best use of the end-product. I work as a data analyst at the City of Dallas and my role spans several areas, from gathering requirements, coordinating with multiple teams and cleaning the data to building machine learning models and visualization. At my workplace we follow the principle that a comprehensive smart city solution encompasses technology, data, intelligence and application.
In my job, a clear understanding of the data and its cleaning is crucial to being able to use it for a real purpose. I frequently meet with the business to define and refine the requirements that can help them be more efficient and work resourcefully. A good part of my day is spent doing research and exploring different machine learning techniques, collaborating and communicating with the team, and brainstorming on what can be added to improve business insights. I enjoy my work as it is quite dynamic. It is not just working on one tool or following a set of steps; it gives you a chance to experiment, explore and be creative. Occasionally I need to build a visual straight from Excel, Cognos or Tableau, and sometimes I build models and give insights to the business.
Wafic El-Assi is a data scientist at Nulogy, and is based in Toronto, Canada.
It is difficult to take the last year and a half I spent at Nulogy and average them to figure out what a ‘typical’ day looks like. Therefore, I will do my best to describe a hypothetical day that touches on my three primary functions as a data scientist at a small SaaS company:
- Supporting Nulogy with Analytics and actionable insights
- Adding intelligence using machine learning techniques to our software products
- Researching and developing new data products that can impact our bottom line
I often start at 9:30 AM by writing down my objectives for the day as per the iteration plan. This is followed by a team stand-up at 9:45 AM. Our team is currently comprised of a product manager, interns and myself. We go over our iteration progress, discuss any potential blockers and ask for help if needed. I usually keep the time between 10:00 AM and 12:00 PM reserved for meetings, as I find the 2-hour period before lunch too short and too distracting for time-consuming tasks. The meetings serve as an interface with both internal and external stakeholders (sales, customer success, clients, etc.), where I am expected to deliver actionable data-driven insights. I use any extra time I have within that period to catch up on emails.
After lunch, I look back at my list of objectives and prioritize any work that needs a deep dive. This includes your usual data extraction, cleaning and transforming, followed by statistical analysis and the training and testing of machine learning models. I find that this usually takes around 3 to 4 hours. My primary tools of choice are R, SQL and Python, and they are used in that particular sequence. As the time nears 5:00 PM, I shift focus to research and development. I take an hour to read up on recently published work, or to build a working MVP for a data product or solution that aligns with our long-term strategy. All in all, anything I do has to positively impact Nulogy’s bottom line. A few things left out include people management, communicating results and progress to executive leadership and automating certain tasks when possible.
Dominic Ligot is a managing consultant at Cirrolytix, based in the Philippines.
There is no single typical day, but my days do follow some patterns so maybe I’ll talk about some repeating motifs within the past months.
Training days. I am an educator, and at least 25% of my days are spent in classrooms and webinars teaching professionals about data. I get attendees from all walks of life, but the top three professions that I meet are digital marketers, financial analysts and accountants, and human resource managers – who for me represent the top three functional areas where analytics can make a difference in any company. Best of all: there is always a potential client or collaborator in every class.
Client days. I run my own data consultancy and easily 30% of my time is spent on proposals, meeting prospects, and doing client workshops. The paperwork is the worst part, but it is a necessary evil to get to the data science work. The best part is being with a client and finding their pain points, and I always live for the moment a data-driven solution presents itself that solves real-life problems.
Deep tech days. 30% of my days and all of my nights are in development. Given my business, I’m now closer to software development than just data science, but all of the applications we develop are data products – whether it’s a scoring app that can improve lead conversion or a customer segmentation and sentiment analysis dashboard that feeds off social media. Lately, we’ve been getting a few more esoteric requests such as chatbots, imaging lead generation, and document classification – and this part really excites me because it means the data science is maturing into concrete applications. Data engineering is more than just moving tables from one place to another; it’s creating data products and crystallizing solutions for people.
Speaking days. I get invites to do a keynote here and there, and lately that’s 5% of my days. The topics that I enjoy most are about disruptive innovation from data and how data and technology always trigger cultural change in people. I take this time to build networks. Tip to data speakers: try to avoid PowerPoint slides and show people real work like an app or website and just talk from it. The audience is bored enough with every other speaker already and they will thank you for showing them something real.
My remaining 10% is in the underground. Just this year, we set up a group for analytics freelancers in Manila to help encourage people to do data science and engineering for a living (https://www.meetup.com/Manila-Analytics-Freelancers/).
The response has been overwhelming – we now have more than 300 members even without any paid marketing, and the freelancer meetup has spread all over the world with chapters in Europe, Asia, and some parts of the Americas now popping up. I believe analytics is where web development was in the 90s and soon we hope freelancing for analytics will be as commonplace as freelancing for websites.
Thomas Scialom is a research scientist in NLP at Récital, located in Paris.
How did I get into Data Science? I went to engineering school where I specialized in finance and then worked for 3 years in a trading room. Algorithmic trading was the area I was most interested in and it turns out to be very similar to AI. After following Andrew Ng’s MOOC, I took a training course in Data Science at École Polytechnique. At that time, I met Gilles Moyse and decided to join his NLP startup, Récital, which is going to sponsor my thesis on automatic summarization, which I hope will start in early 2018 at UPMC.
No working day is the same at Récital except for the pleasure of working there and solving new problems. Therefore, the best way I can describe my day-to-day is to describe the typical way to solve a problem. First you must think from the client/business point of view and understand what the goal is so you can ask the right questions, such as which metrics are best. Sometimes you will maximize the recall, sometimes the precision. Once the problem is well posed, I read research papers on the topic to be aware of the state of the art. It can be frustrating, but a good old TF-IDF often works better than the hippest seq2seq. It is my job to pick from my toolbox the one that fits the problem best. Then I implement and adapt the algorithms. Preprocessing is often the largest part of the code. And finally, I evaluate the results. Most of the time, I must iterate over those steps because of something I didn’t think about, until I find a good solution. I don’t say “the solution”: there are always different approaches that all work well!
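To make the precision-versus-recall choice mentioned above concrete, here is a minimal Python sketch added as an editorial illustration. It is not Récital's code: the labels, the scores and the helper names `precision_recall` and `predict` are all made up for the example. It only shows how moving a decision threshold trades one metric against the other.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def predict(scores, threshold):
    """Turn model scores into 0/1 predictions using a decision threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical ground truth and model scores, invented for this illustration.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]

# A strict threshold flags few items: high precision, low recall.
strict = precision_recall(y_true, predict(scores, 0.8))

# A lenient threshold flags many items: lower precision, high recall.
lenient = precision_recall(y_true, predict(scores, 0.35))

print("strict :", strict)
print("lenient:", lenient)
```

Which end of the trade-off to favor depends on the business goal: a client who cannot afford to miss a relevant item will lean toward recall, while one who cannot afford false alarms will lean toward precision.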
So I would say that my typical day is “read / think / write / think / read again / think / write”. And by write, I mean write code or research papers. By the way, we are writing a paper about question generation and are expecting to release the biggest French QA dataset in the first quarter of 2018. What I love the most in my job is that I keep learning every day while helping to solve real-world problems.
I hope that you now have an idea of the day-to-day activities in the life of a data scientist.