Introduction

Fig. 1: CRISP-DM process model

How can I share my own professional experience with Master's students in business analytics and data science and encourage them to self-reflect? I asked myself this question recently when I was invited to give a guest lecture at a University of Applied Sciences. My approach was to adapt the CRISP-DM process model [1], which is familiar to many students and practitioners in the field of data science. In six steps (see Figure 1), the model describes how data sets can be systematically explored for patterns and relationships.

Fig. 2: CRISP-DM inspired career reflection model

By adjusting the objectives of the six steps, however, a model can also be formulated for reflecting on one's own career and job planning. This model can be applied not only to students but also to professionals who are rethinking their current situation or are open to change. The result is the diagram shown in Figure 2, which is discussed below. Such a reflection seems to be in the interest of the individual, especially given the high turnover among data scientists [2].

Industrial and technology knowledge

The first two steps of CRISP-DM are business understanding on the one hand and data understanding on the other. In the context of one's own career reflection, these correspond to industry and technology knowledge. What distinguishes a data science practitioner is the ability to combine these skills. Of course, career entrants should not be discouraged by the fact that they do not (yet) have the latter expertise. However, it is worthwhile to be aware of one's own strengths and limitations in every career phase.

From this, conclusions can be drawn for one's individual professional focus or personal development. Even if companies do not always explicitly demand it, knowledge of the industry is a clear advantage when applying for jobs and for internal advancement.

Application and job preparation

An estimated 80% of the effort for a data project is spent on preprocessing [3], the next step in the CRISP-DM cycle. According to Waters [4], the search for a new job is also a constant, time-consuming task for data scientists, which is why it forms the third part of the career reflection cycle. The underlying reasons for this turnover are not considered at this point.

There are a number of selection procedures that applicants have to pass, depending on the focus of the position and on the degree of data maturity of the company. Classical interviews and assessment centers are among the procedures used almost everywhere. In my experience, however, the use of quantitative case studies varies along two dimensions. On the one hand, applicants should expect this kind of selection test if the company has already made progress in the area of data-driven decision-making. On the other hand, these cases seem to occur less frequently the more strategic the role to be filled is. For strategic roles, the focus is on business case studies and the cultural match with the applicant. How else can one figure out where a company stands? Well, ask: What data projects did the company carry out last year? How many data scientists worked on them? With which organizational structure does the company want to become more data-driven?

Companies that want to give their newcomers a smooth start can score with detailed onboarding plans. In any case, I strongly encourage applicants to ask during the application process whether there is a mentoring opportunity or whether new entrants can meet at regular, company-supported networking events. This makes it easier to find important contacts for later projects.

Work practice

The fourth step in the CRISP-DM process is modeling, the heart of the data project. Accordingly, professional practice is central to evaluating a job. In what situation can data scientists find themselves in the company? From my point of view, there are two extremes: the pioneer on the analytical greenfield and the new player in an established data science team. The advantages and disadvantages of these two poles should be compared with one's own wishes.

As a pioneer, you enjoy a high degree of creative freedom. It is you who sets the main thematic priorities and designs the big picture. In addition, it is indispensable that you apply entrepreneurship concepts, or at least learn them. In this generalist role, which is quite often filled by a single person, you quickly gain expert status, which can but does not have to open doors. You should therefore enjoy corporate politics and strategy. You will also be asked to "repair" the occasional Excel spreadsheet.

In an established team, you can usually focus entirely on analytical topics. Each team member benefits from the professional exchange with colleagues, which effectively amounts to on-the-job training. These teams have already proven their value. As a result, they usually have an advanced infrastructure, and the necessary tools are in place. The projects are also often already defined and planned, so there is not much room to influence or change the basic strategy. As a team, you always win together. The potential to advance your career through outstanding individual achievements can therefore be somewhat limited.

Own success and corporate culture

My level of job satisfaction is influenced by two factors, among others: How successfully can I do my job, and how comfortable do I feel in the corporate culture? The model evaluation step of CRISP-DM has therefore given way to a self-evaluation in these two dimensions.

Determining the level of one's own success is difficult on the one hand and highly individual on the other. Of course, many employees have their performance sorted into levels using more or less sophisticated matrices. However, it is doubtful whether these procedures are fully meaningful or effective, especially when it comes to one's own sense of success. For data scientists, strengthening a technical skill or learning a mathematical method can be considered a success, even if the project in question has been delayed or has failed. A culture that tolerates failure is a nice idea, but it is still not always lived in practice, especially when it comes to exploratory and expensive data projects. Here it is necessary to assess oneself appropriately.

In addition, companies are dynamic systems, so the environment of the data scientist can change. What are the consequences? One example is a change in the shareholder structure, which may not even entail personnel changes on the executive board. Even if this seems far removed from day-to-day work at first glance, it can have direct consequences for data projects. This is more often the case for the pioneer, who can quickly face a halt to their projects. However, structural changes can also offer opportunities if, for instance, new units are set up to promote a data-driven culture. Finally, data scientists must also reckon with the fact that cost-cutting programs or relocations of the company's headquarters can call their job into question for reasons completely unrelated to their work. This, too, calls for an evaluation of one's own job.

Decisions

After a successful project, the data product is deployed, as foreseen in the final phase of CRISP-DM. In the career cycle, I put decision-making in this position. It should not take place only once a year or once a quarter. Rather, it serves one's own well-being to become aware of one's situation repeatedly. This leads to a continuous decision-making process in which questions such as the following are answered:

  • Does the main task (still) correspond to my interests?
  • Can I improve my abilities and use my knowledge well?
  • Would I like to gain experience in other areas/industries?
  • Does my job fit in with my private planning? Does it have to?
  • Am I (still) happy with the culture of the company?

From this, you can gain satisfaction in your job because you make yourself aware that the essential factors are right for you personally. Or you avoid stagnating for too long if your own compass no longer agrees with the direction of the job. Inner resignation then does not become a never-ending story. In my experience, this can be an essential building block for enjoying data science work in business organizations.

References

[1] Pete Chapman. "The CRISP-DM User Guide". (1999)

[2] Jonny Brooks. "Why so many data scientists are leaving their jobs". (2018) https://www.kdnuggets.com/2018/04/why-data-scientists-leaving-jobs.html. Accessed: 2019-07-15

[3] Gil Press. "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says". (2016) https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#6ea73f616f63. Accessed: 2019-07-15

[4] Richard Waters. "How machine learning creates new professions - and problems". (2017) https://www.ft.com/content/49e81ebe-cbc3-11e7-8536-d321d0d897a3. Accessed: 2019-07-15

Photo above title by Headway, in section "Work practice" by Austin Distel, and in section "Decisions" by Damir Kopezhanov, all on Unsplash.