What is the Data Journey, and how can it benefit from ChatGPT?

Ricardo Munguía
Several authors

May 19, 2023

If you’re reading this article, you may be wondering how the two terms in the title relate to each other, or perhaps even what they mean.

What is the Data Journey?

Let’s start at the beginning by explaining the Data Journey. The Data Journey describes the steps a company has to take to mature in how it manages its data and in the value it extracts from that data. In short, it is about converting the data a company stores into relevant information that contributes to the business by streamlining and improving processes, giving it an edge over its competition. In recent years, the digitalization of processes has led companies to face a new paradigm: an overabundance of data that needs to be managed, stored, and used by the business.

How can I start a Data Journey at my company?

First stop: generating structured data

Any company’s journey has to start with a solid foundation: generating good data that is structured, standardized, validated, and reviewed. To do this, you can offer employees a set of standardized, company-wide tools that make this task easier. At Ferrovial Construction, one example is inSite, an ERP based on SAP that feeds financial information with quality data. It is also important to facilitate workforce management; to this end, you can use tools such as WorkDay. For risk prevention, there are platforms such as Cority, an application that provides standardized data in real time. Even in day-to-day work, it is possible to implement a tool that facilitates the digitization and storage of all data so that it can be analyzed and used properly. One example is Procore, which is very well explained in this blog.
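To make "structured, standardized, validated" a little more concrete, here is a minimal Python sketch of what checking a single record at the point of entry might look like. The record type, field names, and rules (`SiteRecord`, `project_id`, `hours_worked`, and so on) are hypothetical and chosen only for illustration; they do not describe how inSite, WorkDay, Cority, or Procore work.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SiteRecord:
    """A hypothetical, structured record of work logged at a site."""
    project_id: str
    recorded_on: date
    hours_worked: float


def validate(raw: dict) -> SiteRecord:
    """Turn loosely typed input into a standardized, validated record."""
    # Standardize: trim whitespace and use one casing for identifiers.
    project_id = str(raw.get("project_id", "")).strip().upper()
    if not project_id:
        raise ValueError("project_id is required")

    # Structure: dates must follow one format (ISO 8601, YYYY-MM-DD).
    recorded_on = date.fromisoformat(raw["recorded_on"])

    # Validate: reject values outside a plausible range.
    hours = float(raw["hours_worked"])
    if not 0 <= hours <= 24:
        raise ValueError("hours_worked must be between 0 and 24")

    return SiteRecord(project_id, recorded_on, hours)


record = validate(
    {"project_id": " p-001 ", "recorded_on": "2023-05-19", "hours_worked": "7.5"}
)
```

The point of the sketch is that every record entering the system follows the same criteria, which is what later makes centralized reporting possible.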

Second stop: standardizing processes

If you were wondering whether this alone is enough, the answer is no. We also have to work on standardizing processes. The problem most companies face is that the historical information available is unstructured. What do we mean by unstructured information? It is data and information held in PDF, DOC, or even JPG files. The problem is that this information is only used by the individuals who happen to know about it, in isolation; it is not easily accessible to the rest of the company, which limits its future use. In short, you can get much more out of it than you currently do.

A structured, good-quality information registry makes it possible to evolve and generate centralized information reports, allowing us to make decisions with data that has been validated and which follows the same criteria. This topic is complex and quite extensive, more so than we can discuss in this article, so I’d recommend this blog post by my colleague Daphne, where this is all explained perfectly.

Third stop: predictive analytics

The next step in this Data Journey is predictive analytics, which consists of predicting what is going to happen based on historical data. It may seem like science fiction, but if you commit to it, the results are spectacular. At Ferrovial Construction, as my colleague mentions in the post linked above, we are achieving this predictive analytics thanks to our Digital Hub and Data Management teams. Predictive analytics provides valuable information to the company and helps it make decisions based on data, which is what being a data-driven company means.
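As a rough illustration of "predicting what is going to happen based on historical data," the sketch below fits a straight trend line to past values using ordinary least squares and extrapolates one step ahead. This is only the simplest possible example: the function names and the monthly figures are made up for illustration, and it is not a description of the models Ferrovial's teams actually use.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at a new point."""
    return slope * x + intercept


# Hypothetical history: month number vs. units of material consumed.
months = [1, 2, 3, 4]
consumption = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, consumption)
forecast_month_5 = predict(slope, intercept, 5)
```

Real predictive models are usually far richer than a single trend line, but the principle is the same: the quality of the forecast depends on the quality and structure of the historical data behind it.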

ChatGPT and unstructured data

We still have big questions to address. What about the unstructured data we were talking about earlier, the kind that holds accumulated knowledge? It does not enter the Data Journey yet because the information is not structured, so it is not part of the database. But something has changed that could revolutionize this aspect: ChatGPT.

ChatGPT came out of OpenAI, an initiative that set out as a non-profit organization to investigate the potential of Artificial Intelligence (AI). It all started with anonymous donations, but research is expensive, and training AI is even more so. Microsoft has therefore invested $10 billion, providing OpenAI with one of the most powerful computers in the world, which runs on Azure, Microsoft’s cloud platform. Thanks to these resources, OpenAI has been able to create AI that writes text like a person (or almost). The model was initially called GPT (Generative Pre-trained Transformer); it later evolved to be able to hold a conversation, and that version was named ChatGPT.

Where does all this fit into our problem of unstructured data?

The key is that OpenAI has used the capacity Microsoft has “donated” to process a vast amount of the information available on the Internet. The effect is similar to indexing, which consists of structuring the information contained in a web page in order to perform more efficient searches. In other words, their AI has organized all kinds of unstructured information, because it can understand the written language in web pages, files, and even photographs. The main added value ChatGPT can offer teams is that it lets us use information and knowledge that was previously left forgotten in some drawer, translating it into direct value for users.
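To show what indexing means in its classic form, here is a minimal Python sketch of an inverted index: a structure that maps each word to the documents containing it, so a search does not have to reread every file. This is a plain keyword index, offered only as an illustration of the idea of structuring text for efficient search; it is not how ChatGPT works internally, and the file names and contents are invented.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's documents, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical text already extracted from unstructured files.
documents = {
    "report.pdf": "Crane inspection delayed by rain",
    "notes.doc": "Rain delayed the concrete pour",
    "photo.jpg": "Crane on site",
}

index = build_index(documents)
rain_delays = search(index, "rain delayed")
```

Here `rain_delays` contains `report.pdf` and `notes.doc`. What a language model adds on top of this kind of lookup is the ability to answer questions asked in ordinary language rather than as exact keywords.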

The connection between Data Journey and ChatGPT should be clearer by now, but we’re missing a third element, one that can be crucial.

What is Zuritanken, and what role does it play?

Fostering a company’s culture of innovation is crucial, and initiatives that get employees involved always pay off. At Ferrovial, Zuritanken is an idea-generation program open to all employees that helps foster our culture of innovation. In 2018, a group of colleagues working at Ferrovial (Alejandro, Gema, Eva, Luis, and Ricardo) proposed an innovative idea that won this recognition:

Ferrovial's Zuritanken team

The idea was a conversational chat tool that collected individual information generated at construction sites and provided value to construction managers by helping them reduce operational risks. That is, the tool would make it possible to collect individual knowledge and convert it into collective knowledge for the company, allowing all workers to improve their work.

Let’s remember that virtual assistants like Alexa were first launched around that time; some didn’t even exist yet. Precisely because these virtual assistants were still very rudimentary and the information was not structured as we understand it today, the winning Zuritanken project was ultimately not developed. Today, ChatGPT is moving toward the promise of indexing unconnected information through its AI and offering answers based on that information in response to queries in ordinary language. Now is the time.

ChatGPT is going to allow the Data Journey at each company to evolve, structuring information that was previously forgotten and adding value not only for the company as an institution but also for each of the workers who make it up. This tool will usher in the democratization of the data-driven world, which I hope will be much vaster than most are anticipating.

