Translation | Will 2017 Be the Year of Big Data Literacy?
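Over the past year, we have seen explosive growth in big data. Data processing and analysis have become hot areas, and the big data industry is moving toward information activism. As a result, big data talent such as data scientists, data application developers, and business analysts has become the most sought-after in today's job market.
However, the number of experts able to handle ever-growing volumes of data and computing still falls far short of market demand.
Some predict that, with so much business data now being produced, 2017 will mark the start of a new digital era of facts. But without enough specialists to analyze and use this data, much of it will go underused.
Unfortunately, data is growing much faster than our ability to learn how to use it.
As a result, many business decision-makers are left relying on gut instinct, even for their most important decisions. The data in front of them is vast and disorganized, and some data points even seem to contradict each other, so the most important insights appear unreliable.
This situation needs to change. Part of the answer is upskilling more data scientists, but the more important task for 2017 is to bring data to more people: to give more people data analysis tools and training, and so raise data literacy across the population.
A hundred years ago, the goal was to teach ordinary people to read and write. Today we need a similar literacy drive for data, because the ability to analyze data will become one of the most important business skills of the future.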
So what will it take to make data literacy a reality?
Here are my predictions:
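1. Combinations of data
Big data will become less about size and more about combinations.
Data is increasingly fragmented, and much of it is created externally in the cloud, so hoarding data without a clear purpose will carry a real cost.
This means we will move toward a model in which businesses quickly combine their big data with small data, gaining insight and context so they can get value from it as quickly as possible. Combining data will also make false information easier to spot, improving both data accuracy and understanding.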
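2. Hybrid thinking
In 2017, hybrid cloud and multi-platform environments will emerge as the primary model for data analytics.
The cloud's advantages are clear: it is where much data is generated, it is easy to get started with, and it scales well, so the move to the cloud is accelerating. But one cloud is not enough, because data and workloads will not sit on a single platform. In addition, data gravity means that on-premises systems will remain in use for a long time. Hybrid and multi-environment setups will become the dominant model, with workloads and publishing spread across the cloud and on-premises systems.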
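3. Self-service for all
Freemium is becoming the new normal.
In 2017, users will have easier access to analytics on their own data. More and more data visualization tools are available at low cost, or even for free, so some form of analytics will become accessible to the whole workforce. As more people begin learning analytics, data literacy will naturally rise: more businesspeople will know which data and tools they need and what that data means for their organization. Information activism will rise as a result.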
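4. Scale-up
Largely as a result of its own success, the user-driven data discovery of two years ago has become today's enterprise-wide business intelligence (BI).
In 2017, this modern BI will replace outdated reporting-first platforms. As it becomes the new reference architecture, it will open self-service data analysis to more people. It also places different back-end requirements on scale, performance, governance, and security.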
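5. Advancing analytics
In 2017, the focus will shift from "advanced analytics" to "advancing analytics."
Advanced analytics is critical, but creating, governing, and curating models depends on highly skilled data experts. Once the models are built, however, many more people can benefit from them, because the models can be brought into self-service tools.
In addition, analytics can be advanced by embedding more intelligence into software, which removes complexity and guides users toward insights. But the analytical process should not become a black box or be overly prescriptive.
There is a lot of hype around "artificial intelligence," but it will usually serve best as an aid to human analysis rather than a replacement for it. AI can help answer some questions, but asking the right questions is just as important as answering them, and that still falls to people.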
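6. Visualization as a concept will move from analysis only to the whole information supply chain
Visualization will become a strong component of unified hubs that take a visual approach to managing information assets and to self-service data preparation, underpinning the visual analysis itself. Visualization will also make real progress as a way to communicate findings. The net effect is that more users will be doing more in the data supply chain.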
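7. The focus will shift to custom analytic apps and analytics in the app
Not everyone will, or can, be both a producer and a consumer of apps.
But everyone should be able to explore their own data. Data literacy will therefore benefit from analytics that meets people where they are: applications built to support them in their own context and situation, as well as the analytics tools they use when they set out to analyze data themselves. As a result, open, extensible tools that application and web developers can easily customize and contextualize will gain further ground.
These trends lay the foundation for higher levels of not only information activism but also data literacy. After all, new platforms and technologies that can reach "the other half" (that is, less skilled information workers and operational workers on the go) will help usher in an era in which the right data is connected with people and their ideas. That will close the gap between the amount of data available to us and our ability to draw insights from it.
This is the path we should take, toward a more enlightened, information-driven, and fact-based era.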
Original English text
2017: The Year Of Data Literacy?
We've seen an explosion of data in the past 12 months, so I'm sharing my predictions of what's in store for 2017
Over the past 12 months, we’ve seen an explosion of data, an increase in processing it and a move towards information activism. This means that employees who can actively work with, and master, the huge amounts of information available, such as data scientists, application developers, and business analysts, have become highly valued.
Unfortunately, however, there still aren’t enough people with the expertise to handle the ever-increasing, vast levels of data and computing. You would assume, with all the information currently being produced and held by businesses, that 2017 would see us in a new digital era of facts. But, without the right number of specialists to consume and analyse it, there’s a gap in resources. Data is, unfortunately, growing faster than our ability to make use of it.
For many business leaders, then, this means a reliance on gut instinct to make even the most important decisions. Unable to home in on the most important insights, they’re presented with multiple, and sometimes conflicting, data points, so the most important ones seem unreliable.
The situation needs to change. Yes, that will mean upskilling more data scientists in 2017, but there will be a greater focus on empowering more people more broadly. That will go beyond information activists and towards providing more people with the tools and training to increase data literacy. Just as reading and writing skills needed to move beyond scholars 100 years ago, data literacy will become one of the most important business skills for any member of staff.
So, what will change to see culture-wide data literacy become a reality? Here are my predictions:
1. Combinations of data – Big data will become less about size and more about combinations.
With more fragmentation of data and most of it created externally in the cloud, there will be a cost impact to hoarding data without a clear purpose. That means we’ll move towards a model where businesses have to quickly combine their big data with small data so they can gain insights and context to get value from it as quickly as possible. Combining data will also shine a light on false information more easily, improving data accuracy as well as understanding.
2. Hybrid thinking – In 2017, hybrid cloud and multi-platform will emerge as the primary model for data analytics.
Because of where data is generated, ease of getting started, and its ability to scale, we’re now seeing an accelerated move to cloud. But one cloud is not enough, because the data and workloads won’t be on one platform. In addition, data gravity also means that on-premise has long staying power. Hybrid and multi-environment will emerge as the dominant model, meaning workloads and publishing will happen across cloud and on-premise.
3. Self-service for all – Freemium is the new normal, so 2017 will be the year users have easier access to their analytics.
More and more data visualization tools are available at low cost, or even for free, so some form of analytics will become accessible across the workforce. With more people beginning their analytics journey, data literacy rates will naturally increase — more people will know what they’re looking at and what it means for their organization. That means information activism will rise too.
4. Scale-up – Largely as a result of its own success, user-driven data discovery from two years ago has become today’s enterprise-wide BI.
In 2017, this will evolve to replace archaic reporting-first platforms. As modern BI becomes the new reference architecture, it will open more self-service data analysis to more people. It also puts different requirements on the back end for scale, performance, governance, and security.
5. Advancing analytics – In 2017, the focus will shift from 'advanced analytics' to 'advancing analytics.'
Advanced analytics is critical, but the creation of the models, as well as the governance and curation of them, is dependent on highly-skilled experts. However, many more should be able to benefit from those models once they are created, meaning that they can be brought into self-service tools. In addition, analytics can be advanced by increased intelligence being embedded into software, removing complexity and chaperoning insights. But the analytical journey shouldn’t be a black box or too prescriptive. There is a lot of hype around 'artificial intelligence,' but it will often serve best as an augmentation rather than replacement of human analysis because it’s equally important to keep asking the right questions as it is to provide the answers.
6. Visualization as a concept will move from analysis-only to the whole information supply chain – Visualization will become a strong component in unified hubs that take a visual approach to information asset management, as well as visual self-service data preparation, underpinning the actual visual analysis. Furthermore, progress will be made in having visualization as a means to communicate our findings. The net effect of this is increased numbers of users doing more in the data supply chain.
7. Focus will shift to custom analytic apps and analytics in the app – Not everyone will, or can, be both a producer and a consumer of apps.
But they should be able to explore their own data. Data literacy will, therefore, benefit from analytics meeting people where they are, with applications developed to support them in their own context and situation, as well as the analytics tools we use when setting out to do some data analysis. As such, open, extensible tools that can be easily customized and contextualized by application and web developers will make further headway.
These trends lay the foundation for increased levels of not just information activism, but also data literacy. After all, new platforms and technologies that can catch 'the other half' (i.e., less skilled information workers and operational workers on the go) will help usher us into an era where the right data becomes connected with people and their ideas — that’s going to close the chasm between the levels of data we have available and our ability to garner insights from it. Which, let’s face it, is what we need to put us on the path toward a more enlightened, information-driven, and fact-based era.
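Note: This article is excerpted from 灯塔大数据 (Lighthouse Big Data), a self-media account hosted on 数据观. Please credit the source when republishing, and search for "数据观" on WeChat for more big data news.
Editor in charge: 汤德正 (Tang Dezheng)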