










How Discovering Data Relationships Can Fight Cybercrime

The Business Problem

Every day that a cyber threat goes undetected increases the potential for data theft and deepens the long-term repercussions for the business. To minimize the damage from cybercrime, organizations need to identify abnormal activity on their networks quickly so that they can isolate the problem and react accordingly. They also need to access and analyze historical data to uncover patterns, so that they can discern sooner when unusual activity is occurring. This requires advanced relationship and pattern discovery processes.

While “known” threats can often be identified by common anti-virus software, firewalls, and event management tools, “unknown” threats take new forms, and may not be immediately spotted based on common queries.

The Technical Challenge

Organizations today make use of historical data and logs to recognize patterns and connections within their data, analyzing archival data alongside streaming data to quickly ascertain discrepancies and potential threats. This typically involves analyzing the following:

- Signatures of past breaches to identify known instances of cyber threats.
- Multiple data points (e.g., time, frequency, and location of activity) and how they compare with historical norms, both for the individual user and for past trends overall.
- Relationships between anomalies in social networks associated with key individuals in the organization.

Systems built for this kind of analysis include a complex data ingest layer, where streaming data is transformed, and a graph-like storage layer, where the data and the relationships between transactions can be persisted and then rapidly and continuously queried.

Building them is often a challenge, because the volume of data that must be consumed and analyzed continues to increase. Creating an ingest layer that can consume, transform, and store streaming data while also creating and maintaining information about the relationships between transactions becomes very complicated, and it often becomes a stumbling block.
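The comparison of a data point against an individual user's historical norms can be sketched as follows. This is a minimal illustration in plain Python, not part of any product described here: the function names, the choice of login hour as the data point, and the three-standard-deviation threshold are all assumptions made for the example. It also treats hours as plain numbers, so it ignores the wrap-around between 23:00 and 00:00.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baselines(history):
    """Map each user to the (mean, std dev) of their historical login hours.

    `history` is an iterable of (user, hour) pairs; both names are hypothetical.
    """
    hours_by_user = defaultdict(list)
    for user, hour in history:
        hours_by_user[user].append(hour)
    return {user: (mean(hours), pstdev(hours)) for user, hours in hours_by_user.items()}


def is_anomalous(baselines, user, hour, threshold=3.0):
    """Flag an event whose hour lies more than `threshold` std devs from the user's mean."""
    if user not in baselines:
        return True  # assumption: activity from a user with no history counts as unusual
    mu, sigma = baselines[user]
    if sigma == 0:
        return hour != mu  # no variation in history: any different hour stands out
    return abs(hour - mu) / sigma > threshold


history = [("alice", 9), ("alice", 10), ("alice", 9), ("alice", 11), ("alice", 10)]
baselines = build_baselines(history)
print(is_anomalous(baselines, "alice", 3))   # far outside the usual working hours
print(is_anomalous(baselines, "alice", 10))  # within the usual range
```

A production system would combine many such signals (frequency, location, known breach signatures) and relate them to one another, which is where the graph-like storage layer comes in.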

Data Breaches on the Rise

- 783: The number of reported breaches at U.S. organizations in 2014, a record high
- $2 trillion: The global cost of breaches expected by 2019 ¹

A Cautionary Case Study: Ashley Madison

- 32 million: Users of the online dating service that were hacked in 2015
- 15,000: U.S. government workers exposed, implicating national security
- $760 million: Damages claimed in a class action lawsuit ²

The Need for Real-Time Response

The average time taken to discover data breaches:

- 98 days for financial firms
- 7 months for retailers ³

In addition, most graph stores available today are not designed to scale to billions of nodes and edges while supporting the billions of transactions that must be analyzed and queried each day. That is the level of performance and scalability needed to identify emerging threats quickly enough to stop them before significant damage is done. As a result, most organizations rely on a custom-built ingest layer feeding into a graph database to maintain relationships, an approach that neither scales nor delivers the required query times. The resulting solutions suffer from limitations that include:
- Complex ingest path solutions, which are expensive and difficult to maintain.
- Inefficient and expensive workarounds, such as pre-computing sub-graphs to reach acceptable query times, which in turn limit functionality.
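
To make the query-time pressure concrete, here is a minimal sketch of the kind of relationship query involved: finding everything within a few hops of a suspicious node. It is plain Python over an in-memory adjacency map, which is an assumption for illustration only; the function name and the host names are hypothetical. The work grows with the number of edges reachable within `k` hops, which is why systems that cannot traverse quickly at scale fall back on pre-computed sub-graphs.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, k):
    """Return the set of nodes reachable from `start` in at most `k` hops.

    `adjacency` maps a node to an iterable of its outgoing neighbors.
    The start node itself is not included in the result.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical connection graph: which host contacted which.
connections = {
    "host_a": ["host_b"],
    "host_b": ["host_c"],
    "host_c": ["host_d"],
}
print(k_hop_neighbors(connections, "host_a", 2))
```

Pre-computing the result for a fixed `k` and a fixed set of start nodes makes lookups fast, but any query outside those choices has to be traversed from scratch, which is the loss of functionality noted above.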



The ThingSpan™ Solution
ThingSpan™, Objectivity’s Fast Data solution platform, is integrated with Hadoop and Spark to give organizations the capability to build a fully supportable cybersecurity solution. It enables organizations to ingest, transform, and consume massive and varied data streams in order to create and persist complex, scalable graph structures. These structures can operate at petabyte scale and efficiently support complex, continuous queries.
ThingSpan leverages open-source tools by supporting the Hadoop and Spark ecosystem atop a high-performance, distributed graph database purpose-built for relationship and pattern discovery. It runs natively on top of HDFS as a YARN application and uses Spark for workflow and data transformation. It is also designed to support streaming systems based on Kafka, Flume, and other distributed messaging tools. Integration with Spark via DataFrames allows ThingSpan to ingest streaming data while maintaining and persisting relationships as first-class logical models.
This relationship-first model lets enriched and transformed data simplify support for the complex, multi-dimensional queries associated with cybersecurity applications and analytics. With its relationship-oriented approach to information fusion, which combines fast streaming data with static, historical, and transactional data, ThingSpan delivers the intelligence needed to fight cybercrime. Organizations can draw business insights from Big Data and real-time streaming data efficiently and at scale, helping to prevent future security breaches.
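
The idea of ingesting events while keeping relationships as first-class records can be sketched in a few lines. The example below is plain Python and does not use the ThingSpan, Spark, or Kafka APIs; the function name, the event layout `(source, destination, timestamp)`, and the edge attributes are assumptions chosen for illustration. Each incoming event updates a set of vertices and a map of edges, so the relationship between two endpoints is stored and updated directly and does not have to be rediscovered at query time.

```python
def events_to_graph(events):
    """Fold (source, destination, timestamp) events into vertex and edge records.

    Returns a set of vertices and a dict keyed by (source, destination),
    where each edge tracks how often and when the two endpoints interacted.
    """
    vertices = set()
    edges = {}
    for source, destination, timestamp in events:
        vertices.update((source, destination))
        edge = edges.setdefault(
            (source, destination),
            {"count": 0, "first_seen": timestamp, "last_seen": timestamp},
        )
        edge["count"] += 1
        edge["first_seen"] = min(edge["first_seen"], timestamp)
        edge["last_seen"] = max(edge["last_seen"], timestamp)
    return vertices, edges


# Hypothetical connection events: (source IP, destination IP, Unix-style timestamp).
events = [
    ("10.0.0.1", "10.0.0.2", 100),
    ("10.0.0.1", "10.0.0.2", 130),
    ("10.0.0.2", "10.0.0.3", 110),
]
vertices, edges = events_to_graph(events)
print(sorted(vertices))
print(edges[("10.0.0.1", "10.0.0.2")])
```

In a distributed setting the same fold would run over a stream instead of a list, with the vertex and edge records persisted to a graph store.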




First published by: 灯塔大数据 | Reposted by:

