Big data and web scraping have transformed many industries. Until recently, big data has been used primarily for marketing. Businesses take advantage of data to learn more about their customers and market their products in more relevant ways.
Today, big data is creeping into other fields, including law, helping transform the legal landscape.
What is Big Data?
Big data is exactly what it sounds like – large volumes of data. This data, both structured and unstructured, can be used to gain insight and make smarter decisions.
Businesses often employ big data when creating strategies and marketing plans. But big data can be used for other purposes, such as improving services, education and even city traffic flow.
Data can be collected from a variety of sources, such as:
- Social media
- Business transactions
- Industrial equipment
- Smart devices
It can also come in many forms, from structured numeric data to unstructured emails, text documents, videos and financial transactions.
Ultimately, big data’s importance comes down to how the information is analyzed and not how much data is collected.
Big Data Analysis and Analytics
Analysis and analytics are the key to using big data successfully. Without analysis, big data is nothing more than a pile of information.
Analytics aims to make sense of big data by uncovering valuable information, like market trends, hidden patterns, correlations and customer preferences. The technologies used to analyze big data allow an organization to walk away with new information that can be used to make better decisions and build better strategies.
Data analysis relies on software tools that help with:
- Data mining, which sifts through the data to find relationships or patterns defined by the user.
- Machine learning and deep learning, which train algorithms on large data sets to recognize patterns and make predictions.
- Predictive analytics, which creates projections for behavior and other developments (sketched in the example below).
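To make the predictive-analytics item concrete, here is a minimal sketch in Python using scikit-learn. The file name, feature columns and outcome label are hypothetical placeholders, not a reference to any real data set:

```python
# Minimal predictive-analytics sketch: fit a simple classifier on historical
# records and check how well it projects outcomes on held-out data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("historical_records.csv")   # hypothetical data file
X = df[["feature_a", "feature_b"]]           # hypothetical numeric features
y = df["outcome"]                            # hypothetical 0/1 outcome label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```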
While big data analysis and analytics are typically used by businesses, other industries and fields can take advantage of this technology to improve services and gain insights. One such field that is being transformed by big data is the legal industry.
The Legal Industry Generates Enormous Amounts of Data
The legal industry produces vast amounts of data, and its volume grows every year. The field has been slow to adopt big data technology, but that is quickly changing.
Law firms rely heavily on data to build cases and defense strategies for clients. Until recently, firms stored all of their case files as hard copies. Now, more firms are going digital and even storing their case files in the cloud.
Cloud storage has made data quicker and easier to gather and analyze. As a result, firms can better assist law enforcement agencies and shorten trials.
Quicker analysis of their data streamlines the decision-making process for most firms. Algorithms allow firms to make predictions based on the outcomes of past cases. These predictions can help a firm decide whether a case is worth taking, and they can also inform case strategy.
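As a toy illustration of that kind of prediction, a base-rate calculation over past case records might look like this; the records and fields are invented for the example:

```python
# Estimate a win rate from past cases similar to a prospective one.
# All records below are made up for illustration.
past_cases = [
    {"type": "auto_accident", "won": True},
    {"type": "auto_accident", "won": False},
    {"type": "auto_accident", "won": True},
    {"type": "contract_dispute", "won": True},
]

def win_rate(cases, case_type):
    """Share of past cases of the given type that were won."""
    similar = [c for c in cases if c["type"] == case_type]
    return sum(c["won"] for c in similar) / len(similar) if similar else None

print(win_rate(past_cases, "auto_accident"))  # about 0.67: two of three won
```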
Big Data is Being Used Worldwide by Law Firms
When large sets of data are available, it can be hard to see how they might be useful in real-world applications. Automated data analysis has been used with great success in many fields, and it is now being applied extensively in the legal field.
Law firms and prosecutors across the world are using the data they collect to strengthen, and win, their cases.
A few prime examples of this are:
- In Germany, a defense team’s client was accused of dragging a body at a specific time of day. The defense had the client’s iPhone, which automatically collects activity data to help its owner track their health. Using that data, they argued that the client was climbing flights of stairs, not dragging a body, at the time of the killing.
- Attorneys in the United States have used similar data to show how their clients’ injuries have led to a drop in quality of life. For example, fitness tracker data can show that a person’s activity level is down by 50% following an injury (the arithmetic is sketched below). These lifestyle changes can be presented in court to show that the person is more sedentary today than he or she was before the injury.
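Here is how that kind of activity-drop arithmetic might look in code, with made-up step counts and a hypothetical injury date:

```python
# Compare average daily steps before and after an injury date.
# The step counts and dates are fabricated for illustration.
import pandas as pd

steps = pd.Series(
    [9800, 10200, 9500, 4700, 5100, 4900],
    index=pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03",   # before the injury
        "2024-03-01", "2024-03-02", "2024-03-03",   # after the injury
    ]),
)
injury_date = pd.Timestamp("2024-02-01")  # hypothetical injury date

before = steps[steps.index < injury_date].mean()
after = steps[steps.index >= injury_date].mean()
print(f"activity change: {(after - before) / before:.0%}")  # roughly -50%
```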
The devices people use every day collect data that law firms can use to build solid defenses for their clients. But device data isn’t the only data law firms are starting to use to their advantage.
Past cases are another rich source: a single new matter may have tens of thousands of similar precedents that a firm can draw on. A firm working to determine how an accident occurred, for example, can scour every similar case involving a similar vehicle.
Let’s assume a driver claims that his Toyota accelerated on its own, leading to an injury. Gathering data from cases around the world may reveal a larger problem, such as a vehicle defect that the manufacturer has yet to report.
The firm may use this information, along with automotive experts, to win a case.
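A hedged sketch of that kind of search, assuming a hypothetical export of collected case records and illustrative column names:

```python
# Filter a (hypothetical) case-records export for similar incidents.
import pandas as pd

cases = pd.read_csv("case_records.csv")  # assumed export of gathered cases

similar = cases[
    (cases["manufacturer"] == "Toyota")  # assumed column names throughout
    & cases["complaint"].str.contains("unintended acceleration",
                                      case=False, na=False)
]
print(len(similar), "potentially related cases found")
```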
Data is all around us, and with law firms amassing huge amounts of it, as seen in the previous sections of this article, it’s easy to see how this data can help in the courtroom. The problem is that some data is not available in a format that is easy to understand.
Some data also isn’t available at all.
Scraping and custom solutions are able to gather and make sense of data so that it can be used in the courtroom.
Web Scraping in Practice: Legal Investigation of Unfair Sales Practices
Two years ago, we were tapped by a large law firm in the United States (no names – sorry, we’re under an NDA) for a long-term collaboration. The law firm needed to collect massive amounts of data from Walmart and Amazon.
The firm worked with companies to fight back against unfair competition practices.
Collecting data from these two retail giants was a massive undertaking, as their catalogs contain millions of products.
When they approached us, it was clear that we needed to develop a custom solution that would eventually:
- Collect 20 million reviews from Amazon
- Collect 4 million products from Walmart
The number of products and pages that needed to be scraped was massive, and both sites employ strict anti-scraping protections. Page structures also differ across certain pages and product types, which made it difficult to use an out-of-the-box scraping solution effectively.
Our Solution
We took our time analyzing the needs of the client and the requirements of the pages involved. It became apparent that we had to build a custom scraper and admin panel. We used numerous technologies to do this (a sketch of how they fit together follows the list):
- Laravel
- Scrapy (Python)
- Puppeteer
- MySQL
- RabbitMQ
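As an illustration of how a stack like this fits together, here is a hedged sketch of a RabbitMQ work queue feeding Python scraper workers. The queue name, message format and parse_product() helper are our own placeholders; the actual production code is under NDA:

```python
# Sketch: scraper workers consume product URLs from a RabbitMQ work queue.
# Queue name, message shape and parse_product() are illustrative assumptions.
import json
import pika  # RabbitMQ client: pip install pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="product_urls", durable=True)

def parse_product(url: str) -> dict:
    """Placeholder for the real per-page scraping logic."""
    return {"url": url}

def handle(ch, method, properties, body):
    task = json.loads(body)
    record = parse_product(task["url"])
    # ...store `record`, then surface it in the admin panel...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)  # hand each worker one URL at a time
channel.basic_consume(queue="product_urls", on_message_callback=handle)
channel.start_consuming()
```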
We didn’t want to bog down the retailers’ servers, nor be detected while scraping the sites. We decided to use a large pool of proxies to continually change IP addresses. The entire pool was swapped out every 24 hours, and we chose proxies in close proximity to local stores.
Using randomized request timing, we were able to streamline the scraping process, with 100 to 150 scrapers running at any given time.
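Roughly, the rotation and randomization looked like this; the proxy addresses and delay range below are illustrative stand-ins, not production values:

```python
# Pick a proxy at random for each request and add a randomized delay.
# Proxy URLs are placeholders; the real pool was refreshed every 24 hours.
import random
import time
import requests

proxy_pool = [
    "http://proxy-1.example.com:8080",
    "http://proxy-2.example.com:8080",
    "http://proxy-3.example.com:8080",
]

def fetch(url: str) -> str:
    proxy = random.choice(proxy_pool)
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
    resp.raise_for_status()
    time.sleep(random.uniform(1.0, 5.0))  # jitter between requests
    return resp.text
```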
All of the data we collected had to be cleansed and then exported to the admin panel, where our client could review it easily. The goal was to use the big data in a meaningful way, and the admin panel presented it in a form that made sense for the client.
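A minimal sketch of that cleansing step, with illustrative file and column names:

```python
# Dedupe and normalize scraped records before exporting to the admin panel.
# File and column names are illustrative; the real schema is the client's.
import pandas as pd

raw = pd.read_csv("scraped_products.csv")       # assumed raw scraper output
clean = (
    raw.drop_duplicates(subset=["product_id"])  # one row per product
       .assign(title=lambda d: d["title"].str.strip())
       .dropna(subset=["price"])                # discard unusable rows
)
clean.to_csv("clean_products.csv", index=False)  # handed to the admin panel
```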
Ultimately, the client gained a massive data set, covering a large set of keywords, collected to fight unfair competition on behalf of big brands. We continue working with this client to provide the enormous volumes of data they need to build stronger cases for their own clients and to speed up the legal investigation process.
Big data may already be available to you; if it is, it’s a matter of applying analytics to make it useful. When the data isn’t available, web scraping can capture millions of data points and deliver them in a form you can use.