In the first part of this series Getting Started with Azure Cosmos DB, we looked at using Azure Cosmos DB for saving an individual’s fitness routine and why this database structure is better for this data than a SQL database while also showing that we still have to organize our structure like a file system organizes files. In this part of our series, we’ll begin looking at the terminology translation between NoSQL and SQL along with running updates for our documents and queries with filters that return some fields in our document, but not other fields.
- Data Masking or Altering Behavioral Information - June 26, 2020
- Security Testing with extreme data volume ranges - June 19, 2020
- SQL Server performance tuning – RESOURCE_SEMAPHORE waits - June 16, 2020
In the past two years, we’ve seen an explosion in growth with document-oriented databases like Azure Cosmos DB. MongoDB – one of the major document databases – went live on the Nasdaq and attracted some attention in the past year as well. While more developers are using the document structure for some appropriate data models, less than 10 years ago, some in the industry were predicting that document databases were unnecessary and wouldn’t last because all data could be flattened to fit the SQL model. I took the opposite approach, being an early adopter of MongoDB along with continuing to use SQL databases as I saw opportunities in both SQL and NoSQL for various data structures. While some data do fit the SQL model and SQL will continue to exist, some data are best for document databases, like Azure Cosmos DB. In this series, we’ll be looking at the why and how of document databases.
Since we know that the SQL CTE (common table expression) offers us a tool to group and order data in SQL Server, we will see an applied example of using common table expressions to solve the business challenge of re-basing identifier columns. We can think of the business problem like the following: we have a table of foods that we sell with a unique identifier integer associated with the food. As we sell new foods, we insert the food in our list. After a few years, we observe that many of our queries involve the foods grouped alphabetically. However, our food list is just a list of foods that we add to as needed without any grouping. Rather than re-group or re-order through queries using a SQL CTE or subquery, we want to permanently update the identifier.Read more »
In this article, the latest in our series on Common table expressions, we’ll review CTE SQL Deletes including analyzing the operation before the delete, actually removing the data as well as organizing and ordering the deletes with CTEs.Read more »
In CTEs in SQL Server; Querying Common Table Expressions the first article of this series, we looked at creating common table expressions for select statements to help us organize data. This can be useful in aggregates, partition-based selections from within data, or for calculations where ordering data within groups can help us. We also saw that we weren’t required to explicitly create a table an insert data, but we did have to ensure that we had names for each of the columns along with the names being unique. Now, we’ll use our select statements for inserts and updates.Read more »
Common table expressions (CTEs) in SQL Server provide us with a tool that allows us to design and organize queries in ways that may allow faster development, troubleshooting, and improve performance. In the first part of this series, we’ll look at querying against these with a practice data set. From examples of wrapped query checks to organization of data to multiple structured queries, we’ll see how many options we have with this tool and where it may be useful when we query data.Read more »
This article will cover SQL bulk insert operations deterministic outcomes and responses covering not allowing any bad data to allowing all data to be inserted, regardless of errors.Read more »
One of the challenges we face when using SQL bulk insert from files flat can be concurrency and performance challenges, especially if the load involves a multi-step data flow, where we can’t execute a latter step until we finish with an early step. We also see these optimization challenges with constraints as well, as fewer steps to complete a data flow results in saved time, but possibly less accurate data.Read more »
In this article, we’ll discuss security implications of using SQL Bulk Insert and how to mitigate those risks.Read more »
In the first part of reviewing the basics of bulk insert, we looked at importing entire files, specifying delimiters for rows and columns, and bypassing error messages. Sometimes we’ll want to skip first and ending lines, log errors and bad records for review after inserting data, and work with data types directly without first importing using a varchar and converting to the data type later. In this part, we look at these techniques using T-SQL’s native bulk insert.
We’re facing a challenge with several of our data flows that use more time than they have in the past and we’re not sure when this trend started. We know in the past month, our reports have been delayed by over a day from the start to the finish. For some of our data flows we use SQL Server Agent that calls SSIS packages or procedures, while some of them use a custom data import and reporting application we’ve created. How can we track the length of time for these data flows, since we’re using a combination of tools for importing data?
From troubleshooting many data flow applications designed by others, I’ve seen a common pattern of over complexity with many designs. Putting aside possible risks by introducing too much complexity, troubleshooting these designs often involves opening many different applications – from a notepad file, to SSIS, to SQL Server Management Studio, to a script tool, etc. It may sound like many of these are doing a hundred steps, yet many times, they’re simply importing data from a file, or calling five stored procedures and then a file task of moving a file. This complexity is often unnecessary, as is opening many different tools when we can use a few tools and solve issues faster.Read more »
In this article, we are going to talk about discovering and archiving SQL references to invalid procedures.
We’ve recently had production failures because our developers changed an important reference. In this case, we had a view which several procedures and views referenced. A developer made a change to the referenced view by removing columns, which caused several procedures and a view that referenced it to fail. We’re considering whether we should stop this practice, or if there are other ways we can prevent changes to an object that’s being referenced by other objects (in our case, a view).Read more »
When we’re developing solutions, we can sometimes forget useful commands we can use in T-SQL that make it convenient to remove data, eliminate objects, or carefully remove data. We look at three of these commands with a few examples of where we might consider using them in development, or in rare production cases. While they may offer us speed and convenience in some cases, we also look at some situations where they may not be the best tool to use.Read more »
A client recently discovered a discrepancy on one of our reports that showed an improvement in performance metrics but was inaccurate. Our reports came from a software tool, which showed the average performance throughout the day. It derived this number from periodic checks and the frequency changed, which affected our report. When we showed an improvement in the metrics on a report, the client showed us that the frequency change may have impacted this, not necessarily any improvement in performance. In situations where we’re measuring values and comparing them to other values, how can we prevent a change in measurement from impacting our reports? Read more »
As our company has grown, we’ve recently added developers to our team who want to use open source tools (open source languages and libraries). In the past, we built and used our own custom libraries, but our new developers to prefer to use open source libraries or add new languages that require new libraries. We’re concerned that the use of open source libraries may not be secure and may introduce new inputs and outputs in our system that we don’t fully understand. What should we consider when we think about allowing open source software, tools or languages in our environment from the standpoint of security? Read more »
Our organization must restrict permissions and prove to an independent party that we investigate access and restrict permissions. We restrict permissions to objects using least permissions and give full access to an exceptional few based on an organizational design that follows best practices. What can we do additionally that will help us prevent unauthorized access or catch when someone who shouldn’t have access is able to infiltrate our systems, especially in the context of showing a third party that we track this.Read more »
This article explains data security for accessing sensitive data and restricts access in application using SQL Views and stored procedures.
We recently faced a leak of information for one of our employees that involved data which caused a conflict within our company, even if it was not personally identifiable information (PII data). When we investigated the issue, we uncovered that we need to organize data access for our teams and review who has access to what information. As of right now, all our users either have access to all tables directly or a subset of all our tables. What are some practices we can use in SQL Server to avoid giving direct table access to our users? Read more »
We receive many database alerts with many of the alerts logging some of these same alerts or information to files or tables. What we’ve found over time is that the logging is now costing us quite a bit of resource. Our logging server (where both files and table logging are stored) has had a few outages related to conflicts from messages for other servers. We’ve considered scaling the alerting by environment type, but we’ve also considered that we may be logging too much information about our databases. In addition, since we receive so many alerts each day, it’s impossible for us to resolve them and assist with other issues that arise. What are some techniques that we can use to help us with the issue of too much logging information and too many alerts? Read more »
We have some custom files that we receive from different providers and for these situations we are unable to use standard ETL programs without any customization. Since we’re expanding our ability to read these custom files with .NET, we’re looking for efficient ways to read files with PowerShell that we can use in SQL Server Job Agents, Windows Task Schedulers, or with our custom program, which can execute PowerShell scripts. We have many tools for parsing data and wanted to know efficient ways of reading the data for parsing, along with getting specific lines of data from files by number, or by the first or last line of the file. For reading files efficiently, what are some functions or libraries we can use?Read more »
We manage data in a growing environment where our clients query some of our data, and on occasion will query past data. We do not have an environment that scales and we know that we need to archive some of our data in a way that allows clients to access it, but also doesn’t interfere with current data clients are more interested in querying. With the current data in our environment and new data sets will be using in the future, what are some ways we can archive and scale our environment?Read more »
We recently inherited a database environment where we’re facing significant data growth with limits on the sizes we can allow our databases to grow. Since we maintain multiple development, QA and production environments and these environments must be sized appropriately. We set a few standards about tables that exceed certain sizes – rows, data or both, or have certain growth patterns and our standards force compression at thresholds we set for our different environments (including using page, row, or clustered columnstore index compression) We’re looking for how we can get information about data and compression in tables and options we have so that we can quickly determine candidates that don’t currently match our best practices design we’ve set for our environments.Read more »
This year, digital security has risen as a top concern for many people, especially after the recent security compromises with Cloudflare in February and ransomware in May. Throughout the last two years, we’ve also seen the rise of sim-swapping where a hacker is able to extract a user’s cell phone data and compromise all emails and two-factor authentication. Unfortunately, I expect these security issues to continue and cause compromises for user’s data while negatively impacting companies involved.Read more »
As a database administrator, sometimes you need to identify details about a problem as quickly as possible and being able to build and analyze data for analysis will help you solve the problem. From getting information about the latest backups, to saving information about waits or indexes and comparing that to other captured metrics, we will run into issues where being able to get, store and analyze data are important for decisions to solve urgent problems.Read more »