Introduction
It is convenient to store different kinds of data in different formats. For example, plain text fits naturally in a text file such as .txt or .docx. But what if the data is in the form of a table with many headers and cells? There are several formats for storing such tabular data. Today we will get acquainted with two of them: CSV and the closely related TSV. Let's start with CSV.
What is CSV
CSV stands for comma-separated values. A CSV file is a plain text file with one data entry per line, and the elements of each entry are separated by commas. Let's have a look at an example of CSV:
Our file is called cars.csv. The picture on the left shows it opened in LibreOffice Calc, and the picture on the right shows the same file opened in Notepad. It contains information about different cars: their brands, models, and production years.
The first line contains the column titles separated by commas: Car brand, Car model, and Production year. This header line is optional, so before manipulating the data, check whether the file has one and make sure it is treated separately from the values. All the other lines (entries) contain information about individual cars.
Now that we are acquainted with CSV, let's find out how to work with it.
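The separation of the header line from the entries can be sketched in Python with the standard csv module. This is only an illustration: the header matches the one described above, but the data rows and the helper name read_cars are assumptions made up for the example, not the actual contents of cars.csv.

```python
import csv
import io

# Hypothetical contents of cars.csv: the header follows the article,
# the two data rows are invented for illustration.
CARS_CSV = """Car brand,Car model,Production year
Toyota,Corolla,2018
Ford,Focus,2015
"""


def read_cars(text):
    """Return (header, rows), keeping the header line apart from the data."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line: column titles
    rows = list(reader)    # remaining lines: one entry per line
    return header, rows


header, rows = read_cars(CARS_CSV)
print(header)  # ['Car brand', 'Car model', 'Production year']
for row in rows:
    print(row)
```

If a file has no header line, the call to next(reader) should be skipped, otherwise the first entry would be mistaken for column titles.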
Working with CSV
To work with a CSV file one just needs to open it. The CSV file format is supported by almost all spreadsheet applications and database management systems, as well as by many other tools. Examples include LibreOffice Calc, Gnumeric, Emacs, Microsoft Excel, Numbers, SpreadsheetPro, CSVed, KSpread, and Google Sheets. CSV import and export are also possible in many engineering packages such as ANSYS and LabVIEW.
For example, this is how a CSV file looks in LibreOffice Calc:
Moreover, one can work with CSV files programmatically. For example, in Python one can import either the built-in csv module or the pandas library to open, edit, and create tables in the CSV format. In Java one may use, for example, the CSVReader class from the OpenCSV library, and in C# there is the CsvHelper library. It's also possible to work with CSV files in C++, PHP, and many other programming languages.
So, now we know how to work with CSV files. But what about the similar TSV format? Let's take a look at it.
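As a minimal sketch of the Python route, the example below creates CSV text and reads it back with the built-in csv module. The helper names cars_to_csv and csv_to_cars and the sample car are hypothetical, and an in-memory buffer stands in for a real file so that the example is self-contained.

```python
import csv
import io

FIELDS = ["Car brand", "Car model", "Production year"]


def cars_to_csv(cars):
    """Serialize a list of dicts into CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(cars)
    return buffer.getvalue()


def csv_to_cars(text):
    """Parse CSV text into a list of dicts keyed by the header titles."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical entry, invented for illustration.
cars = [{"Car brand": "Toyota", "Car model": "Corolla", "Production year": "2018"}]

text = cars_to_csv(cars)
print(text)
print(csv_to_cars(text))
```

With a real file, the same writer and reader objects can be wrapped around open("cars.csv", newline="") instead of the in-memory buffer.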
What is TSV
TSV is another popular delimited file format, widespread enough to have its own name and even its own .tsv extension. It stands for tab-separated values. TSV also stores tabular data, but it uses a different delimiter. Each record in the table is a line of a text file, and each field of the record is separated from the others by a tab character (more precisely, a horizontal tab). The reason is that commas are quite common in text data and, under some national conventions, in the spelling of numbers, whereas tabs rarely occur inside values. Thus, commas in the middle of values do not need to be escaped.
Opened in a spreadsheet, a TSV file looks exactly the same as a CSV file, and we can open it with the same spreadsheets and database management systems as CSV.
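The difference in escaping can be sketched with Python's csv module, which accepts a delimiter argument. The record below is hypothetical: its last field contains a comma, so it must be quoted in CSV but can be written as-is in TSV. The helper name to_delimited is also an assumption made for this example.

```python
import csv
import io

# Hypothetical record whose last field contains a comma.
record = ["Toyota", "Corolla", "1,8 l"]


def to_delimited(row, delimiter):
    """Write one row with the given delimiter and return it without the line ending."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=delimiter).writerow(row)
    return buffer.getvalue().rstrip("\r\n")


csv_line = to_delimited(record, ",")
tsv_line = to_delimited(record, "\t")

print(csv_line)        # Toyota,Corolla,"1,8 l"
print(repr(tsv_line))  # 'Toyota\tCorolla\t1,8 l'

# Reading the TSV line back only requires the same delimiter.
parsed = next(csv.reader([tsv_line], delimiter="\t"))
print(parsed)          # ['Toyota', 'Corolla', '1,8 l']
```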
Why CSV
CSV and TSV files are often used for data interchange between software with different internal file formats, which is especially useful in business tasks. More specifically, one can move data between spreadsheets and databases from different vendors, or export data from websites, such as banking transactions.
That's why some choose CSV. Moreover, the CSV format has several notable advantages. CSV files:
- Provide a straightforward information schema;
- Can be viewed and edited even in a text editor;
- Are human-readable;
- Are simple to create and parse;
- Are compact.
On the other hand, there are some disadvantages:
- There can be only a single sheet in a CSV file;
- They keep only raw data, with no macros or formulas;
- No distinction between text and numeric values;
- No standard way to represent binary data;
- Problems with the distinction between null values and empty strings.
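The last few points can be observed with a short round trip through Python's csv module. This is a sketch with a made-up row: a number, a null value (None), and an empty string are written out and read back, and all of them return as strings. The helper name round_trip is hypothetical.

```python
import csv
import io


def round_trip(row):
    """Write one row as CSV text, then parse it back."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(row)
    return next(csv.reader(io.StringIO(buffer.getvalue(), newline="")))


# Hypothetical row: text, a number, a null value, and an empty string.
original = ["Toyota", 2018, None, ""]
restored = round_trip(original)

print(restored)  # ['Toyota', '2018', '', '']
```

The number comes back as the text '2018', and None can no longer be told apart from the empty string, so any type or null-handling rules have to be agreed on outside the file itself.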
And now you know what CSV is for and what advantages and disadvantages this format has.
Conclusion
To sum up,
- One can store tabular data in CSV and TSV formats;
- CSV stands for comma-separated values and TSV means tab-separated values;
- These file formats are supported by almost all spreadsheets and database management systems;
- CSV also denotes the more general concept of "delimited data", where the delimiter can be almost anything.
So, are you up for a challenge? Proceed to the tasks and see how well you have understood this topic!